Resolution limits on visual speech recognition

2022-03-08

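Summary:
Visual-only speech recognition depends on many factors that are hard to control, such as lighting, speaker identity, motion, emotion, and expression. Video resolution, by contrast, is controllable, yet its effect on lip-reading has not been studied systematically. The authors train and test recognizers on a new data set, the Rosetta Raven data, to measure how video resolution affects recognition accuracy. They conclude that, contrary to common practice, automatic lip-reading does not need especially high resolution. It is highly unlikely to work reliably, however, once the distance between the bottom of the lower lip and the top of the upper lip falls below four pixels at rest.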
---
English title:
Resolution limits on visual speech recognition
---
Authors:
Helen L. Bear, Richard Harvey, Barry-John Theobald, and Yuxuan Lan
---
Latest submission year:
2017
---
Categories:

Primary category: Computer Science
Subcategory: Computer Vision and Pattern Recognition
Description: Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
--
Primary category: Electrical Engineering and Systems Science
Subcategory: Image and Video Processing
Description: Theory, algorithms, and architectures for the formation, capture, processing, communication, analysis, and display of images, video, and multidimensional signals in a wide variety of applications. Topics of interest include: mathematical, statistical, and perceptual image and video modeling and representation; linear and nonlinear filtering, de-blurring, enhancement, restoration, and reconstruction from degraded, low-resolution or tomographic data; lossless and lossy compression and coding; segmentation, alignment, and recognition; image rendering, visualization, and printing; computational imaging, including ultrasound, tomographic and magnetic resonance imaging; and image and video analysis, synthesis, storage, search and retrieval.
--

---
Abstract:
Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as: lighting; identity; motion; emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.
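
The four-pixel limit in the abstract can be turned into a quick sanity check on a video before trying to lip-read it. The Python sketch below is only an illustration and is not taken from the paper: it assumes lip height scales linearly when a frame is downsampled uniformly, the function names are hypothetical, and the 40-pixel starting height is an invented example value. Only the four-pixel threshold comes from the abstract.

```python
# Threshold quoted in the abstract: lip height (bottom of lower lip to
# top of upper lip, at rest) below which lip-reading is unlikely to work.
MIN_LIP_HEIGHT_PX = 4.0


def lip_height_after_downsampling(height_px: float, factor: float) -> float:
    """Lip height in pixels after shrinking each frame dimension by `factor`.

    Assumes uniform downsampling, so heights scale linearly (an assumption
    made for this sketch, not a statement about the paper's method).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return height_px / factor


def is_below_limit(height_px: float, factor: float = 1.0) -> bool:
    """True if the downsampled lip height falls under the four-pixel limit."""
    return lip_height_after_downsampling(height_px, factor) < MIN_LIP_HEIGHT_PX


def max_downsampling_factor(height_px: float) -> float:
    """Largest factor that still leaves exactly four pixels of lip height."""
    return height_px / MIN_LIP_HEIGHT_PX


if __name__ == "__main__":
    full_res_height = 40.0  # hypothetical resting lip height at full resolution
    for factor in (1, 2, 5, 10, 20):
        height = lip_height_after_downsampling(full_res_height, factor)
        status = "too small" if is_below_limit(full_res_height, factor) else "ok"
        print(f"factor {factor:>2}: {height:5.1f} px ({status})")
```

Under these assumptions, a speaker whose lips span 40 pixels at rest could be downsampled by up to a factor of 10 before crossing the limit stated in the abstract.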
---
PDF link:
https://arxiv.org/pdf/1710.01073
