Scene-Aware Audio for 360° Videos

nandehutu2022


2022-03-08

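Abstract (translated from the Chinese):
Although 360° cameras simplify the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and stays synchronized with the camera motion. This paper presents a method for adding scene-aware spatial audio to 360° videos in typical indoor scenes, using only a conventional mono-channel microphone and a speaker. The authors observe that the late reverberation of a room impulse response is usually diffuse, both spatially and directionally. Exploiting this fact, they synthesize the directional impulse response between arbitrary source and listening positions by combining a synthesized early reverberation part with a measured late reverberation tail. The early reverberation is simulated with geometric acoustics and then enhanced with a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, with a carefully chosen duration separating the late reverberation from the early reverberation. In validation, the synthesized spatial audio closely matches recordings made with ambisonic microphones. Finally, the method's effectiveness is demonstrated in several applications.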
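
The core idea in the abstract, joining a synthesized early part to a measured late tail at a chosen split time, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `splice_impulse_response`, the linear crossfade, and the `fade_time` parameter are assumptions made for illustration, and the two inputs are assumed to be time-aligned mono impulse responses at the same sample rate.

```python
import numpy as np

def splice_impulse_response(early, late_measured, split_time, sample_rate,
                            fade_time=0.005):
    """Combine a synthesized early part with a measured late tail.

    early          : 1-D array, synthesized impulse response (hypothetical
                     simulator output)
    late_measured  : 1-D array, recorded impulse response, assumed to be
                     time-aligned with `early`
    split_time     : seconds, where late reverberation is assumed to begin
    fade_time      : seconds, length of the linear crossfade centered on
                     split_time (an illustrative choice, not from the paper)
    """
    early = np.asarray(early, dtype=float)
    late_measured = np.asarray(late_measured, dtype=float)

    # Zero-pad both responses to a common length.
    n = max(len(early), len(late_measured))
    e = np.zeros(n)
    e[:len(early)] = early
    m = np.zeros(n)
    m[:len(late_measured)] = late_measured

    split = int(round(split_time * sample_rate))
    half = int(round(fade_time * sample_rate / 2))
    start = max(split - half, 0)
    stop = min(split + half, n)

    # Weight is 0 before the fade (pure synthesized part) and
    # 1 after it (pure measured tail), ramping linearly in between.
    w = np.zeros(n)
    w[stop:] = 1.0
    if stop > start:
        w[start:stop] = np.linspace(0.0, 1.0, stop - start, endpoint=False)

    return (1.0 - w) * e + w * m
```

A short crossfade avoids an audible discontinuity at the join; how the split time itself should be chosen is the part the abstract describes as requiring care, and this sketch simply takes it as an input.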
---
English title:
Scene-Aware Audio for 360° Videos
---
Authors:
Dingzeyu Li, Timothy R. Langlois, and Changxi Zheng
---
Latest submission year:
2018
---
Categories:

Primary category: Computer Science
Secondary category: Graphics
Description: Covers all aspects of computer graphics. Roughly includes material in all of ACM Subject Class I.3, except that I.3.5 is likely to have Computational Geometry as the primary subject area.
--
Primary category: Computer Science
Secondary category: Computer Vision and Pattern Recognition
Description: Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
--
Primary category: Computer Science
Secondary category: Emerging Technologies
Description: Covers approaches to information processing (computing, communication, sensing) and bio-chemical analysis based on alternatives to silicon CMOS-based technologies, such as nanoscale electronic, photonic, spin-based, superconducting, mechanical, bio-chemical and quantum technologies (this list is not exclusive). Topics of interest include (1) building blocks for emerging technologies, their scalability and adoption in larger systems, including integration with traditional technologies, (2) modeling, design and optimization of novel devices and systems, (3) models of computation, algorithm design and programming for emerging technologies.
--
Primary category: Computer Science
Secondary category: Sound
Description: Covers all aspects of computing with sound, and sound as an information channel. Includes models of sound, analysis and synthesis, audio user interfaces, sonification of data, computer music, and sound signal processing. Includes ACM Subject Class H.5.5, and intersects with H.1.2, H.5.1, H.5.2, I.2.7, I.5.4, I.6.3, J.5, K.4.2.
--
Primary category: Electrical Engineering and Systems Science
Secondary category: Audio and Speech Processing
Description: Theory and methods for processing signals representing audio, speech, and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems. Machine learning and pattern analysis applied to any of the above areas is also welcome. Specific topics of interest include: auditory modeling and hearing aids; acoustic beamforming and source localization; classification of acoustic scenes; speaker separation; active noise control and echo cancellation; enhancement; de-reverberation; bioacoustics; music signals analysis, synthesis and modification; music information retrieval; audio for multimedia and joint audio-video processing; spoken and written language modeling, segmentation, tagging, parsing, understanding, and translation; text mining; speech production, perception, and psychoacoustics; speech analysis, synthesis, and perceptual modeling and coding; robust speech recognition; speaker recognition and characterization; deep learning, online learning, and graphical models applied to speech, audio, and language signals; and implementation aspects ranging from system architecture to fast algorithms.
--

---
Original English abstract:
Although 360° cameras ease the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and is synchronized with the camera motion. We present a method for adding scene-aware spatial audio to 360° videos in typical indoor scenes, using only a conventional mono-channel microphone and a speaker. We observe that the late reverberation of a room's impulse response is usually diffuse spatially and directionally. Exploiting this fact, we propose a method that synthesizes the directional impulse response between any source and listening locations by combining a synthesized early reverberation part and a measured late reverberation tail. The early reverberation is simulated using a geometric acoustic simulation and then enhanced using a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, with a carefully chosen time duration that separates out the late reverberation from the early reverberation. In our validations, we show that our synthesized spatial audio matches closely with recordings using ambisonic microphones. Lastly, we demonstrate the strength of our method in several applications.
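
For readers unfamiliar with the ambisonic format mentioned in the abstract, the sketch below shows how a mono signal arriving from a single direction can be encoded into first-order ambisonics. It is a generic illustration rather than the paper's pipeline: the function name `encode_first_order_ambisonics` is hypothetical, and the W, Y, Z, X channel order with SN3D normalization is an assumed convention.

```python
import numpy as np

def encode_first_order_ambisonics(mono, azimuth, elevation):
    """Encode a mono signal arriving from one direction into 4 channels.

    Channel order is W, Y, Z, X (ACN) with SN3D normalization, which is an
    assumed convention here. Angles are in radians; azimuth is measured
    counter-clockwise from the front, elevation upward from the horizon.
    """
    mono = np.asarray(mono, dtype=float)
    gains = np.array([
        1.0,                                   # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),   # Y: left-right
        np.sin(elevation),                     # Z: up-down
        np.cos(azimuth) * np.cos(elevation),   # X: front-back
    ])
    # Result has shape (4, num_samples): one row per ambisonic channel.
    return gains[:, None] * mono[None, :]
```

Under these assumptions, a directional impulse response could be built by encoding each early reflection with its own arrival direction and summing the results, while a diffuse late tail carries no single direction to encode.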
---
PDF link:
https://arxiv.org/pdf/1805.04792
