2022-03-15
---
Title:
A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes
---
Authors:
Branislav Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, and Philip N. Garner
---
Year of latest submission:
2019
---
Categories:

Primary category: Electrical Engineering and Systems Science
Secondary category: Audio and Speech Processing
Category description: Theory and methods for processing signals representing audio, speech, and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems. Machine learning and pattern analysis applied to any of the above areas is also welcome. Specific topics of interest include: auditory modeling and hearing aids; acoustic beamforming and source localization; classification of acoustic scenes; speaker separation; active noise control and echo cancellation; enhancement; de-reverberation; bioacoustics; music signals analysis, synthesis and modification; music information retrieval; audio for multimedia and joint audio-video processing; spoken and written language modeling, segmentation, tagging, parsing, understanding, and translation; text mining; speech production, perception, and psychoacoustics; speech analysis, synthesis, and perceptual modeling and coding; robust speech recognition; speaker recognition and characterization; deep learning, online learning, and graphical models applied to speech, audio, and language signals; and implementation aspects ranging from system architecture to fast algorithms.
--
Primary category: Computer Science
Secondary category: Sound
Category description: Covers all aspects of computing with sound, and sound as an information channel. Includes models of sound, analysis and synthesis, audio user interfaces, sonification of data, computer music, and sound signal processing. Includes ACM Subject Class H.5.5, and intersects with H.1.2, H.5.1, H.5.2, I.2.7, I.5.4, I.6.3, J.5, K.4.2.
--

---
Abstract:
The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge of speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques for training general-purpose end-to-end mappings using millions of tunable parameters. The shift towards black box machine learning models has nonetheless posed the reverse problem -- a compelling need to discover knowledge, to explain, visualise and interpret. Our work bridges between a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the context-sensitive variation of the constituent elementary prosodic contours. We show that the VPM can give insight into the intrinsic variability of these prosodic prototypes through learning a meaningful prosodic latent space representation structure. We also show that the VPM is able to capture prosodic phenomena that have multiple dimensions of context-based variability. Since it is based on the principle of superposition, the VPM does not necessitate the use of specially crafted corpora for the analysis, opening up the possibilities of using big data for prosody analysis. In a speech synthesis scenario, the model can be used to generate a dynamic and natural prosody contour that is devoid of averaging effects.
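
The superposition idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function labels, scopes, weights, and the single-`tanh` "generator" below are all hypothetical stand-ins for trained networks. It only shows two things the abstract names: each contour generator draws a latent value (here via the usual reparameterisation z = mu + sigma * eps), and the contributions of overlapping functions are summed unit by unit to give the final contour.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps


def contour_generator(weights, position, z):
    """Toy generator (assumption): one tanh unit mapping the relative
    position inside the scope (0..1) and the latent z to a contour value."""
    w_pos, w_z, bias = weights
    return math.tanh(w_pos * position + w_z * z + bias)


def superpose(n_units, functions, rng):
    """Sum the contribution of every function over the units in its scope."""
    contour = [0.0] * n_units
    for f in functions:
        # One latent draw per function instance.
        z = sample_latent(f["mu"], f["log_var"], rng)
        start, end = f["scope"]  # end is exclusive
        length = end - start
        for i in range(start, end):
            position = (i - start) / max(length - 1, 1)
            contour[i] += contour_generator(f["weights"], position, z)
    return contour


# Hypothetical utterance of 6 units with two overlapping functions.
functions = [
    {"name": "declaration", "scope": (0, 6),
     "weights": (-1.0, 0.5, 0.5), "mu": 0.0, "log_var": -2.0},
    {"name": "emphasis", "scope": (2, 4),
     "weights": (0.0, 1.0, 0.8), "mu": 0.5, "log_var": -2.0},
]
contour = superpose(6, functions, random.Random(0))
print([round(v, 3) for v in contour])
```

Units 2 and 3 receive the sum of both contributions, while the remaining units carry only the utterance-level contour. Because each call samples a fresh latent value, repeated calls with different seeds give different contours rather than one averaged shape.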
---
PDF link:
https://arxiv.org/pdf/1806.08685