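Translated abstract:
We study two closely related nonparametric hypothesis testing problems. In the first (the existence problem), we test whether a test data stream is generated by one of a set of composite distributions. In the second (the association problem), we test which of multiple distributions generated the test data stream. We assume that some distributions in the set are unknown and that only training sequences generated by those distributions are available. For both problems, we construct generalized likelihood (GL) tests and characterize the error exponents of the maximum error probabilities. For the existence problem, we show that the error exponent is mainly captured by the Chernoff information between the set of composite distributions and the alternative distributions. For the association problem, we show that the error exponent is captured by the minimum Chernoff information between each pair of distributions, together with the KL divergences between the approximated distributions (obtained from the training sequences) and the true distributions. We also show that the ratio of the training sequence length to the test sequence length plays an important role in determining the error decay rate.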
---
English title:
Data-Driven Nonparametric Existence and Association Problems
---
Authors:
Yixian Liu, Yingbin Liang and Shuguang Cui
---
Latest submission year:
2017
---
Classification:
Primary category: Electrical Engineering and Systems Science
Secondary category: Signal Processing
Category description: Theory, algorithms, performance analysis and applications of signal and data analysis, including physical modeling, processing, detection and parameter estimation, learning, mining, retrieval, and information extraction. The term "signal" includes speech, audio, sonar, radar, geophysical, physiological, (bio-) medical, image, video, and multimodal natural and man-made signals, including communication signals and data. Topics of interest include: statistical signal processing, spectral estimation and system identification; filter design, adaptive filtering / stochastic learning; (compressive) sampling, sensing, and transform-domain methods including fast algorithms; signal processing for machine learning and machine learning for signal processing applications; in-network and graph signal processing; convex and nonconvex optimization methods for signal processing applications; radar, sonar, and sensor array beamforming and direction finding; communications signal processing; low power, multi-core and system-on-chip signal processing; sensing, communication, analysis and optimization for cyber-physical systems such as power grids and the Internet of Things.
---
Original English abstract:
We investigate two closely related nonparametric hypothesis testing problems. In the first problem (i.e., the existence problem), we test whether a testing data stream is generated by one of a set of composite distributions. In the second problem (i.e., the association problem), we test which one of the multiple distributions generates a testing data stream. We assume that some distributions in the set are unknown and that only training sequences generated by the corresponding distributions are available. For both problems, we construct the generalized likelihood (GL) tests, and characterize the error exponents of the maximum error probabilities. For the existence problem, we show that the error exponent is mainly captured by the Chernoff information between the set of composite distributions and alternative distributions. For the association problem, we show that the error exponent is captured by the minimum Chernoff information between each pair of distributions as well as the KL divergences between the approximated distributions (via training sequences) and the true distributions. We also show that the ratio between the lengths of training and testing sequences plays an important role in determining the error decay rate.
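The two quantities the abstract names, KL divergence and Chernoff information, can be illustrated for discrete distributions on a finite alphabet. The sketch below is not the paper's GL test: it assumes every candidate distribution is fully known (no training sequences) and associates a test sequence with the candidate closest in KL divergence to its empirical distribution. The function names, the grid search over the Chernoff parameter, and the toy distributions are all assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for discrete distributions on one alphabet."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

def chernoff_information(p, q, grid=10001):
    """C(p, q) = -min over lam in [0, 1] of log sum_x p(x)^lam * q(x)^(1 - lam).

    The minimum is approximated by a plain grid search (an illustrative choice).
    """
    best = math.inf
    for k in range(grid):
        lam = k / (grid - 1)
        s = sum((pi ** lam) * (qi ** (1 - lam))
                for pi, qi in zip(p, q) if pi > 0 and qi > 0)
        if s <= 0:
            return math.inf  # disjoint supports
        best = min(best, math.log(s))
    return -best

def empirical_distribution(sequence, alphabet_size):
    """Relative frequency of each symbol 0..alphabet_size-1 in the sequence."""
    counts = [0] * alphabet_size
    for symbol in sequence:
        counts[symbol] += 1
    return [c / len(sequence) for c in counts]

def associate(sequence, candidates):
    """Index of the known candidate with the smallest D(empirical || candidate).

    Simplified stand-in: with unknown distributions, the paper instead relies on
    training sequences, which this sketch does not model.
    """
    emp = empirical_distribution(sequence, len(candidates[0]))
    return min(range(len(candidates)),
               key=lambda i: kl_divergence(emp, candidates[i]))

# Hypothetical toy distributions on a binary alphabet.
p = [0.9, 0.1]
q = [0.1, 0.9]
test_sequence = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

chernoff_pq = chernoff_information(p, q)
kl_pq = kl_divergence(p, q)
decision = associate(test_sequence, [p, q])
```

For this symmetric pair the minimizing parameter is 1/2, so the Chernoff information equals -log(0.6), and the test sequence is associated with `p` (index 0). With known distributions, the smallest pairwise Chernoff information is the classical exponent at which the error of such a test decays; the abstract states that, with training sequences, additional KL terms and the training-to-test length ratio also enter.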
---
PDF link:
https://arxiv.org/pdf/1711.0842