Fundamental Limits on Data Acquisition: Trade-offs between Sample Complexity and Query Difficulty


2022-03-06

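Abstract (translated from the Chinese rendering):
We study query-based data acquisition and the associated information recovery problem, in which $k$ binary variables (information bits) are to be recovered from parity measurements of those variables. The queries and their parity measurements follow the encoding rule of Fountain codes. This allows a potentially unlimited number of queries and measurements to be generated, with the guarantee that the original $k$ information bits can be recovered with high probability from any sufficiently large set of measurements. The average number of information bits associated with one parity measurement is called the query difficulty ($\bar{d}$), and the minimum number of measurements needed to recover the $k$ information bits for a fixed $\bar{d}$ is called the sample complexity ($n$). We analyze the fundamental trade-off between query difficulty and sample complexity, and show that a sample complexity of $n=c\max\{k,(k\log k)/\bar{d}\}$, for some constant $c>0$, is necessary and sufficient to recover the $k$ information bits with high probability as $k\to\infty$.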
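The maximum in the sample-complexity expression can be read as two regimes, separated by whether the query difficulty is below or above $\log k$. The case split below is only a restatement of that expression; the constant $c$ and the base of the logarithm are left unspecified, as in the abstract.

```latex
n \;=\; c\,\max\Bigl\{k,\ \frac{k\log k}{\bar{d}}\Bigr\}
  \;=\;
  \begin{cases}
    c\,\dfrac{k\log k}{\bar{d}}, & \bar{d}\le \log k,\\[2ex]
    c\,k, & \bar{d} > \log k.
  \end{cases}
```

In the first regime, doubling $\bar{d}$ halves the required number of measurements. In the second, harder queries no longer reduce $n$ below order $k$.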
---
Original title:
Fundamental Limits on Data Acquisition: Trade-offs between Sample Complexity and Query Difficulty
---
Authors:
Hye Won Chung, Ji Oon Lee, Alfred O. Hero
---
Year of latest submission:
2018
---
Categories:

Top-level category: Computer Science
Subcategory: Information Theory
Description: Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
--
Top-level category: Electrical Engineering and Systems Science
Subcategory: Signal Processing
Description: Theory, algorithms, performance analysis and applications of signal and data analysis, including physical modeling, processing, detection and parameter estimation, learning, mining, retrieval, and information extraction. The term "signal" includes speech, audio, sonar, radar, geophysical, physiological, (bio-) medical, image, video, and multimodal natural and man-made signals, including communication signals and data. Topics of interest include: statistical signal processing, spectral estimation and system identification; filter design, adaptive filtering / stochastic learning; (compressive) sampling, sensing, and transform-domain methods including fast algorithms; signal processing for machine learning and machine learning for signal processing applications; in-network and graph signal processing; convex and nonconvex optimization methods for signal processing applications; radar, sonar, and sensor array beamforming and direction finding; communications signal processing; low power, multi-core and system-on-chip signal processing; sensing, communication, analysis and optimization for cyber-physical systems such as power grids and the Internet of Things.
--
Top-level category: Mathematics
Subcategory: Information Theory
Description: math.IT is an alias for cs.IT. Covers theoretical and experimental aspects of information theory and coding.
--

---
Original abstract:
We consider query-based data acquisition and the corresponding information recovery problem, where the goal is to recover $k$ binary variables (information bits) from parity measurements of those variables. The queries and the corresponding parity measurements are designed using the encoding rule of Fountain codes. By using Fountain codes, we can design potentially limitless number of queries, and corresponding parity measurements, and guarantee that the original $k$ information bits can be recovered with high probability from any sufficiently large set of measurements of size $n$. In the query design, the average number of information bits that is associated with one parity measurement is called query difficulty ($\bar{d}$) and the minimum number of measurements required to recover the $k$ information bits for a fixed $\bar{d}$ is called sample complexity ($n$). We analyze the fundamental trade-offs between the query difficulty and the sample complexity, and show that the sample complexity of $n=c\max\{k,(k\log k)/\bar{d}\}$ for some constant $c>0$ is necessary and sufficient to recover $k$ information bits with high probability as $k\to\infty$.
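The setup in the abstract can be illustrated with a small simulation. The sketch below is not the authors' construction: the degree distribution passed to `make_query`, the constant `c=1.0`, the use of the natural logarithm, and all function names are assumptions chosen for illustration. It draws Fountain-style queries (sample a degree, then that many distinct bit positions), takes the XOR of the selected bits as the parity measurement, and attempts recovery by Gaussian elimination over GF(2).

```python
import math
import random


def sample_complexity(k, d_bar, c=1.0):
    """Evaluate n = c * max(k, k*log(k)/d_bar).

    Assumptions: c=1.0 is a placeholder (the abstract only says c > 0),
    and log is taken as the natural logarithm.
    """
    return math.ceil(c * max(k, k * math.log(k) / d_bar))


def make_query(k, degrees, weights, rng):
    """Fountain-style query: draw a degree, then that many distinct indices.

    The (degrees, weights) distribution is hypothetical, not from the paper.
    """
    d = rng.choices(degrees, weights=weights)[0]
    return rng.sample(range(k), d)


def measure(bits, query):
    """Parity measurement: XOR of the information bits named by the query."""
    parity = 0
    for i in query:
        parity ^= bits[i]
    return parity


def recover(k, queries, measurements):
    """Solve the parity equations over GF(2); return None if rank < k."""
    pivots = {}  # highest set bit -> (row bitmask, parity)
    for query, y in zip(queries, measurements):
        mask = 0
        for i in query:
            mask |= 1 << i
        # Reduce the new row against existing pivot rows.
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, y)
                break
            pivot_mask, pivot_y = pivots[top]
            mask ^= pivot_mask
            y ^= pivot_y
    if len(pivots) < k:
        return None
    # Back-substitute from the lowest pivot up: each pivot row only
    # involves its own bit and bits with smaller indices.
    bits = [0] * k
    for top in range(k):
        mask, y = pivots[top]
        rest = mask ^ (1 << top)
        while rest:
            j = rest.bit_length() - 1
            y ^= bits[j]
            rest ^= 1 << j
        bits[top] = y
    return bits


if __name__ == "__main__":
    rng = random.Random(0)
    k = 64
    degrees, weights = [1, 2, 4, 8], [1, 2, 2, 1]
    d_bar = sum(d * w for d, w in zip(degrees, weights)) / sum(weights)
    truth = [rng.randint(0, 1) for _ in range(k)]
    n = sample_complexity(k, d_bar)
    queries = [make_query(k, degrees, weights, rng) for _ in range(n)]
    ys = [measure(truth, q) for q in queries]
    estimate = recover(k, queries, ys)
    print("d_bar:", d_bar, "n:", n, "recovered:", estimate == truth)
```

Gaussian elimination is used here only because it succeeds whenever the measurements determine the bits uniquely; whether recovery succeeds at a given `n` depends on the assumed degree distribution and on the placeholder value of `c`.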
---
PDF link:
https://arxiv.org/pdf/1712.00157
