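Translated abstract:
Atypicality aims to extract small, rare, unusual, and interesting pieces from big data. This complements statistics about typical data and gives insight into the data. Finding these "interesting" parts requires universal methods, because we do not know in advance what we are looking for. We therefore base the atypicality criterion on codelength. In an earlier paper we developed the method for discrete-valued data; here we extend it to real-valued data using minimum description length (MDL). We show that the real-valued case shares many theoretical properties with the discrete-valued case. We develop methods for several "universal" signal processing models and finally apply them to recorded hydrophone data.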
---
English title:
Data Discovery and Anomaly Detection Using Atypicality: Signal Processing Methods
---
Authors:
Elyas Sabeti, Anders Høst-Madsen
---
Year of latest submission:
2017
---
Categories:
Primary category: Electrical Engineering and Systems Science
Subcategory: Signal Processing
Category description: Theory, algorithms, performance analysis and applications of signal and data analysis, including physical modeling, processing, detection and parameter estimation, learning, mining, retrieval, and information extraction. The term "signal" includes speech, audio, sonar, radar, geophysical, physiological, (bio-) medical, image, video, and multimodal natural and man-made signals, including communication signals and data. Topics of interest include: statistical signal processing, spectral estimation and system identification; filter design, adaptive filtering / stochastic learning; (compressive) sampling, sensing, and transform-domain methods including fast algorithms; signal processing for machine learning and machine learning for signal processing applications; in-network and graph signal processing; convex and nonconvex optimization methods for signal processing applications; radar, sonar, and sensor array beamforming and direction finding; communications signal processing; low power, multi-core and system-on-chip signal processing; sensing, communication, analysis and optimization for cyber-physical systems such as power grids and the Internet of Things.
--
Primary category: Computer Science
Subcategory: Information Theory
Category description: Covers theoretical and experimental aspects of information theory and coding. Includes material in ACM Subject Class E.4 and intersects with H.1.1.
--
Primary category: Mathematics
Subcategory: Information Theory
Category description: math.IT is an alias for cs.IT. Covers theoretical and experimental aspects of information theory and coding.
--
---
English abstract:
The aim of atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such "interesting" parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We show that this shares a number of theoretical properties with the discrete-valued case. We develop the methodology for a number of "universal" signal processing models, and finally apply them to recorded hydrophone data.
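
The codelength idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual criterion: it assumes the typical data are i.i.d. Gaussian with known mean and variance, it uses a Gaussian with fitted parameters as the only alternative model, and it charges the usual MDL cost of (k/2) log2(n) bits for k = 2 parameters. The function names and the `header_bits` value are hypothetical. A segment is flagged as atypical when the alternative description is shorter than the typical one.

```python
import math

def typical_codelength(x, mean=0.0, var=1.0):
    """Bits to encode x with a fixed, known Gaussian model.

    The quantization constant common to both codes is omitted, so the
    value is only meaningful in comparison with mdl_codelength.
    """
    log2e = math.log2(math.e)
    return sum(
        0.5 * math.log2(2 * math.pi * var) + (v - mean) ** 2 / (2 * var) * log2e
        for v in x
    )

def mdl_codelength(x, header_bits=10.0):
    """Bits to encode x with a Gaussian whose mean and variance are fitted to x.

    header_bits is an assumed fixed cost for signalling that a segment
    uses its own model (hypothetical value).
    """
    n = len(x)
    mu = sum(x) / n
    var = max(sum((v - mu) ** 2 for v in x) / n, 1e-12)  # guard against log(0)
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * var)
    param_bits = (2 / 2) * math.log2(n)  # (k/2) * log2(n) with k = 2 parameters
    return data_bits + param_bits + header_bits

def is_atypical(x, mean=0.0, var=1.0, header_bits=10.0):
    """True when the fitted-model description is shorter than the typical one."""
    return mdl_codelength(x, header_bits) < typical_codelength(x, mean, var)

if __name__ == "__main__":
    ordinary = [-1.0, 1.0] * 20                           # matches N(0, 1)
    shifted = [5.0 + 0.1 * (-1) ** i for i in range(40)]  # far from N(0, 1)
    print(is_atypical(ordinary), is_atypical(shifted))
```

In this sketch the parameter and header costs keep a segment that already fits the typical model from being flagged, while a segment with a very different mean and spread is described far more cheaply by its own fitted model.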
---
PDF link:
https://arxiv.org/pdf/1709.03191