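Abstract (translated):
To be effective, state-of-the-art machine learning techniques require large amounts of annotated data. In healthcare there are many compelling applications that could benefit from the high-performance automated decision support systems that deep learning provides, but they lack the comprehensive data resources needed to apply sophisticated machine learning models. Moreover, for economic reasons, it is hard to justify creating large annotated corpora for these applications. Automated annotation techniques are therefore becoming increasingly important. In this study, we investigate the effectiveness of using an active learning algorithm to automatically annotate a large EEG corpus. The algorithm is designed to annotate six types of EEG events. Two model training schemes, threshold-based and volume-based, are evaluated. The threshold-based scheme optimizes the confidence-score threshold in the initial training iteration, whereas the volume-based scheme keeps only a certain amount of data after each iteration. Recognition performance improves by 2% absolute, and the system is able to automatically annotate previously unlabeled data. Given that interpreting clinical EEG data is an exceedingly difficult task, this study provides some evidence that the proposed method is a viable alternative to expensive manual annotation.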
---
English title:
Semi-automated Annotation of Signal Events in Clinical EEG Data
---
Authors:
Scott Yang, Silvia Lopez, Meysam Golmohammadi, Iyad Obeid and Joseph Picone
---
Latest submission year:
2018
---
Classification:
Primary category: Electrical Engineering and Systems Science
Subcategory: Signal Processing
Category description: Theory, algorithms, performance analysis and applications of signal and data analysis, including physical modeling, processing, detection and parameter estimation, learning, mining, retrieval, and information extraction. The term "signal" includes speech, audio, sonar, radar, geophysical, physiological, (bio-) medical, image, video, and multimodal natural and man-made signals, including communication signals and data. Topics of interest include: statistical signal processing, spectral estimation and system identification; filter design, adaptive filtering / stochastic learning; (compressive) sampling, sensing, and transform-domain methods including fast algorithms; signal processing for machine learning and machine learning for signal processing applications; in-network and graph signal processing; convex and nonconvex optimization methods for signal processing applications; radar, sonar, and sensor array beamforming and direction finding; communications signal processing; low power, multi-core and system-on-chip signal processing; sensing, communication, analysis and optimization for cyber-physical systems such as power grids and the Internet of Things.
--
Primary category: Computer Science
Subcategory: Databases
Category description: Covers database management, data mining, and data processing. Roughly includes material in ACM Subject Classes E.2, E.5, H.0, H.2, and J.1.
--
Primary category: Computer Science
Subcategory: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Statistics
Subcategory: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding.
--
---
English abstract:
To be effective, state of the art machine learning technology needs large amounts of annotated data. There are numerous compelling applications in healthcare that can benefit from high performance automated decision support systems provided by deep learning technology, but they lack the comprehensive data resources required to apply sophisticated machine learning models. Further, for economic reasons, it is very difficult to justify the creation of large annotated corpora for these applications. Hence, automated annotation techniques become increasingly important. In this study, we investigated the effectiveness of using an active learning algorithm to automatically annotate a large EEG corpus. The algorithm is designed to annotate six types of EEG events. Two model training schemes, namely threshold-based and volume-based, are evaluated. In the threshold-based scheme the threshold of confidence scores is optimized in the initial training iteration, whereas for the volume-based scheme only a certain amount of data is preserved after each iteration. Recognition performance is improved 2% absolute and the system is capable of automatically annotating previously unlabeled data. Given that the interpretation of clinical EEG data is an exceedingly difficult task, this study provides some evidence that the proposed method is a viable alternative to expensive manual annotation.
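
The two data-selection schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the selection rules in detail, so the function names, the fixed threshold, the `keep` count, and the confidence scores below are all hypothetical. The sketch assumes each automatically labeled segment carries a confidence score and that only the selected segments are fed back into the next training iteration.

```python
from typing import List, Sequence


def select_by_threshold(confidences: Sequence[float], threshold: float) -> List[int]:
    """Threshold-based selection: keep segments scoring at least `threshold`.

    How the threshold itself is tuned is not modeled here (assumption).
    """
    return [i for i, c in enumerate(confidences) if c >= threshold]


def select_by_volume(confidences: Sequence[float], keep: int) -> List[int]:
    """Volume-based selection: keep the `keep` most confident segments.

    Indices are returned in their original order.
    """
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:keep])


# Hypothetical confidence scores for five automatically labeled segments.
scores = [0.95, 0.40, 0.80, 0.99, 0.62]

thresholded = select_by_threshold(scores, threshold=0.75)  # indices 0, 2, 3
top_two = select_by_volume(scores, keep=2)                 # indices 0, 3
```

The difference is in what stays fixed: a threshold holds the minimum confidence constant and lets the number of retained segments vary, while a volume limit holds the number of retained segments constant and lets the minimum confidence vary.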
---
PDF link:
https://arxiv.org/pdf/1801.02476