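Abstract (translated):
Analyzing spatio-temporal data such as video is a challenging task that requires handling both visual and temporal information effectively. Convolutional neural networks have shown promise as baseline fixed feature extractors through transfer learning, a technique that helps minimize the cost of training on visual information. Temporal information is usually handled with hand-crafted features or recurrent neural networks, but these can be overly specific or overly complex. Building a fully trainable system that analyzes spatio-temporal data efficiently, without hand-crafted features or complex training, remains an open challenge. We propose a new neural network architecture to address this challenge: the Convolutional Drift Network (CDN). The CDN architecture combines the visual feature extraction power of deep convolutional neural networks with the intrinsically efficient temporal processing provided by reservoir computing. In this introductory paper on the CDN, we provide a very simple baseline implementation, tested on two egocentric (first-person) video activity datasets, and obtain video-level activity classification results comparable to state-of-the-art methods. Notably, performance on this complex spatio-temporal task was obtained by training only a single feed-forward layer in the CDN.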
---
English title:
Convolutional Drift Networks for Video Classification
---
Authors:
Dillon Graham, Seyed Hamed Fatemi Langroudi, Christopher Kanan, and Dhireesha Kudithipudi
---
Year of latest submission:
2017
---
Classification:
Primary category: Computer Science
Secondary category: Computer Vision and Pattern Recognition
Category description: Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
--
Primary category: Computer Science
Secondary category: Neural and Evolutionary Computing
Category description: Covers neural networks, connectionism, genetic algorithms, artificial life, adaptive behavior. Roughly includes some material in ACM Subject Class C.1.3, I.2.6, I.5.
--
Primary category: Electrical Engineering and Systems Science
Secondary category: Image and Video Processing
Category description: Theory, algorithms, and architectures for the formation, capture, processing, communication, analysis, and display of images, video, and multidimensional signals in a wide variety of applications. Topics of interest include: mathematical, statistical, and perceptual image and video modeling and representation; linear and nonlinear filtering, de-blurring, enhancement, restoration, and reconstruction from degraded, low-resolution or tomographic data; lossless and lossy compression and coding; segmentation, alignment, and recognition; image rendering, visualization, and printing; computational imaging, including ultrasound, tomographic and magnetic resonance imaging; and image and video analysis, synthesis, storage, search and retrieval.
--
---
Original abstract:
Analyzing spatio-temporal data like video is a challenging task that requires processing visual and temporal information effectively. Convolutional Neural Networks have shown promise as baseline fixed feature extractors through transfer learning, a technique that helps minimize the training cost on visual information. Temporal information is often handled using hand-crafted features or Recurrent Neural Networks, but this can be overly specific or prohibitively complex. Building a fully trainable system that can efficiently analyze spatio-temporal data without hand-crafted features or complex training is an open challenge. We present a new neural network architecture to address this challenge, the Convolutional Drift Network (CDN). Our CDN architecture combines the visual feature extraction power of deep Convolutional Neural Networks with the intrinsically efficient temporal processing provided by Reservoir Computing. In this introductory paper on the CDN, we provide a very simple baseline implementation tested on two egocentric (first-person) video activity datasets. We achieve video-level activity classification results on par with state-of-the-art methods. Notably, performance on this complex spatio-temporal task was produced by only training a single feed-forward layer in the CDN.
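
The general pattern the abstract describes (fixed per-frame features, a reservoir for temporal processing, and a single trained feed-forward layer) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no implementation details, so every function name, the leaky-tanh reservoir update, the row-rescaling used for stability, the softmax-regression readout, and all hyperparameters below are assumptions chosen for illustration. One-hot vectors stand in for the features a pretrained CNN would produce.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.9, density=0.2, seed=0):
    """Build fixed random input and recurrent weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_res)]
    w_res = []
    for _ in range(n_res):
        row = [rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
               for _ in range(n_res)]
        # Rescale each row so its absolute sum is `scale` (< 1). This bounds the
        # spectral radius below 1, a simple way to keep the dynamics stable.
        total = sum(abs(v) for v in row)
        w_res.append([scale * v / total for v in row] if total else row)
    return w_in, w_res


def final_state(frames, w_in, w_res, leak=0.5):
    """Feed per-frame feature vectors through the reservoir; return the last state."""
    state = [0.0] * len(w_res)
    for u in frames:
        state = [
            (1.0 - leak) * s
            + leak * math.tanh(sum(a * b for a, b in zip(wi, u))
                               + sum(a * b for a, b in zip(wr, state)))
            for s, wi, wr in zip(state, w_in, w_res)
        ]
    return state


def scores(readout, state):
    """Linear readout: one score per class, with a bias input appended."""
    x = state + [1.0]
    return [sum(w * v for w, v in zip(row, x)) for row in readout]


def train_readout(states, labels, n_classes, epochs=200, lr=0.5):
    """Fit the only trainable layer with softmax-regression gradient steps."""
    readout = [[0.0] * (len(states[0]) + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for state, label in zip(states, labels):
            s = scores(readout, state)
            top = max(s)
            exps = [math.exp(v - top) for v in s]
            z = sum(exps)
            for c, row in enumerate(readout):
                err = exps[c] / z - (1.0 if c == label else 0.0)
                for j, v in enumerate(state + [1.0]):
                    row[j] -= lr * err * v
    return readout


def predict(readout, state):
    """Return the index of the highest-scoring class."""
    s = scores(readout, state)
    return s.index(max(s))


# Toy "videos": one-hot vectors stand in for per-frame CNN features.
# Class 0 shows pattern A then B; class 1 shows B then A.
A, B = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
videos = ([[A] * n + [B] * n for n in (3, 4, 5)]
          + [[B] * n + [A] * n for n in (3, 4, 5)])
labels = [0, 0, 0, 1, 1, 1]

w_in, w_res = make_reservoir(n_in=3, n_res=20)
states = [final_state(v, w_in, w_res) for v in videos]
readout = train_readout(states, labels, n_classes=2)
predictions = [predict(readout, s) for s in states]
print(predictions)
```

In this toy setup both classes contain the same frames in a different order, so averaging the frame features would give identical vectors for the two classes; only the order-dependent reservoir state lets a linear readout tell them apart. The reservoir weights stay fixed after random initialization, and gradient steps touch only `readout`.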
---
PDF link:
https://arxiv.org/pdf/1711.01201