Abstract (translation):
Understanding human activities and object affordances are two very important skills, especially for personal robots operating in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities performed by a human and, more importantly, of their interactions with objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field, where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, in which labelings over various alternate temporal segmentations are treated as latent variables. We tested our method on a challenging dataset of 120 activity videos collected from 4 subjects, obtaining accuracies of 79.4% for affordance, 63.4% for sub-activity, and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling by a PR2 robot performing assistive tasks.
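To make the joint model concrete, the following is a minimal sketch, in Python with hypothetical feature and weight names, of how a linear score for one candidate labeling might be assembled from node potentials (object affordances and sub-activities) and edge potentials (object-object, object-sub-activity, and temporal edges). It only illustrates the structure described in the abstract, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): scoring one joint labeling of
# object-affordance and sub-activity nodes in an MRF over a temporal
# segmentation. All feature functions, label sets, and weights are hypothetical.
import numpy as np

def score_labeling(segments, w_node_obj, w_node_act, w_edge_oo, w_edge_oa,
                   w_temp_obj, w_temp_act):
    """Sum node and edge potentials over the segments of one segmentation.

    segments: list of dicts with
        'obj_feats'  : {obj_id: feature vector for that object in the segment}
        'obj_labels' : {obj_id: affordance label index}
        'act_feats'  : feature vector for the segment's sub-activity node
        'act_label'  : sub-activity label index
    Each w_* maps a label (or label pair) to a weight vector; all potentials
    are linear in the features, as in a structural SVM.
    """
    total = 0.0
    prev = None
    for seg in segments:
        # Node potentials: object affordances and the sub-activity.
        for oid, x in seg['obj_feats'].items():
            total += w_node_obj[seg['obj_labels'][oid]] @ x
        total += w_node_act[seg['act_label']] @ seg['act_feats']

        # Within-segment edges: object-object and object-sub-activity.
        oids = list(seg['obj_feats'])
        for i, oi in enumerate(oids):
            for oj in oids[i + 1:]:
                pair = (seg['obj_labels'][oi], seg['obj_labels'][oj])
                total += w_edge_oo[pair] @ np.concatenate(
                    [seg['obj_feats'][oi], seg['obj_feats'][oj]])
            total += w_edge_oa[(seg['obj_labels'][oi], seg['act_label'])] @ \
                np.concatenate([seg['obj_feats'][oi], seg['act_feats']])

        # Temporal edges to the previous segment.
        if prev is not None:
            for oid in oids:
                if oid in prev['obj_labels']:
                    pair = (prev['obj_labels'][oid], seg['obj_labels'][oid])
                    total += w_temp_obj[pair] @ seg['obj_feats'][oid]
            total += w_temp_act[(prev['act_label'], seg['act_label'])] @ \
                seg['act_feats']
        prev = seg
    return total
```

Inference then amounts to searching over labelings (and, here, over alternate segmentations) for the highest-scoring assignment; the sketch leaves that search unspecified.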
---
English title:
Learning Human Activities and Object Affordances from RGB-D Videos
---
Authors:
Hema Swetha Koppula, Rudhir Gupta, Ashutosh Saxena
---
Latest submission year:
2013
---
Classification:
Primary category: Computer Science
Secondary category: Robotics
Category description: Roughly includes material in ACM Subject Class I.2.9.
--
Primary category: Computer Science
Secondary category: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--
Primary category: Computer Science
Secondary category: Computer Vision and Pattern Recognition
Category description: Covers image processing, computer vision, pattern recognition, and scene understanding. Roughly includes material in ACM Subject Classes I.2.10, I.4, and I.5.
--
---
English abstract:
Understanding human activities and object affordances are two very important skills, especially for personal robots which operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human, and more importantly, of their interactions with the objects in the form of associated affordances. Given a RGB-D video, we jointly model the human activities and object affordances as a Markov random field where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a challenging dataset comprising 120 activity videos collected from 4 subjects, and obtained an accuracy of 79.4% for affordance, 63.4% for sub-activity and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling in performing assistive tasks by a PR2 robot.
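For readers unfamiliar with the latent-variable SSVM formulation mentioned in the abstract, the generic latent structural SVM objective (following Yu and Joachims) is sketched below, with the choice of temporal segmentation playing the role of the latent variable h. This is a reference formulation only; the specific feature map Phi and loss Delta used by the authors are not given in the abstract.

```latex
% Generic latent structural SVM objective, shown as a reference for how a
% temporal segmentation can enter as the latent variable h.
\min_{w}\; \frac{1}{2}\|w\|^{2}
  + C \sum_{i=1}^{n} \Big[
      \max_{\hat{y},\,\hat{h}} \big( w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y}) \big)
      \;-\; \max_{h}\, w^{\top}\Phi(x_i, y_i, h)
    \Big]
```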
---
PDF link:
https://arxiv.org/pdf/1210.1207