Translated abstract:
Techniques that represent words as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context. In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. In a corpus model, the meaning of a string of words is assumed to be a vector representing the contexts in which it occurs. Based on this assumption, we show that the vector representations of words can be regarded as elements of an algebra over a field. We note that in applications that represent word meanings in vector spaces there is an underlying lattice structure; we interpret the partial order of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string and, based on this probability and the lattice structure, a degree of entailment between strings. We relate the framework to existing methods for composing vector-based representations of meaning, and show that our approach generalises many of these, including vector addition, component-wise multiplication, and the tensor product.
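As a rough illustration of the lattice-based notions mentioned above (a sketch based only on the abstract; the component-wise meet and the particular ratio used for the degree of entailment are assumptions made for illustration, not definitions quoted from the paper): for non-negative context vectors $\hat{x}$ and $\hat{y}$ of strings $x$ and $y$, one can take
\[
(\hat{x} \wedge \hat{y})_i = \min(\hat{x}_i, \hat{y}_i), \qquad
\hat{x} \le \hat{y} \iff \hat{x} \wedge \hat{y} = \hat{x},
\]
reading $\hat{x} \le \hat{y}$ as "$x$ entails $y$", and then a graded version such as
\[
\mathrm{Ent}(x \Rightarrow y) = \frac{p(\hat{x} \wedge \hat{y})}{p(\hat{x})},
\]
where $p$ is the context-theoretic probability of a string.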
---
English title:
A Context-theoretic Framework for Compositionality in Distributional Semantics
---
Author:
Daoud Clarke
---
Year of latest submission:
2011
---
Classification:
Primary: Computer Science
Secondary: Computation and Language
Description: Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
--
Primary: Computer Science
Secondary: Artificial Intelligence
Description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
---
English abstract:
Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context. In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. The meaning of a string of words is assumed to be a vector representing the contexts in which it occurs in the corpus model. Based on this assumption, we can show that the vector representations of words can be considered as elements of an algebra over a field. We note that in applications of vector spaces to representing meanings of words there is an underlying lattice structure; we interpret the partial ordering of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string, and, based on this and the lattice structure, a degree of entailment between strings. We relate the framework to existing methods of composing vector-based representations of meaning, and show that our approach generalises many of these, including vector addition, component-wise multiplication, and the tensor product.
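The following minimal Python sketch illustrates the composition operations the abstract says the framework generalises, together with one possible lattice-based degree of entailment; the toy vectors, the component-wise meet, and the normalised-mass probability are assumptions made purely for illustration and are not taken from the paper itself.

import numpy as np

# Toy context vectors: each component counts how often a word occurs
# in one of three contexts (values invented for illustration).
dog    = np.array([3.0, 1.0, 0.0])
animal = np.array([4.0, 2.0, 1.0])

# Composition operations the framework is said to generalise:
added      = dog + animal            # vector addition
multiplied = dog * animal            # component-wise multiplication
tensored   = np.outer(dog, animal)   # tensor product (outer product of the two vectors)

# An illustrative lattice meet, taken component-wise on non-negative vectors.
meet = np.minimum(dog, animal)

# Context-theoretic probability approximated here as total mass
# (up to a global normalisation constant) -- an assumption for this sketch.
def p(v):
    return float(np.sum(v))

# Degree to which "dog" entails "animal", sketched as p(dog ^ animal) / p(dog).
degree = p(meet) / p(dog)
print(degree)   # 1.0 here, since every component of dog is <= the corresponding component of animal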
---
PDF link:
https://arxiv.org/pdf/1101.4479