摘要翻译:
本文描述了单独识别单个名字的语言或在用不同语言编写的文档中识别单个名字的语言的实验。一个新的语料库已经编制和提供,根据语言匹配名称。该语料库被用于一系列实验,测量通用语言模型和只使用名字的语言模型在语言识别任务中的性能。通过比较使用通用语言模型和只使用名称的语言模型,以及识别孤立名称的语言和识别非常短的文档片段的语言,得出了结论。展望了今后的研究方向。
---
英文标题:
《What's in a Name?》
---
作者:
Stasinos Konstantopoulos
---
最新提交年份:
2007
---
分类信息:
一级分类:Computer Science 计算机科学
二级分类:Computation and Language 计算与语言
分类描述:Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
涵盖自然语言处理。大致包括ACM科目I.2.7类的材料。请注意,人工语言(编程语言、逻辑学、形式系统)的工作,如果没有明确地解决广义的自然语言问题(自然语言处理、计算语言学、语音、文本检索等),就不适合这个领域。
--
一级分类:Computer Science 计算机科学
二级分类:Artificial Intelligence
人工智能
分类描述:Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
涵盖了人工智能的所有领域,除了视觉、机器人、机器学习、多智能体系统以及计算和语言(自然语言处理),这些领域有独立的学科领域。特别地,包括专家系统,定理证明(尽管这可能与计算机科学中的逻辑重叠),知识表示,规划,和人工智能中的不确定性。大致包括ACM学科类I.2.0、I.2.1、I.2.3、I.2.4、I.2.8和I.2.11中的材料。
--
---
英文摘要:
This paper describes experiments on identifying the language of a single name in isolation or in a document written in a different language. A new corpus has been compiled and made available, matching names against languages. This corpus is used in a series of experiments measuring the performance of general language models and names-only language models on the language identification task. Conclusions are drawn from the comparison between using general language models and names-only language models and between identifying the language of isolated names and the language of very short document fragments. Future research directions are outlined.
---
PDF链接:
https://arxiv.org/pdf/0710.1481