Correction of Noisy Sentences Using a Monolingual Corpus

nandehutu2022

2022-03-08

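Abstract (translation):
Correcting noisy natural language text is an important research topic in natural language processing. It has wide application in areas such as statistical machine translation, second language learning, and natural language generation. In this work, we consider several statistical techniques for text correction. We define the classes of errors commonly found in text and describe algorithms for correcting them. The data comes from a poorly trained machine translation system. The algorithms use only a language model in the target language to correct the sentences. Both algorithms use phrase-based correction: phrases are replaced and combined to yield the final corrected sentence. We also present methods for modeling the different kinds of errors, along with the algorithms' results on the test set. We show that one of the approaches fails to achieve the desired goal, while the other succeeds well. Finally, we analyze the possible reasons for this trend.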
---
English title:
"Correction of Noisy Sentences using a Monolingual Corpus"
---
Author:
Diptesh Chatterjee
---
Latest submission year:
2011
---
Classification:

Primary category: Computer Science
Subcategory: Digital Libraries
Category description: Covers all aspects of the digital library design and document and text creation. Note that there will be some overlap with Information Retrieval (which is a separate subject area). Roughly includes material in ACM Subject Classes H.3.5, H.3.6, H.3.7, I.7.
--
Primary category: Computer Science
Subcategory: Artificial Intelligence
Category description: Covers all areas of AI except Vision, Robotics, Machine Learning, Multiagent Systems, and Computation and Language (Natural Language Processing), which have separate subject areas. In particular, includes Expert Systems, Theorem Proving (although this may overlap with Logic in Computer Science), Knowledge Representation, Planning, and Uncertainty in AI. Roughly includes material in ACM Subject Classes I.2.0, I.2.1, I.2.3, I.2.4, I.2.8, and I.2.11.
--

---
English abstract:
Correction of Noisy Natural Language Text is an important and well-studied problem in Natural Language Processing. It has a number of applications in domains like Statistical Machine Translation, Second Language Learning and Natural Language Generation. In this work, we consider some statistical techniques for Text Correction. We define the classes of errors commonly found in text and describe algorithms to correct them. The data has been taken from a poorly trained Machine Translation system. The algorithms use only a language model in the target language in order to correct the sentences. We use phrase-based correction methods in both algorithms. The phrases are replaced and combined to give us the final corrected sentence. We also present the methods to model different kinds of errors, in addition to results of the working of the algorithms on the test set. We show that one of the approaches fails to achieve the desired goal, whereas the other succeeds well. In the end, we analyze the possible reasons for such a trend in performance.
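
The abstract states that the algorithms rely only on a target-language language model, but it does not spell out how they work. As a rough illustration of that general idea, and not the author's algorithm, the sketch below trains an add-one smoothed bigram model on a toy monolingual corpus and picks the highest-scoring sentence from a set of candidate corrections. The function names (`train_bigram_lm`, `log_prob`, `best_candidate`), the toy corpus, and the candidates are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Boundary markers let the model score sentence starts and ends.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Counter returns 0 for unseen keys, so smoothing keeps this finite.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_candidate(candidates, unigrams, bigrams):
    """Return the candidate sentence the language model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Toy monolingual corpus (hypothetical data, for illustration only).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]
unigrams, bigrams = train_bigram_lm(corpus)

# A noisy sentence and one possible reordering of it.
candidates = ["cat the sat on mat the", "the cat sat on the mat"]
print(best_candidate(candidates, unigrams, bigrams))
```

A phrase-based method of the kind the abstract mentions would generate its candidates by replacing and recombining phrases. Here the candidates are simply given, so the sketch covers only the language-model scoring step.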
---
PDF link:
https://arxiv.org/pdf/1105.4318
