2022-03-07
---
Title:
Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup
---
Authors:
Alexander S. Yeh, Lynette Hirschman, Alexander A. Morgan
---
Year of latest submission:
2003
---
Classification:

Top-level category: Computer Science
Subcategory: Computation and Language
Category description: Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
--
Top-level category: Quantitative Biology
Subcategory: Other Quantitative Biology
Category description: Work in quantitative biology that does not fit into the other q-bio classifications.
--

---
Abstract:
MOTIVATION: The biological literature is a major repository of knowledge. Many biological databases draw much of their content from a careful curation of this literature. However, as the volume of literature increases, the burden of curation increases. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whether text mining techniques are sufficiently mature to be useful.
RESULTS: We report on a Challenge Evaluation task that we created for the Knowledge Discovery and Data Mining (KDD) Challenge Cup. We provided a training corpus of 862 articles consisting of journal articles curated in FlyBase, along with the associated lists of genes and gene products, as well as the relevant data fields from FlyBase. For the test, we provided a corpus of 213 new ('blind') articles; the 18 participating groups provided systems that flagged articles for curation, based on whether the article contained experimental evidence for gene expression products. We report on the evaluation results and describe the techniques used by the top performing groups.
CONTACT: asy@mitre.org
KEYWORDS: text mining, evaluation, curation, genomics, data management
---
PDF link:
https://arxiv.org/pdf/cs/0308032