Diachronic Text Mining Investigation of Therapeutic Candidates for COVID-19

nandehutu2022

2022-03-08

---
Title:
Diachronic Text Mining Investigation of Therapeutic Candidates for COVID-19
---
Authors:
James Powell, Kari Sentz
---
Year of latest submission:
2021
---
Categories:

Top-level category: Computer Science
Subcategory: Computation and Language
Description: Covers natural language processing. Roughly includes material in ACM Subject Class I.2.7. Note that work on artificial languages (programming languages, logics, formal systems) that does not explicitly address natural-language issues broadly construed (natural-language processing, computational linguistics, speech, text retrieval, etc.) is not appropriate for this area.
--
Top-level category: Computer Science
Subcategory: Information Retrieval
Description: Covers indexing, dictionaries, retrieval, content and analysis. Roughly includes material in ACM Subject Classes H.3.0, H.3.1, H.3.2, H.3.3, and H.3.4.
--
Top-level category: Quantitative Biology
Subcategory: Other Quantitative Biology
Description: Work in quantitative biology that does not fit into the other q-bio classifications.
--

---
Abstract:
Diachronic text mining has frequently been applied to long-term linguistic surveys of word meaning and usage shifts over time. In this paper, we apply short-term diachronic text mining to a rapidly growing corpus of scientific publications on COVID-19 captured in the CORD-19 dataset in order to identify co-occurrences and analyze the behavior of potential candidate treatments. We used a dataset associated with a COVID-19 drug repurposing study from Oak Ridge National Laboratory. This study identified existing candidate coronavirus treatments, including drugs and approved compounds, which had been analyzed and ranked according to their potential for blocking the ability of the SARS-CoV-2 virus to invade human cells. We investigated the occurrence of these candidates in temporal instances of the CORD-19 corpus. We found that at least 25% of the identified terms occurred in temporal instances of the corpus to the extent that their frequency and contextual dynamics could be evaluated. We identified three classes of behaviors: those where frequency and contextual shifts were small and positively correlated; those where there was no correlation between frequency and contextual changes; and those where there was a negative correlation between frequency and contextual shift. We speculate that the latter two patterns are indicative that a target candidate therapeutic is undergoing active evaluation. The patterns we detected demonstrate the potential benefits of using diachronic text mining techniques with a large dynamic text corpus to track drug repurposing activities across international clinical and laboratory settings.
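
To make the three behavior classes in the abstract concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes that, for each candidate term, we already have one frequency change and one contextual shift value per pair of consecutive corpus snapshots, and it buckets the term by the sign of the Pearson correlation between the two series. The function names, the 0.5 threshold, and the sample numbers are hypothetical, and the check that shifts are "small" in the first class is omitted.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no correlation signal; treat as zero.
        return 0.0
    return cov / sqrt(var_x * var_y)


def classify_term(freq_changes, context_shifts, threshold=0.5):
    """Assign a term to one of three illustrative classes.

    threshold is an assumed cutoff, not a value taken from the paper.
    """
    r = pearson(freq_changes, context_shifts)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "uncorrelated"


# Hypothetical per-snapshot values for one candidate term.
label = classify_term([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
print(label)
```

Under the abstract's interpretation, a term landing in the "uncorrelated" or "negative" bucket would be the kind the authors speculate is undergoing active evaluation.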
---
PDF link:
https://arxiv.org/pdf/2110.13971
