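Abstract (translated from the Chinese version):
Traditional alignment-based phylogenetics struggles to uncover the evolutionary trajectory of SARS-CoV-2. This study develops a new alignment-free system and uses it to reveal the evolutionary trajectory of SARS-CoV-2 from more than one million genome sequences. The system combines the Fréchet distance (Fr) with an artificial recurrent neural network. Fr measures the dissimilarity between a variant and the reference genome after the genome is decomposed into 84 features (4 single nucleotides, 16 dinucleotides, and 64 codons). The recurrent neural network predicts and forecasts the time-series Fr trajectory, from which the evolutionary trajectory and origin of SARS-CoV-2 are inferred. During the COVID-19 pandemic, the SARS-CoV-2 genome has generally mutated rapidly through deletion. Among single nucleotides, C mutates fast while T changes slowly. C-prefixed dinucleotides (e.g. CG and CT) are also lost dramatically during evolution. Similarly, the viral genome deletes several C-prefixed codons (e.g. CCT) but gains several T- and A-prefixed codons (e.g. TTA and ATT). Interestingly, the codon CCT and the dinucleotide CT centrally control the entire SARS-CoV-2 genome, and their evolutionary trajectories fit the spikes in COVID-19 cases. C-prefixed feature trajectories therefore mark SARS-CoV-2 evolution. The study further identifies a total of 34 SARS-CoV-2 variants, which fall into three groups: a slight mutation group, a middle-level deletion group, and a high deletion group. The slight deletion group and the high deletion group have low infection capacity. The middle deletion group gradually deletes its genome along a rhythmic trajectory that corresponds to the pandemic peaks, and it accounts for most global COVID-19 cases. Mink is the origin of SARS-CoV-2, and the origin path follows this order: mink, cat, tiger, mouse, bat, and pangolin. Together, this mink-origin SARS-CoV-2 evolves through C-driven rhythmic deletions to infect humans.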
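The 84-feature decomposition mentioned in the abstract (4 single nucleotides, 16 dinucleotides, 64 codons) can be illustrated with a small sketch. This is not the author's code. The function names are hypothetical, and two choices are assumptions the abstract does not confirm: 3-mers are counted with an overlapping sliding window rather than in a reading frame, and the features are expressed as relative frequencies.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def kmers(k):
    """All 4**k strings of length k over A, C, G, T, in lexicographic order."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]


def feature_names():
    """The 84 feature labels: 4 mono-, 16 di-, and 64 trinucleotides."""
    return kmers(1) + kmers(2) + kmers(3)


def kmer_frequencies(sequence, k):
    """Relative frequency of each k-mer, using an overlapping window (assumption).

    Windows containing anything other than A/C/G/T (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    valid = {kmer: counts.get(kmer, 0) for kmer in kmers(k)}
    total = sum(valid.values())
    return {kmer: (n / total if total else 0.0) for kmer, n in valid.items()}


def genome_features(sequence):
    """Map one genome sequence to a dict of 84 frequency features."""
    features = {}
    for k in (1, 2, 3):
        features.update(kmer_frequencies(sequence, k))
    return features
```

Calling `genome_features` on a variant and on the reference genome gives two 84-entry dictionaries whose per-feature differences could then be tracked over time.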
---
English title:
Evolutionary trajectory and origin of SARS-CoV-2
---
Author:
Anyou Wang
---
Latest submission year:
2021
---
Classification:
Primary category: Quantitative Biology
Secondary category: Other Quantitative Biology
Category description: Work in quantitative biology that does not fit into the other q-bio classifications
---
English abstract:
Traditionally alignment-based phylogenetics faces challenges to uncover the evolutionary trajectory of SARS-CoV-2. This study develops a novel alignment-free system and reveals the evolutionary trajectory of SARS-CoV-2 from more than one million genome sequences. This new system contains Fréchet distance (Fr) and an artificial recurrent neural network. Fr computes the dissimilarity between variant and reference genome, which is decomposed into 84 features (4 single nucleotides, 16 dinucleotides and 64 codons). The recurrent neural network predicts and forecasts the time-series Fr trajectory, inferring the SARS-CoV-2 evolutionary trajectory and origin. Generally, the SARS-CoV-2 genome mutates rapidly via deletion during the COVID-19 pandemic. Among single nucleotides, C mutates fast but T changes slowly. C-prefix dinucleotides (e.g. CG and CT) are also lost dramatically during evolution. Similarly, the virus genome also deletes several codons prefixed by C (e.g. CCT) but gains several T- and A-prefix codons (e.g. TTA and ATT) during its evolution. Interestingly, codon CCT and CT centrally control the entire SARS-CoV-2 genome, and their evolutionary trajectories fit COVID-19 case spikes. Therefore the C-prefix feature trajectory marks SARS-CoV-2 evolution. This study further identifies a total of 34 SARS-CoV-2 variants, which can be classified into 3 groups: slight mutation group, middle level deletion, and high deletion. The slight deletion group and the high deletion group have low infection capacity. The middle deletion group gradually deletes their genome with a certain rhythm trajectory, corresponding to the pandemic peaks, which causes most of the global COVID-19 cases. Mink is the origin of SARS-CoV-2, and the origin path follows this order: mink, cat, tiger, mouse, bat and pangolin. Together, this mink-origin SARS-CoV-2 evolves with C-driven rhythm deletions to infect humans.
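For readers unfamiliar with the Fréchet distance named in the abstract, the following minimal sketch computes the discrete Fréchet distance between two one-dimensional value sequences using the standard dynamic-programming recurrence. The abstract does not say how the study builds the curves it compares or which Fréchet variant it uses, so the function name, the 1-D inputs, and the absolute-difference point metric are all assumptions made for illustration.

```python
def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty numeric sequences.

    ca[i][j] holds the smallest achievable "leash length" for a coupling that
    ends at the pair (p[i], q[j]); the answer is the bottom-right cell.
    """
    if not p or not q:
        raise ValueError("both sequences must be non-empty")
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])  # point-to-point distance (assumed metric)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]
```

Unlike a pointwise maximum difference, the recurrence lets either sequence pause while the other advances, so the two inputs may have different lengths.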
---
PDF link:
https://arxiv.org/pdf/2110.07696