2022-03-05
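Abstract (translated from the Chinese):
Complexity metrics and machine learning (ML) models were used to analyze the lengths of segmental genomic entities (exons, introns, intergenic regions, and repeat/unique DNA sequences) in each of the 22 human chromosomes. The aim of the study was to assess the information and order that may be hidden in the size distributions of these sequences. To this end, we developed an innovative integrated methodology. Our analysis is based on the reconstructed phase space theorem, Tsallis's non-extensive statistical theory, ML techniques, and a new technical index that integrates the generated information, which we introduce as the Complexity Factor (COFA). The DNA sequences show a low-dimensional, deterministic, nonlinear, chaotic, and non-extensive statistical character, with strong multifractal features and long-range correlations, and with significant variation across genomic entities and chromosomes. The results reveal how the complexity behavior of each genomic entity and chromosome changes with the size distribution of individual genomic segments. For all chromosomes, the lengths of intron regions show greater complexity behavior than those of exon regions in every metric, with longer-range correlations and stronger memory effects. We conclude from our analysis that the size distribution of genomic regions within chromosomes is not random but follows a specific pattern with characteristic features, seen here through its complexity character, and that according to complexity theory it is part of the dynamics of the whole genome. This picture of the dynamics of redundant information in DNA is recognized by ML tools for clustering, classification, and prediction.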
---
English title:
"Information and order of genomic sequences within chromosomes as identified by complexity theory. An integrated methodology"
---
Authors:
L. P. Karakatsanis, E. G. Pavlos, G. Tsoulouhas, G. L. Stamokostas, T. L. Mosbruger, J. L. Duke, G. P. Pavlos, and D. S. Monos
---
Latest submission year:
2020
---
Classification:

Primary category: Physics
Secondary category: Biological Physics
Category description: Molecular biophysics, cellular biophysics, neurological biophysics, membrane biophysics, single-molecule biophysics, ecological biophysics, quantum phenomena in biological systems (quantum biophysics), theoretical biophysics, molecular dynamics/modeling and simulation, game theory, biomechanics, bioinformatics, microorganisms, virology, evolution, biophysical methods.
--
Primary category: Quantitative Biology
Secondary category: Other Quantitative Biology
Category description: Work in quantitative biology that does not fit into the other q-bio classifications.
--

---
English abstract:
  Complexity metrics and machine learning (ML) models have been utilized to analyze the lengths of segmental genomic entities, such as exons, introns, intergenic regions, and repeat/unique DNA sequences, in each of the 22 human chromosomes. The purpose of the study was to assess information and order that may be concealed within the size distribution of these sequences. For this purpose, we developed an innovative integrated methodology. Our analysis is based upon the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a new technical index integrating the generated information, which we introduce and name the Complexity Factor (COFA). The low-dimensional, deterministic, nonlinear, chaotic, and non-extensive statistical character of the DNA sequences was verified, with strong multifractal characteristics and long-range correlations, and with significant variations per genomic entity and per chromosome. The results of the analysis reveal changes in complexity behavior per genomic entity and per chromosome with respect to the size distribution of individual genomic segments. For all chromosomes, the lengths of intron regions show greater complexity behavior in all metrics than those of exonic regions, with longer-range correlations and stronger memory effects. We conclude from our analysis that the size distribution of the genomic regions within chromosomes is not random but follows a specific pattern with characteristic features, seen here through its complexity character, and that it is part of the dynamics of the whole genome according to complexity theory. This picture of the dynamics of the redundancy of information in DNA is recognized by ML tools for clustering, classification, and prediction.
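
The abstract names two standard tools without showing them: phase space reconstruction and Tsallis non-extensive statistics. The sketch below is a minimal illustration of both, not the authors' pipeline. It assumes a delay embedding in the sense of Takens, with vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau]), and the usual Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1) with the constant k set to 1. The function names, the embedding parameters `dim` and `tau`, the `bin_width`, the value of `q`, and the toy segment lengths are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def delay_embed(series, dim, tau):
    """Return delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    return [tuple(series[i + k * tau] for k in range(dim))
            for i in range(n_vectors)]


def tsallis_entropy(probabilities, q):
    """Tsallis entropy S_q = (1 - sum(p_i**q)) / (q - 1), with k = 1.

    Falls back to the Shannon (Boltzmann-Gibbs) form when q == 1.
    """
    probs = [p for p in probabilities if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def length_probabilities(lengths, bin_width):
    """Histogram segment lengths into fixed-width bins and normalise."""
    counts = Counter(length // bin_width for length in lengths)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


# Hypothetical segment lengths in base pairs (illustrative only, not real data).
toy_lengths = [120, 95, 310, 150, 2200, 180, 90, 400, 130, 1750, 110, 260]

vectors = delay_embed(toy_lengths, dim=3, tau=1)
probs = length_probabilities(toy_lengths, bin_width=100)

print("number of delay vectors:", len(vectors))
print("first delay vector:", vectors[0])
print("Tsallis entropy (q=2):", tsallis_entropy(probs, q=2.0))
```

In practice the embedding dimension and delay are usually chosen from the data (for example with false nearest neighbours and mutual information), and q is estimated rather than fixed; the values here only make the sketch runnable.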
---
PDF link:
https://arxiv.org/pdf/2004.11287