2022-03-07
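Abstract (translated from the Chinese):
This paper presents a framework for statistically modeling sequences from the human genome, which yields a formulation for synthesizing gene sequences. We first convert the alphabetic genome sequence into a decimal sequence using Huffman coding. This decimal sequence is then decomposed by the HP filter into two components, trend and cyclic. Next, the trend component, which exhibits heteroskedasticity, is modeled with ARIMA-GARCH: autoregressive integrated moving average (ARIMA) captures the linear characteristics of the sequence, and generalized autoregressive conditional heteroskedasticity (GARCH) then accounts for its statistical nonlinearity. This modeling approach synthesizes a given genome sequence according to its statistical features. Finally, a Gaussian mixture model is used to estimate the PDF of a given sequence, and from the estimated PDF a new PDF is determined that represents sequences which statistically counteract the original. The strategy is applied to several genes and to an HIV nucleotide sequence, and the corresponding results are presented.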
---
English title:
"A New Framework For Spatial Modeling And Synthesis of Genome Sequence"
---
Authors:
Salman Mohamadi, Farhang Yeganegi, Hamidreza Amindavar
---
Year of latest submission:
2019
---
Classification:

Primary category: Quantitative Biology
Secondary category: Other Quantitative Biology
Category description: Work in quantitative biology that does not fit into the other q-bio classifications

---
Original English abstract:
  This paper provides a framework for statistically modeling sequences from the human genome, which allows a formulation for synthesizing gene sequences. We start by converting the alphabetic sequence of the genome to a decimal sequence by Huffman coding. Then, this decimal sequence is decomposed by the HP filter into two components, trend and cyclic. Next, a statistical model, ARIMA-GARCH, is fitted to the trend component, which exhibits heteroskedasticity: autoregressive integrated moving average (ARIMA) captures the linear characteristics of the sequence, and generalized autoregressive conditional heteroskedasticity (GARCH) is then applied to the statistical nonlinearity of the genome sequence. This modeling approach synthesizes a given genome sequence with respect to its statistical features. Finally, the PDF of a given sequence is estimated using a Gaussian mixture model and, based on the estimated PDF, we determine a new PDF representing sequences that statistically counteract the original sequence. Our strategy is applied to several genes as well as an HIV nucleotide sequence, and the corresponding results are presented.
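
To make the first two steps of the abstract concrete, here is a minimal Python sketch of Huffman coding followed by the HP (Hodrick-Prescott) filter. It is not the authors' implementation. The abstract does not say how Huffman codewords are turned into decimal values, so the mapping in `to_decimal` (read the codeword as a binary number after prefixing a 1 bit) is an assumption, as are the function names, the toy sequence, and the smoothing parameter `lam`. The filter uses the standard definition: the trend minimizes the squared fit error plus `lam` times the squared second differences of the trend.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(sequence):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    freq = Counter(sequence)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def to_decimal(sequence, codes):
    """ASSUMPTION (not from the paper): map each codeword to an integer.

    A leading 1 bit is prefixed so that codewords differing only in
    leading zeros (e.g. "1" and "01") still map to distinct values.
    """
    return [int("1" + codes[s], 2) for s in sequence]


def hp_filter(y, lam=1600.0):
    """Split y into (trend, cycle) by solving (I + lam * D^T D) trend = y,
    where D is the second-difference operator."""
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weights = (1.0, -2.0, 1.0)
    for r in range(n - 2):  # accumulate lam * D^T D row by row
        for p in range(3):
            for q in range(3):
                a[r + p][r + q] += lam * weights[p] * weights[q]
    b = [float(v) for v in y]
    # Banded Gaussian elimination without pivoting; this is safe because
    # the matrix is symmetric positive definite with bandwidth 2.
    for k in range(n):
        for i in range(k + 1, min(k + 3, n)):
            f = a[i][k] / a[k][k]
            for j in range(k, min(k + 3, n)):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    trend = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(a[i][j] * trend[j] for j in range(i + 1, min(i + 3, n)))
        trend[i] = (b[i] - s) / a[i][i]
    cycle = [float(yi) - ti for yi, ti in zip(y, trend)]
    return trend, cycle


# Toy nucleotide string, made up for illustration (not data from the paper).
seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
codes = huffman_codes(seq)
decimal = to_decimal(seq, codes)
trend, cycle = hp_filter(decimal, lam=100.0)  # lam is an arbitrary choice
```

The dense matrix keeps the sketch short but is only practical for short sequences; a real gene sequence would call for a banded or sparse solver. The ARIMA-GARCH and Gaussian mixture steps are not sketched here, since they depend on modeling choices the abstract does not specify.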
---
PDF link:
https://arxiv.org/pdf/1908.03342