2015-12-30
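Microbial data analysis now spans a very wide range of specialized fields. Today I'd like to share a new book on microbial informatics data analysis so that we can all study it together.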


Contents

Preface

Acknowledgments

Authors

Chapter 1 ◾ Introduction to RNA-seq

1.1 INTRODUCTION

1.2 ISOLATION OF RNAs

1.3 QUALITY CONTROL OF RNA

1.4 LIBRARY PREPARATION

1.5 MAJOR RNA-SEQ PLATFORMS

1.5.1 Illumina

1.5.2 SOLiD

1.5.3 Roche 454

1.5.4 Ion Torrent

1.5.5 Pacific Biosciences

1.5.6 Nanopore Technologies

1.6 RNA-SEQ APPLICATIONS

1.6.1 Protein-Coding Gene Structure

1.6.2 Novel Protein-Coding Genes

1.6.3 Quantifying and Comparing Gene Expression

1.6.4 Expression Quantitative Trait Loci (eQTL)

1.6.5 Single-Cell RNA-seq

1.6.6 Fusion Genes

1.6.7 Gene Variations

1.6.8 Long Noncoding RNAs

1.6.9 Small Noncoding RNAs (miRNA-seq)

1.6.10 Amplification Product Sequencing (Ampli-seq)

1.7 CHOOSING AN RNA-SEQ PLATFORM

1.7.1 Eight General Principles for Choosing an RNA-seq Platform and Mode of Sequencing

1.7.1.1 Accuracy: How Accurate Must the Sequencing Be?

1.7.1.2 Reads: How Many Do I Need?

1.7.1.3 Length: How Long Must the Reads Be?

1.7.1.4 SR or PE: Single Read or Paired End?

1.7.1.5 RNA or DNA: Am I Sequencing RNA or DNA?

1.7.1.6 Material: How Much Sample Material Do I Have?

1.7.1.7 Costs: How Much Can I Spend?

1.7.1.8 Time: When Does the Work Need to Be Completed?

1.7.2 Summary

REFERENCES

Chapter 2 ◾ Introduction to RNA-seq Data Analysis

2.1 INTRODUCTION

2.2 DIFFERENTIAL EXPRESSION ANALYSIS WORKFLOW

2.2.1 Step 1: Quality Control of Reads

2.2.2 Step 2: Preprocessing of Reads

2.2.3 Step 3: Aligning Reads to a Reference Genome

2.2.4 Step 4: Genome-Guided Transcriptome Assembly

2.2.5 Step 5: Calculating Expression Levels

2.2.6 Step 6: Comparing Gene Expression between Conditions

2.2.7 Step 7: Visualization of Data in Genomic Context

2.3 DOWNSTREAM ANALYSIS

2.3.1 Gene Annotation

2.3.2 Gene Set Enrichment Analysis

2.4 AUTOMATED WORKFLOWS AND PIPELINES

2.5 HARDWARE REQUIREMENTS

2.6 FOLLOWING THE EXAMPLES IN THE BOOK

2.6.1 Using Command Line Tools and R

2.6.2 Using the Chipster Software

2.6.3 Example Data Sets

2.7 SUMMARY

REFERENCES

Chapter 3 ◾ Quality Control and Preprocessing

3.1 INTRODUCTION

3.2 SOFTWARE FOR QUALITY CONTROL AND PREPROCESSING

3.2.1 FastQC

3.2.2 PRINSEQ

3.2.3 Trimmomatic

3.3 READ QUALITY ISSUES

3.3.1 Base Quality

3.3.1.1 Filtering

3.3.1.2 Trimming

3.3.2 Ambiguous Bases

3.3.3 Adapters

3.3.4 Read Length

3.3.5 Sequence-Specific Bias and Mismatches Caused by Random Hexamer Priming

3.3.6 GC Content

3.3.7 Duplicates

3.3.8 Sequence Contamination

3.3.9 Low-Complexity Sequences and PolyA Tails

3.4 SUMMARY

REFERENCES

Chapter 4 ◾ Aligning Reads to Reference

4.1 INTRODUCTION

4.2 ALIGNMENT PROGRAMS

4.2.1 Bowtie

4.2.2 TopHat

4.2.3 STAR

4.3 ALIGNMENT STATISTICS AND UTILITIES FOR MANIPULATING ALIGNMENT FILES

4.4 VISUALIZING READS IN GENOMIC CONTEXT

4.5 SUMMARY

REFERENCES

Chapter 5 ◾ Transcriptome Assembly

5.1 INTRODUCTION

5.2 METHODS

5.2.1 Transcriptome Assembly Is Different from Genome Assembly

5.2.2 Complexity of Transcript Reconstruction

5.2.3 Assembly Process

5.2.4 de Bruijn Graph

5.2.5 Use of Abundance Information

5.3 DATA PREPROCESSING

5.3.1 Read Error Correction

5.3.2 Seecer

5.4 MAPPING-BASED ASSEMBLY

5.4.1 Cufflinks

5.4.2 Scripture

5.5 DE NOVO ASSEMBLY

5.5.1 Velvet + Oases

5.5.2 Trinity

5.6 SUMMARY

REFERENCES

Chapter 6 ◾ Quantitation and Annotation-Based Quality Control

6.1 INTRODUCTION

6.2 ANNOTATION-BASED QUALITY METRICS

6.2.1 Tools for Annotation-Based Quality Control

6.3 QUANTITATION OF GENE EXPRESSION

6.3.1 Counting Reads per Gene

6.3.1.1 HTSeq

6.3.2 Counting Reads per Transcript

6.3.2.1 Cufflinks

6.3.2.2 eXpress

6.3.3 Counting Reads per Exon

6.4 SUMMARY

REFERENCES

Chapter 7 ◾ RNA-seq Analysis Framework in R and Bioconductor

7.1 INTRODUCTION

7.1.1 Installing R and Add-on Packages

7.1.2 Using R

7.2 OVERVIEW OF THE BIOCONDUCTOR PACKAGES

7.2.1 Software Packages

7.2.2 Annotation Packages

7.2.3 Experiment Packages

7.3 DESCRIPTIVE FEATURES OF THE BIOCONDUCTOR PACKAGES

7.3.1 OOP Features in R

7.4 REPRESENTING GENES AND TRANSCRIPTS IN R

7.5 REPRESENTING GENOMES IN R

7.6 REPRESENTING SNPs IN R

7.7 FORGING NEW ANNOTATION PACKAGES

7.8 SUMMARY

REFERENCES

Chapter 8 ◾ Differential Expression Analysis

8.1 INTRODUCTION

8.2 TECHNICAL VS. BIOLOGICAL REPLICATES

8.3 STATISTICAL DISTRIBUTIONS IN RNA-SEQ DATA

8.3.1 Biological Replication, Count Distributions, and Choice of Software

8.4 NORMALIZATION

8.5 SOFTWARE USAGE EXAMPLES

8.5.1 Using Cuffdiff

8.5.2 Using Bioconductor Packages: DESeq, edgeR, limma

8.5.3 Linear Models, the Design Matrix, and the Contrast Matrix

8.5.3.1 Design Matrix

8.5.3.2 Contrast Matrix

8.5.4 Preparations Ahead of Differential Expression Analysis

8.5.4.1 Starting from BAM Files

8.5.4.2 Starting from Individual Count Files

8.5.4.3 Starting from an Existing Count Table

8.5.4.4 Independent Filtering

8.5.5 Code Example for DESeq(2)

8.5.6 Visualization

8.5.7 For Reference: Code Examples for Other Bioconductor Packages

8.5.8 Limma

8.5.9 SAMSeq (samr package)

8.5.10 edgeR

8.5.11 DESeq2 Code Example for a Multifactorial Experiment

8.5.12 For Reference: edgeR Code Example

8.5.13 Limma Code Example

8.6 SUMMARY

REFERENCES

Chapter 9 ◾ Analysis of Differential Exon Usage

9.1 INTRODUCTION

9.2 PREPARING THE INPUT FILES FOR DEXSeq

9.3 READING DATA INTO R

9.4 ACCESSING THE ExonCountSet OBJECT

9.5 NORMALIZATION AND ESTIMATION OF THE VARIANCE

9.6 TEST FOR DIFFERENTIAL EXON USAGE

9.7 VISUALIZATION

9.8 SUMMARY

REFERENCES

Chapter 10 ◾ Annotating the Results

10.1 INTRODUCTION

10.2 RETRIEVING ADDITIONAL ANNOTATIONS

10.2.1 Using an Organism-Specific Annotation Package to Retrieve Annotations for Genes

10.2.2 Using BioMart to Retrieve Annotations for Genes

10.3 USING ANNOTATIONS FOR ONTOLOGICAL ANALYSIS OF GENE SETS

10.4 GENE SET ANALYSIS IN MORE DETAIL

10.4.1 Competitive Method Using GOstats Package

10.4.2 Self-Contained Method Using Globaltest Package

10.4.3 Length Bias Corrected Method

10.5 SUMMARY

REFERENCES

Chapter 11 ◾ Visualization

11.1 INTRODUCTION

11.1.1 Image File Types

11.1.2 Image Resolution

11.1.3 Color Models

11.2 GRAPHICS IN R

11.2.1 Heatmap

11.2.2 Volcano Plot

11.2.3 MA Plot

11.2.4 Idiogram

11.2.5 Visualizing Gene and Transcript Structures

11.3 FINALIZING THE PLOTS

11.4 SUMMARY

REFERENCES

Chapter 12 ◾ Small Noncoding RNAs

12.1 INTRODUCTION

12.2 MICRORNAs (miRNAs)

12.3 MICRORNA-OFFSET RNAs (moRNAs)

12.4 PIWI-ASSOCIATED RNAs (piRNAs)

12.5 ENDOGENOUS SILENCING RNAs (endo-siRNAs)

12.6 EXOGENOUS SILENCING RNAs (exo-siRNAs)

12.7 TRANSFER RNAs (tRNAs)

12.8 SMALL NUCLEOLAR RNAs (snoRNAs)

12.9 SMALL NUCLEAR RNAs (snRNAs)

12.10 ENHANCER-DERIVED RNAs (eRNAs)

12.11 OTHER SMALL NONCODING RNAs

12.12 SEQUENCING METHODS FOR DISCOVERY OF SMALL NONCODING RNAs

12.12.1 microRNA-seq

12.12.2 CLIP-seq

12.12.3 Degradome-seq

12.12.4 Global Run-On Sequencing (GRO-seq)

12.13 SUMMARY

REFERENCES

Chapter 13 ◾ Computational Analysis of Small Noncoding RNA Sequencing Data

13.1 INTRODUCTION

13.2 DISCOVERY OF SMALL RNAs: miRDeep2

13.2.1 GFF Files

13.2.2 FASTA Files of Known miRNAs

13.2.3 Setting Up the Run Environment

13.2.4 Running miRDeep2

13.2.4.1 miRDeep2 Output

13.3 miRANALYZER

13.3.1 Running miRanalyzer

13.4 miRNA TARGET ANALYSIS

13.4.1 Computational Prediction Methods

13.4.2 Artificial Intelligence Methods

13.4.3 Experimental Support-Based Methods

13.5 miRNA-SEQ AND mRNA-SEQ DATA INTEGRATION

13.6 SMALL RNA DATABASES AND RESOURCES

13.6.1 RNA-seq Reads of miRNAs in miRBase

13.6.2 Expression Atlas of miRNAs

13.6.3 Database for CLIP-seq and Degradome-seq Data

13.6.4 Databases for miRNAs and Disease

13.6.5 General Databases for the Research Community and Resources

13.6.6 miRNAblog

13.7 SUMMARY

REFERENCES

INDEX


All replies
2016-12-2 17:16:51
Many thanks to the original poster!
2017-12-31 23:00:13
RNA-seq Data Analysis: A Practical Approach
2019-8-2 14:53:38
Thanks for sharing. A great book, and just what I needed.