This book is devoted to the fundamentals of text mining using Perl, an open-source programming tool that is freely available via the Internet (www.perl.org). It covers text mining ideas from several perspectives (statistics, data mining, linguistics, and information retrieval) and provides readers with the means to successfully complete text mining tasks on their own.
The book begins with an introduction to regular expressions (a text pattern methodology) and quantitative text summaries, both of which are fundamental tools for analyzing text. It then builds upon this foundation to explore:
Probability and texts, including the bag-of-words model
Information retrieval techniques such as the TF-IDF similarity measure
Concordance lines and corpus linguistics
Multivariate techniques such as correlation, principal components analysis, and clustering
Perl modules, German, and permutation tests
Table of Contents:
1 Introduction
2 Text Patterns
3 Quantitative Text Summaries
4 Probability and Text Sampling
5 Applying Information Retrieval to Text Mining
6 Concordance Lines and Corpus Linguistics
7 Multivariate Techniques with Text
8 Text Clustering
9 A Sample of Additional Topics
Appendix A: Overview of Perl for Text Mining
Appendix B: Summary of R used in this Book