Data Mining: Concepts and Techniques
Jiawei Han and Micheline Kamber, Simon Fraser University
Note: This manuscript is based on a forthcoming book by Jiawei Han and Micheline Kamber, © 2000 Morgan Kaufmann Publishers.
------------------------------------------------
https://bbs.pinggu.org/thread-28773-1-1.html
The book is organized as follows.
Chapter 1 provides an introduction to the multidisciplinary field of data mining. It discusses the evolutionary path of database technology which led up to the need for data mining, and the importance of its application potential. The basic architecture of data mining systems is described, and a brief introduction to the concepts of database systems and data warehouses is given. A detailed classification of data mining tasks is presented, based on the different kinds of knowledge to be mined. A classification of data mining systems is presented, and major challenges in the field are discussed.
Chapter 2 is an introduction to data warehouses and OLAP (On-Line Analytical Processing). Topics include the concept of data warehouses and multidimensional databases, the construction of data cubes, the implementation of on-line analytical processing, and the relationship between data warehousing and data mining.
Chapter 3 describes techniques for preprocessing the data prior to mining. Methods of data cleaning, data integration and transformation, and data reduction are discussed, including the use of concept hierarchies for dynamic and static discretization. The automatic generation of concept hierarchies is also described.
Chapter 4 introduces the primitives of data mining which define the specification of a data mining task. It describes a data mining query language (DMQL), and provides examples of data mining queries. Other topics include the construction of graphical user interfaces, and the specification and manipulation of concept hierarchies.
Chapter 5 describes techniques for concept description, including characterization and discrimination. An attribute-oriented generalization technique is introduced, as well as its different implementations including a generalized relation technique and a multidimensional data cube technique. Several forms of knowledge presentation and visualization are illustrated. Relevance analysis is discussed. Methods for class comparison at multiple abstraction levels, and methods for the extraction of characteristic rules and discriminant rules with interestingness measurements are presented. In addition, statistical measures for descriptive mining are discussed.
Chapter 6 presents methods for mining association rules in transaction databases as well as relational databases and data warehouses. It includes a classification of association rules, a presentation of the basic Apriori algorithm and its variations, and techniques for mining multiple-level association rules, multidimensional association rules, quantitative association rules, and correlation rules. Strategies for finding interesting rules by constraint-based mining and the use of interestingness measures to focus the rule search are also described.
Chapter 7 describes methods for data classification and predictive modeling. Major methods of classification and prediction are explained, including decision tree induction, Bayesian classification, the neural network technique of backpropagation, k-nearest neighbor classifiers, case-based reasoning, genetic algorithms, rough set theory, and fuzzy set approaches. Association-based classification, which applies association rule mining to the problem of classification, is presented. Methods of regression are introduced, and issues regarding classifier accuracy are discussed.
Chapter 8 describes methods of clustering analysis. It first introduces the concept of data clustering and then presents several major data clustering approaches, including partition-based clustering, hierarchical clustering, and model-based clustering. Methods for clustering continuous data, discrete data, and data in multidimensional data cubes are presented. The scalability of clustering algorithms is discussed in detail.
Chapter 9 discusses methods for data mining in advanced database systems. It includes data mining in object-oriented databases, spatial databases, text databases, multimedia databases, active databases, temporal databases, heterogeneous and legacy databases, and resource and knowledge discovery in the Internet information base.
Finally, in Chapter 10, we summarize the concepts presented in this book and discuss applications of data mining and some challenging research issues.
<<Data Mining: Concepts and Techniques>>
Preface
Our capabilities for both generating and collecting data have been increasing rapidly in the last several decades. Contributing factors include the widespread use of bar codes for most commercial products, the computerization of many business, scientific, and government transactions and management processes, and advances in data collection tools ranging from scanned text and image platforms, to on-line instrumentation in manufacturing and shopping, to satellite remote sensing systems. In addition, popular use of the World Wide Web as a global information system has flooded us with a tremendous amount of data and information. This explosive growth in stored data has generated an urgent need for new techniques and automated tools that can intelligently assist us in transforming the vast amounts of data into useful information and knowledge.
This book explores the concepts and techniques of data mining, a promising and flourishing frontier in database systems and new database applications. Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories.
Data mining is a multidisciplinary field, drawing work from areas including database technology, artificial intelligence, machine learning, neural networks, statistics, pattern recognition, knowledge based systems, knowledge acquisition, information retrieval, high performance computing, and data visualization. We present the material in this book from a database perspective. That is, we focus on issues relating to the feasibility, usefulness, efficiency, and scalability of techniques for the discovery of patterns hidden in large databases. As a result, this book is not intended as an introduction to database systems, machine learning, or statistics, etc., although we do provide the background necessary in these areas in order to facilitate the reader's comprehension of their respective roles in data mining. Rather, the book is a comprehensive introduction to data mining, presented with database issues in focus. It should be useful for computing science students, application developers, and business professionals, as well as researchers involved in any of the disciplines listed above.
Data mining emerged during the late 1980s, has made great strides during the 1990s, and is expected to continue to flourish into the new millennium. This book presents an overall picture of the field from a database researcher's point of view, introducing interesting data mining techniques and systems, and discussing applications and research directions. An important motivation for writing this book was the need to build an organized framework for the study of data mining, a challenging task owing to the extensive multidisciplinary nature of this fast-developing field. We hope that this book will encourage people with different backgrounds and experiences to exchange their views regarding data mining so as to contribute towards the further promotion and shaping of this exciting and dynamic field.
Ming-Syan Chen, Jiawei Han, Philip S. Yu
Abstract: Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining
Data Mining: Concepts, Models, Methods, and Algorithms
Mehmed Kantardzic
ISBN: 0-471-22852-4
Paperback, 360 pages, October 2002, Wiley-IEEE Press
CDN $96.99
"...clear and well understandable...recommended as basic guidance...practitioners will profit from the author's long experience..." (Zentralblatt Math, Vol. 1027, 2004)
"...reviews state-of-the-art techniques for analyzing enormous quantities of raw data..." (Quarterly of Applied Mathematics, Vol. LXI, No. 3, September 2003)
"...this is a comprehensive textbook that describes the process and methodologies of data mining in an unbiased manner...serves as an excellent starting point for anyone wishing to learn about data mining." (Journal of Proteome Research, May/June 2003)
"...a valuable book.... I truly enjoyed reading the book and I am glad to recommend it to anyone working in this fascinating field." (IIE Transactions)
"...detailed, well illustrated, and easy to understand...comprehensive…a good book..." (Mathematical Reviews 2003h)
"...this is probably the first data-mining book that I would select from my bookshelf as reading material for a statistician..." (Technometrics, Vol. 45, No. 3, August 2003)
Managing Data Mining Technologies in Organizations: Techniques and Applications
by Parag Pendharkar (ed.)
ISBN: 1591400570
Idea Group Publishing © 2003 (288 pages)
This book details state-of-the-art data mining research, reflected in a potpourri of chapters that demonstrate diverse uses of data mining techniques and their applications.
Principles of Data Mining
David J. Hand, Heikki Mannila and Padhraic Smyth
Full Contents
List of Tables
List of Figures
Series Foreword
Preface
1. Introduction
2. Measurement and Data
3. Visualizing and Exploring Data
4. Data Analysis and Uncertainty
5. A Systematic Overview of Data Mining Algorithms
6. Models and Patterns
7. Score Functions for Data Mining Algorithms
8. Search and Optimization Methods
9. Descriptive Modeling
10. Predictive Modeling for Classification
11. Predictive Modeling for Regression
12. Data Organization and Databases
13. Finding Patterns and Rules
14. Retrieval by Content
Appendix: Random Variables
References
Index
Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems) (Paperback) by Jack E. Olson
Concepts and technical approaches for analyzing and improving the usefulness of source data are presented in a well-organized and logical sequence. It is not quite a roadmap or algorithm for achieving data quality; it is balanced a little more on the conceptual side. The techniques and concepts presented are ones you will want to begin using if you have any data quality issues in your source data (and who doesn't); I guarantee that. As a working ETL developer referencing disparate, low-quality data, I can say this book has had an impact on our approach, our results, and our project. Short and to the point, it's an easy study also!
Jack Olson coined the term "data profiling" and essentially founded this important new field in the area of data quality assessment. His revolutionary techniques, outlined in this book, can provide professionals with an important new set of tools for analyzing data quality. This is a must-read for anyone working in the data quality field today. I also recommend it for people in related fields such as data warehousing, Enterprise Application Integration, and database design. With so many "me too" books in the computer field, it's a real joy to find a book that really does break new ground.