Mining frequent itemsets and association rules is a popular and well-researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets, and association rules.
Example 1: Analyzing and preparing a transaction data set
In this example, we show how a data set can be analyzed and manipulated before associations are mined. This step matters because problems in the data set can make the mined associations useless, or at least inferior to associations mined from a properly prepared data set.
For the example, we look at the Epub transaction data contained in package arules. This data set contains downloads of documents from the Electronic Publication platform of the Vienna University of Economics and Business (available via http://epub.wu-wien.ac.at), recorded from January 2003 to December 2008.
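To make the idea of inspecting transaction data before mining more concrete, the following is a minimal sketch in plain Python. It does not use the arules API and does not use the real Epub data: the toy transactions, the document ids, and the helper names summarize and unusually_large are all hypothetical. The check for unusually large transactions is only one assumed example of a problem worth looking for.

```python
from collections import Counter

# Hypothetical toy download sessions (sets of document ids); NOT the real Epub data.
transactions = [
    {"doc_a"},
    {"doc_a", "doc_b"},
    {"doc_c"},
    {"doc_a", "doc_b", "doc_c", "doc_d", "doc_e", "doc_f"},
    {"doc_b"},
]


def summarize(transactions):
    """Return the transaction size distribution and relative item frequencies."""
    n = len(transactions)
    # How many transactions have 1 item, 2 items, and so on.
    sizes = Counter(len(t) for t in transactions)
    # Fraction of transactions that contain each item.
    item_counts = Counter(item for t in transactions for item in t)
    item_freq = {item: count / n for item, count in item_counts.items()}
    return sizes, item_freq


def unusually_large(transactions, max_size):
    """Return the indices of transactions with more than max_size items."""
    return [i for i, t in enumerate(transactions) if len(t) > max_size]


sizes, item_freq = summarize(transactions)
print(sorted(sizes.items()))                      # size distribution
print(unusually_large(transactions, max_size=3))  # candidates for closer inspection
```

A summary of this kind can reveal, for instance, a few very large transactions or items that occur almost never, both of which may be worth examining before any associations are mined.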
Example 2: Preparing and mining a questionnaire data set
As a second example, we prepare and mine questionnaire data. We use the Adult data set from the UCI machine learning repository (Asuncion and Newman 2007) provided by package arules. This data set is similar to the marketing data set used by Hastie et al. (2001) in their chapter about association rule mining. The data originates from the U.S. Census Bureau database and contains 48842 instances with 14 attributes such as age, work class, and education. In the original applications of the data, the attributes were used to predict the income level of individuals. We added the attribute income with levels small and large, representing an income of ≤ USD 50,000 and > USD 50,000, respectively. This data is included in arules as the data set AdultUCI.
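The preparation step described above, turning attributes into a small number of levels so that each record can be treated as a transaction, can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the arules implementation. The records, the numeric income_usd field, and the helper names income_level and to_items are hypothetical. Only the threshold of USD 50,000 and the level names small and large come from the text.

```python
def income_level(income_usd):
    """Map a numeric income to the two levels named in the text.

    small: income <= USD 50,000; large: income > USD 50,000.
    """
    return "small" if income_usd <= 50_000 else "large"


def to_items(record):
    """Encode a record as a set of 'attribute=value' items, skipping missing values."""
    return {f"{name}={value}" for name, value in record.items() if value is not None}


# Hypothetical questionnaire records; NOT taken from AdultUCI.
records = [
    {"age": "Young", "workclass": "Private", "income_usd": 32_000},
    {"age": "Senior", "workclass": None, "income_usd": 50_000},
    {"age": "Middle-aged", "workclass": "State-gov", "income_usd": 75_500},
]

transactions = []
for record in records:
    prepared = dict(record)  # copy so the input record is left unchanged
    # Replace the numeric income with its categorical level.
    prepared["income"] = income_level(prepared.pop("income_usd"))
    transactions.append(to_items(prepared))

for t in transactions:
    print(sorted(t))
```

Each resulting transaction contains one item per non-missing attribute, so that itemset and rule mining can treat a questionnaire record like a market basket.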