2014-12-10

Machine Learning - WEKA


David Gilbert and José Antonio Reyes




The aim of this lab is to give you practical experience in the use of WEKA for Machine Learning applications from the lecture on machine learning for micro-array classification.

Resources:
Some useful resources about WEKA are at the website www.cs.waikato.ac.nz/ml/weka

The WEKA datafiles for this tutorial can be found here.

Exercises:

  • Practice WEKA with the classification example about Play Golf
    • Data format: WEKA datasets are formatted according to the ARFF format. For this example you will use the file weather.nominal.arff as a training file to construct a classification model. Save the file in your workspace (for example C:\WEKA_Tutorial) and open it in a text editor to see an example of the ARFF format; note that the last attribute corresponds to the class.
    • Run WEKA in the Windows environment:
      Find the WEKA directory on your machine (C:\Program Files\Weka-3-4).
      Double-click the file "weka.jar" and select the option "Simple CLI".
      You are now ready to run WEKA by typing commands in this window.
    • Try the example with different classifiers, and compare the results obtained with each classifier, for example in terms of the number of examples correctly and incorrectly classified:
      Decision Trees: To try a decision tree, use the Id3 classifier. Type the following command
      java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff
      (note that the option -t specifies the training file, and PATH stands for the location of this file on your machine)
      Support Vector Machines: To try the SVM classifier, type the following command
      java weka.classifiers.functions.SMO -t PATH/weather.nominal.arff
      Neural Networks: To try the NN classifier, type the following command
      java weka.classifiers.functions.VotedPerceptron -t PATH/weather.nominal.arff
      Naive Bayes: To try the NB classifier, type the following command
      java weka.classifiers.bayes.NaiveBayes -t PATH/weather.nominal.arff
    • Save the classification model and then use it to classify new examples: You can save the classification model generated by each of the above classifiers by using the option -d in the following way:
      java weka.classifiers.TYPE.CLASSIFIER_NAME -t PATH/weather.nominal.arff -d PATH/modelname.model
      This generates a file that contains the model; the files can be named, for example:
      weather_Id3.model
      weather_SVM.model
      weather_NN.model
      weather_NB.model
      e.g. by
      java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff -d PATH/weather_Id3.model
      To use a stored model to classify new examples, use the file "test_weather.arff" (save this file in the same folder as weather.nominal.arff and the *.model files). This file contains two examples without a classification. Classify these examples using the models previously generated, in the following way:
      java weka.classifiers.TYPE.CLASSIFIER_NAME -T PATH/test_weather.arff -l PATH/modelname.model -p 0
      In this case you use the options -T, which specifies a test file (test_weather.arff), -l, which loads the model file to be used, and -p 0, which prints the prediction for each test instance without listing any attribute values. Compare the results obtained using the four models generated.
  • Classification of breast cancer examples.
    Download the file Breast_Cancer.arff, which includes a set of 699 cases, 9 attributes and the class attribute giving the type of cancer cell (in this dataset class 4 corresponds to malignant cells and class 2 to benign cells). This dataset is from the Wisconsin Breast Cancer Database (January 8, 1991). You can look for this and other example datasets in this link
    Classify the examples in the "Breast_Cancer.arff" dataset (benign and malignant cells) using the four classifiers mentioned in exercise 1, and compare the results.
    NOTE: This dataset contains numerical data, so you cannot use the Id3 classifier (Id3 only supports nominal attributes). In this case try decision trees with the J48 classifier, using the following command
    java weka.classifiers.trees.J48 -t PATH/Breast_Cancer.arff
  • Classification of Gene expression data.
    Download the file ALLAML.arff (Golub et al 1999), gene expression data that includes 72 examples, 7129 genes (attributes) and 2 classes, "acute myeloid leukemia (AML)" and "acute lymphoblastic leukemia (ALL)". For more information you can read the gene list in the file ALLAML.gene_names.txt, and the paper Golub et al 1999
    Classify the examples in this dataset (ALL or AML class) using the four classifiers mentioned in exercise 1, and compare the results.
    Interpretation: Go to PubMed and search for the selected genes. Do they have any biological meaning? Can you identify the unknown gene function? (Try using other bioinformatics tools.)
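
The ARFF layout described in the Play Golf exercise can be sketched in Python. This is a minimal illustration, not WEKA's own parser: the embedded text is an abbreviated file in the style of weather.nominal.arff (only three data rows are shown), and the helper name parse_simple_arff is hypothetical. It lists the declared attributes and treats the last one as the class, which is also what WEKA's command-line classifiers do by default.

```python
# Abbreviated, illustrative ARFF text in the style of weather.nominal.arff.
ARFF_TEXT = """\
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
"""


def parse_simple_arff(text):
    """Return (attributes, rows) for a simple, purely nominal ARFF file.

    attributes is a list of (name, allowed_values) pairs and rows is a list
    of value lists. Comment lines (%) and blank lines are skipped. Quoting,
    sparse data and non-nominal types are deliberately not handled.
    """
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            values = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_simple_arff(ARFF_TEXT)
class_name, class_values = attributes[-1]  # the last attribute is the class
print("class attribute:", class_name, class_values)
for row in rows:
    print(row[:-1], "->", row[-1])
```

A file such as test_weather.arff, whose examples have no classification, would normally carry a "?" in the class column; that is the ARFF convention for a missing value.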

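
The save-then-classify workflow of the first exercise follows the same pattern for every classifier, which the sketch below spells out by assembling the command strings in Python. The class names and the options -t, -d, -T, -l and -p 0 are the ones used in the exercise; the dictionary keys, the helper names and the literal PATH placeholder are illustrative assumptions. The strings are only printed, not executed.

```python
# Classifier classes named in the exercise, keyed by the model-file suffix
# suggested there (weather_Id3.model, weather_SVM.model, ...).
CLASSIFIERS = {
    "Id3": "weka.classifiers.trees.Id3",
    "SVM": "weka.classifiers.functions.SMO",
    "NN": "weka.classifiers.functions.VotedPerceptron",
    "NB": "weka.classifiers.bayes.NaiveBayes",
}


def train_command(tag, path="PATH"):
    """Command that trains on weather.nominal.arff and saves the model (-d)."""
    return (
        f"java {CLASSIFIERS[tag]} -t {path}/weather.nominal.arff "
        f"-d {path}/weather_{tag}.model"
    )


def classify_command(tag, path="PATH"):
    """Command that loads a saved model (-l) and predicts the test file (-T)."""
    return (
        f"java {CLASSIFIERS[tag]} -T {path}/test_weather.arff "
        f"-l {path}/weather_{tag}.model -p 0"
    )


for tag in CLASSIFIERS:
    print(train_command(tag))
    print(classify_command(tag))
```

Replace PATH with your own workspace folder before typing the printed commands into the Simple CLI window.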



All replies
2014-12-10 08:51:17

Machine Learning - WEKA


David Gilbert and José Antonio Reyes



The aim of this lab is to give you more practical experience in the use of WEKA for Machine Learning applications from the lecture on machine learning for micro-array classification.


Resources:
Some useful resources about WEKA are at the website www.cs.waikato.ac.nz/ml/weka

The WEKA datafiles for this tutorial can be found here.

Exercises:

  • Ensure that you have worked through the previous Weka tutorial.
  • Look at one of the confusion matrices output by Weka, e.g. from
    java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff
    Compute the values for
    • TP (True Positives)
    • FP (False Positives)
    • TN (True Negatives)
    • FN (False Negatives)
  • Use these values to compute the following measures of performance
    • Accuracy
    • Positive Predictive Value -- PPV
    • Negative Predictive Value -- NPV
    • TP-rate (Sensitivity / Recall)
    • TN-rate (Specificity)
    • F-measure (van Rijsbergen)
    • Correlation Coefficient
  • Repeat these for confusion matrices generated by other classifiers, to see how their performance measurements differ from that for Id3.
  • Use different cross-validation fold values [with the -x option], e.g.
    java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff -x 5
    and then recompute the values for the measures of performance above.
  • Area Under the Curve (AUC) computations:
    Download these files to a workspace on your XP machine.
    Make sure that you can understand the files auc_execute.bat, AUC.pl, and yeast.txt.
    Then run auc_execute by clicking on the icon. Have a look at the results file auc_results.txt.
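
The performance measures listed in the exercises above can be computed from the four confusion-matrix counts, as in the following sketch. The formulas are the standard definitions. Weighting precision and recall equally in the F-measure, and reading "Correlation Coefficient" as the Matthews correlation coefficient, are assumptions, as are the function name and the example counts, which are made up and are not WEKA output.

```python
import math


def performance_measures(tp, fp, tn, fn):
    """Compute the lab's performance measures from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a measure is undefined.
        return numerator / denominator if denominator else float("nan")

    ppv = ratio(tp, tp + fp)      # precision
    tp_rate = ratio(tp, tp + fn)  # sensitivity / recall
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ppv,
        "npv": ratio(tn, tn + fn),
        "tp_rate": tp_rate,
        "tn_rate": ratio(tn, tn + fp),  # specificity
        "f_measure": ratio(2 * ppv * tp_rate, ppv + tp_rate),
        "correlation": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for a 14-instance data set.
measures = performance_measures(tp=8, fp=2, tn=3, fn=1)
for name, value in measures.items():
    print(f"{name:12s} {value:.3f}")
```

In the confusion matrix that WEKA prints, rows are the actual classes and columns are the predicted classes, so the four counts depend on which class you decide to call positive.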

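
For the AUC exercise, the quantity being computed can be illustrated independently of the supplied scripts. The sketch below is not a transcription of AUC.pl, whose contents are not shown here. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the fraction of positive/negative pairs where the positive example receives the higher score, with ties counted as one half. The scores and labels are invented for illustration and are not taken from yeast.txt.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 for positive and 0 for negative examples; scores are
    higher for examples the classifier considers more likely positive.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(f"AUC = {auc(scores, labels):.3f}")
```

An AUC of 1.0 means every positive example is scored above every negative one, while 0.5 corresponds to a ranking no better than chance.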
