This example shows how to select features for classifying high-dimensional data. In particular, it shows how to perform sequential feature selection, one of the most widely used feature selection algorithms, and how to use holdout and cross-validation to evaluate the performance of the selected features.
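To make the workflow concrete, here is a minimal sketch of forward sequential feature selection evaluated with both a holdout set and cross-validation. It uses scikit-learn rather than this example's own code, and the synthetic data, the logistic-regression classifier, and the choice of 10 features are all illustrative assumptions.

```python
# A minimal sketch of forward sequential feature selection with holdout and
# cross-validation evaluation; data and parameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# High-dimensional setting: many features, relatively few observations.
X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)

# Hold out a test set so the selected features are judged on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

clf = LogisticRegression(max_iter=1000)

# Forward selection: greedily add the feature that most improves
# cross-validated accuracy, stopping once 10 features are selected.
sfs = SequentialFeatureSelector(clf, n_features_to_select=10,
                                direction="forward", cv=5)
sfs.fit(X_train, y_train)

# Holdout evaluation of the selected subset.
clf.fit(sfs.transform(X_train), y_train)
print("holdout accuracy:", clf.score(sfs.transform(X_test), y_test))

# Cross-validated estimate on the training data, for comparison.
scores = cross_val_score(clf, sfs.transform(X_train), y_train, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```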
Reducing the number of features (dimensionality) is important in statistical learning. In many data sets with a large number of features and a limited number of observations, such as bioinformatics data, many of the features are irrelevant to the desired learning result, and the limited number of observations can lead the learning algorithm to overfit the noise. Reducing the number of features can also save storage and computation time and make the results easier to interpret.
There are two main approaches to reducing features: feature selection and feature transformation. Feature selection algorithms select a subset of features from the original feature set; feature transformation methods transform data from the original high-dimensional feature space to a new space with reduced dimensionality.
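The sketch below illustrates this distinction, again using scikit-learn with placeholder data and parameters as an assumption. Feature selection keeps 10 of the original columns (here chosen by a univariate F-test, one of many possible selection criteria), while feature transformation replaces them with 10 new features, each a combination of all the originals.

```python
# A brief sketch contrasting feature selection with feature transformation;
# the data set and parameter choices are placeholders for illustration.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=100, random_state=0)

# Feature selection: keep 10 of the original features by univariate F-test;
# the retained columns are still the original measurements.
selector = SelectKBest(f_classif, k=10).fit(X, y)
X_selected = selector.transform(X)           # shape (200, 10)
print("indices of kept features:", selector.get_support(indices=True))

# Feature transformation: project onto 10 principal components; each new
# feature is a linear combination of all the original features.
pca = PCA(n_components=10).fit(X)
X_transformed = pca.transform(X)             # shape (200, 10)
print("variance explained:", pca.explained_variance_ratio_.sum())
```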