Selecting informative features – a novel approach for the classification of large datasets

Ace p.

Abstract

Feature subset selection plays an essential role in data mining applications. It speeds up data mining algorithms and can improve mining performance. This paper proposes a novel feature subset selection approach for classification problems. The approach looks for features that are common to the subsets chosen by different feature selection methods (filter, wrapper and hybrid), and evaluates classification performance in terms of accuracy, runtime and number of features selected. In particular, it identifies the most suitable features by considering those that occur most frequently among the subsets selected by filter, wrapper and hybrid feature selection methods using best first, greedy stepwise and genetic searches with traditional classifiers. Performance was evaluated with 10-fold cross-validation. An empirical study on a public-domain dataset shows that our approach performs comparably.

Keywords: Feature Subset Selection, Classification, Hybrid Method, Information Gain, Genetic Search, Cross-Validation.
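The abstract states that performance was evaluated with 10-fold cross-validation, and the contents list Naive Bayes among the classifiers. The paper ran these in WEKA; as a rough, self-contained illustration (not the paper's setup), the Python sketch below estimates the accuracy of a categorical Naive Bayes classifier with k-fold cross-validation. The toy dataset, the Laplace smoothing and all function names are assumptions made for this example.

```python
import math
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices once, then deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_naive_bayes(rows, labels):
    """Count class priors and per-class value frequencies for categorical features."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> Counter of feature values
    values = defaultdict(set)      # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, y)][v] += 1
            values[j].add(v)
    return priors, counts, values


def predict_naive_bayes(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    priors, counts, values = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for y, n_y in priors.items():
        score = math.log(n_y / total)
        for j, v in enumerate(row):
            score += math.log((counts[(j, y)][v] + 1) / (n_y + len(values[j])))
        if score > best_score:
            best, best_score = y, score
    return best


def cross_val_accuracy(rows, labels, k=10, seed=0):
    """Mean accuracy over k folds; assumes len(rows) >= k so no fold is empty."""
    folds = k_fold_indices(len(rows), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train_naive_bayes([rows[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
        correct = sum(predict_naive_bayes(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical toy data: feature 0 decides the class, feature 1 is noise.
rows = [(a, b) for a in "xy" for b in "pqr"] * 4   # 24 rows
labels = ["pos" if a == "x" else "neg" for a, _ in rows]
print(cross_val_accuracy(rows, labels, k=10))
```

Each row is used for testing exactly once, so the averaged accuracy reflects performance on data the model did not see during training.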

Contents
Abstract
Introduction
Feature selection methods
Classification algorithm
  1. Radial basis function (RBF)
  2. Naive Bayes
  3. 10-fold cross validation
Experimental setup, results and discussions
  1. Dataset description
  2. Hybrid methods
  3. Classification using frequently occurring features
Conclusion and future research
References

Extract

[...] For the feature selection problem, the space of all possible subsets of the given feature set is represented by fixed-length binary strings. Each feature in the feature set is treated as a binary gene: the value at each position of the string indicates the presence or absence of that feature. Crossover and mutation operators are used to generate new feature subsets for the next generation; mutation randomly adds features to, or deletes them from, a subset. [...]
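The binary-string encoding described in this extract can be sketched in Python. This is a minimal illustration under stated assumptions, not WEKA's genetic search implementation: truncation selection, single-point crossover, the per-bit mutation rate, the population size and the toy fitness function (which rewards a hypothetical set of relevant features) are all choices made for the example. In practice the fitness would be a subset evaluator, such as cross-validated classifier accuracy.

```python
import random


def random_subset(n_features, rng):
    """A chromosome: fixed-length bit string, bit j = 1 means feature j is kept."""
    return [rng.randint(0, 1) for _ in range(n_features)]


def crossover(a, b, rng):
    """Single-point crossover of two parent bit strings (needs len >= 2)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(bits, rate, rng):
    """Flip each bit with probability `rate`, i.e. randomly add or delete a feature."""
    return [1 - g if rng.random() < rate else g for g in bits]


def genetic_search(n_features, fitness, pop_size=20, generations=30, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [random_subset(n_features, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Assumed selection scheme: keep the fitter half unchanged as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


RELEVANT = {0, 2, 5}  # hypothetical "informative" features, for the demo only


def toy_fitness(bits):
    """Reward relevant features, penalize irrelevant ones (illustrative only)."""
    chosen = {j for j, g in enumerate(bits) if g}
    return len(chosen & RELEVANT) - 0.5 * len(chosen - RELEVANT)


best = genetic_search(8, toy_fitness)
print(best, toy_fitness(best))
```

Because the parents survive each generation unchanged, the best fitness in the population never decreases from one generation to the next.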

[...] To apply the classification and feature selection methods we used the WEKA software. The results reported in this section were obtained with 10-fold cross-validation over the dataset. Combinations of feature selection and classification methods were examined for the datasets. The features selected for each classifier by each feature selection method are shown [...]

3.3 CLASSIFICATION USING FREQUENTLY OCCURRING FEATURES

We calculated how frequently each feature occurred in the subsets selected for the datasets. A feature becomes part of the final subset if it is selected frequently, that is, by many of the combinations of feature selection and traditional classification methods. [...]
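The frequency-of-occurrence rule can be illustrated with Python's `collections.Counter`. The extract does not state the exact cut-off, so the majority threshold used here is an assumption, and the method/search/classifier labels and feature names are hypothetical placeholders for the subsets the WEKA runs would produce.

```python
from collections import Counter


def frequent_features(selected_subsets, min_count):
    """Count how many (method, search, classifier) combinations picked each
    feature and keep the features picked at least `min_count` times."""
    counts = Counter(f for subset in selected_subsets.values() for f in set(subset))
    frequent = sorted(f for f, c in counts.items() if c >= min_count)
    return frequent, counts


# Hypothetical selections standing in for real feature selection output.
selected = {
    ("filter", "best first", "Naive Bayes"): ["f1", "f3", "f4"],
    ("wrapper", "greedy stepwise", "RBF"): ["f1", "f2", "f3"],
    ("hybrid", "genetic search", "JRip"): ["f1", "f3", "f5"],
}
majority = len(selected) // 2 + 1   # assumed threshold: chosen by a majority
frequent, counts = frequent_features(selected, majority)
print(frequent)  # ['f1', 'f3']
```

Converting each subset to a `set` before counting ensures that a feature listed twice by one combination is still counted only once for it.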

[...] When the features are selected based on their frequency of occurrence, the accuracy of the RBF classifier improves, while that of the JRip classifier decreases slightly. The runtime of the classifiers remains the same for both the top-ranked and the frequently occurring features.

CONCLUSION AND FUTURE RESEARCH

In this work we have proposed a new framework for feature selection that identifies frequently occurring features across different hybrid feature selection methods. The feature selection results are compared with the top-ranked feature subset. [...]

[...] Our results indicate that the frequently occurring features performed as well as the traditional feature selection methods in terms of accuracy, with a negligible difference in execution time.

REFERENCES

R. Kohavi and G. John, “Wrappers for Feature Subset Selection”, Artificial Intelligence, vol. 97, pp. 271-
Daniel M. Santoro et al., “Selecting Feature Subsets for Inducing Classifiers using a Committee of Heterogeneous Methods”, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 372-
A. Blum and P. Langley, “Selection of Relevant Features and Examples in Machine Learning”, Artificial Intelligence, vol. 97, no. 1-2, pp. 245-
E.G. [...]

[...] It is common to approach feature subset selection as a search problem in a search space defined by all possible subsets of a feature set (for n features, there are 2^n such subsets). A number of approaches to feature subset selection, all aiming to reduce the number of features that describe the training set, can be found in the literature. They can be broadly categorized into the filter approach and the wrapper approach. Filter methods use independent evaluation criteria based on general characteristics of the data, without involving any data mining algorithm. [...]
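Information gain, listed among the keywords, is one such classifier-independent filter criterion: IG(C; F) = H(C) - Σ_v P(F = v) H(C | F = v). The Python sketch below ranks features by this score and assumes discrete (nominal) features; WEKA's `InfoGainAttributeEval` discretizes numeric attributes first, which this example omits. Function names and the toy data are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C) in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(C; F) = H(C) - sum_v P(F=v) * H(C | F=v) for one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for v, count in Counter(column).items():
        subset = [y for x, y in zip(column, labels) if x == v]
        remainder += (count / n) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Filter step: score every feature on its own, with no classifier involved."""
    n_features = len(rows[0])
    scores = [(information_gain([r[j] for r in rows], labels), j)
              for j in range(n_features)]
    return sorted(scores, reverse=True)


# Toy data: feature 0 separates the classes perfectly, feature 1 carries no signal.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
print(rank_features(rows, labels))  # [(1.0, 0), (0.0, 1)]
```

A top-ranked subset is then simply the first few indices of this list; no classifier is trained, which is what makes the method a filter rather than a wrapper.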

doc