Evaluate semi-supervised classification with collaborative learning algorithm

Ace p.


Abstract

Conventional classification models are built by supervised learning, which requires the class labels of the entities to be classified along with their descriptions. Even instance-based classifiers require the class labels of their training examples to predict the class labels of unknown data. Charles et al. proposed an algorithm, Classification from Multiple Sources (CMS), for building classifiers from data without class labels. CMS adopts a collaborative learning approach that involves multiple sources, each capable of classifying the data independently. In this algorithm, separate partition trees are built in accordance with the data descriptions provided by each source, and their leaves are merged to obtain uni-class labeled groups of examples. While building a partition tree, attributes are prioritized based on the conditional probability that the attribute assumes its most frequent value given the rest of the attributes.

Keywords: Semi-Supervised Learning, Collaborative Learning, Classification, Partitional Clustering, Data Description

Abstract
Introduction
Classification from multiple sources
Proposed algorithm: CLASSC
1. Model development phase
2. Classification phase
Experimental results
1. Classification of unknown data
Conclusion
References


Extract

[...] The resulting large number of clusters for each modality was further merged into a small number of larger clusters using the collaborative merging algorithm with β = 0.1, which results in three uni-class clusters for each of the modalities, excluding singleton clusters. The results of the CLASSC algorithm are tabulated in Table 1, and it can be observed that the clusters formed under both modalities are totally consistent with each other. The data set is inherently imbalanced, with most of the instances corresponding to normal situations, which is reflected in the wide variation in cluster sizes, from 3 to 503. [...]

[...] Hence, it was observed that the CMS algorithm is confined to data with discrete attributes. In this paper, the authors propose a different algorithm for building a classifier based on the descriptions of entities obtained from multiple collaborative sources. This algorithm overcomes the limitations of the CMS algorithm in dealing with continuous-valued attributes, and the performance of the proposed algorithm is analyzed. The rest of the paper is organized as follows: Section 2 presents the features and limitations of the algorithm suggested by Charles. [...]

[...] Specifically, there is no classified set of test examples against which to compare the outcome generated by clustering algorithms. However, the quality of the clusters can be improved by collaborative learning, provided there exist two or more data sources able to independently classify a set of data descriptions. For collaborative learning to succeed, the classification results produced by the multiple data sources must also be consistent. Collaborative learning involves merging and refining the clusters generated from the partial description provided by each source, in accordance with the information available from the clustering done on the partial descriptions from the other sources. [...]
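The consistency requirement described above can be illustrated with a small sketch. It assumes each source has already assigned a cluster label to the same ordered list of examples, and it treats two clusterings as consistent when they induce exactly the same partition. The function names `contingency` and `is_consistent` are hypothetical and are not taken from the paper, and this strict one-to-one test is only one possible reading of "consistent".

```python
from collections import defaultdict


def contingency(labels_a, labels_b):
    """Count the examples that fall in each (cluster in A, cluster in B) pair."""
    table = defaultdict(int)
    for a, b in zip(labels_a, labels_b):
        table[(a, b)] += 1
    return dict(table)


def is_consistent(labels_a, labels_b):
    """Return True when both clusterings induce the same partition.

    Assumption: consistency means a one-to-one mapping between the
    clusters of source A and the clusters of source B.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both sources must label the same examples")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first partner seen and returns it afterwards.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


if __name__ == "__main__":
    source_a = [0, 0, 1, 2]
    source_b = ["x", "x", "y", "z"]
    print(contingency(source_a, source_b))
    print(is_consistent(source_a, source_b))
```

A contingency table with more than one non-zero entry in any row or column indicates a cluster that one source splits and the other does not, which is where merging or refinement would be needed.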

[...] If the data is imbalanced, the classification rules that can identify the elements of minority classes are tried before the classification rules associated with the majority class are applied.

EXPERIMENTAL RESULTS

The CLASSC algorithm was tested on water treatment plant data collected from the UCI machine learning repository by building a classifier that can discriminate a problematic situation from normal situations based on unlabelled data. The data consists of 527 records, each describing the working condition of a waste water treatment plant on one day in the interval 1990-1991. [...]
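The rule ordering for imbalanced data mentioned at the start of this extract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each rule is a (predicate, cluster label) pair and that the size of each cluster is known. The names `order_rules` and `classify`, the attribute `x`, and the thresholds are all hypothetical; only the cluster sizes 3 and 503 echo figures reported in the extracts.

```python
def order_rules(rules, cluster_sizes):
    """Sort rules so that rules for smaller (minority) clusters come first.

    rules: list of (predicate, label) pairs; predicate takes a record dict.
    cluster_sizes: dict mapping each label to the number of examples in it.
    """
    # sorted() is stable, so rules for the same cluster keep their order.
    return sorted(rules, key=lambda rule: cluster_sizes[rule[1]])


def classify(record, ordered_rules, default=None):
    """Return the label of the first rule whose predicate matches."""
    for predicate, label in ordered_rules:
        if predicate(record):
            return label
    return default


if __name__ == "__main__":
    # Hypothetical rules on a made-up attribute "x".
    rules = [
        (lambda r: r["x"] > 0, "normal"),
        (lambda r: r["x"] > 10, "fault"),
    ]
    sizes = {"normal": 503, "fault": 3}
    ordered = order_rules(rules, sizes)
    print(classify({"x": 20}, ordered))
    print(classify({"x": 5}, ordered))
```

Trying the minority rule first matters here because the broad majority rule also matches the record with `x = 20`; in the original rule order that record would have been labeled as the majority class.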

[...] PH-E = COND-P =3120, ZN-E DQO-E =156, PH-P = SSV-D = DBO-D = SSV-S = PH-D DQO-S =220, SED-S = DQO-D = COND-S = SS-D = 56.37 Similarly, based on the description of another unknown sample in terms of the attributes of modality B, it can be classified into Cluster RD-DBO-P RD-SS-P RD-SED-P = RD-SS-G RD-SED-G =

CONCLUSION

An algorithm for semi-supervised learning of classification rules from unlabelled data collected from multiple collaborative sources is proposed. Existing classification algorithms require a training set of examples with known class labels for building classifier models through collaborative learning. [...]
