Type of Publication: Journal Articles
Authors: Asaf Shabtai
Title: Adapted Features and Instance Selection for Improving Co-training
Name of the Journal: Mach. Learn. (Netherlands)
Year: 2014
Volume: 91
Issue: 1
Pages: 81 - 100
Abstract: High-quality labeled data is essential for successfully applying machine learning methods to real-world problems. However, in many cases, the amount of labeled data is insufficient and labeling that data is expensive or time-consuming. Co-training algorithms, which use unlabeled data in order to improve classification, have proven to be effective in such cases. Generally, co-training algorithms work by using two classifiers trained on two different views of the data to label large amounts of unlabeled data, and hence they help minimize the human effort required to label new data. In this paper we propose simple and effective strategies for improving the basic co-training framework. The proposed strategies improve two aspects of the co-training algorithm: the manner in which the feature set is partitioned and the method of selecting additional instances. An experimental study over 25 datasets proves that the proposed strategies are especially effective for imbalanced datasets. In addition, in order to better understand the inner workings of the co-training process, we provide an in-depth analysis of the effects of classifier error rates and performance imbalance between the two "views" of the data. We believe this analysis offers insights that could be used for future research.
Keywords: feature selection; learning (artificial intelligence); pattern classification
Last Updated: 1/13/2016 12:00:00 AM