Unsupervised Ensemble Learning: Estimating the Accuracy of Multiple Classifiers without Labeled Data

Seminar: 
Applied Mathematics
Event time: 
Tuesday, May 5, 2015 - 12:15pm to 1:15pm
Location: 
AKW 200
Speaker: 
Ariel Jaffe
Speaker affiliation: 
Weizmann Institute of Science
Event description: 

Consider a situation where one is given only the predictions of multiple classifiers over a large unlabeled test set. This scenario occurs, for instance, in crowdsourcing, where the task of creating a dataset is distributed among people with unknown expertise.

This scenario raises the following questions: Without any labeled data and without any a priori knowledge about the reliability of these different classifiers, is it possible to consistently estimate their accuracies? Furthermore, can one construct a more accurate ensemble classifier, again in a completely unsupervised manner?

For the binary case, we prove that under standard classifier independence assumptions, the answer to both questions is positive. We also construct two computationally efficient algorithms for estimating the classifiers' accuracies.
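
As a rough illustration of why the binary case is tractable, the sketch below simulates classifiers that err independently given the true label and recovers their accuracies from the predictions alone. It is not the speaker's algorithm. It rests on stronger assumptions than the abstract states: balanced classes, equal sensitivity and specificity for each classifier, and every classifier being better than random. Under those assumptions, the covariance between the ±1 predictions of classifiers i and j is approximately v_i * v_j, where v_i = 2 * accuracy_i - 1, so each v_i can be solved for from triplets of covariances. The function names and the simulated accuracies are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import median


def simulate_predictions(accuracies, n_samples, seed=0):
    """Draw balanced +/-1 labels and conditionally independent predictions."""
    rng = random.Random(seed)
    labels = [rng.choice((-1, 1)) for _ in range(n_samples)]
    # Classifier with accuracy p returns the true label with probability p.
    preds = [[y if rng.random() < p else -y for y in labels] for p in accuracies]
    return labels, preds


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def estimate_accuracies(preds):
    """Estimate each classifier's accuracy without using any labels.

    Assumption (illustrative): cov(f_i, f_j) = v_i * v_j for i != j,
    with v_i = 2 * accuracy_i - 1 and every v_i > 0.
    """
    m = len(preds)
    q = {(i, j): covariance(preds[i], preds[j])
         for i, j in combinations(range(m), 2)}
    q.update({(j, i): c for (i, j), c in list(q.items())})
    estimates = []
    for i in range(m):
        others = [k for k in range(m) if k != i]
        # Each pair (j, k) yields v_i^2 = q_ij * q_ik / q_jk; take the median.
        squares = [q[i, j] * q[i, k] / q[j, k]
                   for j, k in combinations(others, 2)]
        v = sqrt(max(median(squares), 0.0))
        estimates.append((1 + v) / 2)
    return estimates


# The true labels are generated only to drive the simulation;
# estimate_accuracies never sees them.
true_acc = [0.9, 0.8, 0.75, 0.7, 0.65]
labels, preds = simulate_predictions(true_acc, n_samples=50000)
est = estimate_accuracies(preds)
for t, e in zip(true_acc, est):
    print(f"true accuracy {t:.2f}, estimated {e:.3f}")
```

At least three classifiers are needed for the triplet ratio to be defined, and the estimate becomes unstable when any pairwise covariance is close to zero, that is, when some classifier is nearly random.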

In addition, we discuss a more general model where the assumption of total independence is relaxed.

The competitive performance of our algorithms is illustrated via extensive experiments on both artificial and real datasets. The real-data experiments use datasets from problems in genomics.

Special note: 
*Tea - AKW 1st Floor Break Area at 3:45 p.m.*