Tuesday, January 16, 2018
01/16/2018 - 4:00pm to 5:00pm
With the availability of huge amounts of unlabeled data, unsupervised learning methods are gaining increasing popularity and importance. We focus on "unsupervised ensemble learning", where one obtains the predictions of multiple classifiers over a set of unlabeled instances. The classifiers may be human experts, as in crowdsourcing, or prediction algorithms developed by research groups worldwide. The challenge is to estimate the accuracies of the different classifiers and combine them into an accurate meta-learner. To tackle this problem, we show how it relates to latent variable models, and derive simple estimates for the classifiers' accuracies based on a spectral analysis of the observed data. On the experimental side, we apply our methods to a problem in Computational Biology, where for various classification tasks one combines the results of multiple algorithms for improved accuracy.

In the second part of the talk, I will focus on extending the techniques developed for unsupervised ensemble learning to a specific family of linear latent variable models. For cases where the latent layer is binary, we derive an interesting relation between the model parameters and the relatively recent notion of tensor eigenvectors of the data's higher-order moments. We apply our methods to overlapping clustering, a problem that has gained popularity due to its applicability in various domains such as gene expression analysis and text categorization.
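The spectral idea from the first part of the talk can be illustrated with a small simulation. Assuming conditionally independent binary classifiers and balanced classes (simplifying assumptions for this sketch, not necessarily those of the talk), the off-diagonal entries of the prediction second-moment matrix equal v_i v_j with v_i = 2*accuracy_i - 1, a rank-one structure whose leading eigenvector recovers the accuracies without any labels. All parameters below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20000, 10
acc = np.linspace(0.55, 0.90, m)             # hypothetical true accuracies

y = rng.choice([-1.0, 1.0], size=n)          # hidden labels, balanced classes
flip = rng.random((n, m)) < (1.0 - acc)      # independent errors per classifier
Z = np.where(flip, -y[:, None], y[:, None])  # observed +/-1 predictions only

# For i != j, E[z_i z_j] = v_i v_j with v_i = 2*acc_i - 1 (rank one off-diagonal).
R = (Z.T @ Z) / n
v_hat = np.zeros(m)
for _ in range(30):                          # rank-one completion of the diagonal:
    np.fill_diagonal(R, v_hat ** 2)          # plug in the current guess, then
    lam, U = np.linalg.eigh(R)               # re-extract the leading eigenpair
    u = U[:, -1] * np.sign(U[:, -1].sum())   # sign: most exceed chance accuracy
    v_hat = np.sqrt(max(lam[-1], 0.0)) * u

acc_hat = (1.0 + v_hat) / 2.0                # estimated accuracies, no labels used

# combine into a meta-learner: vote weighted by estimated log-odds
ens = np.sign(Z @ np.log(acc_hat / (1.0 - acc_hat)))
ens_acc = np.mean(ens == y)
```

The weighted vote at the end typically beats the best single classifier, since even the weaker classifiers contribute independent evidence once their estimated reliabilities set the weights.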
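The tensor-eigenvector notion from the second part can be sketched with the standard tensor power iteration: for a symmetric third-order tensor admitting an orthogonal rank-one decomposition, repeatedly applying the map x -> T(I, x, x) and normalizing converges to one of the components, which is a (robust) tensor eigenvector. This is a generic illustration of the concept, not the talk's specific derivation; the tensor and weights below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q[:, :k]                    # orthonormal components a_1..a_k (synthetic)
w = np.array([3.0, 2.0, 1.0])   # positive component weights

# symmetric tensor T = sum_k w_k * (a_k outer a_k outer a_k)
T = np.einsum('k,ik,jk,lk->ijl', w, A, A, A)

def tensor_power(T, iters=50, seed=0):
    """Power iteration x <- T(I, x, x) / ||T(I, x, x)|| on a symmetric 3-tensor."""
    r = np.random.default_rng(seed)
    x = r.standard_normal(T.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = np.einsum('ijl,j,l->i', T, x, x)     # contract T against x twice
        x /= np.linalg.norm(x)
    lam = np.einsum('ijl,i,j,l->', T, x, x, x)   # eigenvalue T(x, x, x)
    return lam, x

lam, x = tensor_power(T)   # converges to some component a_j with lam = w_j
```

Unlike the matrix case, the components here need not correspond to extreme eigenvalues; which a_j the iteration finds depends on the random start, and deflation (subtracting lam * x outer x outer x) recovers the rest one at a time.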