Abstract: Many scientific problems involve invariant structures, in which the function to be learned depends on a much lower-dimensional set of features than the data itself. Incorporating these invariances into a parametric model can significantly reduce model complexity and greatly reduce the number of labeled examples required to estimate the parameters. We demonstrate this benefit in two settings. The first concerns ReLU networks, and the network size and number of samples required to learn certain functions and classification regions. Here, we assume that the target function has built-in invariances, namely that it depends only on the projection onto a very low-dimensional, function-defined manifold (whose dimension may be significantly smaller than even the intrinsic dimension of the data). We use this manifold variant of a single- or multi-index model to establish network complexity bounds and ERM rates that beat even those governed by the intrinsic dimension of the data. A corollary of this result is intrinsic rates for a manifold-plus-noise data model that do not require the noise distribution to decay exponentially; we also discuss implications for two-sample testing and statistical distances. The second setting concerns linearized optimal transport (LOT) and its use in building supervised classifiers on distributions. Here, we construct invariances to families of group actions (e.g., shifts and scalings of a fixed distribution) and show that LOT can learn a classifier on group orbits using a simple linear separator. We demonstrate the benefit of this on MNIST by constructing robust classifiers from only a small number of labeled examples. This talk covers joint work with Timo Klock, Xiuyuan Cheng, and Caroline Moosmueller.
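As a rough illustration of the LOT idea, and not the construction used in the talk, the sketch below works in one dimension, where the transport map from a Uniform(0,1) reference to a measure is simply its quantile function. Shifting a measure adds a constant to this embedding, so the shift orbits of two base distributions become parallel affine sets that a single hyperplane can separate. The base distributions (N(0,1) and N(0,4)), the quantile grid, the hand-picked separator `w`, and the helper names are all assumptions made for this example; in practice the separator would be fit from labeled data.

```python
from statistics import NormalDist

import numpy as np


def lot_embedding_1d(samples, levels):
    """Empirical quantile function on a fixed grid.

    In 1D this is the optimal transport map from a Uniform(0,1)
    reference to the empirical measure (an assumed reference choice).
    """
    return np.quantile(samples, levels)


rng = np.random.default_rng(0)
levels = (np.arange(32) + 0.5) / 32  # quantile grid in (0, 1)
z = np.array([NormalDist().inv_cdf(p) for p in levels])  # N(0,1) quantiles


def sample_orbit(scale, n_measures=50, n_points=2000):
    """Embed randomly shifted copies of N(0, scale^2), each seen through samples."""
    shifts = rng.uniform(-5.0, 5.0, size=n_measures)
    return np.stack([
        lot_embedding_1d(scale * rng.standard_normal(n_points) + s, levels)
        for s in shifts
    ])


X0 = sample_orbit(1.0)  # class 0: shift orbit of N(0, 1)
X1 = sample_orbit(2.0)  # class 1: shift orbit of N(0, 4)

# Hand-picked linear separator: w sums to zero, so a shift (a constant
# added to the embedding) does not change the score. The score is then
# roughly scale * (w @ w), and the threshold sits midway between classes.
w = z - z.mean()
threshold = 1.5 * (w @ w)

pred0 = X0 @ w > threshold  # should be all False
pred1 = X1 @ w > threshold  # should be all True
accuracy = (np.sum(~pred0) + np.sum(pred1)) / (len(X0) + len(X1))
print(f"linear-separator accuracy on shift orbits: {accuracy:.2f}")
```

The same shifts are not linearly separable in a naive feature such as the sample mean, which varies along the orbit; the embedding helps because the group action becomes a translation in a fixed direction.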