Over the past few decades, an increase in the availability of data has resulted in focus on algorithms that can make sense of large, complex datasets. Unfortunately, these algorithms often fail when the data doesn't satisfy the modeling assumptions the algorithm designer has in mind. Another problem with modern machine learning algorithms is the resources they require to be trained. The carbon footprint of AI has become a serious problem today, with the resources required to train them doubling every few months.
My work focuses on developing principled algorithms which are (1) efficient and (2) robust to input misspecification.
To do this, I exploit classical as well as modern observations about structure in the data as well as underlying signals (such as being sparse, or being representable by modern generative models).
Outlier-robust Regression.
Can we recover a high dimensional linear function or univariate polynomial function from a constant fraction of the samples being arbitrarily corrupted? What about half? more than half? We explore these problems and either show that these are possible or demonstrate lower bounds.
Learning one-layer neural networks.
We show that even for the Gaussian distribution, it is hard to agnostically learn a one-layer neural network. We also show that this is possible if we weaken the objective.
Inverse problems under generative model assumptions.
Can we sample-efficiently recover signals which are sampled from a 'natural-looking' distribution from their linear measurements? We demonstrate an instance optimal algorithm for this problem when the measurements are drawn from the output of a Generative Adversarial Network (GAN).