Learning and Inference: a view from theoretical computer science | Anindya De
Machine learning and high-dimensional inference are a wellspring of fundamental algorithmic challenges in data science. In this talk, I will discuss two strands of work in this line of research. (i) In the first part, I will talk about ""estimation of fit"" type problems -- the high level goal here is to understand how well the best model (from a class of models) fits the target data. My focus will be on ""sparse models"", a standard assumption in machine learning to deal with data deficiency. In this setting, we design algorithms for estimating fit where the complexity scales just in the sparsity parameter and is independent of the ambient dimension. (ii) In the second part, I will discuss ""noisy reconstruction problems"" where the algorithm gets access to the target object through noisy samples. This formulation captures many well studied problems in machine learning. We give a new general algorithmic technique for such problems based on Fourier analysis which we refer to as "Fourier stability" and use this to design new, state of the art algorithms for "population recovery" and "trace reconstruction" -- two basic problems in unsupervised learning which have received significant attention in theoretical computer science. Our algorithms leverage methods and techniques from Boolean function analysis -- a set of tools at the intersection of analysis and combinatorics -- which has been very influential in complexity theory, especially in areas like PCPs and hardness of approximation. Synergistically, the development of this new algorithmic toolkit for problems in learning and inference has also led to discovery of results of intrinsic interest to probability theory and analysis, some of which I will briefly survey.
