Abstract: Recent advances in machine learning and artificial intelligence have relied on fitting highly overparameterized models, notably deep neural networks, to observed data. In such settings, the number of model parameters far exceeds the number of data samples, which results in a continuum of models with near-zero training error. Understanding which of these models generalize well and which do not is the central open question in deep learning. Recent empirical evidence suggests one mechanism for generalization: the shape of the training loss around a local minimizer appears to strongly affect the model’s performance. In particular, flat minima (those around which the loss grows slowly) appear to generalize well. Clarifying this phenomenon can shed new light on generalization in deep learning, which still largely remains a mystery.
I will describe our recent work that takes a step towards this goal by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single-hidden-layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. These results suggest (i) a theoretical basis for favoring methods that bias iterates towards flat solutions and (ii) that the Hessian trace can serve as a good regularizer for some learning tasks. We end by discussing how depth affects the generalization properties of flat solutions; surprisingly, greater depth is not always beneficial.
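As a toy illustration (our own, not taken from the work described), consider the smallest overparameterized factorization: fitting the scalar target 1 by a product u·v, with loss f(u, v) = 1/2 (uv − 1)². Every pair (u, 1/u) with u ≠ 0 is a zero-loss minimizer, and a hand computation gives the Hessian [[v², 2uv − 1], [2uv − 1, u²]], whose trace is u² + v². The sketch below assumes this toy model; the names `loss`, `hessian_trace`, and `candidates` are hypothetical. It scans a few minimizers and picks the one with the smallest Hessian trace.

```python
def loss(u: float, v: float) -> float:
    # Toy overparameterized "matrix" sensing: fit the target 1 by the product u*v.
    return 0.5 * (u * v - 1.0) ** 2


def hessian_trace(u: float, v: float) -> float:
    # Hessian of loss is [[v**2, 2*u*v - 1], [2*u*v - 1, u**2]], so its trace is u**2 + v**2.
    return u ** 2 + v ** 2


# Every pair (u, 1/u) with u != 0 is a global minimizer (zero training loss).
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
for u in candidates:
    v = 1.0 / u
    print(f"u={u:4.2f}  v={v:4.2f}  loss={loss(u, v):.1e}  trace={hessian_trace(u, v):.4f}")

# Flatness, measured by the Hessian trace, singles out one minimizer from the continuum.
flattest = min(candidates, key=lambda u: hessian_trace(u, 1.0 / u))
print("flattest minimizer among candidates: u =", flattest)
```

On the curve uv = 1 the trace u² + 1/u² is minimized at u = ±1, so the balanced factorization u = v = 1 (trace 2) is the flattest. The matrix settings in the talk are far richer; this only shows how a Hessian-trace notion of flatness can select a single minimizer among many with zero training error.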
Bio: Dmitriy Drusvyatskiy received his PhD from the Operations Research and Information Engineering department at Cornell University in 2013, followed by a postdoctoral appointment in the Combinatorics and Optimization department at Waterloo, 2013-2014. He joined the Mathematics department at the University of Washington as an Assistant Professor in 2014, and was promoted to Associate Professor in 2019 and to Full Professor in 2022. Dmitriy’s research broadly focuses on designing and analyzing algorithms for large-scale optimization problems, primarily motivated by applications in data science. Dmitriy has received a number of awards, including the Air Force Office of Scientific Research (AFOSR) Young Investigator Program (YIP) Award, NSF CAREER, SIAG/OPT Best Paper Prize 2023, Paul Tseng Faculty Fellowship 2022-2026, INFORMS Optimization Society Young Researcher Prize 2019, and finalist citations for the Tucker Prize 2015 and the Young Researcher Best Paper Prize at ICCOPT 2019. Dmitriy is currently a co-PI of the NSF-funded Transdisciplinary Research in Principles of Data Science (TRIPODS) institute at the University of Washington.