
Event Series: Seminar Series

Flat minima and generalization in deep learning: a case study in low rank matrix recovery

Halıcıoğlu Data Science Institute Building, Room 123, 3234 Matthews Lane, La Jolla, CA, United States

Abstract: Recent advances in machine learning and artificial intelligence have relied on fitting highly overparameterized models, notably deep neural networks, to observed data. In such settings, the number of parameters of the model is much greater than the number of data samples, thereby resulting in a continuum of models with near-zero training error. Understanding which of these models generalize well and which do not is the central open question in deep learning. Recent empirical evidence suggests one mechanism for generalization: the shape of the training loss around a local minimizer seems to strongly impact the model’s performance. In particular, flat minima -- those around which the loss grows slowly -- appear to generalize well. Clarifying this phenomenon can shed new light on generalization in deep learning, which still largely remains a mystery.
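To make "the loss grows slowly" concrete: a standard quantitative proxy for flatness is the trace of the Hessian of the training loss at a minimizer, which can be estimated without ever forming the Hessian. The sketch below is a minimal NumPy illustration, not code from the speaker; the function name, the finite-difference Hessian-vector products, and the toy quadratic loss are all assumptions made for the example.

```python
import numpy as np

def hutchinson_hessian_trace(grad_fn, w, n_probes=200, eps=1e-4, seed=0):
    """Estimate trace of the Hessian of the loss at w (a common flatness proxy)
    via Hutchinson's estimator: E[v^T H v] = trace(H) for Rademacher probes v.
    Hessian-vector products are approximated by finite differences of grad_fn."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=w.shape)                        # Rademacher probe
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)   # approx H @ v
        total += v @ hv                                                  # v^T H v
    return total / n_probes

# Sanity check on the quadratic loss 0.5 * w^T H w with known trace(H) = 6.
H = np.diag([1.0, 2.0, 3.0])
grad_fn = lambda w: H @ w
print(hutchinson_hessian_trace(grad_fn, np.zeros(3)))  # close to 6.0
```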

I will describe our recent work that takes a step towards this goal by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single-hidden-layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. These results suggest (i) a theoretical basis for favoring methods that bias iterates towards flat solutions and (ii) that the trace of the Hessian can serve as a useful regularizer for some learning tasks. We end by discussing the impact of depth on the generalization properties of flat solutions, which, surprisingly, is not always beneficial.
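For readers who want to see the simplest such problem in code, the following is a rough sketch of the overparameterized matrix-sensing setup mentioned above: a rank-r ground truth M* = U* U*^T observed through random linear measurements and fitted with a factor U of width k > r, so that many near-zero-loss factorizations exist. The dimensions, the symmetric Gaussian measurements, plain gradient descent, and the step size are illustrative assumptions, not the speaker's experiments or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k, m = 10, 2, 10, 40   # ambient dim, true rank, factor width (k > r), #measurements

# Rank-r ground truth M* = U* U*^T observed via symmetric Gaussian measurements.
U_star = rng.normal(size=(d, r))
M_star = U_star @ U_star.T
A = rng.normal(size=(m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('mij,ij->m', A, M_star)

def loss_grad(U):
    """L(U) = (1/2m) * sum_i (<A_i, U U^T> - y_i)^2 and its gradient in U."""
    resid = np.einsum('mij,ij->m', A, U @ U.T) - y
    loss = 0.5 * np.mean(resid ** 2)
    grad = (2.0 / m) * np.einsum('i,ijk,kl->jl', resid, A, U)
    return loss, grad

# Plain gradient descent from a small random initialization.  Because k > r,
# the training loss admits a continuum of (near-)zero-loss factorizations;
# a flatness measure is one criterion for distinguishing among them.
U = 0.01 * rng.normal(size=(d, k))
for _ in range(5000):
    loss, grad = loss_grad(U)
    U -= 0.005 * grad
print("train loss:", loss)
print("relative recovery error:", np.linalg.norm(U @ U.T - M_star) / np.linalg.norm(M_star))
```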