Speaker: Tianhao Wang
Date & Time: Monday, Nov 10th, 2pm
Location: HDSI Multipurpose Room 123, 1st floor
Title: Adaptive Optimizers: From Structured Preconditioners to Adaptive Geometry
Abstract: Adaptive optimizers such as Adam and Shampoo are workhorses of modern machine learning, enabling efficient training of large-scale models across architectures and domains. In this talk, we will present a unified framework for adaptive optimizers with structured preconditioners, encompassing a variety of existing methods and introducing new ones. Our analysis reveals the fundamental interplay between preconditioner structures and loss geometries, highlighting in particular that more adaptivity is not always helpful. Furthermore, the dominance of adaptive methods has recently been challenged by the surprising effectiveness of simpler normalized steepest descent (NSD)–type methods such as Muon, while a consensus has emerged that both families of methods succeed by exploiting the non-Euclidean geometry of the loss landscape. Building on the proposed framework, we show that the convergence of adaptive optimizers is governed by a notion of adaptive smoothness, which contrasts with the standard smoothness assumption leveraged by NSD. In addition, although adaptive smoothness is a stronger condition, it enables acceleration via Nesterov momentum, which cannot be achieved under the standard smoothness assumption in non-Euclidean settings. Finally, we develop a notion of adaptive gradient variance that parallels adaptive smoothness and yields qualitatively improved guarantees compared to those based on standard gradient variance.
Speaker Bio: Tianhao Wang is an Assistant Professor at the Halıcıoğlu Data Science Institute, University of California, San Diego. Prior to UCSD, he was a Research Assistant Professor at Toyota Technological Institute at Chicago. He received his PhD from the Department of Statistics and Data Science at Yale University in 2024. His research focuses on theoretical foundations at the intersection of deep learning, optimization, and statistics.
More info is available on Professor Wang’s website: https://tiiao.github.io/