Some new results for streaming principal component analysis

October 2, 2023 @ 2:00 pm - 3:00 pm

Abstract: While streaming PCA (also known as Oja’s algorithm) was proposed about four decades ago and has roots going back to 1949, optimal convergence rates for it have been established only in the last decade. However, we are not aware of any available distributional guarantees, which can help provide confidence intervals on the quality of the solution. In this talk, I will present the problem of quantifying uncertainty for the estimation error of the leading eigenvector using Oja’s algorithm for streaming PCA, where the data are generated IID from some unknown distribution. Combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a distributional approximation result for the error between the population eigenvector and the output of Oja’s algorithm. We also propose an online multiplier bootstrap algorithm and establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability. While there are optimal rates for the streaming PCA problem, they typically apply to the IID setting, whereas in many applications, such as distributed optimization, the data are generated from a Markov chain and the goal is to infer parameters of the limiting stationary distribution. If time permits, I will also present our near-optimal finite-sample guarantees, which remove the logarithmic dependence on the sample size incurred in previous work, where the Markovian data are downsampled to get a nearly independent data stream.
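
For readers unfamiliar with the method, the following is a minimal sketch of the standard Oja update for the leading eigenvector, not the speaker's implementation. The step-size schedule, the synthetic Gaussian stream, the function names, and the sin-squared error measure are all assumptions chosen for illustration. The bootstrap procedure from the talk is not shown.

```python
import math
import random


def normalize(v):
    """Scale a vector (list of floats) to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def oja_leading_eigenvector(stream, dim, step=lambda t: 1.0 / (t + 10), seed=0):
    """Single pass of Oja's update: w <- normalize(w + eta_t * (x . w) * x).

    `step` is a hypothetical decaying schedule picked for this toy example,
    not a tuned or theoretically prescribed choice.
    """
    rng = random.Random(seed)
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for t, x in enumerate(stream):
        proj = sum(xi * wi for xi, wi in zip(x, w))  # scalar x^T w
        eta = step(t)
        w = normalize([wi + eta * proj * xi for wi, xi in zip(w, x)])
    return w


def synthetic_stream(n, dim, top_variance=4.0, seed=1):
    """Illustrative IID stream with covariance diag(top_variance, 1, ..., 1),
    so the population leading eigenvector is the first coordinate axis."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] *= math.sqrt(top_variance)
        yield x


w_hat = oja_leading_eigenvector(synthetic_stream(20000, 5), dim=5)

# sin^2 of the angle between w_hat and the true eigenvector e_1
# (one common way to measure the estimation error; assumed here).
sin2_error = 1.0 - w_hat[0] ** 2
print(sin2_error)
```

Each data point is touched once and only the current vector `w_hat` is stored, which is what makes the procedure suitable for streaming data.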

Bio: Purnamrita Sarkar is an associate professor of Statistics at the University of Texas at Austin. Their interests lie at the intersection of asymptotic statistics, scalable algorithms, and networks, and more recently in uncertainty estimation for streaming algorithms and resampling methods for networks. Dr. Sarkar is affiliated with the AI institute and EnCORE: Institute for Emerging CORE Methods of Data Science. They were a postdoctoral scholar at the University of California, Berkeley, working on asymptotic theory for network models and the nonparametric bootstrap for big data. Dr. Sarkar earned their PhD from the Machine Learning Department at Carnegie Mellon University.
 

Details

Organizer

Other

Format: Hybrid
Speaker: Purnamrita Sarkar

Venue

  • 3234 Matthews Ln
    La Jolla, CA 92093 United States