This event has passed.

Some new results for streaming principal component analysis

Start: 2023-10-02T14:00:00-07:00
End: 2023-10-02T15:00:00-07:00

October 2, 2023 @ 2:00 pm - 3:00 pm

Abstract: Streaming PCA (also known as Oja’s algorithm) was proposed about four decades ago and has roots going back to 1949, yet optimal convergence rates for it have been established only in the last decade. However, we are not aware of any available distributional guarantees, which would make it possible to construct confidence intervals for the quality of the solution. In this talk, I will present the problem of quantifying uncertainty for the estimation error of the leading eigenvector computed by Oja’s algorithm for streaming PCA, where the data are generated IID from some unknown distribution. Combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a distributional approximation result for the error between the population eigenvector and the output of Oja’s algorithm. We also propose an online multiplier bootstrap algorithm and establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability. While there are optimal rates for the streaming PCA problem, they typically apply to the IID setting, whereas in many applications, such as distributed optimization, the data are generated from a Markov chain and the goal is to infer parameters of the limiting stationary distribution. If time permits, I will also present our near-optimal finite-sample guarantees, which remove the logarithmic dependence on the sample size incurred in previous work, where the Markovian data are downsampled to obtain a nearly independent data stream.
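For readers unfamiliar with the update the abstract refers to, the following is a minimal sketch of Oja's algorithm for the leading eigenvector, not the speaker's implementation. The function name `oja_leading_eigenvector`, the constant `step_size`, the random starting vector, and the synthetic Gaussian data are all assumptions made for illustration; the sketch does not cover the distributional results or the online multiplier bootstrap described in the talk.

```python
import numpy as np


def oja_leading_eigenvector(stream, dim, step_size=0.01, seed=0):
    """Estimate the leading eigenvector of E[x x^T] in one pass (Oja's update).

    Assumptions (not from the talk): a constant step size and a random
    unit-norm starting vector.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        # Stochastic power-iteration step: w <- w + eta * x * (x^T w)
        w = w + step_size * x * (x @ w)
        # Renormalise so the iterate stays on the unit sphere
        w /= np.linalg.norm(w)
    return w


if __name__ == "__main__":
    # Synthetic IID Gaussian data whose top eigenvector is the first axis
    data_rng = np.random.default_rng(1)
    scales = np.sqrt(np.array([10.0, 1.0, 1.0, 1.0, 1.0]))
    samples = data_rng.standard_normal((20000, 5)) * scales
    w_hat = oja_leading_eigenvector(samples, dim=5)
    # sin^2 of the angle between w_hat and the true eigenvector e_1
    print("sin^2 error:", 1.0 - w_hat[0] ** 2)
```

Each sample is touched once and only the current iterate is stored, which is what makes the method suitable for streaming data. The sin^2 error printed in the demo is the kind of quantity whose uncertainty the talk aims to quantify.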

Bio: Purnamrita Sarkar is an associate professor of Statistics at the University of Texas at Austin. Their interests are at the intersection of asymptotic statistics, scalable algorithms, and networks, and more recently include uncertainty estimation for streaming algorithms and resampling methods for networks. Dr. Sarkar is affiliated with the AI institute and EnCORE: Institute for Emerging CORE Methods of Data Science. They were a postdoctoral scholar at the University of California, Berkeley, working on asymptotic theory for network models and the nonparametric bootstrap for big data. Dr. Sarkar earned their PhD from the Machine Learning Department at Carnegie Mellon University.

Details

Date: October 2, 2023
Time: 2:00 pm - 3:00 pm

Series: Special Seminar Series

Event Category: Seminar

Organizer

HDSI General

Other

Format: Hybrid
Speaker: Purnamrita Sarkar

Venue

3234 Matthews Ln
La Jolla, CA 92093 United States
