
Event Series: Special Seminar Series

Some new results for streaming principal component analysis

HDSI Building, Room 123, 3234 Matthews Ln, La Jolla, CA, United States

Abstract: While streaming PCA (also known as Oja's algorithm) was proposed about four decades ago and has roots going back to 1949, optimal convergence rates were established only in the last decade. However, we are not aware of any available distributional guarantees, which would help provide confidence intervals on the quality of the solution. In this talk, I will present the problem of quantifying uncertainty in the estimation error of the leading eigenvector computed by Oja's algorithm for streaming PCA, where the data are generated IID from some unknown distribution.

Combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and on concentration of matrix products, we establish a distributional approximation result for the error between the population eigenvector and the output of Oja's algorithm. We also propose an online multiplier bootstrap algorithm and establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability.

While optimal rates exist for the streaming PCA problem, they typically apply to the IID setting, whereas in many applications, such as distributed optimization, the data are generated from a Markov chain and the goal is to infer parameters of the limiting stationary distribution. If time permits, I will also present our near-optimal finite-sample guarantees, which remove the logarithmic dependence on the sample size in previous work, where Markovian data is downsampled to obtain a nearly independent data stream.
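For readers unfamiliar with the method, the classical Oja update maintains a unit-norm estimate of the leading eigenvector and nudges it with each incoming sample. Below is a minimal sketch of that update; the function name, the `eta0 / t` step-size schedule, and the synthetic-data setup are illustrative choices, not the speaker's exact formulation:

```python
import numpy as np

def oja_leading_eigenvector(stream, dim, eta0=0.5, seed=0):
    """Estimate the leading eigenvector of the data covariance from a
    stream of IID samples using the classical Oja update (a sketch)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)          # random unit-norm initialization
    for t, x in enumerate(stream, start=1):
        eta = eta0 / t              # decaying step size (one common choice)
        w += eta * x * (x @ w)      # stochastic power-iteration step
        w /= np.linalg.norm(w)      # project back onto the unit sphere
    return w
```

The per-step normalization keeps the iterate on the unit sphere, and the `1/t` decay matches the step-size schedules used in the classical convergence analyses the abstract alludes to; the distributional and bootstrap results discussed in the talk concern the error of exactly this kind of iterate.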