Detection and recovery of low-rank signals under heteroskedastic noise
A fundamental task in data analysis is to detect and recover a low-rank signal in a noisy data matrix. Typically, this task is addressed by inspecting and manipulating the spectrum of the observed data, e.g., by thresholding the singular values of the data matrix at a certain critical level. This approach is well established for homoskedastic noise, where the noise variance is identical across the entries. However, in numerous applications, such as single-cell RNA sequencing (scRNA-seq), the noise can be heteroskedastic, with characteristics that vary considerably across the rows and columns of the data. In such scenarios, the noise spectrum can differ significantly from the homoskedastic case, posing various challenges for signal detection and recovery. In this talk, I will present a procedure for standardizing the noise spectrum by judiciously scaling the rows and columns of the data. Importantly, this procedure can provably enforce the standard spectral behavior of homoskedastic noise, namely the Marchenko-Pastur law. I will describe methods for estimating the required scaling factors directly from the observed data with suitable theoretical justification, and demonstrate the advantages of the proposed approach for signal detection and recovery in simulations and on real scRNA-seq data.
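
The effect of row and column scaling on the noise spectrum can be illustrated with a small simulation. The following is a minimal sketch, not the estimation procedure from the talk: it assumes a hypothetical rank-one variance profile Var(E_ij) = a_i * b_j whose factors a and b are known exactly, whereas the talk concerns estimating the scaling factors from the data. The names mp_upper_edge, scale_rows_cols, and top_singular_value are illustrative. The sketch compares the largest singular value of pure noise, before and after scaling, with the Marchenko-Pastur upper edge 1 + sqrt(m/n).

```python
import numpy as np


def mp_upper_edge(m, n):
    """Marchenko-Pastur upper edge for the singular values of X / sqrt(n),
    where X is m x n with i.i.d. zero-mean, unit-variance entries."""
    return 1.0 + np.sqrt(m / n)


def scale_rows_cols(Y, a, b):
    """Divide row i by sqrt(a[i]) and column j by sqrt(b[j])."""
    return Y / np.sqrt(a)[:, None] / np.sqrt(b)[None, :]


def top_singular_value(X):
    """Largest singular value of X / sqrt(number of columns)."""
    n = X.shape[1]
    return np.linalg.svd(X / np.sqrt(n), compute_uv=False)[0]


rng = np.random.default_rng(0)
m, n = 200, 400

# Assumed (hypothetical) variance profile, normalized to mean variance 1.
a = np.geomspace(0.05, 20.0, m)
a /= a.mean()
b = np.geomspace(0.2, 5.0, n)
b /= b.mean()

# Heteroskedastic Gaussian noise with Var(E_ij) = a_i * b_j, no signal.
noise = np.sqrt(np.outer(a, b)) * rng.standard_normal((m, n))

edge = mp_upper_edge(m, n)
top_raw = top_singular_value(noise)
top_scaled = top_singular_value(scale_rows_cols(noise, a, b))

print(f"MP upper edge:              {edge:.3f}")
print(f"top singular value, raw:    {top_raw:.3f}")
print(f"top singular value, scaled: {top_scaled:.3f}")
```

In this setup the unscaled noise alone produces singular values well above the Marchenko-Pastur edge, so a threshold calibrated for homoskedastic noise would report spurious signal components. After scaling, the largest singular value lies close to the edge.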