
Egemen Kolemen | Seminar on Fusion Energy and AI/ML Joint Seminar: CSE, HDSI, MAE, SDSC

Recent advances in computing hardware (FPGAs, distributed parallel computing) and numerical methods (machine learning algorithms, automatic differentiation) create new possibilities for fusion power plant optimization and control. In this talk, I will discuss some of the recent accomplishments of the Plasma Control Group at Princeton that take advantage of these new capabilities.

Event Series: Distinguished Lecturer Series

Steampunk Data Science

How did scientists make sense of data before statistics and computing? This talk will explore this question by focusing on the discovery of vitamins, which occurred in the early 20th century just before the advent of modern statistical methodology. I will describe the varied practices in experimentation and reporting and highlight the sorts of insights required to uncover what "works." Through this discussion, I will draw connections to contemporary data science tools to illustrate their pros and cons in facilitating discovery.

In Silico: Simulators, Emulators and the Human Brain Project

The European Human Brain Project, a flagship project of the European Union, recently ended after 10 years of research and development. The goals of the HBP were to (1) explore the complexity of the human brain in space and time; (2) transfer the knowledge broadly; (3) provide research infrastructure for neuroscience; and (4) create a community of researchers. One of the major challenges is to model neural activity, from micro- to macro-scale, in a way that enables simulation of the human brain. This leads to so-called in silico experiments, which will be used “to validate models, and to perform investigations that are not possible in the laboratory”. I will present examples of such experiments and discuss how they relate to, and can benefit from, statistical research on the design and analysis of computer experiments.

My students, colleagues, and I have been working on the potential advantages of replacing a slow, expensive simulator with a much faster and cheaper statistical emulator. Emulators are empirical replicas trained on data generated with the simulator. We have often used Gaussian process regression for this purpose, but in some applications other methods (random forests, polynomial regression) proved more effective. Emulators can be especially useful when the simulator runs are matched to data in the context of statistical inference. I will discuss the modeling options and present examples, including simulation of neural basket cells, calcium-induced neural reactions, and stochastic simulators like the Hodgkin-Huxley model.
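
As a toy illustration of the emulation idea (not the HBP models themselves), the sketch below fits a Gaussian process emulator to a handful of runs of a stand-in simulator; the simulator function, parameter range, and scikit-learn kernel choices are all assumptions for illustration.

```python
# Minimal sketch: fit a Gaussian process emulator to a few runs of an expensive
# simulator. The toy simulator, parameter range, and kernel are illustrative
# assumptions standing in for the slow neural simulations described above.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_simulator(theta):
    # Stand-in for a slow simulation evaluated at parameter theta.
    return np.sin(3.0 * theta) + 0.5 * theta ** 2

rng = np.random.default_rng(0)
train_theta = rng.uniform(-2.0, 2.0, size=(15, 1))     # a handful of costly runs
train_y = expensive_simulator(train_theta).ravel()

emulator = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(length_scale=1.0),
    normalize_y=True,
)
emulator.fit(train_theta, train_y)

# Cheap predictions, with uncertainty, anywhere in the parameter range.
test_theta = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
mean, std = emulator.predict(test_theta, return_std=True)
```

Once trained, the emulator returns predictions and uncertainties at parameter values that were never simulated, which is what makes it practical inside inference loops that would otherwise require thousands of simulator runs.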

Event Series: Seminar Series

Learning and Inference: a view from theoretical computer science | Anindya De

Machine learning and high-dimensional inference are a wellspring of fundamental algorithmic challenges in data science. In this talk, I will discuss two strands of work in this line of research. (i) In the first part, I will talk about "estimation of fit" problems -- the high-level goal here is to understand how well the best model (from a class of models) fits the target data. My focus will be on "sparse models", a standard assumption in machine learning for dealing with data deficiency. In this setting, we design algorithms for estimating fit whose complexity scales only with the sparsity parameter and is independent of the ambient dimension.

(ii) In the second part, I will discuss "noisy reconstruction problems", where the algorithm gets access to the target object through noisy samples. This formulation captures many well-studied problems in machine learning. We give a new general algorithmic technique for such problems, based on Fourier analysis, which we refer to as "Fourier stability", and use it to design new, state-of-the-art algorithms for "population recovery" and "trace reconstruction" -- two basic problems in unsupervised learning that have received significant attention in theoretical computer science. Our algorithms leverage methods and techniques from Boolean function analysis -- a set of tools at the intersection of analysis and combinatorics that has been very influential in complexity theory, especially in areas like PCPs and hardness of approximation. Synergistically, the development of this new algorithmic toolkit for problems in learning and inference has also led to the discovery of results of intrinsic interest to probability theory and analysis, some of which I will briefly survey.
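
To make the "noisy samples" formulation concrete, here is a minimal sketch of the trace reconstruction setup only: an unknown binary string observed through a deletion channel. The string length, deletion probability, and baseline statistic are illustrative choices, and the code does not implement the Fourier-based algorithms from the talk.

```python
# Minimal sketch of the trace reconstruction setup: the learner only sees traces
# of a hidden binary string produced by a deletion channel. String length,
# deletion probability, and the baseline statistic are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=50)       # hidden string the learner wants to recover
deletion_prob = 0.2

def draw_trace(x, q, rng):
    keep = rng.random(len(x)) >= q    # each bit survives independently with prob 1 - q
    return x[keep]

traces = [draw_trace(x, deletion_prob, rng) for _ in range(1000)]

# A very weak, position-agnostic baseline: the fraction of ones is preserved in
# expectation, so it can be estimated from traces even though positions are lost.
print("fraction of ones in x:         ", x.mean())
print("fraction of ones across traces:", np.mean([t.mean() for t in traces]))
```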

The Uneasy Relation Between Deep Learning and Statistics

Deep learning uses the language and tools of statistics and classical machine learning, including empirical and population losses and optimizing a hypothesis on a training set. But it uses these tools in regimes where they should not be applicable: the optimization task is non-convex, models are often large enough to overfit, and the training and deployment tasks can differ radically. In this talk, I will survey the relation between deep learning and statistics. In particular, we will discuss recent works supporting the emerging intuition that deep learning is closer in some respects to human learning than to classical statistics. Rather than estimating quantities from samples, deep neural nets develop broadly applicable representations and skills through their training. The talk will not assume background knowledge in artificial intelligence or deep learning.

Enabling Equity Assessments of Government Programs: Law, Policy, and Methods

Governments have increasingly mandated equity assessments to identify potential demographic disparities in programs and services. This talk will consider some of the legal, policy, and methodological challenges in carrying out such mandates, particularly in the public-sector context. First, the talk will discuss emerging tensions between informational privacy and equity assessments, as illustrated by the data minimization principle under the Privacy Act of 1974. Second, it will discuss the range of methods available to improve disparity assessments, including imputation, record linkage, surveys, and form collection. Third, it will illustrate these tensions with a case study of an equity assessment of racial disparities in IRS tax audits.

Event Series: Seminar Series

Improving Technical Communication with End Users about Differential Privacy

Differential privacy (DP) is widely regarded as a gold standard for privacy-preserving computation over users’ data. A key challenge with DP is that its mathematical sophistication makes its privacy guarantees difficult to communicate to users, leaving them uncertain about how and whether they are protected. Despite recent widespread deployment of DP, relatively little is known about what users think of differential privacy and how to effectively communicate the practical privacy guarantees it offers.

This talk will cover a series of recent and ongoing user studies aimed at measuring and improving communication with non-technical end users about differential privacy. The first set explores users' privacy expectations related to differential privacy and measures the efficacy of existing methods for communicating the privacy guarantees of DP systems. We find that users care about the kinds of information leaks against which differential privacy protects and are more willing to share their private information when the risk of these leaks is reduced. Additionally, we find that the ways in which differential privacy is described in the wild set users' privacy expectations haphazardly, which can be misleading depending on the deployment. Motivated by these findings, the second set of user studies develops and evaluates prototype descriptions designed to help end users understand DP guarantees. These descriptions target two important technical details in DP deployments that are often poorly communicated to end users: the privacy parameter epsilon (which governs the level of privacy protection) and the distinction between the local and central models of DP (which governs who can access exact user data). Based on joint work with Gabriel Kaptchuk, Priyanka Nanayakkara, Elissa Redmiles, and Mary Anne Smart, including https://arxiv.org/abs/2110.06452 and https://arxiv.org/abs/2303.00738.
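
For readers unfamiliar with the two technical details mentioned above, the following sketch illustrates them on hypothetical yes/no data: a central-model count released with Laplace noise calibrated by epsilon, and local-model randomized-response reports. The data, epsilon values, and mechanisms are illustrative assumptions and are not drawn from the studies described in the talk.

```python
# Minimal sketch of the two details above: epsilon and the central vs. local models.
# Data, epsilon values, and mechanisms are illustrative assumptions, not the
# deployments or study materials discussed in the talk.
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=1000)        # hypothetical sensitive yes/no answers

def central_count(data, epsilon, rng):
    # Central model: a trusted curator sees the raw data and releases the count
    # plus Laplace noise with scale (sensitivity 1) / epsilon.
    return data.sum() + rng.laplace(scale=1.0 / epsilon)

def local_report(bit, epsilon, rng):
    # Local model: each user randomizes their own answer (randomized response),
    # reporting the truth with probability exp(eps) / (exp(eps) + 1).
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

for eps in (0.1, 1.0, 10.0):                     # smaller epsilon => stronger privacy, more noise
    central = central_count(responses, eps, rng)
    local = sum(local_report(b, eps, rng) for b in responses)
    print(f"epsilon={eps}: true count={responses.sum()}, "
          f"central release={central:.0f}, local reports summed={local}")
```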

Event Series: Seminar Series

Detection and recovery of low-rank signals under heteroskedastic noise

A fundamental task in data analysis is to detect and recover a low-rank signal in a noisy data matrix. Typically, this task is addressed by inspecting and manipulating the spectrum of the observed data, e.g., thresholding the singular values of the data matrix at a certain critical level. This approach is well-established in the case of homoskedastic noise, where the noise variance is identical across the entries. However, in numerous applications, such as single-cell RNA sequencing (scRNA-seq), the noise can be heteroskedastic, where the noise characteristics vary considerably across the rows and columns of the data. In such scenarios, the noise spectrum can differ significantly from the homoskedastic case, posing various challenges for signal detection and recovery. In this talk, I will present a procedure for standardizing the noise spectrum by judiciously scaling the rows and columns of the data. Importantly, this procedure can provably enforce the standard spectral behavior of homoskedastic noise -- the Marchenko-Pastur law. I will describe methods for estimating the required scaling factors directly from the observed data with suitable theoretical justification, and demonstrate the advantages of the proposed approach for signal detection and recovery in simulations and on real scRNA-seq data.
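
As a simplified illustration of the spectral-thresholding step (after standardization has made the noise approximately homoskedastic), the sketch below plants a rank-1 signal in i.i.d. noise and flags singular values above the Marchenko-Pastur bulk edge. The dimensions, signal strength, and noise level are assumed for illustration; the scaling-factor estimation procedure from the talk is not reproduced.

```python
# Toy sketch: plant a rank-1 signal in i.i.d. Gaussian noise and flag singular
# values that exceed the Marchenko-Pastur bulk edge sigma * (sqrt(n) + sqrt(m)).
# Dimensions, signal strength, and sigma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 500, 300, 1.0

# Rank-1 signal with a singular value well above the noise bulk edge.
u = rng.standard_normal((n, 1))
v = rng.standard_normal((1, m))
signal = 3.0 * sigma * np.sqrt(n) * (u / np.linalg.norm(u)) @ (v / np.linalg.norm(v))

Y = signal + sigma * rng.standard_normal((n, m))

svals = np.linalg.svd(Y, compute_uv=False)
bulk_edge = sigma * (np.sqrt(n) + np.sqrt(m))   # approximate largest noise singular value
detected = svals[svals > bulk_edge]
print("singular values above the noise edge:", np.round(detected, 1))
```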