- January 10, 2024
- HDSIComm
Universal Learning for Decision-Making | Moise Blanchard
We provide general-use decision-making algorithms under provably minimal assumptions on the data, using the universal learning framework. Classical learning guarantees typically require two types of assumptions: (1) restrictions on the target policies to be learned and (2) assumptions on the data-generating process. Instead, we show that we can provide consistent algorithms with vanishing regret compared to the best policy in hindsight, (1) irrespective of the optimal policy, known as universal consistency, and (2) well beyond standard i.i.d. or stationary assumptions on the data. We present our results for the classical online regression problem as well as for the contextual bandit problem, where the learner’s rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients’ records or customers’ history, which allows for personalized treatment. Precisely, we give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Surprisingly, for finite action spaces, universally learnable processes are the same for contextual bandits as for the supervised learning setting, suggesting that going from full feedback (supervised learning) to partial feedback (contextual bandits) comes at no extra cost in terms of learnability. We then show that there always exists an algorithm that guarantees universal consistency whenever this is achievable. In particular, such an algorithm is universally consistent under provably minimal assumptions: if it fails to be universally consistent for some context-generating process, then no other algorithm would succeed either. In the case of finite action spaces, this algorithm balances a fine trade-off between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts).
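To make the interaction protocol concrete, the sketch below simulates the contextual bandit loop the abstract describes: contexts arrive, the learner picks one action, observes only that action's reward, and regret is measured against the best action per context in hindsight. The discrete context types, Bernoulli rewards, and epsilon-greedy baseline are illustrative assumptions, not the universally consistent algorithm from the talk.

```python
import numpy as np

# Illustrative contextual bandit protocol (not the speaker's algorithm):
# at each round the learner sees a context, picks one of K actions, and
# observes the reward of that action only (partial feedback).

rng = np.random.default_rng(0)
K, T = 3, 5000                      # actions, rounds
contexts = rng.integers(0, 10, T)   # assumed: contexts come in 10 discrete "types"
true_means = rng.uniform(0, 1, (10, K))  # assumed mean reward per (context, action)

counts = np.zeros((10, K))
means = np.zeros((10, K))
regret = 0.0

for t in range(T):
    x = contexts[t]
    # epsilon-greedy baseline: explore with small probability, otherwise exploit
    if rng.random() < 0.05:
        a = int(rng.integers(K))
    else:
        a = int(np.argmax(means[x]))
    r = float(rng.random() < true_means[x, a])   # Bernoulli reward, seen only for the chosen action
    counts[x, a] += 1
    means[x, a] += (r - means[x, a]) / counts[x, a]
    regret += true_means[x].max() - true_means[x, a]

print(f"average regret after {T} rounds: {regret / T:.3f}")
```

Binning contexts this finely is the "personalization" extreme; the trade-off mentioned at the end of the abstract is between such per-context tailoring and pooling contexts for generalization.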
- January 10, 2024
- HDSIComm
Algorithm Dynamics in Modern Statistical Learning: Universality and Implicit Regularization | Tianhao Wang
Modern statistical learning is characterized by high-dimensional data and over-parameterized models. In this regime, analyzing the dynamics of the algorithms used is challenging but crucial for understanding the performance of the learned models. This talk will present recent results on the dynamics of two pivotal algorithms: Approximate Message Passing (AMP) and Stochastic Gradient Descent (SGD). Specifically, AMP refers to a class of iterative algorithms for solving large-scale statistical problems, whose dynamics asymptotically admit a simple yet exact description known as state evolution. We will demonstrate the universality of AMP’s state evolution over large classes of random matrices and provide illustrative applications of these universality results. Second, for SGD, a workhorse for training deep neural networks, we will introduce a novel mathematical framework for analyzing its implicit regularization, which is essential to SGD’s ability to find solutions with strong generalization performance, particularly in the over-parameterized regime. Our framework offers a general method for characterizing the implicit regularization induced by gradient noise. Finally, in the context of underdetermined linear regression, we will show that both AMP and SGD can provably achieve sparse recovery, yet they do so from markedly different perspectives.
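For readers unfamiliar with AMP, the sketch below shows one standard instance: AMP with a soft-thresholding denoiser for noisy sparse linear regression. The problem sizes, i.i.d. Gaussian design, and threshold rule are illustrative assumptions; the point of the universality results discussed in the talk is that the same state-evolution description persists well beyond the Gaussian ensemble used here.

```python
import numpy as np

def soft(u, theta):
    """Soft-thresholding denoiser eta(u; theta)."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

rng = np.random.default_rng(1)
n, p, k = 500, 1000, 50                       # assumed problem sizes (n samples, p features, k nonzeros)
A = rng.normal(0, 1 / np.sqrt(n), (n, p))     # i.i.d. Gaussian design
x_true = np.zeros(p)
x_true[rng.choice(p, k, replace=False)] = rng.normal(0, 1, k)
y = A @ x_true + 0.01 * rng.normal(size=n)

x, z = np.zeros(p), y.copy()
delta = n / p
for _ in range(30):
    u = x + A.T @ z                           # pseudo-data: behaves like x_true plus Gaussian noise (state evolution)
    theta = 1.5 * np.sqrt(np.mean(z**2))      # assumed threshold rule, proportional to the residual noise level
    x_new = soft(u, theta)
    onsager = np.mean(np.abs(x_new) > 0) / delta   # Onsager correction: average denoiser derivative over delta
    z = y - A @ x_new + onsager * z
    x = x_new

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```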
- January 5, 2024
- HDSIComm
Statistical insight for biomedical data science, with applications to single-cell RNA-sequencing data | Yiqun Chen
My research centers on bringing statistical insight and understanding to the practice of modern data science, and I will cover two projects related to this research vision in this talk.
The first part of the talk is motivated by the practice of testing data-driven hypotheses. In the biomedical sciences, it has become increasingly common to collect massive datasets without a pre-specified research question. In this setting, a data analyst might use the data both to generate a research question and to test the associated null hypothesis. For example, in single-cell RNA-sequencing analyses, researchers often first cluster the cells and then test for differences in the expected gene expression levels between the clusters to quantify up- or down-regulation of genes, annotate known cell types, and identify new cell types. However, this popular practice is invalid from a statistical perspective: once we have used the data to generate hypotheses, standard statistical inference tools are no longer valid. To tackle this problem, I developed a conditional selective inference approach to test for a difference in means between pairs of clusters obtained via k-means clustering. The proposed approach has appropriate statistical guarantees (e.g., selective Type I error control).
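The problem this selective test addresses can be seen in a few lines. The simulation below is an illustration of that problem, not the proposed method: it clusters data drawn from a single homogeneous population and then runs a naive two-sample t-test between the estimated clusters, and because the same data generated the hypothesis, the naive p-values are wildly anti-conservative.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

# "Double dipping": cluster first, then naively test for a difference in means
# between the estimated clusters, even though no true clusters exist.

rng = np.random.default_rng(0)
pvals = []
for _ in range(200):
    X = rng.normal(size=(100, 1))                  # global null: one homogeneous Gaussian population
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    a, b = X[labels == 0, 0], X[labels == 1, 0]    # naive two-sample t-test between the clusters
    pvals.append(stats.ttest_ind(a, b).pvalue)

print("fraction of naive p-values below 0.05:", np.mean(np.array(pvals) < 0.05))
# A valid test would reject about 5% of the time; the naive test rejects essentially always.
```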
In the second part of the talk, I will consider how to leverage large language models (LLMs) such as ChatGPT for biomedical discovery. While significant progress has been made in customizing large language models for biomedical data, these models often require extensive data curation and resource-intensive training. In the context of single-cell RNA-sequencing data, I will show that we can achieve surprisingly competitive results on many downstream tasks via a much simpler alternative: I input textual descriptions of genes into an off-the-shelf LLM, such as ChatGPT, to obtain low-dimensional representations of the genes, or “embeddings,” and then use these embeddings as features in downstream tasks. A similar approach enables LLM-derived embeddings of cells. This work highlights the potential of LLMs to provide meaningful and concise representations of biomedical data, and it also raises a number of challenging statistical questions; addressing these questions requires bringing principled statistical thinking to the practice of modern data science.
- January 5, 2024
- HDSIComm
Inference and Decision-Making amid Social Interactions | Shuangning Li
From social media trends to family dynamics, social interactions shape our daily lives. In this talk, I will present tools I have developed for statistical inference and decision-making in light of these social interactions.
(1) Inference: I will talk about the estimation of causal effects in the presence of interference. In causal inference, the term “interference” refers to a situation where, due to interactions between units, the treatment assigned to one unit affects the observed outcomes of others. I will discuss large-sample asymptotics for treatment effect estimation under network interference where the interference graph is a random draw from a graphon. When targeting the direct effect, we show that popular estimators in our setting are considerably more accurate than existing results suggest. Meanwhile, when targeting the indirect effect, we propose a consistent estimator in a setting where no other consistent estimators are currently available. (A small simulation sketch of this interference setup follows part (2) below.)
(2) Decision-Making: Turning to reinforcement learning amid social interactions, I will focus on a problem inspired by a specific class of mobile health trials involving both target individuals and their care partners. These trials feature two types of interventions: those targeting individuals directly and those aimed at improving the relationship between the individual and their care partner. I will present an online reinforcement learning algorithm designed to personalize the delivery of these interventions. The algorithm’s effectiveness is demonstrated through simulation studies conducted on a realistic test bed, which was constructed using data from a prior mobile health study. The proposed algorithm will be implemented in the ADAPTS HCT clinical trial, which seeks to improve medication adherence among adolescents undergoing allogeneic hematopoietic stem cell transplantation.
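As a point of reference for part (1), here is a minimal simulation sketch of estimation under network interference. The specific graphon, the outcome model with a spillover term, and the difference-in-means estimator of the direct effect are all illustrative assumptions rather than the estimators analyzed in the talk.

```python
import numpy as np

# Minimal sketch of direct-effect estimation under network interference
# (illustrative assumptions throughout).

rng = np.random.default_rng(0)
n = 2000

# Interference graph drawn from an assumed graphon W(u, v) = 0.05 * u * v.
u = rng.uniform(size=n)
P = 0.05 * np.outer(u, u)
G = (rng.uniform(size=(n, n)) < P).astype(float)
G = np.triu(G, 1)
G = G + G.T                                   # symmetric adjacency matrix, no self-loops

# Bernoulli(1/2) treatment; outcomes depend on own treatment and the treated fraction of neighbors.
W = rng.binomial(1, 0.5, n)
deg = np.maximum(G.sum(axis=1), 1)
frac_treated_nbrs = (G @ W) / deg
Y = 1.0 * W + 0.5 * frac_treated_nbrs + rng.normal(0, 1, n)   # assumed direct effect 1.0, spillover 0.5

# Difference-in-means estimate of the direct effect
tau_hat = Y[W == 1].mean() - Y[W == 0].mean()
print(f"difference-in-means estimate of the direct effect: {tau_hat:.3f}")
```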
- March 1, 2023
- Kaleigh O'Merry
Sampling from Graphical Models via Spectral Independence | Zongchen Chen
In many scientific settings, we use a statistical model to describe a high-dimensional distribution over many variables. Such models are often represented as a weighted graph encoding the dependencies between the variables and are known as graphical models. Graphical models arise in a wide variety of fields throughout science and engineering.
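The abstract stops short of the sampling algorithms themselves; for orientation, the sketch below runs Glauber dynamics (a single-site Gibbs sampler) on an Ising model, the kind of Markov chain whose mixing the spectral independence framework is used to analyze. The grid graph, inverse temperature, and step count are illustrative assumptions.

```python
import numpy as np

# Minimal Glauber dynamics (single-site Gibbs sampler) for an Ising model on a 2D grid graph.

rng = np.random.default_rng(0)
side = 20
n = side * side
beta = 0.3                                     # assumed inverse temperature (weak interaction)

def neighbors(i):
    """Grid neighbors of site i."""
    r, c = divmod(i, side)
    out = []
    if r > 0: out.append(i - side)
    if r < side - 1: out.append(i + side)
    if c > 0: out.append(i - 1)
    if c < side - 1: out.append(i + 1)
    return out

nbrs = [neighbors(i) for i in range(n)]
spins = rng.choice([-1, 1], size=n)

for _ in range(100_000):
    i = int(rng.integers(n))                   # pick a uniformly random site
    s = sum(spins[j] for j in nbrs[i])         # sum of neighboring spins
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * s))   # conditional law of spin i given its neighbors
    spins[i] = 1 if rng.random() < p_plus else -1

print("average magnetization:", spins.mean())
```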