BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Halıcıoğlu Data Science Institute - UC San Diego - ECPv6.16.2//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Halıcıoğlu Data Science Institute - UC San Diego
X-ORIGINAL-URL:https://datascience.ucsd.edu
X-WR-CALDESC:Events for Halıcıoğlu Data Science Institute - UC San Diego
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20220313T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20221106T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20230312T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20231105T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240126T110000
DTEND;TZID=America/Los_Angeles:20240126T123000
DTSTAMP:20260530T124121
CREATED:20240110T203505Z
LAST-MODIFIED:20240123T171855Z
UID:10000420-1706266800-1706272200@datascience.ucsd.edu
SUMMARY:Universal Learning for Decision-Making | Moise Blanchard
DESCRIPTION:We provide general-use decision-making algorithms under provably minimal assumptions on the data\, using the universal learning framework. Classically\, learning guarantees typically require two types of assumptions: (1) restrictions on target policies to be learned and (2) assumptions on the data-generating process. Instead\, we show that we can provide consistent algorithms with vanishing regret compared to the best policy in hindsight\, (1) irrespective of the optimal policy\, known as universal consistency\, and (2) well beyond standard i.i.d. or stationary assumptions on the data. We present our results for the classical online regression problem as well as for the contextual bandit problem\, where the learner’s rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available\, e.g.\, patients’ records or customers’ history\, which allows for personalized treatment. Precisely\, we give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Surprisingly\, for finite action spaces\, universally learnable processes are the same for contextual bandits as for the supervised learning setting\, suggesting that going from full feedback (supervised learning) to partial feedback (contextual bandits) came at no extra cost in terms of learnability. We then show that there always exists an algorithm that guarantees universal consistency whenever this is achievable. In particular\, such an algorithm is universally consistent under provably minimal assumptions: if it fails to be universally consistent for some context-generating process\, then no other algorithm would succeed either. In the case of finite action spaces\, this algorithm balances a fine trade-off between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts). \nBio: Moïse Blanchard is a final year PhD student at MIT\, working with Prof. Patrick Jaillet. He obtained his MSc in applied mathematics as valedictorian of Ecole Polytechnique. His research focuses on algorithms for decision-making and statistical learning. His work has been recognized with a best student paper runner-up award at COLT and a best student paper award from the Informs TSL society.
URL:https://datascience.ucsd.edu/event/statistics-moise-blanchard/
LOCATION:3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240124T140000
DTEND;TZID=America/Los_Angeles:20240124T153000
DTSTAMP:20260530T124121
CREATED:20240121T234257Z
LAST-MODIFIED:20240122T001144Z
UID:10000424-1706104800-1706110200@datascience.ucsd.edu
SUMMARY:Statistics | Enric Boix
DESCRIPTION:
URL:https://datascience.ucsd.edu/event/statistics-enric-boix/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 1242\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240122T140000
DTEND;TZID=America/Los_Angeles:20240122T153000
DTSTAMP:20260530T124121
CREATED:20240110T203331Z
LAST-MODIFIED:20240122T000929Z
UID:10000419-1705932000-1705937400@datascience.ucsd.edu
SUMMARY:Algorithm Dynamics in Modern Statistical Learning: Universality and Implicit Regularization | Tianhao Wang
DESCRIPTION:Abstract: Modern statistical learning is characterized by the high-dimensional nature of data and over-parameterization of models. In this regime\, analyzing the dynamics of the algorithms used is challenging but crucial for understanding the performance of learned models. This talk will present recent results on the dynamics of two pivotal algorithms: Approximate Message Passing (AMP) and Stochastic Gradient Descent (SGD). Specifically\, AMP refers to a class of iterative algorithms for solving large-scale statistical problems\, whose dynamics admit asymptotically a simple but exact description known as state evolution. We will demonstrate the universality of AMP’s state evolution over large classes of random matrices\, and provide illustrative examples of applications of our universality results. Secondly\, for SGD\, a workhorse for training deep neural networks\, we will introduce a novel mathematical framework for analyzing its implicit regularization. This is essential for SGD’s ability to find solutions with strong generalization performance\, particularly in the case of over-parameterization. Our framework offers a general method to characterize the implicit regularization induced by gradient noise. Finally\, in the context of underdetermined linear regression\, we will show that both AMP and SGD can provably achieve sparse recovery\, yet they do so from markedly different perspectives. \nBio: Tianhao Wang is a final-year Ph.D. student in the Department of Statistics and Data Science at Yale University\, advised by Prof. Zhou Fan. His research focuses on the mathematical foundations of statistics and machine learning.
URL:https://datascience.ucsd.edu/event/statistics-tianhao-wang/
LOCATION:3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240119T110000
DTEND;TZID=America/Los_Angeles:20240119T120000
DTSTAMP:20260530T124121
CREATED:20240106T004809Z
LAST-MODIFIED:20240116T164734Z
UID:10000417-1705662000-1705665600@datascience.ucsd.edu
SUMMARY:Statistical insight for biomedical data science\, with applications to single-cell RNA-sequencing data | Yiqun Chen
DESCRIPTION:My research centers around bringing statistical insights and understanding to the practice of modern data science\, and I will cover two projects related to this research vision in this talk. \nThe first part of the talk is motivated by the practice of testing data-driven hypotheses. In the biomedical sciences\, it has become increasingly common to collect massive datasets without a pre-specified research question. In this setting\, a data analyst might use the data both to generate a research question\, and to test the associated null hypothesis. For example\, in single-cell RNA-sequencing analyses\, researchers often first cluster the cells\, and then test for differences in the expected gene expression levels between the clusters to quantify up- or down-regulation of genes\, annotate known cell types\, and identify new cell types. However\, this popular practice is invalid from a statistical perspective: once we have used the data to generate hypotheses\, standard statistical inference tools are no longer valid. To tackle this problem\, I developed a conditional selective approach to test for a difference in means between pairs of clusters obtained via k-means clustering. The proposed approach has appropriate statistical guarantees (e.g.\, selective Type 1 error control). \nIn the second part of the talk\, I will consider how to leverage large language models (LLMs) such as ChatGPT for biomedical discovery. While significant progress has been made in customizing large language models for biomedical data\, these models often require extensive data curation and resource-intensive training. In the context of single-cell RNA-sequencing data\, I will show that we can achieve surprisingly competitive results on many downstream tasks via a much simpler alternative: I input textual descriptions of genes into an off-the-shelf LLM\, such as ChatGPT\, to obtain low-dimensional representations of the genes\, or “embeddings.” I then use these embeddings as features in downstream tasks. A similar approach enables LLM-derived embeddings of cells. This work highlights the potential of LLMs to provide meaningful and concise representations for biomedical data\, and also raises a number of challenging statistical questions. Addressing these questions requires bringing principled statistical thinking to the practice of modern data science.
URL:https://datascience.ucsd.edu/event/statistics-yiqun-chen/
LOCATION:3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240117T140000
DTEND;TZID=America/Los_Angeles:20240117T150000
DTSTAMP:20260530T124121
CREATED:20240106T004650Z
LAST-MODIFIED:20240110T202753Z
UID:10000416-1705500000-1705503600@datascience.ucsd.edu
SUMMARY:Inference and Decision-Making amid Social Interactions | Shuangning Li
DESCRIPTION:From social media trends to family dynamics\, social interactions shape our daily lives. In this talk\, I will present tools I have developed for statistical inference and decision-making in light of these social interactions. \n(1) Inference: I will talk about estimation of causal effects in the presence of interference. In causal inference\, the term “interference” refers to a situation where\, due to interactions between units\, the treatment assigned to one unit affects the observed outcomes of others. I will discuss large-sample asymptotics for treatment effect estimation under network interference where the interference graph is a random draw from a graphon. When targeting the direct effect\, we show that popular estimators in our setting are considerably more accurate than existing results suggest. Meanwhile\, when targeting the indirect effect\, we propose a consistent estimator in a setting where no other consistent estimators are currently available. \n(2) Decision-Making: Turning to reinforcement learning amid social interactions\, I will focus on a problem inspired by a specific class of mobile health trials involving both target individuals and their care partners. These trials feature two types of interventions: those targeting individuals directly and those aimed at improving the relationship between the individual and their care partner. I will present an online reinforcement learning algorithm designed to personalize the delivery of these interventions. The algorithm’s effectiveness is demonstrated through simulation studies conducted on a realistic test bed\, which was constructed using data from a prior mobile health study. The proposed algorithm will be implemented in the ADAPTS HCT clinical trial\, which seeks to improve medication adherence among adolescents undergoing allogeneic hematopoietic stem cell transplantation. \nBio: Shuangning Li is currently a postdoctoral fellow working with Professor Susan Murphy in the Department of Statistics at Harvard University. Prior to this\, they earned their Ph.D. from the Department of Statistics at Stanford University\, where they were advised by Professors Emmanuel Candès and Stefan Wager. Before attending Stanford\, Shuangning obtained their Bachelor of Science degree from the University of Hong Kong.
URL:https://datascience.ucsd.edu/event/statistics-shuangning-li/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 4140\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20230320T140000
DTEND;TZID=America/Los_Angeles:20230320T140000
DTSTAMP:20260530T124121
CREATED:20230302T000631Z
LAST-MODIFIED:20230317T154618Z
UID:10000352-1679320800-1679320800@datascience.ucsd.edu
SUMMARY:Sampling from Graphical Models via Spectral Independence | Zongchen Chen
DESCRIPTION:Abstract: In many scientific settings we use a statistical model to describe a high-dimensional distribution over many variables. Such models are often represented as a weighted graph encoding the dependencies between different variables and are known as graphical models. Graphical models arise in a wide variety of fields throughout science and engineering.\nOne fundamental task for graphical models is to generate random samples from the associated distribution. The Markov chain Monte Carlo (MCMC) method is one of the simplest and most popular approaches to tackle such problems. Despite the popularity of graphical models and MCMC algorithms\, theoretical guarantees of their performance are not known even for some simple models. I will describe a new tool called “spectral independence” to analyze MCMC algorithms and\, more importantly\, to reveal the underlying structure behind such models. I will also discuss how these structural properties can be applied to sampling when MCMC fails and to other statistical problems like parameter learning or model fitting. \n\nBio: Zongchen Chen is an instructor (postdoc) in Mathematics at MIT. He received his PhD degree in Algorithms\, Combinatorics and Optimization (ACO) at Georgia Tech in 2021\, advised by Eric Vigoda. His thesis received the 2021 Georgia Tech College of Computing Outstanding Doctoral Dissertation Award. He received his BS degree in Mathematics & Applied Mathematics from Zhiyuan College at Shanghai Jiao Tong University in 2016. He is broadly interested in randomized algorithms\, discrete probability\, and machine learning. His current research interests include Markov chain Monte Carlo (MCMC) methods\, approximate counting and sampling\, and learning and testing for high-dimensional distributions.
URL:https://datascience.ucsd.edu/event/zongchen-chen/
LOCATION:SDSC\, The Auditorium\, 9836 Hopkins Dr\, La Jolla\, San Diego\, CA\, United States
CATEGORIES:Seminar
END:VEVENT
END:VCALENDAR