BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Halıcıoğlu Data Science Institute - UC San Diego - ECPv6.16.2//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Halıcıoğlu Data Science Institute - UC San Diego
X-ORIGINAL-URL:https://datascience.ucsd.edu
X-WR-CALDESC:Events for Halıcıoğlu Data Science Institute - UC San Diego
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20220313T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20221106T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20230312T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20231105T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240221T140000
DTEND;TZID=America/Los_Angeles:20240221T153000
DTSTAMP:20260531T175630
CREATED:20240205T171619Z
LAST-MODIFIED:20240220T172031Z
UID:10000436-1708524000-1708529400@datascience.ucsd.edu
SUMMARY:The Synergy between Machine Learning and the Natural Sciences | Max Welling
DESCRIPTION:Abstract: Traditionally\, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g.\, MCMC\, belief propagation\, and diffusion-based generative AI). Recently\, the flow of information has also reversed\, with new tools developed in the ML community impacting physics\, chemistry\, and biology. Examples include faster DFT\, force-field-accelerated MD simulations\, neural surrogate models for PDEs\, generation of drug-like molecules\, and many more. In this talk\, I will review the exciting opportunities for further cross-fertilization between these fields\, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks.\nBio: Prof. Max Welling is a research chair in Machine Learning at the University of Amsterdam and a Distinguished Scientist at MSR. He is a fellow of the Canadian Institute for Advanced Research (CIFAR) and the European Lab for Learning and Intelligent Systems (ELLIS)\, where he also serves on the founding board. His previous appointments include VP at Qualcomm Technologies\, professor at UC Irvine\, postdoc at U. Toronto and UCL under the supervision of Prof. Geoffrey Hinton\, and postdoc at Caltech under the supervision of Prof. Pietro Perona. He completed his PhD in theoretical high-energy physics under the supervision of Nobel laureate Prof. Gerard ’t Hooft.\nMax Welling served as associate editor-in-chief of IEEE TPAMI from 2011 to 2015\, has served on the advisory board of the NeurIPS Foundation since 2015\, and was program chair and general chair of NeurIPS in 2013 and 2014\, respectively. He was also program chair of AISTATS in 2009 and ECCV in 2016\, and general chair of MIDL 2018. He is a recipient of the ECCV Koenderink Prize in 2010 and the ICML Test of Time Award in 2021.\nHe directs the Amsterdam Machine Learning Lab (AMLAB) and co-directs the Qualcomm-UvA Deep Learning Lab (QUVA) and the Bosch-UvA Deep Learning Lab (DELTA).
URL:https://datascience.ucsd.edu/event/distinguished-colloquium-max-welling/
LOCATION:Halıcıoğlu Data Science Institute (HDSI)\, Room 123\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Guest Lecture
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/02/Max_Welling_DLS_1240x650.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240221T123000
DTEND;TZID=America/Los_Angeles:20240221T140000
DTSTAMP:20260531T175630
CREATED:20240220T163858Z
LAST-MODIFIED:20240220T163858Z
UID:10000444-1708518600-1708524000@datascience.ucsd.edu
SUMMARY:Computational approaches for uncovering implicit strategies in political discourse | Julia Mendelsohn
DESCRIPTION:When discussing politics\, people often use subtle linguistic strategies to influence how their audience thinks about issues\, which can then impact public opinion and policy. For example\, anti-immigration activists may frame immigration as a threat to native-born citizens’ jobs\, describe immigrants with dehumanizing vermin-related metaphors\, or even use coded expressions to covertly connect immigration with antisemitic conspiracy theories. This talk will focus on the development of computational approaches to analyze three strategies: framing\, dehumanization\, and dogwhistle communication. I will discuss how I draw from multiple social science disciplines to develop typologies and curate data resources\, as well as how I build and evaluate natural language processing models for detecting these strategies. I further analyze the use of these strategies in political discourse across several domains\, and assess the implications of such nuanced rhetoric for both society and technology.
URL:https://datascience.ucsd.edu/event/computational-approaches-for-uncovering-implicit-strategies-in-political-discourse-julia-mendelsohn-2/
LOCATION:GPS\, Robinson Building Complex (RBC)\, 3106
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240220T130000
DTEND;TZID=America/Los_Angeles:20240220T143000
DTSTAMP:20260531T175630
CREATED:20240220T182547Z
LAST-MODIFIED:20240220T182547Z
UID:10000446-1708434000-1708439400@datascience.ucsd.edu
SUMMARY:Learning Inductive Representations for Reasoning over Knowledge Graphs | Zhaocheng Zhu
DESCRIPTION:Abstract: Reasoning\, the ability to logically draw conclusions from existing knowledge\, has long been pursued as a goal of artificial intelligence. Although numerous learning algorithms have been developed for reasoning\, most of them are limited to the domain they are trained on. By contrast\, humans often derive high-level rules or principles from experience and apply them to new domains\, an ability referred to as inductive generalization. In this talk\, we present a series of works that learn inductive representations for reasoning over knowledge graphs. First\, we introduce Neural Bellman-Ford Networks (NBFNet)\, which capture paths between entities and can generalize to graphs of new entities. Then we discuss the Graph Neural Network Query Executor (GNN-QE)\, an extension of NBFNet that answers multi-hop logical queries and generalizes well on our inductive benchmark. Finally\, by learning inductive representations for both entities and relations\, we demonstrate that a model can generalize to any graph with arbitrary entity and relation vocabularies\, paving the way for foundation models for knowledge graph reasoning.\n\nBio: Zhaocheng Zhu is a final-year Ph.D. candidate advised by Prof. Jian Tang at Mila – Quebec AI Institute\, University of Montreal. His research interests include reasoning\, knowledge graphs\, and large language models. His work\, among the first to study inductive generalization across structures\, has led to a paradigm shift away from traditional knowledge graph embedding methods that had been used for years. He gave a tutorial on knowledge graph reasoning at AAAI 2022. He is also an active developer of machine learning systems and led the development of two open-source libraries: GraphVite for large-scale embedding training and TorchDrug for drug discovery research.
URL:https://datascience.ucsd.edu/event/learning-inductive-representations-for-reasoning-over-knowledge-graphs-zhaocheng-zhu/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 1242\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240215T123000
DTEND;TZID=America/Los_Angeles:20240215T140000
DTSTAMP:20260531T175630
CREATED:20240213T220919Z
LAST-MODIFIED:20240213T221018Z
UID:10000441-1708000200-1708005600@datascience.ucsd.edu
SUMMARY:Targeting humanitarian aid with machine learning and digital data | Emily Aiken
DESCRIPTION:Abstract: Most humanitarian aid and social protection programs globally are targeted\, providing assistance to individuals or communities identified as the poorest or most in need. In low- and middle-income countries\, the targeting of aid programs is often limited by low-quality\, out-of-date\, or missing data on poverty and vulnerability. Novel “big” digital data sources\, such as those captured by satellites\, mobile phones\, and financial services providers\, can improve the accuracy of aid program targeting when combined with advances in machine learning. In this talk\, I will cover empirical results on the accuracy of these new data-driven and algorithmic approaches to aid allocation and discuss emerging implications for fairness\, privacy\, transparency\, and community dynamics.
URL:https://datascience.ucsd.edu/event/targeting-humanitarian-aid-with-machine-learning-and-digital-data-emily-aiken/
LOCATION:GPS\, Robinson Building Complex (RBC)\, 3203
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240214T140000
DTEND;TZID=America/Los_Angeles:20240214T153000
DTSTAMP:20260531T175630
CREATED:20240201T193829Z
LAST-MODIFIED:20240220T172238Z
UID:10000434-1707919200-1707924600@datascience.ucsd.edu
SUMMARY:Enabling Performant and Trustworthy Learning-enabled CPS-IoT Systems | Mani Srivastava
DESCRIPTION:Abstract: The previously discrete technologies of IoT and AI have now entered a tight virtuous embrace. IoT allows sensing and actuation in our physical\, social\, and urban spaces with unimaginable ubiquity. AI allows sophisticated inferences and decisions to be made algorithmically using deep neural networks\, even from unstructured and high-dimensional data\, with uncanny performance. Together they seek to perform sophisticated perception-cognition-communication-action loops in diverse applications. However\, designers of learning-enabled IoT systems face the challenge of extremely resource-constrained edge platforms operating in uncertain environments while assuring performance and trustworthiness. Moreover\, in many applications\, the systems go beyond taking actions based on rich inferences about the world state: they must perform long-term reasoning about complex events and obey the underlying physics\, rules\, and constraints. Based on our experience designing such systems in applications including mHealth\, ocean animal health\, agricultural robotics\, and military settings\, this talk explores meeting these challenges through a combination of (i) neurosymbolic architectures that allow the incorporation of physics awareness and human knowledge while enhancing user trust\, (ii) automatic platform-aware architecture search and code generation\, and (iii) techniques to efficiently adapt to the deployment environment.\n\nBio: Mani Srivastava is Distinguished Professor and Vice Chair of UCLA’s ECE Department\, with a joint appointment in the CS Department. His research is broadly in human-cyber-physical and IoT systems that are learning-enabled\, resource-constrained\, and trustworthy. It spans problems across the entire spectrum of applications\, architectures\, algorithms\, and technologies in the context of systems and applications for mHealth\, sustainable buildings\, smart environments\, and more. He is a Fellow of the ACM and the IEEE.
URL:https://datascience.ucsd.edu/event/special-seminar-mani-srivastava/
LOCATION:Halıcıoğlu Data Science Institute (HDSI)\, Room 123\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240213T130000
DTEND;TZID=America/Los_Angeles:20240213T140000
DTSTAMP:20260531T175630
CREATED:20240209T164305Z
LAST-MODIFIED:20240209T164305Z
UID:10000440-1707829200-1707832800@datascience.ucsd.edu
SUMMARY:On Data Ecology\, Data Markets\, the Value of Data\, and Dataflow Governance | Raul Castro Fernandez
DESCRIPTION:Abstract:\nData shapes our social\, economic\, cultural\, and technological environments. Data is valuable\, so people seek it\, inducing data to flow. The resulting dataflows distribute data and thus value. For example\, large Internet companies profit from accessing their users’ data\, and engineers of large language models seek large and diverse data sources to train powerful models. It is possible to judge the impact of data in an environment by analyzing how the dataflows in that environment affect the participating agents. My research hypothesizes that it is also possible to design (better) data environments by controlling which dataflows materialize: not only can we analyze environments\, but we can also synthesize them. In this talk\, I present a research agenda on “data ecology\,” which seeks to build the principles\, theory\, algorithms\, and systems to design beneficial data environments. I will also present examples of data environments my group has designed\, including data markets for machine learning\, data sharing\, and data integration. I will conclude by discussing the impact of dataflows on data governance and how these ideas are interwoven with the concepts of trust\, privacy\, and the elusive notion of “data value.” As part of the technical discussion\, I will complement the data market designs with the design of a data escrow system that permits controlling dataflows.\nBio:\nIn my research\, I ask what the value of data is and explore the potential of data markets to unlock that value. My group collaborates with economists\, legal scholars\, statisticians\, and domain scientists. We build systems to share\, discover\, prepare\, integrate\, and process data. I have traditionally worked on distributed query processing systems and continue to do so. I received a SIGMOD’23 Test-of-Time Award. I am an assistant professor in the Department of Computer Science and on the Committee on Data Science at the University of Chicago.\nBefore UChicago\, I was a postdoc at MIT with Sam Madden and Mike Stonebraker. Before that\, I completed a PhD at Imperial College London with Peter Pietzuch.
URL:https://datascience.ucsd.edu/event/on-data-ecology-data-markets-the-value-of-data-and-dataflow-governance-raul-castro-fernandez/
CATEGORIES:Colloquium,Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image-e1712856546428.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240213T080000
DTEND;TZID=America/Los_Angeles:20240213T170000
DTSTAMP:20260531T175630
CREATED:20240213T221205Z
LAST-MODIFIED:20240213T221205Z
UID:10000442-1707811200-1707843600@datascience.ucsd.edu
SUMMARY:Computational approaches for uncovering implicit strategies in political discourse | Julia Mendelsohn
DESCRIPTION:Abstract: When discussing politics\, people often use subtle linguistic strategies to influence how their audience thinks about issues\, which can then impact public opinion and policy. For example\, anti-immigration activists may frame immigration as a threat to native-born citizens’ jobs\, describe immigrants with dehumanizing vermin-related metaphors\, or even use coded expressions to covertly connect immigration with antisemitic conspiracy theories. This talk will focus on the development of computational approaches to analyze three strategies: framing\, dehumanization\, and dogwhistle communication. I will discuss how I draw from multiple social science disciplines to develop typologies and curate data resources\, as well as how I build and evaluate natural language processing models for detecting these strategies. I further analyze the use of these strategies in political discourse across several domains\, and assess the implications of such nuanced rhetoric for both society and technology.
URL:https://datascience.ucsd.edu/event/computational-approaches-for-uncovering-implicit-strategies-in-political-discourse-julia-mendelsohn/
LOCATION:GPS\, Robinson Building Complex (RBC)\, 3106
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240213T023000
DTEND;TZID=America/Los_Angeles:20240213T160000
DTSTAMP:20260531T175630
CREATED:20240207T223735Z
LAST-MODIFIED:20240220T172208Z
UID:10000437-1707791400-1707840000@datascience.ucsd.edu
SUMMARY:Principled Approaches for Trustworthy Algorithms\, Statistics\, and Machine Learning | Gautam Kamath
DESCRIPTION:Abstract: Despite impressive recent advances\, machine learning models exhibit a number of critical deficiencies. They are prone to leaking sensitive information about their training data. They remain alarmingly brittle to attacks by malicious parties. Troublingly\, these issues stem from more fundamental statistical vulnerabilities\, which remain unresolved even decades later\, highlighting significant gaps in our understanding of how to deal with these important considerations. As long as these problems remain\, our models will not be suitable for deployment beyond toy settings. In this talk\, I will discuss recent advances on a number of these problems\, which give key new algorithmic insights into how to address these considerations and enable real-world deployments that were previously thought infeasible. In the first vignette\, we will explore how to guarantee individual privacy in machine learning models\, with a particular focus on large language models and the important role played by public data in the training pipeline. In the second vignette\, we will focus on how to perform mean estimation robustly\, giving the first efficient and accurate algorithms for multivariate settings. We will go on to discuss connections to robustness against data poisoning attacks\, robust exploratory data analysis\, and surprising conceptual and technical connections with privacy.\nBio: Gautam Kamath is an Assistant Professor at the University of Waterloo\, and a Faculty Member and Canada CIFAR AI Chair at the Vector Institute for Artificial Intelligence. His research interests are in trustworthy algorithms\, statistics\, and machine learning\, with a particular focus on considerations like data privacy and robustness. He has a B.S. from Cornell University and a Ph.D. from MIT. He is the recipient of the 2023 Golden Jubilee Research Excellence Award\, recognizing him as the most outstanding junior researcher in the University of Waterloo’s Faculty of Math.\nBeyond research\, he is celebrated for his teaching. His course on differential privacy is the most popular resource for learning the topic\, and his lecture videos have over 100\,000 views. He has also given invited tutorials on the topic in several countries. He is further well known for his passion and commitment to service and improving the community. Besides organizing and chairing several workshops and conferences\, he is an Editor-in-Chief of Transactions on Machine Learning Research and serves on the Executive Committee of the Learning Theory Alliance.
URL:https://datascience.ucsd.edu/event/principled-approaches-for-trustworthy-algorithms-statistics-and-machine-learning-gautam-kamath/
LOCATION:Halıcıoğlu Data Science Institute (HDSI)\, Room 123\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240212T140000
DTEND;TZID=America/Los_Angeles:20240212T153000
DTSTAMP:20260531T175630
CREATED:20240126T182854Z
LAST-MODIFIED:20240220T172343Z
UID:10000430-1707746400-1707751800@datascience.ucsd.edu
SUMMARY:Integrating Longitudinal Multimodal Data To Realize Precision Medicine | Samantha Piekos
DESCRIPTION:Abstract: The interplay of biology\, environment\, and lifestyle directs the development and progression of complex diseases and other health outcomes. Therefore\, integration of longitudinal multimodal data is needed to understand the mechanisms underpinning major molecular transitions. Previously\, during my doctoral work at Stanford\, I integrated multiomics data to elucidate the epigenetic mechanism of human surface ectoderm differentiation. I also built a pipeline to investigate the role of polymorphism\, particularly non-coding genetic variants\, in complex diseases. To address the common pain point of data silos limiting the interpretation of multimodal data integration\, I formed a collaboration with Google Data Commons to build a free\, open-source biomedical knowledge graph with a common schema and API. It currently comprises approximately 130 million nodes and 1.7 trillion triples (node-edge-node) from 22 publicly available biomedical datasets. Knowledge graphs are a key tool for the hypothesis generation\, data interpretation\, and dimensionality reduction required for systems medicine research. Upon starting my postdoctoral work at the Institute for Systems Biology\, I identified pregnancy as an excellent model system for prototyping precision medicine approaches. I used electronic healthcare records (EHR) from Providence St. Joseph Healthcare to investigate the impact of maternal COVID-19 infection and vaccination on maternal-fetal outcomes. In addition\, I integrated multiomics placental data to investigate molecular network changes (interomics and intraomics) in common obstetric disorders. In a follow-up study (enrollment complete)\, we have longitudinal deep-phenotyping data from 435 people throughout pregnancy\, 80 of whom had pregnancy complications. These include multiomics\, survey\, EHR\, and air quality data collected from the first prenatal visit through delivery.\nMy lab will use these data to define major molecular transition states throughout pregnancy. I will also investigate the disease mechanisms of common obstetric disorders\, including identifying for each individual the earliest possible point of deviation from a healthy trajectory. This interdisciplinary approach will identify potential drug targets\, biomarker panels\, and individualized clinical interventions.\n\nBio: Samantha completed her PhD in Stem Cell Biology and Regenerative Medicine\, with a PhD minor in Biomedical Informatics\, at Stanford University under the advisement of Dr. Anthony Oro. Using a multiomics approach\, Samantha demonstrated how transcription factors direct keratinocyte differentiation by changing the epigenetic landscape\, including chromatin looping\, thereby affecting the cell’s transcriptional program. Samantha has also been collaborating with Google since June 2019 to build Biomedical Data Commons\, a knowledge graph that integrates biomedical data from a wide array of sources into a single searchable database\, thereby increasing data accessibility. Upon completing her PhD in 2020\, she began her postdoctoral fellowship at the Institute for Systems Biology under the advisement of Drs. Lee Hood and Nathan Price. Using electronic healthcare records (EHR)\, she has provided insight into the impact of maternal COVID-19 and vaccination on maternal-fetal outcomes. In addition to her EHR research\, Samantha is using multidimensional omics placental data to understand the molecular mechanisms of common obstetric disorders. Upon transitioning to Assistant Professor\, she intends to perform multimodal data integration of longitudinal deep-phenotyping data to evaluate changes in molecular networks in complex diseases.
URL:https://datascience.ucsd.edu/event/special-seminar-samantha-piekos/
LOCATION:Halıcıoğlu Data Science Institute (HDSI)\, Room 123\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240209T130000
DTEND;TZID=America/Los_Angeles:20240209T140000
DTSTAMP:20260531T175630
CREATED:20240209T164509Z
LAST-MODIFIED:20240209T164509Z
UID:10000439-1707483600-1707487200@datascience.ucsd.edu
SUMMARY:EnCORE: Theoretical Exploration of Foundation Model Adaptation\, Kangwook Lee\, UW Madison\, Feb 9th\, 1-2pm
DESCRIPTION:Abstract: Due to the enormous size of foundation models\, various new methods for efficient model adaptation have been developed. Parameter-efficient fine-tuning (PEFT) is an adaptation method that updates only a tiny fraction of the model parameters\, leaving the remainder unchanged. In-context learning (ICL) is a test-time adaptation method that repurposes foundation models by providing them with labeled samples as part of the input context. Given the growing importance of this emerging paradigm\, developing theoretical foundations for it is essential.\nIn this talk\, I will introduce two preliminary results toward this goal. In the first part\, I will present a theoretical analysis of Low-Rank Adaptation (LoRA)\, one of the most popular PEFT methods today. Our analysis of the expressive power of LoRA not only helps us better understand the high adaptivity of LoRA observed in practice but also provides insights for practitioners. In the second part\, I will introduce our probabilistic framework for better understanding ICL. With this framework\, one can analyze the transition between two distinct modes of ICL: task retrieval and learning. We also discuss how our framework can help explain and predict various phenomena that are observed in large language models in practice but are not yet fully explained.\nBio: Kangwook Lee is an Assistant Professor in the Electrical and Computer Engineering Department and the Computer Sciences Department (by courtesy) at the University of Wisconsin-Madison. Previously\, he was a Research Assistant Professor at the Information and Electronics Research Institute of KAIST\, where he was also a postdoctoral scholar. He received his PhD in 2016 from the Electrical Engineering and Computer Science department at UC Berkeley.\nHe is the recipient of the IEEE Joint Communications Society/Information Theory Society Paper Award (2020) and the KSEA Young Investigator Grant Award (2022).
URL:https://datascience.ucsd.edu/event/encore-theoretical-exploration-of-foundation-model-adaptation-kangwook-lee-uw-madison-feb-9th-1-2pm/
LOCATION:Atkinson Hall\, Fourth Floor
CATEGORIES:Colloquium,Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2023/10/Encore-logo_HDSI-Website.png
ORGANIZER;CN="k1omerry@ucsd.edu":MAILTO:k1omerry@ucsd.edu
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240208T130000
DTEND;TZID=America/Los_Angeles:20240208T143000
DTSTAMP:20260531T175630
CREATED:20240121T235126Z
LAST-MODIFIED:20240205T165821Z
UID:10000427-1707397200-1707402600@datascience.ucsd.edu
SUMMARY:Inference in context: Statistical theory and thinking | Jeffrey Bye
DESCRIPTION:Abstract: The likelihood function plays a foundational role in statistical theory. I will demonstrate my teaching philosophy and approach through a lesson on maximum likelihood estimation and its connection to Neyman-Pearson\, Bayesian\, and other approaches to statistical and scientific inference. I will then expand on the role of context in statistical thinking\, particularly how it informs my scholarship on how people learn about data\, math\, statistics\, and programming.\nBio: Jeffrey K. Bye is an interdisciplinary teacher and researcher who received his Ph.D. in Cognitive Psychology from UCLA\, with a specialization in computational modeling and a minor in quantitative methods. He has years of experience teaching undergraduate- and graduate-level statistics and programming at UCLA and the University of Minnesota. His research blends cognitive and learning science approaches to understand how people learn and think about data\, math\, statistics\, and programming. He is passionate about making science more collaborative\, inclusive\, and understandable to the public.
URL:https://datascience.ucsd.edu/event/special-seminar-jeffrey-bye/
LOCATION:3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240205T140000
DTEND;TZID=America/Los_Angeles:20240205T153000
DTSTAMP:20260531T175630
CREATED:20240121T234846Z
LAST-MODIFIED:20240202T204856Z
UID:10000426-1707141600-1707147000@datascience.ucsd.edu
SUMMARY:Algebraic vision: A gentle introduction | Jessie Loucks-Tavitas
DESCRIPTION:Abstract:\n\nMy talk will be divided into three parts:\nPart I: Meet Jessie.\nPart II: Assessing Deep Learning Models. A short lesson on assessment criteria for deep learning models\, such as LLMs and image segmentation models.\nPart III: Algebraic Vision\, a Gentle Introduction. Algebraic vision\, which lies at the intersection of computer vision and projective geometry\, is the study of 3D objects photographed by multiple cameras\, using techniques from computational algebraic geometry. Two natural questions arise: (1) Given a 3D object and multiple images of it\, can we determine the relative camera positions? And (2) given multiple images as well as the relative camera locations\, can we reconstruct the object being photographed? Carlsson and Weinshall showed in 1998 that the algorithms solving these problems are intrinsically connected. A beneficial corollary of recent joint work with Erin Connelly and Timothy Duff is a formalization of this “duality” mechanism. We will discuss this formalization\, along with some future directions we hope to pursue.\n\nBio: Jessie Loucks-Tavitas is currently a 6th-year PhD candidate in mathematics at the University of Washington. She received her MS in mathematics in 2022\, following her BA in mathematics in 2018 from California State University\, Sacramento. Jessie’s commitment to higher education and supporting underrepresented groups has been recognized with the Gloria Hewitt Endowed Fellowship and the Excellence in Teaching Award from the UW mathematics department in 2022. Outside of academic pursuits\, Jessie finds joy in drinking black coffee\, cozying up with a book and her two cats\, and adventuring with her friends and family through skiing\, her newfound love.
URL:https://datascience.ucsd.edu/event/special-seminar-jessie-loucks-tavitas/
LOCATION:3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240205T140000
DTEND;TZID=America/Los_Angeles:20240205T150000
DTSTAMP:20260531T175630
CREATED:20240205T170247Z
LAST-MODIFIED:20240205T170333Z
UID:10000435-1707141600-1707145200@datascience.ucsd.edu
SUMMARY:Scaling Data-Constrained Language Models | Alexander Rush
DESCRIPTION:Extrapolating scaling trends suggests that training dataset size for LLMs may soon be limited by the amount of text data available on the internet. In this talk\, we investigate scaling language models in data-constrained regimes. Specifically\, we run a set of empirical experiments varying the extent of data repetition and compute budget. From these experiments\, we propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally\, we discuss and experiment with approaches for mitigating data scarcity.\n\nBio: Alexander “Sasha” Rush is an Associate Professor at Cornell Tech and a researcher at Hugging Face. His research focuses on language models\, with applications in controllable text generation\, efficient inference\, summarization\, and information extraction. In addition to research\, he has written several popular open-source software projects supporting NLP research\, programming for deep learning\, and virtual academic conferences. His projects have received paper and demo awards at major NLP\, visualization\, and hardware conferences\, and he has received an NSF CAREER Award and a Sloan Fellowship. He tweets at @srush_nlp.
URL:https://datascience.ucsd.edu/event/scaling-data-constrained-language-model/
LOCATION:Virtual
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2023/10/Encore-logo_HDSI-Website.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240205T110000
DTEND;TZID=America/Los_Angeles:20240205T123000
DTSTAMP:20260531T175630
CREATED:20240126T182051Z
LAST-MODIFIED:20240130T224530Z
UID:10000429-1707130800-1707136200@datascience.ucsd.edu
SUMMARY:Lessons from the deep: engineering biosensors\, workflows\, and visualizations for communication and collaboration in comparative medicine and climate science | Jessica Kendall-Bar
DESCRIPTION:Abstract: Effective conservation and management rely on an in-depth understanding of the health of marine ecosystems. Dr. Kendall-Bar’s interdisciplinary approach combines engineering\, visualization\, and computation to study ocean resilience in terms of the extreme physiology and behavior of marine animals\, establishing eco-physiological baselines to track over time in the face of climate change. This seminar and chalk talk will review her work to create innovative tools to detect\, visualize\, and analyze the physiology and behavior of animals in extreme environments\, showcasing their biological resilience to oxygen and sleep deprivation. From individuals to ecosystems\, Kendall-Bar conducts multidisciplinary physiological studies that combine basic and applied science with the potential to advance conservation and comparative medicine. This seminar reviews Kendall-Bar’s dissertation research on sleep in seals and presents current and ongoing projects that combine high-performance computing\, automation\, and visualization to assess diving physiology in human freedivers\, epilepsy in sea lions\, and cardiac performance in some of the largest (blue whales) and smallest (emperor penguins) divers. Kendall-Bar’s newest projects involve novel data visualizations and science communication to inform research as well as international policy in domains ranging from marine mammal conservation to traditional ecological knowledge and coral reef restoration.\n \nBio: Dr. Jessica Kendall-Bar is a Schmidt AI in Science Postdoctoral Fellow at Scripps Institution of Oceanography\, UC San Diego. Her research combines engineering\, data science\, ecology\, and visualization to measure the behavior and physiology of marine animals amid a changing climate. For her dissertation\, she developed a non-invasive system to record and visualize sleep in marine mammals at sea\, yielding the first such recordings\, which were published in Science. 
She is an award-winning scientist\, artist\, and science communicator who designs data visualization courses\, large-scale exhibits\, immersive analytical tools\, and decision support tools. Her data visualizations\, published in local news outlets\, The New York Times\, and The Atlantic\, have informed international policy in domains ranging from marine mammal conservation to coral reef restoration.
URL:https://datascience.ucsd.edu/event/special-seminar-jessica-kendall-bar/
LOCATION:Powell-Focht Bioengineering Hall (PFBH)\, FUNG Auditorium
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20240201
DTEND;VALUE=DATE:20240203
DTSTAMP:20260531T175630
CREATED:20240201T173318Z
LAST-MODIFIED:20240201T173318Z
UID:10000433-1706745600-1706918399@datascience.ucsd.edu
SUMMARY:ENCORE: NSF TRIPODS Workshop
DESCRIPTION:
URL:https://encore.ucsd.edu/nsf-tripods-workshop/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 1242\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Workshops
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2023/10/Encore-logo_HDSI-Website.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240129T140000
DTEND;TZID=America/Los_Angeles:20240129T153000
DTSTAMP:20260531T175630
CREATED:20240121T234616Z
LAST-MODIFIED:20240126T181419Z
UID:10000425-1706536800-1706542200@datascience.ucsd.edu
SUMMARY:Promoting fairness\, inclusivity\, and excitement in undergraduate education through teaching and research | Josh Grossman
DESCRIPTION:Abstract: This talk has three parts. First\, I’ll conduct an interactive\, undergraduate-level introduction to k-means clustering. I’ll then outline my pedagogy\, making specific reference to strategies employed in the teaching demo. Finally\, I’ll discuss my research on fairness in college admissions and increasing access to higher education.  \nBio: Josh is a Ph.D. candidate in Computational Social Science at Stanford University’s Department of Management Science & Engineering (MS&E). Broadly\, his research applies tools from data science to issues in public policy. His recent work investigates how racial disparate impact manifests itself in judicial decisions and college admissions. Josh recently designed a new course called “Detecting Discrimination with Data” that brings his research into the classroom. He is currently the instructor for the MS&E department’s core undergraduate applied statistics course\, and he won the Stanford Centennial TA Award and Department Course Assistant Award as a TA for the same course. Josh has worked as a data scientist at the Stanford Computational Policy Lab\, as a product manager in K–12 education technology\, and as a research data scientist at Recidiviz\, a non-profit that builds technology to reduce incarceration. Josh completed his bachelor’s degree at Harvard University\, majoring in Neurobiology and minoring in Statistics. His work is supported by a National Science Foundation Graduate Research Fellowship. 
URL:https://datascience.ucsd.edu/event/special-seminar-josh-grossman/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240126T110000
DTEND;TZID=America/Los_Angeles:20240126T123000
DTSTAMP:20260531T175630
CREATED:20240110T203505Z
LAST-MODIFIED:20240123T171855Z
UID:10000420-1706266800-1706272200@datascience.ucsd.edu
SUMMARY:Universal Learning for Decision-Making | Moise Blanchard
DESCRIPTION:We provide general-use decision-making algorithms under provably minimal assumptions on the data\, using the universal learning framework. Classical learning guarantees typically require two types of assumptions: (1) restrictions on the target policies to be learned and (2) assumptions on the data-generating process. Instead\, we show that we can provide consistent algorithms with vanishing regret compared to the best policy in hindsight\, (1) irrespective of the optimal policy\, known as universal consistency\, and (2) well beyond standard i.i.d. or stationary assumptions on the data. We present our results for the classical online regression problem as well as for the contextual bandit problem\, where the learner’s rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available\, e.g.\, patients’ records or customers’ history\, which allows for personalized treatment. More precisely\, we give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Surprisingly\, for finite action spaces\, universally learnable processes are the same for contextual bandits as for the supervised learning setting\, suggesting that going from full feedback (supervised learning) to partial feedback (contextual bandits) comes at no extra cost in terms of learnability. We then show that there always exists an algorithm that guarantees universal consistency whenever this is achievable. In particular\, such an algorithm is universally consistent under provably minimal assumptions: if it fails to be universally consistent for some context-generating process\, then no other algorithm would succeed either. In the case of finite action spaces\, this algorithm strikes a fine balance between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts). 
\nBio: Moïse Blanchard is a final-year PhD student at MIT\, working with Prof. Patrick Jaillet. He obtained his MSc in applied mathematics as valedictorian of École Polytechnique. His research focuses on algorithms for decision-making and statistical learning. His work has been recognized with a best student paper runner-up award at COLT and a best student paper award from the INFORMS TSL Society.
URL:https://datascience.ucsd.edu/event/statistics-moise-blanchard/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240124T140000
DTEND;TZID=America/Los_Angeles:20240124T153000
DTSTAMP:20260531T175630
CREATED:20240121T234257Z
LAST-MODIFIED:20240122T001144Z
UID:10000424-1706104800-1706110200@datascience.ucsd.edu
SUMMARY:Statistics | Enric Boix
DESCRIPTION:
URL:https://datascience.ucsd.edu/event/statistics-enric-boix/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 1242\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240123T120000
DTEND;TZID=America/Los_Angeles:20240123T133000
DTSTAMP:20260531T175630
CREATED:20240119T173928Z
LAST-MODIFIED:20240119T174155Z
UID:10000423-1706011200-1706016600@datascience.ucsd.edu
SUMMARY:Women in Data Science Social | RSVP - Space is Limited
DESCRIPTION:Come to the first WDS Lunch Social to connect with other women researchers and students working in machine learning\, AI\, and data science-related fields. Lunch will be provided. RSVP is required as space is limited. RSVP via the QR code in the event image.
URL:https://datascience.ucsd.edu/event/women-in-data-science-social-rsvp-space-is-limited/
LOCATION:Zanzibar\, 9500 Gilman Drive\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Social Event
ATTACH;FMTTYPE=image/jpeg:https://datascience.ucsd.edu/wp-content/uploads/2024/01/signal-2024-01-16-151339.jpeg
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240122T140000
DTEND;TZID=America/Los_Angeles:20240122T153000
DTSTAMP:20260531T175630
CREATED:20240110T203331Z
LAST-MODIFIED:20240122T000929Z
UID:10000419-1705932000-1705937400@datascience.ucsd.edu
SUMMARY:Algorithm Dynamics in Modern Statistical Learning: Universality and Implicit Regularization | Tianhao Wang
DESCRIPTION:Abstract: Modern statistical learning is characterized by the high-dimensional nature of data and the over-parameterization of models. In this regime\, analyzing the dynamics of the algorithms used is challenging but crucial for understanding the performance of learned models. This talk will present recent results on the dynamics of two pivotal algorithms: Approximate Message Passing (AMP) and Stochastic Gradient Descent (SGD). Specifically\, AMP refers to a class of iterative algorithms for solving large-scale statistical problems\, whose dynamics asymptotically admit a simple but exact description known as state evolution. We will demonstrate the universality of AMP’s state evolution over large classes of random matrices\, and provide illustrative examples of applications of our universality results. Second\, for SGD\, a workhorse for training deep neural networks\, we will introduce a novel mathematical framework for analyzing its implicit regularization\, which is essential to SGD’s ability to find solutions with strong generalization performance\, particularly in the over-parameterized regime. Our framework offers a general method to characterize the implicit regularization induced by gradient noise. Finally\, in the context of underdetermined linear regression\, we will show that both AMP and SGD can provably achieve sparse recovery\, yet they do so from markedly different perspectives. \nBio: Tianhao Wang is a final-year Ph.D. student in the Department of Statistics and Data Science at Yale University\, advised by Prof. Zhou Fan. His research focuses on the mathematical foundations of statistics and machine learning.
URL:https://datascience.ucsd.edu/event/statistics-tianhao-wang/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2024/01/HDSI-UCSD-Image_Dark-blue-e1710178042629.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240119T110000
DTEND;TZID=America/Los_Angeles:20240119T120000
DTSTAMP:20260531T175630
CREATED:20240106T004809Z
LAST-MODIFIED:20240116T164734Z
UID:10000417-1705662000-1705665600@datascience.ucsd.edu
SUMMARY:Statistical insight for biomedical data science\, with applications to single-cell RNA-sequencing data | Yiqun Chen
DESCRIPTION:My research centers on bringing statistical insights and understanding to the practice of modern data science\, and in this talk I will cover two projects related to this research vision. \nThe first part of the talk is motivated by the practice of testing data-driven hypotheses. In the biomedical sciences\, it has become increasingly common to collect massive datasets without a pre-specified research question. In this setting\, a data analyst might use the data both to generate a research question and to test the associated null hypothesis. For example\, in single-cell RNA-sequencing analyses\, researchers often first cluster the cells and then test for differences in the expected gene expression levels between the clusters to quantify up- or down-regulation of genes\, annotate known cell types\, and identify new cell types. However\, this popular practice is invalid from a statistical perspective: once we have used the data to generate hypotheses\, standard statistical inference tools are no longer valid. To tackle this problem\, I developed a conditional selective inference approach to test for a difference in means between pairs of clusters obtained via k-means clustering. The proposed approach has appropriate statistical guarantees (e.g.\, selective Type I error control). \nIn the second part of the talk\, I will consider how to leverage large language models (LLMs) such as ChatGPT for biomedical discovery. While significant progress has been made in customizing large language models for biomedical data\, these models often require extensive data curation and resource-intensive training. 
In the context of single-cell RNA-sequencing data\, I will show that we can achieve surprisingly competitive results on many downstream tasks via a much simpler alternative: I input textual descriptions of genes into an off-the-shelf LLM\, such as ChatGPT\, to obtain low-dimensional representations of the genes\, or “embeddings.” I then use these embeddings as features in downstream tasks. A similar approach enables LLM-derived embeddings of cells. This work highlights the potential of LLMs to provide meaningful and concise representations for biomedical data\, and also raises a number of challenging statistical questions. Addressing these questions requires bringing principled statistical thinking to the practice of modern data science.
URL:https://datascience.ucsd.edu/event/statistics-yiqun-chen/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240118T100000
DTEND;TZID=America/Los_Angeles:20240118T110000
DTSTAMP:20260531T175630
CREATED:20240111T172743Z
LAST-MODIFIED:20240111T172743Z
UID:10000422-1705572000-1705575600@datascience.ucsd.edu
SUMMARY:TILOS Seminar: The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms
DESCRIPTION:Title: The Dissimilarity Dimension: Sharper Bounds for Optimistic Algorithms\n\nSpeaker: Aldo Pacchiano\, Assistant Professor\, Boston University Center for Computing and Data Sciences \nZoom: https://ucsd.zoom.us/j/99334315002 \nAbstract: The principle of Optimism in the Face of Uncertainty (OFU) is one of the foundational algorithmic design choices in Reinforcement Learning and Bandits. Optimistic algorithms balance exploration and exploitation by deploying data collection strategies that maximize expected rewards in plausible models. This is the basis of celebrated algorithms like the Upper Confidence Bound (UCB) for multi-armed bandits. For nearly a decade\, the analysis of optimistic algorithms\, including Optimistic Least Squares\, in the context of rich reward function classes has relied on the concept of eluder dimension\, introduced by Russo and Van Roy in 2013. In this talk\, we shed light on the limitations of the eluder dimension in capturing the true behavior of optimistic strategies in the realm of function approximation. We remedy these limitations by introducing a novel statistical measure\, the “dissimilarity dimension”. We show that it can be used to provide a sharper sample analysis of algorithms like Optimistic Least Squares by establishing a link between regret and the dissimilarity dimension. To illustrate this\, we will show that some function classes have arbitrarily large eluder dimension but constant dissimilarity dimension. Our regret analysis draws inspiration from graph theory and may be of interest to the mathematically minded beyond the field of statistical learning theory. This talk sheds new light on the fundamental principle of optimism and its algorithms in the function approximation regime\, advancing our understanding of these concepts. \nBio: Aldo Pacchiano is an Assistant Professor at the Boston University Center for Computing and Data Sciences and a Fellow at the Eric and Wendy Schmidt Center of the Broad Institute of MIT and Harvard. 
He obtained his PhD under the supervision of Profs. Michael Jordan and Peter Bartlett at UC Berkeley and was a Postdoctoral Researcher at Microsoft Research\, NYC. His research lies in the areas of Reinforcement Learning\, Online Learning\, Bandits\, and Algorithmic Fairness. He is particularly interested in furthering our statistical understanding of learning phenomena in adaptive environments and in using these theoretical insights and techniques to design efficient and safe algorithms for scientific\, engineering\, and large-scale societal applications.
URL:https://datascience.ucsd.edu/event/tilos-seminar-the-dissimilarity-dimension-sharper-bounds-for-optimistic-algorithms/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2023/10/TILOS-Square_HDSI-Website-e1712854679822.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240117T140000
DTEND;TZID=America/Los_Angeles:20240117T150000
DTSTAMP:20260531T175630
CREATED:20240106T004650Z
LAST-MODIFIED:20240110T202753Z
UID:10000416-1705500000-1705503600@datascience.ucsd.edu
SUMMARY:Inference and Decision-Making amid Social Interactions | Shuangning Li
DESCRIPTION:From social media trends to family dynamics\, social interactions shape our daily lives. In this talk\, I will present tools I have developed for statistical inference and decision-making in light of these social interactions. \n(1) Inference: I will talk about estimation of causal effects in the presence of interference. In causal inference\, the term “interference” refers to a situation where\, due to interactions between units\, the treatment assigned to one unit affects the observed outcomes of others. I will discuss large-sample asymptotics for treatment effect estimation under network interference where the interference graph is a random draw from a graphon. When targeting the direct effect\, we show that popular estimators in our setting are considerably more accurate than existing results suggest. Meanwhile\, when targeting the indirect effect\, we propose a consistent estimator in a setting where no other consistent estimators are currently available. \n(2) Decision-Making: Turning to reinforcement learning amid social interactions\, I will focus on a problem inspired by a specific class of mobile health trials involving both target individuals and their care partners. These trials feature two types of interventions: those targeting individuals directly and those aimed at improving the relationship between the individual and their care partner. I will present an online reinforcement learning algorithm designed to personalize the delivery of these interventions. The algorithm’s effectiveness is demonstrated through simulation studies conducted on a realistic test bed\, which was constructed using data from a prior mobile health study. The proposed algorithm will be implemented in the ADAPTS HCT clinical trial\, which seeks to improve medication adherence among adolescents undergoing allogeneic hematopoietic stem cell transplantation. 
\nBio: Shuangning Li is currently a postdoctoral fellow working with Professor Susan Murphy in the Department of Statistics at Harvard University. Prior to this\, they earned their Ph.D. from the Department of Statistics at Stanford University\, where they were advised by Professors Emmanuel Candès and Stefan Wager. Before attending Stanford\, Shuangning obtained their Bachelor of Science degree from the University of Hong Kong.
URL:https://datascience.ucsd.edu/event/statistics-shuangning-li/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 4140\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240117T100000
DTEND;TZID=America/Los_Angeles:20240117T110000
DTSTAMP:20260531T175630
CREATED:20240111T171853Z
LAST-MODIFIED:20240116T232430Z
UID:10000421-1705485600-1705489200@datascience.ucsd.edu
SUMMARY:TILOS Seminar: Medical image reconstruction via deep learning: new architectures\, data reduction and theoretical guarantees
DESCRIPTION:Title: Medical image reconstruction via deep learning: new architectures\, data reduction and theoretical guarantees \nSpeaker: Mahdi Soltanolkotabi\, Director of the Center on AI Foundations for the Sciences (AIF4S) at USC \nZoom: https://ucsd.zoom.us/j/99334315002 \nAbstract: In this talk\, I will discuss the challenges and opportunities of using deep learning in medical image reconstruction. Contemporary techniques in this field rely on convolutional architectures that are limited by the spatial invariance of their filters and have difficulty modeling long-range dependencies. To remedy this\, I will discuss our work on designing a new transformer-based architecture\, HUMUS-Net\, that achieves state-of-the-art performance and does not suffer from these limitations. In the next part of the talk\, I will report on techniques that significantly reduce the amount of data required for training. Finally\, I will briefly discuss our recent attempts to develop rigorous theory for simple end-to-end training methods used in image reconstruction problems\, which is surprisingly challenging even for simple target functions. Notably\, our theory operates in the rich (beyond-NTK) regime\, which conforms with practical choices of hyperparameters. Time permitting\, I will discuss other exciting directions for the use of deep learning in MR. \nBio: Mahdi Soltanolkotabi is the director of the Center on AI Foundations for the Sciences (AIF4S) at the University of Southern California. He is also an associate professor in the Departments of Electrical and Computer Engineering\, Computer Science\, and Industrial and Systems Engineering\, where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC\, he completed his PhD in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. 
Mahdi is the recipient of the Information Theory Society Best Paper Award\, a Packard Fellowship in Science and Engineering\, an NIH Director’s New Innovator Award\, a Sloan Research Fellowship\, an NSF CAREER Award\, an Air Force Office of Scientific Research Young Investigator Award (AFOSR YIP)\, the Viterbi School of Engineering Junior Faculty Research Award\, and faculty awards from Google and Amazon. His research focuses on developing the mathematical foundations of modern data science by characterizing the behavior and pitfalls of contemporary nonconvex learning and optimization algorithms\, with applications in deep learning\, large-scale distributed training\, federated learning\, computational imaging\, and AI for scientific and medical applications.
URL:https://datascience.ucsd.edu/event/tilos-seminar-medical-image-reconstruction-via-deep-learning-new-architectures-data-reduction-and-theoretical-guarantees/
LOCATION:Computer Science & Engineering Building (CSE)\, Room 2154\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:Seminar
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2023/10/TILOS-Square_HDSI-Website-e1712854679822.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240112T110000
DTEND;TZID=America/Los_Angeles:20240112T120000
DTSTAMP:20260531T175630
CREATED:20240106T004540Z
LAST-MODIFIED:20240110T202635Z
UID:10000415-1705057200-1705060800@datascience.ucsd.edu
SUMMARY:Uncertainty Quantification for Interpretable Machine Learning | Lili Zheng
DESCRIPTION:Interpretable machine learning has been widely deployed for scientific discovery and decision-making\, but its reliability hinges on uncertainty quantification (UQ). In this talk\, I will discuss UQ in two challenging scenarios motivated by scientific and societal applications: selective inference for large-scale graph learning and UQ for model-agnostic machine learning interpretations. Specifically\, the first part concerns graphical model inference when only irregular\, patchwise observations are available\, a common setting in neuroscience\, healthcare\, genomics\, and econometrics. To filter out low-confidence edges arising from the irregular measurements\, I will present a novel inference method that quantifies the uneven edgewise uncertainty levels over the graph as well as an FDR control procedure; this is achieved by carefully disentangling the dependencies across the graph and consequently yields more reliable graph selection. In the second part\, I will discuss the computational and statistical challenges associated with UQ for the feature importance of any machine learning model. I will take inspiration from recent advances in conformal inference and use an ensemble framework to address these challenges. This leads to an almost computationally free\, assumption-light\, and statistically powerful inference approach for occlusion-based feature importance. For both parts of the talk\, I will highlight the potential applications of my research in science and society as well as how it contributes to more reliable and trustworthy data science. \nBio: Lili Zheng is a postdoctoral researcher in the Department of Electrical and Computer Engineering at Rice University\, mentored by Prof. Genevera I. Allen. Prior to this\, she obtained her Ph.D. degree from the Department of Statistics at the University of Wisconsin-Madison\, mentored by Prof. Garvesh Raskutti. 
Her research interests include graph learning\, interpretable machine learning\, uncertainty quantification\, tensor data analysis\, ensemble methods\, and time series. Her website can be found at https://lili-zheng-stat.github.io
URL:https://datascience.ucsd.edu/event/statistics-lili-zheng/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240110T140000
DTEND;TZID=America/Los_Angeles:20240110T150000
DTSTAMP:20260531T175630
CREATED:20240106T004438Z
LAST-MODIFIED:20240106T004438Z
UID:10000414-1704895200-1704898800@datascience.ucsd.edu
SUMMARY:Causal Inference for Complex Real-world Questions | Chan Park
DESCRIPTION:Abstract: Causal inference has gained popularity due to its ability to draw causal conclusions for real-world questions. While much of the existing work in causal inference revolves around determining average treatment effects under the assumptions of no interference and no unmeasured confounding\, these approaches may not be suitable for answering causal questions in complex settings. In this talk\, we focus on the following three topics\, each accompanied by relevant applications. \nFirst\, we discuss causal inference under interference and non-i.i.d. settings. Interference refers to a phenomenon where one’s outcome is affected by others’ treatment status; for instance\, one’s own risk of flu infection is affected by family members’ flu vaccination status. We explore two estimators\, a point estimator and a bound estimator\, developed to infer causal effects in the presence of interference. \nSecond\, we explore causal inference in the presence of unmeasured confounders\, variables that affect both treatment status and outcome. Using data on the Zika virus outbreak\, we demonstrate that ignoring the potential existence of unmeasured confounding can lead to incorrect causal conclusions. We then discuss recently developed frameworks\, including universal difference-in-differences and single proxy control\, that address the challenges posed by unmeasured confounding. \nFinally\, we turn our attention to optimal treatment regimes and policy learning. We introduce the concept of the minimum resource threshold policy (MRTP)\, defined as the minimum amount of treatment required to achieve a policy target. In particular\, we focus on estimating the MRTP of water\, sanitation\, and hygiene facilities in Senegal needed to achieve a desirable level of a public health indicator\, as guided by international organizations. 
\nBio: I am a postdoctoral researcher in the Department of Statistics and Data Science at the Wharton School\, University of Pennsylvania\, mentored by Prof. Eric J. Tchetgen. In May 2022\, I received my Ph.D. in Statistics from the University of Wisconsin-Madison where I was advised by Prof. Hyunseung Kang. Prior to joining the Ph.D. program in July 2017\, I worked as a statistician for two and a half years at the Central Bank of Korea. I received a B.S. in Statistics from Seoul National University.\nMy research broadly focuses on (a) causal inference under interference and non-i.i.d. settings\, (b) causal inference under unmeasured confounding\, and (c) optimal treatment regimes and policy learning. A common theme in my research is to use non/semiparametric theory and optimization methods to develop efficient and robust estimators of causal quantities in (a)-(c).
URL:https://datascience.ucsd.edu/event/causal-inference-for-complex-real-world-questions-chan-park/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20231204T140000
DTEND;TZID=America/Los_Angeles:20231204T153000
DTSTAMP:20260531T175630
CREATED:20231130T194107Z
LAST-MODIFIED:20231130T194107Z
UID:10000413-1701698400-1701703800@datascience.ucsd.edu
SUMMARY:Flat minima and generalization in deep learning: a case study in low rank matrix recovery
DESCRIPTION:Abstract: Recent advances in machine learning and artificial intelligence have relied on fitting highly overparameterized models\, notably deep neural networks\, to observed data. In such settings\, the number of parameters of the model is much greater than the number of data samples\, resulting in a continuum of models with near-zero training error. Understanding which of these models generalize well and which do not is the central open question in deep learning. Recent empirical evidence suggests one mechanism for generalization: the shape of the training loss around a local minimizer seems to strongly impact the model’s performance. In particular\, flat minima (those around which the loss grows slowly) appear to generalize well. Clarifying this phenomenon can shed new light on generalization in deep learning\, which still largely remains a mystery.\nI will describe our recent work that takes a step towards this goal by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing\, robust PCA\, covariance matrix estimation\, and single-hidden-layer neural networks with quadratic activation functions. In all cases\, we show that flat minima\, measured by the trace of the Hessian\, exactly recover the ground truth under standard statistical assumptions. These results suggest (i) a theoretical basis for favoring methods that bias iterates towards flat solutions and (ii) the use of the Hessian trace as a good regularizer for some learning tasks. We end by discussing the impact of depth on the generalization properties of flat solutions\, which\, surprisingly\, is not always beneficial. \n \nBio: Dmitriy Drusvyatskiy received his PhD from the Operations Research and Information Engineering department at Cornell University in 2013\, followed by a postdoctoral appointment in the Combinatorics and Optimization department at the University of Waterloo\, 2013-2014. 
He joined the Mathematics department at University of Washington as an Assistant Professor in 2014\, and was promoted to an Associate Professor in 2019\, and to Full Professor in 2022. Dmitriy’s research broadly focuses on designing and analyzing algorithms for large-scale optimization problems\, primarily motivated by applications in data science. Dmitriy has received a number of awards\, including the Air Force Office of Scientific Research (AFOSR) Young Investigator Program (YIP) Award\, NSF CAREER\, SIAG/OPT Best Paper Prize 2023\, Paul Tseng Faculty fellowship 2022-2026\, INFORMS Optimization Society Young Researcher Prize 2019\, and finalist citations for the Tucker Prize 2015 and the Young Researcher Best Paper Prize at ICCOPT 2019. Dmitriy is currently a co-PI of the NSF funded Transdisciplinary Research in Principles of Data Science (TRIPODS) institute at University of Washington.
URL:https://datascience.ucsd.edu/event/flat-minima-and-generalization-in-deep-learning-a-case-study-in-low-rank-matrix-recovery/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20231129T140000
DTEND;TZID=America/Los_Angeles:20231129T153000
DTSTAMP:20260531T175630
CREATED:20231127T190722Z
LAST-MODIFIED:20231129T183848Z
UID:10000412-1701266400-1701271800@datascience.ucsd.edu
SUMMARY:Detection and recovery of low-rank signals under heteroskedastic noise
DESCRIPTION:Title: Detection and recovery of low-rank signals under heteroskedastic noise\nAbstract: A fundamental task in data analysis is to detect and recover a low-rank signal in a noisy data matrix. Typically\, this task is addressed by inspecting and manipulating the spectrum of the observed data\, e.g.\, by thresholding the singular values of the data matrix at a certain critical level. This approach is well established for homoskedastic noise\, where the noise variance is identical across all entries. However\, in numerous applications\, such as single-cell RNA sequencing (scRNA-seq)\, the noise can be heteroskedastic\, with noise characteristics that vary considerably across the rows and columns of the data. In such scenarios\, the noise spectrum can differ significantly from the homoskedastic case\, which complicates signal detection and recovery. In this talk\, I will present a procedure for standardizing the noise spectrum by judiciously scaling the rows and columns of the data. Importantly\, this procedure can provably enforce the standard spectral behavior of homoskedastic noise\, namely the Marchenko-Pastur law. I will describe methods for estimating the required scaling factors directly from the observed data\, with suitable theoretical justification\, and demonstrate the advantages of the proposed approach for signal detection and recovery in simulations and on real scRNA-seq data.
URL:https://datascience.ucsd.edu/event/detection-and-recovery-of-low-rank-signals-under-heteroskedastic-noise/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Conference,Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20231115T140000
DTEND;TZID=America/Los_Angeles:20231115T150000
DTSTAMP:20260531T175630
CREATED:20231113T170847Z
LAST-MODIFIED:20231113T170847Z
UID:10000411-1700056800-1700060400@datascience.ucsd.edu
SUMMARY:Improving Technical Communication with End Users about Differential Privacy
DESCRIPTION:Title: Improving Technical Communication with End Users about Differential Privacy\n \nAbstract: Differential privacy (DP) is widely regarded as a gold standard for privacy-preserving computation over users’ data. A key challenge with DP is that its mathematical sophistication makes its privacy guarantees difficult to communicate to users\, leaving them uncertain about how and whether they are protected. Despite the recent widespread deployment of DP\, relatively little is known about what users think of differential privacy and how to effectively communicate the practical privacy guarantees it offers.\n \nThis talk will cover a series of recent and ongoing user studies aimed at measuring and improving communication with non-technical end users about differential privacy. The first set of studies explores users’ privacy expectations related to differential privacy and measures the efficacy of existing methods for communicating the privacy guarantees of DP systems. We find that users care about the kinds of information leaks against which differential privacy protects and are more willing to share their private information when the risk of these leaks is reduced. Additionally\, we find that the ways in which differential privacy is described in the wild set users’ privacy expectations haphazardly\, which can be misleading depending on the deployment. Motivated by these findings\, the second set of user studies develops and evaluates prototype descriptions designed to help end users understand DP guarantees. These descriptions target two important technical details in DP deployments that are often poorly communicated to end users: the privacy parameter epsilon (which governs the level of privacy protection) and the distinction between the local and central models of DP (which determines who can access exact user data).\n \nBased on joint work with Gabriel Kaptchuk\, Priyanka Nanayakkara\, Elissa Redmiles\, and Mary Anne Smart\, including https://arxiv.org/abs/2110.06452 and https://arxiv.org/abs/2303.00738. \n \nBio: Dr. Rachel Cummings is an Assistant Professor in the Fu Foundation School of Engineering and Applied Science at Columbia University. Her research interests lie primarily in data privacy\, with connections to machine learning\, algorithmic economics\, optimization\, statistics\, and information theory. Her work has focused on problems such as strategic aspects of data generation\, incentivizing truthful reporting of data\, privacy-preserving algorithm design\, impacts of privacy policy\, and human decision-making. Dr. Cummings is the recipient of an NSF CAREER award\, a Google Research Fellowship for the Simons Institute program on Data Privacy\, a Mozilla Research Grant\, the ACM SIGecom Doctoral Dissertation Honorable Mention\, the Amori Doctoral Prize in Computing and Mathematical Sciences\, a Caltech Leadership Award\, a Simons Award for Graduate Students in Theoretical Computer Science\, and the Best Paper Award at the 2014 International Symposium on Distributed Computing. Dr. Cummings also serves on the ACM U.S. Public Policy Council’s Privacy Committee. She received her Ph.D. in Computing and Mathematical Sciences from the California Institute of Technology\, her M.S. in Computer Science from Northwestern University\, and her B.A. in Mathematics and Economics from the University of Southern California.
URL:https://datascience.ucsd.edu/event/improving-technical-communication-with-end-users-about-differential-privacy/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Seminar
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20231114T073000
DTEND;TZID=America/Los_Angeles:20231114T120000
DTSTAMP:20260531T175630
CREATED:20231004T163650Z
LAST-MODIFIED:20231101T170913Z
UID:10000405-1699947000-1699963200@datascience.ucsd.edu
SUMMARY:Save the Date: Data Science Day 2023
DESCRIPTION:
URL:https://techsandiego.org/data-science-day/
LOCATION:3234 Matthews Ln\, La Jolla\, 92093\, United States
CATEGORIES:Symposium
ATTACH;FMTTYPE=image/png:https://datascience.ucsd.edu/wp-content/uploads/2023/10/STD-Data-Science-Day.png
END:VEVENT
END:VCALENDAR