Name: The Synergy between Machine Learning and the Natural Sciences | Max Welling
Start: 2024-02-21T14:00:00-08:00
End: 2024-02-21T15:30:00-08:00
Location: Halıcıoğlu Data Science Institute (HDSI), Room 123

February 2024
EnCORE : Theoretical Exploration of Foundation Model Adaptation, Kangwook Lee, UW Madison, Feb 9th, 1-2pm

February 9, 2024 @ 1:00 pm - 2:00 pm EnCORE Series

Atkinson Hall, Fourth Floor

Abstract: Due to the enormous size of foundation models, various new methods for efficient model adaptation have been developed. Parameter-efficient fine-tuning (PEFT) is an adaptation method that updates only a tiny fraction of the model parameters, leaving the remainder unchanged. In-context learning (ICL) is a test-time adaptation method, which repurposes foundation models by providing them with labeled samples as part of the input context. Given the growing importance of this emerging paradigm, developing its theoretical foundations is essential.
Integrating Longitudinal Multimodal Data To Realize Precision Medicine | Samantha Piekos

February 12, 2024 @ 2:00 pm - 3:30 pm Special Seminar Series

Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

Abstract: The interplay of biology, environment, and lifestyle directs the development and progression of complex diseases and other health outcomes. Therefore, integration of longitudinal multimodal data is needed to understand the mechanisms underpinning major molecular transitions. Previously, during my doctoral work at Stanford, I integrated multiomics data to elucidate the epigenetic mechanism of human surface ectoderm differentiation. I also built a pipeline to investigate the role of polymorphism, particularly non-coding genetic variants, in complex diseases. To address the common pain point of data silos limiting the interpretation of multimodal data integration, I formed a collaboration with Google Data Commons to build a free, open-source biomedical knowledge graph with a common schema and API. Currently, it is composed of approximately 130 million nodes and 1.7 trillion triples (node-edge-node) from 22 publicly available biomedical datasets. Knowledge graphs are a key tool for hypothesis generation, data interpretation, and dimensionality reduction required for systems medicine research. Upon starting my postdoctoral work at the Institute for Systems Biology, I identified pregnancy as an excellent model system for prototyping precision medicine approaches. I used electronic healthcare records (EHR) from Providence St. Joseph Healthcare to investigate the impact of COVID-19 maternal infection and vaccination on maternal-fetal outcomes. In addition, I integrated multiomics placental data to investigate molecular network changes (interomics and intraomics) in common obstetric disorders. In a follow-up study (enrollment complete), we have longitudinal deep-phenotyping data of 435 people throughout pregnancy, 80 of whom have pregnancy complications. This includes multiomics, survey, EHR, and air quality data collected from the first prenatal visit through delivery. My lab will use this data to define major molecular transition states throughout pregnancy. I will also investigate the disease mechanisms of common obstetric disorders, including identifying, for an individual, the earliest possible point of deviation from a healthy trajectory. This interdisciplinary approach will identify potential drug targets, biomarker panels, and individualized clinical interventions.
Principled Approaches for Trustworthy Algorithms, Statistics, and Machine Learning | Gautam Kamath

February 13, 2024 @ 2:30 pm - 4:00 pm Special Seminar Series

Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

Abstract: Despite impressive recent advances, machine learning models exhibit a number of critical deficiencies. They are prone to leaking sensitive information about their training data. They remain alarmingly brittle to attacks by malicious parties. Troublingly, these issues stem from more fundamental statistical vulnerabilities, which remain unresolved even decades later, highlighting significant gaps in our understanding of how to deal with these important considerations. As long as these problems remain, our models will not be appropriate for use beyond deployment in toy settings. In this talk, I will discuss recent advances on a number of these problems, which give key new algorithmic insights into how to address these considerations, and enable real-world deployments that were previously thought infeasible. In a first vignette, we will explore how to guarantee individual privacy in machine learning models, with a particular focus on large language models and the important role played by public data in the training pipeline. In a second vignette, we focus on how to robustly perform mean estimation, giving the first efficient and accurate algorithms for multivariate settings. We will go on to discuss connections to robustness against data poisoning attacks, robust exploratory data analysis, and surprising conceptual and technical connections with privacy.
Computational approaches for uncovering implicit strategies in political discourse | Julia Mendelsohn

February 13, 2024 @ 8:00 am - 5:00 pm Special Seminar Series

GPS, Robinson Building Complex (RBC), 3106

Abstract: When discussing politics, people often use subtle linguistic strategies to influence how their audience thinks about issues, which can then impact public opinion and policy. For example, anti-immigration activists may frame immigration as a threat to native born citizens’ jobs, describe immigrants with dehumanizing vermin-related metaphors, or even use coded expressions to covertly connect […]
On Data Ecology, Data Markets, the Value of Data, and Dataflow Governance | Raul Castro Fernandez

February 13, 2024 @ 1:00 pm - 2:00 pm Seminar Series

Abstract: Data shapes our social, economic, cultural, and technological environments. Data is valuable, so people seek it, inducing data to flow. The resulting dataflows distribute data and thus value. For example, large Internet companies profit from accessing data from their users, and engineers of large language models seek large and diverse data sources to train powerful models. It is possible to judge the impact of data in an environment by analyzing how the dataflows in that environment impact the participating agents. My research hypothesizes that it is also possible to design (better) data environments by controlling what dataflows materialize; not only can we analyze environments but also synthesize them. In this talk, I present the research agenda on “data ecology,” which seeks to build the principles, theory, algorithms, and systems to design beneficial data environments. I will also present examples of data environments my group has designed, including data markets for machine learning, data-sharing, and data integration. I will conclude by discussing the impact of dataflows in data governance and how the ideas are interwoven with the concepts of trust, privacy, and the elusive notion of “data value.” As part of the technical discussion, I will complement the data market designs with the design of a data escrow system that permits controlling dataflows.
Enabling Performant and Trustworthy Learning-enabled CPS-IoT Systems | Mani Srivastava

February 14, 2024 @ 2:00 pm - 3:30 pm Special Seminar Series

Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

Abstract: "The previously discrete technologies of IoT and AI have now entered a tight virtuous embrace. IoT allows sensing and actuation in our physical, social, and urban spaces with unimaginable ubiquity. AI allows sophisticated inferences and decisions to be made algorithmically using deep neural networks, even from unstructured and high-dimensional data, with uncanny performance. Together they seek to perform sophisticated perception-cognition-communication-action loops in diverse applications. However, designers of learning-enabled IoT systems face the challenge of extremely resource-constrained edge platforms operating in uncertain environments while assuring performance and trustworthiness. Moreover, in many applications, the systems go beyond taking actions based on rich inferences about the world state to perform long-term reasoning about complex events and obey the underlying physics, rules, and constraints. Based on our experience in designing such systems in applications including mHealth, ocean animal health, agriculture robotics, and military, this talk explores meeting these challenges through a combination of (i) neurosymbolic architectures that allow the incorporation of physics awareness and human knowledge while enhancing user trust, (ii) automatic platform-aware architecture search and code generation, and (iii) techniques to efficiently adapt to the deployment environment."
Targeting humanitarian aid with machine learning and digital data | Emily Aiken

February 15, 2024 @ 12:30 pm - 2:00 pm Special Seminar Series

GPS, Robinson Building Complex (RBC), 3203

Abstract: The majority of humanitarian aid and social protection programs globally are targeted, providing assistance to individuals or communities identified to be poorest or most in need. In low- and middle-income countries, the targeting of aid programs is often limited by low-quality, out-of-date, or missing data on poverty and vulnerability. Novel "big" digital data sources, […]
Learning Inductive Representations for Reasoning over Knowledge Graphs | Zhaocheng Zhu

February 20, 2024 @ 1:00 pm - 2:30 pm

Computer Science & Engineering Building (CSE), Room 1242, 3234 Matthews Ln, La Jolla, CA, United States

Abstract: Reasoning, the ability to logically draw conclusions from existing knowledge, has long been pursued as a goal of artificial intelligence. Although numerous learning algorithms have been developed for reasoning, most of them are limited to the domain they are trained on. By contrast, humans often derive high-level rules or principles from experience and apply them to new domains, an ability referred to as inductive generalization. In this talk, we present a series of works that learn inductive representations for reasoning over knowledge graphs. First, we introduce Neural Bellman-Ford Networks (NBFNet), which capture paths between entities and can generalize to graphs of new entities. Then we discuss Graph Neural Network Query Executor (GNN-QE), an extension of NBFNet that answers multi-hop logical queries and generalizes well on our inductive benchmark. Finally, by learning inductive representations for both entities and relations, we demonstrate that a model can generalize to any graph with arbitrary entity and relation vocabularies, paving the way for foundation models for knowledge graph reasoning.
Computational approaches for uncovering implicit strategies in political discourse | Julia Mendelsohn

February 21, 2024 @ 12:30 pm - 2:00 pm Special Seminar Series

GPS, Robinson Building Complex (RBC), 3106

Abstract: When discussing politics, people often use subtle linguistic strategies to influence how their audience thinks about issues, which can then impact public opinion and policy. For example, anti-immigration activists may frame immigration as a threat to native-born citizens’ jobs, describe immigrants with dehumanizing vermin-related metaphors, or even use coded expressions to covertly connect immigration with antisemitic conspiracy theories. This talk will focus on the development of computational approaches to analyze three strategies: framing, dehumanization, and dogwhistle communication. I will discuss how I draw from multiple social science disciplines to develop typologies and curate data resources, as well as how I build and evaluate natural language processing models for detecting these strategies. I further analyze the use of these strategies in political discourse across several domains, and assess the implications of such nuanced rhetoric for both society and technology.
The Synergy between Machine Learning and the Natural Sciences | Max Welling

February 21, 2024 @ 2:00 pm - 3:30 pm Distinguished Lecturer Series

Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

Abstract: Traditionally, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g., MCMC, belief propagation, and diffusion-based generative AI). We have recently witnessed that the flow of information has also reversed, with new tools developed in the ML community impacting physics, chemistry, and biology. Examples include faster DFT, force-field accelerated MD simulations, PDE neural surrogate models, generating druglike molecules, and many more. In this talk, I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks.
