  • The Synergy between Machine Learning and the Natural Sciences | Max Welling

    Distinguished Lecturer Series
    Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

    Abstract: Traditionally, machine learning has been heavily influenced by neuroscience (hence the name artificial neural networks) and physics (e.g., MCMC, belief propagation, and diffusion-based generative AI). We have recently witnessed the flow of information reverse as well, with new tools developed in the ML community impacting physics, chemistry, and biology. Examples include faster DFT, force-field-accelerated MD simulations, neural surrogate models for PDEs, the generation of drug-like molecules, and many more. In this talk I will review the exciting opportunities for further cross-fertilization between these fields, ranging from faster (classical) DFT calculations and enhanced transition path sampling to traveling waves in artificial neural networks.

  • Enabling Performant and Trustworthy Learning-enabled CPS-IoT Systems | Mani Srivastava

    Special Seminar Series
    Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

    Abstract: The previously separate technologies of IoT and AI have now entered a tight, virtuous embrace. IoT allows sensing and actuation in our physical, social, and urban spaces with unimaginable ubiquity. AI allows sophisticated inferences and decisions to be made algorithmically using deep neural networks, even from unstructured and high-dimensional data, with uncanny performance. Together they seek to perform sophisticated perception-cognition-communication-action loops in diverse applications. However, designers of learning-enabled IoT systems face the challenge of assuring performance and trustworthiness on extremely resource-constrained edge platforms operating in uncertain environments. Moreover, in many applications the systems go beyond taking actions based on rich inferences about the world state: they must perform long-term reasoning about complex events and obey the underlying physics, rules, and constraints. Drawing on our experience designing such systems for applications including mHealth, ocean animal health, agricultural robotics, and the military, this talk explores meeting these challenges through a combination of (i) neurosymbolic architectures that allow the incorporation of physics awareness and human knowledge while enhancing user trust, (ii) automatic platform-aware architecture search and code generation, and (iii) techniques for efficiently adapting to the deployment environment.

  • Principled Approaches for Trustworthy Algorithms, Statistics, and Machine Learning | Gautam Kamath

    Special Seminar Series
    Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

    Abstract: Despite impressive recent advances, machine learning models exhibit a number of critical deficiencies. They are prone to leaking sensitive information about their training data, and they remain alarmingly brittle to attacks by malicious parties. Troublingly, these issues stem from more fundamental statistical vulnerabilities that have remained unresolved for decades, highlighting significant gaps in our understanding of how to deal with these important considerations. As long as these problems persist, our models will not be appropriate for deployment beyond toy settings. In this talk, I will discuss recent advances on a number of these problems, which yield key new algorithmic insights into how to address these considerations and enable real-world deployments that were previously thought infeasible. In the first vignette, we will explore how to guarantee individual privacy in machine learning models, with a particular focus on large language models and the important role played by public data in the training pipeline. In the second vignette, we will focus on how to robustly perform mean estimation, giving the first efficient and accurate algorithms for multivariate settings. We will then discuss connections to robustness against data poisoning attacks, robust exploratory data analysis, and surprising conceptual and technical connections with privacy.

  • Integrating Longitudinal Multimodal Data To Realize Precision Medicine | Samantha Piekos

    Special Seminar Series
    Halıcıoğlu Data Science Institute (HDSI), Room 123, 3234 Matthews Ln, La Jolla, CA, United States

    Abstract: The interplay of biology, environment, and lifestyle directs the development and progression of complex diseases and other health outcomes. Therefore, integrating longitudinal multimodal data is necessary to understand the mechanisms underpinning major molecular transitions. During my doctoral work at Stanford, I integrated multiomics data to elucidate the epigenetic mechanism of human surface ectoderm differentiation. I also built a pipeline to investigate the role of polymorphisms, particularly non-coding genetic variants, in complex diseases. To address the common pain point of data silos limiting the interpretation of integrated multimodal data, I formed a collaboration with Google Data Commons to build a free, open-source biomedical knowledge graph with a common schema and API. It currently comprises approximately 130 million nodes and 1.7 trillion triples (node-edge-node) from 22 publicly available biomedical datasets. Knowledge graphs are a key tool for the hypothesis generation, data interpretation, and dimensionality reduction required for systems medicine research. Upon starting my postdoctoral work at the Institute for Systems Biology, I identified pregnancy as an excellent model system for prototyping precision medicine approaches. I used electronic health records (EHR) from Providence St. Joseph Healthcare to investigate the impact of maternal COVID-19 infection and vaccination on maternal-fetal outcomes. In addition, I integrated multiomics placental data to investigate molecular network changes (interomics and intraomics) in common obstetric disorders. In a follow-up study (enrollment complete), we have longitudinal deep-phenotyping data from 435 people throughout pregnancy, 80 of whom have pregnancy complications. These data include multiomics, survey, EHR, and air quality data collected from the first prenatal visit through delivery. My lab will use these data to define major molecular transition states throughout pregnancy. I will also investigate the disease mechanisms of common obstetric disorders, including identifying, for each individual, the earliest possible point of deviation from a healthy trajectory. This interdisciplinary approach will identify potential drug targets, biomarker panels, and individualized clinical interventions.