
HDSI Seminar Series: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate by Quanquan Gu

Quanquan Gu, Assistant Professor of Computer Science at UCLA

Abstract: Understanding the algorithmic bias of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus […]

HDSI Seminar Series: Deep Learning for Market Design: Fairness, Robustness, and Expressiveness by John Dickerson

John Dickerson, Assistant Professor of Computer Science, University of Maryland; Chief Scientist, Arthur AI

Abstract: The design of revenue-maximizing auctions with strong incentive guarantees is a core concern of economic theory. Computational auctions enable online advertising, sourcing, spectrum allocation, and myriad financial markets. Analytic progress in this […]

Event Series: Colloquia Lecture Series

Structured Transformer Models for NLP

SDSC, The Synthesis Center, 9500 Gilman Drive, La Jolla, CA, United States

The field of natural language processing has recently unlocked a wide range of new capabilities through the use of large language models, such as GPT-4. The growing application of these models motivates developing a more thorough understanding of how and why they work, as well as further improvements in both quality and efficiency.

In this talk, I will present my work on analyzing and improving the Transformer architecture underlying today's language models through the study of how information is routed between multiple words in an input. I will show that such models can predict the syntactic structure of text in a variety of languages, and discuss how syntax can inform our understanding of how the networks operate. I will also present my work on structuring information flow to build radically more efficient models, including models that can process text of up to one million words, which enables new possibilities for NLP with book-length text.
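The "routing of information between words" that this abstract studies is implemented by attention. As a hedged illustration (the dimensions, random weights, and inputs below are toy choices, not the speaker's models), here is a minimal NumPy sketch of scaled dot-product self-attention, the core Transformer operation:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors.

    X: (seq_len, d_model) input word representations.
    Returns: (seq_len, d_model), where each output row is a weighted
    mixture of value vectors -- the routing of information between words.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise word-to-word affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over source words
    return weights @ V                              # mix information across positions

rng = np.random.default_rng(0)
seq_len, d = 5, 8                                   # 5 "words", 8-dim vectors (toy sizes)
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Each output row is a convex combination of the value vectors, so the softmax weights describe exactly how much information flows from every word to every other word.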

Scaling and Generalizing Approximate Bayesian Inference | David Blei

SDSC, The Auditorium, 9836 Hopkins Dr, La Jolla, San Diego, CA, United States

A core problem in statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this talk I review and discuss innovations in variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and Bayesian statistics.
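As a rough, self-contained illustration of approximating a distribution through optimization (the target density, learning rate, and sample count below are toy choices, not material from the talk), this sketch fits a Gaussian q(z) = N(mu, sigma^2) by gradient ascent on a reparameterized Monte Carlo estimate of the ELBO:

```python
import numpy as np

# Toy target: unnormalized log-density of N(3, 0.5^2).
def grad_log_p(z):
    return -(z - 3.0) / 0.25

rng = np.random.default_rng(0)
mu, rho = 0.0, 0.0              # variational parameters; sigma = exp(rho) > 0
lr, n_samples = 0.05, 64

for step in range(2000):
    sigma = np.exp(rho)
    eps = rng.normal(size=n_samples)
    z = mu + sigma * eps        # reparameterization trick: z ~ q(z)
    g = grad_log_p(z)
    # ELBO(mu, rho) = E_q[log p(z)] + entropy(q); the Gaussian entropy is
    # rho + const, so its rho-derivative is 1, and mu enters only via log p.
    grad_mu = g.mean()
    grad_rho = (g * sigma * eps).mean() + 1.0
    mu += lr * grad_mu
    rho += lr * grad_rho

print(mu, np.exp(rho))          # approaches the target's mean 3 and std 0.5
```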

Event Series: Colloquia Lecture Series

Algorithms for multi-group learning

SDSC, The Synthesis Center, 9500 Gilman Drive, La Jolla, CA, United States

Abstract: Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population. The criterion addresses recent practical concerns such as subgroup fairness and hidden stratification. I'll talk about the structure of solutions to the multi-group learning problem, as well as some simple and near-optimal algorithms for it. This is based on joint work with Christopher Tosh.
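One hedged way to make the criterion concrete: audit a predictor by comparing its conditional risk within each subgroup against the best risk any benchmark hypothesis achieves on that subgroup. The data, groups, and threshold class in this sketch are hypothetical illustrations, not the algorithms from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
groups = rng.integers(0, 3, size=n)          # 3 toy subgroups of the population
y = (x + 0.3 * groups > 0.5).astype(int)     # labels depend on group membership

# Benchmark class H: threshold classifiers x > t for a few thresholds.
H = [lambda x, t=t: (x > t).astype(int) for t in np.linspace(-1, 1, 9)]
f = lambda x: (x > 0.0).astype(int)          # the predictor being audited

def risk(h, x, y):
    return np.mean(h(x) != y)

# Multi-group criterion (per the abstract): within every subgroup g, the
# predictor's conditional risk should be close to the best achievable in H on g.
for g in range(3):
    mask = groups == g
    r_f = risk(f, x[mask], y[mask])
    r_best = min(risk(h, x[mask], y[mask]) for h in H)
    print(f"group {g}: risk(f)={r_f:.3f}, best in H={r_best:.3f}, gap={r_f - r_best:.3f}")
```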

Event Series: Colloquia Lecture Series

Spectral clustering in high-dimensional Gaussian mixture block models

SDSC, The Synthesis Center, 9500 Gilman Drive, La Jolla, CA, United States

The Gaussian mixture block model is a simple generative model for networks: to generate a sample, we associate each node with a latent feature vector sampled from a mixture of Gaussians, and we add an edge between nodes if and only if their feature vectors are sufficiently similar. The different components of the Gaussian mixture […]
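The generative process is concrete enough to sketch directly. In the toy sampler below, the mixture parameters and the cosine-similarity threshold are illustrative assumptions (the abstract only says "sufficiently similar"):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 16, 2                          # nodes, feature dimension, mixture components

# Latent features: each node draws its vector from one of k Gaussians.
means = rng.normal(scale=2.0, size=(k, d))
labels = rng.integers(0, k, size=n)
Z = means[labels] + rng.normal(size=(n, d))

# Edge rule: connect i and j iff their feature vectors are sufficiently
# similar (here: cosine similarity above tau -- an illustrative choice).
tau = 0.4
Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
sim = Zn @ Zn.T
A = (sim > tau).astype(int)
np.fill_diagonal(A, 0)                        # no self-loops

same = labels[:, None] == labels[None, :]
print("edge rate, same component:", A[same].mean())
print("edge rate, different components:", A[~same].mean())
```

Nodes drawn from the same mixture component end up far more densely connected, which is the latent community structure that spectral clustering aims to recover.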

Event Series: Colloquia Lecture Series

Representation Learning: A Causal Perspective

SDSC, The Auditorium, 9836 Hopkins Dr, La Jolla, San Diego, CA, United States

Abstract: Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data like images and texts. Ideally, such a representation should efficiently capture non-spurious features of the data. It should also be disentangled, so that we can interpret which feature each of its dimensions captures. However, these desiderata are often intuitively defined and challenging to quantify or enforce.

Event Series: Colloquia Lecture Series

On the complexity of Frank-Wolfe methods

SDSC, The Auditorium, 9836 Hopkins Dr, La Jolla, San Diego, CA, United States

Abstract: Frank-Wolfe methods are popular for optimization over a polytope. One reason is that they require only linear optimization over the polytope rather than projection onto it. This talk has two parts. The first part will be about the complexity of Wolfe's method, an algorithm closely related to Frank-Wolfe methods. In 1974 […]
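To illustrate the projection-free property, here is a minimal Frank-Wolfe sketch on the probability simplex, where the linear minimization step reduces to picking a single vertex; the quadratic objective and 2/(k+2) step size are standard textbook choices, not specifics from the talk:

```python
import numpy as np

# Minimize f(x) = 0.5 * ||x - b||^2 over the probability simplex.
b = np.array([0.1, 0.5, 0.2, 0.9])
grad = lambda x: x - b

x = np.full(4, 0.25)                       # start at the simplex barycenter
for k in range(200):
    g = grad(x)
    # Linear minimization oracle over the simplex: argmin_{s in simplex} <g, s>
    # is the vertex e_i at the smallest gradient coordinate -- no projection needed.
    s = np.zeros_like(x)
    s[np.argmin(g)] = 1.0
    gamma = 2.0 / (k + 2.0)                # classic Frank-Wolfe step size
    x = x + gamma * (s - x)

print(x)  # close to the Euclidean projection of b onto the simplex
```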

Deep Latent Variable Models for Compression and Natural Science | Stephan Mandt

CSE 1202

Latent variable models have been an integral part of probabilistic machine learning, ranging from simple mixture models to variational autoencoders to powerful diffusion probabilistic models at the center of recent media attention. Perhaps less well-appreciated is the intimate connection between latent variable models and data compression, and the potential of these models for advancing natural science. This talk will explore these topics.