Event Series: Special Seminar Series

Building Human-AI Alignment: Specifying, Inspecting, and Modeling AI Behaviors | Serena Booth

GPS, Robinson Building Complex (RBC), 3106

Abstract: The learned behaviors of AI and robot agents should align with the intentions of their human designers. Alignment is necessary for AI systems to be used in many sectors of the economy, and so the process of aligning AI systems becomes critical to study for defining effective AI policy. Toward this goal, people must be able to easily specify, inspect, and model agent behaviors. For specifications, we will consider expert-written reward functions for reinforcement learning (RL) and non-expert preferences for reinforcement learning from human feedback (RLHF). I will show evidence that experts are bad at writing reward functions: even in a trivial setting, experts write specifications that are overfit to a particular RL algorithm, and they often write erroneous specifications for agents that fail to encode their true intent. I will also show that the common approach to learning a reward function from non-experts in RLHF uses an inductive bias that fails to encode how humans express preferences, and that our proposed bias better encodes human preferences both theoretically and empirically. I will discuss the policy implications: namely, that engineers' design processes and embedded assumptions in building AI must be considered. For inspection, humans must be able to assess the behaviors an agent learns from a given specification. I will discuss a method to find settings that exhibit particular behaviors, like out-of-distribution failures. I will discuss the policy implications for testing AI systems, for example through red teaming. Lastly, cognitive science theories attempt to show how people build conceptual models that explain agent behaviors. I will show evidence that some of these theories are used in research to support humans, but that we can still build better curricula for modeling. I will discuss the policy need for careful onboarding to AI systems. I will end by discussing my current work in the U.S. Senate on responding to the proliferation of AI. 
Collectively, my research provides evidence that, even with the best of intentions, current human-AI systems often fail to induce alignment, and it proposes promising directions for building better-aligned human-AI systems.

Event Series: Special Seminar Series

Computational approaches for uncovering implicit strategies in political discourse | Julia Mendelsohn

GPS, Robinson Building Complex (RBC), 3106

Abstract: When discussing politics, people often use subtle linguistic strategies to influence how their audience thinks about issues, which can then impact public opinion and policy. For example, anti-immigration activists may frame immigration as a threat to native-born citizens’ jobs, describe immigrants with dehumanizing vermin-related metaphors, or even use coded expressions to covertly connect immigration with antisemitic conspiracy theories. This talk will focus on the development of computational approaches to analyze three strategies: framing, dehumanization, and dogwhistle communication. I will discuss how I draw from multiple social science disciplines to develop typologies and curate data resources, as well as how I build and evaluate natural language processing models for detecting these strategies. I further analyze the use of these strategies in political discourse across several domains, and assess the implications of such nuanced rhetoric for both society and technology.
