  • This event has passed.
Event Series: Special Seminar Series

Building Human-AI Alignment: Specifying, Inspecting, and Modeling AI Behaviors | Serena Booth

February 26 @ 12:30 pm - 2:00 pm

Abstract: The learned behaviors of AI and robot agents should align with the intentions of their human designers. Alignment is necessary for AI systems to be used in many sectors of the economy, and so the process of aligning AI systems becomes critical to study for defining effective AI policy. Toward this goal, people must be able to easily specify, inspect, and model agent behaviors. For specifications, we will consider expert-written reward functions for reinforcement learning (RL) and non-expert preferences for reinforcement learning from human feedback (RLHF). I will show evidence that experts are bad at writing reward functions: even in a trivial setting, experts write specifications that are overfit to a particular RL algorithm, and they often write erroneous specifications for agents that fail to encode their true intent. I will also show that the common approach to learning a reward function from non-experts in RLHF uses an inductive bias that fails to encode how humans express preferences, and that our proposed bias better encodes human preferences both theoretically and empirically. I will discuss the policy implications: namely, that engineers’ design processes and embedded assumptions in building AI must be considered. For inspection, humans must be able to assess the behaviors an agent learns from a given specification. I will discuss a method to find settings that exhibit particular behaviors, like out-of-distribution failures. I will discuss the policy implications for testing AI systems, for example through red teaming. Lastly, cognitive science theories attempt to show how people build conceptual models that explain agent behaviors. I will show evidence that some of these theories are used in research to support humans, but that we can still build better curricula for modeling. I will discuss the policy need for careful onboarding to AI systems. I will end by discussing my current work in the U.S. Senate on responding to the proliferation of AI. 
Collectively, my research provides evidence that, even with the best of intentions, current human-AI systems often fail to induce alignment, and it proposes promising directions for building better-aligned human-AI systems.


Event Category: HDSI General
Event Tags: Serena Booth