Event Series: TILOS Seminar Series

TILOS Seminar: Transformers learn in-context by (functional) gradient descent

April 17 @ 10:00 am - 11:00 am

Xiang Cheng, TILOS Postdoctoral Scholar at MIT

Location: HDSI 123 and Zoom

Abstract: Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric, which allows it to learn a broad class of functions in-context. Additionally, we show that the RKHS metric is determined by the choice of attention activation, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.
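
To make the first claim of the abstract concrete, here is a minimal sketch of why softmax-free (linear) attention can reproduce a gradient descent step on in-context data. It is an illustration under stated assumptions, not the construction presented in the talk: the function names, the step size `eta`, the zero initialization, and the choice of keys `x_i`, values `y_i`, and a query token `x_query` are all hypothetical choices made for this example.

```python
import numpy as np


def gd_one_step_prediction(X, y, x_query, eta):
    """Predict at x_query after one gradient step on least squares, starting from w = 0."""
    n, d = X.shape
    w0 = np.zeros(d)
    grad = X.T @ (X @ w0 - y) / n   # gradient of (1/2n) * sum_i (w . x_i - y_i)^2
    w1 = w0 - eta * grad
    return float(w1 @ x_query)


def linear_attention_prediction(X, y, x_query, eta):
    """Same prediction via softmax-free attention: keys x_i, values y_i, query x_query."""
    n = X.shape[0]
    scores = X @ x_query            # unnormalized dot-product scores, no softmax
    return float(eta / n * (scores @ y))


# Assumed toy data: noiseless linear labels from a random ground-truth vector.
rng = np.random.default_rng(0)
n, d = 32, 4
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true
x_query = rng.normal(size=d)

pred_gd = gd_one_step_prediction(X, y, x_query, eta=0.5)
pred_attn = linear_attention_prediction(X, y, x_query, eta=0.5)
print(np.isclose(pred_gd, pred_attn))
```

Both functions compute (eta / n) * sum_i y_i (x_i . x_query), so the two predictions agree up to floating-point error. Replacing the dot product with a kernel k(x_i, x_query) would give the analogous single step of functional gradient descent in the corresponding RKHS, which is one way to read the abstract's link between the attention activation and the RKHS metric.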

