TILOS Seminar: Transformers learn in-context by (functional) gradient descent
Virtual
Xiang Cheng, TILOS Postdoctoral Scholar at MIT
HDSI 123 and Zoom: https://ucsd.zoom.us/j/99334315002
Abstract: Motivated by the in-context learning phenomenon, we investigate how the Transformer neural […]