
Event Series: Special Seminar Series

Universal Learning for Decision-Making | Moise Blanchard

January 26 @ 11:00 am - 12:30 pm

We provide general-use decision-making algorithms under provably minimal assumptions on the data, using the universal learning framework. Classically, learning guarantees typically require two types of assumptions: (1) restrictions on the target policies to be learned and (2) assumptions on the data-generating process. Instead, we show that we can provide consistent algorithms with vanishing regret compared to the best policy in hindsight, (1) irrespective of the optimal policy, a property known as universal consistency, and (2) well beyond standard i.i.d. or stationarity assumptions on the data. We present our results for the classical online regression problem as well as for the contextual bandit problem, in which the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. More precisely, we give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Surprisingly, for finite action spaces, the universally learnable processes are the same for contextual bandits as for the supervised learning setting, suggesting that going from full feedback (supervised learning) to partial feedback (contextual bandits) comes at no extra cost in terms of learnability. We then show that there always exists an algorithm that guarantees universal consistency whenever this is achievable. In particular, such an algorithm is universally consistent under provably minimal assumptions: if it fails to be universally consistent for some context-generating process, then no other algorithm can succeed either. In the case of finite action spaces, this algorithm balances a fine trade-off between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts).
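To make the contextual bandit protocol in the abstract concrete, the sketch below runs a toy epsilon-greedy learner over a finite action set: at each round it observes a context, picks an action, and sees the reward of only that action (partial feedback). This is purely illustrative and is not the universal-consistency algorithm from the talk; the `reward_fn` interface and the per-context value tables are assumptions of this example.

```python
import random

def run_contextual_bandit(contexts, reward_fn, n_actions, eps=0.1, seed=0):
    """Toy epsilon-greedy learner for a finite-action contextual bandit.

    contexts  : iterable of observed contexts (hashable), revealed one per round
    reward_fn : reward_fn(context, action) -> reward in [0, 1] (hypothetical
                stand-in for the environment; only the chosen arm's reward is seen)
    Returns the cumulative reward collected by the learner.
    """
    rng = random.Random(seed)
    counts = {}   # (context, action) -> number of pulls
    values = {}   # (context, action) -> running mean reward estimate
    total = 0.0
    for x in contexts:
        if rng.random() < eps:
            a = rng.randrange(n_actions)  # explore uniformly
        else:
            # exploit: best estimated action for this specific context
            a = max(range(n_actions), key=lambda b: values.get((x, b), 0.0))
        r = reward_fn(x, a)               # partial feedback: chosen arm only
        total += r
        c = counts.get((x, a), 0) + 1
        counts[(x, a)] = c
        # incremental running-mean update of the reward estimate
        old = values.get((x, a), 0.0)
        values[(x, a)] = old + (r - old) / c
    return total
```

Keeping one estimate per (context, action) pair is the crude "personalization" extreme of the generalization/personalization trade-off mentioned above: it tailors actions to each context but shares no information across contexts.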

Bio: Moïse Blanchard is a final-year PhD student at MIT, working with Prof. Patrick Jaillet. He obtained his MSc in applied mathematics from École Polytechnique, graduating as valedictorian. His research focuses on algorithms for decision-making and statistical learning. His work has been recognized with a best student paper runner-up award at COLT and a best student paper award from the INFORMS TSL Society.

Details

Date: January 26
Time: 11:00 am - 12:30 pm
Series: Special Seminar Series

Venue

3234 Matthews Ln
La Jolla, CA 92093 United States

Organizer

HDSI General

Other

Format: Hybrid
Speaker: Moise Blanchard
Event Recording Link: http://bit.ly/HDSI-Seminars