Abstract:
Machine learning (ML) over tabular data has become ubiquitous, with applications in many domains. This success has led to the rise of ML platforms, including automated ML (AutoML) platforms, that manage the end-to-end ML workflow. The tedious grunt work involved in data preparation (prep) reduces data scientist productivity and slows down the ML development lifecycle, which makes automating data prep all the more critical. While many works have looked into automating feature engineering and model selection in end-to-end ML workflows, little attention has been paid to understanding automated data prep for ML. Automating data prep remains challenging for several reasons, including semantic gaps and the lack of objective ways to measure accuracy.
In this work, we aim to address these challenges by abstracting data prep in terms of ML-readiness properties of the data, which helps simplify and automate it. In the first part of the talk, we present how we leverage database schema information to reduce the burden of procuring datasets for ML. In the remaining part, we first discuss our vision of systematically benchmarking and automating ML data prep by formalizing data prep tasks as applied ML tasks. We then present a case study of our approach on a key data prep task: ML feature type inference. Our approach not only outperforms state-of-the-art AutoML tools on this task but also improves the performance of the downstream model. We conclude by discussing our research plans to tackle another major ML data prep task.
Vraj Shah is inviting you to a scheduled Zoom meeting.
Topic: Vraj Shah Thesis Proposal
Time: Dec 17, 2020 01:00 PM Pacific Time (US and Canada)
Join Zoom Meeting
https://ucsd.zoom.us/j/6922746284?pwd=SEtoYi9TQndtQVFuU0JubVpSSHB6dz09
Meeting ID: 692 274 6284
Password: proposal
One tap mobile
+16692192599,,6922746284# US (San Jose)
+16699006833,,6922746284# US (San Jose)
Dial by your location
+1 669 219 2599 US (San Jose)
+1 669 900 6833 US (San Jose)
+1 213 338 8477 US (Los Angeles)
Meeting ID: 692 274 6284
Find your local number: https://ucsd.zoom.us/u/apq1x0zGX