Seminar: Cerebro: A Layered Data Platform for Scalable Deep Learning
March 19 @ 10:00 am - 11:00 am
Arun Kumar, UCSD
“Cerebro: A Layered Data Platform for Scalable Deep Learning”
Deep learning (DL) is gaining popularity across many domains thanks to tools such as TensorFlow and easier access to GPUs. But building large-scale DL applications is still too resource-intensive and painful for all but the big tech firms. A key reason for this pain is the expensive model selection process needed to get DL to work well. Existing DL systems treat this process as an afterthought, leading to massive resource wastage and a usability mess. To tackle these issues, we present our vision of a first-of-its-kind data platform for scalable DL, Cerebro, inspired by lessons from the database world (https://adalabucsd.github.io/cerebro.html). We elevate the DL model selection process with higher-level APIs already inherent in practice and devise a series of novel multi-query optimization techniques to substantially raise resource efficiency. In turn, this can help improve scalability, reduce resource costs and energy usage, and improve DL user productivity. This talk will present our system design rationales and architecture, our recent research and empirical results, a discussion of how Cerebro is being applied to use cases in public health and enterprises, and our future research plans.
Overview paper (9 pages): http://cidrdb.org/cidr2021/papers/cidr2021_paper25.pdf
Short video (10 min): https://www.youtube.com/watch?v=8QfMvdlmdic
About the Speaker:
Arun Kumar is an Assistant Professor in the Department of Computer Science and Engineering and the Halicioglu Data Science Institute at the University of California, San Diego (https://adalabucsd.github.io). He is a member of the Database Lab and Center for Networked Systems and an affiliate member of the AI Group. His primary research interests are in data management and systems for machine learning/artificial intelligence-based data analytics. Systems and ideas based on his research have been released as part of the Apache MADlib open-source library, shipped as part of products from Cloudera, IBM, Oracle, and Pivotal, and used internally by Facebook, Google, LogicBlox, Microsoft, and other companies. He is a recipient of two SIGMOD research paper awards, a SIGMOD Research Highlight Award, three distinguished reviewer awards from SIGMOD/VLDB, the PhD dissertation award from UW-Madison CS, an NSF CAREER Award, a Hellman Fellowship, a UCSD oSTEM Faculty of the Year Award, and research award gifts from Google, Oracle, and VMware.