Course Descriptions and Prerequisites
Lower Division Data Science Courses
Introduction and Overview
Concepts of data and its role in science will be introduced, as well as the ideas behind data-mining, text-mining, machine learning, and graph theory, and how scientists and companies are leveraging those methods to uncover new insights into human cognition.
Data, Inference, Prediction, and Computation
The following four courses together provide the basic skills to deal with data statistically and computationally.
This introductory course develops computational thinking and tools necessary to answer questions that arise from large-scale datasets. This course emphasizes an end-to-end approach to data science, introducing programming techniques in Python that cover data processing, modeling, and analysis.
Extended description: First, how can data be extracted that describes real-world phenomena? This part of the course includes data collection, processing, and cleaning (“munging”), and dealing with formatted and semi-formatted data (e.g., JSON). Second, how can data be modeled and used to make predictions? This includes methods in regression and classification, and experimental design. Third, how can the results of this analysis be understood and reasoned about? This includes topics in visualization, and methods for hypothesis testing and validation. The course will involve hands-on analysis of a variety of real-world datasets, including economic data, document collections, geographical data, and social networks.
Data Meets Theory
The following two courses provide a theoretical foundation for data science at the lower-division level.
This course, the first of a two-course sequence (DSC 40A and DSC 40B), will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40A will introduce fundamental topics in machine learning, statistics and linear algebra with applications to data analysis.
Extended description: DSC 40A-B connect to DSC 10, 20, and 30 by providing the theoretical foundation for the methods that underlie data science.
- DSC 10
- MATH 18 or 31AH
- MATH 20C or 31BH
Practice and Application of Data Science
The marriage of data, computation, and inferential thinking, or “data science”, is redefining how people and organizations solve challenging problems and understand the world. This intermediate-level class bridges DSC 10, 20, and 30 and upper-division data science courses, as well as methods courses in other fields. Students master the data science life-cycle and learn many of the fundamental principles and techniques of data science spanning algorithms, statistics, machine learning, visualization, and data systems.
Extended description: Compared to DSC 10, 20 and 30, this class adopts an end-to-end approach to data science, focused on building large-scale, working systems on real data, and putting knowledge from previous courses into practice. Skills and expertise developed in this course enable students to pursue careers in data science or apply it to research.
- DSC 30
- DSC 40A
Upper Division Data Science Courses
Data Science Upper Division Core
This course is an introduction to storage and management of large-scale data using classical relational (SQL) systems, with an eye toward applications in data science. The course covers topics including the SQL data model and query language, relational data modeling and schema design, elements of cost-based query optimization, relational database architecture, and database-backed applications.
- DSC 40B
- DSC 80
Previously listed as DSC 110
Data Science Electives
The course will introduce a variety of NoSQL data formats, data models, high-level query languages, and programming abstractions representative of the needs of modern data analytic tasks. Topics include hierarchical graph database systems, unrestricted graph database systems, array databases, comparison of the expressive power of the data models, and parallel programming abstractions, including MapReduce and its descendants.
- DSC 102
- DSC 100 with EASy Request Approval
Graduate Level Courses
Data Science for Scientists and Engineers
The last decade has seen an explosion in the size and complexity of datasets. Many popular data analysis software packages (R, MATLAB, SPSS, etc.) are not adequate when the data size is many times the memory size. New packages such as Hadoop and Spark have been developed that process massive data using commodity computers in the cloud.
In this class you will learn the statistical and engineering foundations of big data analytics.
This class has two prerequisites:
- Probability and Statistics: random variables, expectation, standard deviation, PCA, linear regression, t-tests.
- Python programming: NumPy, pandas, Matplotlib, Jupyter notebooks.
To apply to this course, each student needs to fill in a Google form. The instructor will accept or reject applicants based on these forms.
- Engineering Data:
  - I/O limited computation and the memory bottleneck.
  - Data parallel computation in the cloud.
  - Map-reduce, NFS, Spark.
  - DataFrames and SQL in Spark.
- Data Analysis:
  - Unsupervised learning: PCA.
  - Unsupervised learning: Manifolds, intrinsic dimension, and the graph Laplacian.
  - Supervised learning: random forests and boosted trees.
  - Supervised learning: Neural Networks and TensorFlow.
  - Stability and over-fitting.