**Course Information**

## Lower Division Data Science Courses

### Introduction and Overview

Concepts of data and its role in science will be introduced, as well as the ideas behind data-mining, text-mining, machine learning, and graph theory, and how scientists and companies are leveraging those methods to uncover new insights into human cognition.

Prerequisites: None

### Data, Inference, Prediction, and Computation

The following four courses together provide the basic skills to deal with data statistically and computationally.

Students will explore topics and tools relevant to the practice of data science in a workshop format. The instructor works with students on guided projects to help them acquire knowledge and skills that complement their coursework in the core data science classes. Two-credit course. Pass/No Pass only.

Corequisites: DSC 10 (or previous completion of DSC 10)

This introductory course develops computational thinking and tools necessary to answer questions that arise from large-scale datasets. This course emphasizes an end-to-end approach to data science, introducing programming techniques in Python that cover data processing, modeling, and analysis.

*Extended description*: First, how can data be extracted that describes real-world phenomena? This part of the course includes data collection, processing, and cleaning (“munging”), and dealing with formatted and semi-formatted data (e.g., JSON). Second, how can data be modeled and used to make predictions? This includes methods in regression and classification, and experimental design. Third, how can the results of this analysis be understood and reasoned about? This includes topics in visualization, and methods for hypothesis testing and validation. The course will involve hands-on analysis of a variety of real-world datasets, including economic data, document collections, geographical data, and social networks.

Prerequisites: None

Provides an understanding of the structures that underlie the programs, algorithms, and languages used in data science by expanding the repertoire of computational concepts introduced in DSC 10 and exposing students to techniques of abstraction. Course will be taught in Python and will cover topics including recursion, higher-order functions, function composition, object-oriented programming, interpreters, classes and simple data structures such as arrays, lists and linked lists.

Prerequisites: DSC 10

Builds on topics covered in DSC 20 and provides practical experience in composing larger computational systems through several significant programming projects using Java. Students will study advanced programming techniques including encapsulation, abstract data types, interfaces, algorithms and complexity, and data structures such as stacks, queues, priority queues, heaps, linked lists, binary trees, binary search trees and hash tables.

Prerequisites: DSC 20

### Data Meets Theory

The following two courses provide a theoretical foundation for data science at the lower-division level.

This course, the first of a two-course sequence (DSC 40A and DSC 40B), will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40A will introduce fundamental topics in machine learning, statistics and linear algebra with applications to data analysis.

*Extended description*: DSC 40 A-B connect to DSC 10, 20 and 30 by providing the theoretical foundation for the methods that underlie data science.

Prerequisites: DSC 10 and (MATH 20C or MATH 31BH) and (MATH 18 or MATH 20F or MATH 31AH). Restricted to students within the DS25 major. All other students will be allowed as space permits.

This course will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40B introduces fundamental topics in combinatorics, graph theory, probability, and continuous and discrete algorithms with applications to data analysis.

Prerequisites: DSC 40A. Restricted to students within the DS25 major. All other students will be allowed as space permits.

### Practice and Application of Data Science

The marriage of data, computation, and inferential thinking, or “data science”, is redefining how people and organizations solve challenging problems and understand the world. This intermediate level class bridges between DSC 10, 20 and 30 and upper division data science courses as well as methods courses in other fields. Students master the data science life-cycle and learn many of the fundamental principles and techniques of data science spanning algorithms, statistics, machine learning, visualization, and data systems.

*Extended description*: Compared to DSC 10, 20 and 30, this class adopts an end-to-end approach to data science, focused on building large-scale, working systems on real data, and putting knowledge from previous courses into practice. Skills and expertise developed in this course enable students to pursue careers in data science or apply it to research.

Prerequisites: DSC 30, DSC 40A

## Upper Division Data Science Courses

### Data Science Upper Division Core

This course is an introduction to storage and management of large-scale data using classical relational (SQL) systems, with an eye toward applications in data science. The course covers topics including the SQL data model and query language, relational data modeling and schema design, elements of cost-based query optimization, relational database architecture, and database-backed applications.

Prerequisites: DSC 80 and DSC 40B.

Previously listed as DSC 110

This course introduces the principles of computing systems and infrastructure for scaling analytics to large datasets. Topics include memory hierarchy, distributed systems, model selection, heterogeneous datasets and deployment at scale. The course will also discuss the design of systems such as MapReduce/Hadoop and Spark, in conjunction with their implementation. Students will also learn how dataflow operations can be used to perform data preparation, cleaning, and feature engineering.

Prerequisites: DSC 100.

Previously listed as DSC 120

Data visualization helps explore and interpret data through interaction. This course introduces the principles, techniques and algorithms for creating effective visualizations. The course draws on knowledge from several disciplines including computer graphics, human-computer interaction, cognitive psychology, design, and statistical graphics and synthesizes relevant ideas. Students will design visualization systems using D3 or other web-based software and evaluate their effectiveness.

Prerequisites: DSC 80.

Previously listed as DSC 170L

In this two-course sequence, students will investigate a topic and design a system to produce statistically informed output. The investigation will span the entire lifecycle, including assessing the problem, learning domain knowledge, collecting/cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results. 180A deals with research, methodology, and system design. Students will produce a research summary and project proposal.

Prerequisites: DSC 102 and MATH 189 and (CSE 151 or CSE 158)

Previously listed as DSC 196A

In this two-course sequence, students will investigate a topic and design a system to produce statistically informed output. The investigation will span the entire lifecycle, including assessing the problem, learning domain knowledge, collecting/cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results. 180B will consist of implementing the project while studying the best practices for evaluation.

Prerequisites: DSC 106 and DSC 180A

Previously listed as DSC 196B

### Data Science Electives

The course will introduce a variety of NoSQL data formats, data models, high-level query languages, and programming abstractions representative of the needs of modern data analytic tasks. Topics include hierarchical graph database systems, unrestricted graph database systems, array databases, comparison of the expressive power of the data models, and parallel programming abstractions, including MapReduce and its descendants.

Prerequisites: DSC 102

This course will focus on ideas from classical and modern signal processing, with the main themes of sampling continuous data and building informative representations using orthonormal bases, frames, and data-dependent operators. Topics include sampling theory, Fourier analysis, lossy transformations and compression, time and spatial filters, and random Fourier features and their connections to kernel methods. Sources of data include time series, streaming signals, and various imaging modalities.

Prerequisites: MATH 18 and MATH 20C and DSC 40B

Spatial data science is a set of concepts and methods that deal with accessing, managing, visualizing, analyzing and reasoning about spatial data in applications where location, shape and size of objects, and their mutual arrangement, are important. This upper division course explores advanced data science concepts for spatial data, introducing students to principles and techniques of spatial data analysis, including geographic information systems, spatial big data management, and geostatistics.

Prerequisites: DSC 80