# Course Descriptions and Prerequisites

## Lower Division Data Science Courses

### Introduction and Overview

Concepts of data and its role in science will be introduced, as well as the ideas behind data-mining, text-mining, machine learning, and graph theory, and how scientists and companies are leveraging those methods to uncover new insights into human cognition.

Prerequisites:

- None

### Data, Inference, Prediction, and Computation

The following four courses together provide the basic skills to deal with data statistically and computationally.

This introductory course develops computational thinking and tools necessary to answer questions that arise from large-scale datasets. This course emphasizes an end-to-end approach to data science, introducing programming techniques in Python that cover data processing, modeling, and analysis.

*Extended description*: First, how can data be extracted that describes real-world phenomena? This part of the course includes data collection, processing, and cleaning ("munging"), and dealing with formatted and semi-formatted data (e.g., JSON). Second, how can data be modeled and used to make predictions? This includes methods in regression and classification, and experimental design. And third, how can the results of this analysis be understood and reasoned about? This includes topics in visualization and methods for hypothesis testing and validation. The course will involve hands-on analysis of a variety of real-world datasets, including economic data, document collections, geographical data, and social networks.

Prerequisites:

- None

Provides an understanding of the structures that underlie the programs, algorithms, and languages used in data science by expanding the repertoire of computational concepts introduced in DSC 10 and exposing students to techniques of abstraction. The course will be taught in Python and will cover topics including recursion, higher-order functions, function composition, object-oriented programming, interpreters, classes, and simple data structures such as arrays, lists, and linked lists.

Prerequisites:

- DSC 10

Builds on topics covered in DSC 20 and provides practical experience in composing larger computational systems through several significant programming projects using Java. Students will study advanced programming techniques including encapsulation, abstract data types, interfaces, algorithms and complexity, and data structures such as stacks, queues, priority queues, heaps, linked lists, binary trees, binary search trees and hash tables.

Prerequisites:

- DSC 20

Students will explore topics and tools relevant to the practice of data science in a workshop format. The instructor works with students on guided projects to help them acquire knowledge and skills that complement their coursework in the core data science classes. Two-credit course. Pass/No Pass only.

Corequisites:

- DSC 10 (or previous completion of DSC 10)

### Data Meets Theory

The following two courses provide a theoretical foundation for data science at the lower-division level.

This course, the first of a two-course sequence (DSC 40A and DSC 40B), will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40A will introduce fundamental topics in machine learning, statistics and linear algebra with applications to data analysis.

*Extended description*: DSC 40A and DSC 40B connect to DSC 10, 20, and 30 by providing the theoretical foundation for the methods that underlie data science.

Prerequisites:

- DSC 10
- MATH 18 *or* MATH 31AH
- MATH 20C *or* MATH 31BH

This course will introduce the theoretical foundations of data science. Students will become familiar with mathematical language for expressing data analysis problems and solution strategies, and will receive training in probabilistic reasoning, mathematical modeling of data, and algorithmic problem solving. DSC 40B introduces fundamental topics in combinatorics, graph theory, probability, and continuous and discrete algorithms with applications to data analysis.

Prerequisites:

- DSC 20
- DSC 40A

### Practice and Application of Data Science

The marriage of data, computation, and inferential thinking, or "data science", is redefining how people and organizations solve challenging problems and understand the world. This intermediate-level class bridges DSC 10, 20, and 30 and upper-division data science courses, as well as methods courses in other fields. Students master the data science life cycle and learn many of the fundamental principles and techniques of data science, spanning algorithms, statistics, machine learning, visualization, and data systems.

*Extended description*: Compared to DSC 10, 20, and 30, this class adopts an end-to-end approach to data science, focused on building large-scale, working systems on real data and putting knowledge from previous courses into practice. The skills and expertise developed in this course enable students to pursue careers in data science or to apply data science in research.

Prerequisites:

- DSC 30
- DSC 40A

## Upper Division Data Science Courses

### Data Science Upper Division Core

This course is an introduction to the storage and management of large-scale data using classical relational (SQL) systems, with an eye toward applications in data science. The course covers topics including the SQL data model and query language, relational data modeling and schema design, elements of cost-based query optimization, relational database architecture, and database-backed applications.

Prerequisites:

- DSC 40B
- DSC 80

Previously listed as DSC 110

This course introduces the principles of computing systems and infrastructure for scaling analytics to large datasets. Topics include memory hierarchy, distributed systems, model selection, heterogeneous datasets and deployment at scale. The course will also discuss the design of systems such as MapReduce/Hadoop and Spark, in conjunction with their implementation. Students will also learn how dataflow operations can be used to perform data preparation, cleaning, and feature engineering.

Prerequisites:

- DSC 100

Previously listed as DSC 120

Data visualization helps explore and interpret data through interaction. This course introduces the principles, techniques, and algorithms for creating effective visualizations. The course draws on and synthesizes ideas from several disciplines, including computer graphics, human-computer interaction, cognitive psychology, design, and statistical graphics. Students will design visualization systems using D3 or other web-based software and evaluate their effectiveness.

Prerequisites:

- DSC 80

Previously listed as DSC 170L

The course covers learning and using probabilistic models for knowledge representation and decision making. Topics covered include graphical models, temporal models, and online learning, as well as applications to natural language processing, adversarial learning, computational biology, and robotics.

Prerequisites:

- DSC 40B
- DSC 80
- ECE 109 *or* ECON 120A *or* MAE 108 *or* MATH 180A *or* MATH 183 *or* MATH 186

This course substitutes for the core CSE 150A requirement. Students cannot receive **major** credit for both CSE 150A and DSC 140A, as only one course can fulfill this major core requirement.

In this two-course sequence, students will investigate a topic and design a system to produce statistically informed output. The investigation will span the entire life cycle, including assessing the problem, learning domain knowledge, collecting and cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results. DSC 180A deals with research, methodology, and system design. Students will produce a research summary and project proposal.

Prerequisites:

- DSC 102 (For the FA20 Quarter, this prerequisite will not be enforced)
- MATH 189
- CSE 151A or CSE 158 or DSC 190 A00 (SP20)

Previously listed as DSC 196A

In this two-course sequence, students will investigate a topic and design a system to produce statistically informed output. The investigation will span the entire life cycle, including assessing the problem, learning domain knowledge, collecting and cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results. DSC 180B consists of implementing the project while studying best practices for evaluation.

Prerequisites:

- DSC 106
- DSC 180A

Previously listed as DSC 196B

This course introduces current methods and models that are useful for analyzing and mining real-world data. It will cover frequent pattern mining, regression and classification, clustering, and representation learning. No previous background in machine learning is required, but all participants should be comfortable with programming and with basic optimization and linear algebra.

This course substitutes for the core CSE 158 requirement. Students cannot receive **major** credit for both CSE 158 and DSC 190 A00-SP20/WI21, as only one course can fulfill this major core requirement.

MAJORS and MINORS: Degree audits will be updated accordingly after completion of the course.

Prerequisites:

- CSE 12 *or* DSC 40B
- CSE 15L *or* DSC 80
- CSE 103 *or* ECE 109 *or* ECON 120A *or* MATH 183 *or* MATH 181A

Machine learning with an emphasis on practical issues. Topics include supervised and unsupervised learning, feature selection, evaluating performance, and specific learning algorithms, such as k-nearest neighbor classifiers, decision trees, boosting, perceptrons, k-means and hierarchical clustering.

Prerequisites:

- CSE 12 *or* DSC 40B
- CSE 15L *or* DSC 80
- CSE 103 *or* MATH 181A *or* ECE 109 *or* MATH 183 *or* ECON 120A
- MATH 18 *or* MATH 31AH

### Data Science Electives

The course will introduce a variety of NoSQL data formats, data models, high-level query languages, and programming abstractions representative of the needs of modern data analytic tasks. Topics include hierarchical graph database systems, unrestricted graph database systems, array databases, a comparison of the expressive power of the data models, and parallel programming abstractions, including MapReduce and its descendants.

Prerequisites:

- DSC 102
- DSC 100 with EASy Request Approval

This course will focus on ideas from classical and modern signal processing, with the main themes of sampling continuous data and building informative representations using orthonormal bases, frames, and data-dependent operators. Topics include sampling theory, Fourier analysis, lossy transformations and compression, time and spatial filters, and random Fourier features and their connections to kernel methods. Sources of data include time series, streaming signals, and various imaging modalities.

Prerequisites:

- MATH 18
- MATH 20C
- DSC 40B

Rigorous treatment of principal component analysis, one of the most effective methods for finding signals amid the noise of large data arrays. Topics include singular value decomposition for matrices, maximum likelihood estimation, least squares methods, unbiased estimators, random matrices, Wigner's semicircle law, the Marchenko-Pastur law, universality of eigenvalue statistics, outliers, the BBP transition, and applications to community detection and the stochastic block model.

Prerequisites:

- MATH 180A
- MATH 18 *or* MATH 31AH

Completion of MATH 102 is encouraged but not required. Students will not receive credit for both MATH 182 and DSC 155.

This course addresses the intersection of data science and contemporary arts and culture, exploring four main themes of authorship, representation, visualization, and data provenance. The course is not solely an introduction to data science techniques, nor merely an arts practice course, but explores significant new possibilities for both fields arising from their intersection. Students will examine problems from complementary perspectives of artist-researchers and data scientists.

Prerequisites:

- DSC 80

This class explores statistical and computational methods to enable students to use text as a data source in the social sciences. Hands-on examples will equip students to work with text data in final projects.

Prerequisites:

- ECON 5 *or* POLI 170A *or* POLI 171 *or* POLI 30 *or* POLI 30D *or* POLI 5 *or* POLI 5D

Students will not receive credit for both POLI 176 and DSC 161. Restricted to junior and senior level students.

This course examines the broader context in which the practice of data science exists and explores concrete ways these issues surface in technical work. Students learn frameworks for understanding how individuals relate to social institutions, apply these frameworks to identify how issues of fairness arise in the "life of a Data Scientist", and use them to propose and critique potential solutions.

Prerequisites:

- DSC 80

Spatial data science is a set of concepts and methods for accessing, managing, visualizing, analyzing, and reasoning about spatial data in applications where the location, shape, and size of objects, and their mutual arrangement, are important. This upper-division course explores advanced data science concepts for spatial data, introducing students to principles and techniques of spatial data analysis, including geographic information systems, spatial big data management, and geostatistics.

Prerequisites:

- DSC 80

This course covers a topic of special interest in data science. Topics vary from quarter to quarter. May be taken for credit up to four times. Prerequisites: Restricted to junior and senior level students. Department and instructor approval is required to monitor enrollment and to ensure that students have sufficient background for a given topic.

This course demonstrates applications of data science in various industry areas. Students are introduced to real-life business questions by industry experts and learn how those experts approach them. The course nurtures creativity and informed decision-making by illustrating how to understand business problems and data sets and how to select suitable tools.

This course is an ELECTIVE.

MAJORS and MINORS: The course should be automatically applied as a DSC Elective on your degree audit.

Prerequisites:

- DSC 80 (Strongly Recommended: DSC 102)

Introduction to Robotics Perception and Navigation is a fast-paced, data science-centric robotics class in which students work in teams of three on increasingly difficult challenges, culminating in a robotic scale car that navigates autonomously on simulated city-like tracks (indoor and outdoor) while avoiding obstacles. We will be using an industry-relevant robotics framework called ROS (Robot Operating System).

Students are not expected to have previous background in robotics, mechanical, electrical, or embedded computer science. This class is in association with DSC 180A – Automated Reasoning Applied to Physical Systems.

This course is an ELECTIVE.

MAJORS and MINORS: The course should be automatically applied as a DSC Elective on your degree audit.

Prerequisites: Students will fill out a Google Form, and the instructor will determine course preparation. Students should have completed coursework at least through DSC 80.

Advanced topics in algorithms and computer science theory for data scientists. Introduces the algorithms and data structures used to query and organize very large data sets. Tentative topics include dynamic programming, string matching, locality sensitive hashing, sketching algorithms, sorting in linear time, bloom filters, disjoint set forests, KD-trees, and an introduction to computational complexity theory and its implications for machine learning.

This course is an ELECTIVE.

MAJORS and MINORS: The course should be automatically applied as a DSC Elective on your degree audit.

Prerequisites:

- DSC 40B

## Graduate Level Courses

**Geometry of Data with Prof. Gal Mishne**

This course focuses on unsupervised machine learning methods for the analysis of high-dimensional data, covering data exploration, visualization, organization, and dimensionality reduction. Topics include spectral graph theory, spectral clustering, and kernel-based manifold learning. The course will also review the role graphs play in modern signal processing and machine learning, for example, signal processing on graphs and learning representations with deep neural networks.