MSDS Course Requirements

Group A: Introductory Courses

Credit for a maximum of 4 courses

These courses provide the five areas of foundational knowledge and skills that every student graduating from the master’s program is expected to have at a graduate level: programming, data organization and methods, numerical linear algebra, multivariate calculus, and probability and statistics.

The program is designed so that students lacking any (or all) of these foundations can receive credit for a maximum of four of the following five courses:

DSC 200: Data Science Programming; 4 units:

Computing structures and programming concepts such as object orientation; data structures such as queues, heaps, lists, search trees and hash tables. Laboratory skills include Jupyter notebooks, RESTful interfaces and various software development kits (SDKs).

DSC 202: Data Management for Data Science; 4 units:

Principles of data management, relational data model, relational algebra, SQL for data science, NoSQL Databases (document, key-value, graph, column-family), Multidimensional data management (data warehousing, OLAP Queries, OLAP Cubes, Visualizing multidimensional data).

Recommended Preparation: Multivariate calculus, optimization

DSC 210: Numerical Linear Algebra; 4 units: 

Linear algebraic systems, least squares problems and regularization, orthogonalization methods, ill-conditioned problems, eigenvalue and singular value decomposition, principal component analysis, structured matrix factorization and fast algorithms, randomized linear algebra, Johnson-Lindenstrauss (JL) lemma, sparse approximations.

DSC 211: Introduction to Optimization; 4 units: 

Continuity and differentiability of a function of several variables, gradient vector, Hessian matrices, Taylor approximation, fundamentals of optimization, Lagrange multipliers, convexity, gradient descent.

Prerequisites: DSC 210

DSC 212: Probability and Statistics for Data Science; 4 units: 

Probability, random variables, distributions, central limit theorem, maximum likelihood estimation, method of moments, confidence intervals, hypothesis testing, Bayesian estimation, introduction to simulation and the bootstrap.

Group B: Core Courses

3 required courses, minimum of six courses

These courses build upon foundational courses. All students must take three required core courses and at least three out of the additional core courses.

Depending on academic preparation and with prior approval from the Graduate Program Committee/Chair, M.S. students may be approved to take an advanced course on Applied Statistics, such as MATH 282B, in place of DSC 241. Similarly, in place of DSC 204A, a student may be approved to take a course on Algorithms, such as DSC 206: Algorithms for Data Science or CSE 202: Design and Analysis of Algorithms.

Courses marked with an * are comprehensive exam eligible.

Required Core Courses [minimum 3 courses]

DSC 240: Machine Learning; 4 units:

A graduate level course in machine learning algorithms: decision trees, principal component analysis, k-means, clustering, logistic regression, random forests, boosting, neural networks, kernel methods, deep learning.

Prerequisites: DSC 210, 212

Recommended Preparation: multivariate calculus and optimization

DSC 241: Statistical Models; 4 units:

Linear/nonlinear models, diagnostics, polynomial regression, robust methods, regularization and penalization (ridge regression, lasso, etc.), bootstrap, model selection (cross-validation), generalized linear models, nonparametric regression, linear classification, classification and regression trees, boosting, neural networks.

Prerequisites: DSC 210, 212

Recommended Preparation: Basic programming or prior exposure to a programming language such as R, Python, or MATLAB.

DSC 260: Data Ethics and Fairness; 4 units:

Data science is transforming our lives, and so we must consider the ethical and societal impacts, implications, and constraints in our data science efforts. This course will consider foundational concepts including power, justice, bias, privacy, and explainability; societal practices including delegation, organizational incentives, and accountability; and governance mechanisms including law, regulation, and norms.

Core Courses [minimum 3 courses]

DSC 203: Data Visualization and Scalable Visual Analytics; 4 units:

Commonly used algorithms and techniques in data visualization. Interactive reasoning and exploratory analysis through visual interfaces. Application of data visualization in various domains including science, engineering, and medicine. Scalable interactive methods for exploring and visualizing big data. Techniques to evaluate the effectiveness and interpretability of analytical products for diverse users to obtain insights in support of assessment, planning, and decision making.

Prerequisites: DSC 202

DSC 204A: Scalable Data Systems; 4 units:

Storage/memory hierarchy, distributed scalable computing (cluster, cloud, edge) principles. Big Data storage, management and processing at scale. Dataflow programming systems and programming models (MapReduce/Hadoop and Spark).

Prerequisites: DSC 202

DSC 204B: Big Data Analytics Using Spark; 4 units:

This course is a hands-on introduction to big data analytics. Topics covered include: I/O bottleneck and the memory hierarchy; HDFS and Spark; RMS error minimization, PCA and percent of variance explained. Analysis of NOAA weather data. Data collection and curation. Limitations of train/test methodology and leaderboards. K-means and intrinsic dimension. Classification, boosting and XGBoost. Margins. Neural networks and TensorFlow. Students will develop the skills and attitudes required to write Jupyter notebooks that can be understood by domain experts.

Prerequisites: DSC 200, 210, 212

Cross-listing: CSE 255

DSC 206: Algorithms for Data Science; 4 units:

With the advent of large-scale machine learning, online social networks, and computationally intensive models, data scientists must deal with data that is massive in size, arrives fast, and must be processed in an interactive or online manner. This course studies the mathematical foundations of massive data processing, developing algorithms and analyzing them. We explore methods for sampling, sketching, and distributed processing of large-scale databases, clustering, dimensionality reduction, and methods of optimization for the purpose of scalable statistical description, querying, pattern mining, and learning from data.

Prerequisites: DSC 212

DSC 215: Statistical Thinking & Experimental Design; 4 units:

We hold science in high regard; however, not all scientific claims are correct. How do we know which claims to trust and which not to? This fundamental question is at the heart of this course. The goal of this course is to enable the student to evaluate any paper in data science, regardless of application area. Topics covered include experimental design, claims, evidence and statistical significance, the replication crisis, falsifiability, philosophy of science, and the history of probability and statistics. About half of the class meetings, as well as the final project, will be devoted to evaluating contemporary papers in data science. This class will be in the form of an open discussion, based on provided reading materials. The only prerequisite is a statistics class that covered hypothesis testing and p-values.

Recommended Preparation: Hypothesis testing and p-values, basic statistics.

DSC 242: High-dimensional Probability and Statistics; 4 units:

Concentration inequalities, Markov processes and ergodicity, martingale inequalities, empirical processes, sparse linear models in high dimensions, principal component analysis in high dimensions, estimation of large covariance matrices.

Recommended Preparation: Undergraduate probability theory

DSC 243: Advanced Optimization; 4 units:

Linear/quadratic programming, optimization under constraints, optimization on the space of probabilities, gradient descent (deterministic and stochastic), convergence rate of gradient descent, acceleration phenomena in convex optimization, stochastic optimization with large data sets, complexity lower bounds for convex optimization.

Prerequisites: DSC 211, DSC 212

DSC 244: Large-Scale Statistical Analysis; 4 units:

Large-scale hypothesis testing, family-wise error rate (FWER) and false discovery rate (FDR) control and estimation, empirical null distribution, empirical covariance matrices, empirical Bayes methods.

Prerequisites: DSC 210, DSC 212, DSC 241

DSC 245: Introduction to Causal Inference; 4 units:

Causal versus predictive inference, potential outcomes and randomized experiments (A/B testing), structural causal models (interventions, counterfactuals, causal diagrams, do-operator, d-separation), causal structure learning (constraint-based and score-based algorithms, and functional causal model-based methods), identification of causal effects (back-door and front-door criteria, do-calculus), estimation of causal effects (matching, propensity score, doubly robust estimation, instrumental variables, conditional effects), advanced topics (causal discovery and inference in the presence of distribution shifts, selection bias, hidden confounders, cycles, nonlinear causal mechanisms, missing values, and causal representation learning).

Prerequisites: DSC 212, DSC 240

Recommended Preparation: Basic statistics and machine learning.

DSC 250: Advanced Data Mining; 4 units:

Graph mining and basic text analysis (including keyphrase extraction and generation), set expansion and taxonomy construction, graph representation learning, graph convolutional neural networks, heterogeneous information networks, label propagation, and truth finding.

Recommended Preparation: Knowledge of machine learning and data mining; coding with Python, C/C++, or Java; statistics

DSC 261: Responsible Data Science; 4 units:

Data science lifecycle, data cleaning and quality management, data profiling, causal inference, algorithmic fairness (fairness definitions, impossibility results, causal fairness, building fair ML models, fairness beyond classification), algorithmic transparency (interpretability vs. explainability, auditing black-box algorithms, algorithmic recourse).

Prerequisites: DSC 212, DSC 240

Recommended Preparation: Machine learning, causal inference, data management

Group C: Elective Courses

Remaining Course Credit Requirements

MS students can take advantage of electives to complete their course of study. These courses can be Core Data Science subjects listed under Group B, DSC 299 Research Courses, DSC 291: Topics in Data Science courses or they can be graduate (or upper-division undergraduate) courses in other departments subject to approval by the student’s HDSI faculty advisor.

General Elective Courses:

Data Science Electives:

DSC 205: Geometry of Data; 4 units:

This course will cover graph-based data modeling, analysis and representation. Topics include: spectral graph theory, spectral clustering, kernel-based manifold learning, dimensionality reduction and visualization, multiway data analysis, graph signal processing, graph neural networks.

Prerequisites: (DSC 210 or ECE 269), DSC 212, DSC 240

Recommended Preparation: Matlab/Python coding, linear algebra, probability theory/statistics. Review basic linear algebra (inner products, orthogonality, eigen-decomposition) and probability theory (multivariate random variable, statistical independence, covariance)

DSC 213: Statistics on Manifolds; 4 units:

This is a graduate topics course covering statistics with manifold constraints. Topics include: Fréchet means and variances, principal geodesic analysis, directional statistics, random fields on manifolds, statistical distances between distributions, transport problems, and information geometry. Manifold constraints will be considered on simplexes, spheres, Stiefel manifolds, stratified manifolds, the cone of positive definite matrices, trees, compositional data, and other relevant manifolds.

Prerequisites: DSC 210, DSC 212

Recommended Preparation: Differential geometry.

DSC 214: Topological Data Analysis; 4 units:

Topology provides a powerful way to describe essential features of functions and spaces. In recent years topological methods have attracted much attention for analyzing complex data. Fundamental developments have been made and the resulting methods have been applied in many fields, e.g., graphics, visualization, neuroscience and materials science. This course introduces basic concepts and topological structures behind these developments, algorithms for them, and examples of applications.

Recommended Preparation: Linear Algebra and programming.

DSC 231: Embedded Sensing and IOT Data Models and Methods; 4 units:

Sensory data and control are mediated by devices near the edge of sensor networks, referred to as IOT (Internet of Things) devices. Components of IOT platforms: signal processing, communications/networking, control, real-time operating systems. Interfaces to the cloud computing stack, publish-subscribe protocols such as MQTT, embedded software/middleware components, metadata schema, metadata normalization methods, and selected CPS (cyber-physical system) applications.

Recommended Preparation: embedded systems and embedded software, basic courses in digital hardware, algorithms and data structures, programming, and computer architecture

DSC 251: Machine Learning in Control; 4 units:

Estimation of stability and uncertainty, optimal control, and sequential decision making.

Prerequisites: DSC 211, DSC 240

Recommended Preparation: Probability theory.

DSC 252: Statistical Natural Language Processing; 4 units:

A deep dive into the classical NLP pipeline: tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, parsing, and machine translation. Finite-state transducers, context-free grammars, Hidden Markov Models (HMM), and Conditional Random Fields (CRF) will be covered in detail.

Recommended Preparation: Introduction-level Machine Learning

DSC 253: Advanced Data-driven Text Mining; 4 units:

Unsupervised, weakly supervised, and distantly supervised methods for text mining problems, including information retrieval, open-domain information extraction, text summarization (both extractive and generative), and knowledge graph construction. Bootstrapping, comparative analysis, learning from seed words and existing knowledge bases will be the key methodologies.

Recommended Preparation: Knowledge of machine learning and data mining; coding with Python, C/C++, or Java; statistics

DSC 254: Statistical Signal and Image Analysis; 4 units:

A graduate level course on signal and image analysis spanning three main themes. Statistical signal processing: random processes, stochasticity, stationarity, Wiener filter, Kalman filter, matched filter. Signal processing: time-frequency representations, wavelets, signal processing with sparse representation (dictionary learning). Image processing: registration, image degradation and restoration (noise models and denoising), image pyramids, random fields.

Prerequisites: (DSC 210 or ECE 269), DSC 212, DSC 220

Possible electives from other disciplines:

MATH 281A-B-C: Mathematical Statistics (4-4-4 units).

MATH 281A covers statistical models, sufficiency, efficiency, optimal estimation, least squares and maximum likelihood, and large sample theory. MATH 281B continues with hypothesis testing and confidence intervals, one-sample and two-sample problems, Bayes theory, statistical decision theory, linear models and regression. MATH 281C finishes the sequence with nonparametrics: tests, regression, density estimation, bootstrap and jackknife.

MATH 284: Survival Analysis; 4 units.

Survival analysis is an important tool in many areas of application, including biomedicine, economics, and engineering. It deals with the analysis of time-to-event data with censoring. This course discusses the concepts and theories associated with survival data and censoring, comparing survival distributions, proportional hazards regression, nonparametric tests, competing risk models, and frailty models. The emphasis is on semiparametric inference, and material is drawn from recent literature.

MATH 285: Stochastic Processes; 4 units.

Elements of stochastic processes, Markov chains, hidden Markov models, martingales, Brownian motion, Gaussian processes. 

Recommended preparation: undergraduate probability theory.

MATH 287A: Time Series Analysis; 4 units.

Discussion of finite parameter schemes in the Gaussian and non-Gaussian context. Estimation for finite parameter schemes. Linear vs. nonlinear time series. Stationary processes and their spectral representation. Spectral estimation. 

[Students who have not taken MATH 282A may enroll with consent of the instructor.]

COGS 243: Statistical Inference and Data Analysis; 4 units:

This course provides a rigorous treatment of hypothesis testing, statistical inference, model fitting, and exploratory data analysis techniques used in the cognitive and neural sciences. Students will acquire an understanding of mathematical foundations and hands-on experience in applying these methods using Matlab.

Cognitive science PhD students must enroll for four units and will be required to do assignments and a final project. All other students can enroll for two units and will be required to complete all assignments but not a final project (or, by request, a final project and no assignments).

Degree Completion Options

Under the thesis option, the student must sign up for a minimum of 8 and a maximum of 12 units of DSC 299 (Thesis Research), which can also be used to meet Group C requirements. The student will perform thesis research under the guidance of a thesis advisor and a thesis committee consisting of at least three members. The master’s thesis produced by the student must be approved by the committee. At least two members of the committee must be members of the HDSI faculty council; one of the three committee members can be an HDSI industry fellow with an adjunct appointment or a faculty member drawn from another department or division. The chair of the committee is appointed by the Graduate Program Committee. Alternatively, the industry fellow may be requested to serve as the fourth member of the committee. The committee must be approved by the Graduate Division by the end of the fourth quarter in the MS program.

Under the comprehensive examination option, students must pass the comprehensive examination in three courses, one drawn from each of the three areas: Computing (ML), Math/Statistics, and Systems (Algorithms/CS). Students are permitted up to five attempts; that is, five different courses.

No more than three course-hosted comprehensive examinations can be taken in a single quarter, and no comprehensive examination can be repeated in a single quarter. The courses marked for comprehensive examination can be taken only for a letter grade. In order to receive credit for a comprehensive exam, students must also receive a passing grade (B- or higher) in the course.

Courses eligible for the comprehensive examination are listed below. Not all of these will be offered each year, and this list may be updated.

  • DSC 240: Machine Learning (Computing)
  • DSC 241: Statistical Models (Math/Stats)
  • DSC 204A: Scalable Data Systems (Systems)
  • DSC 206: Algorithms for Data Science (Systems)
  • DSC 204B: Big Data Analytics & Applications (Systems or Math/Stats)
  • DSC 243: Advanced Optimization (Computing or Math/Stats)
  • DSC 244: Large-Scale Statistical Analysis (Math/Stats)
  • DSC 245: Introduction to Causal Inference (Systems or Math/Stats)