Course Descriptions

IMPORTANT!

Course offering and course descriptions are subject to change.

Courses marked with an * are comprehensive exam eligible.

DSC 200: Data Science Programming

Computing structures and programming concepts such as object orientation, data structures such as queues, heaps, lists, search trees and hash tables. Laboratory skills include Jupyter notebooks, RESTful interfaces and various software development kits (SDKs).

DSC 202: Data Management for Data Science

Principles of data management, relational data model, relational algebra, SQL for data science, NoSQL Databases (document, key-value, graph, column-family), Multidimensional data management (data warehousing, OLAP Queries, OLAP Cubes, Visualizing multidimensional data).

Recommended Preparation: Multivariate calculus, optimization

DSC 203: Data Visualization and Scalable Visual Analytics

Commonly used algorithms and techniques in data visualization. Interactive reasoning and exploratory analysis through visual interfaces. Application of data visualization in various domains including science, engineering, and medicine. Scalable interactive methods involving exploring big data and visualization methods. Techniques to evaluate effectivity and interpretability of analytical products for diverse users to obtain insights in support of assessment, planning, and decision making.

Prerequisites: DSC 202

DSC 204A: Scalable Data Systems *

Storage/memory hierarchy, distributed scalable computing (cluster, cloud, edge) principles. Big Data storage, management and processing at scale. Dataflow programming systems and programming models (MapReduce/Hadoop and Spark).

Prerequisites: DSC 202

DSC 204B: Big Data Analytics Using Spark *

This course is a hands-on introduction to big data analytics. Topics covered include: I/O bottleneck and the memory hierarchy; HDFS and Spark; RMS error minimization, PCA and percent of variance explained. Analysis of NOAA weather data. Data collection and curation. Limitations of train/test methodology and leaderboards. Kmeans and intrinsic dimension. Classification, Boosting and XGBoost. Margins. Neural Networks and tensorflow. Students will develop the skills and attitudes required to write jupyter notebooks that can be understood by domain experts.

Prerequisites: DSC 200, 210, 212

Cross-listing: CSE 255

DSC 205: Geometry of Data

This course will cover graph-based data modeling, analysis and representation. Topics include: spectral graph theory, spectral clustering, kernel-based manifold learning, dimensionality reduction and visualization, multiway data analysis, graph signal processing, graph neural networks.

Prerequisites: (DSC 210 or ECE 269), DSC 212, DSC 240

Recommended Preparation: Matlab/Python coding, linear algebra, probability theory/statistics. Review basic linear algebra (inner products, orthogonality, eigen-decomposition) and probability theory (multivariate random variable, statistical independence, covariance)

DSC 206: Algorithms for Data Science *

With the advent of large-scale machine learning, online social networks, and computationally intensive models, data scientists must deal with data that is massive in size, arrives fast, and must be processed within an interactive or online manner. This course studies the mathematical foundations of massive data processing, developing algorithms and analyzing them. We explore methods for sampling, sketching, and distributed processing of large scale databases, clustering, dimensionality reduction, and methods of optimization for the purpose of scalable statistical description, querying, pattern mining, and learning from data.

Prerequisites: DSC 212

DSC 210: Numerical Linear Algebra

Linear algebraic systems, least squares problems and regularization, orthogonalization methods, ill-conditioned problems, eigenvalue and singular value decomposition, principal component analysis, structured matrix factorization and fast algorithms, randomized linear algebra, JL lemma, sparse approximations

DSC 211: Introduction to Optimization

Continuity and differentiability of a function of several variables, gradient vector, Hessian matrices, Taylor approximation, fundamentals of optimization, Lagrange multipliers, convexity, gradient descent.

Prerequisites: DSC 210

DSC 212: Probability and Statistics for Data Science

Probability, random variables, distributions, central limit theorem, maximum likelihood estimation, method of moments, confidence intervals, hypothesis testing, Bayesian estimation, introduction to simulation and the bootstrap.

DSC 213: Statistics on Manifolds

This is a graduate topics course covering statistics with manifold constraints. Topics include: Frechet means and variances, principal geodesic analysis, directional statistics, random fields on manifolds, statistical distances between distributions, transport problems, and information geometry. Manifold constraints will be considered on simplexes, spheres, Stiefel manifold, stratified manifolds, cone of positive definite matrices, trees, compositional data, and other relevant manifolds.

Prerequisites: DSC 210, DSC 212

Recommended Preparation: Differential geometry.

DSC 214: Topological Data Analysis

Topology provides a powerful way to describe essential features of functions and spaces. In recent years topological methods have attracted much attention for analyzing complex data. Fundamental developments have been made and the resulting methods have been applied in many fields, e.g., graphics, visualization, neuroscience and material science. This course introduces basic concepts and topological structures behind these developments, algorithms for them, and examples of applications.

Recommended Preparation: Linear Algebra and programming.

DSC 215: Statistical Thinking and Experimental Design

We hold science in high regard, however, not all scientific claims are correct. How do we know which claims to trust and which not to? This fundamental question is at the heart of this course. The goal of this course is to enable the student to evaluate any paper in data science, regardless of application area. Topics covered include experimental design, claims, evidence and statistical significance, The Replication Crisis, falsifiability, philosophy of science, history of probability and statistics. About half of thee class meetings, as well as the final project, would be devoted to evaluating contemporary papers in data science. This class will be in the form of an open discussion, based on provided reading materials. The only prerequisite is a statistics class that covered hypothesis testing and P-values.

Recommended Preparation: Hypothesis testing and p-values, basic statistics.

DSC 231: Embedded Sensing and IOT Data Models and Methods

Sensory data and control is mediated by devices near the edge of sensor networks, referred to as IOT (Internet of Things) devices. Components of IOT platforms: signal processing, communications/networking, control, real-time operating systems. Interfaces to cloud computing stack, publish-subscribe protocols such as MQTT, embedded software/middleware components, metadata schema, metadata normalization methods, applications in selected CPS (cyber-physical system) applications.

Recommended Preparation: embedded systems and embedded software, basic courses in digital hardware, algorithms and data structures, programming, and computer architecture

DSC 234: Data-Centric AI and AI Engineering (officially "Data Systems for ML")

DSC 234: Data-Centric AI and AI Engineering (officially “Data Systems for ML”); 4 units.

This is a research-based course on data-centric aspects of the AI lifecycle, spanning development, deployment, and maintenance of AI applications. It is at the intersection of the areas of ML/AI, data management, and software systems. AI has long been ubiquitous in domains such as enterprise analytics, recommendation systems, social media analytics, and domain sciences. The rise of LLMs has made AI chatbots, RAG, and agentic applications pervasive for consumers as well. Students will learn about the landscape and evolution of such systems, the latest research, and some major open questions. This is a lecture-driven course with learning evaluation based on written quizzes and exams, peer discussions on cutting edge research papers, and hands-on AI programming assignments. This course is aimed primarily at MS students interested in building real-world AI applications, as well as PhD students interested in research in this space.

Prerequisites:

1) A course on ML algorithms is absolutely required. The course could have been at UCSD or elsewhere.

2) Python programming knowhow is also required. Any course or module on Python programming from anywhere suffices.

Or industrial experience or substantial project experience with applied ML/AI and Python programming suffices instead of above courses.

An introductory course on databases/data management and a course on NLP or LLMs specifically are also highly recommended but NOT strictly required.

Restricted to DS75 and DS76 students.

DSC 240: Machine Learning *

A graduate level course in machine learning algorithms: decision trees, principal component analysis, k-means, clustering, logistic regression, random forests, boosting, neural networks, kernel methods, deep learning.

Prerequisites: DSC 210, 212

Recommended Preparation: multivariate calculus and optimization

DSC 241: Statistical Models *

Linear/nonlinear models, diagnostics, polynomial regression, robust methods, regularization and penalization (ridge regression, lasso, etc.), bootstrap, model selection (cross-validation), generalized linear models, nonparametric regression, linear classification, classification and regression trees, boosting, neural networks.

Prerequisites: DSC 210, 212

Recommended Preparation: Basic programming or prior exposure to programming language such as R, Python, Matlab, etc.

DSC 242: High-dimensional Probability and Statistics

Concentration inequalities, Markov processes and ergodicity, martingale inequalities, empirical processes, sparse linear models in high dimensions, principal component analysis in high dimensions, estimation of large covariance matrices.

Recommended Preparation: Undergraduate probability theory

DSC 243: Advanced Optimization *

Linear/quadratic programming, optimization under constraints, optimization on the space of probabilities, gradient descent (deterministic and stochastic), convergence rate of gradient descent, acceleration phenomena in convex optimization, stochastic optimization with large data sets, complexity lower bounds for convex.

Prerequisites: DSC 211, DSC 212

DSC 244: Large-Scale Statistical Analysis *

Large-scale hypothesis testing, Family-wise error rate (FWER) and false discovery rate (FDR) control and estimation, empirical null distribution, empirical covariance matrices, empirical Bayes methods.

Prerequisites: DSC 210, DSC 212, DSC 241

DSC 245: Intro to Causal Inference *

Causal versus predictive inference, potential outcomes and randomized experiments (A/B testing), structural causal models (interventions, counterfactuals, causal diagram, do-operator, d-separation), causal structure learning (constraint and score-based algorithms, and functional causal model-based methods), identification of causal effect (back-door and front-door criterion, do-calculus), estimation of causal effect (matching, propensity score, doubly robust estimation, instrumental variables, conditional effects), advanced topics (causal discovery and inference in the presence of distribution shifts, selection bias, hidden confounders, cycles, nonlinear causal mechanisms, missing values, and causal representation learning.)

Prerequisites: DSC 212, DSC 240

Recommended Preparation: Basic statistics and machine learning.

DSC 250: Advanced Data Mining

Graph mining and basic text analysis (including keyphrase extraction and generation), set expansion and taxonomy construction, graph representation learning, graph convolutional neural networks, heterogeneous information networks, label propagation, and truth findings.

Recommended Preparation: Knowledge about Machine Learning and Data Mining, coding with python, C/C++, Java; statistics

DSC 251: Machine Learning in Control

Estimation of stability and uncertainty, optimal control, and sequential decision making.

Prerequisites: DSC 211, DSC 240

Recommended Preparation: Probability theory.

DSC 252: Statistical Natural Language Processing

Diving deep to the classical NLP pipeline: tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, parsing, and machine translation. Finite-state transducer, context-free grammar, Hidden Markov Models (HMM), and Conditional Random Fields (CRF) will be covered in detail.

Recommended Preparation: Introduction-level Machine Learning

DSC 253: Advanced Data-Driven Text Mining

Unsupervised, weakly supervised, and distantly supervised methods for text mining problems, including information retrieval, open-domain information extraction, text summarization (both extractive and generative), and knowledge graph construction. Bootstrapping, comparative analysis, learning from seed words and existing knowledge bases will be the key methodologies.

Recommended Preparation: Knowledge about Machine Learning and Data Mining, coding with python, C/C++, Java; statistics

DSC 254: Statistical Signal and Image Analysis

A graduate level course on signal and image analysis spanning three main themes. Statistical signal processing: random processes, stochasticity, stationarity, Wiener filter, Kalman filter, matched filter; signal processing: time-frequency representations, wavelets, signal processing with sparse representation (dictionary learning); image processing: registration, image degradation and restoration: noise models + denoising, image pyramids, random fields.

Prerequisites: (DSC 210 or ECE 269), DSC 212

Need to verify background in signal processing from undergraduate or other graduate courses.

Restricted to DS75 and DS76 students.

DSC 260: Data Science, Ethics, & Society

Data science is transforming our lives, and so we must consider the ethical and societal impacts, implications, and constraints in our data science efforts. This course will consider foundational concepts including power, justice, bias, privacy, and explainability; societal practices including delegation, organizational incentives, and accountability; and governance mechanisms including law, regulation, and norms.

DSC 261: Responsible Data Science

Data Science lifecycle, data cleaning and quality management, data profiling, causal inference, algorithmic fairness (fairness definitions, impossibility results, causal fairness, building fair ML models, fairness beyond classification), algorithmic transparency (interpretability vs explainability, auditing-black-box algorithms, algorithmic recourse).

Prerequisites: DSC 212, DSC 240

Recommended Preparation: Machine learning, causal inference, data management

DSC 270: Interpretable & Explainable Machine Learning

This course will cover advances in interpretable and explainable machine learning. Topics include modern methods to learn simple, transparent models (e.g., rule lists, decision trees, sparse linear models); methods for post-hoc explainability (e.g., SHAP, LIME, saliency maps, prototype-generation methods); and overarching concepts in the interpretation, description, explanation of machine learning (e.g., model multiplicity, cognitive biases, user studies).

Prerequisites: DSC 240.

Students should be comfortable reading research papers with a critical perspective; have prior course work in machine learning (e.g., DSC 240); and have some interest in interpretability and explainability for your research or career.

DSC 277: Statistical Analysis in Physics (DSC 291 - section 2: FA26)

The course combines statistical theory with physics practice, allowing students to understand both the mathematical underpinnings of statistical methods and how they are used in real-world particle physics experiments. Students will learn uncertainty quantification, hypothesis testing, frequentist and Bayesian fitting, and apply them to report rigorous scientific results on real physics data.

Prerequisite: DSC200 or Python Programming: students should know how to set up python environment, do python programming and how to use basic python tools (numpy, scipy etc.)

DSC 291 (WI23 F00): Topics in Data Science - Statistical ML for Physical Problems

The class will focus on implementations for physical problems from a statistical point of view. Topics: neural networks, physics Informed neural networks (PINN), Bayes , Gaussian probabilities, linear models for regression and classification, kernel methods, graphical models, mixture models, sampling methods, and sequential estimation.

Required Background: graduate standing.

DSC 291 (WI23 J00): Topics in Data Science - Mobile and Ubiquitous Computing

This course will introduce students to the field of mobile sensing and ubiquitous computing (Ubicomp) – an emerging area that has led to numerous disruptive new technologies including smartwatches, smart glasses, smart homes, smart cars, and the most importantly smartphones.

Required Background: The prerequisites of this course are a background in probability. Students are also expected to be familiar with Java, Python (or Matlab). There will be taster assignment called “Is this class right for me?” at the beginning of the course.

DSC 291 (SP23 A00): Topics in Data Science - Deep Learning in Mathematics

The course will discuss recent developments in Deep Learning and Machine Learning Theory and geared toward understanding properties of neural networks and their connections to kernel machines. The course will be loosely based on Professor Mikhail Belkin’s article, “Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation” published online by Cambridge University Press: 04 August 2021.

DSC 291 (SP23 E00): Topics in Data Science - Trustworthy Machine Learning

This course provides a mathematical introduction to trustworthy machine learning, and will help prepare you for research in this area — whether that is theoretical, empirical, or applied. Contents: Machine Learning basics (linear regression, loss function and optimization), Deep learning basics (neural networks and applications), training-time integrity, test-time integrity, robustness verification (provable guarantees and certificate), applications (interpretability, AI safety, fairness, security)

Required background: [DSC210 Linear algebra + DSC 240 Machine learning or equivalent]

DSC 291 (SP23 F00): Topics in Data Science - Data Science in Biomedicine

This course will be a project-based course in which students are exposed to myriad unsolved challenges in translating various forms of data into actionable and accountable health insights. Small groups will focus on different problems and gain insight from technical lectures, patient insights, and professional medical perspectives. Topics will include data integration, ground truthing, dashboards, and more.

DSC 291 (FA23 B00): Topics in Data Science - Causal Discovery & Causality-R

This course covers developments in causal discovery (i.e., causal structure learning) and how causal understanding facilitates solving fundamental machine learning (ML) problems. In particular, topics in causal discovery include constraint-based methods, score-based methods, functional causal model-based methods, and recent advances of causal discovery in the presence of distribution shifts, selection bias, hidden confounders, etc. For causality-related ML, tasks such as reinforcement learning, representation learning, transfer learning, and adversarial attacks are included.

DSC 293: Faculty Research Seminar

Prerequisite course(s): Faculty approval required.

Weekly faculty research seminar. Individual HDSI colloquia and distinguished lecturers may be included at the discretion of the instructor.

DSC 294: Research Rotation

Prerequisite course(s): PhD students only. Faculty approval required.

Special topics research under the direction of an HDSI faculty member. The research topics may include training in specific research methodologies consisting of practical laboratory skills, computational skills or proof systems in a research group/laboratory in which the student may pursue doctoral dissertation research.

DSC 295: Academia Survival Skills

Prerequisite course(s): PhD students only. Faculty approval required.

Basic skills necessary to succeed as a researcher in Data Science including scripting, cloud computing skills, fellowship proposal preparation, CV preparation, writing reviews, preparing posters, etc.

DSC 299: Graduate Research

Prerequisite course(s): For MS students on Plan 1: Thesis track and PhD students only. Faculty approval required.

DSC 500: Teaching Assistantship

Prerequisite course(s): Appointed students only.

DSC 599: Teaching Methods in Data Science

Prerequisite: Appointed TAs only. Required for all 1st time TA appointed students only.

This course examines theoretical and practical communication and teaching techniques particularly appropriate to data science. Practical training in teaching methods in the field of data science includes, though not limited to: expected TA duties, evaluation methods, rules governing TA appointments, conduct and evaluation, practice effective teaching strategies including communications with students and instructors, conduct of discussion sessions, formulating learning objectives and implementation of active learning strategies.

Contact Us

Find us

Email us

Phone support

IMPORTANT!

Course offering and course descriptions are subject to change.