*Prerequisite course(s): None*

Computing structures and programming concepts such as object orientation, data structures such as queues, heaps, lists, search trees and hash tables. Laboratory skills include Jupyter notebooks, RESTful interfaces and various software development kits (SDKs).

*Prerequisite course(s): None*

Principles of data management, relational data model, relational algebra, SQL for data science, NoSQL databases (document, key-value, graph, column-family), multidimensional data management (data warehousing, OLAP queries, OLAP cubes, visualizing multidimensional data).

*Prerequisite course(s): DSC 202*

Commonly used algorithms and techniques in data visualization. Interactive reasoning and exploratory analysis through visual interfaces. Application of data visualization in various domains including science, engineering, and medicine. Scalable interactive methods for exploring and visualizing big data. Techniques to evaluate the effectiveness and interpretability of analytical products for diverse users to obtain insights in support of assessment, planning, and decision making.

*Prerequisite course(s): DSC 202*

Storage/memory hierarchy, distributed scalable computing (cluster, cloud, edge) principles. Big Data storage, management and processing at scale. Dataflow programming systems and programming models (MapReduce/Hadoop and Spark).

*Prerequisite course(s): DSC 200, 210, 212*

This course is a hands-on introduction to big data analytics. Topics covered include: I/O bottleneck and the memory hierarchy; HDFS and Spark; RMS error minimization, PCA and percent of variance explained. Analysis of NOAA weather data. Data collection and curation. Limitations of train/test methodology and leaderboards. K-means and intrinsic dimension. Classification, boosting and XGBoost. Margins. Neural networks and TensorFlow. Students will develop the skills and attitudes required to write Jupyter notebooks that can be understood by domain experts.

*Prerequisite course(s): DSC 210 or ECE 269, DSC 212, DSC 240*

This course will cover graph-based data modeling, analysis and representation. Topics include: spectral graph theory, spectral clustering, kernel-based manifold learning, dimensionality reduction and visualization, multiway data analysis, graph signal processing, graph neural networks.

*Prerequisite course(s): DSC 212*

With the advent of large-scale machine learning, online social networks, and computationally intensive models, data scientists must deal with data that is massive in size, arrives fast, and must be processed in an interactive or online manner. This course studies the mathematical foundations of massive data processing, developing algorithms and analyzing them. We explore methods for sampling, sketching, and distributed processing of large-scale databases, clustering, dimensionality reduction, and methods of optimization for the purpose of scalable statistical description, querying, pattern mining, and learning from data.

*Prerequisite course(s): None*

Linear algebraic systems, least squares problems and regularization, orthogonalization methods, ill-conditioned problems, eigenvalue and singular value decomposition, principal component analysis, structured matrix factorization and fast algorithms, randomized linear algebra, Johnson-Lindenstrauss (JL) lemma, sparse approximations.

*Prerequisite course(s): DSC 210*

Continuity and differentiability of a function of several variables, gradient vector, Hessian matrices, Taylor approximation, fundamentals of optimization, Lagrange multipliers, convexity, gradient descent.

*Prerequisite course(s): None*

Probability, random variables, distributions, central limit theorem, maximum likelihood estimation, method of moments, confidence intervals, hypothesis testing, Bayesian estimation, introduction to simulation and the bootstrap.

*Prerequisite course(s): DSC 210, DSC 212*

This is a graduate topics course covering statistics with manifold constraints. Topics include: Fréchet means and variances, principal geodesic analysis, directional statistics, random fields on manifolds, statistical distances between distributions, transport problems, and information geometry. Manifold constraints will be considered on simplexes, spheres, Stiefel manifolds, stratified manifolds, the cone of positive definite matrices, trees, compositional data, and other relevant manifolds.

*Prerequisite course(s): None*

Topology provides a powerful way to describe essential features of functions and spaces. In recent years topological methods have attracted much attention for analyzing complex data. Fundamental developments have been made and the resulting methods have been applied in many fields, e.g., graphics, visualization, neuroscience and materials science. This course introduces basic concepts and topological structures behind these developments, algorithms for them, and examples of applications.

*Prerequisite course(s): None*

We hold science in high regard; however, not all scientific claims are correct. How do we know which claims to trust and which not to? This fundamental question is at the heart of this course. The goal of this course is to enable the student to evaluate any paper in data science, regardless of application area. Topics covered include experimental design, claims, evidence and statistical significance, the replication crisis, falsifiability, philosophy of science, and the history of probability and statistics. About half of the class meetings, as well as the final project, will be devoted to evaluating contemporary papers in data science. This class will be in the form of an open discussion, based on provided reading materials. The only prerequisite is a statistics class that covered hypothesis testing and p-values.

*Prerequisite course(s): None*

Sensory data and control are mediated by devices near the edge of sensor networks, referred to as IoT (Internet of Things) devices. Components of IoT platforms: signal processing, communications/networking, control, real-time operating systems. Interfaces to the cloud computing stack, publish-subscribe protocols such as MQTT, embedded software/middleware components, metadata schema, metadata normalization methods, applications in selected CPS (cyber-physical systems).

*Prerequisite course(s): DSC 210, 212*

A graduate-level course in machine learning algorithms: decision trees, principal component analysis, k-means, clustering, logistic regression, random forests, boosting, neural networks, kernel methods, deep learning.

*Prerequisite course(s): DSC 210, 212*

Linear/nonlinear models, diagnostics, polynomial regression, robust methods, regularization and penalization (ridge regression, lasso, etc.), bootstrap, model selection (cross-validation), generalized linear models, nonparametric regression, linear classification, classification and regression trees, boosting, neural networks.

*Prerequisite course(s): None*

Concentration inequalities, Markov processes and ergodicity, martingale inequalities, empirical processes, sparse linear models in high dimensions, principal component analysis in high dimensions, estimation of large covariance matrices.

*Prerequisite course(s): DSC 211, DSC 212*

Linear/quadratic programming, optimization under constraints, optimization on the space of probabilities, gradient descent (deterministic and stochastic), convergence rate of gradient descent, acceleration phenomena in convex optimization, stochastic optimization with large data sets, complexity lower bounds for convex optimization.

*Prerequisite course(s): DSC 210, DSC 212, DSC 241*

Large-scale hypothesis testing, family-wise error rate (FWER) and false discovery rate (FDR) control and estimation, empirical null distribution, empirical covariance matrices, empirical Bayes methods.

*Prerequisite course(s): DSC 212, DSC 240*

Causal versus predictive inference, potential outcomes and randomized experiments (A/B testing), structural causal models (interventions, counterfactuals, causal diagrams, do-operator, d-separation), causal structure learning (constraint-based and score-based algorithms, and functional causal model-based methods), identification of causal effects (back-door and front-door criteria, do-calculus), estimation of causal effects (matching, propensity score, doubly robust estimation, instrumental variables, conditional effects), advanced topics (causal discovery and inference in the presence of distribution shifts, selection bias, hidden confounders, cycles, nonlinear causal mechanisms, missing values, and causal representation learning).

*Prerequisite course(s): None*

Graph mining and basic text analysis (including keyphrase extraction and generation), set expansion and taxonomy construction, graph representation learning, graph convolutional neural networks, heterogeneous information networks, label propagation, and truth finding.

*Prerequisite course(s): DSC 211, DSC 240*

Estimation of stability and uncertainty, optimal control, and sequential decision making.

*Prerequisite course(s): None*

A deep dive into the classical NLP pipeline: tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, parsing, and machine translation. Finite-state transducers, context-free grammars, Hidden Markov Models (HMM), and Conditional Random Fields (CRF) will be covered in detail.

*Prerequisite course(s): None*

Unsupervised, weakly supervised, and distantly supervised methods for text mining problems, including information retrieval, open-domain information extraction, text summarization (both extractive and generative), and knowledge graph construction. Bootstrapping, comparative analysis, learning from seed words and existing knowledge bases will be the key methodologies.

*Prerequisite course(s): DSC 210 or ECE 269, DSC 212, DSC 220*

A graduate-level course on signal and image analysis spanning three main themes. Statistical signal processing: random processes, stochasticity, stationarity, Wiener filter, Kalman filter, matched filter; signal processing: time-frequency representations, wavelets, signal processing with sparse representation (dictionary learning); image processing: registration, image degradation and restoration (noise models and denoising), image pyramids, random fields.

*Prerequisite course(s): None*

Data science is transforming our lives, and so we must consider the ethical and societal impacts, implications, and constraints in our data science efforts. This course will consider foundational concepts including power, justice, bias, privacy, and explainability; societal practices including delegation, organizational incentives, and accountability; and governance mechanisms including law, regulation, and norms.

*Prerequisite course(s): DSC 212, DSC 240*

Data science lifecycle, data cleaning and quality management, data profiling, causal inference, algorithmic fairness (fairness definitions, impossibility results, causal fairness, building fair ML models, fairness beyond classification), algorithmic transparency (interpretability vs. explainability, auditing black-box algorithms, algorithmic recourse).

*Prerequisite course(s): *

*Prerequisite course(s): Faculty approval required.*

Weekly faculty research seminar. Individual HDSI colloquia and distinguished lecturers may be included at the discretion of the instructor.

*Prerequisite course(s): PhD students only. Faculty approval required.*

Special topics research under the direction of an HDSI faculty member. The research topics may include training in specific research methodologies consisting of practical laboratory skills, computational skills or proof systems in a research group/laboratory in which the student may pursue doctoral dissertation research.

*Prerequisite course(s): PhD students only. Faculty approval required.*

Basic skills necessary to succeed as a researcher in Data Science including scripting, cloud computing skills, fellowship proposal preparation, CV preparation, writing reviews, preparing posters, etc.

*Prerequisite course(s): For MS students on Plan 1: Thesis track and PhD students only. Faculty approval required.*

*Prerequisite course(s): Appointed students only.*

*Prerequisite course(s): Appointed students only.*

Expected TA duties and evaluation methods. Rules governing TA appointment, conduct, and evaluation. Practice of effective teaching strategies, including communication with students and instructors, conduct of discussion sessions, formulation of learning objectives, and implementation of active learning strategies.