PhD Program Overview

Welcome to the HDSI PhD Graduate Program! 

The goal of the doctoral program is to create leaders in the field of Data Science who will lay the foundation and expand the boundaries of knowledge in the field. The program provides a research-oriented education, teaching students the knowledge, skills and awareness required to perform data-driven research and, building on this shared background, enabling them to carry out research that expands the boundaries of knowledge in Data Science. The program spans foundational aspects, including computational methods, machine learning, mathematical models and statistical analysis, as well as applications in data science.

Curriculum

PhD Course Requirements

Group A courses are introductory-level graduate courses in the foundational areas of data science. Group B courses are core graduate-level courses with prerequisites from Group A. Group C courses are advanced, specialized and free-standing courses, often part of the required courses in the Data Science specialization of the Graduate Program in other departments. In all three groups, required courses are indicated as such; they cannot be substituted by other courses without exception approval from the graduate program committee.

The doctoral program is structured as a total of 52 units in courses from Groups A, B, and C, as described in detail below. Out of the 52 units, 48 units (or 12 courses) must be taken for a letter grade, and at least 40 units must come from graduate-level courses.

The remaining 4 (= 52 – 48) units are for professional preparation, consisting of 1 unit of faculty research seminar, 2 units of TA/tutor training and 1 unit of survival skills course taken for a passing (satisfactory) grade. Finally, as mentioned earlier, out of the 12 regular courses, at least 10 must be graduate-level courses; at most two can be upper-level undergraduate courses. 36 units or 9 courses must be completed within six quarters from the start of the degree program.
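The unit arithmetic above can be sketched as a small check. This is an illustration only, not an official degree audit: the constant names and the function `check_letter_grade_plan` are hypothetical, and the sketch assumes every letter-graded course carries 4 units (48 units for 12 courses).

```python
# Illustrative tally of the unit rules stated above (names are hypothetical).
TOTAL_UNITS = 52
LETTER_GRADED_UNITS = 48
UNITS_PER_COURSE = 4  # assumption: 48 units / 12 courses
REQUIRED_COURSES = LETTER_GRADED_UNITS // UNITS_PER_COURSE  # 12

# Professional preparation units, taken for a Satisfactory grade.
PROFESSIONAL_PREP = {
    "faculty research seminar": 1,
    "TA/tutor training": 2,
    "survival skills": 1,
}


def check_letter_grade_plan(graduate_courses: int, undergraduate_courses: int) -> bool:
    """Return True if the 12 letter-graded courses have an allowed mix.

    At least 10 must be graduate-level and at most 2 may be
    upper-level undergraduate courses.
    """
    return (
        graduate_courses + undergraduate_courses == REQUIRED_COURSES
        and graduate_courses >= 10
        and undergraduate_courses <= 2
    )


# The professional preparation units account for the remaining 52 - 48 units.
remaining_units = TOTAL_UNITS - LETTER_GRADED_UNITS
prep_units = sum(PROFESSIONAL_PREP.values())
```

For example, `check_letter_grade_plan(10, 2)` is accepted, while `check_letter_grade_plan(9, 3)` is rejected because it includes too many undergraduate courses.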

Group A: Preparatory Knowledge and Skill Areas

There are five important knowledge and skill areas necessary for understanding (and advancing) core data science. It is, therefore, important that all our entering students either have background preparation or have courses available in the program to ensure successful completion of the stipulated doctoral degree program. A student can receive credit towards the Ph.D. degree for a maximum of three courses from the list of courses below.

Group B: Core Knowledge and Skill Areas

Doctoral students are required to take a minimum of 6 courses for letter-grade credit from Group B. Students may take more than 6 courses from this group to satisfy the letter-grade course requirements; these courses cannot replace the professional preparation courses (teaching, survival skills and research seminar), which must be completed satisfactorily. Students who satisfy all letter-grade course requirements are expected to enroll in individual research (DSC 298) in a section offered by their faculty advisor to meet residency requirements and maintain graduate student standing during the period of dissertation research.

Four core courses are required for all Ph.D. students, including those with a Bachelor's in Data Science. The four required courses are:

Group C: Professional Preparation and Elective Courses

Group C courses aim to provide either practical experiences in chosen specialization areas, or advanced training for students preparing for doctoral programs. The courses include the required professional preparation courses: 2 units of TA/tutor training (DSC 599), 1 unit of academic survival skills (DSC 295) and 1 unit of faculty research seminar (DSC 293), all of which must be completed with a Satisfactory (S) grade using the S/U option.

Research Rotation

Research rotations give first-year PhD students the opportunity to obtain research experience in data analysis under the guidance of HDSI-affiliated faculty members. Through the rotations, students can identify a faculty member under whose supervision their dissertation research will be completed.

A research rotation is a guided research experience lasting one quarter (10 weeks), obtained by registering for DSC 294 (4 units) with an instructor. All PhD students will participate in a minimum of two research rotations during their first year, with a minimum of two different faculty members, and as many as three rotations including the summer quarter. A student may rotate twice under the same faculty member as long as they rotate with at least two faculty members. The goal is to expose students to new methodological approaches or domain knowledge and help them find an advisor for their PhD research.
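The rotation rule can be read as a simple check over the list of rotation supervisors. The sketch below is an informal reading of the paragraph above, not an official tool; the function name `rotation_plan_ok` and the faculty labels are hypothetical.

```python
def rotation_plan_ok(supervisors: list[str]) -> bool:
    """Check a first-year rotation plan against the stated rule.

    `supervisors` holds one entry per rotation (the supervising
    faculty member). The plan needs two or three rotations and at
    least two distinct faculty members; repeating one supervisor is
    allowed as long as a second supervisor also appears.
    """
    rotations = len(supervisors)
    distinct_faculty = len(set(supervisors))
    return 2 <= rotations <= 3 and distinct_faculty >= 2
```

Under this reading, `rotation_plan_ok(["Faculty A", "Faculty A", "Faculty B"])` passes, while two rotations with the same supervisor do not.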

Please refer to the HDSI Faculty page for a list of HDSI faculty members and their research interests. Students should reach out to faculty members to learn more about potential rotation opportunities.

An important goal of HDSI is to foster interdisciplinary collaboration, specifically collaboration between method researchers and domain researchers. Domain researchers specialize in a particular scientific field and collect or generate large datasets. Domain experts can partner with HDSI faculty and jointly advise a PhD student. The site matchpoint.com was designed to facilitate the creation of such collaborations.

The outcome of each rotation is a written report or curated datasets with Jupyter notebooks. Students are expected to work 10 hours per week.

Research rotations must be completed by the end of the Fall Quarter of the second year with a signed commitment form from a faculty advisor. Those who fail to identify a research advisor shall be advised to leave the doctoral program with an optional assessment for completion of a terminal MS-DS degree.

Formal Coursework

The formal coursework covers breadth and depth requirements and consists of 48 units of courses structured in three groups (foundations, core and depth areas), as well as 4 units of professional preparation: the 1-unit HDSI Faculty Research Seminar, 2 units of TA/Tutor training and a 1-unit Research Skills course, all to be completed with a Satisfactory grade.

Preliminary Advisory Assessment

The goal of the preliminary assessment examination is to assess students’ preparation for pursuing a PhD in data science, in terms of core knowledge and readiness for conducting research. The preliminary assessment is an advisory examination.

The assessment includes an oral presentation that must be completed before the end of the Fall quarter of the second year. The student will propose the topic of the presentation (e.g., the outcome of a research rotation or a literature survey). The Graduate Program Committee will set up a committee consisting of two members (students can suggest committee members, but there is no guarantee that only those suggested faculty members will be chosen for the committee). The oral presentation will be followed by a Q&A session with the committee members. The committee will assess both the oral presentation and the student's academic performance so far (especially in the required core courses). Students who do not receive a satisfactory evaluation will receive a recommendation from the Graduate Program Committee regarding ways to remedy the lacking preparation (e.g., a suggestion of courses to be taken), or an opportunity to receive a terminal MS in Data Science degree, provided the student can meet the degree requirements of the MS program.

The assessment is conducted in a technical area of the student's choice by a committee set by the Graduate Committee (GradCom). This examination is to be completed before the start of the second year. Preliminary examinations will normally be scheduled annually in the Spring quarter through the Summer quarter of the first year. The goal of the preliminary assessment is to assess student preparation in background courses and identify any required courses consistent with the planned research area. In rare cases, the assessment outcome may include a requirement to retake the examination. The preliminary assessment must be successfully completed no later than the completion of two years (or six quarters of enrollment) in the Ph.D. program.

Passing a Research Qualifying Examination (UQE)

A research qualifying examination (UQE) is conducted by the dissertation committee, consisting of four or more members approved by the graduate division as per senate regulation 715(D). One senate faculty member must have a primary appointment in a department outside of HDSI. Faculty with a partial appointment of 25% or less in HDSI may be considered for meeting this requirement on an exceptional basis upon approval from the graduate division.

The goal of the UQE is to assess the ability of the candidate to perform independent critical research, as evidenced by a presentation and a written technical report at the level of a peer-reviewed journal or conference publication. The examination is taken after the student and his or her adviser have identified a topic for the dissertation and an initial demonstration of feasible progress has been made. The candidate is expected to describe his or her accomplishments to date as well as future work. The research qualifying examination must be completed no later than the fourth year or 12 quarters from the start of the degree program; the UQE is tantamount to the advancement to PhD candidacy exam.

A petition to the Graduate Committee is required for students who take the UQE after the 12-quarter deadline. Students who fail the research qualifying examination may file a petition to retake it; if the petition is approved, they will be allowed to retake it one (and only one) more time. Students who fail the UQE may also petition to transition to an MS in Data Science track.

Dissertation Defense Examination and Thesis Requirements

Students must successfully complete a final dissertation defense, an oral presentation and examination before the Dissertation Committee, which consists of four or more members approved by the graduate division as per senate regulation 715(D). The primary Thesis Adviser, who will chair the Dissertation Committee, must be a senate faculty member with an appointment of 0% or more at HDSI. One senate faculty member on the Dissertation Committee must have a primary appointment in a department outside of HDSI. Partially appointed faculty in HDSI (at 25% or less) are acceptable in meeting this outside-department requirement as long as their main (lead) department is not HDSI.

A dissertation in the scope of Data Science is required of every candidate for the PhD degree. The HDSI PhD program thesis requirements must meet Regulation 715(D) requirements. The final form of the dissertation document must comply with the guidelines published by the Graduate Division.

The dissertation topic will be selected by the student, under the advice and guidance of the Thesis Adviser and the Dissertation Committee. The dissertation must contain an original contribution of a quality that would be acceptable for publication in the academic literature and that either extends the theory or methodology of data science, or uses data science methods to solve a scientific problem in applied disciplines.

The entire dissertation committee will conduct a final oral examination, which will deal primarily with questions arising out of the relationship of the dissertation to the field of Data Science. The final examination will be conducted in two parts. The first part consists of a presentation by the candidate followed by a brief period of questions pertaining to the presentation; this part of the examination is open to the public. The second part of the examination will immediately follow the first part; this is a closed session between the student and the committee and will consist of a period of questioning by the committee members.

The degree requires a successful defense of the dissertation in a final examination before the doctoral dissertation committee, and an approved dissertation that explicitly addresses the reproducibility requirement. This requirement can be met by providing supplementary online material consisting of code, data repositories, any evidence of use by external parties and/or, where necessary, validated proof of results.

Special Requirements: Generalization, Reproducibility and Responsibility

A candidate for the doctoral degree in data science is expected to demonstrate evidence of generalization skills as well as evidence of reproducibility in research results. Evidence of generalization skills may take the form of (but is not limited to) generalization of results across domains or across applications within a domain, generalization of the applicability of the proposed method(s), or generalization of thesis conclusions rooted in formal or mathematical proof or in quantitative reasoning supported by robust statistical measures. The reproducibility requirement may be satisfied by additional supplementary material consisting of code and a data repository. The dissertation will also be reviewed for responsible use of data.

Annual Review

Each student's progress in the doctoral program is reviewed annually by the graduate committee of the HDSI faculty council.

Teaching Requirements

Students must complete the teacher training course (DSC 599) and a minimum of one quarter of teaching experience at a half-time (50%) appointment as a Teaching Assistant over the course of the degree program.

Time Limits

Assuming a student has no deficiencies and is enrolled full-time in the program, the normative length of time is 3 years pre-candidacy and 2 years in candidacy. Extension of the total time from matriculation to degree beyond six years will require a petition and approval from the graduate division. HDSI has instituted several mechanisms and incentives to ensure expeditious time-to-degree. These include a full-time graduate student advisor in HDSI Graduate Affairs; a preliminary assessment examination and advisory in the first year; an annual review of each graduate student in the Ph.D. program, led by the student's assigned faculty academic advisor; graduate scholarships funded by HDSI foundation accounts to cultivate a culture of excellence in research; and dedicated staff for computing and data curation services to ensure smooth and easy access to the necessary experimental platforms.

Obtaining an MS in Data Science

PhD students may obtain an MS Degree in Data Science along the way or a terminal MS degree, provided they complete the requirements for the MS degree.