Learn About Our Talent
Connect With Our Talent
Please complete this Opportunity Submission Form to provide the information needed to promote your opportunity to our talent pool. There is no fee associated. We simply request that you notify HDSICareerServices@ucsd.edu if you decide to move forward with hiring one of our applicants and provide feedback in a survey that will be sent at the end of the quarter. For additional questions or concerns, please contact Margaret Zuhlke, Career Advisor and Alumni Relations Specialist, at firstname.lastname@example.org.
Data Science Talent Day
Data Science Talent Day is UC San Diego’s annual recruiting event dedicated to data science talent. The half-day event is hosted by the Data Science Student Society (DS3) in partnership with the Halıcıoğlu Data Science Institute and gives employers an opportunity to meet, network with, and recruit our aspiring data scientists who are seeking internships and job opportunities.
Senior Capstone Project
- The undergraduate Data Science program at the Halicioglu Data Science Institute includes a required two-quarter senior capstone project. Quarter 1 of the capstone covers two parallel topics:
- (1) The basics of “data science methodology” for a large project, including best practices for data handling and project reproducibility (“lecture”).
- (2) Beginning research into your choice of domain. Students become acquainted with a domain by replicating a specified result in that domain of inquiry (“discussion”).
- The replication of the domain result in (2) will use the best practices learned in (1). This work then serves as a foundation for project proposals due at the end of the quarter. The projects are carried out, in groups, in the second quarter. While the methodology portion is taught in a traditional lecture setting, most of the material in this course is covered through self-guided learning: readings, data exploration, and ensuing discussion.
- Data scientists typically work on projects in groups. As such, large assignments throughout the sequence (the replication, the project proposal, and the project itself) are worked on in groups. As in a career or research situation, these groups will be formed using a variety of factors, including academic background, mutual interests, and a little randomness.
- Students’ Capstone projects fall within a domain of emphasis as defined by the project described in this proposal. Students will acquaint themselves with this domain by replicating an existing study/project, the details of which should be sketched below.
- The information provided below should:
- Give a rough introduction to the domain and the primary data source used for the projects in the domain area.
- Sketch the required learning outcomes for students. That is, a description of the result/outcome students will replicate to acquaint themselves with the domain (and build off of) and the necessary “new tools” students will have to learn to work in the domain (these may be new statistical techniques, ML techniques, or significant domain research).
- List potential “extensions” to the replication project (i.e. possible Capstone project ideas) that students can use as inspiration for their personal (group) project during the second quarter.
Enter a short description of the domain area being considered
List background knowledge in the domain that students will have to learn to work on this project. Try to keep it minimal and list a few references to keep it concrete.
List the primary data sources necessary for doing the replication project; additionally, list secondary data sources that may be useful for enriching the primary data.
List (1) the question/investigation/problem framing the “replication project”, and (2) other potential investigations that give students ideas for what a capstone in this domain may look like.
- [primary investigation]
- [additional investigation]
List technical subjects that are necessary for working in the proposed domain. Pay particular attention to subject matter that students likely won’t know coming into the project (e.g. “multiple hypothesis testing”, “advanced graph computations”, “spatial auto-correlation”, “natural experiments via difference-in-differences”). List references that could serve as minimal resources from which students can learn this material.
Reference for what students will be doing in 180A (first quarter) in the domain proposed in this template. Each of the bullets in “required guided assignments” will require an assignment to be developed. In total, they will make up the “result replication” using methodological best-practices in data science.
- Data ingestion pipeline
- Data cleaning step that facilitates statistical analyses. Choices are backed up by sound reasoning about the data. (point out some “gotchas” in the data; this helps acquaint students with the domain and data source).
- Data quality analysis; descriptive statistics; understanding/handling missing data.
- Replicate study.
- Create end-to-end pipeline
- Find related data.
- Formulate a project proposal in the domain.
- Begin the data pipeline and descriptive statistics.
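The guided assignments above (ingestion, cleaning, data quality analysis, and an end-to-end pipeline) can be sketched in a few lines of code. The following is only a minimal illustration under stated assumptions, not the course's actual assignment code: the inline sample data, the column names (`stop_id`, `driver_age`, `outcome`), and all function names are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a domain's primary data source.
RAW_CSV = """stop_id,driver_age,outcome
1,34,citation
2,,warning
3,27,citation
4,abc,warning
5,51,citation
"""


def ingest(text):
    """Ingestion: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: convert driver_age to int; unparseable values become None."""
    cleaned = []
    for row in rows:
        try:
            age = int(row["driver_age"])
        except ValueError:
            age = None  # a "gotcha": empty or malformed entry in the raw data
        cleaned.append({**row, "driver_age": age})
    return cleaned


def quality_report(rows):
    """Data quality: count missing ages and summarize the observed ones."""
    ages = [r["driver_age"] for r in rows if r["driver_age"] is not None]
    return {
        "n_rows": len(rows),
        "n_missing_age": len(rows) - len(ages),
        "mean_age": statistics.mean(ages),
        "median_age": statistics.median(ages),
    }


def run_pipeline(text):
    """End-to-end pipeline: ingest, clean, then report."""
    return quality_report(clean(ingest(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Keeping each step as a separate function means a later project can swap in a different data source or cleaning rule without rewriting the rest of the pipeline.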
Potential Mentor FAQ
Domain experts host “domains of inquiry” that can support multiple projects clustered around a specific domain. The scale of the Capstone Sequence (200 – 250 students/year) necessitates opening mentorship to multiple teams of 2-4 students. However, the structure of the Capstone Sequence encourages group discussion and teams within a domain to loosely work together, reducing the number of 1-on-1 meetings. The size of your domain may vary (from 1-5 teams), but structuring the Capstone Sequence in this way ensures:
- there are adequate guard-rails for students new to independent work,
- students have material to follow and learn from when developing their own work,
- the scope of projects remains manageable for the time frame and appropriate for the level of the students.
The Capstone Sequence is a two-quarter commitment.
- One hour per week is spent in discussion sections answering questions students have on the weekly readings and tasks.
- Approximately one hour per week holding Office Hours (either drop in, or by appointment).
- Quarter 1 consists of a “result replication” to acquaint students with the domain and take students through a dry run of putting together a larger project in code. They also form their project groups and write/present their project proposal.
- In Quarter 2, students execute on their project.
Choosing a known, doable, result for students to replicate ensures that it’s scoped narrowly enough for students to succeed, while being broad/relevant enough to support investigations from numerous groups. The project template is meant to ensure this balance is achieved.
Domains are advertised ahead of the first quarter and students enroll in their preferred domain. During the first week, students may also change domains.
After 6 weeks of learning their domain individually, students form groups (or are placed into groups) and write project proposals in their domain area, with the guidance of their domain expert. Groups then work on these projects in Quarter 2.
- Ideally, students formulate their own project closely related to the result-replication in Quarter 1. Formulating questions is a valuable skill, while hewing close to the result-replication will allow students to reuse their code.
- However, you may approve any project you are comfortable advising as a domain expert. Keep in mind that other students in the domain will be required to follow along and peer review their classmates, so a project too far afield may hinder those interactions.
- Fixing the project ahead of time and requiring students to work on a specific coordinated task in the area is ok. In fact, typically students want such guidance. You should make these expectations clear at the beginning of the course.
- Lecture (1 hour/week).
- Quarter 1: students learn methodological standards for a data science project (software development, reproducibility, environment independence).
- Quarter 2: students learn approaches to project/team management and effective (written, visual, oral) communication.
- Discussion (1 hour/week)
- Quarter 1: Guided self-learning of a domain through replication of a known result. Section consists largely of introducing the reading students will do the following week and answering questions students may have about the reading.
- Quarter 2: Project teams report on their progress from the previous week as well as plans for the following week. The class helps teams solve issues they may be having (“weekly check-ins”).
- Lab (2 hours/week)
- 1 hour/week of lab is “methodology” office hours (associated with the lecture).
- 1 hour/week of lab is “domain” office hours (help with the domain).
Not as much as it may seem; the sequence is designed to reduce the overhead required to scale mentorship across multiple groups. The following lists the main class-prep tasks:
- Q1: Weekly plan (readings + tasks).
- Q1: Questions for topics each week.
- Quarter 2 requires no course preparation, only student/group meetings with feedback.
Students work on many assignments, both to keep them on track and to structure regular instructor feedback on their work. These assignments are largely pre-made and consistent across domains (with content specific to each domain).
- Weekly questions / check-ins / peer reviews.
- Project checkpoints (for replication and project).
- Finished products (e.g. reports, code pipelines, presentation)
The sequence is structured to minimize grading, while still giving thorough, actionable feedback.
- Weekly assignments are meant to be delivered in person (e.g. a verbal update); this is where you give feedback. The corresponding written assignments are turned in only for completion.
- Grades are given with a coarse precision (letter grade A/B/C/F with no plus/minus), with feedback in writing or given in office hours. This both helps maintain consistent grades across wildly different domains and eases the grading. The letter grade assignment is typically “obvious”. However, it is very important that students receive actionable feedback on their assignments, as that is the very point of the checkpoint.
- Grading does not (need to) involve code. You should be able to generally assess the correctness of the code from the figures in the report. The methodological standards of the code are graded by the lecture staff. (You can grade code, however, if you so desire.)
Winter 2020 - Spring 2020
- Analyzing the Media’s Effect on SD Police Traffic Stops
- Does the Race of a Police Officer Influence Traffic Stop Outcomes?
- Is Predictive Policing Preventative? An analysis of predictive policing systems and their effect on crime
- Effects of Recreational Marijuana Legalization on Traffic Collisions in California
Quantifying Artistic Style
- Name That Raga: An Analysis and Classification of Indian Classical Music
- Red Means Go: Analyzing YouTube Thumbnail Trends
- RestoreNet: Quantifying Image Restoration of World War II Images
- What Makes a Super Bowl Commercial Super?
- Book Cover Reader
Clustering the Human Genome
- Genetic Differences Between Glioblastoma Multiforme and Low-Grade Glioma Populations: How Does Copy Number Variation Distinguish Each Disease?
- Meta-Analysis to find Genes related to T2Ds
- Predicting Disease Risk Through Machine Learning
- Cancer Survivability & Genetic Mutations across Germ Layers
- Evaluating miRNA as Biomarkers for pre-Type 1 Diabetes
- Genome-wide association studies on Alzheimer’s Disease
Malware / Graph Learning
- Exploration of Graph Embedding Techniques in HinDroid
- HinReddit: Understanding Hate and Conflict in Reddit Communities
- Heterogenous Graph Embedding for Record Linkage
- Improving and Explaining HinDroid
- The Food Chain – A Personalized Restaurant Recommender System
- CodeHonestly – Advanced Python Code Plagiarism Checker
- Malware Category Detection
- Android Malware Detection with metapath2vec
Fall 2020 - Winter 2021
Below are the domains for the 2020-2021 Capstone Projects. The project presentations are planned for March 12, 2021. For more information on the presentations, please contact HDSICareerServices@ucsd.edu.
- Explainable AI
- Autonomous Vehicles
- Malware and Graph Learning
- Text Mining and NLP
- Recommender Systems
- A Knowledge-Graph for Substance Abuse Epidemiology
- Large-scale multiple testing
- Spatial-temporal analyses of infectious disease dynamics
- Graph Data Analysis
- The Spread of Misinformation
- Conflict and Collaboration in Online Communities
- Genetic Basis of Mental Health
- VPN X-Ray (offered by HDSI Industry Partner, Viasat)
- Particle Physics
- COVID-19 and Microbiome
- Cyber-Physical Systems (CPS) using IOT Devices
- System Usage Report (SUR, a.k.a. DCA) (offered by HDSI Industry Partner, Intel)
- Spatial agent-based modeling for school reopening