HDSI Seminar Series | Spurious or Causal? Studying Failure Modes of Deep Learning and Ways to Fix Them

Title: Spurious or Causal? Studying Failure Modes of Deep Learning and Ways to Fix Them

Abstract: Over the last decade, deep models have enjoyed wide empirical success. However, in practice, these models are not reliable due to their sensitivity to adversarial or natural input distribution shifts as well as a lack of meaningful reasoning behind their predictions. In this talk, I will show that a root cause of these issues is the heavy reliance of deep models on non-causal and spurious features in their inferences. I will then explain our progress in understanding failure modes of deep learning and outline a roadmap towards developing trustworthy learning paradigms.

Bio: Soheil Feizi is an assistant professor in the Computer Science Department at the University of Maryland, College Park. Before joining UMD, he was a post-doctoral research scholar at Stanford University. He received his Ph.D. from the Massachusetts Institute of Technology (MIT) in EECS with a minor degree in mathematics. He received an NSF CAREER award in 2020 and is the recipient of several other awards including two best paper awards, a teaching award and multiple faculty awards from industry such as IBM, AWS and Qualcomm. He received a Simons-Berkeley Research Fellowship on deep learning foundations in 2019. He is the recipient of the Ernst Guillemin award for his M.Sc. thesis, as well as the Jacobs Presidential Fellowship and the EECS Great Educators Fellowship at MIT.

HDSI Awards 10 Graduate Prize Fellowships

The Halıcıoğlu Data Science Institute (HDSI) has awarded 10 new Graduate Prize Fellowships to incoming Ph.D. students to support their data science-related research at UC San Diego for the next four years. This influx of data science expertise will boost the research and teaching capabilities of HDSI.

The Fellows will work directly with our data science students as teaching assistants for one course, helping strengthen what has become one of the largest data science education programs in the nation.

The Graduate Prize Fellowships provide substantial financial support to Fellows throughout their four-year award. The awardees are drawn from multiple academic disciplines, in keeping with the multidisciplinary mission of HDSI. Doctoral student appointments will be shared by HDSI and five departments: Bioinformatics and Systems Biology-biomedical informatics track; Computer Science & Engineering (CSE); Mathematics; Cognitive Science; and Electrical and Computer Engineering (ECE).

Awardees are:

  • Fatemeh Amrollahi–Bioinformatics
  • Yeohee Im–ECE
  • Ranak Roy Chowdhury–CSE
  • Matthew Feiglis–Cognitive Science
  • Zhankui He–CSE
  • Side Li–CSE
  • Tara Mirmira–CSE
  • Yanyi Wang–Math
  • Yuyao Wang–Math
  • Weiwei Wu–Math

HDSI Celebrates 1st Year Accomplishments and Vision

Undergraduate scholarship winner presents poster


Capping the celebration of its first year in operation, the Halıcıoğlu Data Science Institute (HDSI) has released a video that spotlights its role as a national leader in data science programs.

The video, which features a day-long symposium held on HDSI’s first anniversary, highlights leading figures in data science research, industry and academia, as well as current students. The symposium focused on both first-year accomplishments and the future of the data science field.

In its first year, HDSI demonstrated the historic vision of University of California San Diego by taking the lead on one of the biggest forces in modern life, noted Bob Continetti, UC San Diego Senior Associate Vice Chancellor. With the innovative HDSI approach, he said, “This is how we’re going to respond to the ever-increasing rate of change around us.”

The symposium was accompanied by another video summing up HDSI’s first-year accomplishments since its launch on March 1, 2018. Among the first-year achievements:

  • Establishing itself as one of the largest academic data science programs in the nation
  • Launching a full data science major and minor, drawing students from every class at the university
  • Creating the Institute’s first postdoctoral scholar program
  • Building a core of more than 200 affiliated faculty members across more than 20 academic disciplines, from mathematics to the medical school
  • Attracting more than a dozen industry partners who contribute to enhanced educational programs, research opportunities and a job pipeline for students

Another leading academician in the data science field appearing on one of the symposium panels was David Culler, UC Berkeley Dean for Data Science and former chair of the Department of Electrical Engineering and Computer Sciences. Culler spoke on the urgent need for responsible data science, taught and applied in research under the guidance of experienced, leading universities.

“We’re in a moment where we’re trying to understand the complicated dynamics of the planet when we also have the ability to see every square yard of it through remote sensing and so forth,” said Culler on the challenges. “What we are seeing are the frontiers of knowledge in essentially every domain are integrated in character.”

The video features undergraduate Luyanda Mdanda accepting his prize for winning the best scientific poster among the inaugural group of HDSI Undergraduate Research Scholarship winners.

Mdanda, a junior, was one of more than 40 undergraduates who won competitive scholarships to conduct data science research projects during the academic year. HDSI funded the scholarships that drew students from more than 15 different academic majors, from political science to computer science.



Death and Data Science: Discovering the Reality vs the Reporting

How much do people obsess about violent death, and how did HDSI Fellow Brad Voytek’s students use data science to prove those obsessions defy reality?

The surprising answers discovered by Voytek’s students were delivered in their final class project, “Death: Reality vs. Reported.” Based on their analysis of millions of data points over a 17-year period, they found that, in reality, passive causes of death like heart disease are among the most common killers, yet rarities like lethal terrorism grab most of the headlines. They found that terrorism, relative to the number of deaths it causes, was over-represented in the media by a factor of 3,906.

The team of four undergraduates used data science tools like Python to zero in on the nexus of three questions: What do Americans actually die from? What do they search the most for on Google about death? What does the major media report about death? Their findings proved so dramatic that their school project on the media itself drew headlines. And their work recently drew an endorsement from the world’s most famous tech guru, Bill Gates.

“I’m always amazed by the disconnect between what we see in the news and the reality of the world around us,” Gates tweeted in mid-June, linking to their work. Considering the philanthropist claims 47.5 million Twitter followers, that’s pretty impressive recognition.

UC San Diego rising junior Owen Shen reacted with cautious pride to this latest praise. “This thing really started snowballing,” said Shen from his family’s home in Northern California’s East Bay. The recognition from Gates is especially meaningful to him because this week Shen started his summer internship working for Microsoft, the tech giant co-founded by Gates.

Shen calls himself lucky for the widespread attention their data science Death project has garnered from luminaries like Gates. Still, Shen, being 20 years old, seemed more impressed with the earlier praise his team gained from his own Millennial and Gen X heroes like Liv Boeree, a British professional poker player and philanthropist (with a degree in astrophysics), and Jeff Dean, a computer scientist who leads the Google Brain project in the company’s Research Group.

Shen and his team did the work for Voytek’s class, Data Science in Practice. Voytek is a neuroscientist and professor in the Department of Cognitive Science who helped pioneer the Institute’s undergraduate education program and the interdisciplinary draw of more than 200 faculty members affiliated with HDSI, the university’s data science hub.

Shen is a computer science major at UC San Diego who took Voytek’s class last year as a freshman because of his interest in analyzing data as well as coding. Shen teamed with classmates Hasan Al-Jamaly, Maximillian Siemers, and Nicole Stone. Together they searched death records from the Centers for Disease Control and Prevention for 12 causes of death from 1999-2016, including cancer, stroke, overdose, car accidents, homicide and terrorism. They searched headlines from 1999-2016 in both the New York Times and the Guardian news publication from the United Kingdom. Then they looked at Google search trends for their 12 cause-of-death markers from 2004-2016 (starting with the first year available).

Once they got the raw counts, one of the challenges was to clean up the data, explained Shen, by relativizing all the columns so they could compare the correct proportions. In data terms, comparing apples to pineapples doesn’t deliver the sharpest results. Among their findings: Cancer and suicide were by far the enduring leading search interests, with terrorism being perennially a Top 5 obsession.
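The relativizing step Shen describes can be sketched in a few lines of Python. This is only an illustrative sketch, not the team's code: the function names and the toy counts below are hypothetical, chosen to show how raw counts from sources of very different sizes become comparable shares, and how a representation ratio like the one reported for terrorism could be computed from them.

```python
def to_proportions(counts):
    """Convert raw counts per cause into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {cause: n / total for cause, n in counts.items()}


def representation_ratio(coverage_share, death_share):
    """Share of coverage divided by share of deaths, per cause.

    A ratio above 1 means the cause gets more coverage than its
    share of deaths; below 1 means it gets less.
    """
    return {
        cause: coverage_share[cause] / share
        for cause, share in death_share.items()
        if share > 0
    }


# Hypothetical toy counts for illustration only, NOT the project's data.
deaths = {"heart disease": 600, "cancer": 390, "terrorism": 10}
headlines = {"heart disease": 50, "cancer": 150, "terrorism": 300}

death_share = to_proportions(deaths)
headline_share = to_proportions(headlines)
ratios = representation_ratio(headline_share, death_share)

for cause, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cause}: {ratio:.2f}x")
```

With these made-up numbers, terrorism accounts for 1 percent of deaths but 60 percent of headlines, so its ratio comes out at roughly 60, while heart disease lands well below 1. Working with shares rather than raw counts is what allows a death registry, a headline archive and a search index to be compared on the same scale.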

Said Voytek, their professor, “It’s fantastic work.”

What intrigued Stone was how deeply people’s fears about death, as reflected in what they searched Google for, diverged from the reality of death the team found. “The things people think they need to worry about are more far apart from what the data shows they should be worrying about. Why? We didn’t get into that, but someone should,” said Stone, who majored in Cognitive Science specializing in machine learning, and minored in Computer Science.

Among the Death Reporting vs. Reality team, Al-Jamaly just graduated and took a job as a software engineer in the Bay Area. And Siemers, who was a visiting international student, returned to Europe to pursue a Ph.D.

For his part, Shen didn’t stop with the final report for Voytek’s class, but kept developing the project into a more mainstream format complete with interactive color-coded graphics. As a result of his more user-friendly data presentation, the wider recognition for the team’s original work keeps growing, with milestones like the project earning top honors on the popular Reddit internet site as one of the most popular data presentations of all time. Shen plans to continue working in data analysis at UC San Diego and beyond, saying, “I have more questions I would like to get answers for.”