Death and Data Science: Discovering the Reality vs the Reporting
How much do people obsess about violent death, and how did HDSI Fellow Brad Voytek’s students use data science to prove those obsessions defy reality?
The surprising answers discovered by Voytek’s students were delivered in their final class project, “Death: Reality vs. Reported.” Based on their analysis of millions of data points over a 17-year period, they found that undramatic causes like heart disease are among the most common killers, and yet rarities like lethal terrorism grab most of the headlines. Relative to the number of deaths it causes, terrorism was over-represented in the media by a factor of 3,906.
The team of four undergraduates used data science tools like Python to zero in on the nexus of: What do Americans actually die from? What do they search the most for on Google about death? What does the major media report about death? Their findings proved so dramatic that their school project on the media itself drew headlines. And their work recently drew endorsement from the world’s most famous tech guru, Bill Gates.
“I’m always amazed by the disconnect between what we see in the news and the reality of the world around us,” Gates tweeted in mid-June, linking to their work. Considering the philanthropist claims 47.5 million Twitter followers, that’s pretty impressive recognition.
UC San Diego rising junior Owen Shen reacted with cautious pride to this latest praise. “This thing really started snowballing,” said Shen from his family’s home in Northern California’s East Bay. The recognition from Gates is especially meaningful to him because this week Shen started his summer internship working for Microsoft, the tech giant co-founded by Gates.
Shen calls himself lucky for the widespread attention the team’s data science Death project has garnered from luminaries like Gates. Still, the 20-year-old seemed more impressed with the earlier praise his team gained from his own Millennial and Gen X heroes like Liv Boeree, a British professional poker player and philanthropist (with a degree in astrophysics), and Jeff Dean, a computer scientist who leads the Google Brain project in the company’s Research Group.
Shen and his team did the work for Voytek’s class, Data Science in Practice. Voytek is a neuroscientist and professor in the Department of Cognitive Science who helped pioneer the Institute’s undergraduate education program and build the interdisciplinary draw of more than 200 faculty members affiliated with HDSI, the university’s data science hub.
Shen is a computer science major at UC San Diego who took Voytek’s class last year as a freshman because of his interest in analyzing data as well as coding. Shen teamed with classmates Hasan Al-Jamaly, Maximillian Siemers, and Nicole Stone. Together they searched death records from the Centers for Disease Control and Prevention for 12 causes of death from 1999 to 2016, including cancer, stroke, overdose, car accidents, homicide and terrorism. They searched headlines from the same period in both the New York Times and the United Kingdom’s Guardian. Then they looked at Google search trends for their 12 cause-of-death markers from 2004 (the first year available) to 2016.
Once they got the raw counts, one of the challenges was to clean up the data, explained Shen, by converting every column of counts into relative proportions so the sources could be compared on the same scale. In data terms, comparing apples to pineapples doesn’t deliver the sharpest results. Among their findings: Cancer and suicide were by far the enduring leading search interests, with terrorism being perennially a Top 5 obsession.
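For readers curious what that normalization step might look like, here is a minimal Python sketch. It is not the team’s actual code: the function names are hypothetical, the counts are made-up numbers chosen only to illustrate the idea, and the article does not spell out exactly how the students computed their over-representation figures. The sketch assumes a simple ratio of a cause’s share of headlines to its share of deaths.

```python
def to_proportions(counts):
    """Convert raw counts per cause into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {cause: n / total for cause, n in counts.items()}


def representation_ratio(coverage_counts, death_counts):
    """Share of coverage divided by share of deaths, per cause.

    A value above 1 means a cause gets more coverage than its share
    of deaths would suggest; below 1 means it gets less.
    """
    coverage = to_proportions(coverage_counts)
    deaths = to_proportions(death_counts)
    return {
        cause: coverage[cause] / deaths[cause]
        for cause in deaths
        if cause in coverage and deaths[cause] > 0
    }


# Made-up numbers for illustration only; NOT the project's data.
example_deaths = {"heart disease": 600, "cancer": 390, "terrorism": 10}
example_headlines = {"heart disease": 50, "cancer": 150, "terrorism": 300}

ratios = representation_ratio(example_headlines, example_deaths)

# Print causes from most over-represented to most under-represented.
for cause, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cause}: {ratio:.2f}x")
```

Because both sources are reduced to shares before they are compared, it no longer matters that one source counts deaths and the other counts headlines, which is the apples-to-apples comparison Shen describes.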
Said Voytek, their professor, “It’s fantastic work.”
What intrigued Stone was how sharply people’s fears about death, as reflected in what they searched for on Google, diverged from the reality of death the team found. “The things people think they need to worry about are more far apart from what the data shows they should be worrying about. Why? We didn’t get into that, but someone should,” said Stone, who majored in Cognitive Science specializing in machine learning, and minored in Computer Science.
Among the “Death: Reality vs. Reported” team, Al-Jamaly just graduated and took a job as a software engineer in the Bay Area, and Siemers, who was a visiting international student, returned to Europe to pursue a Ph.D.
For his part, Shen didn’t stop with the final report for Voytek’s class, but kept developing the project into a more mainstream format complete with interactive color-coded graphics. As a result of his more user-friendly data presentation, the wider recognition for the team’s original work keeps growing, with milestones like the project earning top honors on the popular Reddit internet site as one of the most popular data presentations of all time. Shen plans to continue working in data analysis at UC San Diego and beyond, saying, “I have more questions I would like to get answers for.”