Faculty Feature Article Series: I Am Data Science
with Associate Professor and HDSI Fellow, Bradley Voytek, Ph.D.
by Trista Sobeck & Bobby Gordon
Bradley Voytek is an Associate Professor in the Department of Cognitive Science, the Halıcıoğlu Data Science Institute, and the Neurosciences Graduate Program at UC San Diego. He is both an Alfred P. Sloan Neuroscience Research Fellow and a Kavli Fellow of the National Academies of Sciences, as well as a founding Faculty Fellow of the UC San Diego Halıcıoğlu Data Science Institute and the Undergraduate Data Science program, where he serves as Vice-Chair.
Sci-fi-loving Bradley Voytek first became intrigued with the idea of data science when he began reading Isaac Asimov’s book, Foundation. The book features a galaxy-spanning empire in which scientists called “psychohistorians” realize they can quantify and predict the ebbs and flows of the behavior of its quadrillions of inhabitants. That idea set Voytek’s mind spinning off in a variety of directions about data.
“While I was working toward my Ph.D. in neuroscience in the mid-2000s, I grew very frustrated with how much there was to learn about the brain,” remembers Voytek.
“There are more than 3 million peer-reviewed scientific articles that have been published, related to neuroscience,” he reports. “That’s simply too much for any one person to ever begin to understand! I started to wonder if I could analyze all those papers, sort of like Asimov’s psychohistorians.”
What is one to do when there’s so much to learn? If you’re Bradley Voytek, you teach yourself some text mining. “My wife [Jessica Bolger Voytek] and I published a paper showing that there’s latent structure in all of that research, and that you can algorithmically extract that information, and even use it to help generate novel hypotheses,” explains Voytek.
Those early discoveries led him to work as a data scientist in industry for a few years. He then returned to academic neuroscience research.
Neuroscience and Data Science Make a Beautiful Couple
As Voytek continued to dive deep into information about the brain, he noticed not just the sheer amount of data that exists, but also how disconnected the different kinds of data are from one another. “Neuroscience is a rapidly changing field that is increasingly moving towards ever larger and more diverse datasets that are analyzed using increasingly sophisticated computational statistical methods,” he says.
Voytek says there is now a strong need for neuroscientists who can think deeply about problems that incorporate information from a wide array of domains. These include psychology and behavior, cognitive science, genomics, pharmacology and chemistry, biophysics, statistics, and AI/ML (Artificial Intelligence/Machine Learning).
This is where data science enters the picture, helping neuroscience make sense of its vast amount of information. Data science provides a framework for combining large, multidimensional, heterogeneous datasets to answer questions and solve problems.
Voytek explains further: “Determining what data one needs, and how to effectively combine datasets, is a creative process. For example, a neural data scientist might be tasked with combining the following types of information:
1) demographic information and 2) multiple cognitive and behavioral measures, from people from whom we might collect 3) biometric data, 4) motion capture data to understand motor control, and 5) eye-tracking to study attention, along with 6) structural and 7) functional brain imaging data collected using methods with different spatial and temporal resolution (such as fMRI and EEG), and then place those results into context relative to 8) average human brain gene expression patterns and 9) the existing knowledge embedded within the peer-reviewed neuroscience literature (>3,000,000 papers).”
Voytek says that the data science side of his lab tries to combine and integrate all these different kinds of data to solve problems that simply can’t be addressed without a data-driven approach.
He points to a paper that his former Ph.D. student, Richard Gao, published in the journal eLife this past November. “To further emphasize the data science approach, this paper only used large datasets collected by numerous other research groups, and we published the whole paper as an executable document, where people can see all of the code that took us from raw data to final scientific figures and statistical results,” says Voytek.
Start with What You Love to Understand the World
Just as Voytek’s own journey into data science grew out of his interest in science fiction and his passion for neuroscience, he believes the best way to begin a data science project is to start with something you love and then ask questions about it.
“This begins by thinking about something you love—music, sports, film, whatever—and, starting from there, then think about potentially related questions: Which hip-hop artist has the largest vocabulary? Who’s got the best 3-point record in the NBA? How well do movie trailers track the film? It’s amazing to consider how many interesting questions can be addressed using some of the huge amounts of data that just exist everywhere now,” says Voytek.
Because data science does so much for so many fields of study and research, it is more than “just” statistics plus programming. According to Voytek, “Statistics, machine learning, programming… these are all tools that data scientists use, but they’re not the essence of the field.”
Voytek concludes, “Data Science is the process of determining how best to quantify our world, how to then combine these different kinds of data—freeform text, images, health, what have you—to try and understand the complexities of our world and ourselves.”
Connect with Dr. Bradley Voytek
UC San Diego/HDSI Faculty Profile: https://datascience.ucsd.edu/about/hdsi-faculty-fellows/name/brad-voytek/