Data Science Freshman Makes His First Clouds

Elevating computer code to the level of art was something University of California San Diego freshman Yuan Gao did just for fun.

Gao is a Data Science major in the program affiliated with the Halıcıoğlu Data Science Institute (HDSI) at UC San Diego. Like many freshmen, he chose an elective to get his feet wet during his first months on a college campus.

But for Gao, a lanky youth with a shy, ready smile, why take a low-pressure, pass-fail elective lightly when he could instead use it to dive deeper into data science?

In his elective, Workshop in Data Science (DSC 96), Gao went above and beyond his instructor’s expectations by designing a word cloud. A word cloud captures a slice of time and place on the Internet: it displays the words from a body of text, such as the posts on a web page, with each word drawn at a size that reflects how often it appears. The result resembles the montage technique of the visual arts, but its layout is generated by computer code.
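
The core rule of a word cloud is that more frequent words are drawn larger. The sketch below shows one way to turn word counts into font sizes, assuming a simple linear scale between a hypothetical minimum and maximum point size. Real word-cloud tools may scale differently and also handle word placement, which this sketch leaves out. The counts are made-up illustration values, not data from Gao’s project.

```python
from collections import Counter


def font_sizes(counts, min_size=10, max_size=60):
    """Map each word's count to a font size by linear interpolation.

    The linear rule and the size bounds are illustrative assumptions.
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - lo) / span * (max_size - min_size))
        for word, count in counts.items()
    }


# Hypothetical word counts, standing in for text gathered from a web page.
counts = Counter({"data": 12, "science": 8, "students": 4, "python": 2})
sizes = font_sizes(counts)

# List words from largest to smallest, as a word cloud would emphasize them.
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size}pt")
```

The most frequent word gets the maximum size and the rarest gets the minimum, with everything else spaced proportionally in between.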

“I’ve seen them before and I thought, why not? It’s a novel way to show information in a visual way,” said Gao. He built his hand-coded cloud from the Facebook page of HDSI, the independent institute he counts as a Facebook “friend.”

Gao’s instructor, Colin Jemmott, designed the Workshop elective precisely for such experimentation, or as he puts it, “for students to get their hands dirty with code.” Computer code is the mostly hidden back end that makes the digital world run.

Jemmott offers his small class of fewer than 20 students the chance to get hands-on with independent projects. He taught them “web scraping,” the automated extraction of content from existing web pages, using BeautifulSoup, a popular open-source library for parsing HTML in Python, a widely used programming language.
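
Web scraping usually means downloading a page’s HTML and then parsing it to pull out the parts you want. The sketch below covers only the parsing step, run on a made-up HTML string. Because BeautifulSoup is a third-party package, this sketch swaps in Python’s built-in `html.parser` module instead (BeautifulSoup can use that same parser as its backend). In BeautifulSoup, a comparable extraction would roughly be a `find_all("p")` call followed by `get_text()` on each result. The class name `ParagraphText` is hypothetical.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text inside each <p> tag of an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still counts toward the paragraph.
        if self.in_p:
            self.paragraphs[-1] += data


# A stand-in page; a real scraper would first download the HTML.
html = """
<html><body>
  <h1>HDSI News</h1>
  <p>Data science students built <b>word clouds</b>.</p>
  <p>Python makes scraping approachable.</p>
</body></html>
"""

parser = ParagraphText()
parser.feed(html)
parser.close()  # flush any buffered text
print(parser.paragraphs)
```

The heading is skipped because only `<p>` tags are collected; choosing which tags to keep is the main design decision in most scrapers.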

Jemmott explained what led him to suggest creating word clouds: “They had already used the Python library NLTK (Natural Language Toolkit) to tokenize and clean text, including removing stop words and special characters, and then count occurrences.” In short, the course teaches students how to dive in, extract data, and then analyze and use it.
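
The pipeline Jemmott describes (tokenize, clean, remove stop words, count) can be sketched in a few lines. NLTK’s tokenizers and stop-word lists rely on separately downloaded data, so this dependency-free sketch swaps in a regular-expression cleanup, whitespace tokenization, and a small hypothetical stop-word list. It approximates the NLTK steps rather than reproducing them, and the sample post is invented.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; NLTK ships a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def word_counts(text):
    """Tokenize, clean, drop stop words, and count the remaining words."""
    text = text.lower()
    # Replace anything other than lowercase ASCII letters, digits, or whitespace
    # with a space, which strips punctuation and other special characters.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()  # simple whitespace tokenization
    kept = [t for t in tokens if t not in STOP_WORDS]
    return Counter(kept)


post = "Data science is the future! Join HDSI for a workshop on data, Python, and NLTK."
counts = word_counts(post)
print(counts.most_common(3))
```

Counts like these are exactly the input a word cloud needs: each word’s tally decides how large it is drawn.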

These seemingly complicated computer science tasks are fun challenges for these students, all of whom were raised as digital natives. As part of his classwork for Jemmott, Gao combined all of these lessons into one project: his HDSI word cloud, which synthesizes the digital traffic and discussion on the casual forum of Facebook into a single image.

Gao, an international student from Jiangxi, China, chose UC San Diego specifically because of its Data Science major opportunities. He’s enrolled in the HDSI-affiliated program that is the largest Data Science undergraduate major in the nation.

Although he was always technically savvy growing up, Gao said his choice of study was inspired by his father, Hongliang Gao, who has seen data used increasingly in his work in the criminal justice system in their native country. “My father told me: ‘The future is data,’” Gao explained.

That drew him to the emerging discipline of Data Science, a field that takes on the rising tide of Big Data by focusing on the processes and systems that enable people to extract knowledge from data. While he likes computer science, a discipline in which UC San Diego has long been a national leader, he wanted to focus more on drawing meaning and context from data.

“It’s a new field, and I like that. It can cover any problem in any field I’m interested in, in the future,” said Gao.

Contact: Lisa Petrillo, Communications/Media

Halıcıoğlu Data Science Institute, UC San Diego

858-246-2491