Upcoming Events

West Big Data Hub All Hands Meeting 

Sept. 20, 2018 (all day)
Location: Boise State, Boise, Idaho

Calling all HDSI partners and data enthusiasts! Join colleagues from industry, academia, nonprofits, and government on September 20 for West Big Data Innovation Hub's 2018 Open Annual All Hands Meeting in scenic Boise, Idaho!

Your leadership and participation will strengthen our community as we embark on the next phase of #BDHubs. Besides the updates on our flagship initiatives, community-led sessions on Data Science Education, Data Ethics, Smart & Connected Communities, Health Data, Infrastructure and other topics, this year we will feature hands-on facilitated 'collaboration sprints' for road-mapping current and future projects. There will be a talented graphic facilitator to help us visualize a shared, collaborative trajectory and create a tangible take-home #WestData18 souvenir. 

Also new this year: 'Pay It Forward' Registration options to support Early Career community members who might not otherwise be able to join. To learn more and register, visit

Adventures in an NLP Kaggle Contest

Sept. 27, 2018 (6:00 - 8:00 p.m.)
Location: ScaleMatrix Launch Center, 5795 Kearny Villa Road, San Diego, CA, 92123

Ryan Chesler will present his work on the Kaggle Toxic Comment Classification competition, where he placed 3rd out of 4551 (abstract below). This will be followed by a discussion and networking session.

Abstract: Identifying toxic comments online is a key capability for facilitating productive discourse. Kaggle recently hosted a Toxic Comment Classification Challenge with the objective of identifying types of toxic comments, such as threats, obscenity, insults, and identity-based hate. In this presentation, Ryan Chesler will give an overview of his work on this project, where he ultimately placed 3rd out of 4551. Ryan will start by covering some basics of how words are embedded as numbers, including Word2Vec using Gensim and the sklearn classes CountVectorizer and TfidfVectorizer. He will then move to more advanced topics and discuss the models he used for the Toxic Comment challenge: what worked and what didn’t. This talk will have content for both beginners and more advanced attendees, and code snippets and visualizations will be available for download. To learn more and register, visit
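
For readers new to the "words as numbers" idea mentioned in the abstract, here is a minimal sketch of count and TF-IDF vectors in plain Python. It is not the speaker's code and does not use sklearn; the function names (tokenize, count_vectors, tfidf_vectors) and the sample sentences are made up for illustration. It uses the textbook weighting tf × log(N / df), which differs from library implementations that add smoothing and normalization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (real tokenizers are more careful)."""
    return text.lower().split()


def count_vectors(docs):
    """Return (vocabulary, rows); each row holds per-term counts for one document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight each count by log(N / document frequency) of its term."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = [[count * weight for count, weight in zip(row, idf)] for row in rows]
    return vocab, weighted


# Hypothetical toy corpus, not competition data.
docs = ["you are great", "you are awful", "great great work"]
vocab, counts = count_vectors(docs)
_, weights = tfidf_vectors(docs)
```

Terms that appear in fewer documents (here "awful" and "work") receive a larger weight than terms shared across documents, which is the intuition behind TF-IDF.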

UC Health Hackathon 2018

Oct. 6-7, 2018
Location: UC San Diego Medical Education Telemedicine (MET) Building

Join us for the 2nd Annual UC Health Hack this fall, October 6–7, 2018, where participants can test their skills (and learn new ones) by solving problems in integrative medicine and global health using technologies like Machine Learning, Artificial Intelligence (AI), Internet-of-Things (IoT), Sensors and Wearables, and more! The UC Health Hack will give participants the opportunity to meet industry leaders and partners in the emerging fields of health informatics and data science across domains ranging from patient care to population health. Special guests and subject matter experts from Apple will attend in person and virtually for office hours to demo Apple’s HealthKit and CareKit. Additional subject matter experts from Amazon Web Services will provide technical mentorship for AWS SageMaker, IoT, Big Data, and other AWS services. Tableau will be showcased for data visualization, and technical experts will be on hand for Tableau mentorship.

We’re excited to present even more experts and special guest attendees to share their knowledge and help participants explore and learn through interactive workshops and presentations! Expertise is not required, and we encourage all students, especially those with engineering, medical, and entrepreneurial backgrounds to participate! Are you an expert? Register to participate as a mentor to other learners! Participants can compete in teams of up to 5, and registering is free! There will also be food and beverages! And lastly, we have cash prizes for winning teams! Register today as a participant, mentor, or sponsor at:

HDSI Distinguished Lecturer Series: Scott Chacon

Oct. 11, 2018 (2:00 – 3:30 p.m.)
Location: San Diego Supercomputer Center Auditorium, UC San Diego

Scott Chacon is a cofounder of GitHub and now cofounder and CEO of Chatterbug, an online language learning school based in San Francisco and Berlin. Scott helped grow GitHub from 4 cofounders to 450 employees over 8 years; the company was eventually acquired by Microsoft for $7.5 billion. Scott is also the author of Pro Git, published by Apress and available online. In unrelated news, he holds a WSET Level 3 certification in Wines and Spirits and hosts wine education nights for charity.

Sensitive Data Access Workshop (Invite Only)

Oct. 16 (1:00-5:00 p.m.) and Oct. 17 (8:30 a.m. - 5 p.m.), 2018
Location: BRF II, UC San Diego

The Halıcıoğlu Data Science Institute is proud to sponsor the Sensitive Data Access Workshop at University of California, San Diego. This workshop will convene stakeholders from University of California, San Diego (UCSD), University of California, San Francisco (UCSF), University of Virginia (UVA), and Yale University to plan a pilot project for protected human research data access across institutions. 

The workshop will take place on the UCSD campus in La Jolla, California. The focus for the two days is to strategize and agree upon principles and needs for building a pilot that gives researchers access to protected human research data across institutions. The workshop structure and resulting pilot are intended to be scalable and based on principles of open data sharing and global collaboration.

Please note that this event is by invitation only.


SoCalDB Day

Oct. 19, 2018 (8:15 a.m. - 5:30 p.m.)
Location: San Diego Supercomputer Center Auditorium, UC San Diego 

SoCalDB Day is a one-day workshop-style event for data management researchers and practitioners from academia and industry in Southern California to meet and discuss their latest research and experiences. Modeled after similar events in other regions, this is the first ever edition of SoCalDB Day. It will bring together attendees from UC San Diego, UC Irvine, UCLA, UC Riverside, UC Santa Barbara, USC, and many companies. To learn more and register, visit

Symposium on Next Generation Empirical Dynamic Tools for Understanding and Predicting Fisheries and Ecosystems

Nov. 13-16, 2018 
Location: Southwest Fisheries Science Center - Santa Cruz Laboratory

Workshop Conveners: Dr. Stephan Munch (SWFSC), Dr. George Sugihara (SIO-UCSD)

Nonlinear forecasting, causal inference and control arise in diverse scientific disciplines ranging from physics to medicine to neurobiology. Recent applications to fisheries and environmental data indicate that these methods can identify hidden causal drivers and produce models that outperform more traditional deductive modelling approaches in terms of prediction skill. These empirical dynamic methods (EDM) are seen as next-generation tools for inductive model building and scientific inquiry and involve out-of-sample prediction for rigorous model validation. To encourage additional applications of EDM to US fisheries and to foster the development of management tools using attractor-based approaches, we are offering a three-day symposium-workshop on nonlinear time series forecasting, causal inference and control.

This three-day workshop will focus on the theory behind EDM and its specific application to ecological time series. Day one will consist of invited presentations describing successful applications of the methods to fisheries data. On day two, we will focus on approaches to constructing management advice from nonlinear forecasts using a) steady-state optima, b) scenario exploration, and c) control theory. On day three, we will describe in greater depth the mathematical and statistical background for nonlinear forecasting and provide a tutorial on time-delay embedding using the R package for EDM (rEDM).
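
As a rough illustration of the time-delay embedding idea named above, the following Python sketch builds lagged coordinate vectors from a single time series. It is not the rEDM API and is not taken from the workshop materials; the function name delay_embed, its parameters, and the toy series are assumptions made for this example.

```python
def delay_embed(series, dimension, lag=1):
    """Map a scalar series to vectors (x[t], x[t - lag], ..., x[t - (dimension - 1) * lag]).

    Returns one tuple per time index t that has a full history available.
    """
    if dimension < 1 or lag < 1:
        raise ValueError("dimension and lag must be positive integers")
    first_index = (dimension - 1) * lag  # earliest t with enough past values
    return [
        tuple(series[t - k * lag] for k in range(dimension))
        for t in range(first_index, len(series))
    ]


# Hypothetical toy series, not fisheries data.
series = [1, 2, 3, 4, 5]
embedded = delay_embed(series, dimension=3, lag=1)
```

Each tuple is a point in the reconstructed state space; forecasting methods in this family typically look for past points that lie close to the current one and follow where they went next.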

Target Audience: The workshop aims to provide next-generation tools to National Marine Fisheries Service (NMFS) scientists and is open to anyone interested in seeing how EDM can be used in fisheries to better understand mechanisms, for prediction, and for decision support. The third day of the workshop is specifically geared toward people interested in implementing the methods on data that they are familiar with.

Background and HDSI Objective: Current models used to manage US fish stocks are based on classical population growth equations that are applied deductively but are poorly predictive in real-world fishery settings. EDM will be offered to NMFS scientists as an alternative inductive (data-driven) approach that can pass the validation test of predictive skill. For more background see PNAS commentary:

7th International Symposium on Data Assimilation (ISDA 2019)

Jan. 21–24, 2019
Location: Kobe, Japan

The symposium will focus on the cross-cutting issues shared in broad applications of data assimilation from geoscience to various physical and biological sciences. In particular, the symposium will enhance discussions among researchers with various backgrounds on, for example, non-Gaussian and nonlinear data assimilation problems, Big Data Assimilation (BDA), high-performance computation (HPC), Uncertainty Quantification (UQ), artificial intelligence (AI) and machine learning, multi-scale and multi-component treatments, observational issues, and mathematical problems. We will announce again when we are ready to accept applications.

Important Dates (Tentative):

  • Application deadline: October 2018
  • Speakers confirmed and program made available: November 2018
  • Registration deadline: December 2018
  • Symposium: January 21–24, 2019

Information Theory and Applications Workshop

Feb. 10-15, 2019
Location: Catamaran Resort, Pacific Beach, San Diego, California

A relaxed gathering of researchers applying theory to diverse areas in science and engineering.

For more event details and registration, click here.

Association for Computing Machinery

2020 Knowledge Discovery and Data Mining Conference

Aug. 2020
Location: San Diego, California

The City of San Diego has been selected as the venue for the Association for Computing Machinery’s 2020 Knowledge Discovery and Data Mining Conference, one of the premier global meetings on artificial intelligence, machine learning and data science. The annual conference, which brings together several thousand leading researchers and practitioners from academia and industry, is expected to be held in August 2020 in downtown San Diego. Rajesh Gupta, director of the Halıcıoğlu Data Science Institute at UC San Diego, and Yan Liu, director of the USC Machine Learning Center, will serve as general co-chairs for the meeting. San Diego hosted the Knowledge Discovery and Data Mining conference in 2011 and 1999.

This year’s conference, KDD 2018, will take place in London in late August. Further information on San Diego’s KDD 2020 Conference will be made available later this year.

Past Events

Recent Event: Big Data, Big Dilemmas: Data Science and Ethics

Sept. 12, 2018 (6:30-9:00 p.m.)
Location: 515 15th Street NW, F St., Washington, DC 20004

Whether you live in Washington, D.C. or Washington State, issues of data integrity, privacy, automation, and regulation are now a part of everyday life. Join UC San Diego to look deeper into today’s data science studies, research, and real world projects that are informing tomorrow’s practices and innovations.

Our conversation partners will give you a look beneath the surface of perplexing questions, ethical conundrums, and potential outcomes to help you be more informed about data science and the issues relevant to you.

You’ll also learn how UC San Diego’s new Halıcıoğlu Data Science Institute and the undergraduate Data Science Major are educating the next generation of data scientists in critical and practical skills and ethics. Learn More.

Recent Event: Invited Presentation at SDSC/UCSD

Deep Learning Acceleration of Progress toward Delivery of Fusion Energy

Aug. 29, 2018 (2 p.m.)
Location: San Diego Supercomputer Center Room 408

Speaker: Professor William Tang
Princeton University
Dept. of Astrophysical Sciences, Plasma Physics Section,
Executive Committee, Princeton Institute for Computational Science & Engineering (PICSciE), and
Principal Research Physicist, Princeton Plasma Physics Laboratory

Accelerated progress in producing accurate predictions in science and industry has been accomplished by engaging modern big-data-driven statistical methods featuring machine/deep learning/artificial intelligence (ML/DL/AI). Associated techniques being formulated and adapted have enabled new avenues of data-driven discovery in key scientific application areas such as the quest to deliver Fusion Energy – identified by the 2015 CNN “Moonshots for the 21st Century” series as one of 5 prominent grand challenges. An especially time-urgent and very challenging problem facing the development of a fusion energy reactor is the need to reliably predict and avoid large-scale major disruptions in magnetically-confined tokamak systems such as the EUROFUSION Joint European Torus (JET) today and the burning plasma ITER device in the near future. Significantly improved methods of prediction with better than 95% predictive accuracy are required to provide sufficient advance warning for disruption avoidance or mitigation strategies to be effectively applied before critical damage can be done to ITER -- a ground-breaking $25B international burning plasma experiment with the potential capability to exceed “breakeven” fusion power by a factor of 10 or more. This truly formidable task demands accuracy beyond the near-term reach of hypothesis-driven, “first-principles” extreme-scale computing (HPC) simulations that dominate current research and development in the field.

This presentation covers recent HPC-relevant advances in the deployment of deep learning recurrent and convolutional neural networks in Princeton’s new Deep Learning Code, the “FRNN” (Fusion Recurrent Neural Nets) Code, on modern GPU systems. This is clearly a “big-data” project in that it has direct access to the huge JET disruption database of over a half-petabyte to drive these studies. FRNN implements a distributed data-parallel synchronous stochastic gradient approach with TensorFlow libraries at the backend and MPI for communication. This deep learning software has demonstrated excellent scaling up to 6000 GPUs on “Titan” at the Oak Ridge National Laboratory – an achievement that has helped establish the practical feasibility of using leadership-class supercomputers to greatly enhance training of neural nets to enable transformational impact on key discovery science application domains such as Fusion Energy Science. Powerful systems on which FRNN is currently deployed include: (1) Japan’s TSUBAME 3 – where over 1000 Pascal P100 GPUs have already enabled impressive hyper-parameter tuning production runs; and (2) ORNL’s SUMMIT featuring the new VOLTA GPUs on which FRNN’s new “half-precision” algorithmic capability has produced attractive scaling results. In summary, statistical Deep Learning software trained on very large data sets holds exciting promise for delivering much-needed predictive tools capable of accelerating scientific knowledge discovery in HPC. The associated creative methods being developed also have significant potential for cross-cutting benefit to a number of important application areas in science and industry.
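
To make the "data-parallel synchronous" idea in the abstract concrete, here is a toy Python sketch. It is not FRNN code and uses neither TensorFlow nor MPI: the workers are simulated sequentially in one process, the model is a single-weight linear fit, and all names (shard_gradient, synchronous_step) and data are hypothetical. For simplicity each worker uses its whole shard per step; a stochastic version would sample a mini-batch from the shard instead.

```python
def shard_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def synchronous_step(w, shards, learning_rate):
    """One synchronous update: every worker reports a gradient, then all apply the average."""
    # Each simulated worker computes a gradient on its own data shard.
    grads = [shard_gradient(w, shard) for shard in shards]
    # Stand-in for the communication step (an all-reduce that averages gradients).
    avg_grad = sum(grads) / len(grads)
    # Every worker applies the same update, so model replicas stay identical.
    return w - learning_rate * avg_grad


# Hypothetical data generated from y = 2x, split evenly across two workers.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards, learning_rate=0.05)
```

With equal-sized shards, the averaged gradient equals the gradient over the full data set, which is why adding workers changes throughput rather than the update itself.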

Past Events in 2018

SeedMe Workshop

Collaborative Data Sharing Infrastructure for Researchers

Aug. 24, 2018 (9 a.m. - 5 p.m.)
Location: San Diego Supercomputer Center

The SeedMe workshop will provide hands-on experience in creating and using data sharing infrastructure for research purposes. The workshop will introduce the SeedMe platform, which provides powerful data organization and data sharing tools. The platform supports extensive customization and rapid collaboration by making it possible to collocate data, its description and its discussion in one spot. Workshop attendees will take away a fully functioning data sharing website to explore its use in their projects. See more information on the workshop page.

Workshop Information

Open ML Hackathon

Deep Learning Competition: Image Classification

July 25, 2018 (5-9 p.m.)
Location: 5575 Morehouse Drive, San Diego, California

HDSI is proud to endorse Qualcomm's Machine Learning Competition taking place on July 25th! 

During the event, you'll learn how to build and deploy an image classifier in the real world. This is a great opportunity to strengthen your machine learning skills, where you will learn about ways to solve problems such as data generalization and augmentation. You'll also be able to team up with other students to build the best model and possibly win prizes.

Prerequisites: Participants should be familiar enough with Python to install libraries and run code. Participants should also have taken an online ML course or trained a model before.

More Information and Registration

Halıcıoğlu Data Science Institute Reception

Join Us In Welcoming HDSI into San Diego's Tech Community

June 14, 2018 (5:30 - 7:00 p.m.)
CBRE | 4301 La Jolla Village Dr #3000, San Diego, CA 92122

Join your fellow tech executives in welcoming the institute and the professors who will be integral to training and educating the data science professionals of San Diego’s future.

Wine, beer, and hors d'oeuvres will be provided.

Host Committee - Tech San Diego Senior Data Science Advisory Committee

  • Annika Jimenez, Dexcom
  • Eliot Weitz, ViaSat
  • Greg Schibler, Neustar
  • Michael Zeller, Software AG
  • Patrick Deglon, Teradata
  • Sameer Chopra, ID Analytics
  • Scott Zoldi, FICO

HDSI Distinguished Lecturer Series: Amanda Cox

May 31, 2018 (2:00 – 3:00 p.m.)
San Diego Supercomputer Center Auditorium

Amanda Cox, Editor of New York Times’ The Upshot – Data-driven reporting focused on politics, policy and economic analysis. Download the flyer for more information.

San Diego Clinical Research Network (SDCRN)

Artificial Intelligence and Machine Learning in Drug Discovery: Beyond the Hype

May 30, 2018 (5:30-8:30 p.m.)
Knobbe Martens, 12790 El Camino Real, San Diego, CA, 92130

Innovators and entrepreneurs from some of the hottest companies in the field will present real-world case studies based on cutting-edge science demonstrating how to harness artificial intelligence and machine learning to transform aspects of the life sciences industry, from R&D through personalized medicine. The presentations will be followed by an interactive panel discussion on the current state of the field, regulatory and other development hurdles, and what the future of AI and ML in drug discovery holds.

Now open for registration.

Contact: Teresa Gallagher, San Diego Clinical Research Network 858-692-1835

Data Science and Engineering

Master's Degree for Working Engineering Professionals Information Session

May 23, 2018

For more information or to register, please see the JSOE Master of Advanced Study Degree Program.

Workshop: Methods and Message

May 3, 2018 (4-5:30 p.m.)

Virginia Eubanks
Associate Professor
Political Science
University at Albany, SUNY

Location: MCC Building, Room 250

When we intervene in public debates on issues of inequality, how do we make our choices around methods, storytelling, and political engagement matter? Stories about science and technology are particularly challenging as they often implicitly reproduce dominant narratives of inequality. Join Virginia Eubanks, author of Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor, for a conversation about academic research, long-form journalism, and how we choose to frame our messages. We welcome participation of people in any stage of writing, reporting, community, or academic work.

Design@Large Speaker Series

May 2, 2018 (4-6 p.m.)

Virginia Eubanks
Associate Professor
Political Science
University at Albany, SUNY

Title: Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor
Location: CSE Building, Room 1202
RSVP: Eventbrite

Abstract: Today, automated systems control which neighborhoods get policed, which families attain needed resources, and who is investigated for fraud. While we all live under this new regime of data analytics, the most invasive and punitive systems are aimed at the poor.

Virginia Eubanks will discuss her new book, Automating Inequality, which systematically investigates the impacts of data mining, policy algorithms, and predictive risk models on poor and working-class people in America. The book is full of heart-wrenching and eye-opening stories, from a woman in Indiana whose benefits are literally cut off as she lies dying to a family in Pennsylvania in daily fear of losing their daughter because they fit a certain statistical profile.

Her visit is co-sponsored by the Design Lab and the Institute for Practical Ethics.

Design@Large Speaker Series talks are free and open to the public.

HDSI Distinguished Lecture Series

April 27, 2018 (4-6 p.m.)

David Danks
L.L. Thurstone Professor of Philosophy & Psychology
Head, Department of Philosophy
Carnegie Mellon University

Title: Handling Complexity
Location: Humanities & Social Sciences 7077
UC San Diego

Abstract: It is utterly banal to observe that the world is a complex place. The more challenging question is what to do in the face of that complexity. In this talk, I will address this question along two different lines. First, I will consider how we humans cognitively handle the world’s complexity. I will argue, on both theoretical and empirical grounds, that one way we succeed (when we do) is through rapid, dynamic, goal-driven attention allocation. I then explore the numerous philosophical implications of this aspect of our cognition, particularly in the philosophy of science. Second, I will consider our use of technology—particularly, computational technologies—to handle the world’s complexity on our behalf. As an example, I will briefly describe some of the causal discovery algorithms that we have developed for complex types of data. I will then turn more generally to AI, robotic, and other technologies with autonomous capabilities. Our use of these systems requires that we have appropriate trust in them, and I will present a series of challenges for the development of such trust (at least, in the near-future). This relative lack of appropriate trust raises a number of ethical and policy concerns, and I will conclude my talk by considering ways to ethically develop and deploy AI and other autonomous technologies, in light of the world’s complexity.

Computational Neuroscience Seminar

April 16, 2018 (4-5 p.m.)

Henry D. I. Abarbanel

Physics and Scripps Institution of Oceanography, UC San Diego

Fung Auditorium, Powell-Focht Bioengineering Building
UC San Diego

Organized by the Institute for Neural Computation.

Sponsored by Brain Corp.

Download flyer

Halıcıoğlu Data Science Institute Symposium and Faculty Open House

March 2, 2018
Qualcomm Institute/Atkinson Hall

Student Poster and Presentations:

12:30 pm Dr. Elizabeth Simmons, Executive Vice Chancellor, Academic Affairs
12:40 pm Dr. Rajesh Gupta, Co-Director, Halıcıoğlu Data Science Institute
12:45 pm Dr. Terry Sejnowski, Co-Director, Institute for Neural Computation
12:53 pm Dr. Molly Roberts, Political Science
1:01 pm Dr. Gordon Hanson, Global Policy and Strategy
1:09 pm Dr. Amaro, Chemistry/BioChemistry
1:17 pm Dr. Anders Dale, Neuroscience
1:25 pm Dr. George Sugihara, Climate Atmospheric Science/Physical Oceanography
1:33 pm Dr. Yoav Freund, Computer Science and Engineering
1:41 pm Dr. Brad Voytek, Cognitive Science
1:49 pm Dr. Terry Jernigan, Cognitive Science
2:00 pm Q&A
2:15 pm Student posters (Lobby)
3:00 pm Student presentations and judging (Auditorium)
4:00 pm Student posters (Lobby)
5:00 pm Closing

Faculty and Student Open House

March 2, 2018 
Qualcomm Institute (formerly CalIT2)

Join Halıcıoğlu Data Science Institute Co-Directors Rajesh Gupta and Jeffrey Elman, along with Executive Vice Chancellor for Academic Affairs Elizabeth Simmons, to celebrate the official launch of the institute and learn more about its programs.

12:30 – 2:00 Formal Program (Open to Faculty by Invitation Only)
2:00 – 5:00 Student Presentations and Posters