Upcoming Events 

2020 Knowledge Discovery and Data Mining Conference

August 2020: Save the date. The City of San Diego has been selected as the venue for the Association for Computing Machinery’s 2020 Knowledge Discovery and Data Mining Conference (KDD), one of the premier global meetings on artificial intelligence, machine learning and data science.

Rajesh Gupta, director of the Halıcıoğlu Data Science Institute at UC San Diego, will serve as general co-chair for the meeting along with Yan Liu, director of the University of Southern California Machine Learning Center.

The annual KDD conference gathers thousands of leading researchers and practitioners from both academia and industry. The nonprofit organization’s mission is “to provide the premier forum for advancement, education, and adoption of the ‘science’ of knowledge discovery and data mining from all types of data stored in computers and networks of computers.” San Diego last hosted the international event in 2011.

Check the Institute site for updates on the upcoming San Diego hosting of KDD 2020.

Past Events


Workshop on Ethics and Policy Implications of Big Data

Feb. 15-16, 2019: HDSI, as a data science leader, is gathering leading social scientists and computer scientists, academics, activists, ethicists and practitioners interested in the ethics and policy implications of algorithms and Big Data. The workshop will explore the interface of social sciences and ethics with the state-of-the-art and immediate horizons of algorithms, Big Data and automation. Panels are set up to facilitate lively and wide-ranging discussion.

Conference co-sponsors: Halıcıoğlu Data Science Institute, the Institute for Practical Ethics and the Dean of Social Sciences at UC San Diego. Conference details:

Speakers include:

Cecilia Aragon - University of Washington-Seattle

Stuart Geiger - University of California, Berkeley

Johannes Himmelreich - Stanford University

William FitzGerald - The Worker Agency (previously Google)

Per-Erik Milam - University of Twente

Joan Donovan - Data and Society Institute, New York

Sumandro Chattapadhyay - Centre for Internet and Society, India

Akos Rona-Tas - UC San Diego

Margaret Hu - Washington and Lee School of Law 

Jackson Poulson - Hodge Star Scientific Computing (previously Google)

Reetika Khera - IIT, Delhi

Emory Roane - Privacy Rights Clearinghouse

Lilly Irani - UC San Diego

Dana Nelkin - UC San Diego

Molly Roberts - UC San Diego

Juan Pablo Pardo Guerra - UC San Diego

Colloquium: Thoughts on Edge Intelligence

Feb. 19, 2019, 11 am - 12:30 pm: Machine learning (ML) methods have exploded in the past half-dozen years and are being applied to a huge range of problems in many areas. Initial results relied on server-oriented computations, but many applications will require deploying aspects of machine learning throughout the network hierarchy. Several factors motivate the development of Edge Intelligence architectures and algorithms: network bandwidth, power consumption, latency, privacy, etc. This talk will start with the motivation for edge intelligence, using several examples from manufacturing and health care, and outline some important problems.

Bio: Marilyn Wolf is Farmer Distinguished Chair in Embedded Computing Systems and GRA Eminent Scholar at the Georgia Institute of Technology.  She received her BS, MS, and PhD in electrical engineering from Stanford University.  She was with AT&T Bell Laboratories from 1984 to 1989 and was on the faculty of Princeton University from 1989 to 2007.  Her research interests include cyber-physical systems, Internet-of-Things, embedded computing, embedded computer vision, and VLSI systems. She has received the ASEE Terman Award and IEEE Circuits and Systems Society Education Award. She is a Fellow of the IEEE and ACM.

Where: San Diego Supercomputer Center, Synthesis Center (floor B1), from 11 am - 12:30 pm.

Data Science Talk: Integrated Analysis of Cancer Data and Precision Medicine 


Feb. 12, 11 a.m. - 12:30 p.m.: Battling cancer using the latest tools in data science strengthened with precision medicine will be the topic of a talk by bioinformatics expert Ron Shamir of Tel Aviv University. In the next installment of the Distinguished Lecturer Series, Shamir is set to discuss his work in clustering and analysis of multiple large-scale data types. Shamir’s group develops algorithms in bioinformatics for understanding the genome and human disease, and the software tools they developed are in use around the world. He works on large biological dataset analysis and applications to basic science and medicine. While inquiry into each dataset separately often provides insights, his approach uses integrative analysis to reveal more holistic systems-level findings. His work demonstrates the power of integrated analysis in cancer on two levels: in analysis of one omic in many cancer types together, and in analysis of multiple omics for the same cancer. His lecture will highlight a novel method for identifying and ranking driver genes in an individual's tumor, and demonstrate its advantages over prior methods.

Speaker bio: Shamir is the founder and head of the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. Shamir, who earned his Ph.D. from UC Berkeley, is a Sackler professor of bioinformatics in the university’s Blavatnik School of Computer Science. He has published more than 300 scientific works, including 17 books and edited volumes, and has supervised more than 50 research students. He co-founded and served as president of the Israeli Society of Bioinformatics and Computational Biology. He is a recipient of the Landau Prize in Bioinformatics and the Kadar family prize for excellence in research, and is a Fellow of the International Society for Computational Biology and the Association for Computing Machinery.

Where: 11 am, San Diego Supercomputer Center.

FAIR Data Stewardship

FAIR Data is data that is Findable, Accessible, Interoperable and Reusable by computers. The FAIR data approach has already impacted both academic and industrial research practices, and has been embraced by the European Commission, the World Economic Forum and the G7, as well as numerous national organizations.

Supporting FAIR Data Interoperability

Feb. 11-15, 2019 (5-day course): FAIR Data Stewardship, as a new profession, is rapidly gaining momentum. New requirements from national and international funders are driving the need for training of competent, professional data stewards and data managers with knowledge of the FAIR principles and their application. This course introduces the required knowledge and skills in a broader data stewardship context, including topics like semantic data modeling, metadata modeling, the FAIRification process, publishing FAIR Data Points, and other topics related to managing a research project's data requirements. After completion of the course, participants will be able to work with domain specialists in making their data FAIR and preserving them for re-use. Learn more.

An Ontology-Driven Approach for FAIR Data Interoperability

Feb. 11-13, 2019 or Feb. 11-15, 2019 (3- or 5-day course): In this context, FAIR Data Stewardship, as a new profession, is rapidly gaining momentum. New requirements from national and international funders are driving the need for training of competent, professional data stewards and data managers with knowledge of the FAIR principles and their application.

This course provides a solid introduction to the subjects related to semantics in FAIR. It starts with the basics of ontology-driven conceptual modeling, providing the foundational knowledge for creating good conceptual models/ontologies. Once the ontologies are created and validated using a foundational ontology language, the participants will learn how to make them computer-actionable using Semantic Web technologies such as OWL/RDF. Then the participants will learn how to semantically enrich data and make it linkable by applying the ontologies to it. In addition, participants are taught to define validation rules for the datasets, allowing the data to be validated against both syntax and semantics. Lastly, the course will introduce the FAIR metadata approach so the FAIR/RDF data will be well described with their metadata. Learn more.

Data science in theory and action: Information Theory and Applications Workshop

Feb. 10-15, 2019: This week-long workshop is billed as a relaxed gathering of researchers applying theory to diverse areas in science and engineering.

It is being held at the Catamaran Resort in Pacific Beach, near the UC San Diego campus.

Deadlines will be in January 2019 for papers, abstracts and posters. See the Information Theory event website for more details and registration.

Distinguished Lecturer Series: Nitesh Chawla

What does Quantified Self tell about Qualified Self?

POSTPONED: Feb. 7, 2019, 11 am - 12:30 pm: The Auditorium, SDSC

University of Notre Dame researcher Nitesh Chawla will discuss his research on utilizing Big Data for the common good at the HDSI Distinguished Lecturer Series.

Abstract: The quantification of our health and wellness attributes, whether in EMR or 'omics or through health wearables, is beginning to provide a lens into our health and wellness states at a scale and granularity that were not possible before. Now, imagine integrating this data with our social determinants of health. Collectively, these data can provide an unprecedented opportunity for both personalized health and wellness, and population health management. We can begin to go from quantified self to qualified self to help infer and explain our individualized health and wellness attributes. This presents a number of interesting problems for machine learning and network science, from algorithms to deployment. In this talk, I will present our work on leveraging data, network science, machine learning, design and interdisciplinary thinking towards personalized and participatory healthcare.

Where: San Diego Supercomputer Center Auditorium. 

Double Feature: An Afternoon of Applied Mathematics

Murray and Adylin Rosenblatt Endowed Lectures

Feb. 7, 2019, 3 p.m. - 5 p.m.: All are welcome to the lecture series featuring two leading scholars, with a refreshment break between the lectures.

At 3 p.m.: Mathematicians Helping Art Conservators and Art Historians, presented by Professor Ingrid Daubechies, Departments of Mathematics, Electrical and Computer Engineering, Duke University. Dr. Daubechies will deliver a Rosenblatt lecture as part of the Distinguished Diversity Colloquium, co-sponsored by the Mathematics Department Diversity Committee. Dr. Daubechies will discuss how mathematics can help art historians and conservators in studying and understanding artworks, their manufacturing process and their state of conservation. Her presentation will review examples of collaborations that have led -- and are still leading -- to interesting new challenges in signal and image analysis. In other applications, it’s possible to virtually rejuvenate artworks, bringing a different understanding and experience of the art to museum visitors as well as to experts.

At 5 p.m.: Some Statistical Problems in Cancer Genomics, by Professor Simon Tavare, a mathematician/statistician who is Director of the Irving Institute of Cancer Dynamics at Columbia University. The starting point for Dr. Tavare’s talk comes from population genetics: how best to estimate evolutionarily relevant parameters from DNA sequence data taken from samples of individuals? His talk includes examples concerning inference of the number of distinct DNA sequences in a sample, given only information about the frequency of point mutations in the samples. The example provides an introduction to inference from typical cancer sequencing data; he also offers an overview of cancer evolution and the sort of statistical and computational problems it poses. He plans to discuss novel experimental methods being developed to understand the 3D structure of tumors, paving the way for some challenging inferential problems that will require engagement from data scientists and others.

Where: Natural Sciences Building (NSB Auditorium) on the UC San Diego campus. To register:

Secrets of Learning: The Intersection of Neuroscience, Cognitive Science and AI

Jan. 29, 2019, 11 am - 12:30 pm: How does the human mind transform a cascade of sensory information into meaningful knowledge? Judy Fan, forthcoming assistant professor in the UC San Diego Department of Psychology, will discuss Cognitive Tools for Learning and Communication. She launches the first talk in the Colloquium Series from Halıcıoğlu Data Science Institute. She will highlight recent progress towards understanding the contribution of perceptual, action, and social reasoning systems to effective visual learning and communication. Fan's research addresses questions at the intersection of computational neuroscience, cognitive science, and artificial intelligence. Her background includes studies in neurobiology and statistics at Harvard University, a Ph.D. in psychology from Princeton University, and postdoctoral work at Stanford University.

Where: San Diego Supercomputer Center, Synthesis Center (floor B1), from 11 am - 12:30 pm.

Data across disciplines: 7th International Symposium on Data Assimilation (ISDA 2019)

Jan. 21–24, 2019: This event in Kobe, Japan, will focus on the cross-cutting issues shared in broad applications of data assimilation in disciplines from geoscience to physical and biological sciences.

The symposium aims to enhance discussions among researchers from cross-disciplinary backgrounds. Examples under discussion include: non-Gaussian and nonlinear data assimilation problems; Big Data Assimilation (BDA); high-performance computation (HPC); Uncertainty Quantification (UQ); advanced intelligence (AI) and machine learning; multi-scale and multi-component treatments; observational issues; and mathematical problems.

Application deadline is slated for October 2018, with speaker confirmation and program availability in November. Registration deadline will be in December 2018. See the ISDA website for updates and more information.

Symposium connecting statistics and data science: Beyond Big, Missing or Corrupted Data

Jan. 19-20, 2019: Big Data, corrupted data, missing data – how to handle it all? HDSI in its role as a data science leader is co-sponsoring a two-day symposium -- Statistics & Data Science Symposium: Beyond Big, Missing or Corrupted Data. The event will gather outstanding researchers pushing the frontiers of statistics as applied to the data. Speakers include leaders from Caltech, Facebook, Microsoft, Princeton and the U.S. Census Bureau. Focus will be on the impact of data-driven science on the field of statistics, addressing such key questions as:

  • Do algorithms even exist in these new contexts?
  • How to quantify uncertainty in scientific discoveries?
  • How to develop efficient algorithms that are robust to the complex nature of the observed data?

The campus-based event, set for Jan. 19-20, is sponsored by the UC San Diego Department of Math and HDSI, and registration remains open. Learn more about special registration and reservation options.

The advent of new technologies has created great opportunities for data-driven discoveries spanning the worlds of science and industry and impacting both equally. In the past few years, data sources and availability have greatly changed. Examples include the abundance of data related in some way to human behavior, policy implementations, interactions with a large number of computer devices and advertising materials, and electronic records. Many datasets are now coming in a range of multimodal forms typically collected as streams of information; they now present highly unstructured patterns where a single phenomenon of interest is observed through multiple types of measurement devices, with each device possibly collecting only partial information of interest. Join our distinguished speakers and assembled experts to discuss questions such as: How can we disentangle, and yet utilize, all of the complex data in order to enrich statistical algorithms and models?

The event organizers are UC San Diego researchers Jelena Bradic and Dimitris Politis.

Among the invited experts: From UC San Diego, Danna Zhang and Wenxin Zhou; Alekh Agarwal, Microsoft Research; Animashree Anandkumar, Caltech; Guang Cheng, Purdue University; Jianqing Fan, Princeton University; Sudeep Srivastava, Facebook; from UC Berkeley, Bin Yu, Peng Ding and Yian Ma; Yen-Chi Chen, University of Washington; from University of Chicago, Chao Gao and Veronika Rockova; Tucker McElroy, Census Bureau; Stanislav Minsker, University of Southern California; Annie Qu, University of Illinois, Urbana–Champaign; Aaditya Ramdas, Carnegie Mellon University; Zhao Ren, University of Pittsburgh; Srijan Sengupta, Virginia Tech; Stefan Wager, Stanford University; Anru Zhang, University of Wisconsin-Madison; Xianyang Zhang, Texas A&M University; Yinchu Zhu, University of Oregon; and Nan Zou, University of Toronto.

Distinguished Lecturer Series: Computer talk: Leading expert in voice-user interfaces comes to HDSI

Jan. 17, 2019, 11 a.m., The Auditorium, SDSC: Want to talk nicely with your machinery – and have it talk back? HDSI is featuring Google’s leading expert on voice-user interfaces, Cathy Pearl, in its next Distinguished Lecturer Series. Pearl is a UC San Diego alumna and current Head of Conversation Design Outreach at Google. She earned her degrees in Cognitive Science from UC San Diego, and a Computer Science master’s from Indiana University. She is the author of the 2016 book "Designing Voice User Interfaces," published by O'Reilly Media, and has worked with many innovators in the voice-recognition sphere including Nuance. Her broad background makes her a popular speaker: she has done some rocket science as a software designer for NASA and has integrated voice recognition for banks, airlines and healthcare. She’s also known for her digital expertise on love and romance, writing a Love Data blog focused on “data-driven love in our modern world” – even performing analysis of the use of Twitter in dating. The event is set for 11 am at the San Diego Supercomputer Center.

Data Science and the Shopping Season

Thursday Dec. 6, 6:30 - 8:15 p.m.: Physicist Geoff Hueter will be speaking on “Using Data Science to Increase Shopper Productivity.” Hueter, a UC San Diego alumnus, is CTO and co-founder of Certona Corp., a San Diego-based technology firm specializing in artificial intelligence. The event is free to members of IEEE and UCSD personnel, $5 cash for the public, with sponsorship from: IEEE Computer Society San Diego Chapter; IEEE Big Data Science & Engineering SIG; Computational Intelligence Society; the Engineering in Medicine and Biology Society; and Women in Engineering. Location: Atkinson Hall, 9500 Gilman Drive #0436. Learn more and register.

Free student access to Data West forum and recruiters

Wednesday Dec. 5, 2018: Students are invited for free entry into a leading data-science-focused research and business forum -- Data West -- drawing practitioners working at the leading edge of knowledge discovery through data. Many participating companies will be recruiting interns. The annual Data West event, which includes leaders from Halıcıoğlu Data Science Institute, covers a wide range of knowledge topics and industrial applications in different meeting formats, from new ventures in data analytics, AI and machine learning, to smart manufacturing and new technologies including blockchain and cyber-physical systems, to large-scale system architectures and HPC performance. Among the companies attending and recruiting student interns are: AEEC, Booz Allen Hamilton, Chatham Hill, Collibra, Decision Sciences, and Dell Technologies. Registration.

Meet the Boss: Startups

Nov. 28, 2018: The secret to startup success will be the topic Nov. 28 at the Meet the Boss: Startups event from 6-8 p.m. at The Loft. Alumni panelists: Taner Halıcıoğlu, a UC San Diego instructor and the force behind the Halıcıoğlu Data Science Institute; Julie Wartell, who teaches GIS in Urban Studies and works as an independent advisor on public safety issues; and Ping Yeh, who has led teams from startups to Fortune 100 corporations to produce products merging core intellectual property. Hosted by Student Alumni Associates. Free food and refreshments. Tickets limited.

Ocean ecosystems: Symposium on Next Generation Empirical Dynamic Tools for Understanding and Predicting Fisheries and Ecosystems

Nov. 13-16, 2018: This three-day workshop will focus on managing data to build more accurate models for predicting impacts on fisheries and ocean ecology. It aims to provide next-generation tools to marine fisheries science, and is open to anyone interested in seeing how this data-management system can be used in fisheries for better understanding of mechanisms, for prediction, and for decision support.

The San Diego-based workshop will be held at NOAA Southwest Fisheries Science Center, 8901 La Jolla Shores Drive, La Jolla, in the Santa Cruz Laboratory. It is sponsored by Halıcıoğlu Data Science Institute and the National Oceanic and Atmospheric Administration (NOAA)-Southwest Fisheries.

Workshop leaders are Stephan Munch of Southwest Fisheries and George Sugihara of Scripps Institution of Oceanography at UC San Diego.

Nonlinear forecasting, causal inference and control arise in diverse scientific disciplines ranging from physics to medicine to neurobiology. Recent applications to fisheries and environmental data indicate that these methods can identify hidden causal drivers and produce models that outperform more traditional deductive modeling approaches in terms of prediction skill. These empirical dynamic methods (EDM) are seen as next-generation tools for inductive model building and scientific inquiry, and involve out-of-sample prediction for rigorous model validation.

To encourage additional applications of EDM to U.S. fisheries and to foster the development of management tools using attractor-based approaches, we are offering this workshop on nonlinear time series forecasting, causal inference and control.

Why this workshop matters to ocean ecology and fisheries management: Current models used to manage U.S. fish stocks are based on classical population growth equations that are applied deductively but are poorly predictive in real-world fishery settings. EDM will be offered to NMFS scientists as an alternative inductive (data-driven) approach that can pass the validation test of predictive skill.

Day one will feature invited presentations describing successful applications of the methods to fisheries data. Day two is set to focus on approaches to constructing management advice from nonlinear forecasts using steady-state optima; scenario exploration; and control theory.

The final day will focus on the mathematical and statistical background for nonlinear forecasting; there will be tutorials on time-delay embedding using the R package for EDM (rEDM). The third day of the workshop is specifically geared toward people interested in implementing the methods on data that they are familiar with.

For more information, see the related commentary in PNAS (Proceedings of the National Academy of Sciences).

Biocom's 3rd Annual Big Data Summit: Advancements in Innovation Through AI + Deep Learning

Nov. 6, 2018: Biocom and The CO Network invite you to our 3rd Annual Big Data Summit. This one-day event is an annual convergence of industry executives, bio-technologists and data scientists dedicated to deploying emerging technologies within the life science industry. Through a blend of illuminating keynotes, dynamic panel discussions, and interactive use-case studies from life science organizations, this year’s summit will focus specifically on the disruptive and transformative role of AI and Deep Learning. This conference is aimed at giving attendees the knowledge, tools and solutions critical to Big Data success within our industry.

This year’s gathering will focus on bringing together thought-leaders at the forefront of innovations in artificial intelligence, blockchain technology, computational drug discovery, genomics and robotics. Join us on November 6th in San Diego to experience the future with the people responsible for creating it. Learn more and register.

Google Statistician to Give Talk on Data Analysis

Nov. 7, 2018, 1-2 p.m.: A data science expert from Google will be at UC San Diego to give a talk entitled Surveys and Big Data for Estimating Brand Lift. The seminar featuring Tim Hesterberg, senior statistician from Google, is sponsored by The Division of Biostatistics and Bioinformatics in the Department of Family Medicine and Public Health. The event will be held at the School of Medicine campus, in the MET (Medical Education and Telemedicine Building), Lower Auditorium.

Before his career at Google, Hesterberg worked in academia and the software and utilities industries. He earned his Ph.D. in statistics from Stanford University and wrote Mathematical Statistics with Resampling and R, as well as co-authoring the ASA Guidelines for Undergraduate Statistics Programs. Now, as he likes to say, he can tell people how to analyze data!

Google Brand Lift Surveys estimates the effect of display advertising using surveys. Challenges include imperfect A/B experiments, response and solicitation bias, discrepancy between intended and actual treatment, comparing treatment group users who took an action with control users who might have acted, and estimation for different slices of the population. We approach these issues using a combination of individual-study analysis and meta-analysis across thousands of studies. This work involves a combination of small and large data: survey responses and logs data, respectively. Learn more.

Distinguished Lecturer Series: Danielle Bassett 

Nov. 1, 2018, 12:30 - 2:00 p.m.: Dani Bassett is a physicist and systems neuroscientist and was awarded a MacArthur fellowship in 2014. Bassett's group studies biological, physical, and social systems by using and developing tools from network science and complex systems theory. A current focus lies in network neuroscience – developing analytic tools to probe the hard-wired pathways and transient communication patterns inside of the brain in an effort to identify organizational principles, to develop novel diagnostics of disease, and to design personalized therapeutics for rehabilitation and treatment of brain injury, neurological disease, and psychiatric disorders.

She will be speaking 12:30-2 p.m. at the San Diego Supercomputer Center auditorium.

Science and Politics: A GPS Science Policy Fellows Roundtable Event

Oct. 30, 2018: Should scientists try and influence policy? Should policymakers and elected officials be required to share their science platform during their campaigns? How can science really make the most impact on policy outcomes? Join us as we bring together scientists and policymakers to discuss these questions and more. Learn more and register.

Technology Forum Breakfast: High Performance Data Center Network for Deep Learning and HPC Workloads with GigaIO

Oct. 25, 2018: Join us for breakfast 8:30-10:30 a.m. as Alan Benjamin, President and CEO of GigaIO Networks, discusses: High Performance Data Center Network for Deep Learning and HPC Workloads.

Benjamin is set to talk from his experience in industry about the massive increase in the amount of data that is being collected, analyzed and stored. This new reality is driving a splintering of computing into CPUs, GPUs, application-specific FPGAs (Field Programmable Gate Array semiconductor devices), and other specific compute nodes as general-purpose CPUs can no longer handle the load. This splintering is creating new cluster architectures that must enable greater flexibility, utilization and scale. Today’s networks were never designed to handle this amount of information or resource types at the latency and bandwidth necessary to achieve the desired outcomes.

The Carlsbad-based GigaIO has developed a hyper-performance network for software-defined infrastructure that enables data centers to scale their systems, creating a flexible infrastructure that accelerates system performance.

With assistance from the San Diego Supercomputer Industrial Partner Program, users are testing GigaIO’s composable network. The program helps them add GPUs and storage to the environment, and determine the best ratio of CPU/GPU/storage for their application. Learn more about the Technology Forum Breakfast.

Kaggle Data Hackathon

Oct. 20, 2018: Inspired by the success of our last event, we are kicking off a weekly Kaggle working group. The focus of these events is to break into groups and work on Kaggle contests to increase our data analysis skills through doing. Therefore, bring your laptops and come prepared to work :-) We welcome people of different experience levels and will break into groups accordingly.

For upcoming sessions, we will focus on the Two Sigma Challenge: using news to predict stock movements.

Learn more about the Kaggle Data Hackathon.

Deep into data management: SoCalDB Daylong Workshop

Oct. 19, 2018: SoCalDB Day is a one-day workshop-style event for data management researchers and practitioners from academia and industry in Southern California to meet and discuss their latest research and experiences.

The event is set for 8:15 a.m. to 5:30 p.m. at the San Diego Supercomputer Center Auditorium, UC San Diego.

Modeled after similar events in other regions, this is the first-ever edition of SoCalDB Day. It will bring together attendees from UC San Diego, UC Irvine, UCLA, UC Riverside, UC Santa Barbara, University of Southern California, and leading industry experts including from Amazon, Oracle and Teradata.

Among the topics: Autonomous “self driving” databases; sensor networks; bridging the gap from academic research to successful startups; and utilizing machine learning to improve database performance.

Learn more about SoCalDB Day 2018.

Clean Water Matters: CA Water Data Challenge Summit

Oct. 18, 2018: Data science played the central role in Gov. Brown’s efforts to improve access to enough clean water for Californians. This daylong event will be a culmination of the work of scores of people, businesses and organizations for a vital common cause. The Summit and Awards Ceremony will be held both online through live streaming and in person in Los Angeles.

Annually, up to 1 million Californians lack access to clean, safe drinking water. The Challenge teams and individuals worked to develop resources and tools to improve that access. Many of the partners launched this Data Challenge to demonstrate the power of open data to address one of California’s most pressing issues.

The live-cast will run from 10 am-noon and 1:30-3 p.m. on Oct. 18 to celebrate the participants and solutions from the California Safe Drinking Water Data Challenge. The event will be held at the Los Angeles River Center and Gardens, 570 W Ave 26 #100, Los Angeles, 90065.

Events and community-led activities included engagements like hackathons, online tutorials, fireside chats and even a National Day of Civic Hacking. This summit and award ceremony will showcase contributions from innovators from multiple sectors. The winning participants will share in $12,000 in prizes. Learn more and register for the streaming or in-person event.

Human Health at Stake: Sensitive Data Access Workshop

Oct. 16-17, 2018: The Halıcıoğlu Data Science Institute is sponsoring a two-day workshop to build an effective means of providing health researchers access to vital research data, while also protecting sensitive and legally restricted medical information.

The workshop will gather experts from leading research institutions on the UC San Diego campus, by invitation only.

Stakeholders come from leading research institutions including UC San Diego, UC San Francisco, the University of Virginia and Yale University. The objective is to create a workable pilot project for protected human research data access across the institutions. The goal is to use the precision of data science to gather data that leads to discoveries advancing human health. The workshop structure and resulting pilot are intended to be scalable, and based on principles of open data sharing and global collaboration.

The event is set for Oct. 16 from 1 to 5 p.m. and Oct. 17 from 8:30 a.m. to 5 p.m. on the university’s School of Medicine campus, Health Sciences Biomedical Research Facility II.

HDSI Distinguished Lecturer Series: Scott Chacon

Oct 11, 2018 (2:00 – 3:30 p.m.)
Location: San Diego Supercomputer Center Auditorium, UC San Diego

Scott Chacon is a cofounder of GitHub and is now cofounder and CEO of Chatterbug, an online language learning school based in San Francisco and Berlin. Scott helped grow GitHub from 4 cofounders to 450 employees over 8 years; the company was eventually acquired by Microsoft for $7.5 billion. Scott is also the author of Pro Git, published by Apress and available online. In unrelated news, he holds a WSET Level 3 certification in Wines and Spirits and hosts wine education nights for charity.

UC Health Hackathon 2018

Oct. 6-7, 2018
Location: UC San Diego Medical Education Telemedicine (MET) Building

Join us for the 2nd Annual UC Health Hack this fall on October 6–7, 2018, where participants can test their skills (and learn new ones) by solving problems in integrative medicine and global health using technologies such as machine learning, artificial intelligence (AI), Internet of Things (IoT), sensors and wearables, and more! The UC Health Hack will give participants the opportunity to meet industry leaders and partners in the emerging fields of health informatics and data science, in domains ranging from patient care to population health. Special guests and subject matter experts from Apple will attend in person and virtually for office hours to demo Apple’s HealthKit and CareKit. Additional subject matter experts from Amazon Web Services will be on hand to provide technical mentorship for AWS SageMaker, IoT, Big Data, and other AWS services. Tableau will be showcased for data visualization, and technical experts will be available for mentorship on Tableau.

We’re excited to present even more experts and special guest attendees to share their knowledge and help participants explore and learn through interactive workshops and presentations! Expertise is not required, and we encourage all students, especially those with engineering, medical, and entrepreneurial backgrounds to participate! Are you an expert? Register to participate as a mentor to other learners! Participants can compete in teams of up to 5, and registering is free! There will also be food and beverages! And lastly, we have cash prizes for winning teams! Register today as a participant, mentor, or sponsor at:

Digital India: Opportunities and Challenges

Oct. 3, 2018 (5:00-6:30 p.m.)
Location: UC San Diego Ida and Cecil Green Faculty Club Atkinson Pavilion

India is digitizing rapidly and profoundly, spurred largely by the Modi Government’s ambitious "Digital India" agenda and facilitated by the growing spread of telecoms and internet access, social media platforms and digitization of the financial sector. Predictably, this has opened up a world of opportunities, new livelihoods and services, and new forms of social and political interaction for millions, empowering them substantively in the process. However, it has also thrown up profound public policy challenges and dilemmas, many of them unique and unprecedented.

Secretary of the Indian Department of Telecommunications and Pacific Leadership Fellow Aruna Sundararajan will discuss these aspects of India’s ongoing technological transformation and the public policy lessons that emerge from this engagement. As secretary, Sundararajan is leading the vast expansion of telecommunications services in India including BharatNet, the world’s largest rural broadband connectivity program.

Pacific Leadership Fellow Sundararajan will present a public talk during her stay, titled “Digital India: Opportunities and Challenges.” The talk will take place on Wednesday, October 3, at 5 p.m. in the dining room at the UC San Diego Faculty Club. For more information and to RSVP visit here:

Adventures in an NLP Kaggle Contest

Sept. 27, 2018 (6:00 - 8:00 p.m.)
Location: ScaleMatrix Launch Center, 5795 Kearny Villa Road, San Diego, CA, 92123

Ryan Chesler will present his work on the Kaggle Toxic Comment Classification competition, where he placed 3rd out of 4551 (abstract below). This will be followed by a discussion and networking session.

Abstract: Identifying toxic comments online is a key capability for facilitating productive discourse. Kaggle recently hosted a Toxic Comment Classification Challenge with the objective of identifying types of toxic comments, such as threats, obscenity, insults, and identity-based hate. In this presentation, Ryan Chesler will give an overview of his work on this project, where he ultimately placed 3rd out of 4551. Ryan will start by covering some basics of how words are embedded as numbers, including Word2Vec using Gensim and the sklearn packages CountVectorizer and TfidfVectorizer. Thereafter, he'll move to more advanced topics and discuss the models he used for the Toxic Comment challenge: what worked and what didn’t work. This talk will have content for both beginners and more advanced attendees, and code snippets and visualizations will be available for download. To learn more and register, visit

West Big Data Hub All Hands Meeting 

Sept. 20, 2018 (all day)
Location: Boise State, Boise, Idaho

Calling all HDSI partners and data enthusiasts! Join colleagues from industry, academia, nonprofits, and government on September 20 for West Big Data Innovation Hub's 2018 Open Annual All Hands Meeting in scenic Boise, Idaho!

Your leadership and participation will strengthen our community as we embark on the next phase of #BDHubs. Besides updates on our flagship initiatives and community-led sessions on Data Science Education, Data Ethics, Smart & Connected Communities, Health Data, Infrastructure and other topics, this year we will feature hands-on facilitated 'collaboration sprints' for road-mapping current and future projects. There will be a talented graphic facilitator to help us visualize a shared, collaborative trajectory and create a tangible take-home #WestData18 souvenir.

Also new this year: 'Pay It Forward' Registration options to support Early Career community members who might not otherwise be able to join. To learn more and register, visit


Big Data, Big Dilemmas: Data Science and Ethics

Sept. 12, 2018 (6:30-9:00 p.m.)
Location: 515 15th Street NW, F St., Washington, DC 20004

Whether you live in Washington, D.C. or Washington State, issues of data integrity, privacy, automation, and regulation are now a part of everyday life. Join UC San Diego to look deeper into today’s data science studies, research, and real world projects that are informing tomorrow’s practices and innovations.

Our conversation partners will give you a look beneath the surface of perplexing questions, ethical conundrums, and potential outcomes to help you be more informed about data science and the issues relevant to you.

You’ll also learn how UC San Diego’s new Halıcıoğlu Data Science Institute and the undergraduate Data Science Major are educating the next generation of data scientists in critical and practical skills and ethics. Learn More.

Invited Presentation at SDSC/UCSD

Deep Learning Acceleration of Progress toward Delivery of Fusion Energy

Aug. 29, 2018 (2 p.m.)
Location: San Diego Supercomputer Center Room 408

Speaker: Professor William Tang
Princeton University
Dept. of Astrophysical Sciences, Plasma Physics Section,
Executive Committee, Princeton Institute for Computational Science & Engineering (PICSciE), and
Principal Research Physicist, Princeton Plasma Physics Laboratory

Accelerated progress in producing accurate predictions in science and industry has been accomplished by engaging modern big-data-driven statistical methods featuring machine/deep learning/artificial intelligence (ML/DL/AI). Associated techniques being formulated and adapted have enabled new avenues of data-driven discovery in key scientific application areas such as the quest to deliver Fusion Energy – identified by the 2015 CNN “Moonshots for the 21st Century” series as one of 5 prominent grand challenges. An especially time-urgent and very challenging problem facing the development of a fusion energy reactor is the need to reliably predict and avoid large-scale major disruptions in magnetically-confined tokamak systems such as the EUROFUSION Joint European Torus (JET) today and the burning plasma ITER device in the near future. Significantly improved methods of prediction with better than 95% predictive accuracy are required to provide sufficient advance warning for disruption avoidance or mitigation strategies to be effectively applied before critical damage can be done to ITER -- a ground-breaking $25B international burning plasma experiment with the potential capability to exceed “breakeven” fusion power by a factor of 10 or more. This truly formidable task demands accuracy beyond the near-term reach of hypothesis-driven/“first-principles” extreme-scale computing (HPC) simulations that dominate current research and development in the field.

This talk covers recent HPC-relevant advances in the deployment of deep learning recurrent and convolutional neural networks in Princeton’s new deep learning code, the “FRNN” (Fusion Recurrent Neural Nets) code, on modern GPU systems. This is clearly a “big-data” project in that it has direct access to the huge JET disruption database of over a half-petabyte to drive these studies. FRNN implements a distributed data parallel synchronous stochastic gradient approach with TensorFlow libraries at the backend and MPI for communication. This deep learning software has demonstrated excellent scaling up to 6000 GPUs on “Titan” at the Oak Ridge National Laboratory – an achievement that has helped establish the practical feasibility of using leadership class supercomputers to greatly enhance training of neural nets to enable transformational impact on key discovery science application domains such as Fusion Energy Science. Powerful systems on which FRNN is currently deployed include: (1) Japan’s TSUBAME 3 – where over 1000 Pascal P100 GPUs have already enabled impressive hyper-parameter tuning production runs; and (2) ORNL’s Summit featuring the new Volta GPUs on which FRNN’s new “half-precision” algorithmic capability has produced attractive scaling results. In summary, statistical deep learning software trained on very large data sets holds exciting promise for delivering much-needed predictive tools capable of accelerating scientific knowledge discovery in HPC. The associated creative methods being developed also have significant potential for cross-cutting benefit to a number of important application areas in science and industry.

SeedMe Workshop

Collaborative Data Sharing Infrastructure for Researchers

Aug. 24, 2018 (9 a.m. - 5 p.m.)
Location: San Diego Supercomputer Center

The SeedMe workshop will provide hands-on experience in creating and using data sharing infrastructure for research purposes. The workshop will introduce the SeedMe platform, which provides powerful data organization and data sharing tools. The platform supports extensive customization and rapid collaboration by making it possible to keep data, its description and its discussion in one place. Workshop attendees will take away a fully functioning data sharing website so they can explore its use in their projects. See more information on the workshop page.

Workshop Information

Open ML Hackathon

Deep Learning Competition: Image Classification

July 25, 2018 (5-9 p.m.)
Location: 5575 Morehouse Drive, San Diego, California

HDSI is proud to endorse Qualcomm's Machine Learning Competition taking place on July 25th! 

During the event, you'll learn how to build and deploy an image classifier in the real world. This is a great opportunity to strengthen your machine learning skills, where you will learn about ways to solve problems such as data generalization and augmentation. You'll also be able to team up with other students to build the best model and possibly win prizes.

Prerequisites: Participants should be familiar enough with Python to install libraries and run code. Participants should also have taken an online ML course or trained a model before.

More Information and Registration

Halıcıoğlu Data Science Institute Reception

Join Us In Welcoming HDSI into San Diego's Tech Community

June 14, 2018 (5:30 - 7:00 p.m.)
CBRE | 4301 La Jolla Village Dr #3000, San Diego, CA 92122

Join your fellow tech executives in welcoming the institute and the professors who will be integral to training and educating the data science professionals of San Diego’s future.

Wine, beer and hors d'oeuvres will be provided.

Host Committee - Tech San Diego Senior Data Science Advisory Committee

  • Annika Jimenez, Dexcom
  • Eliot Weitz, ViaSat
  • Greg Schibler, Neustar
  • Michael Zeller, Software AG
  • Patrick Deglon, Teradata
  • Sameer Chopra, ID Analytics
  • Scott Zoldi, FICO

HDSI Distinguished Lecturer Series: Amanda Cox

May 31, 2018 (2:00 – 3:00 p.m.)
San Diego Supercomputer Center Auditorium

Amanda Cox is editor of The New York Times’ The Upshot, which offers data-driven reporting focused on politics, policy and economic analysis. Download the flyer for more information.

San Diego Clinical Research Network (SDCRN)

Artificial Intelligence and Machine Learning in Drug Discovery: Beyond the Hype

May 30, 2018 (5:30-8:30 p.m.)
Knobbe Martens, 12790 El Camino Real, San Diego, CA, 92130

Innovators and entrepreneurs from some of the hottest companies in the field will present real-world case studies based on cutting-edge science demonstrating how to harness artificial intelligence and machine learning to transform aspects of the life sciences industry, from R&D through personalized medicine. This is followed by an interactive panel discussion around the current state of the field, regulatory and other development hurdles, and what the future of AI and ML in drug discovery holds.

Now open for registration.

Contact: Teresa Gallagher, San Diego Clinical Research Network 858-692-1835

Data Science and Engineering

Master's Degree for Working Engineering Professionals Information Session

May 23, 2018

For more information or to register, please see the JSOE Master of Advanced Study Degree Program.

Workshop: Methods and Message

May 3, 2018 (4-5:30 p.m.)

Virginia Eubanks
Associate Professor
Political Science
University at Albany, SUNY

Location: MCC Building, Room 250

When we intervene in public debates on issues of inequality, how do we make our choices around methods, storytelling, and political engagement matter? Stories about science and technology are particularly challenging as they often implicitly reproduce dominant narratives of inequality. Join Virginia Eubanks, author of Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor, for a conversation about academic research, long-form journalism, and how we choose to frame our messages. We welcome participation of people in any stage of writing, reporting, community, or academic work.

Design@Large Speaker Series

May 2, 2018 (4-6 p.m.)

Virginia Eubanks
Associate Professor
Political Science
University at Albany, SUNY

Title: Automating Inequality: How High-Tech Tools Profile, Police and Punish the Poor
Location: CSE Building, Room 1202
RSVP: Eventbrite

Abstract: Today, automated systems control which neighborhoods get policed, which families attain needed resources, and who is investigated for fraud. While we all live under this new regime of data analytics, the most invasive and punitive systems are aimed at the poor.

Virginia Eubanks will discuss her new book, Automating Inequality, which systematically investigates the impacts of data mining, policy algorithms, and predictive risk models on poor and working-class people in America. The book is full of heart-wrenching and eye-opening stories, from a woman in Indiana whose benefits are literally cut off as she lies dying to a family in Pennsylvania in daily fear of losing their daughter because they fit a certain statistical profile.

Her visit is co-sponsored by the Design Lab and the Institute for Practical Ethics.

Design@Large Speaker Series talks are free and open to the public.

HDSI Distinguished Lecture Series

April 27, 2018 (4-6 p.m.)

David Danks
L.L. Thurstone Professor of Philosophy & Psychology
Head, Department of Philosophy
Carnegie Mellon University

Title: Handling Complexity
Location: Humanities & Social Sciences 7077
UC San Diego

Abstract: It is utterly banal to observe that the world is a complex place. The more challenging question is what to do in the face of that complexity. In this talk, I will address this question along two different lines. First, I will consider how we humans cognitively handle the world’s complexity. I will argue, on both theoretical and empirical grounds, that one way we succeed (when we do) is through rapid, dynamic, goal-driven attention allocation. I then explore the numerous philosophical implications of this aspect of our cognition, particularly in the philosophy of science. Second, I will consider our use of technology—particularly, computational technologies—to handle the world’s complexity on our behalf. As an example, I will briefly describe some of the causal discovery algorithms that we have developed for complex types of data. I will then turn more generally to AI, robotic, and other technologies with autonomous capabilities. Our use of these systems requires that we have appropriate trust in them, and I will present a series of challenges for the development of such trust (at least, in the near-future). This relative lack of appropriate trust raises a number of ethical and policy concerns, and I will conclude my talk by considering ways to ethically develop and deploy AI and other autonomous technologies, in light of the world’s complexity.

Computational Neuroscience Seminar

April 16, 2018 (4-5 p.m.)

Henry D. I. Abarbanel

Physics and Scripps Institution of Oceanography, UC San Diego

Fung Auditorium, Powell-Focht Bioengineering Building
UC San Diego

Organized by the Institute for Neural Computation.

Sponsored by Brain Corp.

Download flyer

Halıcıoğlu Data Science Institute Symposium and Faculty Open House

March 2, 2018
Qualcomm Institute/Atkinson Hall

Student Posters and Presentations:

12:30 pm Dr. Elizabeth Simmons, Executive Vice Chancellor, Academic Affairs
12:40 pm Dr. Rajesh Gupta, Co-Director, Halıcıoğlu Data Science Institute
12:45 pm Dr. Terry Sejnowski, Co-Director, Institute for Neural Computation
12:53 pm Dr. Molly Roberts, Political Science
1:01 pm Dr. Gordon Hanson, Global Policy and Strategy
1:09 pm Dr. Amaro, Chemistry/Biochemistry
1:17 pm Dr. Anders Dale, Neuroscience
1:25 pm Dr. George Sugihara, Climate Atmospheric Science/Physical Oceanography
1:33 pm Dr. Yoav Freund, Computer Science and Engineering
1:41 pm Dr. Brad Voytek, Cognitive Science
1:49 pm Dr. Terry Jernigan, Cognitive Science
2:00 pm Q&A
2:15 pm Student posters (Lobby)
3:00 pm Student presentations and judging (Auditorium)
4:00 pm Student posters (Lobby)
5:00 pm Closing

Faculty and Student Open House

March 2, 2018 
Qualcomm Institute (formerly Calit2)

Join Halıcıoğlu Data Science Institute Co-Directors Rajesh Gupta and Jeffrey Elman, along with Executive Vice Chancellor for Academic Affairs Elizabeth Simmons, to celebrate the official launch of the institute and learn more about its programs.

12:30 – 2:00 Formal Program (Open to Faculty by Invitation Only)
2:00 – 5:00 Student Presentations and Posters