Babak Salimi, Assistant Professor, Halıcıoğlu Data Science Institute (HDSI), UC San Diego, has been awarded the NSF Faculty Early Career Development Program (CAREER) award. This recognition underscores his exceptional potential to shape research and education in data science while leading advancements in his domain.
The NSF CAREER program stands as a cornerstone initiative within the National Science Foundation, aimed at supporting early-career faculty who demonstrate the promise to serve as academic role models in both research and education. Through this program, Professor Salimi’s project, titled “Causal Modeling for Data Quality and Bias Mitigation,” has been selected for its innovative approach and its potential to address critical challenges in data-driven decision-making.
REDEFINING ALGORITHMIC BIAS AS A DATA QUALITY MANAGEMENT CHALLENGE
Professor Salimi’s project redefines the challenge of algorithmic bias by framing it as a data quality management issue. This innovative perspective approaches bias through the dimensions of data quality management: consistency, completeness, and accuracy. The project hypothesizes that many issues attributed to algorithmic bias stem directly from the underlying data on which these algorithms operate. By focusing on these key aspects of data quality from the initial stages of data collection, preparation, and processing, the initiative seeks to prevent the perpetuation of biases before they can influence machine learning models and decisions.
In the era of big data and its pervasive influence across various sectors, algorithmic decision-making systems have become ever-present, impacting critical domains such as credit scoring, healthcare, and criminal justice. Despite their stated objectivity, these systems often harbor biases that lead to inaccurate and unfair outcomes, eroding trust and impeding progress. Reframing bias as a data quality problem aims not only to enhance the robustness and trustworthiness of data-driven systems but also to set a new standard for how biases in machine learning environments are understood and addressed. The approach integrates causal modeling with traditional data management strategies so that the data used in critical decision-making processes is as unbiased and representative as possible.
A HOLISTIC APPROACH FOR DATA CLEANING AND DEBIASING
Professor Salimi introduces a robust, multi-faceted strategy to enhance the integrity of data-driven decisions across various sectors. At its core, the approach focuses on the following research thrusts:
Developing Novel Data Cleaning Techniques: This involves creating methods to enforce complex integrity constraints, crucial for repairing data to maintain consistency with specialized constraints. These constraints address the statistical nuances essential for the reliability, fairness, and generalizability of machine learning models.
Holistic Data Debiasing Framework: This involves creating a comprehensive framework specifically designed to address a broad spectrum of data biases and quality issues in a unified manner. It incorporates advanced debiasing algorithms essential for systematically correcting and refining data inputs. This enables the framework to produce outcomes that are not only fairer but also more accurate and trustworthy, enhancing the overall integrity and reliability of data-driven decisions.
Quantification of Uncertainty Due to Data Quality Issues: This thrust develops methodologies to assess and quantify the uncertainty in data-driven decisions, particularly uncertainty that arises from unresolved biases and quality defects. These methods are crucial when full recovery from bias and quality defects is fundamentally infeasible, because the remaining defects must then be accounted for rather than repaired. By identifying and quantifying these uncertainties, the approach enhances the transparency and reliability of decision-making processes.
Developing Techniques to Pinpoint and Respond to Biases and Errors: This involves tracing unexpected results from decision-making algorithms back to data collection, preparation, and processing. It incorporates root-cause analysis to uncover the sources of unexpected behavior in ML models, facilitating informed adjustments in the data pipeline. By adapting and improving data handling processes dynamically as new issues arise, these techniques promote continuous data quality improvement, leading to more reliable and trustworthy data-driven decisions.
IMPACT ON RESEARCH AND SOCIETY
The implications of this project extend beyond the academic sphere, influencing broader societal norms and practices. By developing new tools and methodologies for data quality and bias mitigation, this research paves the way for more equitable and reliable decision-making processes across various sectors. The educational initiatives associated with this project aim to broaden participation and enhance data literacy among diverse groups, fostering a more inclusive data science community. Additionally, this work will contribute to public discourse and policy by demonstrating the critical importance of high-quality, unbiased data in shaping ethical and effective data-driven systems.