By Kimberly Mann Bruch
A team of University of California San Diego data science graduate students have built an artificial intelligence system that can “listen” to the forest and pick out individual bird species, offering a new tool for research taking place in some of the world’s most biodiverse — and noisiest — habitats.
The team’s work, led by Chandrima Das, was developed for the Cornell Lab of Ornithology’s BirdCLEF+ 2025 challenge, which asks researchers to identify 206 species, ranging from birds to insects, using one‑minute audio clips recorded in Colombia’s Middle Magdalena Valley. More than 2,000 teams entered the challenge, applying machine learning to identify under‑studied species by their acoustic signatures.
Teaching computers to hear birds
Instead of training one giant model to recognize every species at once, Das said that the UC San Diego team trained a separate detector for each bird, giving the computer 206 different “ears,” each tuned to a single species. She explained that the system learned which patterns of sound, such as pitch ranges and energy at different frequencies, tend to belong to a given species, then estimated the chance that the bird is present in each chunk of audio.
This architectural choice offered several key advantages. The system is built on XGBoost classifiers rather than one large neural network, and the modular design means each species’ detector can be inspected, debugged and retrained on its own.
“This design made the system easier to understand and adjust: if one bird’s detector struggles, researchers can inspect just that model rather than untangling a single black‑box network,” said Das, who is a graduate student in the School of Computing, Information and Data Sciences’ Halıcıoğlu Data Science Institute. “Our system also runs on modest hardware without needing powerful graphics cards, which is important for future use on low‑power field devices like remote sensors.”
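The one-detector-per-species idea can be sketched in a few lines. This is an illustrative toy, not the team’s implementation: scikit-learn’s gradient-boosted trees stand in for XGBoost, and random numbers stand in for real acoustic features such as pitch ranges and per-band energy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Toy per-clip acoustic features; real features would be computed
# from the audio (pitch ranges, energy in frequency bands, etc.).
N_CLIPS, N_FEATURES, N_SPECIES = 200, 8, 3  # 3 species for brevity; the real task has 206
X = rng.normal(size=(N_CLIPS, N_FEATURES))
# Multi-label targets: each clip may contain any subset of species.
Y = (rng.random(size=(N_CLIPS, N_SPECIES)) < 0.3).astype(int)

# One binary detector per species ("one-vs-rest") rather than a
# single multi-class model, so each detector can be inspected and
# retrained independently.
detectors = []
for s in range(N_SPECIES):
    clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
    clf.fit(X, Y[:, s])
    detectors.append(clf)

# Each detector independently estimates the chance its species
# is present in a given chunk of audio.
clip = X[:1]
probs = [d.predict_proba(clip)[0, 1] for d in detectors]
print([round(p, 2) for p in probs])
```

Because the detectors are independent, a poorly performing one can be swapped out or retrained without touching the other 205.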
Why this is hard in the real world
The recordings used in BirdCLEF+ 2025 come from real environments, not clean lab conditions: wind, insects, overlapping calls and distant traffic all make it difficult for both humans and machines to hear clearly. The most significant challenge was class imbalance: while some common species had hundreds of training recordings, rare species had only a handful of examples.
To help with this, the team created extra training examples for poorly represented species by slightly speeding up, slowing down or shifting the pitch of existing recordings. This significantly improved detection for several of the rarest birds.
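The simplest of these augmentations, changing playback speed by resampling, can be sketched as follows. This is a minimal illustration, not the team’s code; note that resampling couples speed and pitch, whereas shifting pitch on its own requires a phase-vocoder approach (as in libraries such as librosa).

```python
import numpy as np

def speed_perturb(audio: np.ndarray, rate: float) -> np.ndarray:
    """Resample a waveform so it plays `rate` times faster (rate > 1)
    or slower (rate < 1), using linear interpolation. Changing speed
    this way also shifts pitch by the same factor."""
    n_out = int(round(len(audio) / rate))
    old_idx = np.arange(len(audio))
    new_idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(new_idx, old_idx, audio)

# A toy 1-second "recording" of a rare species: a 5 Hz sine wave
# sampled at 1 kHz stands in for a real bird call.
sr = 1000
t = np.linspace(0, 1, sr, endpoint=False)
clip = np.sin(2 * np.pi * 5 * t)

# Generate extra training examples at slightly varied speeds,
# leaving the species label unchanged.
augmented = [speed_perturb(clip, r) for r in (0.9, 1.0, 1.1)]
print([len(a) for a in augmented])  # slowed-down clips come out longer
```

Small perturbations like these give a rare species several plausible variants of each of its few recordings, which is why they help most for the under-represented classes.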
The team included Das alongside fellow data science graduate students Shreejith Suthraye Gokulnath, Arya Gaikwad, Keerthana Senthilnathan and Shruti Prasad Sawant. Their work was published in the CEUR Workshop Proceedings.

As an undergraduate, Das published research that used machine learning to identify COVID‑19 drug targets and to analyze large-scale data, an early look at the impact data science can have. She later worked with computer vision models, co‑authoring a paper on detecting diabetic retinopathy from retinal scans to support more accurate diagnosis. Outside the lab, Das has focused on making AI usable for everyday problems: AI assistants that answer questions about complex board‑game manuals, a salary analytics dashboard to support better career planning, and a multimodal, multi-agent nutrition application that offers real‑time dietary feedback.

Today, her work connects directly to the themes of the BirdCLEF+ project. At the San Diego Zoo Wildlife Alliance, she helps the population sustainability team build computer‑vision pipelines that automatically detect wildlife in images and videos. She also works at the San Diego Supercomputer Center, where she assists with the development of outage‑prediction and explainable AI tools for San Diego Gas and Electric.