Voice assistants and speech recognition tools have struggled for years to understand the way many Black people naturally speak.
Commands get misinterpreted, responses miss the mark, and users are often left feeling unheard unless they change their voices to fit technology that wasn’t built with them in mind. Howard University and Google Research are working to change that.
The two institutions have collaborated on Project Elevate Black Voices and have released a first-of-its-kind dataset: over 600 hours of African American English (AAE) collected from communities across 32 states, according to a press release from Howard University, obtained by AFROTECH™.
The goal is to help artificial intelligence systems recognize, respect, and respond to the full range of Black speech, not just edited, flattened, or code-switched versions of it.
Project Elevate Black Voices is a multi-year research initiative co-designed and led by Black researchers to build high-quality AAE speech data that can improve automatic speech recognition (ASR) systems, according to the school’s partnership website.
The project takes a community-first approach, prioritizing trust, privacy, and data ownership. From city-based activations to ethical licensing, the initiative is grounded in one principle: ensuring that Black voices are not only included in innovation but centered in its design and direction.
“African American English has been at the forefront of United States culture since almost the beginning of the country,” Dr. Gloria Washington, Howard University researcher and co-principal investigator, said in the press release. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but also other persons who speak these unique dialects. It’s about time that we provide the best experience for all users of these technologies.”
Rather than relying solely on academic labs, researchers brought the work to the community. Project leaders hosted curated events in cities across the country, led by local Black panelists who facilitated open discussions on culture, language, and the role of AI in everyday life.
The sessions created safe, trusted spaces for people to share real experiences with voice technology, including how often those tools fall short.
“As a community-based researcher, I wanted to carefully curate the community activations to be a safe and trusted space for members of the community to share their experiences about tech and AI and also to ask those uncomfortable questions regarding data privacy,” Dr. Lucretia Williams, project lead and Howard researcher, said in the press release.
Following each event, participants were invited to contribute their voices over a three-week window. That collection effort yielded a rich and diverse dataset of AAE speech patterns, including dialects and diction that are often overlooked or misinterpreted by mainstream AI systems.
Researchers also noted a critical challenge: many Black users have learned to adapt their voices when interacting with voice assistants, creating an artificial gap in authentic speech data.
Howard University will own the dataset, but Google is allowed to use it to improve its ASR technologies, according to the website.
The Howard African American English Dataset 1.0 will be made available first to researchers and institutions within the HBCU network. Howard notes that the goal is to ensure the data is applied in ways that align with the interests of Black communities and uphold the values of community-driven research. Access for external organizations will be evaluated later, with priority given to those committed to inclusive and equitable tech development.
“Working with our outstanding partners at Howard University on Project Elevate Black Voices has been a tremendous and personal honor,” said Courtney Heldreth, co-principal investigator at Google Research, via the press release. “It’s our mission at Google to make technology that’s useful and accessible, and I truly believe that our work here will allow more users to express themselves authentically when using smart devices.”
This release marks more than a technical milestone. It’s a reimagining of how AI should be built, with ethical guardrails, cultural specificity, and a deep respect for language as lived experience. It’s also a model for how institutions can use their influence to shape technology that works for the people who’ve routinely been left out of the conversation.