Summary: Researchers have developed a computational framework that maps how the brain processes speech during real conversations. Using electrocorticography (ECoG) and AI speech models, the study analyzed more than 100 hours of brain activity and revealed how different regions process sound, speech patterns, and word meaning.
The findings show that the brain processes language sequentially: before we speak, activity moves from thought to words to sounds, and it works in the opposite direction to interpret speech we hear. The framework accurately predicted brain activity even in new conversations, surpassing previous models.
These insights could improve speech recognition technology and support individuals with communication disorders. The research also provides a deeper understanding of how the brain makes conversation feel so effortless.
Key Facts:
- Layered processing: The brain handles speech at three levels: sound, speech patterns, and word meaning.
- Sequential flow: Before speaking, the brain moves from words to sounds; after hearing speech, it works back from sounds to meaning.
- Real-world insight: The AI-based models accurately predicted brain activity during natural conversation.
Source: The Hebrew University of Jerusalem
A new study led by Dr. Ariel Goldstein of the Business School at the Hebrew University of Jerusalem and Google Research, conducted in collaboration with the Hasson Lab at the Princeton Neuroscience Institute and with Dr. Flinker and Dr. Devinsky at the NYU Langone Comprehensive Epilepsy Center, has developed a unified computational framework for studying how the brain handles everyday conversation.
The study bridges acoustic, speech, and word-level linguistic structures, offering unprecedented insight into how the brain processes everyday speech in real-world settings.
Published in Nature Human Behaviour, the study used a technique called electrocorticography (ECoG) to record brain activity during more than 100 hours of natural conversation.
To analyze these data, the team used a speech-to-text model called Whisper, which breaks language down into three levels: simple sounds, speech patterns, and word meanings. These layers were then compared with brain activity using advanced computational models.
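To make this step more concrete, the sketch below shows one way such multi-level representations could be pulled from a pretrained Whisper model using the Hugging Face transformers library. The model size, the helper function extract_embeddings, and the use of the spectrogram, encoder, and decoder states as the three levels are illustrative assumptions, not a description of the authors' actual pipeline.

```python
import torch
from transformers import WhisperProcessor, WhisperModel

# Illustrative model size; the study's exact Whisper variant is not stated here.
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
model = WhisperModel.from_pretrained("openai/whisper-base")
model.eval()

def extract_embeddings(waveform, transcript_ids, sampling_rate=16000):
    """Return acoustic, speech-level (encoder), and word-level (decoder)
    representations for one short audio clip and its tokenized transcript."""
    # The log-mel spectrogram features play the role of the low-level acoustic layer.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        # Encoder hidden states track mid-level, speech-pattern structure.
        encoder_out = model.encoder(inputs.input_features)
        speech_embeddings = encoder_out.last_hidden_state        # (1, frames, dim)

        # Decoder hidden states track contextual, word-level language structure.
        decoder_out = model.decoder(
            input_ids=transcript_ids,
            encoder_hidden_states=encoder_out.last_hidden_state,
        )
        language_embeddings = decoder_out.last_hidden_state      # (1, tokens, dim)
    return inputs.input_features, speech_embeddings, language_embeddings

# Example usage with two seconds of silent audio and a dummy transcript.
waveform = torch.zeros(32000).numpy()
transcript_ids = processor.tokenizer("so how was your day", return_tensors="pt").input_ids
acoustic, speech_emb, language_emb = extract_embeddings(waveform, transcript_ids)
```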
The results showed that the framework predicts brain activity with high accuracy. Even when applied to conversations that were not part of the training data, the model correctly matched different brain regions to their specific linguistic functions.
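To give a flavor of how such a prediction test works, the minimal sketch below fits a linear (ridge-regularized) map from embedding vectors to electrode responses, in the spirit of the encoding models described in the paper's abstract, and scores it on held-out folds. The array shapes, regularization strength, and correlation metric are assumptions chosen for demonstration, not the study's actual analysis.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Made-up sizes: one embedding vector per word, one neural response per electrode.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 512))   # n_words x embedding_dim
Y = rng.standard_normal((5000, 128))   # n_words x n_electrodes

def held_out_encoding_scores(X, Y, n_splits=5, alpha=10.0):
    """Fit a ridge-regularized linear map from embeddings to neural activity and
    report, for each electrode, the correlation between predicted and observed
    responses on held-out folds that were never seen during fitting."""
    scores = np.zeros(Y.shape[1])
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        model = Ridge(alpha=alpha).fit(X[train_idx], Y[train_idx])
        Y_pred = model.predict(X[test_idx])
        for e in range(Y.shape[1]):
            scores[e] += np.corrcoef(Y_pred[:, e], Y[test_idx, e])[0, 1] / n_splits
    return scores

print("Mean held-out correlation per electrode:", held_out_encoding_scores(X, Y).mean())
```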
For example, regions involved in hearing and speaking aligned with sounds and speech patterns, while regions involved in higher-level understanding aligned with the meanings of words.
The study also found that the brain processes language sequentially: before we speak, it moves from thinking about words to forming sounds, and when we listen, it works in the opposite direction to recover meaning from sound.
The framework used in this study captured these complex processes more effectively than older methods.
“Our findings help us understand how our brain handles conversations in real-life environments,” Dr. Goldstein said.
“By connecting the different layers of language, we are uncovering the mechanisms behind what we all do naturally.”
This research has potential practical applications, from improving speech recognition technology to developing better tools for people with communication challenges. It also offers new insight into how the brain makes conversation feel so effortless, whether you are chatting with a friend or having an argument.
The study marks an important step toward building more advanced tools for studying how the brain processes language in real-world situations.
About this speech processing and neuroscience research news
Author: Yarden Mills
Source: The Hebrew University of Jerusalem
Contact: Yarden Mills – The Hebrew University of Jerusalem
Image: Image credited to Neuroscience News
Original research: Open access.
“A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations” by Ariel Goldstein et al. Nature Human Behaviour
Abstract
A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations
This study introduces a unified computational framework that connects acoustic, speech, and word-level linguistic structures to study the neural basis of everyday conversations in the human brain.
Using electrocorticography, we recorded neural signals across 100 hours of speech production and comprehension as participants engaged in open-ended, real-life conversations. We extracted low-level acoustic, mid-level speech, and contextual word embeddings from a multimodal speech-to-text model (Whisper).
We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension.
Remarkably, these models accurately predict neural activity at each level of the language processing hierarchy across hours of new conversations that were not used to train them.
The model’s internal processing hierarchy aligns with the cortical hierarchy for speech and language processing: sensory and motor regions align better with the model’s speech embeddings, while higher-level language areas align better with its language embeddings.
The Whisper model captures the temporal sequence of language-to-speech encoding before word articulation (speech production) and speech-to-language encoding after word articulation (speech comprehension). The embeddings learned by this model outperform symbolic models in capturing the neural activity that supports natural speech and language.
These findings support a paradigm shift towards a unified computational model that captures the entire processing hierarchy for speech comprehension and production in real-world conversations.