How can we understand spoken language so quickly?


Scientists have come a step closer to understanding how we’re able to understand spoken language so rapidly, and it involves a huge and complex set of computations in the brain.

In a study published today in the journal PNAS, researchers at the University of Cambridge developed novel computational models of the meanings of words, and tested these directly against real-time brain activity in volunteers.

“Our ability to put words into context, depending on the other words around them, is an immediate process and it’s thanks to the best computer we’ve ever known: the brain in our head. It’s something we haven’t yet managed to fully replicate in computers because it is still so poorly understood,” said Lorraine Tyler, Director of the Centre for Speech, Language and the Brain at the University of Cambridge, which ran the study.

Central to understanding speech are the processes involved in what is known as ‘semantic composition’ – in which the brain combines the meaning of words in a sentence as they are heard, so that they make sense in the context of what has already been said.

This new study has revealed the detailed real-time processes going on inside the brain that make this possible.

By saying the phrase: “the elderly man ate the apple” and watching how the volunteers’ brains responded, the researchers could track the dynamic patterns of information flow between critical language regions in the brain.

As the word ‘eat’ is heard, it primes the brain to put constraints on how it interprets the next word in the sentence: ‘eat’ is likely to be something to do with food.

The study shows how these constraints directly affect how the meaning of the next word in the sentence is understood, revealing the neural mechanisms underpinning this essential property of spoken language – our ability to combine sequences of words into meaningful expressions, millisecond by millisecond as the speech is heard.

“The way our brain enables us to understand what someone is saying, as they’re saying it, is remarkable,” said Professor Tyler. “By looking at the real-time flow of information in the brain we’ve shown how word meanings are being rapidly interpreted and put into context.”

During speech, how does the brain integrate information processed on different timescales and in separate brain areas so we can understand what is said?

This is the language binding problem.

Dynamic functional connectivity (brief periods of synchronization in the phase of EEG oscillations) may provide some answers.

Here we investigate time and frequency characteristics of oscillatory power and phase synchrony (dynamic functional connectivity) during speech comprehension.

Twenty adults listened to meaningful English sentences and nonsensical “Jabberwocky” sentences in which pseudo-words replaced all content words, while EEG was recorded.

Results showed greater oscillatory power and global connectivity strength (mean phase lag index) in the gamma frequency range (30–80 Hz) for English compared to Jabberwocky. Increased power and connectivity relative to baseline was also seen in the theta frequency range (4–7 Hz), but was similar for English and Jabberwocky.

High-frequency gamma oscillations may reflect a mechanism by which the brain transfers and integrates linguistic information so we can extract meaning and understand what is said.

Slower frequency theta oscillations may support domain-general processing of the rhythmic features of speech.

Our findings suggest that constructing a meaningful representation of speech involves dynamic interactions among distributed brain regions that communicate through frequency-specific functional networks.

Dynamic Functional Connectivity During Meaningful Spoken Language Comprehension: Addressing the Language Binding Problem

How is it that we can create a coherent and meaningful representation of a multi-word utterance when different features of the speech signal are processed by separate brain areas and at different timescales as the speech signal unfolds?

This so-called “language binding problem” continues to be a central question in the neuroscience of language (Hagoort, 2005).

Functional connectivity, mediated by the phase synchronization of neuronal oscillations, provides a window into the brain’s language networks (Weiss and Mueller, 2003; Giraud and Poeppel, 2012) and may provide a mechanism to help address the language binding problem.

However, relatively few studies have investigated functional connectivity during speech perception (e.g., Luo and Poeppel, 2007; Kikuchi et al., 2011).

The goal of this study is to better understand the time and frequency characteristics of the functional networks that support meaningful spoken language processing in the brain.

Many previous studies have used event-related potentials (ERPs) to examine the neural basis of speech comprehension.

The high temporal precision of ERPs has been crucial for investigating how language processing unfolds in the brain over time. ERPs, however, measure localized brain responses and cannot reveal the dynamic interactions between brain areas that support language comprehension in real-time.

With time-frequency analysis of EEG oscillations, one can measure both changes in local brain activity and long-range communication among distributed brain regions during language processing.

Oscillatory power (amount of energy at a particular frequency) is thought to reflect local neuronal activity, which may be due to the number (or strength) of neurons firing at a particular frequency, as well as how synchronous their firing is (Cohen, 2014).
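To make the notion of oscillatory power concrete, the following is a minimal sketch (not the analysis pipeline used in the study) of estimating band-limited power from a single EEG-like signal using Welch’s power spectral density; the `band_power` helper, the sampling rate, and the synthetic 6 Hz rhythm are all illustrative assumptions:

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, f_lo, f_hi):
    """Approximate oscillatory power in the [f_lo, f_hi] Hz band by
    integrating Welch's power spectral density over that band."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs)  # 1-s windows -> ~1 Hz resolution
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return np.trapz(psd[band], freqs[band])

# Synthetic example: a 6 Hz (theta-band) rhythm buried in noise, sampled at 250 Hz
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.standard_normal(t.size)

theta = band_power(sig, fs, 4, 7)    # should capture the injected rhythm
gamma = band_power(sig, fs, 30, 80)  # should contain only the noise floor
```

Because the injected rhythm lies in the theta band, `theta` comes out far larger than `gamma`; in real data, such band-power differences between conditions are what analyses like the one described here quantify.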

Additionally, a correlation in the phase of oscillations at two different electrodes (i.e., coordinated fluctuations of rhythmic excitability of neural populations recorded from different electrodes) is thought to reflect long-distance synchronization, and thus interaction, among distributed brain regions even if those regions are not physically connected (Buzsáki and Wang, 2012; Siegel et al., 2012; Fries, 2015).
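One common measure of such phase coupling, and the one named in the abstract above, is the phase lag index (Stam et al.). The sketch below computes it for two synthetic "electrodes" using Hilbert-transform phases; the function name, signal frequencies, and noise levels are illustrative assumptions, not the study’s actual parameters:

```python
import numpy as np
from scipy.signal import hilbert

def phase_lag_index(x, y):
    """Phase lag index: consistency of the sign of the instantaneous phase
    difference between two signals. 0 = no consistent lead/lag relation,
    1 = one signal always leads the other."""
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.sign(np.sin(dphi))))

fs = 500
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(1)

base = np.sin(2 * np.pi * 40 * t)                 # 40 Hz gamma-band rhythm
coupled = np.sin(2 * np.pi * 40 * t - np.pi / 4)  # same rhythm, fixed phase lag
noise = rng.standard_normal(t.size)               # unrelated broadband noise

pli_coupled = phase_lag_index(base + 0.1 * rng.standard_normal(t.size), coupled)
pli_noise = phase_lag_index(base, noise)
```

With a consistent phase lag, `pli_coupled` is close to 1; for the unrelated noise channel, `pli_noise` stays low. Averaging such pairwise values over all electrode pairs gives the "global connectivity strength (mean phase lag index)" mentioned in the abstract.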

The brain’s ability to change the extent to which neurons in different areas synchronize their patterns of firing is thought to be a mechanism by which it coordinates and integrates the flow of information within a network of participating structures (Bastos and Schoffelen, 2016).

Dynamic functional connectivity, as measured through changes in cross-trial phase synchronization over time, has been used to investigate the brain networks supporting many aspects of sensory and cognitive processing (Rodriguez et al., 1999; Singer, 2007; Siegel et al., 2012; Fries, 2015).
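A minimal sketch of cross-trial phase synchronization, assuming a simple phase-locking-value formulation rather than the study’s exact method: at each time point we measure how aligned the instantaneous phases are across trials, so a stimulus that "resets" the phase shows up as a rise in phase locking after stimulus onset. All signal parameters below are invented for illustration:

```python
import numpy as np
from scipy.signal import hilbert

def cross_trial_plv(trials):
    """Phase-locking value across trials at each time point: the length of
    the mean unit phase vector over trials (1 = phases perfectly aligned)."""
    phases = np.angle(hilbert(trials, axis=1))      # shape (n_trials, n_times)
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

fs, n_trials = 200, 50
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(2)

# Each trial: random-phase theta rhythm before a "stimulus" at t = 0.5 s,
# then a phase-reset rhythm that is identical across trials afterwards.
trials = np.empty((n_trials, t.size))
for i in range(n_trials):
    pre = np.sin(2 * np.pi * 6 * t + rng.uniform(0, 2 * np.pi))
    post = np.sin(2 * np.pi * 6 * t)
    trials[i] = np.where(t < 0.5, pre, post) + 0.2 * rng.standard_normal(t.size)

plv = cross_trial_plv(trials)
# plv stays low before the reset and rises toward 1 after it
```

Tracking how this quantity changes over the course of a sentence, per frequency band and electrode pair, is what "dynamic" functional connectivity refers to here.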

To date, however, it has been underused in examining the brain networks supporting speech perception.

Here we explore the time and frequency characteristics of both oscillatory power and phase synchrony (dynamic functional connectivity) during meaningful spoken sentence processing.

Specifically, we ask whether there is a difference in the overall phase synchronization of EEG oscillations when healthy native English-speaking adults listen to meaningful English sentences compared to nonsensical “Jabberwocky” sentences, which lack semantic content. In Jabberwocky sentences, English open-class words (nouns, verbs, adjectives, adverbs) are replaced with pseudo-words that, while obeying English phonotactic rules, are void of meaning (Carroll, 1883; Yamada and Neville, 2007).

Without meaningful lexical-semantic content, both the memory retrieval and the binding stages of language comprehension that unify semantic with syntactic and phonological information are disrupted (Hagoort, 2005).

Jabberwocky uses English closed-class words (e.g., articles, prepositions) however, which is thought to allow English listeners to create a rudimentary structural representation of the sentence and engage in syntactic processing, even in the absence of meaningful semantic information (although see Hahne and Jescheniak, 2001 and Yamada and Neville, 2007 for alternative views as to whether syntactic processing recruits identical neurocognitive processes without semantic information).

Comparing English to Jabberwocky thus allows us to investigate the brain processes specific to meaningful speech comprehension and integration, while controlling for other levels of language processing (e.g., phonology, syntax).

We predict that semantic integration will be reduced or absent while listening to Jabberwocky compared to English sentences, and that this will be indexed by a reduction in overall oscillatory phase synchrony.

Phase synchronization of EEG oscillations can occur at different frequencies. These frequencies reflect the rate at which neural populations alternate between states in which they are more or less excitable, and thus more or less likely to fire and efficient at processing incoming information (Schroeder et al., 2008).

The results of previous studies suggest that oscillations in the gamma (30–80 Hz) and theta frequency range (4–7 Hz) may be important in speech processing.

For example, in terms of local power changes, greater power was seen in the middle gamma frequency range (defined as 55–75 Hz) when participants listened to their native language compared to a foreign language or speech played backward, whereas listening to both languages resulted in a power increase in the theta frequency range (4–7 Hz; Peña and Melloni, 2012).

Increased phase synchronization in the theta frequency range was also reported when participants listened to normal speech compared to speech that was degraded to the point where it was unintelligible (Luo and Poeppel, 2007). Moreover, Molinaro et al. (2013) reported increased phase synchronization in both theta and gamma frequency bands when participants read words presented in highly constraining lexical/semantic contexts that pre-activated the expected words’ lexical/semantic representations compared to words in less constraining contexts that did not benefit from such anticipatory semantic preparation.

By investigating both local and long-range oscillatory responses (power and phase synchrony, respectively), the present study extends these findings to better elucidate the brain networks supporting the comprehension and integration of meaning in speech. Based on previous findings, we expected to see increased oscillatory power and phase synchrony (functional connectivity) in gamma and theta frequency ranges when participants listened to English compared to Jabberwocky speech.

University of Cambridge
Media Contacts:
Jacqueline Garget – University of Cambridge

Original Research: Open access
“Neural dynamics of semantic composition”. Bingjiang Lyu, Hun S. Choi, William D. Marslen-Wilson, Alex Clarke, Billi Randall, and Lorraine K. Tyler.
PNAS doi:10.1073/pnas.1903402116.

