Lip reading causes brain activity to synchronize with sound waves even when there is no audible sound


Lip-reading causes brain activity to synchronize with sound waves even when no sound is audible, according to new research published in the Journal of Neuroscience.

Listening to speech prompts the auditory cortex to synchronize with the rhythm of the incoming sound waves. Lip-reading is a useful aid for comprehending speech that is otherwise unintelligible, but it remains unclear how lip-reading helps the brain process sound.

Bourguignon et al. used magnetoencephalography to measure brain activity in healthy adults while they listened to a story or watched a silent video of a woman speaking.

The participants’ auditory cortices synchronized with the sound waves produced by the woman in the video, even though the participants could not hear them.

The synchronization resembled that seen in participants who actually listened to the story, indicating that the brain can glean auditory information from the visual information available through lip-reading.
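One common way to quantify this kind of synchronization is coherence between a brain signal and the speech envelope at the rhythm of interest. The sketch below is purely illustrative and uses synthetic data: the 4 Hz rhythm, the 100 Hz sampling rate, the noise levels, and the helper names `dft_bin` and `coherence_at` are all assumptions made for the example, not values or methods taken from the article.

```python
import cmath
import math
import random


def dft_bin(segment, freq_hz, fs_hz):
    """Complex Fourier coefficient of one segment at a single frequency."""
    return sum(
        value * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
        for k, value in enumerate(segment)
    )


def coherence_at(x, y, freq_hz, fs_hz, seg_len):
    """Magnitude-squared coherence of x and y at freq_hz (0 = none, 1 = perfect)."""
    cross = 0j
    power_x = 0.0
    power_y = 0.0
    n = min(len(x), len(y))
    # Average the cross-spectrum and power over non-overlapping segments.
    for start in range(0, n - seg_len + 1, seg_len):
        fx = dft_bin(x[start:start + seg_len], freq_hz, fs_hz)
        fy = dft_bin(y[start:start + seg_len], freq_hz, fs_hz)
        cross += fx * fy.conjugate()
        power_x += abs(fx) ** 2
        power_y += abs(fy) ** 2
    return abs(cross) ** 2 / (power_x * power_y)


# Synthetic data only: nothing below comes from the study's recordings.
random.seed(0)
FS_HZ = 100.0   # assumed sampling rate
RATE_HZ = 4.0   # assumed speech rhythm (illustrative)
SEG_LEN = 200   # 2-second segments
N = 4000        # 40 seconds of signal
DELAY = 5       # assumed lag, in samples, of the "brain" signal

envelope = [math.sin(2 * math.pi * RATE_HZ * k / FS_HZ) for k in range(N)]
# A signal that tracks the envelope: delayed, attenuated copy plus noise.
tracking = [
    0.5 * math.sin(2 * math.pi * RATE_HZ * (k - DELAY) / FS_HZ)
    + random.gauss(0.0, 1.0)
    for k in range(N)
]
# A signal with no relation to the envelope: noise only.
unrelated = [random.gauss(0.0, 1.0) for _ in range(N)]

tracked = coherence_at(envelope, tracking, RATE_HZ, FS_HZ, SEG_LEN)
baseline = coherence_at(envelope, unrelated, RATE_HZ, FS_HZ, SEG_LEN)
print(f"tracking signal: {tracked:.2f}, unrelated signal: {baseline:.2f}")
```

In this toy setup the tracking signal yields a coherence close to 1 and the unrelated signal a value close to 0, which is the contrast such a measure is meant to expose.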

Speech entrainment during audio-only (top) and video-only (bottom). Image is credited to Bourguignon et al., JNeurosci 2019.

The researchers suggest this ability arises from activity in the visual cortex synchronizing with lip movement.

This signal is sent to other brain areas that translate the movement information into sound information, creating the sound wave synchronization.

Visual speech information presented synchronously with speech sound can affect speech perception. Consequently, visual speech information, such as the speaker’s face uttering the speech sound, is important for the perception of auditory input in individuals with impaired hearing, because ambiguous speech sounds can be perceived more clearly with congruent visual speech information [12].

This is commonly known as lip-reading. However, if the visual speech information is incongruent with the speech sounds, perception of the audio speech may be altered. For example, presenting the /be/ sound (audio) together with a visual /ge/ (the speaker’s face uttering the /ge/ sound), an incongruent visual stimulus, often causes the presented /be/ sound to be perceived as /de/. This effect of incongruent visual speech information is well known as the McGurk effect [3].

Temporal synchronization between the audio and visual (A/V) stimuli is known to be critical for effective lip reading; i.e., there is a limited range referred to as the “temporal window” in which A/V perceptual binding occurs [47].

Psychophysical examination of the temporal window of the McGurk phenomenon showed a high rate of fusion responses (i.e., positive A/V integrative responses) for perfectly synchronous A/V stimuli (no lag), and this rate was maintained over a temporal window of about 200–300 ms (i.e., +/-100 ms to +/-150 ms), with an asymmetry between audio and visual lags (more robust for audio lags) [47]. Greater temporal offsets between the audio and visual stimuli lowered the fusion response, while the asymmetry between audio and visual lags was maintained.

That is, when audio stimuli lead visual stimuli by about 500 ms, no significant fusion response could usually be obtained. In contrast, when visual stimuli lead audio stimuli with the same offset, some fusion response could be observed.
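The asymmetric temporal window described above can be summarized as a small lookup. The sketch below is only a qualitative illustration: the function name `fusion_expectation`, the labels, the sign convention (negative lag means the audio leads), and the choice of 150 ms as the half-width are assumptions for the example. The text gives only approximate figures and says nothing precise about offsets between the window edge and 500 ms, so those are simply labeled "reduced".

```python
def fusion_expectation(audio_lag_ms, half_window_ms=150.0, far_offset_ms=500.0):
    """Qualitative McGurk fusion expectation for a given audio lag.

    Assumed sign convention:
      positive lag -> audio follows the visual stimulus (visual leads)
      negative lag -> audio precedes the visual stimulus (audio leads)
    """
    # Inside the temporal window, fusion stays close to the synchronous level.
    if abs(audio_lag_ms) <= half_window_ms:
        return "high"
    # Audio leading by about 500 ms or more: usually no significant fusion.
    if audio_lag_ms <= -far_offset_ms:
        return "not significant"
    # Any other offset, including visual leads of 500 ms: lower, but some fusion.
    return "reduced"


for lag in (-500, -100, 0, 100, 500):
    print(f"audio lag {lag:+5d} ms -> {fusion_expectation(lag)}")
```

The asymmetry shows up in the two 500 ms cases: an audio lead falls into the "not significant" branch, whereas a visual lead of the same size still counts as "reduced".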

Since simultaneous visual input influences auditory input in A/V speech perception, it would be important to know when, where, and how the visual inputs affect the processing of auditory inputs. The left superior temporal sulcus (STS) receives neural projections from both auditory and visual cortices, and is known to be important in A/V multimodal coupling [814].

In addition, speech processing in the auditory cortex, which is an earlier processing site than the STS in auditory signal processing, may be modulated by the visual effects conveyed via the direct corticocortical (visual cortex to auditory cortex) pathway, which does not involve the STS [1518].

The visual effects on the auditory cortex can be observed in the auditory evoked N100 response of electroencephalography (EEG) and/or N100m response of magnetoencephalography (MEG) generated from around the auditory cortex [15]. The latencies of the N100(m) response to monosyllables are shortened and the amplitudes of those responses are decreased by the simultaneous presentation of visual speech information [12151924].

Moreover, as in the STS, the lip-reading effects on the N100 response are dominant in the left hemisphere [1222]. On the other hand, the involvement of the right hemisphere in A/V coupling may vary in a complementary manner, as suggested by a report of right-hemisphere activation after total damage to the left temporal cortex [25].

A predictive mechanism that processes the preceding visual speech information is speculated to account for both effects of visual speech information on the N100m: the shortened latencies and the reduced amplitudes [19232426].

However, it has not been clarified whether this predictive visual effect on the N100m response shows a temporal window similar to that obtained in psychophysical A/V perception, such as the McGurk effect.

Because the temporal window appears to be a characteristic of the psychophysical A/V integrative phenomenon, knowing what kind of temporal window is observed in auditory cortex responses with a latency of about 100 ms would help clarify the detailed mechanism of A/V coupling.

The present study examined the effects of asynchrony between audio and visual stimuli on the early auditory evoked fields (AEFs) using whole-head MEG [2736], focusing on the visual effects on the N100m originating from the left auditory cortex, which lies in the hemisphere dominant for A/V coupling [8142237].

The N100m in response to the monosyllabic sound /be/ presented with a moving image of a speaker uttering /ge/ (visual stimuli that cause the McGurk effect) was assessed under 5 different A/V offset conditions (audio lag of -500 ms, -100 ms, 0 ms [synchronous, no delay], +100 ms, and +500 ms), as well as a control condition (audio stimuli with visual noise). The effects of A/V asynchrony on the N100m responses recorded from the left hemisphere were then compared with those on the psychophysical responses recorded during the N100m measurement.

Media Contacts:
Calli McMurray – SfN

Original Research: Closed access
“Lip-Reading Enables the Brain to Synthesize Auditory Features of Unknown Silent Speech”. Mathieu Bourguignon, Martijn Baart, Efthymia C. Kapnoula and Nicola Molinaro.
Journal of Neuroscience doi:10.1523/JNEUROSCI.1101-19.2019.

