The human brain can recognise a familiar tune within 100 to 300 milliseconds of the onset of sound


The human brain can recognise a familiar song within 100 to 300 milliseconds, highlighting the deep hold favourite tunes have on our memory, a UCL study finds.

Anecdotally, the ability to recall popular songs is exemplified in game shows such as ‘Name That Tune’, where contestants can often identify a piece of music in just a few seconds.

For this study, published in Scientific Reports, researchers at the UCL Ear Institute wanted to find out exactly how fast the brain responded to familiar music, as well as the temporal profile of processes in the brain which allow for this.

The main participant group consisted of five men and five women, each of whom had provided five songs that were very familiar to them.

For each participant, researchers then chose one of the familiar songs and matched it to a tune that was similar (in tempo, melody, harmony, vocals and instrumentation) but known to be unfamiliar to the participant.

Participants then passively listened to 100 snippets (each less than a second) of both the familiar and unfamiliar song, presented in random order.

Participants listened to around 400 seconds of audio in total. Researchers used electroencephalography (EEG), which records electrical activity in the brain, and pupillometry (a technique that measures pupil diameter, considered a measure of arousal).

The study found the human brain recognised ‘familiar’ tunes from 100 milliseconds (0.1 of a second) of sound onset, with the average recognition time between 100ms and 300ms. This was first revealed by rapid pupil dilation, likely linked to increased arousal associated with the familiar sound, followed by cortical activation related to memory retrieval.

No such differences were found in a control group, comprising international students who were unfamiliar with both the ‘familiar’ and ‘unfamiliar’ songs.

Senior author Professor Maria Chait (UCL Ear Institute) said: “Our results demonstrate that recognition of familiar music happens remarkably quickly.

“These findings point to very fast temporal circuitry and are consistent with the deep hold that highly familiar pieces of music have on our memory.”

Professor Chait added: “Beyond basic science, understanding how the brain recognises familiar tunes is useful for various music-based therapeutic interventions.

“For instance, there is a growing interest in exploiting music to break through to dementia patients for whom memory of music appears well preserved despite an otherwise systemic failure of memory systems.”

“Pinpointing the neural pathway and processes which support music identification may provide a clue to understanding the basis of this phenomenon.”

Study limitations

‘Familiarity’ is a multifaceted concept. In this study, songs were explicitly selected to evoke positive feelings and memories.

Therefore, for the ‘main’ group the ‘familiar’ and ‘unfamiliar’ songs did not just differ in terms of recognisability but also in terms of emotional engagement and affect.

While the songs are referred to as ‘familiar’ and ‘unfamiliar’, the effects observed may also be linked with these other factors.

While care was taken in the song-matching process, this was ultimately done by hand, owing to the lack of appropriate automated tools.

Advancements in automatic processing of music may improve matching in the future.

Another limitation is the fact that only one ‘familiar’ song was used per subject. This likely limited the demands on the memory processes studied.

Anecdotally, we as human listeners seem remarkably adept at recognizing sound sources: the sound of a voice, approaching footsteps, or musical instruments in each of our cultures.

There is now quantitative behavioral evidence supporting this idea (for a review, see Agus et al., in press1). However, the underlying neural mechanisms for such an impressive feat remain unclear.

One way to constrain the range of possible mechanisms is to measure the temporal characteristics of sound source recognition.

Using a straightforward operational definition of recognition as a correct response to a target sound defined by its category (e.g., a voice among musical instruments), Agus et al.2 have shown that reaction times for recognition were remarkably short, with an overhead compared to simple detection of between 145 ms and 250 ms depending on target type.

When natural sounds were artificially shortened by applying an amplitude “gate” of variable duration, recognition remained above chance even for durations in the milliseconds range3,5.
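The “gating” manipulation described above can be sketched as truncating a waveform to the gate duration and applying brief raised-cosine on/off ramps to avoid clicks. A minimal illustration in NumPy; the function name, ramp length, and window shape are assumptions for demonstration, not the studies' exact procedure:

```python
import numpy as np

def gate_sound(signal, sr, gate_ms, ramp_ms=5.0):
    """Truncate a sound to gate_ms and apply raised-cosine on/off ramps.

    Illustrative sketch of an amplitude 'gate'; the exact ramp duration
    and window shape used in the original studies may differ.
    """
    n = int(sr * gate_ms / 1000.0)
    gated = signal[:n].astype(float)
    r = min(int(sr * ramp_ms / 1000.0), n // 2)  # ramps <= half the gate
    if r > 0:
        ramp = 0.5 * (1 - np.cos(np.pi * np.arange(r) / r))  # 0 -> 1
        gated[:r] *= ramp          # fade in
        gated[-r:] *= ramp[::-1]   # fade out
    return gated

# Example: gate a 1 s, 440 Hz tone down to a 32 ms snippet
sr = 44100
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
snippet = gate_sound(tone, sr, gate_ms=32)
```

Applied repeatedly, such short gated snippets can then be concatenated into the rapid sequences used in the paradigms below.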

However, none of these results speak directly to the processing time required for sound recognition. For reaction times, the comparison of recognition and simple detection times cannot be unequivocally used to estimate processing time6. For gating, recognizing a very short sound presented in isolation could still require a very long processing time: the short sound duration only constrains the type of acoustic features that are used7.

Similar questions about the processing time required for visual recognition of natural objects have been asked8,9. They have typically been addressed with the now-classic “rapid serial visual presentation” (RSVP) task10–13 (for a review, see Keysers et al.13).

Briefly, in RSVP, images are flashed in a rapid sequence, with images from one target category presented among many distractors belonging to other categories. Participants are asked to report the sequences containing an image from the target category.

The fastest presentation rate for which target recognition remains accurate is taken as a measure of processing time. The core hypothesis is that, for a target to be accurately recognized, it needs to have been sufficiently processed before the next distractor is flashed13.

Using a new auditory paradigm inspired by RSVP, the Rapid Audio Sequential Presentation paradigm (RASP; see Suied et al.14 for pilot data), the current study addresses the processing time of natural sounds like voices and instruments.

Natural sounds were presented in rapid succession and participants had to report sequences containing a sound from a target category (e.g., a voice) among several distractor sounds from other categories (e.g., musical instruments). Unlike for vision, in audition sounds cannot be flashed instantaneously.

Fortunately, gating studies have shown that short sounds were still recognizable5, so we used short gated sounds to create sufficiently rapid sequences. The experimental measure used is the fastest presentation rate for which the task could be performed above chance. This measure was taken as an estimate for the processing time needed for recognition.
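The criterion “performed above chance” can be made concrete with an exact binomial test on the hit counts. A minimal sketch; the function name and significance criterion here are illustrative assumptions, not the study's actual analysis:

```python
from math import comb

def above_chance(hits, trials, p_chance=0.5, alpha=0.05):
    # One-sided exact binomial test: probability of observing at least
    # `hits` correct responses out of `trials` if performance were at
    # chance level `p_chance`. Illustrative only; the actual study may
    # apply a different statistical criterion.
    p_value = sum(comb(trials, k) * p_chance**k * (1 - p_chance)**(trials - k)
                  for k in range(hits, trials + 1))
    return p_value < alpha

# e.g. 35 correct out of 50 yes/no trials is reliably above chance
print(above_chance(35, 50))  # True
```

Sweeping this check across presentation rates, from slow to fast, identifies the fastest rate at which performance still clears the criterion.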

Temporal processing is a vast field of inquiry for audition, with time constants ranging from microseconds for sound localization15 up to seconds for acoustic memory16, and including variable windows in the tens of milliseconds for, e.g., pitch17,18 or speech perception19.

However, very few previous studies have aimed to characterize processing times with methods and intent similar to RASP. In seminal studies using backward masking, which can be viewed as a RASP paradigm with a sequence of a single target and a single distractor, the duration of a so-called “echoic memory” was evaluated (reviewed by e.g. Massaro20; Cowan21).

When a brief target was followed by a second masking sound, recognition of the target sound reached a plateau at an inter-onset interval of about 250 ms. This duration was hypothesized to correspond to the duration of a memory store used for auditory recognition20.

In another series of experiments on sequence processing, Warren and colleagues measured the fastest sequences for which processing was local, i.e. for each individual sound, rather than global at the sequence level, as indexed for instance by order discrimination tasks22–24. Overall, they advocated for a global mode of processing when sequence items were shorter than about 40 ms.

Finally, in some of our previous work, the RASP paradigm was introduced14. Sequences of short distractor sounds were presented with, in half of the trials, a short target sound at a random position in the sequence. Performance decreased with increasing presentation rate, but recognition was still possible at presentation rates of up to 30 sounds per second.

This suggests a lower-bound limit for the processing time needed for recognition as short as 30 ms. This seems remarkable because, even with the high temporal resolution of electrophysiological recordings, a minimum of 70 ms after stimulus onset had been needed to observe differences in event-related potentials (ERPs) between categories of sounds25,26.
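The arithmetic behind this lower bound is simply the inter-onset interval implied by the presentation rate, assuming sounds follow one another back to back (a simplification of the actual stimulus construction):

```python
def inter_onset_ms(rate_per_second):
    # Time available to process each sound before the next one starts,
    # assuming back-to-back presentation at the given rate.
    return 1000.0 / rate_per_second

# At 30 sounds per second, roughly 33 ms are available per sound,
# consistent with the ~30 ms lower bound quoted above.
print(round(inter_onset_ms(30), 1))  # 33.3
```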

However, this first use of a RASP paradigm was more a proof of concept than an extensive mapping of recognition processing time. In the present study, the RASP paradigm was used to further investigate the lower-bound limit of natural sound recognition put forward by Suied et al.14 and test for a variety of potential interpretations.

We used a sound corpus that, on the one hand, presented large acoustical variability but, on the other hand, was globally matched as much as possible across the target and distractor categories. As in Agus et al.2 and Suied et al.5, we used sung vowels and musical instruments as stimuli.

All were musical sounds so they could be drawn from the same pitch range, making pitch an irrelevant cue for recognition and forcing participants to use timbre cues (all stimuli were presented with the same duration and loudness within an experimental block)27.

We used here an even larger sound corpus than in previous studies, with four different sources for the voice category (two male and two female singers), each singing four different vowels, and four different musical instruments (piano, saxophone, bassoon, clarinet).

The large sound corpus was intended to prevent participants from identifying an artificial acoustic cue that would reliably indicate a given target category, which would not be representative of natural sound recognition outside the laboratory.

Participants were first tested in a gating experiment, measuring their ability to recognize short vowel and instrument sounds presented in isolation and without time limit to respond.

This participant selection experiment was intended to exclude participants who would not produce any meaningful result on the main experiments, where sequences of such short sounds were presented.

Four different RASP experiments were run to assess the fastest presentation rate for which target recognition was still possible. In each of these experiments, a symmetric design was used: voice targets to be reported in a stream of musical instruments as distractors and instrument targets to be reported in a stream of voice distractors.

In a first set of three experiments, we tested the effects on performance of single-sound duration, the number of sounds in a sequence, the relative pitch between targets and distractors, and the target position in the sequence. In a fourth experiment, reaction times (RTs) were also measured. Overall, fast presentation rates were found for recognition in all conditions, with an advantage for vocal targets.


Original Research: Open access
“Rapid Brain Responses to Familiar vs. Unfamiliar Music – an EEG and Pupillometry study”. Robert Jagiello, Ulrich Pomper, Makoto Yoneya, Sijia Zhao & Maria Chait.
Scientific Reports doi:10.1038/s41598-019-51759-9.

