Inner speech – the experience of silent, verbal thinking – has been implicated in many cognitive functions, including problem-solving, creativity and self-regulation (Morin, 2009; Fernyhough, 2013; Alderson-Day and Fernyhough, 2015a), and disruptions to the ‘internal monologue’ have been linked to varieties of pathology, including hallucinations and depression (Frith, 1992; Nolen-Hoeksema, 2004).
Enhanced understanding of inner speech hence has implications for understanding of both typical and atypical cognition.
Although interest in inner speech has grown in recent years (Morin et al., 2011; Williams et al., 2012; Fernyhough, 2013), conceptual and methodological challenges have limited what is known about the neural processes underpinning this common experience.
Most neuroimaging studies to date have operationalized inner speech as a unitary phenomenon equivalent to a first-person monologue (Hinke et al., 1993; Simons et al., 2010).
Methods of eliciting inner speech have typically involved either subvocal recitation (e.g. covertly repeating ‘You are a x’ in response to a cue; McGuire et al., 1995) or prompting participants to make phonological judgements about words using inner speech (such as which syllable to stress in pronunciation; Aleman et al., 2005).
Such studies have shown recruitment during inner speech of areas associated with overt speech production and comprehension, such as left inferior frontal gyrus (IFG), supplementary motor area (SMA) and the superior and middle temporal gyri (McGuire et al., 1996; Shergill et al., 2002; Aleman et al., 2005).
However, inner speech is a complex and varied phenomenon. In behavioural studies, everyday inner speech is often reported to be involved in self-awareness, past and future thinking and emotional reflection (D’Argembeau et al., 2011; Morin et al., 2011), while in cognitive research, inner speech appears to fulfill a variety of mnemonic and regulatory functions (e.g. Emerson and Miyake, 2003; see Alderson-Day and Fernyhough, 2015a, for a review). Vygotsky (1987) posited that inner speech reflects the endpoint of a developmental process in which social dialogues, mediated by language, are internalized as verbal thought.
Following from this view, the subjective experience of inner speech will mirror the external experience of communication and often have a dialogic structure (Fernyhough, 1996, 2004), involving the co-articulation of differing perspectives on reality and, in some cases, representation of others’ voices.
Evidence for the validity of these distinctions is provided by findings from a self-report instrument, the varieties of inner speech questionnaire (VISQ: McCarthy-Jones and Fernyhough, 2011).
Studies with student samples have documented high rates of endorsement (>75%) for inner speech involving dialogue rather than monologue, alongside a number of other phenomenological variations (Alderson-Day et al., 2014; Alderson-Day and Fernyhough, 2015b).
Recognizing this complexity of inner speech, particularly its conversational and social features, is important both for ecological validity (Fernyhough, 2013) and for understanding atypical cognition (Fernyhough, 2004). Auditory verbal hallucinations (AVH) have been proposed to reflect misattributed instances of inner speech (Bentall, 1990; Frith, 1992), but studies inspired by this view have arguably relied on a relatively impoverished, ‘monologic’ view of inner speech.
In the context of a growing recognition of social and conversational dimensions of AVH (Bell, 2013; Ford et al., 2014), knowing more about the heterogeneity of inner speech could enhance AVH models (Jones and Fernyhough, 2007).
Almost no data exist on the neural basis of dialogic or conversational inner speech, and what there is has largely focused on imagining words or sentences spoken in other voices (often referred to as ‘auditory verbal imagery’).
For example, Shergill et al. (2001) asked participants either to silently rehearse sentences of the form ‘I like x…’ in their own voice (inner speech) or to imagine sentences spoken in another voice in the second or third person (auditory verbal imagery).
While sentence repetition was associated with activation of left IFG, superior temporal gyrus (STG), insula and the SMA, imagined speech in another person’s voice recruited a bilateral frontotemporal network, including right IFG, left pre-central gyrus and right STG.
Similarly, in an AVH study by Linden et al. (2011), auditory imagery for familiar voices, such as conversations with family members, was associated with bilateral activation in IFG, superior temporal sulcus (STS), SMA and anterior cingulate cortex in healthy participants.
Research on overt conversational processing has also implicated a bilateral network including right frontal and temporal homologues of left-sided language regions.
For example, Caplan and Dapretto (2001) compared judgements for logical and contextual violations of conversations in a functional magnetic resonance imaging (fMRI) task.
Whereas logic judgements were associated with a left-sided Broca–Wernicke network, judgements about pragmatic context recruited right inferior frontal and middle temporal gyri, along with right prefrontal cortex (PFC).
The involvement of right frontotemporal regions in pragmatic language processing is supported by evidence of selective impairments in prosody, humour and figurative language in cases of right-hemisphere damage (Mildner, 2007).
Finally, two recent studies by Yao et al. (2011; 2012) have indicated a specific role for right auditory cortex in the internal representation of other voices. In a study of silent reading, Yao et al. (2011) examined activation of left and right auditory cortex when participants read examples of direct and indirect speech (e.g. ‘The man said “I like cricket”’ vs ‘The man said that he likes cricket’).
Reading of direct speech was specifically associated with activation in middle and posterior right STS compared with indirect speech. The same areas were also active in a second study (Yao et al., 2012) when participants listened to examples of direct speech read in a monotonous voice, but not when they listened to indirect speech. Yao et al. argued that the activation of these regions during silent reading and listening to monotonous direct speech might reflect an internal simulation of the suprasegmental features of speech, such as tone and prosody.
Taken together, these findings suggest that dialogic forms of inner speech are likely to draw on a range of regions beyond a typical left-sided perisylvian language network, including the right IFG, right middle temporal gyrus (MTG) and the right STG/STS.
Following Shergill et al. (2001) and, to a lesser degree, Yao et al. (2011), it could be hypothesized that the involvement of these regions is required for the simulation of other people’s voices to complement one’s own inner speech.
On such a view, dialogic inner speech could be conceptualized simply as monologic inner speech plus the phonological representation of other voices, leading to recruitment of voice-selective regions of right temporal cortex.
However, generating an internal conversation requires more than simply mimicking the auditory qualities of the voices involved. First, dialogic inner speech could draw on theory-of-mind (ToM) capacities, requiring not just the representation of a voice but also the sense and intention of a plausible and realistic interlocutor.
If dialogic inner speech utilized such processes, then it should be possible to identify recruitment of typical ToM regions, including medial PFC (mPFC), posterior cingulate/precuneus and the temporoparietal junction (TPJ) area, encompassing posterior STG, angular gyrus and inferior parietal lobule (Spreng et al., 2009). Right TPJ has been associated with ToM in a number of fMRI and positron emission tomography (PET) studies, mostly based on false-belief tasks (Saxe and Powell, 2006), while left TPJ has been linked to mental state representation (Saxe and Kanwisher, 2003) and understanding of communicative intentions (Ciaramidaro et al., 2007).
A view of dialogic inner speech as drawing on ToM capacities would suggest that it should be associated with established ToM networks and posterior temporoparietal cortex, in addition to frontotemporal regions associated with voice representation.
A second key difference between dialogue and monologue concerns their structure and complexity. Generating an internal dialogue involves representational demands that are absent from sentence repetition or subvocal rehearsal.
Whereas, in monologue, a single speaker’s voice or perspective is sufficient, in dialogue more than one perspective must be generated, maintained and adopted on an alternating basis (Fernyhough, 2009). Internally simulating a conversation could also involve imagination of setting, spatial position and other details that distinguish interlocutors.
Therefore, any differences observed between dialogic and monologic inner speech may reflect not so much the representation of other voices or agents as the requirement to generate and flexibly switch between conversational positions and situations ‘in the mind’s eye’.
If dialogic inner speech depended on such skills, it might be expected to recruit areas more typically associated with the generation and control of mental imagery, such as middle frontal gyrus (MFG), precuneus and superior parietal cortex (Zacks, 2007; McNorgan, 2012).
There are therefore reasons to believe that the production of dialogic inner speech will differ from monologic examples of the same process in three ways: recruitment of regions involved in representing other voices, involvement of ToM resources to represent other agents and the activation of brain networks involved in the generation and control of mental imagery.
To test these predictions, we employed a new fMRI paradigm for eliciting monologic (i.e. verbal thinking from a single perspective) and dialogic inner speech, so that the neural correlates of the two can be compared.
To investigate the cognitive processes involved in dialogic inner speech, we used a conjunction analysis (Price and Friston, 1997) to compare dialogue-specific activation with two other tasks: a ToM task (Walter et al., 2004) and a novel perspective-switching task. The ToM task was chosen because it included non-verbal scenarios requiring inferences about communication and the representation of other agents’ intentions; in this way, any conjunction between dialogue and ToM should not reflect overlaps in the processing of verbalized language.
The perspective-switching task was developed to match the switching and imagery-generation demands of the dialogic task, while avoiding the inclusion of social agents, which feature in many existing perspective-switching tasks.
Conjunctions observed between the perspective-switching and dialogic tasks should therefore reflect similarities in structure and task demands, rather than representations of agents and mental states tapped in the ToM task.
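The logic of a conjunction between two contrasts can be illustrated with a minimal sketch. It assumes a minimum-statistic approach, in which a voxel counts as jointly active only if it exceeds the threshold in both maps; this is one common way of computing a conjunction and is not necessarily the procedure used in the study. The function name, the threshold and the t-values below are hypothetical and serve only as illustration.

```python
def conjunction_mask(stat_a, stat_b, threshold):
    """Flag voxels whose statistic exceeds the threshold in BOTH maps.

    This is equivalent to thresholding the voxel-wise minimum of the two maps.
    """
    if len(stat_a) != len(stat_b):
        raise ValueError("both maps must contain the same number of voxels")
    return [min(a, b) > threshold for a, b in zip(stat_a, stat_b)]


# Hypothetical t-values for five voxels (not data from the study).
dialogic_vs_monologic = [4.2, 1.0, 3.5, 0.2, 5.1]
tom_vs_control = [3.9, 4.5, 0.8, 0.1, 3.3]

mask = conjunction_mask(dialogic_vs_monologic, tom_vs_control, threshold=3.1)
print(mask)
```

Only the first and last voxels survive here, because they are the only ones above threshold in both contrasts; a voxel that is strongly active in one contrast alone does not contribute to the conjunction.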
We predicted that (i) dialogic inner speech – in contrast to a monologic control condition – would activate not only right-hemisphere language homologue regions such as right IFG, MTG and STG but also areas typically associated with ToM processing, such as the TPJ and (ii) any further differences between dialogic and monologic scenarios would overlap with networks associated with perspective switching and mental imagery, such as the MFG or the superior parietal lobule.
Inner speech as motor imagery of speech
Speech production is a complex motor action, involving the fine-grained coordination of more than 100 muscles in the upper part of the body. In adult humans, its covert counterpart (referred to as inner speech or verbal imagery) has developed to support a myriad of different functions.
Just as visual imagery allows one to mentally examine visual scenes, verbal imagery can be used as an internal tool that allows one, amongst other things, to rehearse or prepare past or future conversations [11, 14]. Because speech production results from sequences of motor commands that are assembled to reach a given goal, it belongs to the broader category of motor actions.
Therefore, a parallel can be drawn between verbal imagery and other forms of motor imagery (e.g., imagined walking or imagined writing). Accordingly, studies on the nature of inner speech might benefit from insights gained from the study of motor imagery and the field of motor cognition [34, 35].
Motor imagery can be defined as the mental process by which one rehearses a given action, without engaging in the physical movements involved in this particular action. One of the most influential theoretical accounts of this phenomenon is the motor simulation theory (MST) [34, 36, 37].
In this framework, the concept of simulation refers to the “offline rehearsal of neural networks involved in specific operations such as perceiving or acting”. The MST shares some similarities with the theories of embodied and grounded cognition in that both account for motor imagery by appealing to a simulation mechanism.
However, the concept of simulation in grounded theories is assumed to operate in order to acquire specific conceptual knowledge, which is not the concern of the MST. In other words, we should make a distinction between embodiment of content, which concerns the semantic content of language, and embodiment of form, which concerns “the vehicle of thought”, that is, proper verbal production.
A second class of explanatory models of motor imagery is concerned with the phenomenon of emulation and with internal models. Internal model theories share the postulate that action control uses internal models, that is, systems that simulate the behaviour of the motor apparatus [42, 43].
The function of internal models is to estimate and anticipate the outcome of a motor command. Among the internal model theories, motor control models based on robotic principles [44, 45] assume two kinds of internal models (that are supposed to be coupled and regulated): a forward model (or simulator) that predicts the sensory consequences of motor commands from efference copies of the issued motor commands, and an inverse model (or controller) that calculates the feedforward motor commands from the desired sensory states [17, 41].
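The coupling between the two kinds of internal model can be sketched with a deliberately simple toy system. The sketch assumes a one-dimensional linear "motor apparatus" in which the sensory outcome is the motor command multiplied by a fixed gain; the class name, the gain value and the linearity are illustrative assumptions and are not taken from the cited motor control models.

```python
from dataclasses import dataclass


@dataclass
class LinearInternalModels:
    """Toy internal models for a plant where outcome = gain * command."""

    gain: float  # hypothetical gain of the motor apparatus

    def inverse_model(self, desired_state: float) -> float:
        """Controller: compute the motor command needed to reach the desired sensory state."""
        return desired_state / self.gain

    def forward_model(self, efference_copy: float) -> float:
        """Simulator: predict the sensory consequence from a copy of the motor command."""
        return self.gain * efference_copy


models = LinearInternalModels(gain=2.0)

# The inverse model turns a sensory goal into a feedforward motor command.
command = models.inverse_model(desired_state=1.0)

# The forward model receives the efference copy and predicts what will be sensed.
predicted_state = models.forward_model(efference_copy=command)

print(command, predicted_state)
```

In this toy case the prediction matches the goal exactly because the forward and inverse models share the same gain; a mismatch between the two would produce a prediction error that could be used to regulate the models.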
Emulation theories [46, 47] borrow from both simulation theories and internal model theories and provide operational details of the simulation mechanism. In one such emulation model, the emulator is a device that implements the same input-output function as the body (i.e., the musculoskeletal system and relevant sensory systems).
When the emulator receives a copy of the control signal (which is also sent to the body), it produces an output signal (the emulator feedback), identical or similar to the feedback signal produced by the body, yielding mock sensory percepts (e.g., visual, auditory, kinaesthetic) during motor imagery.
Building upon models of speech motor control [45, 48], a recent model describes wilful (voluntary) expanded inner speech as “multimodal acts with multisensory percepts stemming from coarse multisensory goals”.
In other words, in this model the auditory and kinaesthetic sensations perceived during inner speech are assumed to be the predicted sensory consequences of simulated speech motor acts, emulated by internal forward models that use the efference copies of motor commands issued from an inverse model.
In this framework, the peripheral muscular activity recorded during inner speech production is assumed to be the result of partially inhibited motor commands. It should be noted that simulation, emulation, and motor control frameworks can all be grouped under the motor simulation view and together predict that the motor system should be involved to some extent during motor imagery, and by extension, during inner speech production. We now turn to a discussion of findings related to peripheral muscular activity during motor imagery and inner speech.
Electromyographic correlates of covert actions
Across both simulationist and emulationist frameworks, motor imagery has consistently been defined as the mental rehearsal of a motor action without any overt movement. One consequence of this claim is that, in order to prevent execution, the neural commands for muscular contractions should be blocked at some level of the motor system by active inhibitory mechanisms.
Despite these inhibitory mechanisms, there is abundant evidence for peripheral muscular activation during motor imagery [49–51]. Incomplete inhibition of the motor commands has been suggested as an explanation for the peripheral muscular activity observed during motor imagery.
This idea has been corroborated by studies of changes in the excitability of the motor pathways during motor imagery tasks. For instance, one study measured spinal reflexes while participants were instructed either to press a pedal with the foot or to simulate the same action mentally.
The authors observed that both H-reflexes and T-reflexes increased during motor imagery, and that these increases correlated with the force of the simulated pressure. Moreover, the pattern of results observed during motor imagery was similar (albeit weaker in amplitude) to that observed during execution, supporting the motor simulation view of motor imagery.
Using transcranial magnetic stimulation, several investigators observed muscle-specific increases of motor evoked potentials during various motor imagery tasks, whereas no such increase could be observed in antagonist muscles [54, 55].
When considered as a form of motor imagery, inner speech production is also expected to be accompanied by peripheral muscular activity in the speech muscles. This idea is supported by many studies showing peripheral muscular activation during inner speech production [10, 30, 31, 56–58], during auditory verbal hallucinations in patients with schizophrenia, or during induced mental rumination.
Some authors also recently demonstrated that it is possible to discriminate inner speech content based on surface electromyography (EMG) signals with a median accuracy of 92%. However, other teams failed to obtain such results.
Many of these EMG studies inferred the involvement of the speech motor system from a difference in EMG amplitude between a period of inner speech production and a period of rest.
However, as has been highlighted previously, showing an increase of speech muscle activity during inner speech is usually not enough to conclude that this activation is related to inner speech production.
Indeed, three sorts of inference can be made from studies of the electromyographic correlates of inner speech production, depending on the stringency of the control procedure. The strongest sort of inference is permitted by highlighting a discriminative pattern during covert speech production, for instance by demonstrating a dissociation between different speech muscles during the production of speech sounds of different phonemic classes (e.g., contrasting labial versus non-labial words).
Other (weaker) types of control procedure include i) comparing the EMG activity during covert speech production to a baseline period (without contrasting phonemic classes in covert speech utterances), or ii) comparing the activity of speech-related and non-speech-related (e.g., forearm) muscles.
Ideally, these controls can be combined by recording and contrasting speech-related and non-speech-related muscles in different conditions (e.g., rest, covert speech, overt speech) during the pronunciation of different classes of speech sounds (e.g., labial versus non-labial).
Previous studies that used this preferred procedure suggest discriminative patterns of electromyographic correlates according to the phonemic class of the words being covertly uttered [30, 31], which would corroborate the motor simulation view of inner speech.
However, these studies used limited sample sizes (often fewer than ten participants) and worked mostly with children. These factors limit the generalisability of the above findings because i) low-powered experiments provide biased estimates of effects, ii) following the natural internalisation process, the muscular correlates of inner speech are expected to weaken with age and iii) a higher sensitivity could be attained by using modern sensors and signal processing methods.
The present study intends to bring new information to the debate between the motor simulation view and the abstraction view of inner speech, by focusing on an expanded form of inner speech: wilful nonword covert production.
This work can be seen as a replication and extension of previous works carried out by McGuigan and collaborators [30, 31]. We aimed to demonstrate similar dissociations by using surface electromyography recorded over the lip (orbicularis oris inferior, OOI) and the zygomaticus major (ZYG) muscles.
More precisely, given that rounded phonemes (such as /u/) are articulated with orbicular labial contraction, whereas spread phonemes (such as /i/) are produced with zygomaticus contraction, if the motor simulation view is correct, we should observe a higher average EMG amplitude recorded over the OOI during both the overt and inner production of rounded nonwords in comparison to spread nonwords.
Conversely, we would expect a lower average EMG amplitude recorded over the ZYG during both the inner and overt production of rounded nonwords in comparison to spread nonwords. In addition, we would not expect to observe content-specific differences in EMG amplitude concerning the non-speech-related muscles (i.e., forehead and forearm muscles).
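The predicted dissociation can be written down as a simple check on per-condition amplitudes. The sketch below assumes that amplitude is summarised as the root mean square (RMS) of each recording, which is a common choice but is not stated in the text; the function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of EMG samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def shows_predicted_dissociation(ooi_rounded, ooi_spread, zyg_rounded, zyg_spread):
    """True if OOI is higher for rounded nonwords and ZYG is lower for rounded nonwords."""
    ooi_higher_for_rounded = rms(ooi_rounded) > rms(ooi_spread)
    zyg_lower_for_rounded = rms(zyg_rounded) < rms(zyg_spread)
    return ooi_higher_for_rounded and zyg_lower_for_rounded


# Synthetic samples in arbitrary units, chosen only to illustrate the pattern.
ooi_rounded = [2.0, -2.0, 2.0, -2.0]
ooi_spread = [1.0, -1.0, 1.0, -1.0]
zyg_rounded = [0.5, -0.5, 0.5, -0.5]
zyg_spread = [1.5, -1.5, 1.5, -1.5]

print(shows_predicted_dissociation(ooi_rounded, ooi_spread, zyg_rounded, zyg_spread))
```

A real analysis would compare distributions of amplitudes across trials and participants with an appropriate statistical model rather than a single boolean comparison; the sketch only makes the direction of the two predictions explicit.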
reference link : https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0233282
Development of Inner Speech
Studying the development of inner speech can give us important information about its phenomenological qualities and psychological functions.
Researching inner speech in childhood presents specific methodological challenges, including participants’ compliance with dual-task demands (e.g., articulatory suppression), limitations on the richness of child participants’ experience sampling reports, and age-related restrictions on neuroimaging.
Private Speech as a Precursor of Inner Speech
The methodological challenges that attend the study of inner speech have led to a focus on its observable developmental precursor, private speech, as a window onto its development.
Key questions that have been examined include the emergence and apparent extinction of private speech, the social context within which self-directed speech is observed, and the role of verbal mediation in supporting specific activities.
Much of the prior literature on private speech was outlined in an extensive review by Winsler (2009); accordingly, this section provides a brief overview of private speech findings in children, with reference to some more recent studies.
As noted above, private speech is an almost universal feature of young children’s development. It was first described in the 1920s by Piaget, who considered it evidence of young children’s inability to adapt their communications to a listener (hence his term egocentric speech). Private speech has subsequently been shown to have a significant functional role in the self-regulation of cognition and behavior.
Typically emerging with the development of expressive language skills around age 2–3, private speech frequently takes the form of an accompaniment to or commentary on an ongoing activity.
A regular occurrence between the ages of 3 and 8, private speech appears to follow a trajectory from overt task-irrelevant speech, to overt task-relevant speech (e.g., self-guiding comments spoken out loud), to external manifestations of inner speech (e.g., whispering, inaudible muttering; Berk, 1986; Winsler, Diaz, & Montero, 1997).
In line with Vygotsky’s theory, the occurrence of self-regulatory private speech is associated in some studies with task performance and task difficulty (e.g., Fernyhough & Fradley, 2005), and demonstrates some of the structural changes, such as abbreviation, hypothesized to attend internalization (Goudena, 1992).
There is evidence to support Vygotsky’s claim that self-regulatory speech “goes underground” in middle childhood to form inner speech, with private speech peaking in incidence around age 5 (Kohlberg, Yaeger, & Hjertholm, 1968) and then declining in parallel with a growth in inner speech use (Damianova, Lucas, & Sullivan, 2012) as defined by Fernyhough and Fradley’s (2005) criteria.
However, there is also evidence for continuing high levels of private speech use well into the elementary school years (Berk & Garvin, 1984; Berk & Potts, 1991) and indeed into adulthood (Duncan & Cheyne, 2001; Duncan & Tarulli, 2009).
Examples of continued use of private speech, however, do not necessarily indicate similar functions or benefits for performance: comparing verbal strategy use on cognitive tasks in children aged 5–17, Winsler and Naglieri (2003) showed that 5-year-olds but not older children performed better on tasks when they used more overt speech, even though private speech persisted well beyond this age.
Despite its proposed origins in social interaction (Furrow, 1992; Goudena, 1987), social influences on private speech have not been studied extensively in recent years. In one recent exception, McGonigle-Chalmers, Slater, and Smith (2014) studied the extent to which private speech use is moderated by the presence of another person in the room when 3- to 4-year-old children attempted a novel sorting task.
Out-loud commentaries – which typically narrated or explained what was happening during the task – were significantly more prevalent when another person was in the room, suggesting a social, declarative function of private speech.
Ratings were also made of incomplete or mumbled speech commentaries, which were suggestive of inner speech being used during the task, but notably these did not change significantly with the presence or absence of another person.
Thus, the production of overt private speech may be socially sensitive while inner speech or more covert processes retain a necessarily private and self-directed role.
These findings are in line with Vygotsky’s original observations that private speech depends on children’s understanding that they are in the presence of an interlocutor who can understand them, and are consistent with his view that private speech emerges through a differentiation of the social regulatory function of social speech, with speech that was previously used to regulate the behavior of others gradually becoming directed back at the self.
They are also congruent with Piaget’s (1959) interpretation of private speech as representing a failed attempt to communicate, and with Kohlberg, Yaeger, and Hjertholm’s (1968) characterization of private speech as a “parasocial” phenomenon.
The social relevance of private speech is also supported by recent research on imaginary companions in childhood. Davis, Meins, and Fernyhough (2013) studied private speech during free play and imaginary companion (IC) status in a large sample of 5-year-olds (n = 148).
Children with an IC used significantly more covert private speech during free play than those without an IC, a relation that was evident even when controlling for effects of socioeconomic status, receptive language skill, and total number of utterances.
Although a causal direction cannot be specified, these findings suggest that individual differences in creative and imaginative capacities are important to consider in gauging the developmental role of private speech.
Thus, while Vygotsky’s model of the developmental significance of self-directed speech has been well supported by empirical research, private speech may have functions that go beyond self-regulation of cognition and behavior.
Private speech appears to have a role in emotional expression and regulation (Atencio & Montero, 2009; Day & Smith, 2013), planning for communicative interaction (San Martin, Montero, Navarro, & Biglia, 2014), theory of mind (Fernyhough & Meins, 2009), self-discrimination (Fernyhough & Russell, 1997), fantasy (Olszewski, 1987), and creativity (White & Daugherty, 2009).
Engaging in private speech has also recently been proposed to have a role in the mediation of children’s autobiographical memory (Al-Namlah, Meins, & Fernyhough, 2012). It seems likely that private speech is a multifunctional phenomenon; comparisons with the functionality of its putative counterpart, inner speech, are considered below.
The Cognitive Functions of Inner Speech in Childhood
Children’s adoption of inner speech is evidenced relatively early in development in the apparent emergence of the phonological similarity effect around age 7 (Gathercole, 1998).
The effect is typically evidenced when visually presented items that are phonologically similar prove harder to recall than phonologically dissimilar items, due to interference between item words that sound the same.
When children are asked to learn a set of pictures, those aged 7 and over tend to exhibit a phonological similarity effect, suggesting that visual material is being recoded into a verbal form via subvocal rehearsal (i.e., inner speech).
Children younger than 7, in contrast, tend not to demonstrate this effect, suggesting an absence of verbal rehearsal strategies (Henry, Messer, Luger-Klein, & Crane, 2012).
This conclusion has recently been questioned by Jarrold and Citroen (2013) who argue that the apparent emergence of the phonological similarity effect at age 7 does not necessarily reflect a qualitative change in strategy.
In a study of 5- to 9-year-old children, they tested recall for verbally versus visually presented items, while also varying the mode of response (verbal or visual reporting), to examine whether verbal recoding of visually presented items specifically showed a change with age.
While visual encoding plus verbal reporting demonstrated the most prominent phonological similarity effect, interactions between age and similarity were evident in each condition; that is, even when verbally recoded rehearsal was not specifically required.
In addition, a simulation model indicated that the lack of an effect in younger children could be explained by floor effects in recall for other, dissimilar items to be remembered.
Thus, evidence of phonological similarity effects may emerge around age 7 not because of an adoption of rehearsal strategies at this time, but as a result of gradual changes in overall recall skill.
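The floor-effect argument can be made concrete with a toy calculation. This is not Jarrold and Citroen’s simulation model; it is a minimal sketch that assumes phonological similarity removes a fixed proportion of whatever recall is achieved for dissimilar items. The function name, the cost value and the baseline recall levels are hypothetical.

```python
def similarity_effect(dissimilar_recall, proportional_cost):
    """Absolute recall difference (dissimilar minus similar items).

    Assumes similar-item recall is dissimilar-item recall reduced by a fixed proportion.
    """
    similar_recall = dissimilar_recall * (1.0 - proportional_cost)
    return dissimilar_recall - similar_recall


HYPOTHETICAL_COST = 0.25  # same proportional cost assumed at every age

for label, baseline in [("younger (near floor)", 0.2), ("older", 0.8)]:
    effect = similarity_effect(baseline, HYPOTHETICAL_COST)
    print(f"{label}: dissimilar recall = {baseline:.2f}, similarity effect = {effect:.3f}")
```

Under this assumption the absolute effect grows with baseline recall even though the underlying cost is identical, so a small or undetectable effect in younger children need not imply a different strategy.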
Jarrold and Citroen’s finding does not undermine the idea that children may generally tend to utilize verbal rehearsal more with age, but suggests that the presence or absence of a phonological similarity effect should not be taken to indicate a specific, qualitative shift in children’s inner speech strategies (see also Jarrold & Tam, 2010).
Moreover, it highlights the need (also considered by Al-Namlah et al., 2006) to evaluate children’s use of verbal strategies in the context of their other skills, such as short-term memory (STM) capacity.
Whether or not children’s use of inner speech undergoes a qualitative change in early to middle childhood, there is good evidence to suggest that it plays an increasingly prominent role in supporting cognitive operations in this developmental period.
Most of the work in this area has concerned the role of verbal strategies in supporting complex executive functions such as cognitive flexibility and planning. Concerning the former, the ability to represent linguistic rules to guide and support flexible behavior has been proposed as a core part of executive functioning development during childhood (Zelazo, Craik, & Booth, 2004; Zelazo et al., 2003).
In general, younger children (3- to 5-year-olds) will struggle with tasks requiring a switch between two different response rules, whereas older children will not.
Evidence to suggest that this involves verbal processes is provided by reductions in performance on such tasks under articulatory suppression (e.g., Fatzer & Roebers, 2012) and improvements in performance when participants are encouraged to use verbal cues (Kray, Gaspard, Karbach, & Blaye, 2013).
Younger but not older children appear to benefit from the prompt to use verbal labels, both on switching tasks (Kray, Eber, & Karbach, 2008) and in other contexts (see Müller, Jacques, Brocki, & Zelazo, 2009, for a review), suggesting a lack of spontaneous inner speech use at younger ages.
What exactly inner speech is doing to support performance in this way is not always clear: in a review of child and adult switching studies, Cragg and Nation (2010) noted that verbalized strategies speed up performance on switch and nonswitch trials but do not necessarily facilitate the act of switching itself.
If so, this would suggest that inner speech is helping to maintain a specific response set, or is acting as a reminder of task and response order, rather than being involved in flexible responding per se.
In any case, use of inner speech appears to become a key strategy in switching tasks during childhood, and there is evidence of this strategic use continuing into adulthood (e.g., Emerson & Miyake, 2003; see Cognitive Functions of Inner Speech in Adulthood).
Research on planning and verbal strategies in childhood has almost exclusively been conducted using tower tasks, such as the Tower of London task (Shallice, 1982) or the very similar Tower of Hanoi puzzle.
As noted previously, tower tasks require participants to move a set of rings or disks from one arrangement to another across three columns. Although fundamentally a visuospatial problem, the number of possible moves to a solution creates a problem-space bigger than visuospatial working memory capacity will typically allow, meaning that verbal strategies often come into play.
Private speech on such tasks has been observed to increase in relation to task difficulty in children (Fernyhough & Fradley, 2005) and correlates with other indicators of verbal strategy use, such as susceptibility to the phonological similarity effect on STM tasks (Al-Namlah et al., 2006).
Concerning inner speech specifically, Lidstone, Meins, and Fernyhough (2010) compared Tower of London performance in children under articulatory suppression, foot-tapping, and normal conditions. Performance (as indicated by percentage of correct trials) was selectively impaired during articulatory suppression, and the size of the performance decrement correlated with private speech use in the control condition, although this was only evident when participants were specifically instructed to plan ahead.
Effects of articulatory suppression on Tower of London performance have also been reported in the control groups of typically developing children in studies on autism (e.g., Wallace, Silvers, Martin, & Kenworthy, 2009), but these effects have not always been clearly separable from other dual-task demands (Holland & Low, 2010).
The apparent use of verbal strategies in recall, switching, and planning tasks, and correlations among them (e.g., Al-Namlah et al., 2006), are suggestive of a domain-general shift to verbal mediation in early childhood, affecting processes as different as STM and problem-solving.
However, it seems likely that inner speech use across domains may still follow separable trajectories and be guided by the specific demands of each task. The data from studies of cognitive flexibility and other executive domains suggest that, even within a given task, inner speech will only be a useful strategy in particular conditions: naming stimuli, for example, appears to speed up response execution, but naming the response required (e.g., stop or go) does not (Kray, Kipp, & Karbach, 2009).
There is also still a relative lack of research comparing strategy use across multiple domains. In one recent exception, Fatzer and Roebers (2012) observed strong effects of articulatory suppression on complex memory span (i.e., working memory), medium effects on a measure of cognitive flexibility, and little effect on a test of selective attention.
If these processes are taken to follow separate rates of maturation, it seems likely that inner speech offers a domain-general tool that is only selectively deployed when it is most relevant and beneficial to the executive functioning process at hand.
How Do Children Experience Inner Speech?
Asking people to reflect on the subjective qualities of their inner experience is fraught with difficulties, and the challenges are arguably more acute when working with children. Some attempts have been made to use experience sampling methods with children, although they have not to date focused on inner speech.
For example, Hurlburt (Hurlburt & Schwitzgebel, 2007, p. 111, Box 5.8) used DES with a 9-year-old boy, who noted that the construction of a mental image (of a hole in his backyard containing some toys) took a considerable amount of time to complete.
Complex or multipart images are known to take longer to generate than simple images (Hubbard & Stoeckig, 1988; Kosslyn et al., 1983), and this may particularly be the case for visual imagery in children.
If this were to apply also to inner speech, it would suggest that the phenomenology of verbal thinking in children may lack a certain richness and complexity. In a series of studies, Flavell and colleagues (e.g., Flavell, Flavell, & Green, 2001; Flavell, Green, & Flavell, 1993) also found limited understanding of inner experience (such as of the ongoing stream of consciousness assumed to characterize many people’s experience) in preschool children.
This can be interpreted either in terms of young children’s weak introspective abilities (Flavell et al., 1993) or in terms of young children lacking adult-like inner speech, as a result of the time it takes for speech to become internalized (Fernyhough, 2009b).
Children’s reluctance to report on inner speech, coupled with their apparent lack of awareness of it, should not necessarily be taken as indicating that they do not experience it in any form.
The suggestion of links between private speech and various imaginative and creative activities, such as engaging with an imaginary companion (Davis et al., 2013), also raises the interesting question of whether inner speech plays a similar role in the inner experience of young children.
The development of better methods to investigate inner speech phenomenology in children is needed to begin to answer this and related questions.
Do We Really Need Inner Speech?
A final question, again prompted by phenomenological investigation of inner speech, is whether we overestimate its presence and relevance. As Hurlburt et al. (2013) note, presuppositions about the ubiquity of inner speech may limit the accuracy of efforts to report on its incidence.
Introspective methods such as DES tend to result in lower incidence ratings than self-report measures. Alderson-Day and Fernyhough (2014) argue that DES may underestimate the incidence of inner speech for various reasons, including that the DES method may not be sensitive to transformations such as condensation (although see Hurlburt & Heavey, 2015).
There may be further, more profound reasons why differing assessments of inner experience can lead to such divergent characterizations of the phenomena. Hurlburt and Heavey (2015) argue that instruments such as the VISQ offer at best a self-theoretical description of any one participant’s inner experience.
Based on their observations of participants’ first DES sessions, they propose that (at least until participants become appropriately skilled through engagement in an iterative process like DES) people are frequently misguided about their own experience (Hurlburt et al., 2013).
Although it seems counterintuitive to suggest that individuals can be wrong about their own experience (cf. Jack, 2004), the question of how training in reporting on one’s own inner experience might increase the accuracy of self-reports of inner speech remains an intriguing one for future research.
Whether or not Hurlburt is correct, inner speech would certainly appear important to many people’s subjective views of their own experience. Evidence from bilingualism points to inner speech in first and second languages being associated strongly with personal identity and history (de Guerrero, 2005).
Correspondingly, loss of inner speech following brain injury, perhaps through its influence on the self-narration that typically accompanies everyday experience, may lead to the diminution of a sense of self (Morin, 2009b).
Evidence from cognitive studies also points to a prominent role for inner speech in a diverse range of functions, particularly in childhood. In adulthood, the cognitive benefit of verbalized strategies may wane or be superseded but, for many individuals, the importance of inner speech as a private activity at the core of experience would seem to remain.