The call to ban affect recognition from important decisions may sound like an angry cry, but what does it actually mean?
Debate is heating up about artificial intelligence’s impact on our daily lives, in ways that provoke as much worry as wonder.
In tech parlance, “affect recognition” is a subset of facial recognition, often described as emotional AI: artificial intelligence used to analyze facial expressions with the aim of identifying human emotion.
Interpreting the expressions on your face: how sound are those interpretations?
A report from a New York University research center reminds readers that this is not a sound way to understand how people feel.
The report’s view, plain and simple, is that emotion-detecting AI should not be trusted to make important calls in situations that can have a serious impact on people: in recruitment, in monitoring students in the classroom, in customer service and, last but hardly least, in criminal justice.
The report argues there is a need to scrutinize why entities are using faulty technology to make assessments about character on the basis of physical appearance in the first place.
This is particularly concerning in contexts such as employment, education, and criminal justice.
The AI Now Institute at New York University, whose focus is the social implications of artificial intelligence, issued the AI Now 2019 Report. The institute holds that AI systems should have appropriate safeguards and accountability structures in place, and it raises concerns where that may not be the case.
Its 2019 report looks at how businesses currently use expression analysis to inform decisions.
Reuters pointed out that this was AI Now’s fourth annual report on AI tools. The assessment examines risks of potentially harmful AI technology and its human impact.
The institute’s report said affect recognition has been “a particular focus of growing concern in 2019 – not only because it can encode biases, but because it lacks any solid scientific foundation to ensure accurate or even valid results.”
The report had strong wording: “Regulators should ban the use of affect recognition in important decisions that impact people’s lives and access to opportunities. Until then, AI companies should stop deploying it.”
The authors are not just indulging in personal opinion; they reviewed research.
“Given the contested scientific foundations of affect recognition technology – a subclass of facial recognition that claims to detect things such as personality, emotions, mental health, and other interior states – it should not be allowed to play a role in important decisions about human lives, such as who is interviewed or hired for a job, the price of insurance, patient pain assessments, or student performance in school.”
The report went even further and said that governments should “specifically prohibit use of affect recognition in high-stakes decision-making processes.”
The Verge‘s James Vincent was not surprised by this finding.
Back in July, he reported on research that examined the failure of technology to accurately read emotions from facial expressions; simply put, you cannot trust AI to do so. He quoted a professor of psychology at Northeastern University:
“Companies can say whatever they want, but the data are clear.”
Vincent reported at the time on a review of the literature commissioned by the Association for Psychological Science, in which five scientists scrutinized the evidence: “Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements.” Vincent said, “It took them two years to examine the data, with the review looking at more than 1,000 different studies.”
Since emotions are expressed in a huge variety of ways, it is difficult to reliably infer how someone feels from a simple set of facial movements.
The authors said that tech companies may well be asking a question that is fundamentally wrong. Efforts to read out people’s internal states from facial movements without considering various aspects of context were at best incomplete and at worst lacked validity.
While the report called for a ban, it might be fair to read the concern as directed at the naive level of confidence placed in a technology that is still in need of improvement. The field of emotional analysis needs to do better.
According to The Verge article, a professor of psychology at Northeastern University believed that perhaps the most important takeaway from the review was that “we need to think about emotions in a more complex fashion.”
Leo Kelion of BBC News, meanwhile, relayed the viewpoint of AI Now co-founder Prof. Kate Crawford, who said studies had demonstrated considerable variability in the number of emotional states and the way that people expressed them.
Reuters reported on its conference call ahead of the report’s release: “AI Now founders Kate Crawford and Meredith Whittaker said that damaging uses of AI are multiplying despite broad consensus on ethical principles because there are no consequences for violating them.” The current report said that AI-enabled affect recognition continued to be deployed at scale across environments from classrooms to job interviews, informing determinations about who is productive, often without people’s knowledge.
The AI Now report gave specific examples of companies doing business in emotion-detecting products. One such company sells video-analytics cameras that classify faces as showing anger, fear, and sadness; its customers include casinos, restaurants, retail merchants, real estate brokers, and the hospitality industry.
Another example was a company with AI-driven video-based tools to recommend which candidates a company should interview. The algorithms were designed to detect emotional engagement in applicants’ micro-expressions.
The report also cited a company creating headbands that purport to detect and quantify students’ attention levels through brain-activity detection. (The report did not fail to add that studies “outline significant risks associated with the deployment of emotional AI in the classroom.”)
Faces are a ubiquitous part of everyday life for humans. People greet each other with smiles or nods. They have face-to-face conversations on a daily basis, whether in person or via computers. They capture faces with smartphones and tablets, exchanging photos of themselves and of each other on Instagram, Snapchat, and other social-media platforms. The ability to perceive faces is one of the first capacities to emerge after birth: An infant begins to perceive faces within the first few days of life, equipped with a preference for face-like arrangements that allows the brain to wire itself, with experience, to become expert at perceiving faces (Arcaro, Schade, Vincent, Ponce, & Livingstone, 2017; Cassia, Turati, & Simion, 2004; Gandhi, Singh, Swami, Ganesh, & Sinha, 2017; Grossmann, 2015; L. B. Smith, Jayaraman, Clerkin, & Yu, 2018; Turati, 2004; but see Young and Burton, 2018, for a more qualified claim). Faces offer a rich, salient source of information for navigating the social world: They play a role in deciding whom to love, whom to trust, whom to help, and who is found guilty of a crime (Todorov, 2017; Zebrowitz, 1997, 2017; Zhang, Chen, & Yang, 2018). Beginning with the ancient Greeks (Aristotle, in the 4th century BCE) and Romans (Cicero), various cultures have viewed the human face as a window on the mind. But to what extent can a raised eyebrow, a curled lip, or a narrowed eye reveal what someone is thinking or feeling, allowing a perceiver’s brain to guess what that someone will do next?1 The answers to these questions have major consequences for human outcomes as they unfold in the living room, the classroom, the courtroom, and even on the battlefield. They also powerfully shape the direction of research in a broad array of scientific fields, from basic neuroscience to psychiatry.
Understanding what facial movements might reveal about a person’s emotions is made more urgent by the fact that many people believe they already know. Specific configurations of facial-muscle movements2 appear as if they summarily broadcast or display a person’s emotions, which is why they are routinely referred to as emotional expressions and facial expressions. A simple Google search for the phrase “emotional facial expressions” (see Box 1 in the Supplemental Material available online) reveals the ubiquity with which, at least in certain parts of the world, people believe that certain emotion categories are reliably signaled or revealed by certain facial-muscle movement configurations—a set of beliefs we refer to as the common view (also called the classical view; L. F. Barrett, 2017b). Likewise, many cultural products testify to the common view. Here are several examples:
- Technology companies are investing tremendous resources to figure out how to objectively “read” emotions in people by detecting their presumed facial expressions, such as scowling faces, frowning faces, and smiling faces, in an automated fashion. Several companies claim to have already done it (e.g., Affectiva.com, 2018; Microsoft Azure, 2018). For example, Microsoft’s Emotion API promises to take video images of a person’s face to detect what that individual is feeling. Microsoft’s website states that its software “integrates emotion recognition, returning the confidence across a set of emotions . . . such as anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. These emotions are understood to be cross-culturally and universally communicated with particular facial expressions” (screen 3). (A sketch of what a call to one such API typically looked like appears after this list.)
- Countless electronic messages are annotated with emojis or emoticons that are schematized versions of the proposed facial expressions for various emotion categories (Emojipedia.org, 2019).
- Putative emotional expressions are taught to preschool children by displaying scowling faces, frowning faces, smiling faces, and so on, in posters (e.g., use “feeling chart for children” in a Google image search), games (e.g., Miniland emotion games; Miniland Group, 2019), books (e.g., Cain, 2000; T. Parr, 2005), and episodes of Sesame Street (among many examples, see Morenoff, 2014; Pliskin, 2015; Valentine & Lehmann, 2015).3
- Television shows (e.g., Lie to Me; Baum & Grazer, 2009), movies (e.g., Inside Out; Docter, Del Carmen, LeFauve, Cooley, and Lassetter, 2015), and documentaries (e.g., The Human Face, produced by the British Broadcasting Company; Cleese, Erskine, & Stewart, 2001) customarily depict certain facial configurations as universal expressions of emotions.
- Magazine and newspaper articles routinely feature stories in kind: facial configurations depicting a scowl are referred to as “expressions of anger,” facial configurations depicting a smile are referred to as “expressions of happiness,” facial configurations depicting a frown are referred to as “expressions of sadness,” and so on.
- Agents of the U.S. Federal Bureau of Investigation (FBI) and the Transportation Security Administration (TSA) were trained to detect emotions and other intentions using these facial configurations, with the goal of identifying and thwarting terrorists (R. Heilig, special agent with the FBI, personal communication, December 15, 2014; L. F. Barrett, 2017c).4
- The facial configurations that supposedly diagnose emotional states also figure prominently in the diagnosis and treatment of psychiatric disorders. One of the most widely used tasks in autism research, the Reading the Mind in the Eyes Test, asks test takers to match photos of the upper (eye) region of a posed facial configuration with specific mental state words, including emotion words (Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001). Treatment plans for people living with autism and other brain disorders often include learning to recognize these facial configurations as emotional expressions (Baron-Cohen, Golan, Wheelwright, & Hill, 2004; Kouo & Egel, 2016). This training does not generalize well to real-world skills, however (Berggren et al., 2018; Kouo & Egel, 2016).
- “Reading” the emotions of a defendant—in the words of Supreme Court Justice Anthony Kennedy, to “know the heart and mind of the offender” (Riggins v. Nevada, 1992, p. 142)—is one pillar of a fair trial in the U.S. legal system and in many legal systems in the Western world. Legal actors such as jurors and judges routinely rely on facial movements to determine the guilt and remorse of a defendant (e.g., Bandes, 2014; Zebrowitz, 1997). For example, defendants who are perceived as untrustworthy receive harsher sentences than they otherwise would (J. P. Wilson & Rule, 2015, 2016), and such perceptions are more likely when a person appears to be angry (i.e., the person’s facial structure looks similar to the hypothesized facial expression of anger, which is a scowl; Todorov, 2017). An incorrect inference about defendants’ emotional state can cost them their children, their freedom, or even their lives (for recent examples, see L. F. Barrett, 2017b, beginning on page 183).
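To make the first bullet above concrete, here is a minimal sketch of how a cloud “emotion recognition” endpoint of this kind was typically called. The endpoint path, query parameter, header name, and response layout are assumptions based on the historical shape of Microsoft’s Face API v1.0 (access to its emotion attribute has since been restricted), and the resource name, key, and image URL are placeholders, not real values.

```python
# Illustrative sketch only: calling a cloud emotion-recognition endpoint of the
# kind described in the first bullet above. Endpoint path, query parameter, and
# header name follow the historical shape of Microsoft's Face API v1.0; the
# resource name, key, and image URL below are placeholders.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
API_KEY = "<subscription-key>"                                     # placeholder


def detect_emotion_scores(image_url: str) -> list[dict]:
    """Return, for each face the service detects in the image, its confidence
    scores across the service's fixed emotion categories."""
    response = requests.post(
        f"{ENDPOINT}/face/v1.0/detect",
        params={"returnFaceAttributes": "emotion"},
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
        json={"url": image_url},
        timeout=10,
    )
    response.raise_for_status()
    # Each face entry carries scores for labels such as anger, contempt,
    # disgust, fear, happiness, neutral, sadness, and surprise. These are the
    # service's labels, not validated measurements of what anyone feels.
    return [face["faceAttributes"]["emotion"] for face in response.json()]
```

The scores such a call returns quantify how closely an image matches the service’s stereotyped configurations (a scowl for “anger,” a smile for “happiness,” and so on); whether those configurations license any inference about what the person feels is exactly the question taken up next.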
But can a person’s emotional state be reasonably inferred from that person’s facial movements? In this article, we offer a systematic review of the evidence, testing the common view that instances of an emotion category are signaled with a distinctive configuration of facial movements that has enough reliability and specificity to serve as a diagnostic marker of those instances. We focus our review on evidence pertaining to six emotion categories that have received the lion’s share of attention in scientific research—anger, disgust, fear, happiness, sadness, and surprise—and that, correspondingly, are the focus of the common view (as evidenced by our Google search, summarized in Box 1 in the Supplemental Material). Our conclusions apply, however, to all emotion categories that have thus far been scientifically studied. We open the article with a brief discussion of its scope, approach, and intended audience. We then summarize evidence on how people actually move their faces during episodes of emotion, referred to as studies of expression production, following which we examine evidence on which emotions are actually inferred from looking at facial movements, referred to as studies of emotion perception. We identify three key shortcomings in the scientific research that have contributed to a general misunderstanding about how emotions are expressed and perceived in facial movements and that limit the translation of this scientific evidence for other uses:
- Limited reliability (i.e., instances of the same emotion category are neither reliably expressed through nor perceived from a common set of facial movements).
- Lack of specificity (i.e., there is no unique mapping between a configuration of facial movements and instances of an emotion category).
- Limited generalizability (i.e., the effects of context and culture have not been sufficiently documented and accounted for).
We then discuss our conclusions, followed by proposals for consumers on how they might use the existing scientific literature. We also provide recommendations for future research on emotion production and perception with consumers of that research in mind. We have included additional detail on some topics of import or interest in the Supplemental Material.
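Before turning to the evidence, here is a minimal numeric sketch of the first two criteria, reliability and specificity, applied to a single facial configuration (a scowl) and a single emotion category (anger). The counts and the data layout are invented for illustration; they are not drawn from any study reviewed here.

```python
# Invented counts for illustration (not data from the review): how often a
# scowl co-occurs with instances of anger, and how often it appears when
# anger is absent.
from dataclasses import dataclass


@dataclass
class Counts:
    scowl_during_anger: int      # configuration present, anger present
    no_scowl_during_anger: int   # configuration absent, anger present
    scowl_during_other: int      # configuration present, anger absent
    no_scowl_during_other: int   # configuration absent, anger absent


def reliability(c: Counts) -> float:
    """Proportion of anger instances accompanied by the scowl configuration."""
    return c.scowl_during_anger / (c.scowl_during_anger + c.no_scowl_during_anger)


def false_positive_rate(c: Counts) -> float:
    """Proportion of non-anger instances in which the scowl still appears;
    specificity requires this rate to be low."""
    return c.scowl_during_other / (c.scowl_during_other + c.no_scowl_during_other)


# Hypothetical pattern: scowls appear in ~30% of anger episodes and also in
# ~25% of non-anger episodes, so the configuration is neither a reliable nor
# a specific marker of anger.
example = Counts(scowl_during_anger=30, no_scowl_during_anger=70,
                 scowl_during_other=25, no_scowl_during_other=75)
print(reliability(example), false_positive_rate(example))  # 0.3 0.25
```

Generalizability, the third criterion, cannot be read off a single table of counts; it asks whether numbers like these hold up across contexts and cultures.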
Studies of healthy adults from the United States and other developed nations
We now review the scientific evidence from studies that document how people spontaneously move their facial muscles during instances of anger, disgust, fear, happiness, sadness, and surprise, as well as how they pose their faces when asked to indicate how they express each emotion category. We examine evidence gathered in the lab and in naturalistic settings, sampling healthy adults who live in a variety of cultural contexts. To evaluate the reliability, specificity, and generalizability of the scientific findings, we adapted criteria set out by Haidt and Keltner (1999), as discussed in Table 2.
Spontaneous facial movements in laboratory studies
A meta-analysis was recently conducted to test the hypothesis that the facial configurations in Figure 4 co-occur, as hypothesized, with the instances of specific emotion categories (Duran, Reisenzein, & Fernández-Dols, 2017). Thirty-seven published articles reported on how people moved their faces when exposed to objects or events that evoke emotion. Most studies included in the meta-analysis were conducted in the laboratory. The findings from these experiments were statistically summarized to assess the reliability of facial movements as expressions of emotion (see Fig. 6). In all emotion categories tested, other than fear, participants moved their facial muscles into the expected configuration more reliably than what we would expect by chance. Reliability levels were weak, however, indicating that the proposed facial configurations in Figure 4 have limited reliability (and to some extent, limited generalizability; i.e., a scowling facial configuration is an expression of anger, but not the expression of anger). More often than not, people moved their faces in ways that were not consistent with the hypotheses of the common view. An expanded version of this meta-analysis (Duran & Fernández-Dols, 2018) analyzed 131 effect sizes from 76 studies totaling 4,487 participants, with similar results: The hypothesized facial configurations were observed with average effect sizes (r) of .31 for the correlation between the intensity of a facial configuration and a measure of anger, disgust, fear, happiness, sadness, or surprise (corresponding to weak evidence of reliability; individual correlations for specific emotion categories ranged from .06 to .45, interpreted as no evidence of reliability to moderate evidence of reliability). The average proportion of the times that a facial configuration was observed during an emotional event (in one of those categories) was .22 (proportions for specific emotion categories ranged from .11 to .35, interpreted as no evidence to weak evidence of reliability).19
Fig. 6. Meta-analysis of facial movements during emotional episodes: a summary of effect sizes across studies (Duran, Reisenzein, & Fernández-Dols, 2017). Effect sizes are computed as correlations or proportions (as reported in the original experiments). Results include experiments that reported a correspondence between a facial configuration and its hypothesized emotion category as well as those that reported a correspondence between individual AUs of that facial configuration and the relevant emotion category; meta-analytic effect sizes that summarized only the effects for entire ensembles of AUs (the facial configurations specified in Fig. 4) were even lower than those reported here.
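As a rough illustration of how study-level results of this kind are combined (the values below are invented, and the published meta-analyses used more elaborate procedures than a simple weighted mean), a sample-size-weighted average of per-study proportions can be computed as follows:

```python
# Invented study-level values (not the meta-analysis data): each entry is the
# proportion of emotion episodes in which the hypothesized facial
# configuration appeared, together with that study's sample size.
studies = [
    (0.15, 40),
    (0.30, 120),
    (0.22, 65),
]

# Sample-size-weighted mean proportion across studies -- a simplified stand-in
# for the meta-analytic summaries discussed above.
weighted_mean = sum(p * n for p, n in studies) / sum(n for _, n in studies)
print(round(weighted_mean, 2))  # 0.25, in the range the review treats as weak reliability
```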
No overall assessment of specificity was reported in either the original or the expanded meta-analysis because most published studies do not report the false-positive rate (i.e., the frequency with which a facial AU is observed when an instance of the hypothesized emotion category was not present; see Fig. 3). Nonetheless, some striking examples of specificity failures have been documented in the scientific literature. For example, a certain smile, called a Duchenne smile, is defined in terms of facial muscle contractions (i.e., in terms of facial morphology): It involves movement of the orbicularis oculi, which raises the cheeks and causes wrinkles at the outer corners of the eyes, in addition to movement of the zygomaticus major, which raises the corners of the lips into a smile. A Duchenne smile is thought to be a spontaneous expression of authentic happiness. Research shows, however, that a Duchenne smile can be intentionally produced when people are not happy (Gunnery & Hall, 2014; Gunnery, Hall, & Ruben, 2013; also see Krumhuber & Manstead, 2009), consistent with evidence that Duchenne smiles often occur when people are signaling submission or affiliation rather than reflecting happiness (Rychlowska et al., 2017).
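In standard FACS terms, the morphology just described corresponds to two action units: AU6 (cheek raiser, orbicularis oculi) and AU12 (lip corner puller, zygomaticus major). A minimal sketch of the purely morphological rule, with a function and input format of our own devising, makes the specificity problem concrete:

```python
# Minimal sketch: flag a "Duchenne smile" purely from detected FACS action
# units. AU6 = cheek raiser (orbicularis oculi), AU12 = lip corner puller
# (zygomaticus major). The function and input format are illustrative.
def is_duchenne_smile(active_aus: set[int]) -> bool:
    """True if both AU6 and AU12 are among the detected action units."""
    return {6, 12}.issubset(active_aus)


print(is_duchenne_smile({6, 12, 25}))  # True: the morphology is present
print(is_duchenne_smile({12}))         # False: a smile without the cheek raiser
```

Detecting the configuration is the easy part; as the studies cited above show, the same configuration also appears when people are signaling affiliation or posing deliberately, which is precisely the specificity failure at issue.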
Spontaneous facial movements in naturalistic settings
Studies of facial configuration–emotion category associations in naturalistic settings tend to yield results similar to those from studies that were conducted in more controlled laboratory settings (Fernández-Dols, 2017; Fernández-Dols & Crivelli, 2013). Some studies observe that people express emotions in real-world settings by spontaneously making the facial muscle movements proposed in Figure 4, but such observations are generally not replicable across studies (e.g., cf. Matsumoto & Willingham, 2006 and Crivelli, Carrera, & Fernández-Dols, 2015; cf. Rosenberg & Ekman, 1994 and Fernández-Dols, Sanchez, Carrera, & Ruiz-Belda, 1997). For example, two field studies of winning judo fighters recently demonstrated that so-called Duchenne smiles were better predicted by whether an athlete was interacting with an audience than by the degree of happiness reported after winning their matches (Crivelli et al., 2015). Only 8 of the 55 winning fighters produced a Duchenne smile in Study 1; all occurred during a social interaction. Only 25 of 119 winning fighters produced a Duchenne smile in Study 2, documenting, at best, weak evidence for reliability.
Posed facial movements
Another source of evidence comes from asking participants sampled from various cultures to deliberately pose the facial configurations that they believe they use to express emotions. In these studies, participants are given a single emotion word or a single, brief statement to describe each emotion category and are then asked to freely pose the facial configuration that they believe they make when expressing that emotion. Such research directly examines common beliefs about emotional expressions. For example, one study provided college students from Canada and Gabon (in Central Africa) with dictionary definitions for 10 emotion categories. After practicing in front of a mirror, participants posed the facial configurations so that “their friends would be able to understand easily what they feel” (Elfenbein, Beaupre, Levesque, & Hess, 2007, p. 134) and their poses were FACS coded. Likewise, a recent study asked college students in China, India, Japan, Korea, and the United States to pose the facial movements they believe they make when expressing each of 22 emotion categories (Cordaro et al., 2018). Participants heard a brief scenario describing an event that might cause anger (“You have been insulted, and you are very angry about it”) and then were instructed to pose a facial (and nonverbal but vocal) expression of emotion, as if the events in the scenario were happening to them. Experimenters were present in the testing room as participants posed their responses. Both studies found moderate to strong evidence that participants across cultures share common beliefs about the expressive pose for anger, fear, and surprise categories; there was weak to moderate evidence for the happiness category, and weak evidence for the disgust and sadness categories (Fig. 7). Cultural variation in participants’ beliefs about emotional expressions was also observed.
Fig. 7. Comparing posed and spontaneous facial movements. Correlations or proportions are presented for anger, disgust, fear, happiness, sadness, and surprise, separately for three studies. Data are from Table 6 in Cordaro et al. (2018), from Elfenbein, Beaupre, Levesque, and Hess (2007; reliability for the anger category is for AU4 + AU5 only), and from Duran, Reisenzein, and Fernández-Dols (2017; proportion data only).
Neither study compared participants’ posed expressions (their beliefs about how they move their facial muscles to express emotions) with observations of how they actually moved their faces when expressing emotion. Nonetheless, a quick comparison of the findings from the two studies and the proportions of spontaneous facial movements made during emotional events (from the Duran et al., 2017 meta-analysis) makes it clear that posed and spontaneous movements differ, sometimes quite substantially (again, see Fig. 7). When people pose a facial configuration that they believe expresses an emotion category, they make facial movements that more reliably agree with the hypothesized facial configurations in Figure 4.
The same cannot be said of people’s spontaneous facial movements during actual emotional episodes, however (for convergent evidence, see Motley & Camden, 1988; Namba, Makihara, Kabir, Miyatani, & Nakao, 2016). One possible interpretation of these findings is that posed and spontaneous facial-muscle configurations correspond to distinct communication systems. Indeed, there is some evidence that volitional and involuntary facial movements are controlled by different neural circuits (Rinn, 1984).
Another factor that may contribute to the discrepancy between posed and spontaneous facial movements is that people’s beliefs about their own behavior often reflect their stereotypes and do not necessarily correspond to how they actually behave in real life (see Robinson & Clore, 2002). Indeed, if people’s beliefs, as measured by their facial poses, are influenced directly by the common view, then any observed relationship between posed facial expressions and hypothesized emotion categories is merely evidence of the beliefs themselves.
More information: Report: ainowinstitute.org/AI_Now_2019_Report.pdf
Lisa Feldman Barrett et al. Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements, Psychological Science in the Public Interest (2019). DOI: 10.1177/1529100619832930