A team of researchers at the University of North Carolina at Chapel Hill and the University of Maryland at College Park has recently developed a new deep learning model that can identify people’s emotions based on their walking styles.
Their approach, outlined in a paper pre-published on arXiv, works by extracting an individual's gait from an RGB video of them walking, then analyzing it and classifying it as one of four emotions: happy, sad, angry or neutral.
“Emotions play a significant role in our lives, defining our experiences, and shaping how we view the world and interact with other humans,” Tanmay Randhavane, one of the primary researchers and a graduate student at UNC, told TechXplore.
“Perceiving the emotions of other people helps us understand their behavior and decide our actions toward them.
For example, people communicate very differently with someone they perceive to be angry and hostile than they do with someone they perceive to be calm and contented.”

Most existing emotion recognition and identification tools work by analyzing facial expressions or voice recordings. However, past studies suggest that body language (e.g., posture, movements, etc.) can also say a lot about how someone is feeling.
Inspired by these observations, the researchers set out to develop a tool that can automatically identify the perceived emotion of individuals based on their walking style.
“The main advantage of our perceived emotion recognition approach is that it combines two different techniques,” Randhavane said.
“In addition to using deep learning, our approach also leverages the findings of psychological studies.
A combination of both these techniques gives us an advantage over the other methods.”
The approach first extracts a person's walking gait from an RGB video of them walking, representing it as a series of 3-D poses.
A long short-term memory (LSTM) recurrent neural network and a random forest (RF) classifier then analyze these poses and identify the most prominent emotion felt by the person in the video, choosing among happiness, sadness, anger and a neutral state.
The LSTM is initially trained on a series of deep features, but these are later combined with affective features computed from the gaits using posture and movement cues.
All of these features are ultimately classified using the RF classifier.
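To make this hybrid pipeline more concrete, the sketch below shows one way such a system could be wired together: an LSTM encodes the 3-D pose sequence into deep features, simple posture and movement cues stand in for the affective features, and a random forest makes the final four-way decision. The network size, the joint count, the toy feature definitions and the random data are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of the hybrid pipeline described above: an LSTM produces
# deep features from a sequence of 3-D poses, these are concatenated with
# hand-crafted affective (posture/movement) features, and a random forest
# makes the final four-way prediction. Shapes, feature choices and training
# details are illustrative assumptions, not the authors' exact configuration.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

EMOTIONS = ["happy", "sad", "angry", "neutral"]

class GaitLSTM(nn.Module):
    """Encodes a gait (a sequence of 3-D joint positions) into a deep feature vector."""
    def __init__(self, n_joints=16, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_joints * 3, hidden_size=hidden, batch_first=True)

    def forward(self, poses):              # poses: (batch, timesteps, n_joints * 3)
        _, (h_n, _) = self.lstm(poses)
        return h_n[-1]                     # (batch, hidden) deep features

def affective_features(poses):
    """Toy posture/movement cues per gait: mean joint speed and posture spread."""
    diffs = np.diff(poses, axis=1)                      # frame-to-frame joint motion
    speed = np.linalg.norm(diffs, axis=2).mean(axis=1)  # average movement magnitude
    spread = poses.std(axis=(1, 2))                     # rough "expansiveness" of the posture
    return np.stack([speed, spread], axis=1)

# Hypothetical data: 200 gaits, 60 frames each, 16 joints with (x, y, z) coordinates.
poses = np.random.randn(200, 60, 16 * 3).astype(np.float32)
labels = np.random.randint(0, len(EMOTIONS), size=200)

encoder = GaitLSTM()
with torch.no_grad():                      # the encoder would normally be trained first
    deep = encoder(torch.from_numpy(poses)).numpy()

# Deep and affective features are combined and classified by the random forest.
features = np.concatenate([deep, affective_features(poses)], axis=1)
clf = RandomForestClassifier(n_estimators=200).fit(features, labels)
print(EMOTIONS[clf.predict(features[:1])[0]])
```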
Randhavane and his colleagues carried out a series of preliminary tests on a dataset containing videos of people walking and found that their model could identify the perceived emotions of individuals with 80 percent accuracy.
In addition, their approach led to an improvement of approximately 14 percent over other perceived emotion recognition methods that focus on people’s walking style.
"Though we do not make any claims about the actual emotions a person is experiencing, our approach can provide an estimate of the perceived emotion of that walking style," Aniket Bera, a research professor in the computer science department who supervised the research, told TechXplore.
“There are many applications for this research, ranging from better human perception for robots and autonomous vehicles to improved surveillance to creating more engaging experiences in augmented and virtual reality.”
Along with Tanmay Randhavane and Aniket Bera, the research team behind this study includes Dinesh Manocha and Uttaran Bhattacharya at the University of Maryland at College Park, as well as Kurt Gray and Kyra Kapsaskis from the psychology department of the University of North Carolina at Chapel Hill.
To train their deep learning model, the researchers have also compiled a new dataset called Emotion Walk (EWalk), which contains videos of individuals walking in both indoor and outdoor settings labeled with perceived emotions.
In the future, this dataset could be used by other teams to develop and train new emotion recognition tools designed to analyze movement, posture, and/or gait.
“Our research is at a very primitive stage,” Bera said.
“We want to explore different aspects of the body language and look at more cues such as facial expressions, speech, vocal patterns, etc., and use a multi-modal approach to combine all these cues with gaits.
Currently, we assume that the walking motion is natural and does not involve any accessories (e.g., suitcase, mobile phones, etc.).
As part of future work, we would like to collect more data and train our deep-learning model better.
We will also attempt to extend our methodology to consider more activities such as running, gesturing, etc.”
According to Bera, perceived emotion recognition tools could soon help to develop robots with more advanced navigation, planning, and interaction skills.
In addition, models such as theirs could be used to detect anomalous behaviors or walking patterns from videos or CCTV footage, for instance identifying individuals who are at risk of suicide and alerting authorities or healthcare providers.
Their model could also be applied in the VFX and animation industry, where it could assist designers and animators in creating virtual characters that effectively express particular emotions.
Emotion is a mental experience of high intensity and high hedonic content (pleasure or displeasure) (Cabanac, 2002), and it deeply affects our daily behavior by regulating an individual's motivation (Lang, Bradley & Cuthbert, 1998), social interaction (Lopes et al., 2005) and cognitive processes (Forgas, 1995).
Recognizing others' emotions and responding adaptively to them is a basis of effective social interaction (Salovey & Mayer, 1990). Since users tend to regard computers as social agents (Pantic & Rothkrantz, 2003), they also expect their affective states to be sensed and taken into account while they interact with computers.
Given the importance of emotional intelligence for successful interpersonal interaction, a computer's capability to automatically recognize and respond appropriately to the user's affective feedback has been identified as a crucial facet of natural, efficacious, persuasive and trustworthy human–computer interaction (Cowie et al., 2001; Pantic & Rothkrantz, 2003; Hudlicka, 2003).
The possible applications of such an emotion-sensitive system are numerous, including automatic customer service (Fragopanagos & Taylor, 2005), interactive games (Barakova & Lourens, 2010) and smart homes (Silva, Morikawa & Petra, 2012).
Although automated emotion recognition is a very challenging task, the development of this technology would be of great value.
Just as humans draw on multiple modalities to recognize emotional states in human–human interaction, various cues have been used in affective computing, such as facial expressions (e.g., Kenji, 1991), gestures (e.g., Glowinski et al., 2008), physiological signals (e.g., Picard, Vyzas & Healey, 2001), linguistic information (e.g., Alm, Roth & Sproat, 2005) and acoustic features (e.g., Dellaert, Polzin & Waibel, 1996).
Beyond these, gait is another modality with great potential. Walking is one of the most common and most easily observed daily behaviors, and psychologists have found that body motion and walking style reflect the walker's emotional state.
Human observers were able to identify different emotions from gait cues such as the amount of arm swing, stride length and heavy-footedness (Montepare, Goldstein & Clausen, 1987).
Even when gait information was minimized by the use of point-light displays, which represent body motion with only a small number of illuminated dots, observers could still judge emotion category and intensity (Atkinson et al., 2004).
The contribution of gait features and other body language to the recognition of specific affective states has been summarized in a review (Kleinsmith & Bianchi-Berthouze, 2013).
In recent years, gait information has already been used in affective computing. Janssen et al. (2008) reported emotion recognition from human gait using artificial neural networks applied to kinetic data collected with a force platform and kinematic data captured with a motion capture system.
With the help of marker-based motion tracking systems, researchers developed computational methods to recognize emotions from gait in both inter-individual settings (comparable to recognizing the affective state of an unknown walker) and person-dependent settings (comparable to recognizing the affective state of a familiar walker) (Karg et al., 2009a; Karg et al., 2009b; Karg, Kuhnlenz & Buss, 2010).
These gait recording technologies already made it possible to automatically recognize a walker's emotional state; however, because of the high cost of trained personnel, technical equipment and maintenance (Loreen, Markus & Bernhard, 2013), the application of these non-portable systems was severely limited.
The Microsoft Kinect is a low-cost, portable, camera-based sensor system with an official software development kit (SDK) (Gaukrodger et al., 2013; Stone et al., 2015; Clark et al., 2013).
As a marker-free motion capture system, the Kinect can continuously monitor three-dimensional body movement patterns and is a practical option for developing an inexpensive, widely available motion recognition system for everyday walking.
The validity of the Kinect has been demonstrated in studies of gesture and motion recognition. Kondori et al. (2011) identified head pose using the Kinect, Fernández-Baena, Susin & Lligadas (2012) found that it performs well in tracking simple stepping movements, and Auvinet et al. (2015) successfully detected gait cycles on a treadmill with the Kinect.
In Weber et al.'s (2012) report, the accuracy and sensitivity of kinematic measurements obtained from the Kinect, such as reaching distance, joint angles, and spatial–temporal gait parameters, were estimated and found to be comparable to gold-standard marker-based motion capture systems such as Vicon.
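To make concrete what one such kinematic measurement looks like when computed from three-dimensional joint coordinates, the short snippet below derives a knee joint angle from three joint positions. The coordinates are made-up values for illustration; in practice they would come from a motion capture system such as the Kinect or Vicon.

```python
# Illustrative computation of a knee joint angle from three 3-D joint positions
# (hip, knee, ankle). The coordinates below are made-up values, not sensor output.
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by segments b->a and b->c."""
    u, v = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

hip, knee, ankle = [0.0, 1.0, 0.0], [0.05, 0.55, 0.02], [0.1, 0.1, 0.15]
print(f"knee angle: {joint_angle(hip, knee, ankle):.1f} degrees")
```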
Applications of the Kinect in the medical field have also been reported recently. Lange et al. (2011) used the Kinect as a game-based rehabilitation tool for balance training, and Yeung et al. (2014) found it valid for assessing body sway in clinical settings.
The Kinect also performed well in measuring some clinically relevant movements of people with Parkinson's disease (Galna et al., 2014).
Since walkers' emotional states can be reflected in their gaits (Montepare, Goldstein & Clausen, 1987; Atkinson et al., 2004), and the Kinect has been found to be a low-cost, portable, yet valid instrument for recording human body movement (Auvinet et al., 2015; Weber et al., 2012), using the Kinect to recognize emotion from gait appears feasible. Given the great value of automatic emotion recognition (Cowie et al., 2001; Silva, Morikawa & Petra, 2012), this approach is worth pursuing. By automating the recording and analysis of body expressions, and especially by applying machine learning methods, researchers have been able to exploit more and more data-driven low-level features, described directly by 3-D coordinate values, for emotion recognition (De Silva & Bianchi-Berthouze, 2004; Kleinsmith & Bianchi-Berthouze, 2007).
These data-driven low-level features extracted from the raw 3-D coordinates do not provide an intuitive, high-level description of the gait pattern under a given affective state, but they can be used to train computational models that effectively recognize emotions.
We hypothesize that walkers' emotional states (such as happiness and anger) are reflected in their gait as recorded by the Kinect in the form of the coordinates of the body's main joints, and that these states can be recognized using machine learning methods.
We conducted an experiment to test this hypothesis and set out to develop a computerized method for recognizing emotions from Kinect gait recordings.
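As a minimal sketch of what such a computerized method might look like, the snippet below assumes the Kinect skeleton data have already been exported as per-frame joint coordinates, derives simple data-driven low-level features from them, and evaluates a classifier by cross-validation. The array shapes, the feature choice and the two-class label set are assumptions for illustration, not the method developed in this study.

```python
# A minimal sketch of training an emotion classifier on low-level features
# derived from Kinect joint coordinates. Shapes, feature choices and the
# two-class labels (happy vs. angry) are illustrative assumptions only.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical dataset: 120 walking recordings, 150 frames each,
# 25 Kinect joints with (x, y, z) coordinates per frame.
recordings = np.random.randn(120, 150, 25, 3)
labels = np.random.randint(0, 2, size=120)          # 0 = happy, 1 = angry (placeholder)

# Data-driven low-level features: per-joint, per-axis means and standard
# deviations over each recording, flattened into one vector per walker.
means = recordings.mean(axis=1).reshape(len(recordings), -1)   # (120, 75)
stds = recordings.std(axis=1).reshape(len(recordings), -1)     # (120, 75)
features = np.concatenate([means, stds], axis=1)               # (120, 150)

# Cross-validated accuracy of a support vector machine on these features.
scores = cross_val_score(SVC(kernel="rbf"), features, labels, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```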
More information: Identifying emotions from walking using affective and deep features. arXiv:1906.11884 [cs.CV]. arxiv.org/abs/1906.11884