Recognizing Human Emotional Expressions through Body Movements: A Deep Learning Approach


Recognizing human emotional expressions from images or videos is a fundamental area of research in affective computing and computer vision. This research has a wide range of applications in fields such as robotics, human-computer interaction, and healthcare.

In recent years, there has been a growing interest in understanding bodily expressed emotion, a field known as Bodily Expressed Emotion Understanding (BEEU).

This article explores the significance of BEEU, how it differs from facial expression recognition, and a novel approach to address the challenges in this area using deep learning.

Facial Expression Recognition vs. Bodily Expressed Emotion Understanding

Facial expression recognition has been widely studied and relies on the Facial Action Coding System (FACS) as an intermediate representation. FACS involves detecting action units (AUs) in the face, which correspond to the movements of specific facial muscles. These AUs are then used to recognize emotions. However, BEEU takes a different approach. It focuses on recognizing emotions from body movements, which offers several advantages over facial expressions:

  • Reliability in Crowded Scenes: In crowded environments or when a person’s facial area is obscured, body movements and postures remain detectable.
  • Body’s Diagnostic Value: Research suggests that the body may be more diagnostic for emotion recognition than the face.
  • Privacy and Confidentiality: In some applications, facial areas may be inaccessible due to privacy and confidentiality concerns.
  • Difficulty in Faking Emotions: Body movements are harder to consciously control than facial expressions, so subtle emotions conveyed through the body can be more genuine indicators of a person’s state.
  • Improved Accuracy: Combining body movements with facial expressions can enhance emotion recognition accuracy.

The Role of Motor Elements in BEEU

In BEEU, similar to FACS for facial expression recognition, researchers have identified specific motor elements related to human movements that correspond to different emotions. For example, individuals may touch their heads with their hands when feeling sad. These motor elements can serve as an intermediate representation for emotion recognition.

Motor elements offer advantages over facial muscle movements, such as being more readily detectable in video and having clearer definitions, making them more suitable for AI recognition. They bridge the gap between low-level movement features like joint velocity and emotion category labels.
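To make the “low-level movement features” concrete, the following minimal sketch computes per-joint velocity from a sequence of 2D pose keypoints. The function name, array shapes, and frame rate are illustrative assumptions, not details from the paper; in practice the keypoints would come from an off-the-shelf pose estimator.

```python
import numpy as np

def joint_velocities(keypoints, fps=30.0):
    """Frame-to-frame speed of each body joint.

    keypoints: array of shape (T, J, 2) holding (x, y) positions of
    J joints over T frames. Returns an array of shape (T-1, J) with
    per-joint speeds in pixels per second.
    """
    diffs = np.diff(keypoints, axis=0)       # (T-1, J, 2) displacements
    speeds = np.linalg.norm(diffs, axis=-1)  # Euclidean distance per frame
    return speeds * fps                      # convert to pixels per second

# Toy example: 4 frames, 2 joints; joint 0 moves 1 px per frame in x
kp = np.zeros((4, 2, 2))
kp[:, 0, 0] = [0.0, 1.0, 2.0, 3.0]
v = joint_velocities(kp, fps=30.0)
print(v.shape)   # (3, 2)
print(v[0, 0])   # 30.0
```

Features like these are what motor elements sit above: an element such as “head drop” summarizes a pattern of joint trajectories into a unit that is meaningful for emotion recognition.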

Existing Methods and Their Limitations

While some previous studies have incorporated motor elements in BEEU, there is a gap in utilizing deep learning-based methods to comprehensively represent human motion. Many earlier methods relied on handcrafted features or were limited to lab-controlled environments. The deep learning approaches that did exist often applied generic video or action recognition techniques and neglected the understanding of motor elements.

The BoME (Body Motor Elements) Dataset

To address the lack of extensive public image or video datasets for deep learning-based motor element analysis, the authors of this work created the BoME dataset. This dataset comprises 1,600 high-quality video clips, each showing a distinct human movement and annotated with precise movement labels. The Laban Movement Analysis (LMA) system was used to describe these motor elements.

LMA categorizes human movements into five categories: body, effort, space, shape, and phrasing, with over 100 detailed motor elements. To balance annotation cost and dataset size, 11 emotion-related LMA elements were selectively included in the BoME dataset. The dataset was annotated by a certified movement analyst (CMA) with expertise in LMA.
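Because a single clip can exhibit several LMA elements at once, the annotation is naturally multi-label. The sketch below shows one way such labels could be encoded as a multi-hot vector; the element names are examples drawn from the general LMA vocabulary, not necessarily the exact 11 labels chosen for BoME.

```python
# Illustrative LMA-style element names (assumptions, not BoME's exact label set)
LMA_ELEMENTS = [
    "arms_to_upper_body", "sink", "head_drop", "condense_enclose",
    "spread", "jump", "rhythmicity", "free_flow", "light_weight",
    "up_and_rise", "rotation",
]

def encode_labels(present):
    """Multi-hot vector: a clip can exhibit several elements at once."""
    return [1 if e in present else 0 for e in LMA_ELEMENTS]

vec = encode_labels({"sink", "head_drop"})
print(len(vec), sum(vec))   # 11 2
```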

Deep Learning for Motor Element Analysis

The authors used the BoME dataset to evaluate the effectiveness of deep neural networks in learning representations of human movement. Several state-of-the-art video recognition networks, including the Video Swin Transformer (V-Swin), were applied to estimate LMA elements. The study examined the impact of factors such as video sampling rate and pretraining datasets on network performance. The results showed that deep neural networks, particularly V-Swin, performed well in learning appropriate movement representations from BoME.
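Estimating LMA elements from a clip amounts to multi-label classification: a backbone maps the video to a feature vector, and a per-element head is trained with a binary cross-entropy loss. The NumPy sketch below illustrates this setup; the random-projection “backbone” merely stands in for a real video network such as V-Swin, and all shapes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ELEMENTS = 11  # the emotion-related LMA elements annotated in BoME

def backbone(video):
    """Stand-in for a video network such as V-Swin: maps a clip to a
    feature vector. Here just a fixed random projection, for illustration."""
    W = rng.standard_normal((video.size, 64)) * 0.01
    return video.reshape(-1) @ W

def lma_head(features, W, b):
    """Per-element logits; each LMA element is an independent binary label."""
    return features @ W + b

def bce_loss(logits, targets):
    """Multi-label binary cross-entropy over sigmoid probabilities."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return -np.mean(targets * np.log(p + eps)
                    + (1 - targets) * np.log(1 - p + eps))

clip = rng.standard_normal((8, 16, 16))   # tiny stand-in for T x H x W frames
feat = backbone(clip)
W = rng.standard_normal((64, NUM_ELEMENTS)) * 0.01
b = np.zeros(NUM_ELEMENTS)
labels = rng.integers(0, 2, NUM_ELEMENTS).astype(float)
loss = bce_loss(lma_head(feat, W, b), labels)
print(float(loss) > 0.0)   # True
```

In a real training run the backbone would be pretrained (the study examines how the pretraining dataset and video sampling rate affect performance) and its weights updated jointly with the head.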

The Movement Analysis Network (MANet)

To enhance BEEU, a dual-branch, dual-task network named Movement Analysis Network (MANet) was designed. MANet’s branches produced predictions for both bodily expressed emotion and LMA labels. The LMA branch’s features were integrated into the emotion branch for improved emotion recognition. Additionally, a new bridge loss enabled LMA prediction to supervise emotion prediction. MANet was trained on both the BoLD benchmark dataset for BEEU and the BoME dataset.
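The dual-branch idea can be sketched as follows. The shapes, the concatenation-based fusion, and the loss weights are assumptions made for illustration, not MANet’s exact architecture; the emotion-category count of 26 follows the categorical labels used in the BoLD benchmark.

```python
import numpy as np

rng = np.random.default_rng(1)

def dual_branch_forward(shared_feat, params):
    """Sketch of MANet's dual-branch idea: an LMA branch and an emotion
    branch share a backbone feature, and the LMA branch's features are
    fed into the emotion branch before its classifier."""
    lma_feat = np.tanh(shared_feat @ params["W_lma"])    # LMA branch features
    lma_logits = lma_feat @ params["W_lma_out"]          # 11 LMA element labels
    fused = np.concatenate([shared_feat, lma_feat])      # feature integration
    emo_logits = fused @ params["W_emo_out"]             # emotion categories
    return lma_logits, emo_logits

D, H, N_LMA, N_EMO = 64, 32, 11, 26
params = {
    "W_lma": rng.standard_normal((D, H)) * 0.05,
    "W_lma_out": rng.standard_normal((H, N_LMA)) * 0.05,
    "W_emo_out": rng.standard_normal((D + H, N_EMO)) * 0.05,
}
x = rng.standard_normal(D)
lma_logits, emo_logits = dual_branch_forward(x, params)
print(lma_logits.shape, emo_logits.shape)   # (11,) (26,)

# Training would combine the two task losses with the bridge term that
# lets LMA predictions supervise emotion predictions, schematically:
#   L_total = L_emotion + alpha * L_lma + beta * L_bridge
```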

Results and Conclusion

Experiments using MANet on the BoLD dataset revealed that the approach significantly outperformed single-task baselines, demonstrating the potential of deep learning and motor element analysis in improving the recognition of bodily expressed emotions.

In summary, the field of Bodily Expressed Emotion Understanding is gaining prominence, offering advantages over facial expression recognition. Deep learning, combined with the BoME dataset, has the potential to advance our understanding of human emotions through body movements and improve recognition accuracy in various applications.


Discussion

In this section, we discuss the key findings, implications, and future directions stemming from our study of Bodily Expressed Emotion Understanding (BEEU) using the BoME dataset, deep neural networks, and the MANet model.

Contributions and Effectiveness

Our study introduced the BoME dataset, based on Laban Movement Analysis (LMA), to enhance BEEU through motor element analysis. We demonstrated that deep neural networks, particularly the Video Swin Transformer (V-Swin), are highly effective in capturing human movement representations using this dataset.

Furthermore, we presented MANet, a novel dual-branch model tailored for understanding bodily expressed emotions, which leverages the supervisory information from BoME through a specialized architecture and a custom loss function. Our experiments indicate that MANet surpasses existing BEEU approaches.

Our choice of 11 LMA elements related to sadness and happiness represents a starting point. The LMA system encompasses over 100 elements, hinting at a vast potential for additional elements to contribute to emotion recognition. This suggests that further research could lead to the identification of more LMA elements associated with various emotions, thereby enhancing BEEU.

Future Directions

Building upon our study, we identify two primary avenues for future research:

Dataset Expansion and Enrichment: It is imperative to expand the BoME dataset and enrich it with more annotations. This expansion should include a broader range of LMA element labels and corresponding emotion labels. This will provide a richer resource for training and testing BEEU models and allow for a more comprehensive understanding of the relationship between LMA elements and emotions.

Psychological Insights and Affective Computing: Collaboration between researchers in psychology and affective computing can reveal valuable insights into the intricate relationships between LMA elements and emotions. Deepening our understanding of this relationship can lead to improved BEEU models and a deeper comprehension of how emotions are expressed through human movement.

Wider Applicability of LMA

Beyond BEEU, our exploration of LMA and emotion recognition opens the door to diverse practical applications across various domains. We discuss a few notable applications:

Healthcare and Mental Health: In the medical field, particularly in mental health care, the ability to monitor patients’ body movement patterns can alert healthcare professionals to emotional states that require attention. This approach can significantly enhance the efficiency and quality of patient care.

Robotics and Human-Computer Interaction: LMA motor element recognition can empower robots and AI systems to recognize human emotions through body movements. This capability enables these systems to adapt their interactions based on individuals’ emotional states. The outcome is a more personalized, natural, and empathetic human-robot interaction experience.

Human Action Recognition: LMA can potentially enhance general human action recognition. Specific LMA elements correspond to distinct human actions, such as sports movements or social interactions. Expanding LMA element annotations can facilitate a broader analysis of various human activities and their emotional implications.


In conclusion, our study demonstrates the effectiveness of incorporating LMA elements in BEEU, opening up a range of potential applications in diverse domains. The ability to recognize human emotions through body movements provides a foundation for improved human-robot interactions, enhanced healthcare, and more accurate emotion understanding.

As our understanding of LMA and its relationship with emotions continues to evolve, we anticipate further breakthroughs in the field of affective computing and computer vision. In summary, our work not only advances BEEU but also hints at the transformative potential of emotion recognition through human movement in numerous practical applications.
