Advancing Phonology-Based Sign Language Assessment: From Learner to Machine-Generated Videos
| Type of publication: | Thesis |
| Citation: | Tarigopula_THESIS_2025 |
| Year: | 2025 |
| School: | École polytechnique fédérale de Lausanne (EPFL) |
| URL: | https://infoscience.epfl.ch/en... |
| DOI: | https://doi.org/10.5075/epfl-thesis-11178 |
| Abstract: | Sign languages are rich visual languages that use both manual features such as handshape, movement, and location and non-manual features such as facial expressions, head movements, and body posture to convey information. Sign language learning technologies have the potential to bridge the communication gap between hearing and hard-of-hearing communities by providing accessible platforms that deliver actionable, interpretable feedback on signing performance. In light of advances in deep learning, sign language recognition (SLR) has improved substantially; however, sign language assessment (SLA), the task of evaluating the quality and correctness of signing, remains in its infancy. This thesis aims to advance assessment systems for sign language that not only provide feedback to help learners improve their signing performance, but also hold potential for offering automated feedback in the context of sign language generation (SLG). We build upon an explainable phonology-based framework, advancing it to work with RGB video input with the goal of enabling a webcam-based sign language assessment system that is accessible and suitable for real-world deployment. In this context, we investigate various 2D and 3D skeleton estimation methods derived from RGB input to extract interpretable and linguistically meaningful manual features. To incorporate more holistic representations, we further explore deep learning-based methods for modeling hand movement. Specifically, we evaluate spatio-temporal features from convolutional networks and sign language-agnostic Vision Transformers for assessing hand movement and handshape quality in isolated signs. We then investigate how SLA can be effectively leveraged for evaluating SLG. To this end, we propose a posterior-based assessment framework that uses articulatory posteriors to evaluate the quality of generated signing (a toy sketch of this idea appears below the record). This approach enables interpretable, frame-level feedback and is applicable to both video-to-video and text-to-pose generation models. Finally, we examine the role of non-manual features such as facial expressions in continuous sign language, where they play a critical role in marking grammatical structure and sign boundaries. We develop Facial Action Unit (FAU)-based non-manual feature detectors, integrate the resulting posteriors into our phonology-based framework, and then analyze their contribution to sign segmentation through alignment. |
| Main Research Program: | Human-AI Teaming |
| Additional Research Programs: | AI for Everyone |
| Keywords: | phonology-based sign language, posterior-based assessment, sign language generation, sign language learning, sign language subunits, sign segmentation, spatio-temporal features |
| Projects: | Idiap SMILE-II |
| Authors: | |
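The posterior-based assessment idea summarized in the abstract can be illustrated, very loosely, with a toy sketch. The sketch below is not the thesis's pipeline: the number of phonological classes, the random "articulatory posteriors", and the plain dynamic-time-warping alignment are illustrative assumptions standing in for the real posterior extractors and alignment procedure. It only shows the general mechanics of aligning per-frame posteriors from a generated video against a reference and reading off frame-level distances as an interpretable quality signal.

```python
"""Toy sketch (illustrative assumptions only): frame-level posterior-based scoring.

Two sequences of hypothetical per-frame "articulatory posteriors" are aligned
with dynamic time warping (DTW); distances along the warping path act as a
frame-level, interpretable quality signal for the generated signing.
"""
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dtw_align(ref, hyp):
    """Align two (T, K) posterior sequences; return total cost and warping path."""
    T, U = len(ref), len(hyp)
    # Pairwise frame distances: Euclidean distance between posterior vectors.
    dist = np.linalg.norm(ref[:, None, :] - hyp[None, :, :], axis=-1)
    acc = np.full((T + 1, U + 1), np.inf)
    acc[0, 0] = 0.0
    for t in range(1, T + 1):
        for u in range(1, U + 1):
            acc[t, u] = dist[t - 1, u - 1] + min(
                acc[t - 1, u], acc[t, u - 1], acc[t - 1, u - 1]
            )
    # Backtrack the optimal warping path from the accumulated-cost matrix.
    path, t, u = [], T, U
    while t > 0 and u > 0:
        path.append((t - 1, u - 1))
        step = np.argmin([acc[t - 1, u - 1], acc[t - 1, u], acc[t, u - 1]])
        if step == 0:
            t, u = t - 1, u - 1
        elif step == 1:
            t, u = t - 1, u
        else:
            t, u = t, u - 1
    return acc[T, U], path[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 5  # hypothetical number of phonological classes (assumption)
    reference = softmax(rng.normal(size=(40, K)))  # stand-in reference posteriors
    generated = softmax(rng.normal(size=(35, K)))  # stand-in generated-video posteriors
    total_cost, path = dtw_align(reference, generated)
    per_frame = [np.linalg.norm(reference[t] - generated[u]) for t, u in path]
    worst = path[int(np.argmax(per_frame))]
    print(f"DTW cost: {total_cost:.2f}; worst-aligned frame pair (ref, gen): {worst}")
```

In a real system the stand-in posteriors would come from trained handshape, location, and movement (or FAU-based non-manual) classifiers, and the per-frame distances could be surfaced directly to the user as localized feedback.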