Empowering deaf-hearing communication: Exploring synergies between predictive and generative AI-based strategies towards (Portuguese) sign language interpretation

doi:10.3390/jimaging9110235

Use this identifier to cite this record: https://hdl.handle.net/1822/88997

Title:	Empowering deaf-hearing communication: Exploring synergies between predictive and generative AI-based strategies towards (Portuguese) sign language interpretation
Author(s):	Adão, Telmo; Oliveira, José A.; Shahrabadi, Somayeh; Jesus, Hugo; Fernandes, Marco; Costa, Ângelo; Ferreira, Vânia; Gonçalves, Martinho Fradeira; Lopéz, Miguel A. Guevara; Peres, Emanuel; Magalhães, Luís Gonzaga Mendes
Keywords:	Deaf-hearing communication; Generative pre-trained transformer (GPT); Inclusion; Large language models (LLM); Long-short term memory (LSTM); Machine learning (ML); Portuguese sign language; Sign language recognition (SLR); Video-based motion analytics
Date:	1-Nov-2023
Publisher:	MDPI
Journal:	Journal of Imaging
Abstract(s):	Communication between Deaf and hearing individuals remains a persistent challenge requiring attention to foster inclusivity. Despite notable efforts in the development of digital solutions for sign language recognition (SLR), several issues persist, such as cross-platform interoperability and strategies for tokenizing signs to enable continuous conversations and coherent sentence construction. To address such issues, this paper proposes a non-invasive Portuguese Sign Language (Língua Gestual Portuguesa or LGP) interpretation system-as-a-service, leveraging skeletal posture sequence inference powered by long-short term memory (LSTM) architectures. To address the scarcity of examples during machine learning (ML) model training, dataset augmentation strategies are explored. Additionally, a buffer-based interaction technique is introduced to facilitate LGP terms tokenization. This technique provides real-time feedback to users, allowing them to gauge the time remaining to complete a sign, which aids in the construction of grammatically coherent sentences based on inferred terms/words. To support human-like conditioning rules for interpretation, a large language model (LLM) service is integrated. Experiments reveal that LSTM-based neural networks, trained with 50 LGP terms and subjected to data augmentation, achieved accuracy levels ranging from 80% to 95.6%. Users unanimously reported a high level of intuition when using the buffer-based interaction strategy for terms/words tokenization. Furthermore, tests with an LLM, specifically ChatGPT, demonstrated promising semantic correlation rates in generated sentences, comparable to expected sentences.
Type:	Article
URI:	https://hdl.handle.net/1822/88997
DOI:	10.3390/jimaging9110235
Peer reviewed:	yes
Access:	Open access
Appears in collections:	CAlg - Artigos em revistas internacionais / Papers in international journals

Files in this record:

File	Description	Size	Format
jimaging-09-00235-v2 (1).pdf		7.02 MB	Adobe PDF
