Case Report: Tailored automatic speech recognition in global aphasia with dysarthria - a single case proof of concept

Mulfari, Davide; Cardile, Davide; Vicario, Carmelo Mario; Corallo, Francesco; Mulfari, Salvatore; Tomaiuolo, Francesco
Author position: last
2026-01-01

Abstract

Background: Severe dysarthria and global aphasia drastically reduce speech intelligibility, confining communication to familiar partners. Automatic speech recognition (ASR) systems may perform poorly on such atypical speech.

Objective: To determine whether a speaker-dependent Voice-Input Voice-Output Communication Aid (VIVOCA) embedded in the CapisciAMe app can decode the speech of a person with severe dysarthria and aphasia more accurately than rehabilitation professionals serving as human listeners (RPHL).

Methods: We conducted a single-case proof-of-concept study. A 34-year-old woman, 15 years post-stroke, recorded 1,120 utterances of 13 target words across five prompting modalities. A compact convolutional neural network (cnn-trad-fpool3) was trained on these samples and evaluated on an independent set of 936 utterances. Its recognition accuracy was benchmarked against that of 12 RPHL familiar with the patient. The primary outcome was word-level accuracy.

Results: The tailored ASR achieved 72.65% accuracy, outperforming the familiar RPHL (mean = 56.75%, SD = 12.91).

Conclusions: A personalized ASR system can decode profoundly disordered speech more accurately than familiar human listeners, supporting its use as an assistive communication technology.
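The primary outcome, word-level accuracy, and its comparison with the listener panel can be illustrated with a few lines of arithmetic. The sketch below is a minimal illustration that uses only the summary figures in the abstract. The count of 680 correct responses is an inference (it is the only integer out of 936 that rounds to 72.65%), not a figure reported in the record, and the helper names `word_accuracy` and `standardized_difference` are hypothetical.

```python
# Minimal sketch: word-level accuracy and a descriptive comparison of the ASR
# score with the listener panel, using only the summary figures in the abstract.

def word_accuracy(n_correct: int, n_total: int) -> float:
    """Percentage of test utterances whose recognized word matches the target."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


def standardized_difference(score: float, mean: float, sd: float) -> float:
    """Number of listener-panel SDs by which the ASR score exceeds the panel mean."""
    return (score - mean) / sd


# ASSUMPTION: 680 correct out of 936 is inferred from the reported 72.65%;
# the record gives only the percentage.
asr_acc = word_accuracy(680, 936)
z = standardized_difference(asr_acc, mean=56.75, sd=12.91)

print(f"ASR accuracy: {asr_acc:.2f}%")  # ASR accuracy: 72.65%
print(f"Above listener mean: {asr_acc - 56.75:.2f} points ({z:.2f} SD)")  # 15.90 points (1.23 SD)
```

The standardized difference is purely descriptive; it places the ASR score roughly 1.2 listener SDs above the panel mean but is not a significance test.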
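The abstract names the network (cnn-trad-fpool3) but not its layer sizes. To show what "compact" means in practice, the sketch below works through the shape and weight arithmetic under the layout usually associated with that name: a 32-frame by 40-band log-mel input, a 20x8 convolution with 64 filters followed by non-overlapping max-pooling of 3 along frequency only, a 10x4 convolution with 64 filters, then a 32-unit linear layer, a 128-unit hidden layer and a 13-way softmax (one output per target word). All of these values, and the helper names `Conv` and `conv_stack_shapes`, are illustrative assumptions, not the configuration reported for CapisciAMe.

```python
# Sketch: output shapes and weight counts for a cnn-trad-fpool3-style
# keyword-spotting network. All hyperparameters are ASSUMPTIONS for
# illustration; the record does not list the actual configuration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    time: int           # filter extent along time (frames)
    freq: int           # filter extent along frequency (mel bands)
    filters: int        # number of output feature maps
    pool_freq: int = 1  # non-overlapping max-pool size along frequency only


def conv_stack_shapes(frames, bands, layers, in_channels=1):
    """Return the (time, freq, channels) shape after each conv layer
    (valid padding, stride 1) and the total convolution weights (no biases)."""
    shapes, weights = [], 0
    t, f, c = frames, bands, in_channels
    for layer in layers:
        weights += layer.time * layer.freq * c * layer.filters
        t, f, c = t - layer.time + 1, f - layer.freq + 1, layer.filters
        f //= layer.pool_freq  # "fpool": pooling along frequency, not time
        shapes.append((t, f, c))
    return shapes, weights


layers = [Conv(20, 8, 64, pool_freq=3), Conv(10, 4, 64)]
shapes, conv_weights = conv_stack_shapes(frames=32, bands=40, layers=layers)
t, f, c = shapes[-1]
flat = t * f * c

# ASSUMED dense head: linear (32) -> hidden (128) -> softmax over 13 words.
dense_weights = flat * 32 + 32 * 128 + 128 * 13

print(shapes)                               # [(13, 11, 64), (4, 8, 64)]
print(flat, conv_weights + dense_weights)   # 2048 245376
```

Under these assumptions the whole network has roughly 245 thousand weights, small enough to run on a smartphone, which is consistent with the abstract's description of the model as compact.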

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3356749
Note: the data displayed have not been validated by the university.
