Case Report: Tailored automatic speech recognition in global aphasia with dysarthria - a single case proof of concept
Mulfari, Davide; Cardile, Davide; Vicario, Carmelo Mario; Corallo, Francesco; Mulfari, Salvatore; Tomaiuolo, Francesco
2026-01-01
Abstract
Background: Severe dysarthria and global aphasia drastically reduce speech intelligibility, confining communication to familiar partners. Automatic speech recognition (ASR) systems may show limited performance when processing such atypical speech.

Objective: To determine whether a speaker-dependent Voice-Input Voice-Output Communication Aid (VIVOCA) embedded in the CapisciAMe app can decode the speech of a person with severe dysarthria and aphasia more accurately than rehabilitation-professional human listeners (RPHL).

Methods: We conducted a single-case proof-of-concept study. A 34-year-old woman, 15 years post-stroke, recorded 1,120 utterances of 13 target words across five prompting modalities. A compact convolutional neural network (cnn-trad-fpool3) was trained on these samples and evaluated on an independent set of 936 utterances. Intelligibility was benchmarked against 12 RPHL familiar with the patient. The primary outcome was word-level accuracy.

Results: The tailored ASR achieved 72.65% accuracy, outperforming the familiar RPHL (mean = 56.75%, SD = 12.91).

Conclusions: A personalized ASR system can recognize profoundly disordered speech more accurately than human listeners, supporting its use as an assistive communication technology.


