
Deep learning applications in telerehabilitation speech therapy scenarios

Mulfari D. (first author; Conceptualization); Celesti A. (penultimate author; Supervision); Villari M. (last author; Validation)
2022-01-01

Abstract

Many application scenarios now benefit from automatic speech recognition (ASR) technology. In speech therapy, ASR is sometimes used in the treatment of dysarthria to support articulation output. However, with atypical speech, standard ASR approaches do not give reliable recognition results, owing to issues that include: (i) the extreme intra- and inter-speaker variability of speech in the presence of impairments such as dysarthria; (ii) the absence of dedicated corpora containing voice samples from users with a speech disability with which to train a state-of-the-art speech model, particularly for languages other than English. In this paper, we focus on isolated word recognition for native Italian speakers with dysarthria, and we use an existing mobile app to collect audio data from users with speech disorders while they perform articulation exercises for speech therapy. With this data, a convolutional neural network was trained to spot a small number of keywords within atypical speech, following a speaker-dependent approach. Finally, we discuss the benefits of the trained ASR system in tailored telerehabilitation settings for patients with dysarthria, who can follow treatment plans under the supervision of remote speech-language pathologists.
Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3244993
Citations
  • PMC 3
  • Scopus 16
  • ISI 9