Data Fusion in Tele-Rehabilitation: Combining Cognitive and Motor Datasets to Train a Machine Learning Model for Assessing Patients' Outcomes
Lonia G.; Ciraolo D.; Calabro R. S.; Fazio M.; Villari M.; Celesti A.
2025-01-01
Abstract
This study explores the application of data fusion (DF) in tele-rehabilitation (TR). The idea is to combine the patient's cognitive and motor information captured during remote sessions using a low-cost webcam in order to train a Machine Learning (ML) model to assess the patient's outcomes. Data are captured by analysing video sequences of rehabilitation sessions using Face Expression Recognition (FER) and Pose Estimation (PE) models, which respectively provide cognitive and motor information. Furthermore, we apply a Long Short-Term Memory (LSTM) model to analyse and classify temporal sequences of both facial expression and movement data. Two DF techniques, i.e., early fusion and intermediate fusion, are compared by training an LSTM model on a dataset composed of skeletal movement data (UI-PRMD dataset) and facial mesh data. The early fusion approach combines raw features, whereas the intermediate fusion approach uses pre-processed features. Experiments demonstrate that the LSTM model trained with the intermediate fusion approach achieves the best performance. These findings highlight the potential of DF in enhancing TR by providing more accurate assessments of patient movements and emotional states, which is crucial for improving remote rehabilitation outcomes. Specifically, we evaluated the performance (i.e., accuracy, training time and inference time) of unimodal and multimodal training, using both early fusion and intermediate fusion techniques.
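The distinction the abstract draws between the two fusion strategies can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the feature dimensions (66 skeletal features, 1404 facial-mesh features), the sequence length, and the `encode` function (a random projection with temporal mean pooling, standing in for the paper's per-modality LSTM encoder) are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes for illustration: T frames, 66 skeletal features
# (e.g. 22 joints x 3D) and 1404 facial-mesh features (e.g. 468 landmarks x 3D).
T, N_POSE, N_FACE = 50, 66, 1404
pose_seq = rng.normal(size=(T, N_POSE))   # motor modality (Pose Estimation)
face_seq = rng.normal(size=(T, N_FACE))   # cognitive modality (facial mesh)

def early_fusion(pose, face):
    """Early fusion: concatenate raw per-frame features of both
    modalities before any modelling, yielding one joint sequence."""
    return np.concatenate([pose, face], axis=1)   # shape (T, N_POSE + N_FACE)

def encode(seq, dim=32):
    """Stand-in for a per-modality encoder (the paper uses an LSTM):
    a fixed random projection followed by temporal mean pooling."""
    w = np.random.default_rng(seq.shape[1]).normal(size=(seq.shape[1], dim))
    return (seq @ w).mean(axis=0)                 # shape (dim,)

def intermediate_fusion(pose, face):
    """Intermediate fusion: each modality is pre-processed (encoded)
    separately; the learned representations are fused afterwards."""
    return np.concatenate([encode(pose), encode(face)])  # shape (2 * dim,)

joint_raw = early_fusion(pose_seq, face_seq)          # (50, 1470)
joint_enc = intermediate_fusion(pose_seq, face_seq)   # (64,)
```

In the early-fusion case a single downstream classifier sees the full raw feature sequence, while in the intermediate-fusion case it sees a compact fused representation built from per-modality encoders.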


