YOLO for Human Heart Segmentation: from 2D CT Scans to 3D models
MEZZATESTA, SABRINA
2026-02-16
Abstract
Accurate segmentation of cardiac structures is a challenging but important task in medical imaging, as it allows for precise assessment of the heart’s morphological and functional characteristics. The aim of this thesis was to train and evaluate an unconventional choice of model for multi-class segmentation of key cardiac structures, with particular emphasis on reducing computational cost and inference time for potential application in the hospital setting. To this end, three models from the YOLO (You Only Look Once) family, which was primarily developed for real-time object detection, were used. A non-public dataset was collected, consisting of 99 Computed Tomography (CT) scans (Cardiac, Thoracic, and Total Body) partially annotated by expert clinicians. Four additional datasets were derived from it: some through data augmentation techniques and others by reducing the total to 70 CT scans, removing cases with poor annotations. The YOLOv5x, YOLOv8n, and YOLO11n models were tested, and YOLOv8n performed best. Specifically, the best results were obtained with the dataset reduced to 70 CT scans, YOLO’s default data augmentation techniques, and automatic hyperparameter optimization via Optuna. With this setup, YOLOv8n achieved an average Intersection over Union (IoU) of 0.85, Dice Similarity Coefficient (DSC) of 0.91, Hausdorff Distance (HD) of 16.37 mm, and Average Symmetric Surface Distance (ASSD) of 1.16 mm in multi-class segmentation of cardiac structures. These results are satisfactory and comparable to the DSC and IoU values reported in the literature for U-Net models. They pave the way for future work aimed at improving the model's performance by modifying the architecture, for example by adding spatial attention modules to improve recognition of more complex regions.
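For readers unfamiliar with the overlap metrics quoted above, the following is a minimal sketch of how per-class IoU and DSC can be computed from a predicted and a reference label map. It is not the evaluation code used in the thesis: the function name `per_class_overlap`, the convention that class 0 is background, and the toy label maps are all assumptions made for illustration. The distance-based metrics (HD and ASSD) are not covered here.

```python
from itertools import chain


def per_class_overlap(pred, truth, num_classes):
    """Return {class_id: (iou, dsc)} for two label maps of equal shape.

    pred and truth are 2D lists of integer class labels. Class 0 is
    treated as background and skipped (an assumption of this sketch).
    """
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(truth))
    if len(p) != len(t):
        raise ValueError("label maps must have the same number of pixels")

    scores = {}
    for c in range(1, num_classes):
        inter = sum(1 for a, b in zip(p, t) if a == c and b == c)
        n_pred = sum(1 for a in p if a == c)
        n_true = sum(1 for b in t if b == c)
        union = n_pred + n_true - inter
        if union == 0:
            continue  # class absent from both maps: score undefined, skip
        iou = inter / union                   # |A ∩ B| / |A ∪ B|
        dsc = 2 * inter / (n_pred + n_true)   # 2|A ∩ B| / (|A| + |B|)
        scores[c] = (iou, dsc)
    return scores


# Toy 3x4 label maps with two foreground classes (hypothetical data).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 2],
         [0, 0, 2, 2]]
pred = [[0, 1, 1, 0],
        [0, 1, 2, 2],
        [0, 0, 2, 2]]

scores = per_class_overlap(pred, truth, num_classes=3)
for class_id, (iou, dsc) in scores.items():
    print(f"class {class_id}: IoU={iou:.3f}, DSC={dsc:.3f}")
```

For a single pair of masks the two metrics are linked by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU. This identity holds per mask and need not hold exactly for values averaged over classes or scans.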


