Comparing CNNs and ViTs for Medical Image Classification Leveraging Transfer Learning
Lonia G.; Ciraolo D.; Fazio M.; Villari M.; Celesti A.
2024-01-01
Abstract
In recent years, medical image analysis has progressed significantly, mainly owing to substantial advances in deep learning methods. Over the past decade, Convolutional Neural Networks (CNNs) were the leading models for image classification, demonstrating remarkable success in various medical applications. However, the advent of Vision Transformers (ViTs) has challenged the dominance of CNN approaches. This study explores the potential of ViTs in healthcare, comparing their performance with that of CNN models. CNNs have traditionally excelled at image feature extraction through convolutional operations; ViTs, by contrast, rely on self-attention mechanisms and can capture long-range dependencies, which enables them to model complex patterns within images effectively. After analyzing both architectures, we assessed the behaviour of from-scratch and pre-trained models, highlighting their differences in performance and shedding light on the applicability of the Transfer Learning (TL) approach in the healthcare scenario.