Comparing CNNs and ViTs for Medical Image Classification Leveraging Transfer Learning

Lonia G.; Ciraolo D.; Fazio M.; Villari M.; Celesti A.
2024-01-01

Abstract

In recent years, medical image analysis has made significant progress, mainly due to substantial advances in deep learning methods. Over the past decade, Convolutional Neural Networks (CNNs) were the dominant models for image classification, demonstrating remarkable success in various medical applications. However, the advent of Vision Transformers (ViTs) has challenged the dominance of CNN approaches. This study explores the potential of ViTs in healthcare, comparing their performance with that of CNN models. CNNs have traditionally excelled at image feature extraction through convolutional operations; ViTs, which rely on self-attention mechanisms, can model long-range dependencies and thereby capture complex patterns within images. After analyzing both architectures, we assessed the behaviour of from-scratch and pre-trained models, highlighting their differences in performance and shedding light on the applicability of the Transfer Learning (TL) approach in the healthcare scenario.
Year: 2024
ISBN: 9798350354232
Use this identifier to cite or link to this document: https://hdl.handle.net/11570/3346048