Clustering-Based Data Reduction for Significant Wave Height Modeling in the Mediterranean Sea: A Multi-Site Analysis
Sapuppo, Francesca; Ragusa, Giovanni; Patanè, Luca; Iuppa, Claudio; Faraci, Carla; Xibilia, Maria Gabriella
2025-01-01
Abstract
Accurate modeling of significant wave height (Hs) is critical for maritime safety, coastal engineering, and offshore operations. In this study, we propose a clustering-based data reduction approach to support the training of predictive models on large spatiotemporal marine datasets. The methodology is applied to wave measurements from four buoys of the Italian National Wave Recording Network (RON) in the Mediterranean Sea, located at Mazara del Vallo, Ponza, Monopoli, and Ancona, complemented by wind data from the ECMWF reanalysis. A time-window segmentation strategy is employed, followed by K-means clustering to extract representative samples. The optimal number of clusters is determined via the total variance criterion and the elbow method, so that the selected samples remain statistically representative and capture key wave dynamics such as storm events and calm conditions. The results demonstrate the method's capability to reduce the training dataset without compromising variability in Hs. Furthermore, we assess the spatial transferability of clustering solutions across sites using the normalized total variance; the results indicate that centroids extracted in one region can generalize well to other buoys. The proposed approach enables the development of computationally efficient and physically interpretable forecasting models.
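The pipeline summarized in the abstract (time-window segmentation, K-means clustering, a total variance curve for the elbow method, and a normalized total variance for cross-site transfer) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the synthetic Hs-like series standing in for two buoys, the window width and step, the function names, the plain Lloyd's K-means, and the particular normalization (within-cluster sum of squares divided by the spread of the windows around their mean). The wind inputs are omitted.

```python
import numpy as np

def make_windows(series, width, step):
    """Slice a 1-D series into overlapping windows of `width` samples."""
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

def assign(windows, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per window."""
    d2 = ((windows[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(windows, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns a (k, width) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = windows[rng.choice(len(windows), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(windows, centroids)
        for j in range(k):
            members = windows[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def total_variance(windows, centroids):
    """Sum of squared distances from each window to its nearest centroid."""
    labels = assign(windows, centroids)
    return float(((windows - centroids[labels]) ** 2).sum())

def normalized_total_variance(windows, centroids):
    """Total variance divided by the spread of the windows around their mean
    (an assumed normalization, chosen so that sites can be compared)."""
    spread = float(((windows - windows.mean(axis=0)) ** 2).sum())
    return total_variance(windows, centroids) / spread

# Synthetic Hs-like series for two hypothetical sites (not RON data).
rng = np.random.default_rng(42)
t = np.arange(2000)
site_a = 1.0 + 0.8 * np.sin(2 * np.pi * t / 200) ** 2 + 0.1 * rng.standard_normal(t.size)
site_b = 1.2 + 0.7 * np.sin(2 * np.pi * t / 240) ** 2 + 0.1 * rng.standard_normal(t.size)

win_a = make_windows(site_a, width=24, step=12)
win_b = make_windows(site_b, width=24, step=12)

# Total variance versus number of clusters: the curve inspected for an elbow.
curve = {k: total_variance(win_a, kmeans(win_a, k)) for k in range(1, 9)}
for k, value in curve.items():
    print(k, round(value, 2))

# Centroids fitted on site A, scored on site A and then reused on site B.
centroids_a = kmeans(win_a, k=4)
ntv_same = normalized_total_variance(win_a, centroids_a)
ntv_transfer = normalized_total_variance(win_b, centroids_a)
print("site A:", round(ntv_same, 3), "site B with A's centroids:", round(ntv_transfer, 3))
```

In this sketch the reduced training set would be the centroids themselves, or the windows closest to them. A transfer score on site B close to the score on site A would suggest that site A's centroids also describe site B's windows reasonably well.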