Missing data is a very frequent problem in climatology, it influences on the quality of results that will afford in hydrological studies, as well as water resources management. This paper proposes a new imputation algorithm, based on the optimization of some regression methods, which are hot deck, k-nearest-neighbors imputation, weighted k-nearest-neighbors imputation, multiple imputation, linear regression and simple average method. The choice of these methods was justified by qualitative and quantitative statistical tests analysis. However, the reliability of obtained results depends mainly on percentage of missing data, choice of neighboring stations and data missingness mechanism which should be missing at random. During the study it was found that the most of stations in Soummam watershed don't have a good correlation because the large loss in rainfall data or the geology of watershed which gives a relationship between station position and rainfall variability. For this case, principal component analysis is applied on a set of stations; it showed a positive impact of altitude, latitude and longitude on correlation index between selected stations. The graphical analysis of the normal law on RMSE values, which were obtained by applying the proposed technique in several random cases of missingness, that are 4%, 8%, 12% and 16% respectively, it confirmed the validity and the performance of this approach.
A new approach for processing climate missing databases applied to daily rainfall data in Soummam watershed, Algeria
Scarpa, Marco;Bonaccorso, BrunellaPenultimo
;
2019-01-01
Abstract
Missing data is a very frequent problem in climatology, it influences on the quality of results that will afford in hydrological studies, as well as water resources management. This paper proposes a new imputation algorithm, based on the optimization of some regression methods, which are hot deck, k-nearest-neighbors imputation, weighted k-nearest-neighbors imputation, multiple imputation, linear regression and simple average method. The choice of these methods was justified by qualitative and quantitative statistical tests analysis. However, the reliability of obtained results depends mainly on percentage of missing data, choice of neighboring stations and data missingness mechanism which should be missing at random. During the study it was found that the most of stations in Soummam watershed don't have a good correlation because the large loss in rainfall data or the geology of watershed which gives a relationship between station position and rainfall variability. For this case, principal component analysis is applied on a set of stations; it showed a positive impact of altitude, latitude and longitude on correlation index between selected stations. The graphical analysis of the normal law on RMSE values, which were obtained by applying the proposed technique in several random cases of missingness, that are 4%, 8%, 12% and 16% respectively, it confirmed the validity and the performance of this approach.File | Dimensione | Formato | |
---|---|---|---|
A-new-approach-for-processing-climate-missing-databases-applied-to-daily-rainfall-data-in-Soummam-watershed-Algeria2019HeliyonOpen-Access.pdf
accesso aperto
Descrizione: Articolo principale
Tipologia:
Versione Editoriale (PDF)
Licenza:
Creative commons
Dimensione
2.27 MB
Formato
Adobe PDF
|
2.27 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.