Overcoming Data Scarcity in Earth Science
Etcheverry Venturini, Lorena
Chreties Ceriani, Christian
Castro Casales, Alberto
Environmental mathematical models are one of the key aids scientists use to forecast, create, and evaluate complex scenarios. These models rely heavily on data collected by direct field observations. However, assembling a functional and comprehensive dataset for any environmental variable is difficult, mainly because of i) the high cost of monitoring campaigns and ii) the low reliability of measurements (e.g., due to equipment malfunctions and/or issues related to equipment location). A lack of sufficient Earth science data may lead to an inadequate representation of the complexity of an environmental system's response to any type of input or change, whether natural or human-induced. In such a case, before undertaking expensive studies to gather and analyze additional data, it is reasonable to first understand how much estimates of system performance would improve if all the available data were fully exploited. Missing data imputation is an important task wherever it is crucial to use all available data rather than discard records with missing values. Different approaches are available to deal with missing data. Traditional statistical data completion methods are used in different domains to deal with single and multiple imputation problems. More recently, machine learning techniques, such as clustering and classification, have been proposed to complete missing data. This book showcases the body of knowledge aimed at improving the capacity to exploit available data to better represent, understand, predict, and manage the behavior of environmental systems at all practical scales.