Self-Organizing Maps for Imputation of Missing Data in Incomplete Data Matrices


Bibliographic Details
Main authors: Folguera, Laura; Zupan, Jure; Cicerone, Daniel; Magallanes, Jorge
Format: Full text
Language: English
Published: Elsevier Science BV, 2015
Online access: https://ri.unsam.edu.ar/handle/123456789/1009
Description
Summary: The problem of incomplete data matrices arises repeatedly in large databases and poses a significant obstacle to the effective treatment of data. This paper examines a self-organizing map (SOM) based method of data imputation, under the concept of "distance object per one weight", to predict physicochemical parameters of water samples in a data set where concentrations of different analytes were missing. The method was evaluated in two configurations: (a) including vectors of samples with and without missing data in the training data set, and (b) pre-training a SOM on a data set with no missing values and then making imputations for a second data set (prediction set) of samples with missing values. Evaluations were made using a surface water data set of 270 samples from the Reconquista River, in Buenos Aires Province, Argentina, by artificially setting 17% to 39% of the data to missing. Results were compared with imputations made through professional criteria. SOMs gave reasonable estimates, with no statistically significant differences from estimates made through professional criteria, and thus proved to be a suitable, time-saving imputation method.
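
The general idea of SOM-based imputation described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and does not reproduce their "distance object per one weight" concept. It assumes a common variant in which the best-matching unit is found using only the observed components of each sample, and missing components are then read off that unit's weight vector. The function names (`best_matching_unit`, `train_som`, `impute`), the grid size, and the learning schedule are all hypothetical choices made for illustration.

```python
import numpy as np


def best_matching_unit(W, x, mask):
    """Index of the unit closest to x, using only the components where mask is True."""
    diff = W[:, mask] - x[mask]
    return int(np.argmin(np.sum(diff ** 2, axis=1)))


def train_som(X, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on X (n_samples, n_features); NaN marks a missing value.

    Assumptions: features are on comparable scales, and every column has
    at least one observed value.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)  # placeholder zeros are never used in updates

    # Initialise weights uniformly within the observed range of each feature.
    lo, hi = np.nanmin(X, axis=0), np.nanmax(X, axis=0)
    W = rng.uniform(lo, hi, size=(rows * cols, d))

    # Grid coordinates of each unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5    # shrinking neighbourhood radius
            m = observed[i]
            bmu = best_matching_unit(W, X0[i], m)
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
            # Update only the weight components that were observed for this sample.
            W[:, m] += lr * h[:, None] * (X0[i, m] - W[:, m])
            step += 1
    return W


def impute(X, W):
    """Fill each NaN in X with the matching component of the sample's best-matching unit."""
    X_filled = X.copy()
    for i in range(X.shape[0]):
        m = ~np.isnan(X[i])
        if m.all():
            continue
        bmu = best_matching_unit(W, np.where(m, X[i], 0.0), m)
        X_filled[i, ~m] = W[bmu, ~m]
    return X_filled
```

Under these assumptions, configuration (a) corresponds to calling `train_som` on the full matrix, missing values included, and then `impute` on the same matrix. Configuration (b) corresponds to calling `train_som` on a complete data set and `impute` on a separate prediction set.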