Self-Organizing Maps for Imputation of Missing Data in Incomplete Data Matrices

The problem of incomplete data matrices arises repeatedly in large databases, posing a significant obstacle to the effective treatment of data. This paper examines a self-organizing-map (SOM) based method of data imputation under the concept of distance object per one weight, to predict physicoche...


Bibliographic Details
Main authors: Folguera, Laura; Zupan, Jure; Cicerone, Daniel; Magallanes, Jorge
Format: Full text
Language: English
Published: Elsevier Science Bv, 2015
Subjects:
Online access: https://ri.unsam.edu.ar/handle/123456789/1009
id ds-123456789-1009
record_format dspace
institution Repositorio Institucional
collection RI
language English
topic CHEMOMETRICS
ARTIFICIAL NEURAL NETWORK
SELF-ORGANIZING MAPS
MISSING DATA IMPUTATION
ENVIRONMENTAL DATA SET
CHEMICAL SCIENCES
EXACT AND NATURAL SCIENCES
description The problem of incomplete data matrices arises repeatedly in large databases, posing a significant obstacle to the effective treatment of data. This paper examines a self-organizing-map (SOM) based method of data imputation under the concept of distance object per one weight, to predict physicochemical parameters of water samples in a data set where concentrations of different analytes were missing. The method was evaluated in two configurations: (a) including vectors of samples with and without missing data in the training data set, and (b) pre-training a SOM on a data set with no missing values and then making imputations for a second data set (prediction set) of samples with missing values. Evaluations were made using a surface water data set of 270 samples from the Reconquista River, in Buenos Aires Province, Argentina, by artificially setting between 17% and 39% of the data to missing. Results were compared to imputations made through professional criteria. SOMs gave reasonable estimates, with no statistically significant differences from estimates made through professional criteria, thus proving to be a suitable time-saving imputation method.
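The imputation idea summarized in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the grid size, learning schedule, function names (`best_matching_unit`, `train_som`, `impute`), and the synthetic data are all assumptions made for illustration. In particular, "distance object per one weight" is read here as a squared distance computed only over the observed variables and divided by their number, which is an interpretation rather than a confirmed definition. The sketch corresponds to configuration (a), where samples with and without missing values are used together for training, and it assumes the variables are already on comparable scales.

```python
import numpy as np

def best_matching_unit(W, x):
    """Index of the neuron closest to x, using only the observed entries of x.

    Assumption: the distance is averaged over the observed variables, so that
    samples with different numbers of missing values remain comparable.
    """
    m = ~np.isnan(x)
    d2 = np.sum((W[:, m] - x[m]) ** 2, axis=1) / max(m.sum(), 1)
    return int(np.argmin(d2))

def train_som(X, rows=6, cols=6, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a SOM on X (n_samples, n_features); NaN marks a missing value."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    observed = ~np.isnan(X)
    # Initialise weights inside the observed range of each variable.
    lo, hi = np.nanmin(X, axis=0), np.nanmax(X, axis=0)
    W = rng.uniform(lo, hi, size=(rows * cols, d))
    # Grid coordinates of each neuron, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            m = observed[i]
            bmu = best_matching_unit(W, X[i])
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
            # Update only the weights that correspond to observed variables.
            W[:, m] += lr * h[:, None] * (X[i, m] - W[:, m])
            step += 1
    return W

def impute(W, X):
    """Fill each missing entry with the matching weight of the sample's best-matching unit."""
    X_filled = X.copy()
    for i in range(X.shape[0]):
        missing = np.isnan(X[i])
        if missing.any():
            bmu = best_matching_unit(W, X[i])
            X_filled[i, missing] = W[bmu, missing]
    return X_filled

# Synthetic stand-in data (NOT the Reconquista River data set): 270 samples,
# 4 correlated variables, with 20% of the entries set to missing at random.
rng = np.random.default_rng(1)
latent = rng.normal(size=(270, 1))
X_true = latent @ np.array([[1.0, -0.8, 0.5, 0.3]]) + 0.1 * rng.normal(size=(270, 4))
mask = rng.random(X_true.shape) < 0.2
X_missing = np.where(mask, np.nan, X_true)

W = train_som(X_missing)
X_imputed = impute(W, X_missing)
rmse = np.sqrt(np.mean((X_imputed[mask] - X_true[mask]) ** 2))
print(f"RMSE on artificially masked entries: {rmse:.3f}")
```

Configuration (b) would follow the same pattern: call `train_som` on the complete samples only, then call `impute` with those weights on the second (prediction) set. Masking known values and comparing the imputations against them mirrors the evaluation strategy described above, although the error measure used here is only one possible choice.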
format Full text
author Folguera, Laura
Zupan, Jure
Cicerone, Daniel
Magallanes, Jorge
title Self-Organizing Maps for Imputation of Missing Data in Incomplete Data Matrices
publisher Elsevier Science Bv
publishDate 2015
url https://ri.unsam.edu.ar/handle/123456789/1009