A. González Cebrián, A. Folch Fortuny, F. J. Arteaga Moreno, A. J. Ferrer Riquelme
Multivariate datasets often contain missing data and/or cellwise and rowwise outliers. Several solutions have been proposed for each of these issues separately, but far fewer techniques address them simultaneously. In this talk we introduce the RadarTSR algorithm, a Robust Adaptation for Data with Anomalous Rows and/or cells of the Trimmed Scores Regression (TSR) method [1]. RadarTSR detects cellwise and rowwise outliers, imputes missing data without the harmful effect of those outliers and, when grouped rowwise outliers are detected, imputes them with their own model. The performance of RadarTSR is compared with that of the MacroPCA algorithm [2], which is, to the best of our knowledge, the only other proposal that handles missing data while accounting for both types of outliers. The comparison uses several simulated and real data sets.
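For readers unfamiliar with the underlying method, the following is a minimal sketch of plain, non-robust TSR imputation in Python with NumPy. It is not RadarTSR: it performs no outlier detection and fits a single PCA model. The function name `tsr_impute`, its parameters, and the simulated data are hypothetical, and the regression step is written under the assumption that TSR predicts the missing part of each row from the scores of its observed part, as we understand the method in [1].

```python
import numpy as np

def tsr_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Illustrative, non-robust TSR-style imputation (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means computed on the observed entries only.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        previous = filled.copy()
        mean = filled.mean(axis=0)
        centered = filled - mean
        S = np.cov(centered, rowvar=False)
        # PCA loadings: eigenvectors of S, sorted by decreasing eigenvalue.
        _, eigvecs = np.linalg.eigh(S)
        P = eigvecs[:, ::-1][:, :n_components]
        for i in np.flatnonzero(missing.any(axis=1)):
            m = missing[i]
            o = ~m
            if not o.any():
                continue  # nothing observed in this row: keep the column means
            P_o = P[o]
            # Assumed TSR step: regress missing variables on trimmed scores.
            scores_cov = P_o.T @ S[np.ix_(o, o)] @ P_o
            B = S[np.ix_(m, o)] @ P_o @ np.linalg.pinv(scores_cov) @ P_o.T
            filled[i, m] = mean[m] + B @ centered[i, o]
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled

# Simulated example: 50 rows, 5 variables, approximately rank 2, ~10% missing.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5))
X_true += 0.05 * rng.normal(size=X_true.shape)
X_missing = X_true.copy()
X_missing[rng.random(X_true.shape) < 0.1] = np.nan
X_imputed = tsr_impute(X_missing, n_components=2)
```

Because the covariance matrix and loadings in this sketch are estimated from all rows, a few outlying cells or rows can distort every imputed value, which is the weakness a robust adaptation is meant to address.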
[1] Folch-Fortuny et al. (2015) Chemometrics and Intelligent Laboratory Systems 146: 77–88.
[2] Hubert et al. (2018) Technometrics 61 (4): 1–18.
Keywords: Missing data imputation, cellwise and rowwise outliers, trimmed scores regression, principal component analysis
Scheduled
Multivariate Analysis I
June 8, 2022 4:00 PM
Cloister room