J. Palarea-Albaladejo, J. A. Martín-Fernández, A. Ruiz-Gazen, C. Thomas-Agnan
High-throughput data representing large mixtures of chemical or biological signals have been characterised as compositional data, that is, multivariate data whose variables convey only relative information and are hence properly analysed through log-ratio coordinate representations. Log-ratios, however, cannot be computed when the data matrix includes entries that are zero (commonly related to censoring below a detection limit) or missing for diverse experimental reasons. Completing the data matrix by imputing sensible values allows subsequent analysis and modelling to proceed. Building on an adapted form of singular value decomposition, we present an imputation method that addresses both types of unobserved values while accounting for the compositional nature of the data. Simulated and real data sets are used to assess the performance of the proposal and demonstrate its use. The method is shown to be a particularly efficient alternative in the high-throughput context.
Keywords: zeros, missing data, nondetects, compositional data, singular value decomposition, log-ratio analysis
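To illustrate why zeros and missing entries block log-ratio analysis, the following minimal Python sketch computes a centred log-ratio (clr) representation. It is not the imputation method presented in the talk. The abstract does not say which log-ratio coordinates are used, so the choice of clr, the function name clr and the example vectors are all assumptions made for illustration.

```python
import numpy as np


def clr(x):
    """Centred log-ratio of one composition (hypothetical helper, not from the talk).

    Each part is expressed as a log-ratio to the geometric mean of all parts.
    """
    x = np.asarray(x, dtype=float)
    # The logarithm is undefined at zero and a missing part cannot be used,
    # so the representation exists only for strictly positive, complete data.
    if np.any(np.isnan(x)) or np.any(x <= 0):
        raise ValueError("clr needs strictly positive, fully observed parts")
    log_x = np.log(x)
    return log_x - log_x.mean()


# Illustrative values only; they are not data from the study.
complete = np.array([0.2, 0.3, 0.5])
scaled = 100 * complete              # same relative information, different total
with_zero = np.array([0.0, 0.4, 0.6])

print(clr(complete))
print(np.allclose(clr(complete), clr(scaled)))

try:
    clr(with_zero)
except ValueError as err:
    print("cannot transform:", err)
```

Rescaling the composition leaves the clr values unchanged, which reflects the point that only relative information matters, while a single zero is enough to make the transformation unavailable until the entry is imputed.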
Scheduled
Multivariate Analysis I
June 8, 2022 4:00 PM
Cloister room