Á. Cía Mina, J. López Fidalgo
Subsampling is widely used to reduce data volume so that estimators in regression models remain computable. Usually, a weight is assigned to each data point and a subset is selected according to these weights. The subsample can be chosen at random (passive learning), but better estimators can be obtained by using optimal experimental design theory to search for an influential subsample (active learning). This approach has been developed in the literature for linear and logistic regression, yielding algorithms based on D-optimality and A-optimality. To the authors' knowledge, the distribution of the explanatory variables has never been considered when obtaining a subsample. We study the effect of the distribution of the explanatory variables on the estimation as well as on the optimal design. We propose a novel method to obtain optimal subsamples through D-optimality, taking into account the marginal distribution of the covariates.
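To make the contrast between passive and active subsampling concrete, the following is a minimal sketch for a linear model. It is not the method proposed in this talk, which additionally uses the marginal distribution of the covariates; it is a generic greedy D-optimal selection, used here only for illustration. The function names, the simulated Gaussian covariates, the sample sizes, and the small `ridge` regularisation term are all assumptions introduced for this example.

```python
import numpy as np


def log_det_information(X):
    """Log-determinant of the information matrix X^T X of a linear model."""
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf


def random_subsample(n, k, rng):
    """Passive learning: draw k of the n row indices uniformly, without replacement."""
    return rng.choice(n, size=k, replace=False)


def greedy_d_optimal_subsample(X, k, ridge=1e-6):
    """Active learning (illustrative): greedily add the row maximising x^T M^{-1} x.

    By the matrix determinant lemma, det(M + x x^T) = det(M) * (1 + x^T M^{-1} x),
    so the row with the largest x^T M^{-1} x gives the largest increase in det(M).
    M starts at ridge * I (an assumption of this sketch) so that it is invertible.
    """
    n, p = X.shape
    M_inv = np.eye(p) / ridge
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        # Variance function d(x) = x^T M^{-1} x for every row of X.
        d = np.einsum("ij,jk,ik->i", X, M_inv, X)
        d[~available] = -np.inf  # never pick the same row twice
        i = int(np.argmax(d))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison rank-one update of the inverse after adding x x^T.
        x = X[i]
        Mx = M_inv @ x
        M_inv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)
    return np.array(chosen)


# Simulated full data set: intercept plus two standard normal covariates.
rng = np.random.default_rng(0)
n, k = 5000, 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

passive = random_subsample(n, k, rng)
active = greedy_d_optimal_subsample(X, k)

print("log det, random subsample:  ", log_det_information(X[passive]))
print("log det, greedy D-optimal:  ", log_det_information(X[active]))
```

A larger log-determinant of the information matrix corresponds to a smaller generalised variance of the least-squares estimator, which is the sense in which a D-optimal subsample is more informative than a random one of the same size.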
Keywords: Active learning, subsampling, optimal design of experiments
Scheduled
GT07 Design of Experiments III
June 8, 2022, 17:20
Conference Room