A. Mayr
Statistical boosting is a computational approach that emerged from machine learning and makes it possible to fit regression models in the presence of high-dimensional data. In addition, the algorithm performs data-driven variable selection. In more classical low-dimensional settings, however, the final models typically tend to include too many variables. This is due to the slow overfitting behavior of boosting: additional variables are included in the final model without noticeably altering prediction accuracy. Many of these false positives enter with small coefficients and therefore have little impact on predictions, but they lead to larger and less interpretable models. We address this issue by giving the algorithm the chance to deselect base-learners of minor importance, or to stop before they are selected in the first place. The approach is illustrated by a real-life biostatistical application, highlighting that variable selection is also important in classical clinical trials and registries.
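As a rough illustration of the general idea, the following is a minimal, self-contained sketch (in Python with NumPy, for illustration only and not the implementation used in this work) of component-wise L2-boosting followed by a deselection step: base-learners whose share of the total risk reduction falls below an assumed threshold tau are removed and the model is refitted on the remaining covariates. The simulated data, variable names, and the threshold value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated low-dimensional setting: only the first 3 of 10 covariates are informative
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.5, 1.0] + [0.0] * (p - 3))
y = X @ beta_true + rng.standard_normal(n)

def componentwise_boost(X, y, mstop=250, nu=0.1):
    """Component-wise L2-boosting: in each iteration, fit every univariate
    least-squares base-learner to the current residuals and update only the
    best-fitting one by a small step nu."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    risk_reduction = np.zeros(p)  # risk reduction attributed to each base-learner
    for _ in range(mstop):
        # univariate least-squares fits to the current residuals
        b = (X * resid[:, None]).sum(axis=0) / (X ** 2).sum(axis=0)
        rss = ((resid[:, None] - X * b) ** 2).sum(axis=0)
        j = np.argmin(rss)  # best-fitting base-learner in this iteration
        old_risk = (resid ** 2).sum()
        coef[j] += nu * b[j]
        resid -= nu * b[j] * X[:, j]
        risk_reduction[j] += old_risk - (resid ** 2).sum()
        # (in practice, mstop would be tuned, e.g. by cross-validation)
    return intercept, coef, risk_reduction

intercept, coef, rr = componentwise_boost(X, y)

# Deselection step (illustrative): drop base-learners whose share of the total
# risk reduction is below an assumed threshold tau, then refit on the rest.
tau = 0.01
keep = rr / rr.sum() >= tau
print("selected before deselection:", np.flatnonzero(coef != 0.0))
print("kept after deselection:     ", np.flatnonzero(keep))

_, coef_deselected, _ = componentwise_boost(X[:, keep], y)
```

In this sketch the risk reduction attributable to each base-learner is tracked during fitting, so deselection only requires one additional boosting run on the reduced set of covariates; the exact importance criterion and threshold used in the actual approach may differ.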
Keywords: statistical learning, biostatistics, prediction modelling, regression, variable selection
Scheduled
GT14 Biostatistics. The role of Biostatistics in Health Data Science
June 9, 2022 12:00 PM
A13