P. Malagón Selma, A. Debón Aucejo, A. J. Ferrer Riquelme
This talk compares three supervised multivariate techniques, partial least squares discriminant analysis (PLS-DA), random forest (RF) and logistic regression, to identify the game actions that help football teams reach the top positions or avoid the bottom ones. Data from teams in the "Big Five" European leagues during the 2018-2019 season were used. For RF, we propose a permutation test to compute p-values for assessing statistical significance. The results were compared with those obtained using two-sample t-tests, demonstrating the advantages of multivariate approaches over univariate ones. In this case, PLS-DA outperforms the other methods in identifying the variables that contribute most to the success or failure of a team. The results highlight the high number of attacking actions made by top teams, whereas bottom teams have weak defenses and few offensive actions. Classification errors are used to evaluate the impact of chance on the final outcome.
Keywords: Principal component analysis (PCA), multivariate supervised methods, t-test, partial least squares discriminant analysis (PLS-DA), random forest (RF), logistic regression (LR), game actions
Scheduled
Invited Session: Data Analysis in Sport
June 8, 2022, 17:20
Audiovisuales Room
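
The abstract mentions a permutation test for obtaining p-values but does not describe its details. The general idea of a label-permutation test can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the statistic used here (an absolute difference in class means) is a simple stand-in for an RF variable importance, and the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Stand-in statistic (assumption): absolute difference between class means.

    In an RF setting this would be replaced by a variable importance measure.
    """
    top = [v for v, y in zip(values, labels) if y == 1]
    bottom = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(top) - mean(bottom))


def permutation_p_value(values, labels, statistic, n_permutations=999, seed=0):
    """Estimate a p-value by shuffling class labels to build a null distribution."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling breaks any link between the variable and the class label.
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical toy data: one game action (say, shots per match) for 8 teams.
shots = [16.1, 15.4, 14.9, 15.8, 10.2, 9.7, 11.0, 10.5]
is_top = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(shots, is_top, mean_difference)
```

A small p-value indicates that the observed statistic is rarely matched when labels are assigned at random. Passing the statistic as a parameter keeps the shuffling logic independent of the model used to score each variable.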