P. urruchi, J. López Fidalgo
Text data is difficult to incorporate into statistical analysis for its
high dimensionality and unstructured form. A very high dimensional classification
problem is here presented and the challenge is to reduce estimators into a bunch of
meaningful tokens, i.e variables, by performing some sort of regularization. The
goal is not just to achieve an accurate classification, but getting a meaningful
quantitative model for later qualitative analysis using resulting tokens which
are interpretable for qualitative analyst with knowledge in the respective field.
Applications reach fields traditionally approached qualitatively such as health,
sociology, psychology, politics and so on.
Palabras clave: Lasso, Regression, Coordinate Descent, Proximal Gradient
Programado
GT07 Diseño de Experimentos II
8 de junio de 2022 16:00
Sala de Conferencias