P. Urruchi, J. López Fidalgo
Text data is difficult to incorporate into statistical analysis because of its
high dimensionality and unstructured form. We present a very high-dimensional
classification problem in which the challenge is to reduce the model to a small set of
meaningful tokens, i.e., variables, by means of regularization. The
goal is not only to achieve accurate classification but also to obtain a meaningful
quantitative model for later qualitative analysis, using the selected tokens, which
are interpretable by qualitative analysts with knowledge of the respective field.
Applications extend to fields traditionally approached qualitatively, such as health,
sociology, psychology, and politics.
Keywords: Lasso, Regression, Coordinate Descent, Proximal Gradient
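To illustrate how an L1 penalty can shrink a bag-of-words model down to a few tokens, the following is a minimal sketch of lasso regression fitted by cyclic coordinate descent, one of the methods named in the keywords. It is not the authors' implementation: the toy documents, the labels coded as +1/-1, the penalty value `lam=0.2`, and the function names `soft_threshold` and `lasso_coordinate_descent` are all assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    # Mean squared value of each column (the per-coordinate curvature).
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # token never occurs; its weight stays at zero
            # Correlation of column j with the residual that excludes token j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy corpus: two "clinical" and two "everyday" documents.
docs = ["fever pain the", "fever cough the", "happy walk the", "happy sun the"]
labels = [1.0, 1.0, -1.0, -1.0]

# Bag-of-words design matrix: one column of counts per token.
vocab = sorted({tok for d in docs for tok in d.split()})
X = [[float(d.split().count(tok)) for tok in vocab] for d in docs]

w = lasso_coordinate_descent(X, labels, lam=0.2)
selected = {tok: coef for tok, coef in zip(vocab, w) if coef != 0.0}
print(selected)
```

On this toy corpus the penalty leaves only `fever` (positive weight) and `happy` (negative weight) in the model, while the uninformative token `the` and the single-document tokens are set exactly to zero. A real application would need a proper tokenizer, an intercept or centred labels, and a data-driven choice of `lam`, for example by cross-validation.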
Scheduled
GT07 Design of Experiments II
June 8, 2022 4:00 PM
Conference hall