# Grid search for elastic net regularization

This post is a footnote to the documentation of the `glmnet` package and the `tidymodels` framework. `glmnet` is best known for fitting models via penalized maximum likelihood, such as ridge, lasso and elastic net regression. As explained in its documentation, `glmnet` solves the problem

$\begin{align} \min_{\beta_0,\beta}\frac{1}{N}\sum_{i=1}^N w_i\, l(y_i,\beta_0+\beta^T x_i)+\lambda\left[(1-\alpha)\|\beta\|_2^2/2+\alpha\|\beta\|_1\right] \end{align}$

where `glmnet::glmnet()` conducts a grid search over values of *$\lambda$*, which controls the overall strength of the penalty in the second term. When *$\alpha = 1$* we speak of lasso regression, which can shrink coefficients exactly to zero (discarding them), while ridge regression ($\alpha = 0$) does not remove features.
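To make the penalty term concrete, here is a small base-R sketch (not part of the post's workflow) that computes $\lambda[(1-\alpha)\|\beta\|_2^2/2+\alpha\|\beta\|_1]$ for a toy coefficient vector. At $\alpha = 0$ only the squared L2 (ridge) term contributes; at $\alpha = 1$ only the L1 (lasso) term does.

```r
# elastic net penalty for a coefficient vector beta
en_penalty <- function(beta, lambda, alpha) {
  ridge <- sum(beta^2) / 2   # squared L2 norm, halved
  lasso <- sum(abs(beta))    # L1 norm
  lambda * ((1 - alpha) * ridge + alpha * lasso)
}

beta <- c(2, -1, 0.5)
en_penalty(beta, lambda = 1, alpha = 0)  # pure ridge: (4 + 1 + 0.25) / 2 = 2.625
en_penalty(beta, lambda = 1, alpha = 1)  # pure lasso: 2 + 1 + 0.5 = 3.5
```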

`glmnet::glmnet()` expects the user to set the value of *$\alpha \in [0,1]$*. In practice, however, many users will not know whether ridge, lasso or a mixture of the two is most appropriate for their problem. Especially when we only care about minimizing *$|\hat{y} - y|$*, we might want to conduct a grid search over both *$\lambda$* and *$\alpha$* to find the most suitable tuple of hyperparameter values for our model. The example below shows that this dual search is particularly easy to implement with `tidymodels`.

## House prices in Slovakia

To exemplify the grid search I use a toy dataset with possible determinants of average quarterly house prices ($y$) in Slovakia. Let’s pretend that we have no idea about what affects house prices. Our design matrix *$X$* consists of a bunch of more or less reasonable predictors like average wage and net migration.

```
# load the data
data <- "https://github.com/michalovadek/misc/blob/main/sk_house_prices.RData?raw=true"
load(url(data))
```

We will need the `tidymodels`

set of packages and a seed for replicability.

```
library(tidymodels)
#require(glmnet)
set.seed(94241)
```

Our goal is to find a penalized regression model that is good at predicting house prices with the available predictors. To mitigate overfitting, we will both split our data into a training and a test set and deploy k-fold cross-validation.
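For intuition, the 3/4 split simply samples row indices without replacement, and k-fold cross-validation assigns each training row to one of k folds. A base-R sketch of the idea (illustrative only; `rsample`'s `initial_split()` and `vfold_cv()` handle this for us below):

```r
set.seed(94241)
n <- 100                                       # pretend we have 100 observations
train_idx <- sample(n, size = floor(3/4 * n))  # 75 rows for training
test_idx  <- setdiff(seq_len(n), train_idx)    # remaining 25 for testing
# assign each training row to one of 10 folds
fold_id <- sample(rep(1:10, length.out = length(train_idx)))
length(train_idx)  # 75
length(test_idx)   # 25
```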

```
# let's not worry about missing values and time ids
data <- dat_agg |>
  dplyr::select(-year, -quarter) |>
  tidyr::fill(dplyr::everything(), .direction = "updown")
# we keep 1/4 of the observations for testing
split <- initial_split(data,
                       prop = 3/4)
train_data <- training(split)
test_data <- testing(split)
# k-fold cross-validation
folds <- vfold_cv(train_data, v = 10)
```

We define the model and the workflow for fitting it. We will be conducting a grid search over both `penalty` ($\lambda$) and `mixture` ($\alpha$).

```
# define model
model <- linear_reg(penalty = tune(), mixture = tune()) |>
  set_engine("glmnet")
# RHS in formula selects all predictors
model_wf <- workflow() |>
  add_model(model) |>
  add_formula(price ~ .)
```

Now for the grid search. We are looking for the best combination of *$\lambda$* and *$\alpha$*. We create this grid with the appropriately named `expand_grid()` (or `expand.grid()` in base R).

```
# suitable values of lambda will vary
# you can run glmnet::glmnet() to get the ballpark range for your data
grid <- tidyr::expand_grid(penalty = seq(0.1, 15, length.out = 50),
                           mixture = c(0, 0.33, 0.66, 1))
```
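The same Cartesian product can be built with base `expand.grid()`; either way, the search space here has 50 × 4 = 200 candidate hyperparameter pairs:

```r
# base-R equivalent of the tidyr::expand_grid() call above
grid_base <- expand.grid(penalty = seq(0.1, 15, length.out = 50),
                         mixture = c(0, 0.33, 0.66, 1))
nrow(grid_base)  # 200 combinations of penalty and mixture
```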

We are ready to search the hyperparameter space using our k-folded training data.

```
# we will use R^2 to evaluate model performance
search_res <- model_wf |>
  tune_grid(folds,
            grid = grid,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(rsq))
```

The results can be parsed manually by running `collect_metrics()`, but `tidymodels` comes with a handy `autoplot()` method that gives us a quick answer as to which hyperparameter tuple performed best.

```
autoplot(search_res)
```

We can immediately see some clear differences between the model options. For the selected range of *$\lambda$*, the performance of ridge regression does not change. Given how small our toy dataset is (and $N_{obs} > N_{predictors}$), it is not surprising that small penalties perform best.

The workflow is completed by picking and fitting the best model and seeing how it performs on unseen data (our test set).

```
# select model by highest R^2
best_param <- search_res |> select_best(metric = "rsq")
# final workflow
model_wf_final <- finalize_workflow(model_wf, best_param)
# fit final model
final_fit <- model_wf_final |>
  fit(data = train_data)
# predict using test set
test_data$.pred <- predict(final_fit, new_data = test_data)$.pred
```

```
# plot y and yhat
test_data |>
  ggplot(aes(x = price, y = .pred)) +
  geom_point() +
  geom_abline(slope = 1, color = "grey30") +
  theme_minimal()
```

Looks good. Final performance metrics are obtained via `metrics()`:

```
metrics(test_data, truth = price, estimate = .pred)
```

```
## # A tibble: 3 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard      72.5
## 2 rsq     standard       0.938
## 3 mae     standard      47.1

And that’s it. `glmnet` and `tidymodels` are a powerful and easy-to-use duo for high-dimensional data analysis.