Grid search for elastic net regularization

This post is a footnote to documentation to the glmnet package and the tidymodels framework. glmnet is best known for fitting models via penalized maximum likelihood like ridge, lasso and elastic net regression. As explained in its documentatiom, glmnet solves the problem

$\begin{align} \min_{\beta_0,\beta_1}\frac{1}{N}\sum_{i=1}^Nw_il(y_i,\beta_0+β^Tx_i)+\lambda[(1−\alpha)∥\beta∥^2_2/2+\alpha∥\beta∥_1] \end{align}$

where glmnet::glmnet() conducts a grid search over values of $\lambda$ which controls the overall strength of the penalty in the second term. When $\alpha = 1$ we speak of lasso regression which can shrink coefficients to zero (discard them), while ridge regression ($\alpha = 0$) does not remove features.

glmnet::glmnet() expects the user to set the value of $\alpha \in [0,1]$. In practice, however, many users will not know whether ridge, lasso or their mixture is most appropriate to their problem. Especially when we only care about minimizing $|\hat{y} - y|$ we might want to conduct a grid search over both $\lambda$ and $\alpha$ to find the most suitable tuple of hyperparameter values for our model. The example below shows that this dual search is particularly easy to implement with tidymodels.

House prices in Slovakia

To exemplify the grid search I use a toy dataset with possible determinants of average quarterly house prices ($y$) in Slovakia. Let’s pretend that we have no idea about what affects house prices. Our design matrix $X$ consists of a bunch of more or less reasonable predictors like average wage and net migration.

# load the data
data <- ""

We will need the tidymodels set of packages and a seed for replicability.



Our goal is to find a penalized regression model that is good at predicting house prices with the available predictors. To mitigate overfitting, we will both split our data into a training and a test set and deploy k-fold cross-validation.

# let's not worry about missing values and time ids
data <- dat_agg |> 
  dplyr::select(-year,-quarter) |> 
  tidyr::fill(dplyr::everything(), .direction = "updown")

# we keep 1/4 of the observations for testing
split <- initial_split(data,
                       prop = 3/4)

train_data <- training(split)
test_data <- testing(split)

# k-fold cross-validation
folds <- vfold_cv(train_data, v = 10)

We define the model and workflow for fitting it. We will be conducting a grid search over both penalty ($\lambda$) and mixture ($\alpha$).

# define model
model <- linear_reg(penalty = tune(), mixture = tune()) |>

# RHS in formula selects all predictors
model_wf <- workflow() |>
  add_model(model) |>
  add_formula(price ~ .)

Now for the grid search. We are looking for the best combination of $\lambda$ and $\alpha$. We create this grid with the appropriately named expand_grid() (or expand.grid in base).

# suitable values of lambda will vary
# you can run glmnet::glmnet() to get the ballpark range for your data
grid <- tidyr::expand_grid(penalty = seq(0.1, 15, length.out = 50),
                           mixture = c(0, 0.33, 0.66, 1))

We are ready to search the hyperparameter space using our k-folded training data.

# we will use R^2 to evaluate model performance
search_res <- model_wf |> 
            grid = grid,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(rsq))

The results can be parsed manually by running collect_metrics() but tidymodels comes with a handy autoplot() method that will give us a quick answer as to which hyperparameter tuple performed best.


We can immediately see some clear differences between the model options. For the selected range of $\lambda$, the performance of ridge regression does not change. Given how small our toy dataset is (and $N_{obs} > N_{predictors}$), it is not surprising that small penalties perform best.

The workflow is completed by picking and fitting the best model and seeing how it performs on unseen data (our test set).

# select model by highest R^2
best_param <- search_res |> select_best()

# final workflow
model_wf_final <- finalize_workflow(model_wf, best_param)

# fit final model
final_fit <- model_wf_final |> 
  fit(data = train_data)

# predict using test set
test_data$.pred <- predict(final_fit, new_data = test_data)$.pred
# plot y and yhat
test_data |> 
  ggplot(aes(x = price, y = .pred)) +
  geom_point() +
  geom_abline(slope = 1, color = "grey30") +

Looks good. Final performance metrics are obtained via metrics():

metrics(test_data, truth = price, estimate = .pred)
## # A tibble: 3 x 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard      72.5  
## 2 rsq     standard       0.938
## 3 mae     standard      47.1

And that’s it. glmnet and tidymodels are a powerful and easy-to-use duo for high-dimensional data analysis.

Michal Ovádek
Michal Ovádek
Postdoctoral researcher

I am a scholar of law and politics in the European Union.