Hyperparameters in DL

For a given machine learning model, there are usually several hyperparameters configuring it. Tweaking their values to reach the model's optimum performance is what is referred to as hyperparameter tuning. This could be done manually, by defining a range of possible values for each hyperparameter and using nested for loops to go over all possible combinations, as sketched below. But that would be tedious and quickly becomes impractical, as the number of combinations grows very fast - and recall that a single training run already involves numerous computations!
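To make the manual approach concrete, here is a minimal sketch of such nested loops in Python; the `train_and_evaluate` function and the candidate values are purely hypothetical stand-ins for a real training run:

```python
from itertools import product

# Hypothetical stand-in for a real training run; returns a fake validation score.
def train_and_evaluate(learning_rate, batch_size, n_layers):
    return -abs(learning_rate - 0.01) - abs(batch_size - 64) / 1000 - abs(n_layers - 2) / 10

learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [32, 64, 128]
layer_counts = [1, 2, 3]

best_score, best_params = float("-inf"), None
# itertools.product flattens the nested loops over every possible combination.
for lr, bs, nl in product(learning_rates, batch_sizes, layer_counts):
    score = train_and_evaluate(learning_rate=lr, batch_size=bs, n_layers=nl)
    if score > best_score:
        best_score, best_params = score, (lr, bs, nl)

print(best_params, best_score)
```

Even this toy example already requires 3 × 3 × 3 = 27 training runs; with realistic models and more hyperparameters, the count explodes.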

Exercise

If your model has five hyperparameters and you want to try 10 different values for each of them, how many tuning combinations will there be?

Luckily, some tools are available to do the tuning!

Summary

The exhaustive Grid Search method is good for a restricted hyperparameter space. It requires some prior knowledge from the user about ballpark values of hyperparameters that are known to perform well (e.g. \(\alpha\) between 0.01 and 0.1). It can be a first step for tuning hyperparameters.
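As an illustrative sketch (not the lesson's own code), scikit-learn's `GridSearchCV` tries every combination of an explicit grid; the model, dataset and candidate values below are assumptions chosen only to keep the example small and runnable:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Small, explicit grid built from ballpark values believed to work well.
param_grid = {
    "alpha": [0.01, 0.05, 0.1],            # L2 penalty
    "learning_rate_init": [0.001, 0.01],
    "hidden_layer_sizes": [(50,), (100,)],
}

search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=3,        # 3-fold cross-validation for each combination
    n_jobs=-1,   # use all available cores
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Here the grid has 3 × 2 × 2 = 12 combinations, each evaluated with 3-fold cross-validation, so 36 training runs in total.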

The Randomized Search is preferable when the hyperparameter space is large. It can take longer, but the user has more control over the execution time, as it directly depends on the number of sampled combinations. It is the search to opt for when you do not know which hyperparameter values would work.
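Again as a hedged sketch, scikit-learn's `RandomizedSearchCV` samples a fixed number of combinations from distributions, so the run time is bounded by `n_iter`; the distributions and model below are assumptions:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Distributions (or lists) to sample from, instead of a fixed grid.
param_distributions = {
    "alpha": loguniform(1e-5, 1e-1),
    "learning_rate_init": loguniform(1e-4, 1e-1),
    "hidden_layer_sizes": [(25,), (50,), (100,), (50, 50)],
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_distributions,
    n_iter=20,        # number of sampled combinations -> controls run time
    cv=3,
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Whatever the size of the search space, only `n_iter=20` combinations are trained (times the number of cross-validation folds).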

There are more advanced methods for hyperparameter tuning such as Bayesian Optimization and Evolutionary Optimization.
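For instance, a library such as Optuna (not covered above, shown here only as a hedged sketch of Bayesian optimization) uses the results of previous trials to propose promising values for the next ones; the `train_and_evaluate` helper below is hypothetical:

```python
import optuna

# Hypothetical stand-in for a full training run; returns a fake validation score.
def train_and_evaluate(learning_rate, n_layers):
    return -abs(learning_rate - 0.01) - abs(n_layers - 2) / 10

def objective(trial):
    # The sampler proposes values informed by earlier trials.
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    n_layers = trial.suggest_int("n_layers", 1, 4)
    return train_and_evaluate(learning_rate=lr, n_layers=n_layers)

# Optuna's default TPE sampler is a form of Bayesian optimization.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```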

Learn More

Yoshua Bengio, “Practical Recommendations for Gradient-Based Training of Deep Architectures” (2012) arXiv:1206.5533

“Hyperparameter Optimization With Random Search and Grid Search” on machinelearningmastery.com

“Comparing Randomized Search and Grid Search for Hyperparameter Estimation” on Scikit-Learn