Parameters and hyperparameters: a conundrum in machine and deep learning

“Be confused. It is where you begin to learn new things.”

-S.C. Lourie, British writer


As we plan for the upcoming inaugural comprehensive review course for the nascent American Board of Artificial Intelligence in Medicine and its certification assessment, several topics appear to be conundra, especially for those who do not come from the data science domain, and are therefore worth delineating here at AIMed.

One such conundrum is the difference between parameters and hyperparameters, terms that are too often used interchangeably but describe distinct concepts with distinct roles. Model hyperparameters are sometimes referred to as model parameters, and this unfortunate mingling of terms can understandably lead to confusion.

A model parameter is a feature that is internal to the model and is learned from the training data; parameters are usually not set manually by the data scientist. Examples of parameters include the coefficients in linear or logistic regression, the split points in decision trees, the support vectors in support vector machines, and the weights and biases in artificial neural networks.

A model hyperparameter, on the other hand, is a configuration that is external to the model and is not learned from the training data; it is set by the data scientist before training because it cannot be estimated from the data (in other words, it is independent of the data). Think of a hyperparameter as a knob on a radio that is tuned to get a clear signal. Examples of hyperparameters include the shrinkage factor in ridge regression, the depth of trees in decision trees, the kernel in support vector machines, k in k-nearest neighbors, and many architectural and training choices in neural networks (number of hidden layers, number of nodes per layer, learning rate, type of activation function, dropout, batch size, momentum, weight decay, etc.). It should also be noted that a few algorithms (such as ordinary linear regression) do not require any hyperparameters at all.
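The distinction can be made concrete with a small sketch. In the illustrative (and deliberately simplified) snippet below, a straight line is fitted to data by gradient descent: the weight and bias are parameters, learned from the data during training, while the learning rate and the number of epochs are hyperparameters, chosen by the practitioner before training begins. The data and settings are hypothetical, chosen only for illustration.

```python
def train(xs, ys, learning_rate=0.01, epochs=1000):
    """Fit y = w*x + b by gradient descent on mean squared error.

    w and b are PARAMETERS: internal to the model, learned from data.
    learning_rate and epochs are HYPERPARAMETERS: external settings,
    fixed before training and not estimated from the data.
    """
    w, b = 0.0, 0.0  # parameters, initialized arbitrarily
    n = len(xs)
    for _ in range(epochs):
        # gradients of mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1; training should recover these values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train(xs, ys, learning_rate=0.05, epochs=2000)
print(round(w, 2), round(b, 2))  # prints 2.0 1.0
```

Note that changing the hyperparameters (say, a much larger learning rate) can change whether training converges at all, even though the hyperparameters themselves never appear in the fitted model.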

While model training yields the best model parameters, hyperparameter tuning (or optimization) produces the best hyperparameters. Strategies for hyperparameter tuning include grid search (computationally expensive as a result of the curse of dimensionality), random search, and Bayesian optimization. One common mistake is for the data scientist to rely conveniently on the default hyperparameter values rather than truly optimizing them. It is important to avoid this error, since the hyperparameters shape the training process that determines the values of the parameters, and thus whether the model achieves its best performance.
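A minimal sketch of grid search, the simplest of these strategies, follows. Using the shrinkage factor of a one-dimensional ridge regression as the hyperparameter, each candidate value on a hand-chosen grid is used to train on the training split, and the value yielding the lowest error on a held-out validation split is kept. The splits and grid values here are hypothetical, chosen only to illustrate the mechanics.

```python
def ridge_fit(xs, ys, lam):
    """Closed-form 1-D ridge regression (no intercept).
    The returned weight w is a parameter; lam is a hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the fitted weight on a data split."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical train/validation splits drawn from y = 2x plus noise.
train_x, train_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
val_x, val_y = [1.5, 2.5], [3.0, 5.1]

# The grid of candidate hyperparameter values is chosen by hand,
# unlike parameters, which the training step estimates from data.
grid = [0.0, 0.05, 0.1, 1.0]
best_lam = min(
    grid,
    key=lambda lam: mse(val_x, val_y, ridge_fit(train_x, train_y, lam)),
)
w_best = ridge_fit(train_x, train_y, best_lam)
print(best_lam, round(w_best, 3))  # prints 0.05 2.028
```

Random search replaces the fixed grid with values sampled from a distribution, and Bayesian optimization chooses each new candidate based on the results of the previous ones; both avoid exhaustively enumerating a grid that grows exponentially with the number of hyperparameters.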

In our next newsletters, we will discuss other common conundra (or conundrums since the origin of this word is uncertain) in artificial intelligence in medicine and healthcare.
