Baseline Modeling



Baseline Modeling (Khattak)

How can it be confirmed that a trained model performs well enough to add value?


Several algorithms can often be applied to the same type of machine learning problem. Without knowing how much additional value, if any, one model offers over the others, a sub-optimal model may be selected, wasting time and processing resources.


A baseline is established using a simple algorithm, and its result serves as a yardstick for measuring the effectiveness of more complex models.


For regression problems, the mean or median of the target values can be used as the baseline result. For classification problems, a most-frequent (ZeroR), stratified, or uniform model can be used to generate the baseline result.
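The baseline strategies named above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not a reference implementation: the function names `regression_baseline` and `classification_baseline`, the `seed` parameter, and the sample data are all assumptions introduced here.

```python
import random
import statistics
from collections import Counter


def regression_baseline(y_train, strategy="mean"):
    """Return one constant prediction learned from the training targets."""
    if strategy == "mean":
        return statistics.mean(y_train)
    if strategy == "median":
        return statistics.median(y_train)
    raise ValueError(f"unknown strategy: {strategy}")


def classification_baseline(y_train, n_predictions,
                            strategy="most_frequent", seed=0):
    """Return n_predictions labels without looking at any input features."""
    counts = Counter(y_train)
    labels = sorted(counts)
    rng = random.Random(seed)  # seeded so the random strategies are repeatable
    if strategy == "most_frequent":
        # ZeroR: always predict the majority class of the training labels.
        majority = counts.most_common(1)[0][0]
        return [majority] * n_predictions
    if strategy == "stratified":
        # Sample labels in proportion to their training frequency.
        weights = [counts[label] for label in labels]
        return rng.choices(labels, weights=weights, k=n_predictions)
    if strategy == "uniform":
        # Sample every known label with equal probability.
        return rng.choices(labels, k=n_predictions)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical sample data, for illustration only.
y_reg = [10.0, 12.0, 11.0, 40.0]
mean_pred = regression_baseline(y_reg, "mean")      # 18.25
median_pred = regression_baseline(y_reg, "median")  # 11.5

y_cls = ["spam", "ham", "ham", "ham"]
zero_r_pred = classification_baseline(y_cls, 3)     # ['ham', 'ham', 'ham']
stratified_pred = classification_baseline(y_cls, 5, strategy="stratified")
uniform_pred = classification_baseline(y_cls, 5, strategy="uniform")
```

The median baseline is less affected by the outlier (40.0) than the mean, which is one reason to check both before choosing the yardstick for a regression task.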


Query Engine, Analytics Engine, Processing Engine, Resource Manager, Storage Device, Visualization Engine









A dataset is prepared (1). The ZeroR algorithm is used to train a baseline model, which achieves an accuracy of 50% (2, 3, 4). A classification algorithm is then chosen and a model is trained (5, 6). Upon comparison, the trained model’s accuracy of 45% is worse than that of the baseline model. This prompts either the selection of a different classification algorithm or the tuning of the trained model’s parameters (7).
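The comparison step in this scenario might look like the following minimal sketch. The labels are made up to reproduce the 50% and 45% figures, `candidate_pred` is a hard-coded stand-in for the output of a trained classifier, and the helper names (`accuracy`, `zero_r_predict`, `beats_baseline`) are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def zero_r_predict(y_train, n_predictions):
    """ZeroR: predict the majority training class for every sample."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n_predictions


def beats_baseline(model_acc, baseline_acc, margin=0.0):
    """True only if the model exceeds the baseline by more than margin."""
    return model_acc > baseline_acc + margin


# Made-up labels: "yes" is the majority class in the training split.
y_train = ["yes"] * 11 + ["no"] * 9
y_test = ["yes"] * 10 + ["no"] * 10

baseline_pred = zero_r_predict(y_train, len(y_test))
baseline_acc = accuracy(y_test, baseline_pred)  # 10 of 20 correct

# Stand-in for a trained classifier's predictions (assumed, not computed).
candidate_pred = ["yes"] * 4 + ["no"] * 6 + ["no"] * 5 + ["yes"] * 5
candidate_acc = accuracy(y_test, candidate_pred)  # 9 of 20 correct

if beats_baseline(candidate_acc, baseline_acc):
    decision = "keep the trained model"
else:
    decision = "try a different algorithm or tune parameters"

print(f"baseline={baseline_acc:.2f} candidate={candidate_acc:.2f} -> {decision}")
```

The optional `margin` argument is a design choice of this sketch: requiring a model to beat the baseline by some minimum amount can guard against accepting a complex model whose advantage is within noise.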


This pattern is covered in Machine Learning Module 2: Advanced Machine Learning.
