Machine Learning Patterns, Mechanisms > Model Evaluation Patterns > Baseline Modeling
Baseline Modeling (Khattak)
How can it be assured that a trained model is a performant model that adds value?
Problem
There may be more than one type of algorithm that can be applied to solve a particular type of machine learning problem. However, without knowing how much extra value, if at all, one model carries over others, a sub-optimal model may be selected with the further possibility of experiencing lost time and processing resources.
Solution
A baseline is established via a simple algorithm that serves as a yardstick for measuring the effectiveness of other complex models.
Application
For regression problems, mean or median can be used as a baseline result while most frequent (ZeroR), stratified, or uniform models can be used to generate a baseline result for classification problems.
Mechanisms
Query Engine, Analytics Engine, Processing Engine, Resource Manager, Storage Device, Visualization Engine
A dataset is prepared (1). The ZeroR algorithm is used to train a baseline model, achieving an accuracy of 50% (2, 3, 4). A classification algorithm is then chosen and the model is trained (5, 6). Upon comparison, the trained model’s accuracy, which is now 45%, is worse than the baseline model. This prompts either the selection of a different classification algorithm or the parameters of the trained model to be tuned (7).