Feature Selection
How can only the most relevant set of features be extracted from a dataset for model development?
Developing a simple yet effective machine learning model requires selecting only the features with the greatest predictive potential. However, when a dataset comprises a large number of features, a trial-and-error approach wastes time and processing resources.
The dataset is analyzed methodically and only a subset of its features is kept for model development, which keeps the model simple yet effective.
Established feature selection techniques, such as forward selection, backward elimination and decision tree induction, are applied to the dataset to help filter out the features that do not significantly contribute towards building an effective yet simple model.
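As an illustration of one of these techniques, the following is a minimal sketch of forward selection, not a definitive implementation. It starts with no features and greedily adds the one that most improves a score, stopping when no remaining feature helps. The scoring function (leave-one-out accuracy of a 1-nearest-neighbour classifier), the names `loo_1nn_accuracy` and `forward_selection`, the `min_gain` parameter, and the toy data are all assumptions made for this example; they do not come from the pattern itself.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_1nn_accuracy(X: List[Row], y: List[int], cols: List[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in `cols`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # hold out the row being predicted
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_selection(
    X: List[Row],
    y: List[int],
    score: Callable[[List[Row], List[int], List[int]], float] = loo_1nn_accuracy,
    min_gain: float = 1e-9,
) -> Tuple[List[int], float]:
    """Greedily add the feature that improves `score` the most;
    stop when the best candidate improves it by no more than `min_gain`."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    best_score = 0.0
    while remaining:
        scores = {c: score(X, y, selected + [c]) for c in remaining}
        candidate = max(remaining, key=scores.get)  # lowest index wins ties
        if scores[candidate] - best_score <= min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


# Hypothetical toy data: column 0 separates the classes, columns 1 and 2 are noise.
X = [
    [0.1, 5.0, 3.0],
    [0.2, 1.0, 9.0],
    [0.3, 7.0, 1.0],
    [0.9, 6.0, 2.0],
    [1.0, 2.0, 8.0],
    [1.1, 8.0, 4.0],
]
y = [0, 0, 0, 1, 1, 1]

selected, accuracy = forward_selection(X, y)
print(selected, accuracy)
```

Backward elimination follows the same loop in reverse: it starts from the full feature set and repeatedly drops the feature whose removal hurts the score the least.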
Query Engine, Analytics Engine, Processing Engine, Resource Manager, Storage Device
A dataset consisting of a large number of features is prepared (1). The analytics engine mechanism assists with feature selection by applying the decision tree induction technique to the dataset (2). This produces a subset of the original training dataset that contains only the most relevant features (3). The reduced dataset is then used to train a new model (4, 5). The resulting model has increased accuracy, takes less time to train and to make predictions, and suffers only slightly from overfitting (6).
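Steps 2 and 3 can be sketched in simplified form as follows. Full decision tree induction builds a tree and keeps the features that appear in it. This sketch only ranks categorical features by information gain, the split criterion used by ID3-style trees, and keeps those above a threshold. The function names, the `threshold` value, and the sample rows are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Sequence[str]], labels: Sequence[str], col: int) -> float:
    """Reduction in label entropy obtained by splitting on column `col`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[col]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_by_information_gain(
    rows: List[Sequence[str]],
    labels: Sequence[str],
    names: List[str],
    threshold: float = 0.1,  # assumed cut-off; tune per dataset
) -> Tuple[List[str], Dict[str, float]]:
    """Return the feature names whose gain is at least `threshold`, plus all gains."""
    gains = {name: information_gain(rows, labels, i) for i, name in enumerate(names)}
    kept = [name for name in names if gains[name] >= threshold]
    return kept, gains


def project(rows: List[Sequence[str]], names: List[str], kept: List[str]) -> List[List[str]]:
    """Build the reduced dataset containing only the kept feature columns."""
    idx = [names.index(name) for name in kept]
    return [[row[i] for i in idx] for row in rows]


# Hypothetical dataset: only "outlook" is related to the label.
names = ["outlook", "windy", "parity"]
rows = [
    ["sunny", "no", "even"],
    ["sunny", "yes", "odd"],
    ["rainy", "no", "even"],
    ["rainy", "yes", "odd"],
]
labels = ["yes", "yes", "no", "no"]

kept, gains = select_by_information_gain(rows, labels, names)
reduced_rows = project(rows, names, kept)  # input for training the new model
print(kept, reduced_rows)
```

Information gain tends to favor features with many distinct values, so a ranking like this is best treated as a first filter, not a final answer.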