Machine Learning Patterns, Mechanisms > Unsupervised Learning Patterns > Category Discovery
Category Discovery (Khattak)
How can data be categorized into meaningful groups when the groups are not known in advance?
Knowing plausible categories to which data might belong is not always possible, which makes it impossible to use classification algorithms to categorize data into relevant categories.
A clustering model is built that automatically groups similar data points into the same categories based on the intrinsic similarities between the attributes of data.
Clustering algorithms, such as K-means, K-medians and hierarchical clustering, are employed to build the clustering model that organizes the data points into homogenous categories.
A dataset contains data about the spending behavior of customers in a retail store (1). An understanding of the data is required to be gained by finding customers that behave similarly, for which the K-Means algorithm is applied whereby the value of K is 3 (2). This results in similar customers being grouped together into three groups (3). By looking at the centroid of each group, meaningful labels are then allocated to each group (4).