Category Discovery | Arcitura Patterns

Machine Learning Patterns, Mechanisms > Unsupervised Learning Patterns > Category Discovery

Category Discovery (Khattak)

How can data be categorized into meaningful groups when the groups are not known in advance?

Problem

Knowing plausible categories to which data might belong is not always possible, which makes it impossible to use classification algorithms to categorize data into relevant categories.

Solution

A clustering model is built that automatically groups similar data points into the same categories based on the intrinsic similarities between the attributes of data.

Application

Clustering algorithms, such as K-means, K-medians and hierarchical clustering, are employed to build the clustering model that organizes the data points into homogenous categories.

Mechanisms

Query Engine, Analytics Engine, Processing Engine, Resource Manager, Storage Device, Visualization Engine

A dataset contains data about the spending behavior of customers in a retail store (1). An understanding of the data is required to be gained by finding customers that behave similarly, for which the K-Means algorithm is applied whereby the value of K is 3 (2). This results in similar customers being grouped together into three groups (3). By looking at the centroid of each group, meaningful labels are then allocated to each group (4).

Module 12: Fundamental Service API Design & Management

This pattern is covered in Machine Learning Module 2: Advanced Machine Learning.

For more information regarding the Machine Learning Specialist curriculum, visit www.arcitura.com/machinelearning.