Variability Computation | Arcitura Patterns

Variability Computation (Khattak)

How can the spread of values for a single variable in a dataset be determined?

Problem

Developing an intuition about a dataset involves determining the behavior of uncommon values of data. Failure to do so may result in treating abnormal values as normal.

Solution

The behavior of the uncommon values in a dataset is expressed in the form of the spread of values and is quantified via the application of proven statistical techniques.

Application

The numerical values in the dataset are identified and measures of variation, including range, interquartile range (IQR), variance, and standard deviation, are calculated.
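As a rough illustration of the four measures named above, the following minimal sketch uses only Python's standard `statistics` module. The function name `variability` and the sample list are hypothetical and are not taken from the pattern. The sketch also assumes population variance and standard deviation and the "inclusive" quartile method; other conventions (for example the sample versions, `statistics.variance` and `statistics.stdev`) would give slightly different numbers.

```python
import statistics


def variability(values):
    """Return range, IQR, variance, and standard deviation of a numeric list.

    Assumption: population variance/standard deviation and the "inclusive"
    quartile method are used; the pattern does not say which conventions apply.
    """
    # quantiles() with n=4 returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical sample values, chosen only to keep the arithmetic easy to follow.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
measures = variability(sample)
print(measures)
```

The range reacts to a single extreme value, while the IQR ignores the outer quarters of the data, so comparing the two can hint at whether the spread is driven by a few unusual values.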

Mechanisms

Query Engine, Analytics Engine, Processing Engine, Resource Manager, Storage Device

Variable x belongs to a dataset. An understanding of the values of variable x is required. It is determined that the most common value is 5 (1). In order to gain insight into the behavior of the remaining values, the measures of variation are calculated (2). Based on the standard deviation, it is determined that the average spread of values from the mean is 1.93, and that the value 8 lies more than 1 standard deviation above the mean (the mean is 5.2, so 1 standard deviation above it is 5.2 + 1.93 = 7.13).
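The check described in this example can be sketched as follows. Only the mean (5.2) and standard deviation (1.93) quoted above are used, because the underlying values of x are not listed. The helper name `beyond_one_std` is hypothetical, and the lower bound is simply the mirror of the upper bound given in the text.

```python
MEAN = 5.2      # mean of variable x, as stated in the example
STD_DEV = 1.93  # standard deviation of variable x, as stated in the example


def beyond_one_std(value, mean=MEAN, std_dev=STD_DEV):
    """Return True if value lies more than one standard deviation from the mean."""
    return abs(value - mean) > std_dev


# Bounds of the one-standard-deviation band around the mean.
upper = round(MEAN + STD_DEV, 2)  # 5.2 + 1.93
lower = round(MEAN - STD_DEV, 2)  # 5.2 - 1.93 (not stated in the text; derived here)

print(upper, lower)
print(beyond_one_std(8))  # the value singled out in the example
print(beyond_one_std(5))  # the most common value
```

A value flagged this way is not necessarily an error; the check only marks it as further from the mean than the typical spread, which is the cue to look at it more closely.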

This pattern is covered in Machine Learning Module 2: Advanced Machine Learning.

For more information regarding the Machine Learning Specialist curriculum, visit www.arcitura.com/machinelearning.