Center, spread and shape are statistical measures that describe the distribution of a data set; together they are called summary statistics.
The center of a data set describes where its values are located. We can measure the center in three ways: the mean (average), the median and the mode.
The spread of a data set describes how similar or varied the observed values are. We can measure the spread in two ways: the range and the standard deviation.
Center measures
The mean is the average value of a given data set.
Mean = average = (sum of the values) / (number of values)
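The formula above can be sketched in a few lines of Python. The function name `mean` and the sample list `scores` are made up for illustration.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


scores = [4, 8, 6, 5, 7]  # made-up example data
print(mean(scores))  # 30 / 5 = 6.0
```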
The median is the middle number of a data set sorted in ascending order (the value that splits the data set into two halves). To calculate the median, arrange the values in ascending order and count them. If the number of values is odd, the median is the middle value. If the number of values is even, the median is the average of the two middle values.
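As a minimal sketch of the procedure just described (sort, count, then pick the middle value or average the two middle values), assuming the data is a plain list of numbers. The name `median` and the sample lists are illustrative only.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)       # arrange the values in ascending order
    n = len(ordered)               # count them
    mid = n // 2
    if n % 2 == 1:                 # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 1, 5]))  # sorted: 1 3 5 7 9 -> 5
print(median([7, 3, 9, 1]))     # sorted: 1 3 7 9 -> (3 + 7) / 2 = 5.0
```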
The mode of a data set is the number that occurs most frequently in the set. To determine the mode, order the numbers from least to greatest and count how many times each number occurs. If no value appears more than once in the data set, the data set has no mode. If two or more values share the highest number of occurrences, each of them is a mode.
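The counting rule above can be sketched as follows. The function name `modes` is hypothetical; it returns a list so that it can report no mode (an empty list), one mode, or several modes, following the convention used in this text.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] if no value repeats."""
    counts = Counter(values)            # how many times each value occurs
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                    # no value appears more than once
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 3, 5, 7]))   # [3]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]
print(modes([4, 5, 6]))         # []
```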
Spread measures
The range measures the spread of the data between the limits of a data set. It is calculated as the difference between the highest and lowest values in the data set. The larger the range, the greater the spread of the data.
Range = highest value - lowest value
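In Python, the range calculation might look like the sketch below. The name `data_range` is chosen only to avoid clashing with Python's built-in `range`, and the sample values are made up.

```python
def data_range(values):
    """Return the difference between the highest and lowest values."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


print(data_range([3, 8, 12, 5]))  # 12 - 3 = 9
```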
The standard deviation measures the overall spread (variability) of the values in a data set around the mean. It is based on the distance of each value from the mean: the distances are squared, the squares are averaged, and the square root of that average is taken. The more spread out a data set is, the greater the distances from the mean and the greater the standard deviation.
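A minimal sketch of the calculation, assuming the population form of the standard deviation (dividing by the number of values; the sample form divides by one less). The text does not say which form it intends, so this choice is an assumption, and the function name and data are illustrative.

```python
import math


def population_std_dev(values):
    """Return the population standard deviation of a non-empty list."""
    if not values:
        raise ValueError("the standard deviation of an empty data set is undefined")
    n = len(values)
    mean = sum(values) / n
    # squared distance of each value from the mean
    squared_distances = [(v - mean) ** 2 for v in values]
    # assumption: divide by n (population form), then take the square root
    return math.sqrt(sum(squared_distances) / n)


wide = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up data, mean 5
narrow = [4, 5, 5, 5, 5, 5, 5, 6]  # made-up data, mean 5, less spread out

print(population_std_dev(wide))    # sqrt(32 / 8) = 2.0
print(population_std_dev(narrow))  # sqrt(2 / 8) = 0.5
```

Both lists have the same mean, but the more spread-out list has the larger standard deviation.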
Outliers
An outlier is a value that is very different from the other values: it lies an abnormal distance from them and far from the middle of the data set. Removing an outlier affects each measure as follows:
- Mean: Removing a large outlier reduces the mean; removing a small outlier increases it.
- Median: Removing a small outlier increases the median; removing a large outlier reduces it (the median stays the same if the values in the middle positions after the removal equal the values in the middle positions before the removal).
- Range: After the outlier is removed, the value that takes its place as the highest or lowest is closer to the rest of the data, so the range becomes smaller.
- Standard deviation: Because an outlier is far from the other values and from the mean, removing it reduces the spread of the data and therefore the standard deviation.
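The effects listed above can be checked on a small made-up data set with one large outlier. This sketch uses Python's standard `statistics` module; the helper `summarize` is hypothetical, and `pstdev` (the population standard deviation) is an assumed choice, since the text does not specify a form.

```python
import statistics


def summarize(values):
    """Collect the center and spread measures discussed above."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": statistics.pstdev(values),  # assumption: population form
    }


data = [10, 12, 13, 15, 16, 95]          # made-up data; 95 is a large outlier
trimmed = [v for v in data if v != 95]   # the same data without the outlier

before = summarize(data)
after = summarize(trimmed)

# Print each measure with and without the outlier.
for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

For this data every measure shrinks once the large outlier is removed, with the mean, range and standard deviation dropping far more than the median.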
Continue reading this page for detailed explanations and examples.