Center, spread and shape of distributions

Center, spread and shape of distributions on the SAT test

SAT Subscore: Problem solving and data analysis

Studying center, spread and shape of distributions

On the SAT test, the center, spread and shape of distributions topic is part of the problem solving and data analysis subscore, which includes 9 advanced topics (see the full topics list on the top menu).

The center, spread and shape of distributions topic is the seventh topic of the problem solving and data analysis subscore. It is recommended to start learning this subscore with its first topic, ratios, rates and proportions.

Center, spread and shape of distributions topic is divided into sections from easy to difficult (the list of the sections appears on the left menu). Each section includes detailed explanations of the required material with examples followed by a variety of self-practice questions with solutions.

Finish studying the heart of algebra subscore topics before you study this topic or any other problem solving and data analysis subscore topic. (The heart of algebra subscore includes basic algebra topics that are required for understanding the problem solving and data analysis subscore topics.)

Center, spread and shape of distributions - summary

Center, spread and shape of distributions are statistical measures that describe data sets. They are called summary statistics.

The center of a data set is a way of describing its location. We can measure the center of a data set in 3 different ways: the mean (average), the median and the mode.

The spread of a data set describes how similar or varied the observed values are. We can measure the spread of a data set in 2 different ways: the range and the standard deviation.

Center measures

The mean is the average value of a given data set.
Mean = average = sum of the values / number of the values

The median is the middle number of a data set sorted in ascending order (the median is the value that splits the data set into two halves). To calculate the median, arrange the values in ascending order and count them. If the number of values is odd, the median is the middle value. If the number of values is even, the median is the average of the two middle values.

The mode of a data set is the number that occurs most frequently in the set. To determine the mode, order the numbers from least to greatest and count how many times each number occurs. If no value appears more than once in the data set, the data set has no mode. If two or more values share the highest number of occurrences, all of them are modes.

Spread measures

The range measures the spread of the data between the limits of a data set. It is calculated as the difference between the highest and lowest values in the data set. The larger the range, the greater the spread of the data.
range = the highest value – the lowest value.

The standard deviation is the measure of the overall spread (variability) of the values of a data set from the mean. The spread is measured as the distances (absolute values) of each value of the data set from the mean. The more spread out a data set is, the greater the distances from the mean and the greater the standard deviation.

Outliers

An outlier is a value that is very different from the other values, so that it lies an abnormal distance from other values and is far from the middle of the data set.

  • Mean: Removing a big outlier will reduce the mean value and removing a small outlier will enlarge the mean value. 
  • Median: Removing a small outlier will enlarge the median; removing a large outlier will reduce the median (the median will be the same if the values in the positions after the removal are equal to the values in the positions before the removal).
  • Range: After the outlier is removed, the value that replaces it as the highest or lowest value is closer to the rest of the data, therefore the range becomes smaller.
  • Standard deviation: Since the outlier is a value that is far from other values and the mean, its removal will reduce the spread of the data and the standard deviation.  

Continue reading this page for detailed explanations and examples.

Measuring a center of a data set

The center of a data set is a way of describing its location. We can measure the center of a data set in 3 different ways: the mean (average), the median and the mode.

Mean calculation

The mean is the average value of a given data set. To calculate the mean, add all the values in the data set and then divide the sum by the number of values.

The mean formula is:
Mean = average = sum of the values / number of the values

Note that if a value appears in a data set several times, we need to include it that many times when calculating the sum of the values.

Consider the following example:

What is the mean (average) of the following numbers 10, 12, 16, 5 and 2?

mean = sum of the values / number of the values

mean=(10+12+16+5+2)/5=45/5=9.

Consider the following example:

There are 3 children in 3 families and 2 children in 2 families.

What is the mean (average) of the children in a family?

We need to translate the word problem into numerical values. Since the values 2 and 3 appear in the data set several times, we need to include each of them that many times when calculating the sum of the values.

The values of the number of children in the families are 3, 3, 3, 2 and 2.

mean = sum of the values / number of the values

mean=(3+3+3+2+2)/5=13/5=2.6 children in a family.
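
The two mean calculations above can be reproduced with a short Python sketch. The variable names are illustrative assumptions and are not part of the SAT material; `statistics.mean` comes from the Python standard library.

```python
from statistics import mean

# First example: a plain list of numbers.
numbers = [10, 12, 16, 5, 2]
mean_numbers = sum(numbers) / len(numbers)  # 45 / 5 = 9.0

# Second example: {children per family: number of families}.
family_counts = {3: 3, 2: 2}

# Repeat each value as many times as it occurs: 3, 3, 3, 2, 2.
children = [value for value, count in family_counts.items() for _ in range(count)]
mean_children = mean(children)  # 13 / 5 = 2.6

print(mean_numbers, mean_children)
```

The list comprehension is the code version of "include each value as many times as it appears".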

Finding missing values given the mean

The mean formula is:
Mean = average = sum of the values / number of the values

If we are given the mean value, we can solve the equation of the mean formula for 1 value that is missing. This value can be one of the numbers in the data set.

For example:
In a data set {-21, 5, x, 10} the mean is 0.5.  The value of x is:
Mean = sum of the values / number of the values
(-21+5+x+10)/4=0.5
-21+5+x+10=4*0.5
-6+x=2
x=8

Consider the following example:

The average grade of 4 students is 75.

The teacher added another grade so that the average became 5 points higher.

What was the grade that the teacher added?

average = sum of the values / number of the values

(4*75+x)/5=75+5
300+x=80*5
300+x=400
x=400-300
x=100

Checking the answer: (4*75+100)/5=(300+100)/5=400/5=80.
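
Both missing-value examples rearrange the mean formula in the same way: missing value = mean * number of values - sum of the known values. A minimal sketch of that rearrangement follows; the function name `missing_value` is a hypothetical helper introduced only for illustration.

```python
def missing_value(known_sum, total_count, target_mean):
    """Return the single missing value that makes all values average to target_mean.

    known_sum   -- sum of the values that are already known
    total_count -- number of values including the missing one
    """
    return target_mean * total_count - known_sum

# Data set {-21, 5, x, 10} with mean 0.5.
x_in_set = missing_value(-21 + 5 + 10, 4, 0.5)  # 2 - (-6) = 8.0

# Four grades averaging 75, plus one grade that raises the average to 80.
added_grade = missing_value(4 * 75, 5, 75 + 5)  # 400 - 300 = 100

print(x_in_set, added_grade)
```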

Median calculation

The median is the middle number of a data set sorted in ascending or descending order. In other words, the median is the value that splits the data set into two halves.

Median calculation steps:

Step 1: Arrange the values in an ascending (or a descending) order.

Step 2: Count how many values are in the data set.

Step 3: Calculate the median: If the number of values is odd, the median is the middle value. If the number of values is even, the median is the average of the two middle values.

Note that if a value appears in a data set several times, we need to include it that many times when arranging the values in ascending (or descending) order.

Consider the following example:

What is the median of the following numbers 10, 12, 16, 5 and 2?

Step 1: Arranging the values in an ascending order: 2, 5, 10, 12 and 16.

Step 2: Counting how many values are in the data set: there are 5 values in the data set.

Step 3: Calculating the median: the number of values is odd, therefore the median is the middle value which is 10.

2, 5, 10, 12, 16

Consider the following example:

There are 3 children in 3 families, 1 child in 3 families and 2 children in 2 families.

What is the median of the children in a family?

We need to translate the word problem into numerical values. Since the values 1, 2 and 3 appear in the data set several times, we need to include each of them that many times when arranging the values in ascending order.

The values of the number of children in the families are 3, 3, 3, 1, 1, 1, 2 and 2.

Step 1: Arranging the values in an ascending order: 1, 1, 1, 2, 2, 3, 3 and 3.

Step 2: Counting how many values are in the data set: there are 8 values in the data set.

Step 3: Calculating the median: the number of values is even, therefore the median is the average of the two middle values in the locations 4 and 5 which is (2+2)/2=2.

1, 1, 1, 2, 2, 3, 3, 3
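
The three median steps can be sketched in Python as follows. The function name `median_by_steps` is an illustrative assumption; the standard library function `statistics.median` gives the same results and is used here only as a cross-check.

```python
from statistics import median

def median_by_steps(values):
    ordered = sorted(values)   # Step 1: arrange in ascending order
    n = len(ordered)           # Step 2: count the values
    middle = n // 2
    if n % 2 == 1:             # Step 3, odd count: take the middle value
        return ordered[middle]
    # Step 3, even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

numbers = [10, 12, 16, 5, 2]
children = [3, 3, 3, 1, 1, 1, 2, 2]

print(median_by_steps(numbers))   # middle of 2, 5, 10, 12, 16
print(median_by_steps(children))  # average of positions 4 and 5
print(median(numbers), median(children))
```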

Calculating a median of a frequency graph

Arrange the data given in the graph in a table and follow the previous steps.

Consider the following example:

The graph below shows the exam grades of a group of students.

What is the median grade?

[Figure: a frequency graph of exam grades]

Arranging the data from the graph in a table:
Grade      Number of students
65-70      3
70-75      1
75-80      2
80-85      0
85-90      3
90-95      1

Arranging the values in an ascending order: 65-70, 65-70, 65-70, 70-75, 75-80, 75-80, 85-90, 85-90, 85-90, 90-95.

The number of the values is 3+1+2+0+3+1=10. The number of the values is even, therefore the median is the average of the groups in positions 5 and 6. Both positions fall in the group 75-80, therefore the median grade is in the group 75-80.

65-70, 65-70, 65-70, 70-75, 75-80, 75-80, 85-90, 85-90, 85-90, 90-95
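
For a frequency table it is not necessary to write out every value: a running total of the frequencies shows which group contains the middle positions. The sketch below assumes the table is stored as a list of (group, frequency) pairs; the name `groups_at_middle` is a hypothetical helper.

```python
grade_groups = [
    ("65-70", 3), ("70-75", 1), ("75-80", 2),
    ("80-85", 0), ("85-90", 3), ("90-95", 1),
]

def groups_at_middle(frequency_table):
    """Return the group(s) that hold the middle position(s), counting from 1."""
    total = sum(count for _, count in frequency_table)
    if total % 2 == 1:
        positions = [(total + 1) // 2]            # one middle position
    else:
        positions = [total // 2, total // 2 + 1]  # two middle positions
    found = []
    for position in positions:
        running = 0
        for label, count in frequency_table:
            running += count                      # cumulative frequency
            if position <= running:
                found.append(label)
                break
    return found

print(groups_at_middle(grade_groups))  # groups at positions 5 and 6
```

With 10 values, positions 5 and 6 both fall in the group 75-80, which matches the written solution.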

Mode calculation

The mode of a data set is the number that occurs most frequently in the set.

Mode calculation steps:

Step 1: Order the numbers from least to greatest.

Step 2: Count how many times each number occurs.

Step 3: Determine the mode. A data set can have more than one mode or no mode:

If no value appears more than once in the data set, the data set has no mode.

If two or more values share the highest number of occurrences, all of them are modes.

Consider the following example:

What is the mode of the following numbers 10, 12, 16, 5 and 2?

Step 1: Ordering the numbers from least to greatest: 2, 5, 10, 12 and 16.

Step 2: Counting how many times each number occurs: each number occurs one time.

Step 3: Determining the mode: there is no mode in the data set.

Consider the following example:

There are 3 children in 3 families, 1 child in 3 families and 2 children in 2 families. What is the mode of the children in a family?

This word problem already describes the values in groups, therefore we don't need to arrange and count the values. The largest number of families is 3, and it occurs for both 1 child and 3 children, therefore there are two modes: 1 and 3 children.

1, 1, 1, 2, 2, 3, 3, 3  
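
The mode rules above, including the "no mode" and "more than one mode" cases, can be sketched with `collections.Counter` from the Python standard library. The function name `modes` is an illustrative assumption, and the sketch assumes a non-empty data set. Note that the standard library function `statistics.multimode` uses a different convention: when no value repeats, it returns every value instead of an empty result.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when no value repeats."""
    counts = Counter(values)            # how many times each value occurs
    highest = max(counts.values())
    if highest == 1:
        return []                       # no value appears more than once
    return sorted(v for v, c in counts.items() if c == highest)

numbers = [10, 12, 16, 5, 2]
children = [3, 3, 3, 1, 1, 1, 2, 2]

print(modes(numbers))   # no mode
print(modes(children))  # two modes
```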

Measuring a spread of a data set

The spread of a data set describes how similar or varied the observed values are.

We can measure the spread of a data set in 2 different ways: the range and the standard deviation.

Range calculation

The range measures the spread of the data between the limits of a data set. It is calculated as the difference between the highest and lowest values in the data set. The larger the range, the greater the spread of the data.

The range formula is: range = the highest value – the lowest value.

Consider the following example:

What is the range of the following numbers 10, 12, 16, 5 and 2?

range= the highest value – the lowest value

range=16-2=14.

Consider the following example:

There are 3 children in 3 families, 1 child in 3 families and 2 children in 2 families. What is the range of the number of children in a family?

range= the highest value – the lowest value

range=3-1=2.
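
In Python, the range formula is a one-line sketch with the built-in `max` and `min` functions. The function name `value_range` is an illustrative assumption (chosen to avoid clashing with Python's built-in `range`).

```python
def value_range(values):
    """Range = the highest value minus the lowest value."""
    return max(values) - min(values)

numbers = [10, 12, 16, 5, 2]
children = [3, 3, 3, 1, 1, 1, 2, 2]

print(value_range(numbers))   # 16 - 2
print(value_range(children))  # 3 - 1
```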

Standard deviation calculation

The standard deviation is the measure of the overall spread (variability) of the values of a data set from the mean. The spread is measured as the distances (absolute values) of each value of the data set from the mean. The more spread out a data set is, the greater the distances from the mean and the greater the standard deviation.

Standard deviation calculation is not covered in the SAT, but you need to know how to determine which data set has the greatest standard deviation.

Steps for comparing the standard deviations of several data sets:

Step 1: Calculate the mean of each data set.

Step 2: Calculate the distance of each value from the mean. Note that the distance should be calculated as an absolute value.

Step 3: Sum the distances (absolute values from step 2) of each data set. The data set that has the smaller sum has the smaller standard deviation.

Consider the following example:

Which of the two following groups of numbers has a higher standard deviation?

Group A: 12, 16 and 20.

Group B: 7, 10 and 16.

Step 1: Calculating the mean of each group:
Group A: mean = average = sum of the values / number of the values=(12+16+20)/3=48/3=16.
Group B: mean = average = sum of the values / number of the values =(7+10+16)/3=33/3=11.

Step 2: Calculating the distance of each value from the mean:
Group A: 12-16=-4, 16-16=0, 20-16=4.
Group B: 7-11=-4, 10-11=-1, 16-11=5.

Step 3: Summing the distances (absolute values from step 2) of each group:
Group A: 4+0+4=8.
Group B: 4+1+5=10.

Group B has the bigger sum of distances, therefore group B has a higher standard deviation than group A.
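
The comparison steps can be sketched in Python. The sum of distances is the shortcut described on this page; the actual standard deviation is computed from squared distances, and `statistics.pstdev` (population standard deviation, from the standard library) is included only as a cross-check that both measures rank the two groups the same way in this example. The function name `sum_of_distances` is an illustrative assumption.

```python
from statistics import mean, pstdev

def sum_of_distances(values):
    center = mean(values)                            # Step 1: the mean
    distances = [abs(v - center) for v in values]    # Step 2: absolute distances
    return sum(distances)                            # Step 3: sum them

group_a = [12, 16, 20]
group_b = [7, 10, 16]

print(sum_of_distances(group_a), sum_of_distances(group_b))
print(pstdev(group_a) < pstdev(group_b))  # cross-check with the real formula
```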

Outliers and their effect on summary statistics measures

An outlier is a value that is very different from the other values, so that it lies an abnormal distance from other values and is far from the middle of the data set.

For example:
In a data set of 5, 8, 10 and 30 the outlier is 30.
In a data set -21, 5, 8, 10 the outlier is -21.

The effect of an outlier on a mean

The mean formula is: mean = average = sum of the values / number of the values

Removing a value from a data set reduces the number of values (denominator) by 1 and changes the sum of the values (numerator). If we remove a value that is smaller than the mean, the new mean will be bigger; if we remove a value that is bigger than the mean, the new mean will be smaller; if we remove a value that is equal to the mean, the mean will not change. We know that the outlier is significantly smaller or bigger than the mean, therefore removing a big outlier will reduce the mean value and removing a small outlier will enlarge the mean value.

Consider the following example:

Calculate the mean of the following data sets with and without the outlier:

{5, 7, 8, 40}

{-21, 5, 8, 10}

In a data set of {5, 7, 8, 40} the mean is (5+7+8+40)/4=60/4=15.
The data set without the outlier of 40 is {5, 7, 8}, its mean is (5+7+8)/3=20/3=6.67.
The mean decreased from 15 to 6.67, since the outlier 40 is larger than the mean 15.

In a data set {-21, 5, 8, 10} the mean is (-21+5+8+10)/4=2/4=0.5.
The data set without the outlier of -21 is {5, 8, 10}, its mean is (5+8+10)/3=23/3=7.67.
The mean increased from 0.5 to 7.67, since the outlier -21 is smaller than the mean 0.5.
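
The effect of removing an outlier on the mean can be checked with a short sketch. The helper name `mean_with_and_without` is hypothetical; note that after the removal the sum is divided by 3, because only 3 values remain.

```python
from statistics import mean

def mean_with_and_without(values, outlier):
    """Return (mean of all values, mean after removing one outlier)."""
    remaining = list(values)
    remaining.remove(outlier)       # removes one occurrence of the outlier
    return mean(values), mean(remaining)

big = mean_with_and_without([5, 7, 8, 40], 40)       # 60/4 and 20/3
small = mean_with_and_without([-21, 5, 8, 10], -21)  # 2/4 and 23/3

print(big[0], round(big[1], 2))      # the mean goes down
print(small[0], round(small[1], 2))  # the mean goes up
```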

The effect of an outlier on a median

The median is the middle number of a data set sorted in ascending order.

The median is calculated from the middle value/ values of the data set, therefore removing the outlier will change the positions of the values that are taken to calculate the median.

If we remove a small outlier (the first number), the positions of all values get smaller by 1 (value number 2 becomes number 1, value number 3 becomes number 2…). After removing a small outlier, the median is calculated from values that were in bigger positions before the removal. Since the values are in ascending order, the numbers in bigger positions have bigger values, therefore the median will be bigger (the median will be the same if the values in the positions after the removal are equal to the values in the positions before the removal).

If we remove a large outlier (the last number), the positions of the remaining values don't change, but the positions of the values that are taken to calculate the median get smaller. Since the values are in ascending order, the numbers in smaller positions have smaller values, therefore the median will be smaller (the median will be the same if the values in the positions after the removal are equal to the values in the positions before the removal).

Removing a small outlier will enlarge the median; removing a large outlier will reduce the median (the median will be the same if the values in the positions after the removal are equal to the values in the positions before the removal).

Consider the following example:

Calculate the median of the following data sets with and without the outlier:

{5, 7, 8, 40}

{-21, 5, 8, 10}

In a data set of {5, 7, 8, 40} the median is (7+8)/2=7.5.
The data set without the outlier of 40 is {5, 7, 8}, its median is 7.
The median decreased from 7.5 to 7 after removing a big outlier of 40.

In a data set {-21, 5, 8, 10} the median is (5+8)/2=13/2=6.5.
The data set without the outlier of -21 is {5, 8, 10}, its median is 8.
The median increased from 6.5 to 8 after removing a small outlier of -21.

Note that if the numbers in the new locations (after removing the outlier) are equal to the numbers in the previous locations (before removing the outlier), the median will be the same.
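
The sketch below repeats the two median examples with `statistics.median` and adds a third data set, {5, 7, 7, 7, 40}, which is made up for illustration and does not appear in the examples above, to show a case where the median stays the same.

```python
from statistics import median

def median_with_and_without(values, outlier):
    """Return (median of all values, median after removing one outlier)."""
    remaining = list(values)
    remaining.remove(outlier)
    return median(values), median(remaining)

big = median_with_and_without([5, 7, 8, 40], 40)         # 7.5 then 7
small = median_with_and_without([-21, 5, 8, 10], -21)    # 6.5 then 8
same = median_with_and_without([5, 7, 7, 7, 40], 40)     # middle values are all 7

print(big, small, same)
```

In the third data set the values around the middle are equal, so the median is 7 both before and after the removal.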

The effect of an outlier on the range

The range formula is: range= the highest value – the lowest value.

Since an outlier is a very small or a very big value, it is included in the range calculation, therefore removing the outlier will affect the value of the range. The value that replaces the outlier as the highest or lowest value is closer to the rest of the data, therefore the range after removing the outlier will be smaller.

Consider the following example:

Calculate the range of the following data sets with and without the outlier:

{5, 7, 8, 40}

{-21, 5, 8, 10}

In a data set of {5, 7, 8, 40} the range is 40-5=35.
The data set without the outlier of 40 is {5, 7, 8}, its range is 8-5=3.
The range decreased from 35 to 3 after removing the outlier.

In a data set {-21, 5, 8, 10} the range is 10-(-21)=10+21=31.
The data set without the outlier of -21 is {5, 8, 10}, its range is 10-5=5.
The range decreased from 31 to 5 after removing the outlier.
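
The same check for the range, as a minimal sketch with a hypothetical helper name. Subtracting a negative lowest value adds its size, which is why the second range is 10+21.

```python
def range_with_and_without(values, outlier):
    """Return (range of all values, range after removing one outlier)."""
    remaining = list(values)
    remaining.remove(outlier)
    return max(values) - min(values), max(remaining) - min(remaining)

big = range_with_and_without([5, 7, 8, 40], 40)        # 40-5 and 8-5
small = range_with_and_without([-21, 5, 8, 10], -21)   # 10-(-21) and 10-5

print(big, small)
```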

The effect of an outlier on a standard deviation

The standard deviation is the measure of the overall spread of a data set values from the mean. Since the outlier is a value that is far from other values and the mean, its removal will reduce the spread of the data and the standard deviation.  

Consider the following example:

Compare the spread (the sum of the distances from the mean) of the following data sets with and without the outlier:

{5, 7, 8, 40}

{-21, 5, 8, 10}

In a data set of {5, 7, 8, 40} the mean is 15. The sum of the distances from the mean is |40-15|+|8-15|+|7-15|+|5-15|=25+7+8+10=50.

The data set without the outlier of 40 is {5, 7, 8}, the mean is 6.67. The sum of the distances from the mean is |8-6.67|+|7-6.67|+|5-6.67|=1.33+0.33+1.67=3.33.

The sum of the distances from the mean decreased from 50 to 3.33, therefore the standard deviation of the data set decreased significantly.

In a data set {-21, 5, 8, 10} the mean is 0.5. The sum of the distances from the mean is |-21-0.5|+|5-0.5|+|8-0.5|+|10-0.5|=21.5+4.5+7.5+9.5=43.

The data set without the outlier of -21 is {5, 8, 10}, the mean is 7.67. The sum of the distances from the mean is |5-7.67|+|8-7.67|+|10-7.67|=2.67+0.33+2.33=5.33.

The sum of the distances from the mean decreased from 43 to 5.33, therefore the standard deviation of the data set decreased significantly.
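
The sums of distances above can be reproduced with the following sketch. As an extra check that goes beyond the shortcut used on this page, it also computes the population standard deviation with `statistics.pstdev`, which is based on squared distances. The helper name `spread_summary` is an illustrative assumption.

```python
from statistics import mean, pstdev

def spread_summary(values):
    """Return (sum of absolute distances from the mean, population std dev)."""
    center = mean(values)
    distances = sum(abs(v - center) for v in values)
    return distances, pstdev(values)

with_40, without_40 = spread_summary([5, 7, 8, 40]), spread_summary([5, 7, 8])
with_m21, without_m21 = spread_summary([-21, 5, 8, 10]), spread_summary([5, 8, 10])

print(round(with_40[0], 2), round(without_40[0], 2))    # sums of distances
print(round(with_m21[0], 2), round(without_m21[0], 2))  # sums of distances
print(with_40[1] > without_40[1], with_m21[1] > without_m21[1])
```

In both data sets the sum of distances and the standard deviation become smaller once the outlier is removed.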

You just finished studying the center, spread and shape of distributions topic, the seventh topic of the problem solving and data analysis subscore!

Continue studying the next problem solving and data analysis subscore topic: key features of graphs.