Scatterplots

Scatterplots on the SAT test

SAT Subscore: Problem solving and data analysis

Before learning this subject, make sure you master the subjects of linear functions and graphing quadratic functions.

A scatterplot is a graphic representation of a data set of observations (each observation includes x and y variables and is represented by a dot in the xy plane). The purpose of the scatterplot is to visualize the relationship between the variables x and y to determine if there are patterns or correlations between the two variables.

Scatterplots can visualize the following features of the relationships:
The type of the correlation between the variables: positive correlation, negative correlation, or no correlation.
The type of the data pattern: linear (straight) or nonlinear (curved).
The strength of the relationship between the variables: strong or weak correlation.
Unusual features in the data.

A line of best fit can be drawn through the area of the dots; it represents the trend of the relationship between the two variables.
A line of best fit may be a straight (linear) line or a curved (parabolic) line.
If the line of best fit is straight (linear), it has a slope that represents the rate of change and a y intercept that represents the initial value.
The line of best fit can be used to predict y values that are not included in the data set. The prediction can be done in 2 ways: looking at the graph of the line of best fit or calculating the values from the equation of the line of best fit.
Calculating differences between the given value and the predicted value: For any given x value, we can calculate the difference between its y value given in the data set (represented by the dot on the scatterplot) and the y value predicted by the line of best fit.
To calculate estimated change in y values, multiply the slope value of the line of best fit equation by the given amount of the change.

Continue reading this page for detailed explanations and examples.

Scatterplot and line of best fit definitions

A scatterplot (also called a scatter diagram) is a graphic representation of a data set of observations. Each observation in the data set includes 2 variables (x,y) and is represented as a dot in the xy plane (the independent variable is plotted along the x axis and the dependent variable along the y axis).

A line of best fit can be drawn through the area of the dots. This is the line that best represents the dots of the scatterplot. Note that a line of best fit can be drawn only if there is a connection between the x and y variables.

The purpose of the scatterplot is to visualize the relationship between the variables x and y to determine if there are patterns or correlations between the two variables. If there is a connection between x and y variables, draw a line of best fit through the area of the points to see the type of the connection.

The difference between a line of best fit and a line graph: a line of best fit represents the trend of the relationship between the two variables and does not join the dots together as a line graph does. In other words, all the dots of a line graph lie on the graph, whereas the dots of a scatterplot lie near the line of best fit (some of them may lie on it).

The relationship types represented by scatterplots

  • To analyze the relationship, it is recommended to sketch a line between the dots and look at the line instead of the dots.
  • When you interpret a scatterplot, look at the data as you go from left to right.

Scatterplots can visualize the following features of the relationships:

1. The type of the correlation between the variables (positive correlation, negative correlation, or no correlation):
In a positive correlation, as one variable increases so does the other. The slope of the line is positive.
In a negative correlation, as one variable increases the other decreases. The slope of the line is negative.
With no correlation, the dots show no overall upward or downward trend.

2. The type of the data pattern (linear or nonlinear):
A linear correlation can be graphed as a straight line in the xy plane. The slope of a straight line is constant all the way along the line.
A curved correlation can be graphed as a smooth curve that changes its direction at least once. The slope of a curve changes from point to point.

3. The strength of the relationship between the variables:
The more concentrated the dots are along the line or the curve, the stronger the relationship. In other words, if the points are close to the line or the curve, the relationship is considered strong.

4. Unusual features in the data, such as gaps in the data set.

The graph below represents a linear scatterplot with a strong negative connection between the variables x and y.

[Figure: scatterplot with a linear, negative, strong connection]

In the above scatterplot:

  • The connection can be graphed as a straight line of best fit, therefore the connection is linear.
  • The dots are located near the line, therefore the connection between the variables x and y is strong.
  • The slope of the line is negative, therefore the connection between the variables x and y is negative (as x increases y decreases).

The graph below represents a curved scatterplot with a weak connection between the variables x and y. The slope changes sign at the vertex, from negative to positive (look at the data as you go from left to right).

[Figure: curved scatterplot with a weak connection between the variables x and y]

In the above scatterplot:

  • The connection can be graphed as a parabolic line of best fit, therefore the connection is curved.
  • The dots are located far from the curve, therefore the connection between the variables x and y is weak.
  • The slope changes sign at the vertex, from negative to positive (look at the data as you go from left to right).

The line of best fit- properties and purpose

The line of best fit (also called the trend line) is drawn through the area of the dots. This is the line that best represents the positions of the dots of the scatterplot in the xy plane.

The properties of the line of best fit

A line of best fit can be drawn through the area of the dots only if there is a connection between x and y variables.

A line of best fit may be a straight (linear) line or a curved (parabolic) line, depending on how the dots are arranged in the xy plane.

The points on the line of best fit represent the trend of the connection and not specific observations (unless the dot that represents an observation is located on the line).

If the line of best fit is straight (linear), it has a slope that represents the rate of change and a y intercept that represents the initial value.

The purpose of the line of best fit

  1. The line of best fit helps us to identify the type of the connection between the x and y variables. (It is possible to identify the connection by analyzing the dots without the line, but analyzing the line is easier and clearer.)
  2. The line of best fit estimates the value of y for any specified value of x. This is very useful for predicting y values for x values that are not given in the data set.
  3. We can calculate estimated changes in y values using the slope value from the equation of the line of best fit.

Estimating the function of the line of best fit

Some questions may ask you to choose the function of the line of best fit from 4 given functions.

Estimating a linear function in the form f(x)=mx+b:
1. Sketch a straight line that fits the data (the line should continue until it intercepts the y axis).
2. Estimate the y intercept of the line. This is the value of the parameter b in the function.
3. Estimate the slope of the line. This is the value of the parameter m in the function.
The sign of the slope can be seen from the direction of the line (an increasing line has a positive slope and a decreasing line has a negative slope).
The value of the slope can be estimated from any 2 points on the line: it is the difference in their y values divided by the difference in their x values.
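
The estimation steps above can be sketched in Python. This is only an illustration: the two points are hypothetical readings from a sketched line (they are not data from this page), and the helper name `estimate_line` is an assumption made for the example.

```python
def estimate_line(p1, p2):
    """Return (m, b) for the line f(x) = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x values")
    m = (y2 - y1) / (x2 - x1)  # slope: difference in y over difference in x
    b = y1 - m * x1            # y intercept: solve y1 = m*x1 + b for b
    return m, b


# Hypothetical points read off a sketched line of best fit.
m, b = estimate_line((2, 4), (6, 2))
print(m, b)  # -0.5 5.0
```

The negative slope matches a decreasing line, and b is the value where the extended line meets the y axis.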

Estimating a quadratic function in the form f(x)=ax^2+bx+c:
1. Sketch a parabola that fits the data.
2. Estimate the y intercept of the parabola. This is the value of the parameter c in the function.
3. Identify whether the vertex is a minimum or a maximum point. This gives the sign of the parameter a (positive for a minimum, negative for a maximum).

Consider the following example:

According to the graph shown below, which of the following best models the line of best fit?

A f(x)=-0.5x^2+5x+10
B f(x)= 0.5x^2+2x+10
C f(x)=-2x^2+5x+10
D f(x)= 0.5x^2+2x+15

[Figure: finding a quadratic function of the line of best fit]

We can see from the given graph of the parabola that the y intercept is y=10, therefore answer D is not correct.
We can also see that the parabola has a minimum point, therefore the x^2 coefficient a is positive. Answers A and C are not correct.
The correct answer is B: f(x)= 0.5x^2+2x+10. In this answer the intercept c=10 and the x^2 coefficient a=0.5 best fit the graph of the parabola.
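
The elimination used in this example can be written as a short Python sketch. The coefficients come from the four answer choices; the function name `matching_choices` and its parameters are assumptions made for illustration.

```python
# Answer choices from the example, stored as (a, b, c) for f(x) = a*x^2 + b*x + c.
choices = {
    "A": (-0.5, 5, 10),
    "B": (0.5, 2, 10),
    "C": (-2, 5, 10),
    "D": (0.5, 2, 15),
}


def matching_choices(choices, y_intercept, has_minimum):
    """Keep the choices whose c equals the y intercept read from the graph
    and whose sign of a agrees with the vertex type."""
    kept = []
    for label, (a, b, c) in choices.items():
        opens_upward = a > 0  # a positive a means the vertex is a minimum
        if c == y_intercept and opens_upward == has_minimum:
            kept.append(label)
    return kept


# From the graph: the y intercept is 10 and the parabola has a minimum point.
print(matching_choices(choices, y_intercept=10, has_minimum=True))  # ['B']
```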

Predicting values using the line of best fit

The prediction can be done in 2 ways:

1. Look at the x value on the x axis and find the corresponding y value on the graph of the line of best fit. If the x value lies beyond the part of the line that is shown, we can extend the line of best fit to see the predicted value.

2. Calculate the y value from the equation of the line of best fit.

Consider the following example:

The scatterplot below shows data from a sample. The equation of the line of best fit is y=-0.5x+5 and it is marked in red on the xy plane below.

What is the predicted value of y for x=0 and x=3?

Finding the corresponding y values on the line of best fit graph:
We need to extend the line of best fit, see the orange line on the xy plane above.
For x=0 we see that the y value is y=5.
For x=3 we see that the y value is y=3.5.

Calculating the y value from the equation of the line of best fit:
y(x=0)=-0.5*0+5=5
y(x=3)=-0.5*3+5=-1.5+5=3.5
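
The same calculation can be sketched in Python, using the slope and intercept from the example equation y=-0.5x+5. The function name `predict` is an assumption made for illustration.

```python
def predict(x, slope=-0.5, intercept=5):
    """Predicted y on the line of best fit y = slope*x + intercept."""
    return slope * x + intercept


for x in (0, 3):
    print(x, predict(x))
# 0 5.0
# 3 3.5
```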

Calculating the difference between the actual data and the line of best fit

For any given x value, we can calculate the difference between its y value given in the data set (represented by the dot on the scatterplot) and the y value predicted by the line of best fit.

The given y value can be seen from the dot of the scatterplot or from the data table (if given).

The predicted y value can be seen from the line of best fit graph or calculated from the equation of the line of best fit.

To represent the difference as a distance, calculate a positive value (write the bigger value first and then subtract the smaller value).

Consider the following example:

The scatterplot below shows data from a sample. The equation of the line of best fit is y=-0.5x+5.

What is the difference in y values between the data points and the line of best fit for x=1 and x=3?

[Figure: the difference between the actual data and the line of best fit]

The equation of the line of best fit is y=-0.5x+5.

The predicted y values calculated from the equation of the line of best fit are:
y(x=1)=-0.5*1+5=-0.5+5=4.5
y(x=3)=-0.5*3+5=-1.5+5=3.5
Note that we can also look at the line of best fit instead of making calculations.

The actual y values, read from the dots of the scatterplot, are:
For x=1 we see that y=4
For x=3 we see that y=4
The positive difference for x=1 is 4.5-4=0.5
The positive difference for x=3 is 4-3.5=0.5
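
As a sketch, the differences in this example can be computed in Python. The observed values (y=4 at x=1 and at x=3) are the ones read from the scatterplot above; the names `predicted_y` and `positive_difference` are assumptions made for illustration.

```python
def predicted_y(x):
    """y value on the line of best fit y = -0.5x + 5."""
    return -0.5 * x + 5


def positive_difference(x, observed_y):
    """Distance between the observed y and the predicted y (always >= 0)."""
    return abs(observed_y - predicted_y(x))


# Observed values read from the dots of the scatterplot in the example.
observed = {1: 4, 3: 4}
for x, y in observed.items():
    print(x, positive_difference(x, y))
# 1 0.5
# 3 0.5
```

Taking the absolute value has the same effect as writing the bigger value first and subtracting the smaller one.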

Calculating estimated changes using the best fit line equation

To calculate estimated change in y values, multiply the slope value of the line of best fit equation by the given amount of the change.

Note that we don’t need to include the constant in the calculation of the change, since it doesn’t affect the change value (the constant value is included in the y values before and after the change and therefore is cancelled in the change calculation).

Consider the following example:

The line of best fit is represented by the equation y=0.1x+14.

If x decreases by 500, what is the estimated change in y according to the line of best fit?

0.1*(-500)=-50, so y is estimated to decrease by 50 if x decreases by 500.

Note that we don’t need to include the constant 14 in the calculation of the change, since it doesn’t affect the change value. The constant value (14) is included in the y values before and after the change and therefore is cancelled.

For example:
For x=1,000 we get y=0.1*1000+14=100+14=114
For x=500 we get y=0.1*500+14=50+14=64
The change in y from x=1,000 to x=500 is 50+14-(100+14)=50+14-100-14=-50. The number 14 is cancelled.
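
The cancellation of the constant can be checked with a short Python sketch, using the equation y=0.1x+14 from the example. The function names are assumptions made for illustration.

```python
def line(x, slope=0.1, intercept=14):
    """y value on the line of best fit y = 0.1x + 14."""
    return slope * x + intercept


def estimated_change(slope, delta_x):
    """Estimated change in y: the slope times the change in x."""
    return slope * delta_x


shortcut = estimated_change(0.1, -500)  # slope times the change in x
full = line(500) - line(1000)           # y after the change minus y before it
print(shortcut, full)  # -50.0 -50.0
```

Both routes give the same result because the intercept 14 appears in both y values and drops out of the subtraction.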