I have written about correlation in my previous post Identifying the correlation coefficient using LibreOffice Calc. This correlation talks about effect of one variable on another. So we talk about relationship between known values.

In this post, we pay attention to predicting unknown values from known values, which is regression. The primary objective of regression analysis is to provide estimates of dependent variables from independent variables.

### Calculation of Regression Equation

We shall use the following methods to study about regression

- Algebraic Method
- Graphical Method

### Calculation of Regression using algebraic method

**Regression equation of X on Y** is given as below –

X_{e} = a+b_{y}

a, b = two unknown constants of the line.

a – level of the fitted line.

b – slope of the line.

X_{e} – value of X computed from the relationship for a given Y.

By the least square method, we shall find out the values of a and b to determine the regression line. This line is called the “line of best fit”.

ΣX = Na + bΣY

ΣXY = aΣY + bΣY^{2}

N is the number of observed pairs of value

Now – **the regression equation of Y on X**

Y_{e} = a+b_{x}

To compute the value of a & b, the formula is given as below.

ΣY = Na + bΣX

ΣXY = aΣX + bΣX^{2}

### Example

Lets take the following data set to work out.

**X** |
**Y** |

10 |
10 |

12 |
22 |

13 |
24 |

16 |
27 |

17 |
29 |

20 |
33 |

25 |
37 |

**113** |
**182** |

So, lets compute the values needed for our equation –

**X** |
**Y** |
**X**^{2} |
**XY** |

10 |
10 |
100 |
100 |

12 |
22 |
144 |
264 |

13 |
24 |
169 |
312 |

16 |
27 |
256 |
432 |

17 |
29 |
289 |
493 |

20 |
33 |
400 |
660 |

25 |
37 |
625 |
925 |

**ΣX = 113** |
**ΣY = 182** |
**ΣX**^{2} = 1983 |
**ΣXY = 3186** |

The regression equation of Y on X

Ye = a+bx

To compute the value of a & b, the formula is given as below.

ΣY = Na + bΣX

182 = 7a + b113

Rewriting the above equation –

113b + 7a = 182 —————-(1)

Substituting the values in the equation –

ΣXY = aΣX + bΣX^{2}

3186 = a113 + b1983

113a + 1983b = 3186 ——————(2)

Multiply equation (1) by 113 and equation 2 by 7.

113*113b + 113*7a = 113*182

12769b + 791a = 20566 ————–(3)

7*113a + 7*1983b = 7*3186

791a + 13881b = 22302 —————(4)

(3) – (4)

12769b + 791a = 20566

13881b + 791a = 22302 (-)

————————————-

-1112b=-1736

Removing the – on both sides,

1112b=1736

b = 1736/1112

b = 1.5611510791366906474820143884892

Substitute the value of b to find the value of a.

from (1) –

113b + 7a = 182

113*1.5611510791366906474820143884892 + 7a = 182

176.41007194244604316546762589928 + 7a = 182

7a = 182-176.41007194244604316546762589928

7a = 5.58992805755395683453237410072

a = 5.58992805755395683453237410072/7

a = 0.79856115107913669064748201438857

The equation of straight line is –

Ye = a+bx

Substituting the values of a and b –

**Ye = 0.79856115107913669064748201438857 +1.56x**

This is the regression equation of Y on X.

When I plot the graph of the given data set, It looks like this.

Using the regression value, I predict Y for the missing values of X such as 11, 14, 15 etc. Here is the complete table with actual and predicted values. Predicted values are given in different colour.

**X** |
**Y** |

10 |
10 |

11 |
17.9712230215827 |

12 |
22 |

13 |
24 |

14 |
22.6546762589928 |

15 |
24.2158273381295 |

16 |
27 |

17 |
29 |

18 |
28.8992805755396 |

19 |
30.4604316546763 |

20 |
33 |

21 |
33.5827338129496 |

22 |
35.1438848920863 |

23 |
36.705035971223 |

24 |
38.2661870503597 |

25 |
37 |

Let’s plot this table in scattered chart now. Here you go!

Similarly you may predict the value of X for the given Y.

In my next post, I’d write about how to compute regression using LibreOffice Calc. Bye until then!