# Linear Regression

I have written about correlation in my previous post Identifying the correlation coefficient using LibreOffice Calc. This correlation talks about effect of one variable on another. So we talk about relationship between known values.

In this post, we pay attention to predicting unknown values from known values, which is regression. The primary objective of regression analysis is to provide estimates of dependent variables from independent variables.

### Calculation of Regression Equation

We shall use the following methods to study about regression

• Algebraic Method
• Graphical Method

### Calculation of Regression using algebraic method

Regression equation of X on Y is given as below –

Xe = a+by

a, b = two unknown constants of the line.

a – level of the fitted line.

b – slope of the line.

Xe – value of X computed from the relationship for a given Y.

By the least square method, we shall find out the values of a and b to determine the regression line. This line is called the “line of best fit”.

ΣX = Na + bΣY

ΣXY = aΣY + bΣY2

N is the number of observed pairs of value

Now – the regression equation of Y on X

Ye = a+bx

To compute the value of a & b, the formula is given as below.

ΣY = Na + bΣX

ΣXY = aΣX + bΣX2

### Example

Lets take the following  data set to work out.

 X Y 10 10 12 22 13 24 16 27 17 29 20 33 25 37 113 182

So, lets compute the values needed for our equation –

 X Y X2 XY 10 10 100 100 12 22 144 264 13 24 169 312 16 27 256 432 17 29 289 493 20 33 400 660 25 37 625 925 ΣX = 113 ΣY = 182 ΣX2 = 1983 ΣXY = 3186

The regression equation of Y on X

Ye = a+bx

To compute the value of a & b, the formula is given as below.

ΣY = Na + bΣX

182 = 7a + b113

Rewriting the above equation –

113b + 7a = 182 —————-(1)

Substituting the values in the equation –

ΣXY = aΣX + bΣX2

3186 = a113 + b1983

113a + 1983b = 3186 ——————(2)

Multiply equation (1) by 113 and equation 2 by 7.

113*113b + 113*7a = 113*182

12769b + 791a = 20566 ————–(3)

7*113a + 7*1983b = 7*3186

791a + 13881b = 22302 —————(4)

(3) – (4)

12769b + 791a = 20566

13881b + 791a = 22302 (-)
————————————-
-1112b=-1736

Removing the – on both sides,

1112b=1736

b = 1736/1112

b = 1.5611510791366906474820143884892

Substitute the value of b to find the value of a.

from (1) –

113b + 7a = 182

113*1.5611510791366906474820143884892 + 7a = 182

176.41007194244604316546762589928 + 7a = 182

7a = 182-176.41007194244604316546762589928

7a = 5.58992805755395683453237410072

a = 5.58992805755395683453237410072/7

a = 0.79856115107913669064748201438857

The equation of straight line is –

Ye = a+bx

Substituting the values of a and b –

Ye = 0.79856115107913669064748201438857 +1.56x

This is the regression equation of Y on X.

When I plot the graph of the given data set, It looks like this.

Using the regression value, I predict Y for the missing values of X such as 11, 14, 15 etc. Here is the complete table with actual and predicted values. Predicted values are given in different colour.

 X Y 10 10 11 17.9712230215827 12 22 13 24 14 22.6546762589928 15 24.2158273381295 16 27 17 29 18 28.8992805755396 19 30.4604316546763 20 33 21 33.5827338129496 22 35.1438848920863 23 36.705035971223 24 38.2661870503597 25 37

Let’s plot this table in scattered chart now. Here you go!

Similarly you may predict the value of X for the given Y.

In my next post, I’d write about how to compute regression using LibreOffice Calc. Bye until then!