In this post I write about ANOVA (analysis of variance), following my previous post on skew and kurtosis. ANOVA is a technique for statistical inference on several populations (usually three or more) at the same time: it tests whether their means differ.

Although there are several types of ANOVA, I discuss only one-way ANOVA in this post.

### What is a Hypothesis?

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in your investigation.

### Types of Hypothesis

The following are the two major types of statistical hypothesis.

H0: Null hypothesis: the hypothesis that the sample observations result purely from chance.

H1: Alternate hypothesis: the hypothesis that the sample observations are influenced by some non-random cause.

Suppose we collect air quality data for Mumbai in 2016. A null hypothesis may be that there is no change in quality between the second and third quarters of 2016. An alternate hypothesis H1 may be that the quality is poorer in the third quarter of 2016.

### What is Hypothesis Testing?

Hypothesis testing is a process for deciding whether the sample data give enough evidence to reject the null hypothesis. The researcher fixes a significance level α (alpha), usually 5% or 1%, and concludes that a result is real only if chance alone would produce a result at least as extreme no more than 5% or 1% of the time.

Let's take a research question:

Is the mean salary of an IT family in Bangalore equal to ₹ 40000?

We need to write this question in terms of the null (H0) and alternate (HA, also written H1) hypotheses.

H0: μ = ₹ 40000

HA: μ ≠ ₹ 40000

HA says that the mean salary may be either less than or greater than ₹ 40000. Because it allows a difference in both directions, we call this a two-tailed test.

### Type I, Type II errors

A Type I error occurs when we reject a null hypothesis that is actually true. The probability of making this error is represented by α (alpha), the significance level. It is generally set to 5% (0.05) or 1% (0.01).

A Type II error occurs when we fail to reject a null hypothesis that is actually false. Its probability is represented by β (beta). It is generally good not to ignore Type II errors.
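To make α concrete, here is a minimal sketch that simulates many samples for which the null hypothesis is true and counts how often a two-tailed test still rejects it. It is an illustration only: the z-test with a known standard deviation, the function name `type_one_error_rate`, and the values `sigma=5000`, `n=30` and `trials=2000` are assumptions chosen for the example, not data from this post.

```python
import random
from statistics import NormalDist, mean


def type_one_error_rate(alpha=0.05, trials=2000, n=30,
                        mu0=40000.0, sigma=5000.0, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis.

    Every sample is drawn from a normal population whose mean really is mu0,
    so each rejection counted here is a Type I error.
    sigma, n and trials are hypothetical values picked for illustration.
    """
    rng = random.Random(seed)
    # Two-tailed critical value: alpha is split between both tails.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen α, which is what "allowing an error of 5%" means in practice.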

### Calculating ANOVA – manual approach

Let's take a simple data set. A bank gives different loan amounts to entrepreneurs in three different investment slabs: ₹10000, ₹15000 and ₹20000 respectively.

The following are the returns obtained in each slab.

| ₹10,000.00 | ₹15,000.00 | ₹20,000.00 |
|---|---|---|
| 9 | 7 | 4 |
| 8 | 6 | 3 |
| 7 | 6 | 2 |
| 8 | 7 | 3 |
| 8 | 8 | 4 |
| 9 | 7 | 3 |
| 8 | 6 | 2 |

*1. Define the null (H0) and alternative (HA) hypotheses.*

H0: There is no difference between the three loan slabs.

μ_{₹10,000} = μ_{₹15,000} = μ_{₹20,000}

HA: There is a difference. Not all μs are equal.

*2. Define the alpha value for type I error.*

α = 5% = 0.05.

*3. Determine the degrees of freedom (df)*

Total number of observations N = 21

Number of groups/columns/levels/conditions a = 3

Number of rows (observations per group) n = 7

Degrees of freedom between groups *df*_{between} = a – 1 = 3 – 1 = 2.

Degrees of freedom within groups *df*_{within} = N – a = 21 – 3 = 18.

Degrees of freedom for all the data (total) *df*_{total} = N – 1 = 21 – 1 = 20.
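The same counts can be derived directly from the data instead of by hand. The sketch below is one possible way to do it; the dictionary name `groups` and its keys are my own labels for the three loan slabs.

```python
# Returns per loan slab, copied from the table above.
groups = {
    "10000": [9, 8, 7, 8, 8, 9, 8],
    "15000": [7, 6, 6, 7, 8, 7, 6],
    "20000": [4, 3, 2, 3, 4, 3, 2],
}

N = sum(len(values) for values in groups.values())  # total observations
a = len(groups)                                     # number of groups

df_between = a - 1   # between groups
df_within = N - a    # within groups
df_total = N - 1     # all the data

print(df_between, df_within, df_total)  # prints: 2 18 20
```

Note that *df*_{between} + *df*_{within} = *df*_{total}, which is a quick sanity check on the arithmetic.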

*4. Decision rules*

To look up the critical value, both *df* values are needed.

Look up the critical value in the F table for α = 0.05 using v1 = 2 (degrees of freedom between groups) and v2 = 18 (degrees of freedom within groups), i.e. (2, 18).

In the table, the entry in column v1 = 2 and row v2 = 18 is 3.555. (0)

So the rule is: if the calculated F is greater than 3.555 (the F table value), reject the null hypothesis; otherwise, fail to reject it.

*5. Calculate the statistics*

| x (₹10,000.00) | x² | i (₹15,000.00) | i² | j (₹20,000.00) | j² |
|---|---|---|---|---|---|
| 9 | 81 | 7 | 49 | 4 | 16 |
| 8 | 64 | 6 | 36 | 3 | 9 |
| 7 | 49 | 6 | 36 | 2 | 4 |
| 8 | 64 | 7 | 49 | 3 | 9 |
| 8 | 64 | 8 | 64 | 4 | 16 |
| 9 | 81 | 7 | 49 | 3 | 9 |
| 8 | 64 | 6 | 36 | 2 | 4 |
| T_{1} = 57 | Σx² = 467 | T_{2} = 47 | Σi² = 319 | T_{3} = 21 | Σj² = 67 |

Sum of all samples T = ΣT_{i} = 57 + 47 + 21 = 125 (1)

So 125 is the total sum of all samples.

ΣΣx_{ij}² = 467 + 319 + 67 = 853 (2)

Using (1) and (2), the total sum of squares is

Q = ΣΣx_{ij}² – T²/N

Q = 853 – 125²/21

Q = 853 – 15625/21

Q = 853 – 744.047619

Q = 108.952380952381 (3)

The between-classes sum of squares is

Q1 = Σ(T_{i}²/n_{i}) – T²/N

Q1 = (57² + 47² + 21²)/7 – 125²/21

Q1 = 5899/7 – 744.047619

Q1 = 98.6666666666667 (4)

The within-classes sum of squares is

Q2 = Q – Q1

Q2 = (3) – (4)

Q2 = 10.2857142857143 (5)
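The sums of squares can be checked with a few lines of Python. This is a minimal sketch using the shortcut formulas from this step; the variable names (`groups`, `correction` and so on) are mine.

```python
# Returns per loan slab, copied from the table above.
groups = [
    [9, 8, 7, 8, 8, 9, 8],  # ₹10,000 slab
    [7, 6, 6, 7, 8, 7, 6],  # ₹15,000 slab
    [4, 3, 2, 3, 4, 3, 2],  # ₹20,000 slab
]

N = sum(len(g) for g in groups)                 # 21 observations
T = sum(sum(g) for g in groups)                 # grand total, eq. (1)
sum_sq = sum(x * x for g in groups for x in g)  # ΣΣx_ij², eq. (2)
correction = T ** 2 / N                         # T²/N

Q = sum_sq - correction                                     # total SS, eq. (3)
Q1 = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # between SS, eq. (4)
Q2 = Q - Q1                                                 # within SS, eq. (5)

print(T, sum_sq)                           # prints: 125 853
print(f"{Q:.4f} {Q1:.4f} {Q2:.4f}")        # prints: 108.9524 98.6667 10.2857
```

Note that Q1 squares each column total before dividing by the column size: (57² + 47² + 21²)/7, not (57 + 47 + 21)/7.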

### Anova Table

| Source of variations (SV) | Sum of squares (SS) | Degrees of freedom (df) | Mean Square (MS) | Variance Ratio (F) |
|---|---|---|---|---|
| Between classes | Q1 | h – 1 | Q1/(h – 1) | MS_{between}/MS_{within} |
| Within classes | Q2 | N – h | Q2/(N – h) | |
| Total | Q | N – 1 | | |

Let's substitute the values in the above table.

We already calculated the values for Q1, Q2 and Q above.

h is the number of columns (the same quantity as a in step 3), which is 3.

h = 3 (6)

So,

h – 1 = 2 (7)

N – h = 21 – 3 = 18 (8)

N – 1 = 20 (9)

MS_{between} = Q1/(h – 1) = 98.67/2 = 49.335 (10)

MS_{within} = Q2/(N – h) = 10.29/18 = 0.57167 (11)

Finally our F value would be,

F = MS_{between}/MS_{within}

F = 49.335/0.57167 = 86.2998 (12)

Q1 and Q2 are rounded to two decimals here; with the unrounded values, F = 86.3333.
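As a cross-check on the hand calculation, the sketch below recomputes the mean squares and F from the totals already obtained (ΣT_{i}² = 5899, ΣΣx_{ij}² = 853, T = 125), without rounding the intermediate values. The variable names are mine.

```python
h, N = 3, 21                  # number of groups, total observations
correction = 125 ** 2 / N     # T²/N

Q = 853 - correction          # total sum of squares
Q1 = 5899 / 7 - correction    # between-classes sum of squares
Q2 = Q - Q1                   # within-classes sum of squares

ms_between = Q1 / (h - 1)     # divide by df between = 2
ms_within = Q2 / (N - h)      # divide by df within = 18
F = ms_between / ms_within

print(f"MS_between = {ms_between:.4f}")  # prints: MS_between = 49.3333
print(f"MS_within  = {ms_within:.4f}")   # prints: MS_within  = 0.5714
print(f"F          = {F:.4f}")           # prints: F          = 86.3333
```

The small gap between 86.3333 and the hand-calculated 86.2998 comes only from rounding Q1 and Q2 to two decimals before dividing; it does not affect the decision.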

| Source of variations (SV) | Sum of squares (SS) | Degrees of freedom (df) | Mean Square (MS) | Variance Ratio (F) |
|---|---|---|---|---|
| Between classes | 98.67 | 2 | 49.335 | 86.2998 |
| Within classes | 10.29 | 18 | 0.57167 | |
| Total | 108.95 | 20 | | |

### Conclusion

Our calculated F value (variance ratio) is 86.2998.

The critical value from the F table is 3.555, as per (0).

Hence the calculated F value is greater than the critical value.

F(2, 18) = 86.2998 > 3.555. Hence our null hypothesis is rejected in favour of the alternate hypothesis. There is a difference between the loan slabs. Not all μs are equal.
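The whole procedure can also be written as one small function. The sketch below is not the shortcut method used in the manual steps: it computes the sums of squares from their definitions (deviations from the group means and the grand mean), which should give the same F. The function name `one_way_anova` and the constant `F_CRITICAL` are my own; the critical value 3.555 is the F table value for (2, 18) at α = 0.05 quoted earlier.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


loans = [
    [9, 8, 7, 8, 8, 9, 8],  # ₹10,000 slab
    [7, 6, 6, 7, 8, 7, 6],  # ₹15,000 slab
    [4, 3, 2, 3, 4, 3, 2],  # ₹20,000 slab
]

F_CRITICAL = 3.555  # F table value for (2, 18) at alpha = 0.05

f_value, df1, df2 = one_way_anova(loans)
reject_null = f_value > F_CRITICAL

print(f"F({df1}, {df2}) = {f_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")  # prints: Reject H0
```

Because the critical value is hard-coded, this sketch only applies to data with the same degrees of freedom and α; for other cases the value has to be looked up again.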
