 # One way analysis of variance

In this post I'll write about ANOVA, following my previous post on skew and kurtosis. ANOVA (analysis of variance) is a technique for statistical inference on more than two populations at the same time, by testing whether their means are all equal.

Though there are several types of ANOVA, I'll discuss only one-way ANOVA in this post.

### What is Hypothesis?

A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in your investigation.

### Types of Hypothesis

The following are the major types of statistical hypothesis.

H0: Null Hypothesis: It is usually the hypothesis that the sample observations result purely from chance.

H1: Alternate Hypothesis: It is the hypothesis that sample observations are influenced by some non-random cause.

Suppose we collect air quality data for Mumbai in 2016. A null hypothesis might be: there is no change in quality between the second and third quarters of 2016. An alternate hypothesis H1 might be: the quality is poorer in the third quarter of 2016.

### What is Hypothesis testing?

Hypothesis testing is a process for deciding whether the data support or contradict the claim behind a research question. By allowing an error of 5% or 1% (the α, or alpha, value), the researcher concludes that a result is likely real if chance alone would produce a result at least that extreme only 5% or 1% of the time or less.

Let's take a research question:

Is the mean salary of an IT family in Bangalore equal to ₹ 40000?

We need to write this question in terms of the null (H0) and alternate (HA) hypotheses.

The null hypothesis is, μ = ₹ 40000.

H0: μ = ₹ 40000

The alternative hypothesis is μ ≠ ₹ 40000

HA: μ ≠ ₹ 40000

HA says that the salary may be either less than ₹ 40000 or greater than ₹ 40000. We call this a two-tailed test.
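To make the two-tailed idea concrete, here is a minimal sketch of a one-sample t-test for the salary question. The salary figures are invented purely for illustration (this post has no salary data), the helper name `one_sample_t` is hypothetical, and 2.262 is the two-tailed t table value for α = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical monthly salaries in rupees, made up for this example only.
salaries = [38000, 42000, 41000, 39500, 40500, 43000, 37000, 40000, 41500, 39000]

t_stat = one_sample_t(salaries, 40000)
t_crit = 2.262  # t table value: two-tailed, alpha = 0.05, df = 10 - 1 = 9

# Two-tailed rule: reject H0 if t lands in either tail.
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

Taking the absolute value of t is what makes the test two-tailed: a mean far below ₹ 40000 counts against H0 just as much as a mean far above it.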

### Type I, Type II errors

A Type I error is rejecting a null hypothesis that is actually true. The probability of making it is represented by α (alpha), which we choose in advance. Generally it would be 5% (0.05) or 1% (0.01).

A Type II error is failing to reject a null hypothesis that is actually false. Its probability is represented by β (beta). It is generally good not to ignore Type II errors.

### Calculating ANOVA – manual approach

Let's take a simple data set. A bank gives different loan amounts to entrepreneurs in three investment slabs: ₹10000, ₹15000 and ₹20000 respectively.

The following are the returns obtained in each slab.

| ₹10,000.00 | ₹15,000.00 | ₹20,000.00 |
|---|---|---|
| 9 | 7 | 4 |
| 8 | 6 | 3 |
| 7 | 6 | 2 |
| 8 | 7 | 3 |
| 8 | 8 | 4 |
| 9 | 7 | 3 |
| 8 | 6 | 2 |

1. Define the NULL (H0) and alternative (HA) hypothesis.

H0: There is no difference between the three loan schemes.

μ(₹10,000.00) = μ(₹15,000.00) = μ(₹20,000.00)

HA: There is a difference. Not all μs are equal.

2. Define the alpha value for type I error.

α = 5% = 0.05.

3. Determine the degrees of freedom (df)

Total number of observations N = 21

Number of groups/columns/levels/conditions a = 3

Number of observations per group (rows) n = 7

Degrees of freedom between groups (columns) dfbetween = a – 1 = 3 – 1 = 2.

Degrees of freedom within groups dfwithin = N – a = 21 – 3 = 18.

Degrees of freedom for all the data (total) dftotal = N – 1 = 21 – 1 = 20.
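The three df values follow directly from the shape of the data. As a small sketch (the variable names are my own, and the numbers are the returns from the loan table above):

```python
# Returns for each loan slab, copied from the table above.
groups = {
    10000: [9, 8, 7, 8, 8, 9, 8],
    15000: [7, 6, 6, 7, 8, 7, 6],
    20000: [4, 3, 2, 3, 4, 3, 2],
}

N = sum(len(values) for values in groups.values())  # total observations
a = len(groups)                                      # number of groups

df_between = a - 1   # between groups
df_within = N - a    # within groups
df_total = N - 1     # all the data

print(df_between, df_within, df_total)
```

A handy check is that the between and within values always add up to the total.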

4. Decision rules

To look up the critical value, two df values need to be used.

Look at the F table for the critical value using v1, v2, i.e. (2, 18), at alpha = 0.05.

Next, find 2 in the v1 column and 18 in the v2 row. We find 3.555. -----(0) So the rule is: if F (calculated value) is greater than 3.555 (F table value), reject the null hypothesis; otherwise, fail to reject the null hypothesis.
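If no F table is at hand, the value can be reproduced for this particular case. When the first df is 2, the upper-tail probability of the F distribution has the closed form P(F > x) = (1 + 2x/v2)^(–v2/2), which can be solved for x. This shortcut is my own addition and works only for v1 = 2; the function name is hypothetical. For other df values a library routine such as SciPy's `scipy.stats.f.ppf` would be the usual choice.

```python
def f_critical_df1_2(alpha, v2):
    """Upper-tail critical value of F(2, v2).

    Valid only for 2 numerator degrees of freedom, where
    P(F > x) = (1 + 2 * x / v2) ** (-v2 / 2).
    """
    return (v2 / 2) * (alpha ** (-2 / v2) - 1)

f_crit = f_critical_df1_2(0.05, 18)
print(round(f_crit, 3))
```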

5. Calculate the statistics

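| ₹10,000.00: Sample x | Sample x² | ₹15,000.00: Sample i | Sample i² | ₹20,000.00: Sample j | Sample j² |
|---|---|---|---|---|---|
| 9 | 81 | 7 | 49 | 4 | 16 |
| 8 | 64 | 6 | 36 | 3 | 9 |
| 7 | 49 | 6 | 36 | 2 | 4 |
| 8 | 64 | 7 | 49 | 3 | 9 |
| 8 | 64 | 8 | 64 | 4 | 16 |
| 9 | 81 | 7 | 49 | 3 | 9 |
| 8 | 64 | 6 | 36 | 2 | 4 |
| T1 = Σx = 57 | Σx² = 467 | T2 = Σi = 47 | Σi² = 319 | T3 = Σj = 21 | Σj² = 67 |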

Sum of all samples T = ΣTi = 57 + 47 + 21 = 125 ————- (1)

So 125 is the total sum of all samples.

ΣΣxij²= 467 + 319 + 67 = 853 ———– (2)

Using (1) and (2),

Q = ΣΣxij²-T²/N

Q = 853 – 125²/21

Q = 853 – 15625/21

Q = 853 – 744.047619

Q = 108.952380952381 ———————-(3)

Q1 = Σ(Ti²/ni)-T²/N

Q1 = ((57² + 47² + 21²)/7) – (125²/21)

Q1 = 5899/7 – 15625/21

Q1 = 842.714286 – 744.047619

Q1 = 98.6666666666666 —————–(4)

Q2 = Q – Q1

Q2 = (3) – (4)

Q2 = 10.2857142857143 —————-(5)
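The arithmetic in (1) to (5) is easy to check with a short script. This is only a sketch that mirrors the steps above; the variable names are my own, and the comments note that Q is the total sum of squares, Q1 the between-groups part and Q2 the within-groups part.

```python
groups = [
    [9, 8, 7, 8, 8, 9, 8],  # returns for the ₹10,000 slab
    [7, 6, 6, 7, 8, 7, 6],  # returns for the ₹15,000 slab
    [4, 3, 2, 3, 4, 3, 2],  # returns for the ₹20,000 slab
]

N = sum(len(g) for g in groups)
T = sum(sum(g) for g in groups)                  # grand total, step (1)
sum_sq = sum(x * x for g in groups for x in g)   # sum of squared values, step (2)
correction = T ** 2 / N                          # the T²/N term

Q = sum_sq - correction                                       # total SS, step (3)
Q1 = sum(sum(g) ** 2 / len(g) for g in groups) - correction   # between groups, step (4)
Q2 = Q - Q1                                                   # within groups, step (5)

print(round(Q, 4), round(Q1, 4), round(Q2, 4))
```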

### Anova Table

| Source of variations (SV) | Sum of squares (SS) | Degrees of freedom (df) | Mean Square (MS) | Variance Ratio (F) |
|---|---|---|---|---|
| Between classes | Q1 | h-1 | Q1/(h-1) | MSbetween/MSwithin |
| Within classes | Q2 | N-h | Q2/(N-h) | |
| Total | Q | N-1 | | |

Let's substitute the values into the above table.

We already calculated the values for Q1, Q2 and Q above.

h is the number of columns (the same quantity called a in step 3), which is 3.

h = 3 —————(6)

So,

h-1 = 2 ———–(7)

N-h = 21-3 = 18 ———–(8)

N – 1 = 20 —————-(9)

MSbetween = Q1/(h-1) = 98.67/2 = 49.335 ------(10)

MSwithin = Q2/(N-h) = 10.29/18 = 0.57167 ------(11)

Finally our F value would be,

F = MSbetween/MSwithin

F = 49.335/0.57167 = 86.2998 —————–(12)
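The figures above divide the rounded values 98.67 and 10.29. As a cross-check, here is a sketch that carries the unrounded sums of squares through using exact fractions; the fractions are taken straight from the sums in the data (5899 = 57² + 47² + 21²), and the variable names are my own.

```python
from fractions import Fraction

# Exact sums of squares: Q1 = 5899/7 - 125²/21 and Q2 = Q - Q1 = 853 - 5899/7.
Q1 = Fraction(5899, 7) - Fraction(125 ** 2, 21)
Q2 = Fraction(853) - Fraction(5899, 7)

ms_between = Q1 / 2    # df between = h - 1 = 2
ms_within = Q2 / 18    # df within = N - h = 18
F = ms_between / ms_within

print(float(ms_between), float(ms_within), float(F))
```

With unrounded inputs F comes out near 86.33 rather than 86.2998. The gap is purely a rounding effect and does not change the decision.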

| Source of variations (SV) | Sum of squares (SS) | Degrees of freedom (df) | Mean Square (MS) | Variance Ratio (F) |
|---|---|---|---|---|
| Between classes | 98.67 | 2 | 49.335 | 86.2998 |
| Within classes | 10.29 | 18 | 0.57167 | |
| Total | 108.95 | 20 | | |

### Conclusion

Our calculated F value (variance ratio) is 86.2998.

The critical (F table) value is 3.555 as per (0).

Hence the calculated F value is greater than the critical value.

F(2, 18) = 86.2998 > 3.555. Hence our null hypothesis is rejected in favour of the alternate hypothesis. There is a difference between the loan schemes. Not all μs are equal.
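To wrap up, the whole manual procedure fits in one small function. This is a sketch rather than a reference implementation: `one_way_anova` and `p_value_df1_2` are names I made up, the p-value shortcut is my own addition and holds only when the first df is 2, and the critical value is the table value from (0). Because nothing is rounded along the way, F prints as roughly 86.33 instead of 86.2998.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    N = sum(len(g) for g in groups)
    a = len(groups)
    T = sum(sum(g) for g in groups)
    correction = T ** 2 / N
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between, df_within = a - 1, N - a
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

def p_value_df1_2(f_value, v2):
    """Upper-tail probability of F(2, v2); valid only when the first df is 2."""
    return (1 + 2 * f_value / v2) ** (-v2 / 2)

loans = [
    [9, 8, 7, 8, 8, 9, 8],  # ₹10,000 slab
    [7, 6, 6, 7, 8, 7, 6],  # ₹15,000 slab
    [4, 3, 2, 3, 4, 3, 2],  # ₹20,000 slab
]

F, v1, v2 = one_way_anova(loans)
F_CRITICAL = 3.555  # F table value for (2, 18) at alpha = 0.05, from (0)
decision = "reject H0" if F > F_CRITICAL else "fail to reject H0"
p = p_value_df1_2(F, v2)

print(f"F({v1}, {v2}) = {F:.4f}, p = {p:.2e} -> {decision}")
```

Comparing F with the table value and comparing the p-value with α are two views of the same rule, so they lead to the same decision.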