Analysis of Variance (ANOVA) using R

I have written about different types of hypothesis testing in my previous posts.

This post discusses how to calculate ANOVA using R. It will not cover the theory of ANOVA, as I have already written about that in One way analysis of variance and Calculating Anova with LibreOffice Calc.

ANOVA is similar to the t test. The difference lies in the type of variables used and the number of groups compared. As given in the Testing of Independence – Chi Square test – Manual, LibreOffice, R blog post, the chi-square test is used when both variables are categorical. The t test is used when the independent variable is categorical and the dependent variable is continuous, and it compares only two groups.

In ANOVA we use a categorical independent variable (IV) and a continuous dependent variable (DV). Unlike the t test, the independent variable may have more than two groups.
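To see how closely the two tests are related, here is a minimal sketch on made-up numbers. The data frame `toy` and its columns `group` and `score` are hypothetical and are not part of the salary data used later. With exactly two groups, a pooled-variance t test and a one-way ANOVA give the same answer, because F equals t squared.

```r
# Hypothetical toy data: one categorical IV (group) and one continuous DV (score).
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 5)),
  score = c(48, 52, 50, 47, 53,  55, 58, 54, 57, 56)
)

# Two groups: a t test with pooled (equal) variance ...
t_stat <- unname(t.test(score ~ group, data = toy, var.equal = TRUE)$statistic)

# ... and a one-way ANOVA on the same two groups.
f_stat <- summary(aov(score ~ group, data = toy))[[1]][["F value"]][1]

# With exactly two groups the tests agree: F equals t squared.
all.equal(t_stat^2, f_stat)
```

Once the independent variable has three or more groups, the t test no longer applies directly, and that is where ANOVA comes in.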

An increase in the number of schools leads to an increase in the educational expenses of the Government. We call this a reasonable change.

An increase in the number of schools leads to an increase in the school fees paid by each family. This is unfortunate, right? So this is an unreasonable change.

So a reasonable change is a change in the dependent variable that goes along with a change in the independent variable.

An unreasonable change is a change in the dependent variable with no change in the independent variable.

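We care about the F ratio here. The reasonable change is the explained variance, and the unreasonable change is the unexplained variance.

F ratio = ratio between explained and unexplained variance.

F ratio = explained variance / unexplained variance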
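The formula can be worked through by hand. The sketch below uses a small made-up data set (the vectors `y` and `g` are hypothetical) and treats "variance" as the mean square, that is, a sum of squares divided by its degrees of freedom. It then cross-checks the result against `aov()`.

```r
# Hypothetical toy data: three groups, four observations each.
y <- c(4, 5, 6, 5,  7, 8, 9, 8,  10, 11, 12, 11)
g <- factor(rep(c("g1", "g2", "g3"), each = 4))

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Explained (between-group) sum of squares and its degrees of freedom.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
df_between <- nlevels(g) - 1

# Unexplained (within-group, residual) sum of squares and its degrees of freedom.
ss_within <- sum((y - group_means[as.character(g)])^2)
df_within <- length(y) - nlevels(g)

# F ratio = explained variance / unexplained variance.
f_by_hand <- (ss_between / df_between) / (ss_within / df_within)

# Cross-check against aov().
f_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]
all.equal(f_by_hand, f_aov)
```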

As a rule of thumb, if F > 4 the group means differ significantly. The exact cut-off depends on the degrees of freedom, so refer to the p value before making a decision.
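The threshold of 4 is only a rough guide. As a sketch, the exact 5% cut-off can be looked up with `qf()`, and a p value can be computed from an F value with `pf()`. The degrees of freedom and the observed F of 4.5 below are hypothetical values chosen for illustration.

```r
alpha <- 0.05

# Critical F values at the 5% level for a few hypothetical
# (numerator df, denominator df) combinations.
crit <- c(
  "df 1, 30" = qf(1 - alpha, df1 = 1, df2 = 30),
  "df 1, 60" = qf(1 - alpha, df1 = 1, df2 = 60),
  "df 3, 46" = qf(1 - alpha, df1 = 3, df2 = 46)
)
round(crit, 2)

# p value for a hypothetical observed F of 4.5 with df = (3, 46):
# the probability of an F at least this large if the groups do not differ.
p_value <- pf(4.5, df1 = 3, df2 = 46, lower.tail = FALSE)
p_value < alpha
```

The cut-off is close to 4 when there are only two groups and shrinks as the number of groups grows, which is why the p value is the safer thing to read.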

Let me take the same data set used in my previous posts.

```
> sal
   id gender      educ  Designation Level Salary Last.drawn.salary Pre..Exp Ratings.by.interviewer
1   1 female        UG  Jr Engineer   JLM  10000              1000        3                      4
2   2   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
3   3   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
4   4   male        PG     Engineer   MLM  15000             15000        7                      2
5   5 female        PG  Sr Engineer   MLM  25000             25000       12                      4
6   6   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
7   7   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
8   8 female        PG     Engineer   MLM  13000             13000        7                      3
9   9 female        PG     Engineer   MLM  14000             14000        7                      2
10 10 female        PG     Engineer   MLM  16000             16000        8                      4
11 11 female        UG  Jr Engineer   JLM  10000              1000        3                      4
12 12   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
13 13   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
14 14   male        PG     Engineer   MLM  15000             15000        7                      2
15 15 female        PG  Sr Engineer   MLM  25000             25000       12                      4
16 16   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
17 17   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
18 18 female        PG     Engineer   MLM  13000             13000        7                      3
19 19 female        PG     Engineer   MLM  14000             14000        7                      2
20 20 female        PG     Engineer   MLM  16000             16000        8                      4
21 21 female        PG  Sr Engineer   MLM  25000             25000       12                      4
22 22   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
23 23   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
24 24 female        PG     Engineer   MLM  13000             13000        7                      3
25 25 female        PG     Engineer   MLM  14000             14000        7                      2
26 26 female        PG     Engineer   MLM  16000             16000        8                      4
27 27 female        UG  Jr Engineer   JLM  10000              1000        3                      4
28 28   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
29 29   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
30 30   male        PG     Engineer   MLM  15000             15000        7                      2
31 31 female        PG  Sr Engineer   MLM  25000             25000       12                      4
32 32 female        PG  Sr Engineer   MLM  25000             25000       12                      4
33 33   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
34 34   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
35 35 female        PG     Engineer   MLM  13000             13000        7                      3
36 36 female        PG     Engineer   MLM  14000             14000        7                      2
37 37 female        PG     Engineer   MLM  16000             16000        8                      4
38 38 female        UG  Jr Engineer   JLM  10000              1000        3                      4
39 39   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
40 40   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
41 41   male        PG     Engineer   MLM  15000             15000        7                      2
42 42 female        PG  Sr Engineer   MLM  25000             25000       12                      4
43 43   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
44 44   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
45 45 female        PG     Engineer   MLM  13000             13000        7                      3
46 46 female        PG     Engineer   MLM  16000             16000        8                      4
47 47 female        UG  Jr Engineer   JLM  10000              1000        3                      4
48 48   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
49 49   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
50 50   male        PG     Engineer   MLM  15000             15000        7                      2
```
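If you do not have the original file, the two columns used in the rest of this post can be rebuilt from the listing above. This is only a sketch: the name `sal_min` is my own, the other columns are left out, and the row order differs from the original `sal`.

```r
# Rebuild only educ and Salary, using the values and counts from the listing.
# sal_min is a hypothetical name; rows are grouped by educ, not in id order.
sal_min <- data.frame(
  educ = factor(rep(c("DIPLOMA", "DOCTORATE", "PG", "UG"),
                    times = c(15, 5, 25, 5))),
  Salary = c(
    rep(c(6000, 8000), times = c(10, 5)),        # DIPLOMA
    rep(100000, 5),                              # DOCTORATE
    rep(c(13000, 14000, 15000, 16000, 25000),
        times = c(5, 4, 5, 5, 6)),               # PG
    rep(10000, 5)                                # UG
  )
)

table(sal_min$educ)   # group sizes
str(sal_min)          # educ is a factor (IV), Salary is numeric (DV)
```

Replacing `sal` with `sal_min` in the commands that follow should give the same group means and the same ANOVA table.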

Let’s take educational qualification and salary drawn. Usually, the higher the educational qualification, the higher the salary.

```
> aggregate(Salary~educ, mean, data=sal)
       educ     Salary
1   DIPLOMA   6666.667
2 DOCTORATE 100000.000
3        PG  17040.000
4        UG  10000.000
```

Obviously there is a difference. While a diploma holder gets about 6667 rupees on average, a doctorate holder gets 100000. Let’s check the analysis of variance with aov() now.

```
> aov1 <- aov(Salary~educ, data=sal)
> aov1
Call:
aov(formula = Salary ~ educ, data = sal)

Terms:
                       educ   Residuals
Sum of Squares  35270186667   538293333
Deg. of Freedom           3          46

Residual standard error: 3420.823
Estimated effects may be unbalanced
```

Let’s look at the mean salary and the ANOVA summary below.

```
> aggregate(Salary~educ, mean, data=sal)
       educ     Salary
1   DIPLOMA   6666.667
2 DOCTORATE 100000.000
3        PG  17040.000
4        UG  10000.000
```
```
> summary(aov1)
            Df    Sum Sq   Mean Sq F value Pr(>F)
educ         3 3.527e+10 1.176e+10    1005 <2e-16 ***
Residuals   46 5.383e+08 1.170e+07
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The F value is 1005, far above our rule-of-thumb threshold of 4.
The p value is <2e-16, which earns the *** code (p < 0.001). It is less than the standard alpha of 5%, which is 0.05.

So the change in salary with the change in education is significant. The education groups in this data differ significantly in mean salary.

The first row educ indicates explained variance. The second row, residuals, indicates unexplained variance.

The *** indicates that our result is significant not only at 0.05 but even at 0.001. If education truly made no difference to salary, we would see an F value this large less than 1 time out of 1000.
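The summary table can be cross-checked by hand from the sums of squares and degrees of freedom that aov() printed. The sketch below only copies those four numbers from the output; the variable names are my own.

```r
# Values copied from the aov() output in this post.
ss_educ  <- 35270186667; df_educ  <- 3    # educ row
ss_resid <- 538293333;   df_resid <- 46   # Residuals row

ms_educ  <- ss_educ / df_educ      # explained variance (Mean Sq, educ row)
ms_resid <- ss_resid / df_resid    # unexplained variance (Mean Sq, Residuals row)

f_value <- ms_educ / ms_resid      # F value
p_value <- pf(f_value, df_educ, df_resid, lower.tail = FALSE)  # Pr(>F)

round(f_value)     # 1005, as in the summary
sqrt(ms_resid)     # the residual standard error reported by aov()
p_value < 0.001    # TRUE, hence the *** code
```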