I have written about different types of hypothesis testing in my previous posts.

- Testing of Independence – Chi Square test – Manual, LibreOffice, R
- Testing of difference – T test using R

This post discusses how to calculate ANOVA using R. It will not explain ANOVA itself, as I have already covered that in One way analysis of variance and Calculating Anova with LibreOffice Calc.

ANOVA is similar to the t test. The difference lies in the type of variables used and the number of groups compared. As given in the Testing of Independence – Chi Square test – Manual, LibreOffice, R blog post, the chi-square test is used when both variables are categorical. The t test compares a continuous variable between groups, but it handles only two groups.

In ANOVA we use a categorical independent variable (IV) and a continuous dependent variable (DV). The IV may have more than two groups.

An increase in the number of schools leads to an increase in the educational expense of the Government. We call this a reasonable change.

An increase in the number of schools leads to an increase in the school fees of each family. This is unfortunate, right? So this is an unreasonable change.

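So a reasonable change is a change in the dependent variable that goes along with a change in the independent variable. This is the explained variance.

An unreasonable change is a change in the dependent variable with no change in the independent variable. This is the unexplained variance.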

We care about the F ratio here.

F ratio = ratio between explained and unexplained variance.

F ratio = explained variance / unexplained variance

As a rule of thumb, if F > 4, the groups differ significantly. The exact cut-off depends on the degrees of freedom, so refer to the p value before making a decision.
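The threshold of 4 is only a rule of thumb. As a minimal sketch, the exact critical F value can be looked up with base R's `qf()`. The sketch assumes an alpha of 0.05 and the degrees of freedom 3 and 46 that the ANOVA output later in this post reports; the variable names are just for illustration.

```r
# Critical F value for a given alpha and degrees of freedom.
# Assumptions: alpha = 0.05, df 3 (educ) and 46 (residuals),
# taken from the ANOVA table shown later in this post.
alpha    <- 0.05
df_educ  <- 3    # number of groups - 1
df_resid <- 46   # number of observations - number of groups

f_critical <- qf(1 - alpha, df1 = df_educ, df2 = df_resid)
print(f_critical)
```

An F value above `f_critical` is significant at the chosen alpha. For these degrees of freedom the critical value is below 4, so the rule of thumb is on the cautious side here.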

Let me take the same data set used in my previous posts.

    > sal
       id gender      educ  Designation Level Salary Last.drawn.salary Pre..Exp Ratings.by.interviewer
     1  1 female        UG  Jr Engineer   JLM  10000              1000        3                      4
     2  2   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
     3  3   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
     4  4   male        PG     Engineer   MLM  15000             15000        7                      2
     5  5 female        PG  Sr Engineer   MLM  25000             25000       12                      4
     6  6   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
     7  7   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
     8  8 female        PG     Engineer   MLM  13000             13000        7                      3
     9  9 female        PG     Engineer   MLM  14000             14000        7                      2
    10 10 female        PG     Engineer   MLM  16000             16000        8                      4
    11 11 female        UG  Jr Engineer   JLM  10000              1000        3                      4
    12 12   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
    13 13   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
    14 14   male        PG     Engineer   MLM  15000             15000        7                      2
    15 15 female        PG  Sr Engineer   MLM  25000             25000       12                      4
    16 16   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
    17 17   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
    18 18 female        PG     Engineer   MLM  13000             13000        7                      3
    19 19 female        PG     Engineer   MLM  14000             14000        7                      2
    20 20 female        PG     Engineer   MLM  16000             16000        8                      4
    21 21 female        PG  Sr Engineer   MLM  25000             25000       12                      4
    22 22   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
    23 23   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
    24 24 female        PG     Engineer   MLM  13000             13000        7                      3
    25 25 female        PG     Engineer   MLM  14000             14000        7                      2
    26 26 female        PG     Engineer   MLM  16000             16000        8                      4
    27 27 female        UG  Jr Engineer   JLM  10000              1000        3                      4
    28 28   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
    29 29   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
    30 30   male        PG     Engineer   MLM  15000             15000        7                      2
    31 31 female        PG  Sr Engineer   MLM  25000             25000       12                      4
    32 32 female        PG  Sr Engineer   MLM  25000             25000       12                      4
    33 33   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
    34 34   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
    35 35 female        PG     Engineer   MLM  13000             13000        7                      3
    36 36 female        PG     Engineer   MLM  14000             14000        7                      2
    37 37 female        PG     Engineer   MLM  16000             16000        8                      4
    38 38 female        UG  Jr Engineer   JLM  10000              1000        3                      4
    39 39   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
    40 40   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
    41 41   male        PG     Engineer   MLM  15000             15000        7                      2
    42 42 female        PG  Sr Engineer   MLM  25000             25000       12                      4
    43 43   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
    44 44   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
    45 45 female        PG     Engineer   MLM  13000             13000        7                      3
    46 46 female        PG     Engineer   MLM  16000             16000        8                      4
    47 47 female        UG  Jr Engineer   JLM  10000              1000        3                      4
    48 48   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
    49 49   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
    50 50   male        PG     Engineer   MLM  15000             15000        7                      2

Let's take educational qualification and salary drawn. Usually, the higher the educational qualification, the higher the salary.

    > aggregate(Salary~educ, mean, data=sal)
           educ     Salary
    1   DIPLOMA   6666.667
    2 DOCTORATE 100000.000
    3        PG  17040.000
    4        UG  10000.000

Obviously there is a change. While a diploma holder gets 6666 rupees on average, a doctorate gets 100000. Let’s check the analysis of variance with aov() now.

    > aov1 <- aov(Salary~educ, data=sal)
    > aov1
    Call:
       aov(formula = Salary ~ educ, data = sal)

    Terms:
                           educ   Residuals
    Sum of Squares  35270186667   538293333
    Deg. of Freedom           3          46

    Residual standard error: 3420.823
    Estimated effects may be unbalanced

Let’s look at the mean salary and the ANOVA summary below.

    > aggregate(Salary~educ, mean, data=sal)
           educ     Salary
    1   DIPLOMA   6666.667
    2 DOCTORATE 100000.000
    3        PG  17040.000
    4        UG  10000.000

    > summary(aov1)
                Df    Sum Sq   Mean Sq F value Pr(>F)
    educ         3 3.527e+10 1.176e+10    1005 <2e-16 ***
    Residuals   46 5.383e+08 1.170e+07
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The F value is **1005**. Our threshold is 4, so this is a very high value.

The p value is reported as <2e-16 and marked ***, which means it is below 0.001. It is far less than the standard alpha of 5%, which is 0.05.
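The p value is the upper-tail probability of the F distribution at the observed F value. A rough sketch with base R's `pf()`, assuming the rounded F value 1005 and the degrees of freedom 3 and 46 printed by summary(aov1); the variable names are only illustrative:

```r
# Upper-tail probability of the F distribution at the observed F value.
# Assumption: F is the rounded value printed by summary(aov1), so the
# result is approximate.
f_value <- 1005
p_value <- pf(f_value, df1 = 3, df2 = 46, lower.tail = FALSE)

print(p_value)
print(p_value < 0.05)   # compare against the standard alpha
```

R prints such tiny p values as <2e-16 in the summary table because anything smaller is indistinguishable from zero for practical purposes.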

So the change in salary is significant with the change in education. The education groups differ significantly in salary.

The first row educ indicates explained variance. The second row, residuals, indicates unexplained variance.
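To see where these two rows come from, here is a minimal sketch that computes both sums of squares and the F ratio by hand. It assumes only the educ and Salary columns, re-typed from the data listing above as group counts; the vector names (`educ`, `salary`, `ss_explained`, `ss_unexplained`) are made up for this illustration and are not part of the original session.

```r
# educ and Salary re-typed from the data listing (50 rows in total).
educ <- rep(c("DIPLOMA", "DOCTORATE", "PG", "UG"), times = c(15, 5, 25, 5))
salary <- c(
  rep(c(6000, 8000), times = c(10, 5)),                                 # DIPLOMA
  rep(100000, 5),                                                       # DOCTORATE
  rep(c(15000, 25000, 13000, 14000, 16000), times = c(5, 6, 5, 4, 5)),  # PG
  rep(10000, 5)                                                         # UG
)

grand_mean <- mean(salary)
group_mean <- ave(salary, educ)   # each row's own group mean

k <- length(unique(educ))         # number of groups
n <- length(salary)               # number of observations

# Explained: how far the group means sit from the grand mean.
ss_explained <- sum((group_mean - grand_mean)^2)
# Unexplained: how far each salary sits from its own group mean.
ss_unexplained <- sum((salary - group_mean)^2)

# Divide each sum of squares by its degrees of freedom, then take the ratio.
f_ratio <- (ss_explained / (k - 1)) / (ss_unexplained / (n - k))

print(c(ss_explained, ss_unexplained))
print(f_ratio)
```

The two sums of squares should match the educ and Residuals rows of the aov() output, and the ratio of the mean squares should match the F value in the summary.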

The *** indicates that our model is not only significant at 0.05, but significant even at 0.001. In other words, if education had no effect on salary, we would expect to see a difference this large less than 1 time out of 1000.