Testing of Independence – Chi Square test – Manual, LibreOffice, R

Hi,

I have written about hypothesis testing in my earlier posts.

Statisticians recommend different testing approaches for different types of data.

When we have:

  • two categorical variables, we use the chi-square test
  • two continuous variables, we use correlation
  • one categorical and one continuous variable, we use a t-test or ANOVA.

In this post, I'll be using the data set given below.

   id gender      educ  Designation Level Salary Last.drawn.salary Pre..Exp Ratings.by.interviewer
    1 female        UG  Jr Engineer   JLM  10000              1000        3                      4
    2   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
    3   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
    4   male        PG     Engineer   MLM  15000             15000        7                      2
    5 female        PG  Sr Engineer   MLM  25000             25000       12                      4
    6   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
    7   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
    8 female        PG     Engineer   MLM  13000             13000        7                      3
    9 female        PG     Engineer   MLM  14000             14000        7                      2
   10 female        PG     Engineer   MLM  16000             16000        8                      4
   11 female        UG  Jr Engineer   JLM  10000              1000        3                      4
   12   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
   13   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
   14   male        PG     Engineer   MLM  15000             15000        7                      2
   15 female        PG  Sr Engineer   MLM  25000             25000       12                      4
   16   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
   17   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
   18 female        PG     Engineer   MLM  13000             13000        7                      3
   19 female        PG     Engineer   MLM  14000             14000        7                      2
   20 female        PG     Engineer   MLM  16000             16000        8                      4
   21 female        PG  Sr Engineer   MLM  25000             25000       12                      4
   22   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
   23   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
   24 female        PG     Engineer   MLM  13000             13000        7                      3
   25 female        PG     Engineer   MLM  14000             14000        7                      2
   26 female        PG     Engineer   MLM  16000             16000        8                      4
   27 female        UG  Jr Engineer   JLM  10000              1000        3                      4
   28   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
   29   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
   30   male        PG     Engineer   MLM  15000             15000        7                      2
   31 female        PG  Sr Engineer   MLM  25000             25000       12                      4
   32 female        PG  Sr Engineer   MLM  25000             25000       12                      4
   33   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
   34   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
   35 female        PG     Engineer   MLM  13000             13000        7                      3
   36 female        PG     Engineer   MLM  14000             14000        7                      2
   37 female        PG     Engineer   MLM  16000             16000        8                      4
   38 female        UG  Jr Engineer   JLM  10000              1000        3                      4
   39   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
   40   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
   41   male        PG     Engineer   MLM  15000             15000        7                      2
   42 female        PG  Sr Engineer   MLM  25000             25000       12                      4
   43   male   DIPLOMA  Jr Engineer   JLM   6000              8000        1                      1
   44   male   DIPLOMA Jr Associate   JLM   8000              8000        2                      4
   45 female        PG     Engineer   MLM  13000             13000        7                      3
   46 female        PG     Engineer   MLM  16000             16000        8                      4
   47 female        UG  Jr Engineer   JLM  10000              1000        3                      4
   48   male DOCTORATE     Chairman   TLM 100000            100000       20                      4
   49   male   DIPLOMA        Jr HR   JLM   6000              6000        1                      3
   50   male        PG     Engineer   MLM  15000             15000        7                      2

We use the chi-square test for two types of hypothesis testing:

  • test of independence of variables
  • test of goodness of fit

Testing of independence

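The test of independence checks whether two categorical variables are associated. The higher the chi-square value, the further the observed counts are from the counts expected if the variables were independent, and so the stronger the evidence that they are related. We shall use this to test our hypothesis.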

Goodness of fit

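In a goodness-of-fit test, we compare the observed frequencies of a categorical variable with the frequencies expected under an assumed distribution. Here the reading is reversed: the lower the chi-square value, the better the fit. We shall use this later when evaluating BLR and SEM models.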
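As a minimal illustration of a goodness-of-fit test in R, the sketch below checks whether the gender counts in our data set (25 female, 25 male) fit a 50/50 split. The 50/50 proportions are a hypothetical assumption chosen only for this example. When chisq.test() is given a single vector together with the p argument, it runs a goodness-of-fit test instead of a test of independence.

```r
# Observed counts of a single categorical variable (gender), from the data set above
observed_gender <- c(female = 25, male = 25)

# Hypothetical expected proportions for this example: an equal split
expected_prop <- c(0.5, 0.5)

# With one vector and p = ..., chisq.test() runs a goodness-of-fit test
gof <- chisq.test(observed_gender, p = expected_prop)
gof$statistic  # X-squared
gof$parameter  # degrees of freedom = number of categories - 1
gof$p.value
```

Because the observed counts equal the expected counts exactly, χ² is 0 and the p-value is 1, which is a perfect fit.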

Example for Testing of independence

This post covers the test of independence, using the employee data given above. My hypotheses are:

H0: The number of female employees and the level of management are not related (gender and management level are independent).

H1: The number of female employees and the level of management are related.

We will solve this using three methods:

  1. Manual way of chi square test
  2. Chi square test with LibreOffice Calc
  3. Chi square test with R

Manual way of chi square test

We prepare the count of female and male employees at each level, as given below. I have used the COUNTIFS() function of LibreOffice.

chi square libre office 01
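For readers who prefer R, the same counting can be sketched by summing a logical condition, which is the same idea as COUNTIFS() with two criteria. The vectors gender and level below are hypothetical: they rebuild only the two columns we need from the counts in the data set, not its original row order.

```r
# Hypothetical reconstruction of the gender and Level columns of the 50-row data set:
# rep() repeats each label so the counts match, but the row order does not.
gender <- rep(c("female", "male", "female", "male", "male"), times = c(5, 15, 20, 5, 5))
level  <- rep(c("JLM", "JLM", "MLM", "MLM", "TLM"), times = c(5, 15, 20, 5, 5))

# Same idea as COUNTIFS() with two criteria: gender is "female" AND level is "JLM"
sum(gender == "female" & level == "JLM")  # 5

# Repeat the count for every (level, gender) pair to get the full frequency table
counts <- sapply(c("female", "male"), function(g)
  sapply(c("JLM", "MLM", "TLM"), function(l) sum(gender == g & level == l)))
counts
```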

 

Calculate the row sums (highlighted in pink), the column sums (blue) and the grand total, i.e. the sum of all row sums (saffron).
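The row sums, column sums and grand total can be sketched in R as follows. The observed matrix is typed in by hand from the counts above, an assumption of this sketch so that it runs without the spreadsheet.

```r
# Observed counts typed in from the table above: rows are levels, columns are genders.
# matrix() fills column by column, so the first three values are the female counts.
observed <- matrix(c(5, 20, 0, 15, 5, 5), nrow = 3,
                   dimnames = list(Level = c("JLM", "MLM", "TLM"),
                                   Gender = c("female", "male")))

rowSums(observed)  # row sums: JLM 20, MLM 25, TLM 5
colSums(observed)  # column sums: female 25, male 25
sum(observed)      # grand total: 50

# addmargins() appends a "Sum" row and column, like the highlighted totals
addmargins(as.table(observed))
```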

chi square libre office 02

 

These counts are called observed values. From them we can easily find the expected values, as given below.

chi square libre office 03

Expected value = row sum × column sum / grand total

For example, for a JLM cell (row sum 20, column sum 25): =J15*N12/N15 = 25 × 20 / 50 = 10
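The same expected-value formula, applied to every cell at once, might look like this in R. outer() multiplies each row sum by each column sum, and dividing by the grand total gives the expected count for every cell. The counts are typed in by hand from the table above.

```r
observed <- matrix(c(5, 20, 0, 15, 5, 5), nrow = 3,
                   dimnames = list(Level = c("JLM", "MLM", "TLM"),
                                   Gender = c("female", "male")))

# Expected value = row sum x column sum / grand total, for every cell at once.
# outer() forms all row-sum x column-sum products as a 3 x 2 matrix.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected  # 10 at JLM, 12.5 at MLM, 2.5 at TLM, for both genders
```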

 

Finally, our table looks like this.

chi square libre office 04

 

All the observed values (O) and expected values (E) are substituted in the table below. Summing the last column gives the chi-square value, χ² = 19.

   Cell            O      E    O-E   (O-E)²   (O-E)²/E
   Female, JLM     5     10     -5       25        2.5
   Female, MLM    20   12.5    7.5    56.25        4.5
   Female, TLM     0    2.5   -2.5     6.25        2.5
   Male, JLM      15     10      5       25        2.5
   Male, MLM       5   12.5   -7.5    56.25        4.5
   Male, TLM       5    2.5    2.5     6.25        2.5
   χ² (sum of last column)                          19
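The table above can be reproduced in a few lines of R. This is a sketch that rebuilds the observed and expected counts by hand, then sums the (O - E)² / E terms.

```r
observed <- matrix(c(5, 20, 0, 15, 5, 5), nrow = 3,
                   dimnames = list(Level = c("JLM", "MLM", "TLM"),
                                   Gender = c("female", "male")))
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# One (O - E)^2 / E term per cell, matching the last column of the table
contributions <- (observed - expected)^2 / expected
contributions

# The chi-square statistic is the sum of all cell contributions
chi_sq <- sum(contributions)
chi_sq  # 19
```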

 

Level of significance (α), i.e. the probability of a Type I error = 5% = 0.05

Degrees of freedom = (row count - 1) × (column count - 1) = (3 - 1) × (2 - 1) = 2

The critical value of χ² is 5.991. It is looked up in the table below using the level of significance and the degrees of freedom.

chi square libre office 05
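Instead of a printed table, the critical value can be computed with qchisq(), the quantile function of the chi-square distribution in base R. A minimal sketch, using the same α and table dimensions as above:

```r
alpha  <- 0.05
n_rows <- 3  # management levels: JLM, MLM, TLM
n_cols <- 2  # genders: female, male

# Degrees of freedom for a test of independence
dof <- (n_rows - 1) * (n_cols - 1)

# Critical value: the (1 - alpha) quantile of the chi-square distribution
critical <- qchisq(1 - alpha, df = dof)
round(critical, 3)  # 5.991
```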

Make a decision

We fail to reject the null hypothesis H0 only if the calculated χ² is less than the critical χ².
Our calculated χ² = 19
Our critical χ² = 5.991

Since 19 > 5.991, we reject the null hypothesis and accept the alternative hypothesis.
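The same decision can be expressed in R. The sketch below compares the statistic with the critical value and, equivalently, computes the p-value with pchisq(), the upper-tail probability of the chi-square distribution. For 2 degrees of freedom this works out to about 7.485e-05, the value LibreOffice and R report in the sections that follow.

```r
chi_sq <- 19     # calculated statistic from the table above
dof    <- 2
alpha  <- 0.05

# Decision by critical value: reject H0 when the statistic exceeds it
critical <- qchisq(1 - alpha, df = dof)
chi_sq > critical  # TRUE, so H0 is rejected

# Equivalent decision by p-value: upper-tail probability of chi_sq under H0
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)
p_value            # about 7.485e-05
p_value <= alpha   # TRUE
```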

You may watch the following video to understand the above calculation.

Chi square test with LibreOffice Calc

We have already found the frequency distribution of females and males at each management level. Let's reuse it.

chi square libre office 06

Select Data > Statistics > Chi-square Test.
chi square libre office 07

Choose the input cells.
chi square libre office 08

Select the output cell.
chi square libre office 09

Finally, my selections are as given below.
chi square libre office 10

After pressing OK, we get the following result:
chi square libre office 11

Make a decision

If p ≤ α, reject the null hypothesis. If p > α, fail to reject the null hypothesis.

Our p-value, 0.00007485, is less than α = 0.05. So the null hypothesis is rejected and the alternative hypothesis is accepted.

Chi square test with R

I have the data set stored in a file called sal.csv. I import it and store it in the sal object.

> setwd("d:/gandhari/videos/Advanced Business Analytics/")
> sal <-read.csv("sal.csv")
> head(sal)
  id gender      educ Designation Level Salary Last.drawn.salary Pre..Exp Ratings.by.interviewer
1  1 female        UG Jr Engineer   JLM  10000              1000        3                      4
2  2   male DOCTORATE    Chairman   TLM 100000            100000       20                      4
3  3   male   DIPLOMA       Jr HR   JLM   6000              6000        1                      3
4  4   male        PG    Engineer   MLM  15000             15000        7                      2
5  5 female        PG Sr Engineer   MLM  25000             25000       12                      4
6  6   male   DIPLOMA Jr Engineer   JLM   6000              8000        1                      1

As I wrote in Exploring data files with R, I create a frequency distribution table using the table() function, with Level as rows and gender as columns.

> gender_level_table <- table(sal$Level, sal$gender)
> gender_level_table

      female male
  JLM      5   15
  MLM     20    5
  TLM      0    5

To run the chi-square test, pass gender_level_table to the chisq.test() function.

> chisq.test(gender_level_table)

	Pearson's Chi-squared test

data:  gender_level_table
X-squared = 19, df = 2, p-value = 7.485e-05

Warning message:
In chisq.test(gender_level_table) :
  Chi-squared approximation may be incorrect
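The warning is not an error. chisq.test() issues it when any expected count is below 5, and here both TLM cells expect only 2.5 employees, so the p-value from the chi-square approximation should be read with some caution. The sketch below shows one way to check this. The counts are typed in so it runs without sal.csv, and the Monte Carlo p-value (simulate.p.value) and Fisher's exact test are common alternatives suggested here, not part of the original analysis; their output is not shown.

```r
# Same counts as gender_level_table, typed in so the sketch runs without sal.csv
gender_level_table <- as.table(matrix(c(5, 20, 0, 15, 5, 5), nrow = 3,
  dimnames = list(c("JLM", "MLM", "TLM"), c("female", "male"))))

# Inspect the expected counts; suppressWarnings() only hides the known warning here
result <- suppressWarnings(chisq.test(gender_level_table))
result$expected           # the TLM row expects 2.5 per cell
any(result$expected < 5)  # TRUE, which is what triggers the warning

# Possible cross-checks that do not rely on the large-sample approximation
set.seed(42)  # the simulated p-value depends on the random seed
chisq.test(gender_level_table, simulate.p.value = TRUE, B = 10000)
fisher.test(gender_level_table)  # Fisher's exact test also handles r x c tables
```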

Make a decision

If p ≤ α, reject the null hypothesis. If p > α, fail to reject the null hypothesis.

Our p-value, 7.485e-05, is less than α = 0.05. So the null hypothesis is rejected and the alternative hypothesis is accepted.
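If the test is part of a script, the decision rule can be applied to the object that chisq.test() returns instead of reading the printed output. A minimal sketch, with the counts typed in so it runs without sal.csv; the message texts are only illustrative.

```r
gender_level_table <- as.table(matrix(c(5, 20, 0, 15, 5, 5), nrow = 3,
  dimnames = list(Level = c("JLM", "MLM", "TLM"), Gender = c("female", "male"))))
alpha <- 0.05

# chisq.test() returns an "htest" list; suppressWarnings() hides the small-count warning
result <- suppressWarnings(chisq.test(gender_level_table))
result$statistic  # X-squared = 19
result$parameter  # df = 2
result$p.value    # about 7.485e-05

# Apply the decision rule programmatically
if (result$p.value <= alpha) {
  message("Reject H0: gender and management level appear to be related")
} else {
  message("Fail to reject H0: no evidence that gender and management level are related")
}
```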

See you in another interesting post. Happy Sunday.