Happy Independence Day

My previous post about measures of central tendency was simple and interesting. But, my dear budding data scientists, a central value alone doesn’t represent the whole population. India has a variety of states. While the southern states may be performing reasonably well, the north Indian states may bring down the figure for the entire country. In this situation, how accurate would our decisions based on such a statistical analysis be?

So, we need to look at the dispersion of the samples as well. In everyday language, dispersion means distributing something or someone over an area. In statistics, it describes how scattered or how tightly packed our samples are. We talked about measurements at the center in the previous post. Now, let’s talk about how far the samples are from the center.

### Methods of measuring dispersion

I want to write about the following now.

- Range
- Mean Deviation or Standard Deviation
- Lorenz Curve

### Range

It is a rough measure of dispersion that is based only on the two extreme items, not on all the available items.

Range R = Largest value L – Smallest Value S

Coefficient of Range = (L-S)/(L+S)

The following are the percentages of marks obtained by students in class 10-A, which is doing well in studies.

90, 95, 93, 99, 100

Range R = L – S

R = 100 – 90

R = 10.

Coefficient of Range = (L-S)/(L+S)

C = (100-90)/(100+90)

C = 10/190 = 0.0526.

The following are the percentages of marks obtained by students in class 10-B, which has a variety of students.

10, 25, 17, 80, 40

Range R = L – S

R = 80 – 10

R = 70.

Coefficient of Range = (L-S)/(L+S)

C = (80-10)/(80+10)

C = 70/90 = 0.778

The following are the percentages of marks obtained by students in class 10-C, which is doing poorly in studies.

10, 15, 13, 19, 20

Range R = 20 – 10 = 10

Coefficient of Range = (20-10)/(20+10) = 10/30 = 0.333

The following are the percentages of marks obtained by students in class 10-D, which is doing well except for some students.

90, 95, 93, 20, 100

Range R = 100 – 20 = 80

Coefficient of Range = (100-20)/(100+20) = 80/120 = 0.667

| Class | % of marks | Max | Min | Range | Range Coefficient |
|---|---|---|---|---|---|
| 10A | 90, 95, 93, 99, 100 | 100 | 90 | 10 | 0.052631579 |
| 10B | 10, 25, 17, 80, 40 | 80 | 10 | 70 | 0.777777778 |
| 10C | 10, 15, 13, 19, 20 | 20 | 10 | 10 | 0.333333333 |
| 10D | 90, 95, 93, 20, 100 | 100 | 20 | 80 | 0.666666667 |
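The range and coefficient of range for the four classes can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `value_range` and `coefficient_of_range` are my own choices for illustration, and the marks are copied from the examples above.

```python
# Marks (in %) for each class, copied from the worked examples.
marks = {
    "10A": [90, 95, 93, 99, 100],
    "10B": [10, 25, 17, 80, 40],
    "10C": [10, 15, 13, 19, 20],
    "10D": [90, 95, 93, 20, 100],
}


def value_range(values):
    """Range R = largest value L - smallest value S."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


for name, values in marks.items():
    r = value_range(values)
    c = coefficient_of_range(values)
    print(f"{name}: range = {r}, coefficient = {c:.4f}")
```

Note how 10-A and 10-C have the same range but very different coefficients. The coefficient divides by L + S, so it is a relative measure and lets us compare classes whose marks sit at different levels.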

- Range is used in industrial quality assurance to identify samples that fall outside the accepted range.
- Range is used to describe the variation in the prices of commodities.

### Inter-Quartile Range & Quartile Deviation

To avoid the effect of extreme values, let’s eliminate the lowest 25% and the highest 25% of the items in the series. As a measure of dispersion, we use the distance between the first and third quartiles, which is called the inter-quartile range.

Interquartile range = Third Quartile Q3 – First Quartile Q1

Semi-Quartile range = Interquartile range / 2

Let’s take the data set given below.

| Age | Members |
|---|---|
| 20 | 3 |
| 30 | 61 |
| 40 | 132 |
| 50 | 153 |
| 60 | 140 |
| 70 | 51 |
| 80 | 3 |

Compute the cumulative frequency (c.f.), which is the running total of the members column.

| Age | Members | c.f |
|---|---|---|
| 20 | 3 | 3 |
| 30 | 61 | 64 |
| 40 | 132 | 196 |
| 50 | 153 | 349 |
| 60 | 140 | 489 |
| 70 | 51 | 540 |
| 80 | 3 | 543 |
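The cumulative frequency column is just a running sum, so it can be checked quickly in Python. A small sketch, assuming the ages and member counts are held in two plain lists (the variable names are mine):

```python
from itertools import accumulate

# Age values and the number of members at each age, from the table.
ages = [20, 30, 40, 50, 60, 70, 80]
members = [3, 61, 132, 153, 140, 51, 3]

# accumulate() yields the running total, i.e. the cumulative frequency.
cumulative = list(accumulate(members))

for age, count, cf in zip(ages, members, cumulative):
    print(age, count, cf)
```

The last cumulative value is the total number of items N, which is needed to locate the quartiles.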

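First quartile Q1 = Value of the (N+1)/4th item

Q1 = (543+1)/4

Q1 = 136th item

The 136th item lies in the row whose cumulative frequency first reaches 136. That is the row with c.f 196, so Q1 = 40.

Third quartile Q3 = Value of the 3 * (N+1)/4th item

Q3 = 3 * (543+1)/4 = 408th item. The first cumulative frequency that reaches 408 is 489, so Q3 = 60.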

Quartile deviation is QD = (Q3 – Q1) / 2

QD = (60-40)/2

QD = 10

Coefficient of Quartile Deviation = (Q3-Q1)/(Q3+Q1)

c = (60-40)/(60+40)

c = 20/100 = 0.2
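The quartile steps above can be sketched in Python as well. This assumes the same discrete-series rule used in this post: find the position (N+1)/4 or 3(N+1)/4, then pick the value whose cumulative frequency first reaches that position. The helper name `value_at_position` is hypothetical, and other quartile conventions (for example, those that interpolate between values) may give slightly different answers.

```python
from bisect import bisect_left
from itertools import accumulate

# Age values and the number of members at each age, from the table.
ages = [20, 30, 40, 50, 60, 70, 80]
members = [3, 61, 132, 153, 140, 51, 3]

cumulative = list(accumulate(members))  # running total (c.f.)
n = cumulative[-1]                      # total number of items, N


def value_at_position(position):
    """Return the age whose cumulative frequency first reaches `position`."""
    # bisect_left finds the first index with cumulative[index] >= position.
    return ages[bisect_left(cumulative, position)]


q1 = value_at_position((n + 1) / 4)       # position of the first quartile
q3 = value_at_position(3 * (n + 1) / 4)   # position of the third quartile

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print("Q1 =", q1, "Q3 =", q3)
print("Inter-quartile range =", interquartile_range)
print("Quartile deviation =", quartile_deviation)
print("Coefficient of quartile deviation =", coefficient_of_qd)
```

Since only the middle 50% of the items decide Q1 and Q3, the three members aged 20 and the three aged 80 have no effect on the result.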

See you in another post.