Let’s talk about frequency distribution today.
I wrote about various data collection and sampling methods in my previous blog post, Sampling Techniques.
After data collection or sampling, the first task a researcher does is organizing or categorizing the data. This helps him/her get an overview of the data set. Frequency distribution is a simple method for this stage.
It contains at least two columns:
- Scale of Measurement – X
- Frequency – f
The X column lists every value from the minimum to the maximum, without skipping any.
The f column contains the tallies for each value on the scale. Each tally represents one occurrence. Let’s illustrate this with the simple data set given below.
The following are today’s flight arrivals at Trichy Airport.
|(DXB) Dubai||Air India Express||612||00:05||Landed|
|(SHJ) Sharjah||Air India Express||614||02:35||Landed|
|(KUL) Kuala Lumpur||AirAsia||25||08:55||Landed|
|(KUL) Kuala Lumpur||(MXD) Malindo Air||221||09:45||Landed|
|(SIN) Singapore||TigerAir||2662||10:10||En Route|
|(MAA) Chennai||Jet Airways||2748||11:05||Landed|
|(SIN) Singapore||Air India Express||681||15:10||En Route|
|(KUL) Kuala Lumpur||AirAsia||27||16:35||En Route|
|(MAA) Chennai||Jet Airways||2411||17:35||Scheduled|
|(MAA) Chennai||Jet Airways||2789||21:25||Scheduled|
|(KUL) Kuala Lumpur||AirAsia||23||21:45||Scheduled|
|(KUL) Kuala Lumpur||(MXD) Malindo Air||223||22:35||Scheduled|
|(KUL) Kuala Lumpur||AirAsia||29||23:45||Scheduled|
I want to do a timeline analysis of how many flights arrive during different parts of the day.
Let’s build a frequency distribution. I want to classify the flights into 6-hour intervals.
So our class count is as given below:
24 hours / 6-hour interval = 4 classes
Generally, the number of classes is estimated as 1 + 3.3 log(n), where n is the number of observations in the data.
1 + 3.3 log(17) ≈ 5. Anyway, for my own convenience, I chose 4 classes here.
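The rule of thumb above is easy to check with a few lines of Python. This is only a sketch: it assumes the log is base 10 (which is what the 3.3 coefficient implies), and the function name `sturges_classes` is made up for illustration. It evaluates the rule for 17 observations, as used above, and for the 13 flights listed in the table.

```python
import math


def sturges_classes(n: int) -> float:
    """Rule-of-thumb class count: 1 + 3.3 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 + 3.3 * math.log10(n)


# 17 is the count used in the text; 13 is the number of rows listed in the table.
for n in (17, 13):
    k = sturges_classes(n)
    print(f"n = {n}: {k:.2f}, so about {round(k)} classes")
```

Both counts round to 5 classes, so choosing 4 here is a matter of convenience rather than something the rule suggests.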
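What’s the lowest value given in the above table? 00:05
What’s the highest value given? 23:45
Now let’s identify our class width: (highest value − lowest value) / number of classes.
(23:45 − 00:05) / 4 = 23 hours 40 minutes / 4, which is about 5.9 hours. Let’s round it up to 6.
Our class width is now 6 hours.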
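Here is a small Python sketch of the class width calculation. It converts the clock times 00:05 and 23:45 from the table into decimal hours before dividing, so the raw figure may differ slightly from a hand calculation, but it rounds up to 6 either way. The helper name `to_hours` is hypothetical.

```python
import math


def to_hours(hhmm: str) -> float:
    """Convert an 'HH:MM' clock time to decimal hours."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60


lowest = to_hours("00:05")   # earliest arrival in the table
highest = to_hours("23:45")  # latest arrival in the table
num_classes = 4

raw_width = (highest - lowest) / num_classes
class_width = math.ceil(raw_width)  # round up so the classes cover the full range

print(f"raw width = {raw_width:.2f} hours, rounded up to {class_width} hours")
```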
The following is the FD table. Lower Class Limit and Upper Class Limit denote X, and Frequency denotes f.
|Lower Class Limit||Upper Class Limit||Frequency|
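If you want to build such a table yourself, the following Python sketch tallies the 13 arrival times listed in the flight table into 6-hour classes. It assumes each class includes its lower limit and excludes its upper limit (so 06:00 would fall in the 06 to 12 class); the names `arrivals`, `class_index` and `table` are just for illustration.

```python
from collections import Counter

# Arrival times copied from the flight table above.
arrivals = [
    "00:05", "02:35", "08:55", "09:45", "10:10", "11:05", "15:10",
    "16:35", "17:35", "21:25", "21:45", "22:35", "23:45",
]
CLASS_WIDTH = 6  # hours


def class_index(hhmm: str, width: int = CLASS_WIDTH) -> int:
    """Return the 0-based class an 'HH:MM' time falls into."""
    hour = int(hhmm.split(":")[0])
    return hour // width


# Each tally represents one occurrence in that class.
counts = Counter(class_index(t) for t in arrivals)

# Rows of (lower class limit, upper class limit, frequency).
table = [
    (i * CLASS_WIDTH, (i + 1) * CLASS_WIDTH, counts.get(i, 0))
    for i in range(24 // CLASS_WIDTH)
]

for lower, upper, f in table:
    print(f"{lower:02d}:00 to {upper:02d}:00   f = {f}")
```

Using `counts.get(i, 0)` keeps a class in the table even when no flight falls into it, which matches the idea that the X column should not skip any value.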
Excel does this job in no time! The following is the output from Excel’s Histogram function.
Types of Frequency Distribution – Skewing
Does the above graph for Trichy airport show us any trend? Yes, it does. Please look at the graph given below, with a trend line. We see a tail on the left side, and the head is on the right side. We call this behaviour skew!
When the head is on the left side and the tail is on the right side, we call the skew positive. The reverse is called negative skew. When you see a bell-like trend, with a peak in the center and tails extending uniformly to the left and right, it is called a symmetric distribution.
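The direction of skew can also be read from a number instead of a graph. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second), which is one common measure and not something Excel’s histogram produces. The function names and the sample lists are made up for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


def describe(values, tolerance=1e-9):
    """Label a data set by the sign of its skewness."""
    g1 = skewness(values)
    if g1 > tolerance:
        return "positive skew"   # head on the left, tail on the right
    if g1 < -tolerance:
        return "negative skew"   # tail on the left, head on the right
    return "symmetric"


print(describe([1, 2, 2, 3, 3, 3, 9]))  # long tail towards large values
print(describe([1, 7, 7, 7, 8, 8, 9]))  # long tail towards small values
print(describe([1, 2, 3, 4, 5]))        # evenly spread around the center
```

Real data almost never gives exactly zero, so in practice you would pick a looser `tolerance` before calling something symmetric.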
That’s it for now. See you in the next post.