Frequency Distribution

Let’s talk about frequency distributions today.

I wrote about various data collection and sampling methods in my previous blog post, Sampling Techniques.

After data collection or sampling, a researcher's first task is to organize or categorize the data. This gives him or her an overview of the data set, and a frequency distribution is a simple method for this stage.

A frequency distribution table contains at least two columns:

  1. Scale of Measurement – X
  2. Frequency – f

The X column lists every value from the minimum to the maximum, without skipping any value.

The f column contains the tallies for each value on the scale. Each tally represents one occurrence. Let’s explain this with the simple data set given below.
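Before the flight data, here is a minimal Python sketch of the two columns for ungrouped integer data. The `scores` list and the `frequency_table` helper are hypothetical, used only for illustration. Note that a value that never occurs (6 here) still gets a row with f = 0, because the X column must not skip any value.

```python
from collections import Counter


def frequency_table(values):
    """Return (x, f) rows for every integer x from min(values) to max(values)."""
    tallies = Counter(values)  # each occurrence adds one tally
    # X column: every value in the range, no gaps; Counter returns 0 for missing keys
    return [(x, tallies[x]) for x in range(min(values), max(values) + 1)]


# Hypothetical scores, used only for illustration
scores = [3, 5, 4, 3, 7, 5, 3]

print("X  tallies  f")
for x, f in frequency_table(scores):
    print(f"{x}  {'|' * f:<7}  {f}")
```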

Following are today's flight arrivals at Trichy Airport.

Origin              Airline            Flight  Arrival  Status
(DXB) Dubai         Air India Express  612     00:05    Landed
(SIN) Singapore     TigerAir           2668    00:35    Landed
(SHJ) Sharjah       Air India Express  614     02:35    Landed
(CMB) Colombo       Srilankan          131     08:40    Landed
(KUL) Kuala Lumpur  AirAsia            25      08:55    Landed
(KUL) Kuala Lumpur  (MXD) Malindo Air  221     09:45    Landed
(SIN) Singapore     TigerAir           2662    10:10    En Route
(MAA) Chennai       Jet Airways        2748    11:05    Landed
(CMB) Colombo       Srilankan          133     14:30    Landed
(SIN) Singapore     Air India Express  681     15:10    En Route
(KUL) Kuala Lumpur  AirAsia            27      16:35    En Route
(MAA) Chennai       Jet Airways        2411    17:35    Scheduled
(MAA) Chennai       Jet Airways        2789    21:25    Scheduled
(KUL) Kuala Lumpur  AirAsia            23      21:45    Scheduled
(KUL) Kuala Lumpur  (MXD) Malindo Air  223     22:35    Scheduled
(SIN) Singapore     TigerAir           2664    22:50    Scheduled
(KUL) Kuala Lumpur  AirAsia            29      23:45    Scheduled

I want to do a timeline analysis of how many flights arrive during different parts of the day.

Let’s build a frequency distribution, classifying the arrivals into 6-hour intervals.

So the number of classes is:

24 hours / 6 hours per class = 4 classes

A common rule of thumb for the number of classes is Sturges' rule, 1 + 3.3 log10(n), where n is the number of observations in the data.

Here n = 17, so 1 + 3.3 log10(17) ≈ 5. For my own convenience, though, I chose 4 classes of 6 hours each.
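The arithmetic for the number of classes is easy to check. A minimal sketch, where `sturges_classes` is a hypothetical helper name using the same rounded 3.3 coefficient as above (3.3 approximates 1 / log10(2) ≈ 3.322, which makes the rule equivalent to 1 + log2(n)):

```python
import math


def sturges_classes(n):
    """Sturges' rule with the rounded coefficient: k = 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


n = 17                         # flights in the arrivals table
k = sturges_classes(n)
print(f"Sturges: {k:.2f} -> about {round(k)} classes")

# The choice actually used in the post: 24 hours split into 6-hour classes
print("Chosen:", 24 // 6, "classes")
```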

 

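What is the lowest value in the table above? 00:05

What is the highest value? 23:45

Now let's identify the class width:

fd01.png

Class width = (highest value − lowest value) / number of classes = (23:45 − 00:05) / 4 = 23 h 40 min / 4 = 5 h 55 min, or about 5.9 hours. Let's round it up to 6.

Our class width is now 6 hours.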
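Clock times need converting before subtracting, because minutes run to 60, not 100: treating 23:45 as the decimal 23.45 gives the wrong range. A minimal sketch of the class-width calculation, with the 17 arrival times copied from the table and a hypothetical `to_hours` helper:

```python
import math


def to_hours(hhmm):
    """Convert an 'HH:MM' clock time to decimal hours, e.g. '23:45' -> 23.75."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours + minutes / 60


# Arrival times from the Trichy table
arrivals = ["00:05", "00:35", "02:35", "08:40", "08:55", "09:45", "10:10",
            "11:05", "14:30", "15:10", "16:35", "17:35", "21:25", "21:45",
            "22:35", "22:50", "23:45"]

hours = [to_hours(t) for t in arrivals]
classes = 4
width = (max(hours) - min(hours)) / classes   # range / number of classes

print(f"Class width: {width:.2f} hours")       # 5.92
print("Rounded up:", math.ceil(width))         # 6
```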

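Following is the frequency distribution (FD) table. The lower and upper class limits make up the X column, and Frequency is the f column. Each class includes its lower limit and excludes its upper limit; since no arrival falls exactly on a boundary, this choice does not change the counts.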

Lower Class Limit  Upper Class Limit  Frequency (f)
00:00              06:00              3
06:00              12:00              5
12:00              18:00              4
18:00              24:00              5
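The tally can be sketched the same way. This assumes each class includes its lower limit and excludes its upper limit (an arrival at exactly 06:00 would go in the 06:00-12:00 class); the helper and variable names are illustrative only.

```python
def to_hours(hhmm):
    """Convert an 'HH:MM' clock time to decimal hours."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours + minutes / 60


arrivals = ["00:05", "00:35", "02:35", "08:40", "08:55", "09:45", "10:10",
            "11:05", "14:30", "15:10", "16:35", "17:35", "21:25", "21:45",
            "22:35", "22:50", "23:45"]

width = 6                                   # class width in hours
labels = ["00:00-06:00", "06:00-12:00", "12:00-18:00", "18:00-24:00"]
counts = [0] * len(labels)

for t in arrivals:
    # Integer division picks the class: [0, 6) -> 0, [6, 12) -> 1, and so on
    counts[int(to_hours(t) // width)] += 1

for label, f in zip(labels, counts):
    print(label, f)
print("Total", sum(counts))
```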

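Excel can do this job in no time! Following is the output from Excel's Histogram tool, using the bin values 00:00, 06:00, 12:00 and 18:00. Excel counts a value in a bin when it is greater than the previous bin value and less than or equal to the bin value, so each count appears one row lower than in the table above, and "More" holds the arrivals after 18:00.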

Bin Frequency
00:00 0
06:00 3
12:00 5
18:00 4
More 5
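The shifted Excel column can be reproduced with a short sketch that puts each value in the first bin greater than or equal to it, with an extra slot for values above the last bin. This only illustrates the counting rule; it is not Excel's own code, and `excel_style_histogram` is a hypothetical name.

```python
from bisect import bisect_left


def to_hours(hhmm):
    """Convert an 'HH:MM' clock time to decimal hours."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours + minutes / 60


def excel_style_histogram(values, bins):
    """Count each value in the first bin >= value, i.e. (previous bin, bin].
    The extra last slot plays the role of the 'More' row."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1   # bins must be sorted ascending
    return counts


arrivals = ["00:05", "00:35", "02:35", "08:40", "08:55", "09:45", "10:10",
            "11:05", "14:30", "15:10", "16:35", "17:35", "21:25", "21:45",
            "22:35", "22:50", "23:45"]
bins = [0, 6, 12, 18]                      # 00:00, 06:00, 12:00, 18:00 in hours
counts = excel_style_histogram([to_hours(t) for t in arrivals], bins)

for label, f in zip(["00:00", "06:00", "12:00", "18:00", "More"], counts):
    print(label, f)
```

Because the upper edge of each bin is inclusive here, a value exactly equal to a bin value is counted in that bin rather than the next one.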

 

fd02.png
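Without a spreadsheet, a rough text version of a chart like this can be printed from the class counts. A minimal sketch; the counts are the ones from the frequency table, and `text_bars` is a hypothetical helper.

```python
def text_bars(labels, counts, mark="#"):
    """Return one line per class: label, a bar of `mark` characters, and the count."""
    return [f"{label} | {mark * f} {f}" for label, f in zip(labels, counts)]


labels = ["00:00-06:00", "06:00-12:00", "12:00-18:00", "18:00-24:00"]
counts = [3, 5, 4, 5]        # frequencies from the FD table

for line in text_bars(labels, counts):
    print(line)
```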

Types of Frequency Distribution – Skewness

Does the graph above for Trichy Airport show any trend? Yes, it does. Look at the graph below, which adds a trend line. We see a tail on the left side and the head on the right side. We call this behaviour skew!

fd03

When the head is on the left side and the tail extends to the right, we call it positive skew. The reverse, head on the right and tail on the left, is called negative skew, so the Trichy arrivals are negatively skewed. When you see a bell-shaped curve, high in the center with tails extending evenly to the left and right, it is called a symmetric distribution.
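The shape can also be summarised with a single number. A common measure is the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third moments about the mean; it is positive for a tail on the right and negative for a tail on the left. A minimal sketch, assuming arrival times are treated as hours on a 0 to 24 scale (this ignores that the clock wraps around at midnight); the helper names are hypothetical.

```python
def to_hours(hhmm):
    """Convert an 'HH:MM' clock time to decimal hours."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours + minutes / 60


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


arrivals = ["00:05", "00:35", "02:35", "08:40", "08:55", "09:45", "10:10",
            "11:05", "14:30", "15:10", "16:35", "17:35", "21:25", "21:45",
            "22:35", "22:50", "23:45"]

g1 = skewness([to_hours(t) for t in arrivals])
print(f"Trichy arrivals: g1 = {g1:.2f}")          # prints a negative value
print("Symmetric example:", skewness([1, 2, 3]))   # 0.0
```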

That's it for today. See you in the next post.
