Measures of shape – Skew & kurtosis

I wrote about Measures of Variance – Standard Deviation in my previous post. This is good. We came a long way from Measures of Central Tendency to variance. But these measures do not tell us whether the distribution is symmetric or not. We have seen frequency distributions that differ widely in their nature and composition. Following are the two popular measures that are used to indicate the shape of a distribution of an interval or ratio variable.

  • Skewness
  • Kurtosis

The shape of a distribution is the shape the data takes when we plot it on a graph.

What is a symmetrical distribution?

In a symmetrical distribution the left side mirrors the right side. For a distribution with a single peak, this means the mean, median and mode are identical.

When a distribution is not symmetrical, we call it asymmetrical or skewed. Following are some types of shape.

UniModal:

The distribution has a single value that occurs most frequently.

[Figure: unimodal, bimodal and multimodal distributions]

Symmetrical:

The left side of the distribution of values mirrors the right side.

[Figure: positive skewness]

Bell-Shaped

The frequencies of cases decline towards the extreme values in the right and left tails. The resulting graph has the shape of a bell.

Normal Distribution

If the values in a data set are distributed evenly on both sides of the centre, the shape is symmetrical. For this type of data set, the mean, median and mode are equal, and 50% of the cases lie above the mean and 50% below it.
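To make the mean/median/mode relationship concrete, here is a small sketch using Python's standard `statistics` module. Both samples are made up by me for illustration; they are not data from this post.

```python
from statistics import mean, median, mode

# Hypothetical sample whose values mirror each other around 3.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Hypothetical sample with mostly low values and one large value.
right_tailed = [1, 1, 1, 2, 2, 3, 10]

for name, data in (("symmetric", symmetric), ("right_tailed", right_tailed)):
    # Print the three measures of central tendency side by side.
    print(name, mean(data), median(data), mode(data))
```

For the symmetric sample all three measures coincide, while in the right-tailed sample the single large value drags the mean above the median and the mode.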

Skew

Skewness measures the asymmetry of a distribution. It indicates whether the lower or the higher values in a data set pull the shape of the distribution towards the lower or the higher end.

When most of the data sits at low values, with a few high values stretching a tail out to the right, the distribution is positively skewed.

When most of the data sits at high values, with a tail stretching out to the left, it is negatively skewed.
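The sign of the skew can be checked numerically. The sketch below assumes the common moment-based definition of skewness (third central moment divided by the standard deviation cubed). Other definitions exist, and the three samples are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])  # second central moment
    m3 = fmean([(x - m) ** 3 for x in values])  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples, chosen only to show the sign of the result.
mostly_low = [1, 1, 2, 2, 2, 3, 3, 10]    # tail towards the high end
mostly_high = [1, 8, 8, 9, 9, 9, 10, 10]  # tail towards the low end
symmetric = [1, 2, 3, 4, 5]

print(skewness(mostly_low))   # positive
print(skewness(mostly_high))  # negative
print(skewness(symmetric))    # zero
```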

Kurtosis

Distributions of data do not always have the same shape. Some are asymmetric, skewed to the left or right. Other distributions are bimodal or multimodal, as shown above. I have explained these above.

Another thing to consider is the shape of the tails of the distribution, on the far left and right. Kurtosis is the measure of the thickness or heaviness of the tails of a distribution.

Three categories of kurtosis are –

  1. Mesokurtic – measured with respect to the normal distribution (ND). The tails are similar to those of the ND. The kurtosis of a mesokurtic distribution is neither high nor low; it serves as the baseline for the other two classifications.
  2. Leptokurtic – kurtosis is greater than that of a mesokurtic distribution. The peak is thin and tall, and the tails are thick and heavy.
  3. Platykurtic – kurtosis is lower than that of a mesokurtic distribution. The peak is low and the tails are slender.
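The three categories can be told apart with a number. The sketch below assumes the moment-based definition of kurtosis and subtracts 3, the kurtosis of a normal distribution, so that a mesokurtic distribution scores about 0, a leptokurtic one scores above 0 and a platykurtic one below 0. Both samples are hypothetical.

```python
from statistics import fmean

def excess_kurtosis(values):
    """Moment-based (population) kurtosis minus 3, the normal baseline."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])  # second central moment
    m4 = fmean([(x - m) ** 4 for x in values])  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical samples, chosen only to show the sign of the result.
heavy_tails = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]  # a few extreme values
flat = list(range(1, 11))                          # evenly spread 1..10

print(excess_kurtosis(heavy_tails))  # positive: leptokurtic
print(excess_kurtosis(flat))         # negative: platykurtic
```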

[Figure: kurtosis]

See you in another interesting post.


Frequency Distribution

Let’s talk about Frequency distribution today.

I wrote about various data collection and sampling methods in my previous blog post Sampling Techniques.

After data collection or sampling, the first task a researcher does is organizing or categorizing the data. It helps him/her get an overview of the data set. A frequency distribution is a simple method at this stage.

It contains at least two columns

  1. Scale of Measurement – X
  2. Frequency – f

The X column lists the values from minimum to maximum without skipping any value.

The f column contains the tallies for each value on the scale. Each tally represents one occurrence. Let’s explain this with a simple data set, given below.

Following are the flight arrivals at Trichy Airport today.

| Origin | Airline | Flight | Arrival | Status |
|---|---|---|---|---|
| (DXB) Dubai | Air India Express | 612 | 00:05 | Landed |
| (SIN) Singapore | TigerAir | 2668 | 00:35 | Landed |
| (SHJ) Sharjah | Air India Express | 614 | 02:35 | Landed |
| (CMB) Colombo | Srilankan | 131 | 08:40 | Landed |
| (KUL) Kuala Lumpur | AirAsia | 25 | 08:55 | Landed |
| (KUL) Kuala Lumpur | (MXD) Malindo Air | 221 | 09:45 | Landed |
| (SIN) Singapore | TigerAir | 2662 | 10:10 | En Route |
| (MAA) Chennai | Jet Airways | 2748 | 11:05 | Landed |
| (CMB) Colombo | Srilankan | 133 | 14:30 | Landed |
| (SIN) Singapore | Air India Express | 681 | 15:10 | En Route |
| (KUL) Kuala Lumpur | AirAsia | 27 | 16:35 | En Route |
| (MAA) Chennai | Jet Airways | 2411 | 17:35 | Scheduled |
| (MAA) Chennai | Jet Airways | 2789 | 21:25 | Scheduled |
| (KUL) Kuala Lumpur | AirAsia | 23 | 21:45 | Scheduled |
| (KUL) Kuala Lumpur | (MXD) Malindo Air | 223 | 22:35 | Scheduled |
| (SIN) Singapore | TigerAir | 2664 | 22:50 | Scheduled |
| (KUL) Kuala Lumpur | AirAsia | 29 | 23:45 | Scheduled |

I want to do a timeline analysis of how many flights arrive during different parts of the day.

Let’s build a frequency distribution. I want to classify the arrivals in 6-hour intervals.

So our class count is as given below:

24 hours / 6-hour interval = 4 classes

Generally, the number of classes is estimated as 1 + 3.3 log(n), where n is the number of observations in the data.

1 + 3.3 log(17) ≈ 5. Anyway, for my own convenience, I chose 4 classes here.
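The rule of thumb is easy to evaluate in code. This sketch assumes the logarithm is base 10, which is how this formula (often called Sturges' rule) is usually written. The function name is mine.

```python
import math

def suggested_classes(n):
    """Rule of thumb for the number of classes: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

k = suggested_classes(17)  # 17 arrivals in the table
print(round(k, 2))         # 5.06
print(round(k))            # 5
```

The rule only gives a starting point; picking 4 classes so that each class covers a round 6 hours is a reasonable choice.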

 

What’s the lowest value given in the above table? 00:05

What’s the highest value given? 23:45

Now let’s identify our class width:

Class width = (highest value − lowest value) / number of classes = (23:45 − 00:05) / 4

Which is about 5.9 hours. Let’s round it up to 6.

Our class width is 6 hours now.
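As a quick check, the class width can be computed from the earliest and latest arrivals in the table (00:05 and 23:45) and the 4 classes chosen earlier. This is a minimal sketch; `to_hours` is a helper I made up to turn an HH:MM string into decimal hours.

```python
import math

def to_hours(hhmm):
    """Convert an 'HH:MM' string to decimal hours, e.g. '23:45' -> 23.75."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60

lowest = to_hours("00:05")   # earliest arrival in the table
highest = to_hours("23:45")  # latest arrival in the table
classes = 4

width = (highest - lowest) / classes
print(round(width, 2))       # a little under 6 hours
print(math.ceil(width))      # rounded up to a whole number of hours
```

Note that the times are converted to decimal hours first: 23:45 is 23.75 hours, not 23.45.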

Following is the FD table. The Lower Class Limit and Upper Class Limit columns denote X, and Frequency denotes f.

| Lower Class Limit | Upper Class Limit | Frequency |
|---|---|---|
| 00:00 | 06:00 | 3 |
| 06:00 | 12:00 | 5 |
| 12:00 | 18:00 | 4 |
| 18:00 | 00:00 | 5 |
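The same tally can be reproduced with a few lines of Python. This is a sketch, not how the table above was produced: the arrival times are copied from the flight table, and I assume each class includes its lower limit and excludes its upper limit (no arrival falls exactly on a boundary here, so the choice does not change the counts).

```python
from collections import Counter

# Arrival times copied from the flight table.
arrivals = [
    "00:05", "00:35", "02:35", "08:40", "08:55", "09:45",
    "10:10", "11:05", "14:30", "15:10", "16:35", "17:35",
    "21:25", "21:45", "22:35", "22:50", "23:45",
]

CLASS_WIDTH_HOURS = 6

def class_index(hhmm):
    """Index of the 6-hour class an 'HH:MM' time falls into (0 to 3)."""
    return int(hhmm.split(":")[0]) // CLASS_WIDTH_HOURS

counts = Counter(class_index(t) for t in arrivals)

# Print one row per class: lower limit, upper limit, frequency.
for i in range(24 // CLASS_WIDTH_HOURS):
    lower = i * CLASS_WIDTH_HOURS
    upper = lower + CLASS_WIDTH_HOURS
    print(f"{lower:02d}:00 {upper:02d}:00 {counts[i]}")
```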

Excel does this job in no time! Following is the output from the Excel Histogram function.

| Bin | Frequency |
|---|---|
| 00:00 | 0 |
| 06:00 | 3 |
| 12:00 | 5 |
| 18:00 | 4 |
| More | 5 |

 

[Figure: histogram of the arrival frequencies (fd02.png)]

Types of Frequency Distribution – Skewing

Does the above graph for Trichy airport show us any trend? Yes, it does. Please look at the graph given below, with a trend line. We see a tail on the left side and the head on the right side. We call this behaviour skew!

[Figure: histogram of the arrival frequencies with a trend line (fd03)]

When the head is on the left side and the tail is on the right side, we call the skew positive. The reverse is called negative skew. When you see a bell-like trend, high in the centre with tails extending evenly to the left and right, it is called a symmetric distribution.
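The direction of the skew in the Trichy frequency table can also be checked with a number instead of by eye. The sketch below is an approximation under my own assumptions: each 6-hour class is represented by its midpoint (3, 9, 15 and 21 hours), and skewness is computed with the moment-based formula for grouped data.

```python
def grouped_skewness(midpoints, frequencies):
    """Moment-based skewness for grouped data (class midpoints + counts)."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, frequencies)) / n
    return m3 / m2 ** 1.5

# Midpoints (in hours) of the four classes and their frequencies
# from the FD table.
midpoints = [3, 9, 15, 21]
frequencies = [3, 5, 4, 5]

skew = grouped_skewness(midpoints, frequencies)
print(skew)  # a small negative value: tail on the left, head on the right
```

The value is only mildly negative, which fits the graph: the counts are fairly even, with slightly fewer arrivals in the first class.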

Here you go. See you in next post.