Correlation is a measure of the degree to which two variables are related.
For instance, the Tamilnadu government opened more primary schools and offered a free noon-meal scheme in the past, which improved the quality of education. The same government has opened liquor shops all over Tamilnadu, which increased the number of accidents.
So we are talking about two variables in each case. In the first example, the education promotion activities taken by the government are one variable and education quality is the other. In the second example, the number of liquor shops is the first variable and the number of accidents is the second.
So, correlation is the linear association between two variables, which helps to determine the relationship between them. The correlation coefficient lies in the range -1.00 to +1.00: its sign shows whether the association is positive or negative, and its magnitude shows how strong the association is.
It not only estimates the degree of association between two or more variables but also helps us to test the interdependence of the variables.
We use Spearman’s coefficient ρ (rho), which is suitable for continuous, discrete and ordinal variables. The manual computation later in this article uses Karl Pearson’s coefficient r.
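As a minimal sketch of how Spearman’s ρ works, the Python snippet below ranks each variable and applies the textbook shortcut ρ = 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each observation. It assumes there are no tied values, and the function names and sample numbers are made up for illustration.

```python
def ranks(values):
    """Return the 1-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho via the rank-difference shortcut (no ties assumed)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Illustrative data: y always rises with x, but not in a straight line.
rho_rising = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
# Illustrative data: y always falls as x rises.
rho_falling = spearman_rho([1, 2, 3, 4, 5], [50, 40, 30, 20, 10])

print(rho_rising, rho_falling)  # 1.0 -1.0
```

Because only the ranks matter, any steadily rising relationship gives ρ = +1, even when it is not a straight line.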
Types of Correlation
- Positive Correlation
- Negative Correlation
- Simple Correlation
- Multiple Correlation
- Partial Correlation
- Total Correlation
- Linear Correlation
- Non-linear Correlation
Positive & Negative Correlation
The correlation depends on the direction in which the variables move. When an increase in variable A goes with an increase in variable B, the correlation is positive. Examples:
- Height and Weight
- Demand and Price
When an increase in variable A goes with a decrease in variable B, the correlation is negative. Examples:
- Number of files and free space in the hard drive
- Price and competition
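The direction of a correlation can be illustrated with a short Python sketch. It sums the products of each pair's deviations from the mean, and the sign of that sum is the sign of the correlation. All numbers below are hypothetical and were chosen only to mirror two of the examples above.

```python
def co_movement(xs, ys):
    """Sum of products of deviations from the mean.

    A positive result means the variables tend to move together;
    a negative result means they tend to move in opposite directions.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical data: taller people tend to weigh more.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 75, 85]

# Hypothetical data: more files leave less free space on the drive.
files_stored = [100, 200, 300, 400, 500]
free_space_gb = [90, 80, 65, 50, 30]

positive = co_movement(heights_cm, weights_kg)
negative = co_movement(files_stored, free_space_gb)

print(positive > 0, negative < 0)  # True True
```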
Simple & Multiple Correlation
We have already seen that correlation is the relation between two variables. This is simple correlation. Sometimes more than two variables are involved. This is multiple correlation. For example: the number of students enrolled in a school, the number of similar schools in its vicinity, and the number of school-going children in the same area.
Partial & Total Correlation
Analyzing the correlation between variables while excluding the effect of one or more other variables is called partial correlation. In total correlation, we consider all the variables.
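For three variables, one common way to write this is the first-order partial correlation coefficient. The notation below is an assumption, not taken from this article: r_xy, r_xz and r_yz are the simple correlation coefficients between each pair, and r_xy.z is the correlation between x and y with the effect of z held out.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

If z is unrelated to both x and y (r_xz = r_yz = 0), the partial coefficient reduces to the simple coefficient r_xy.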
Linear & Non-linear Correlation
If the ratio of change between two variables is uniform (directly or inversely proportional), we say the correlation is linear. If not, it is non-linear.
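The difference can be seen in a small Python sketch that applies Karl Pearson's formula (worked through by hand in the next section) to two made-up series: one where y changes by a fixed amount for each step in x, and one where y = x². The data and function name are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Karl Pearson's coefficient of correlation from raw paired values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # uniform ratio of change
curved = [x * x for x in xs]      # ratio of change varies with x

r_linear = pearson(xs, linear)
r_curved = pearson(xs, curved)

print(r_linear, r_curved)  # 1.0 0.0
```

The second series is completely determined by x, yet its coefficient is 0. A linear coefficient can miss a non-linear relationship, so a value near zero does not by itself prove that the variables are unrelated.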
Computing Coefficient of Correlation Manually
Let's take the following data set for analysis.
Karl Pearson’s formula for the coefficient of correlation r is given as:
$$r = \frac{\Sigma xy \cdot N - \Sigma x \cdot \Sigma y}{\sqrt{\left[\Sigma x^2 \cdot N - (\Sigma x)^2\right]\left[\Sigma y^2 \cdot N - (\Sigma y)^2\right]}}$$
| Σx = 70 | Σy = 63 | Σx² = 728 | Σy² = 651 | Σxy = 676 |
N = number of (x, y) pairs = total number of values / number of columns
N = 14 / 2 = 7 (1)
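$$r = \frac{\Sigma xy \cdot N - \Sigma x \cdot \Sigma y}{\sqrt{\left[\Sigma x^2 \cdot N - (\Sigma x)^2\right]\left[\Sigma y^2 \cdot N - (\Sigma y)^2\right]}}$$
$$r = \frac{676 \cdot 7 - 70 \cdot 63}{\sqrt{\left[728 \cdot 7 - 70^2\right]\left[651 \cdot 7 - 63^2\right]}}$$
$$r = \frac{4732 - 4410}{\sqrt{\left[5096 - 4900\right]\left[4557 - 3969\right]}}$$
$$r = \frac{322}{\sqrt{196 \cdot 588}} = \frac{322}{339.4819582835} = 0.948504013668671$$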
On the scale of -1 to +1, our coefficient is about +0.95, which shows a strong positive correlation.
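The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the function name and parameter names are chosen for illustration, and it starts from the summary totals given above rather than from the raw data set.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Karl Pearson's r computed from summary totals."""
    numerator = sum_xy * n - sum_x * sum_y
    denominator = math.sqrt(
        (sum_x2 * n - sum_x ** 2) * (sum_y2 * n - sum_y ** 2)
    )
    return numerator / denominator


# Totals from the worked example: N = 7 pairs.
r = pearson_from_sums(n=7, sum_x=70, sum_y=63, sum_x2=728, sum_y2=651, sum_xy=676)

print(round(r, 4))  # 0.9485
```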