I’m starting a new series of blog posts after my Big Data series. I’ll start with statistical concepts, and I plan to take it as far as programming with R. Let’s see how it goes.
Let’s start with the jargon, or terminology.
We’ll be discussing the following.
- Variable & Attribute
We observe numerical figures for a characteristic of interest. A collection of such numerical figures is called data.
Let’s take the table given below. It shows the number of flights operated by different airlines for Tier 2 cities of the state of Tamil Nadu, India. This is a collection of numerical figures, which we call a data set.
We may classify the data into two categories.
- Categorical Data (or Qualitative Data). Examples: Weight = "low", Height = "short"
- Numerical Data (or Quantitative Data). Examples: Height = 1.8 m, Weight = 70 kg
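The two categories can be illustrated with a small Python sketch. It reuses the example values from the list above; the field names and the `kind_of` helper are hypothetical, and checking the Python type is only a rough heuristic, not a formal rule.

```python
from numbers import Number

# Values taken from the examples above; the field names are hypothetical.
categorical_example = {"weight": "low", "height": "short"}
numerical_example = {"height_m": 1.8, "weight_kg": 70}


def kind_of(value):
    """Return 'numerical' for numbers and 'categorical' for everything else."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, Number) and not isinstance(value, bool):
        return "numerical"
    return "categorical"


kinds = {}
for record in (categorical_example, numerical_example):
    for field, value in record.items():
        kinds[field] = kind_of(value)

for field, kind in kinds.items():
    print(f"{field}: {kind}")
```

A type check like this can mislead: a numeric code such as a postal code is stored as a number but is really categorical, so the final call still depends on what the value means.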
Population, Sample and Sampling
A statistical investigation is always performed against a collection of individuals, their attributes and their metrics. Such a collection is called a population. For example, India is a vast country with 1.3 billion people; in a study about Indians, all 1.3 billion people form the population.
A finite subset of the population is a sample. When we want to know what Indian people think about China, it is impractical to consult all 1.3 billion people, so we choose a small set of people to survey. This small set is called a sample: a small part of something, used to represent the whole.
The process of selecting the sample is called sampling. Each observation in the sample may measure several different properties.
A quality possessed by the individuals in a sample is called a characteristic, for example the height of the individuals or the nationality of a group of passengers.
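As a rough illustration of sampling, here is a minimal Python sketch. It assumes simple random sampling, which is only one of several ways to select a sample, and the population is a hypothetical list of ID numbers rather than real people.

```python
import random

# Hypothetical population: ID numbers standing in for 10,000 people.
population = list(range(1, 10_001))

# A seeded generator so the sketch gives the same sample on every run.
rng = random.Random(42)

# Sampling: draw 100 distinct individuals, each equally likely to be chosen.
sample = rng.sample(population, k=100)

print(f"population size: {len(population)}, sample size: {len(sample)}")
```

Because `sample` draws without replacement, no individual appears twice, which matches the idea of a sample being a subset of the population.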
Variable & Attribute
If a characteristic is measurable, it is a variable. It is usually measured in numbers, for example age or height.
Look at the table given below. The numbers of flights operated by Air India, Silk Air and so on are variables.
If the characteristic cannot be measured, it is an attribute, for example marital status: single, married, widowed, divorced and so on.
The following is an example of an attribute: Delhi, Bangalore and Mumbai are city names, not numerical measures.
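One practical consequence of the distinction is that arithmetic makes sense for a variable, while an attribute can only be counted by category. The Python sketch below shows this with hypothetical survey answers invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers, invented for illustration only.
ages = [25, 32, 47, 51, 38]  # variable: measured in numbers
marital_status = ["single", "married", "married", "widowed", "single"]  # attribute

# Arithmetic such as an average is meaningful for a variable.
average_age = mean(ages)

# An attribute has no average; we can only count how often each category occurs.
status_counts = Counter(marital_status)

print(f"average age: {average_age}")
for status, count in status_counts.items():
    print(f"{status}: {count}")
```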
Parameter & Statistic
A parameter is a characteristic of a population. A statistic is a characteristic of a sample.
Let’s take the example again. What is the mean salary of Indians? This mean salary is a characteristic. When you answer the question from the whole population, it is called the population mean, μ. If you answer it from a sample, it is called the sample mean, x̅.
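The difference between μ and x̅ can be sketched in Python. The salaries below are randomly generated, hypothetical numbers, not real salary data, and the population size of 10,000 and sample size of 200 are arbitrary choices for the illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population: made-up annual salaries for 10,000 people.
population = [rng.randint(100_000, 2_000_000) for _ in range(10_000)]

# Parameter: the mean computed from every member of the population.
population_mean = sum(population) / len(population)

# Statistic: the mean computed from a sample of 200 people only.
sample = rng.sample(population, k=200)
sample_mean = sum(sample) / len(sample)

print(f"population mean (mu):  {population_mean:.2f}")
print(f"sample mean (x-bar):   {sample_mean:.2f}")
```

The two values will usually be close but not identical. With a different sample the statistic changes, while the parameter stays fixed for a given population.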