Statistics – Basic Terminology

I’m starting a new series of blog posts, following my Big Data series. I’ll begin with statistical concepts and plan to continue all the way to programming with R. Let’s see how it goes.

Let’s start with the basic terminology.

We’ll discuss the following terms.

  1. Data
  2. Population
  3. Sample
  4. Sampling
  5. Characteristic
  6. Variable & Attribute
  7. Parameter & Statistic

Data

We observe values, usually numerical figures, for a characteristic of interest. A collection of such observations is called data.

Take the table below. It shows the number of flights operated by different airlines to Tier 2 cities in the state of Tamil Nadu, India. This collection of numerical figures is called a data set.

[Table: Number of flights operated by different airlines to Tier 2 cities of Tamil Nadu, India]

Data can be classified into two categories.

  1. Categorical Data (or Qualitative Data) – Examples: Weight = “low”, Height = “short”
  2. Numerical Data (or Quantitative Data) – Examples: Height = 1.8 m, Weight = 70 kg
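To make the difference between the two categories concrete, here is a small Python sketch using only the standard library (`statistics.mean` and `collections.Counter`). The records are hypothetical, made up purely for illustration. With numerical data, arithmetic such as an average is meaningful; with categorical data, we can only count how often each category occurs.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, invented for illustration only.
people = [
    {"height_m": 1.80, "weight_kg": 70.0, "height_label": "tall"},
    {"height_m": 1.55, "weight_kg": 52.0, "height_label": "short"},
    {"height_m": 1.62, "weight_kg": 58.5, "height_label": "short"},
]

# Numerical (quantitative) data: taking an average makes sense.
average_height = mean(p["height_m"] for p in people)

# Categorical (qualitative) data: we can only count each category.
label_counts = Counter(p["height_label"] for p in people)

print(f"Average height: {average_height:.2f} m")
print(f"Height labels: {dict(label_counts)}")
```

Averaging the labels “tall” and “short” would be meaningless, which is why categorical data is usually summarised with counts or proportions instead.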

Population, Sample and Sampling

A statistical investigation is always performed on a collection of individuals (or items) and their attributes or measurements. Such a collection is called the population. For example, if we study the people of India, the population is all of India’s roughly 1.3 billion people.

[Figure: Population, Sampling and Samples]

A finite subset of the population is called a sample. Suppose we want to know what Indian people think about China. It is impractical to survey all 1.3 billion people, so we choose a small set of people and survey them instead. This small set is the sample: a small part used to represent the whole.

[Figure: Sampling]

The process of selecting a sample from the population is called sampling. Each observation in a sample may record several different properties.
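The post does not prescribe a particular sampling method. As one common example, here is a minimal Python sketch of simple random sampling without replacement, where every member of the population has the same chance of being selected. The population of 10,000 numbered people is hypothetical, and the fixed seed only makes the example reproducible.

```python
import random

# Hypothetical population: 10,000 people identified by ID numbers 1..10000.
population = list(range(1, 10_001))

# A seeded generator so the same sample is drawn on every run.
rng = random.Random(42)

# Simple random sampling without replacement: draw 100 distinct people.
sample = rng.sample(population, k=100)

print(f"Population size: {len(population)}, sample size: {len(sample)}")
```

Other methods (stratified, cluster or systematic sampling, for instance) select members differently, but the idea is the same: a small, finite subset is chosen to represent the whole population.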

Characteristic

A quality possessed by the individuals in a sample is called a characteristic. Examples include the height of individuals or the nationality of a group of passengers.

Variable & Attribute

If a characteristic can be measured, it is called a variable. Variables are usually expressed as numbers, for example age or height.

Look at the table below. The numbers of flights operated by Air India, Silk Air, etc. are variables.

[Table: Number of flights operated by different airlines to Tier 2 cities of Tamil Nadu, India]

If a characteristic cannot be measured numerically, it is called an attribute. Marital status is an example: its values are single, married, widowed, divorced, etc.

The following table gives another example of an attribute. Values such as Delhi, Bangalore and Mumbai are not numerical measures.

[Table: Airline and Hub]

[Figure: Comparison of attributes and variables]

Parameter & Statistic

A parameter is a characteristic of a population. A statistic is a characteristic of a sample.

[Figure: Population, Sampling and Samples]

Let’s return to our example. What is the mean salary of Indians? This mean salary is a characteristic. If you answer the question using the whole population, the answer is a parameter called the population mean, μ. If you answer it using a sample, the answer is a statistic called the sample mean, x̅.
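Here is a minimal Python sketch of the difference, using the standard formulas μ = (1/N) Σ xᵢ over all N members of the population and x̅ = (1/n) Σ xᵢ over the n members of a sample. The salary figures are entirely hypothetical, randomly generated for a made-up population of 1,000 people. They are not real Indian salary data.

```python
import random
from statistics import mean

# Hypothetical monthly salaries for a made-up population of 1,000 people.
rng = random.Random(7)
population_salaries = [rng.randint(10_000, 100_000) for _ in range(1_000)]

# Parameter: computed from every member of the population (population mean, mu).
mu = mean(population_salaries)

# Statistic: computed from a random sample of 50 members (sample mean, x_bar).
sample_salaries = rng.sample(population_salaries, k=50)
x_bar = mean(sample_salaries)

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, and a different sample would give a slightly different value. In practice the population mean is often unknown, which is why we estimate it from a sample.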