# Sampling Techniques

This post assumes you have seen the basic concepts of statistics mentioned in my previous post Statistics – Basic terminologies.

Population, Sampling and Samples

Let's discuss sampling in detail in this post, because sampling is crucial to your data analysis. The higher the quality of your samples, the more reliable your results will be.

### Population

The collection of all units of a specified type in a given region at a particular point in time is called a population or universe.

Examples:

• Population of persons in a region
• Population of trees or birds in a forest

### Sampling Unit

A sampling unit is an elementary unit, or a group of such units, that is clearly defined, identifiable and observable, and is also convenient for the purpose of sampling.

Examples:

• A family in a budget survey
• A farm, or a group of farms owned or operated by a single household, in a crop survey

### Sampling Frame

A sampling frame is a list of all the sampling units belonging to the population to be studied, together with their identification particulars. A map showing the boundaries of the sampling units can also serve as a sampling frame.

Examples:

• List of farms in villages of India

### Random Sampling or Probability Sampling

One or more sampling units selected from a population according to some specified procedure are said to constitute a sample. If the selection is governed by ascertainable laws of chance, it is called a random or probability sample.

Assume a population consists of N sampling units U1, U2, U3, …, UN. We select a sample of n units by drawing them one by one, with equal probability for every unit at each draw, with or without replacing the sampling units selected in the previous draws.
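To make the "with or without replacement" distinction concrete, here is a minimal sketch using Python's standard `random` module. The population of ten labelled units, the sample size of four and the seed are all made up for illustration.

```python
import random

# Hypothetical population of ten sampling units, labelled U1..U10.
population = [f"U{i}" for i in range(1, 11)]
rng = random.Random(42)  # fixed seed so the draws are repeatable

sample_size = 4

# Without replacement: a unit drawn once cannot be drawn again.
without_replacement = rng.sample(population, sample_size)

# With replacement: every draw is made from the full population,
# so the same unit may appear more than once.
with_replacement = rng.choices(population, k=sample_size)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

In both cases every unit has the same chance of being picked at each draw; the only difference is whether a picked unit goes back into the pool.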

Examples:

• Select one person from each economic tier
• Select one taxpayer from each tax slab

### Non-Random Sampling

A sample selected by a non-random process is called a non-random sample. A non-random sample that is drawn using the investigator's judgement about which units are representative is called a judgement or purposive sample. This type of sampling is seldom used in large-scale surveys, because it is generally not possible to get strictly valid estimates of the population parameters under consideration.

To identify a population uniquely, one should know the following:

• Elementary units
• Population characteristics, which vary numerically from unit to unit.

| Population | Elementary Unit | Characteristics |
| --- | --- | --- |
| Students of a particular class of a school | Each student | Marks obtained in final exam |
| LED TVs produced by a company | Each TV | Length of life in years |

### Types of Population

1. Finite (a countable number of elementary units): e.g., the odd numbers below 100 (1, 3, 5, …, 99)
2. Infinite (infinitely many, uncountable elementary units): e.g., pressure, humidity
3. Real (existing objects with factual observations): e.g., the Indian Census
4. Hypothetical (hypothetically constructed by the investigator): e.g., coin tosses, dice throws

### Sampling procedure

The steps and examples for each method are given below.

### Probability Sampling

#### Simple Random Sampling

Simple random sampling (SRS) always gives every unit an equal probability of selection (EPS), although not every EPS design is an SRS. It is applicable when the population is:

1. small
2. homogeneous

This is done by assigning a number to each unit in the sampling frame. A table of random numbers or a lottery is then used to decide which units are selected.

Disadvantages:

• If the sampling frame is large (because the population is large), this is impractical
• Minority subgroups may not be adequately represented
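The point about minority subgroups can be checked with a little counting. The sketch below computes the chance that a simple random sample, drawn without replacement, contains nobody from a given subgroup. The function name and the numbers (100 people, a subgroup of 5, a sample of 10) are made up for illustration.

```python
from math import comb

def prob_subgroup_missed(population_size, subgroup_size, sample_size):
    """Chance that an SRS without replacement contains no member of the subgroup."""
    # Number of samples that avoid the subgroup, divided by the number of all samples.
    avoiding = comb(population_size - subgroup_size, sample_size)
    total = comb(population_size, sample_size)
    return avoiding / total

# Hypothetical numbers: 100 people, 5 of them in the minority subgroup, sample of 10.
p = prob_subgroup_missed(population_size=100, subgroup_size=5, sample_size=10)
print(f"Probability the subgroup is missed entirely: {p:.3f}")
```

With these made-up numbers the subgroup is left out of the sample more often than not (about 58% of the time), which is exactly the weakness that stratified sampling tries to address.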

#### Systematic Sampling

Systematic sampling arranges the population in some order and then selects units at regular intervals.

i1, i2, i3, i4, i5, i6, i7, i8, i9

In the above population, the samples (shown in red in the original formatting) are taken at regular intervals. Please note that I have selected neither the first nor the last item of my population.

Examples:

1. Every 10th name in a telephone directory
2. Every 5th sapling in a sugarcane farm

Advantages:

• Convenient
• Selecting units from the sampling frame is easy
• The sample is spread evenly over the entire population

Disadvantages:

• Hidden periodicities in the population may affect the precision
• It is difficult to assess the precision of the estimate from a single survey
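Here is a minimal sketch of the "every k-th unit" idea in Python. The random starting point within the first interval is my assumption (it is a common convention, not something stated above), and the directory of 100 names is made up.

```python
import random

def systematic_sample(frame, interval, rng):
    """Pick a random start within the first interval, then take every interval-th unit."""
    start = rng.randrange(interval)  # assumed convention: random start in 0..interval-1
    return frame[start::interval]

# Hypothetical telephone directory with 100 entries.
directory = [f"name_{i:03d}" for i in range(100)]

sample = systematic_sample(directory, interval=10, rng=random.Random(7))
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined, which is why a hidden pattern in the ordering can distort the result.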

#### Stratified Sampling

A sample may be biased. How? Assume I'm sampling from a group of people from different villages. I may end up selecting only men, only women, or far more of one gender than the other, and such a sample would hurt the precision of my estimates. Hence, when the population includes a number of distinct categories, the frame can be organized into separate 'strata'. Each stratum is then sampled as an independent sub-population, within which individual elements are randomly selected. Men form one stratum, women another. This process is called Stratified Random Sampling.
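The men/women example can be sketched in Python as follows. The villagers, the helper name `stratified_sample` and the choice of an equal number of units per stratum are all assumptions made for illustration; other allocations (for example, proportional to stratum size) are also possible.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Draw a simple random sample of per_stratum units from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # organise the frame into strata
    # Each stratum is sampled independently of the others.
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical villagers as (identifier, gender): 20 men and 10 women.
villagers = [(f"p{i}", "man" if i % 3 else "woman") for i in range(30)]

sample = stratified_sample(
    villagers, stratum_of=lambda v: v[1], per_stratum=4, rng=random.Random(1)
)
for stratum, members in sample.items():
    print(stratum, members)
```

Whatever the random draws turn out to be, both genders are guaranteed to appear in the sample.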

#### Cluster Sampling

Here we do the sampling in a different way, in two stages: first we choose the areas (clusters) to sample, then we study the units inside them.

1. The population is divided into clusters (usually based on geographical location).
2. The sampling units are groups rather than individuals (e.g., senior citizens of Palani Murugan Street, flower vendors of Dhendapani shopping complex).
3. A sample of such clusters is selected (e.g., senior citizens of Palani Murugan Street).
4. All units from the selected clusters are studied (all senior citizens of Palani Murugan Street).
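The cluster procedure can be sketched in a few lines of Python. The street names, residents and the helper `cluster_sample` are hypothetical; the sketch assumes the clusters themselves are chosen by simple random sampling.

```python
import random

# Hypothetical clusters: street name -> senior citizens living there.
clusters = {
    "Street A": ["a1", "a2", "a3"],
    "Street B": ["b1", "b2"],
    "Street C": ["c1", "c2", "c3", "c4"],
    "Street D": ["d1", "d2"],
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # the sampling units are groups
    return {name: list(clusters[name]) for name in chosen}

selected = cluster_sample(clusters, n_clusters=2, rng=random.Random(3))
for street, residents in selected.items():
    print(street, residents)
```

Note that randomness only enters when picking the streets; inside a chosen street nobody is left out.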

### Non-Probability Sampling

Non-random sampling should be reserved for surveys that are not mission-critical, where we are not seriously concerned about the accuracy of the results.

#### Convenience Sampling

This means collecting samples from the part of the population that is convenient to reach (located nearby, data readily available, etc.). We should not base a decision on such a sample, as it does not represent the entire population. It is generally used in pilot phases.

For example, taking agricultural yield values from the internet, or surveying only your neighbourhood about the Government.

#### Judgement Sampling

We may do the sampling based on our own experience and preference. The samples are chosen non-randomly by the researcher, based on their own judgement.

For example, a teacher may use his own judgement to pick some students 👩‍🎓👨‍🎓 for an extra coaching class, choosing those he thinks are weak in their studies.

#### Quota Sampling

Quotas, rations, etc. are not new to India 🇮🇳. As in stratified sampling, the population is first divided into strata. Then judgemental sampling is used to pick a fixed number of units (the quota) from each stratum.
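A rough sketch of quota filling in Python is shown below. Since a researcher's judgement cannot be coded, the sketch simply accepts people in the order they are met, which is an assumption standing in for the judgemental step. The names, age groups and quotas are made up.

```python
def quota_sample(respondents, stratum_of, quotas):
    """Accept respondents in the order they are met until every quota is full."""
    selected = {name: [] for name in quotas}
    for person in respondents:
        group = stratum_of(person)
        # Accept the person only if their stratum still has room.
        if group in selected and len(selected[group]) < quotas[group]:
            selected[group].append(person)
        # Stop as soon as all quotas are filled.
        if all(len(selected[g]) == quotas[g] for g in quotas):
            break
    return selected

# Hypothetical stream of passers-by as (name, age group).
passers_by = [
    ("Anu", "young"), ("Bala", "old"), ("Chitra", "young"),
    ("Deepa", "young"), ("Ezhil", "old"), ("Farook", "old"),
]

picked = quota_sample(passers_by, stratum_of=lambda p: p[1], quotas={"young": 2, "old": 2})
print(picked)
```

Unlike stratified random sampling, nothing here is random: who gets in depends entirely on who the researcher happens to meet (or chooses to approach) first.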

#### Snowball sampling

This means getting samples efficiently and cost-effectively through the contacts of people we already know, who in turn refer further people. For example, when Google released its email service, it chose the users (samples) based on referrals.
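Referral-based growth can be sketched as a breadth-first walk over "who referred whom". The referral map, the seed participant and the size cap below are all hypothetical; this is only one simple way to model the idea.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Grow a sample by following referrals outward from the seed participants."""
    sample = []
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Everyone this person refers becomes a candidate, once.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral map: person -> people they refer.
referrals = {"seed": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": ["e"]}

sample = snowball_sample(["seed"], referrals, max_size=5)
print(sample)
```

The sample can only ever contain people reachable from the seeds, which is why snowball samples are cheap to collect but not representative of the whole population.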

# Statistics – Basic terminologies

I'm starting a new series of blog posts after my Big Data series. I'll start with statistical concepts, and I'm planning to take it as far as programming with R. Let's see how it goes.

We'll be discussing the following:

1. Data
2. Population
3. Sample
4. Sampling
5. Characteristic
6. Variable & Attribute
7. Parameter

### Data

We observe numerical figures for a desired characteristic. A collection of such numerical figures is called data.

Consider a table showing the number of flights operated by different airlines for Tier 2 cities in the Indian state of Tamil Nadu. This is a collection of numerical figures, which we call a data set.

We may classify the data into two categories.

1. Categorical Data (or Qualitative Data) – Examples: Weight = "low", Height = "short"
2. Numerical Data (or Quantitative Data) – Examples: Height = 1.8 m, Weight = 70 kg

### Population, Sample and Sampling

Statistical investigations are always performed on a collection of individuals, metrics and their attributes. Such a collection is called a population. For example, India is a vast country with 1.3 billion people, who together form a population.

A finite subset of a population is a sample. When we want to know what Indian people think about China, it is impractical to consult all 1.3 billion people. Instead, we choose a small set of people to survey. This small set is called a sample: a small part of something, used to represent the whole.

The process of selecting a sample is called sampling. Each sample, observation or data point may measure different properties.

### Characteristic

A quality possessed by the units of a sample is called a characteristic. For example, the height of individuals, or the nationality of a group of passengers.

### Variable & Attribute

If a characteristic is measurable, it is a variable. It is usually expressed in numbers. For example, age, height, etc.

Recall the table of flights from the Data section. The numbers of flights operated by Air India, Silk Air, etc. are variables.

If the characteristic cannot be measured, it is an attribute. For example, marital status: single, married, widowed, divorced, etc.

City names are another example of an attribute. Delhi, Bangalore and Mumbai are not numerical measures.

### Parameter & Statistic

A parameter is a characteristic of a population. A statistic is a characteristic of a sample.


Let's take the example again. What is the mean salary of Indians? This mean salary is a characteristic. When you answer the question using the whole population, it is called the population mean μ. If you answer it using a sample, it is called the sample mean x̅.
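A small Python sketch makes the difference visible. The salaries below are random made-up numbers, not real data, and the population size (10,000) and sample size (200) are arbitrary choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical population: 10,000 made-up monthly salaries.
salaries = [rng.randint(10_000, 100_000) for _ in range(10_000)]

# Parameter: computed from the entire population.
mu = mean(salaries)

# Statistic: computed from a sample of 200 people drawn without replacement.
sample = rng.sample(salaries, 200)
x_bar = mean(sample)

print(f"population mean (mu): {mu:.2f}")
print(f"sample mean (x-bar):  {x_bar:.2f}")
```

The parameter is a single fixed number, while the statistic changes every time a different sample is drawn; a good sampling method keeps the two close.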