This post assumes you have seen the basic concepts of statistics mentioned in my previous post Statistics – Basic terminologies.
Population, Sampling and Samples
Let's discuss sampling in detail in this post, because sampling is crucial to your data analysis. The higher the quality of the samples, the more reliable your results will be.
The collection of all units of a specified type in a given region at a particular point in time is called a population or universe.
- Population of persons in a region
- Population of trees or birds in a forest
An elementary unit, or a group of such units, that is clearly defined, identifiable and observable, and also convenient for the purpose of sampling, is called a sampling unit.
- A family in a budget survey
- A farm, or a group of farms belonging to a single household, in a crop survey
A map showing the boundaries of the sampling units is a sampling frame.
A list of all sampling units belonging to the population to be studied, together with their identification particulars, is also a sampling frame.
- List of farms in villages of India
Random Sampling or Probability Sampling
One or more sampling units selected from a population according to some specified procedure are said to constitute a sample. If the selection is governed by ascertainable laws of chance, the sample is called a random or probability sample.
Assume a population consists of N sampling units U1, U2, U3, …, UN. We select a sample of n units by selecting them unit by unit, with equal probability for every unit at each draw, with or without replacing the sampling units selected in the previous draws.
- Select one person from each economic tier
- Select one tax payer from each tax slab
A sample selected by a non-random process is called a non-random sample. A non-random sample drawn using the researcher's judgement about which units are the right ones is called a judgement or purposive sample. This type of sampling is generally avoided in large-scale surveys, because it cannot give strictly valid estimates of the population parameters under consideration.
To identify a population uniquely, one should know the following.
- Elementary units
- Population characteristics, which vary numerically from unit to unit.
|Population|Characteristic|
|Students of a particular class of a school|Marks obtained in final exam|
|LED TVs produced by a company|Length of life in years|
Types of Population
- Finite (a countable number of elementary units): e.g., odd numbers below 10 (1, 3, 5, 7, 9)
- Infinite (infinite and uncountable elementary units): e.g., pressure, humidity
- Real (existing objects with factual observations): e.g., the Indian Census
- Hypothetical (hypothetically illustrated by the investigator): e.g., coin tosses, dice throws
Steps and examples are given below
Types of Sampling
We already talked about Probability and non-probability methods. Let’s talk about other sampling methods now.
Simple Random Sampling
Simple random sampling (SRS) always uses Equal Probability of Selection (EPS), although not every EPS selection is SRS. It is applicable when the population is:
- readily available
This is done by assigning a number to each unit in the sampling frame. A table of random numbers or a lottery is then used to decide which units are selected.
- If the sampling frame or the population is large, this is impractical
- Minority subgroups may not be adequately represented
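As a minimal sketch of simple random sampling, the lottery can be replaced by Python's standard `random` module. The function name `simple_random_sample`, the frame of 20 numbered units and the seed are assumptions made up for illustration only.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n units from the sampling frame, each with equal probability."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: every unit appears at most once in the sample.
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1 to 20.
frame = list(range(1, 21))
srs_without = simple_random_sample(frame, 5, seed=42)
srs_with = simple_random_sample(frame, 5, replace=True, seed=42)
print(srs_without)
print(srs_with)
```

Fixing the seed only makes the draw repeatable; in a real survey you would normally leave it unset.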
Systematic sampling arranges the population in some order and then chooses the samples at regular intervals.
i1, i2, i3, i4, i5, i6, i7, i8, i9
In the above population, I have taken the samples marked in red at regular intervals. Please note that I have not selected the first or last items of my population.
- Every 10th name in telephone directory
- Every 5th sapling in a sugarcane farm
- Selecting the sampling frame is easy
- Sample evenly spread over entire population
- Hidden periodicities in the population may affect the precision
- It is difficult to assess the precision of the estimate from a single survey
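Picking units at regular intervals can be sketched as follows. This is only an illustration: the helper `systematic_sample` and the random starting point are assumptions (a common convention, not something the post prescribes), and the population reuses the nine items i1 to i9 from the example above.

```python
import random

def systematic_sample(units, n, seed=None):
    """Pick n units at a fixed interval k = len(units) // n after a random start."""
    k = len(units) // n
    if k < 1:
        raise ValueError("sample size is larger than the population")
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed convention: random start in the first interval
    return [units[start + i * k] for i in range(n)]

# The population i1 ... i9, in the order it is listed.
population = [f"i{j}" for j in range(1, 10)]
picked = systematic_sample(population, 3, seed=1)
print(picked)
```

With nine units and a sample of three, the interval is 3, so the selected items are always three positions apart.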
Sampling may be biased. How? Assume I'm sampling from a group of people from different villages. I may select only men, only women, or more of one gender, and such a sample would hurt my precision. Hence, when the population includes a number of distinct categories, the frame can be organized into separate 'strata'. Each stratum is then sampled as an independent sub-population, within which individual elements are randomly selected. Men form one stratum, women another. This process is called Stratified Random Sampling.
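Here is a small sketch of stratified random sampling under simple assumptions: the villagers are made-up records, the function `stratified_sample` is hypothetical, and the same number of units is drawn from every stratum (proportional allocation would be another valid choice).

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into strata, then draw a simple random sample inside each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Each stratum is sampled independently, without replacement.
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population: 20 villagers, 10 men and 10 women.
villagers = [(f"P{i:02d}", "men" if i % 2 else "women") for i in range(1, 21)]
chosen = stratified_sample(villagers, lambda person: person[1], 3, seed=7)
for stratum, members in chosen.items():
    print(stratum, members)
```

Because every stratum contributes its own sample, neither gender can be left out by chance.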
Here we sample in a different way: the sampling is done in two steps.
First, we choose the areas to sample.
The population is divided into clusters (usually based on geographical locations).
The sampling units are groups rather than individuals (e.g., senior citizens of Palani Murugan Street, flower vendors of Dhendapani shopping complex).
A sample of such clusters is selected (e.g., senior citizens of Palani Murugan Street).
All units from the selected clusters are studied (all senior citizens of Palani Murugan Street).
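The cluster steps above can be sketched roughly like this. The member IDs, the third cluster and the function `cluster_sample` are invented for illustration; only the first two cluster names come from the examples above.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit inside them."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), n_clusters)
    # All units of each selected cluster are studied.
    return {name: list(clusters[name]) for name in selected}

# Hypothetical clusters with made-up member IDs.
clusters = {
    "Senior citizens of Palani Murugan Street": ["s1", "s2", "s3", "s4"],
    "Flower vendors of Dhendapani shopping complex": ["f1", "f2", "f3"],
    "Residents of a third street (hypothetical)": ["r1", "r2"],
}
studied = cluster_sample(clusters, 2, seed=3)
for name, members in studied.items():
    print(name, members)
```

Note the contrast with stratified sampling: there every stratum is sampled, while here only some clusters are chosen and those are studied completely.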
Non-random sampling should be used only for surveys that are not mission-critical, where we are not seriously concerned about the accuracy of the results.
Convenience sampling collects samples from the part of the population that is convenient to reach (located nearby, data readily available, etc.). We should not base a decision on this kind of sample, as it doesn't represent the entire population. It is generally used in pilot phases.
For example, taking agricultural yield values from the internet, or surveying the neighbourhood about the Government.
We may also sample based on our own experience and preference. Here the samples are chosen non-randomly by the researcher, based on his own judgement.
For example, a teacher may pick for an extra coaching class those students whom he considers weak in their studies.
Quotas, rations and the like are not new to India 🇮🇳. Similar to stratified sampling, the population is divided into strata; judgemental sampling is then used to pick units within each stratum.
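A rough sketch of quota sampling follows. The respondents and the function `quota_sample` are hypothetical, and "judgement" is approximated here by simply taking people in the order they are met, which is an assumption standing in for the researcher's own choice.

```python
def quota_sample(units, stratum_of, quotas):
    """Fill each stratum's quota with the first matching units encountered (non-random)."""
    picked = {name: [] for name in quotas}
    for unit in units:
        name = stratum_of(unit)
        # Accept the unit only while its stratum still has room in the quota.
        if name in picked and len(picked[name]) < quotas[name]:
            picked[name].append(unit)
    return picked

# Hypothetical respondents, listed in the order the researcher meets them.
respondents = [
    ("r1", "men"), ("r2", "men"), ("r3", "women"),
    ("r4", "men"), ("r5", "women"), ("r6", "women"),
]
quota = quota_sample(respondents, lambda r: r[1], {"men": 2, "women": 2})
print(quota)
```

No random draw is involved, which is what separates this from stratified random sampling.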
This means getting efficient and cost-effective samples through the links of our known resources. For example, when Google released its email service, it chose the users (samples) based on referrals.