
Two Stage Stratified Random Sampling – Clearly Explained

Understand the intricate procedure of two stage random sampling with the help of a practical use case.

Photo by Scott Graham on Unsplash

1. Introduction

Large-scale surveys conducted by governments are important tools for gathering information about the population and the economy. These surveys provide critical inputs for socio-economic planning. In India, the National Sample Survey Office (NSSO) was set up in 1950 to conduct nationwide surveys. In such large-scale surveys, it is common to find a stratified sampling process used for collecting survey data. Ever wondered what this means? This blog piece aims to clarify the idea with reference to the All India Debt and Investment Survey (AIDIS), which uses two stage stratified sampling. So keep reading.

The rest of this writeup is structured into the following sections:

  • What is AIDIS
  • What sampling technique is used in AIDIS
  • What is stratified sampling
  • What is two stage stratified sampling
  • Role of multipliers in analyzing AIDIS data
  • Conclusion

2. What is AIDIS?

The objective of AIDIS is to obtain quantitative information on the stock of assets, incidence of indebtedness, capital formation and other indicators that are of value in assessing the credit structure of the economy. It can also help in estimating (i) the demand for credit by households and (ii) the supply of credit by institutional and non-institutional agents, in order to formulate banking policies. In short, it collects basic quantitative information on the assets and liabilities of households. The latest round (77th) of AIDIS was conducted in 2019 in two visits.

  • Visit 1 – January 2019 to August 2019 (8 months)
  • Visit 2 – September 2019 to December 2019 (4 months)

The reference periods for the questions asked during the first and second visits were the second half of 2018 (H2:2018) and the first half of 2019 (H1:2019), respectively.

AIDIS, conducted by the NSSO, is a classic example of the stratified sampling technique. We will first understand this technique using a simple example and then extend the idea to the more complex AIDIS sampling.

3. What is stratified sampling?

The basic idea behind stratified sampling is to divide a heterogeneous population into smaller groups or subpopulations, such that the sampling units are homogeneous with respect to the characteristic under study. These homogeneous groups are collectively called strata. Each group is considered separate and a random sample of predetermined size is drawn from each stratum.

3.1 Procedure of stratified sampling

a. Divide the population of N units into k homogeneous groups (strata), with the i-th stratum containing Nᵢ units (i = 1, 2, …, k). The strata are non-overlapping and each is homogeneous with respect to the characteristic under study, such that N₁ + N₂ + … + Nₖ = N.

b. Draw a random sample of size nᵢ from each stratum independently of other strata.

c. The sampling units drawn from all the strata, pooled together, constitute a stratified sample of size n = n₁ + n₂ + … + nₖ.
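Steps a to c can be sketched in a few lines of Python. This is only a minimal illustration: the function name `stratified_sample`, the toy population and the chosen sample sizes are assumptions made for this example and do not come from any survey.

```python
import random


def stratified_sample(strata, sample_sizes, seed=None):
    """Draw a random sample from each stratum and pool the results.

    strata       : dict mapping a stratum label to its list of units
                   (the lists are assumed to be non-overlapping)
    sample_sizes : dict mapping a stratum label to its sample size n_i
    """
    rng = random.Random(seed)
    pooled = []
    for label, units in strata.items():
        n_i = sample_sizes[label]
        # rng.sample draws without replacement, independently per stratum
        drawn = rng.sample(units, n_i)
        pooled.extend((label, unit) for unit in drawn)
    return pooled


# Hypothetical population of N = 9 units split into k = 3 strata
strata = {"A": [1, 2, 3, 4], "B": [5, 6, 7], "C": [8, 9]}
sample = stratified_sample(strata, {"A": 2, "B": 2, "C": 1}, seed=0)
print(len(sample))  # pooled sample size n = 2 + 2 + 1
```

Each stratum is sampled on its own, so the pooled sample always contains exactly nᵢ units from stratum i, whatever the random draw turns out to be.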

Stratified Sampling: Image by Author

3.2 Example of stratified sampling

Let us understand the concept using an example. Suppose we are interested in knowing the average weight of students in a school with classes 1 to 12. The weight of students, in general, varies with their age. Younger students tend to weigh less than senior students. One way to determine the average weight would be to measure the weight of every student and then take the average. However, this is resource intensive and time consuming, and it is not feasible when the number of students is very large. Stratified sampling comes in handy in these situations. One can divide the students into different strata such as

  • Students of class 1: Stratum 1
  • Students of class 2: Stratum 2
  • …
  • Students of class 12: Stratum 12

Random samples of predetermined sizes can be drawn from each stratum. All the drawn samples combined constitute the final stratified sample.
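Once the samples are drawn, a standard way to estimate the school-wide average is to weight each stratum's sample mean by that stratum's share of the population, Nᵢ / N. The sketch below illustrates this textbook estimator; the class sizes and weights are made-up numbers, and only three classes are shown to keep it short.

```python
def stratified_mean(stratum_sizes, stratum_samples):
    """Estimate the population mean as the sum over strata of (N_i / N) * mean_i.

    stratum_sizes   : dict mapping a stratum label to its population size N_i
    stratum_samples : dict mapping a stratum label to the sampled values
    """
    total = sum(stratum_sizes.values())
    estimate = 0.0
    for label, values in stratum_samples.items():
        sample_mean = sum(values) / len(values)
        estimate += (stratum_sizes[label] / total) * sample_mean
    return estimate


# Hypothetical class sizes and sampled weights in kg
sizes = {"class 1": 100, "class 6": 100, "class 12": 50}
samples = {
    "class 1": [20.0, 22.0],
    "class 6": [38.0, 42.0],
    "class 12": [60.0, 64.0],
}
estimate = stratified_mean(sizes, samples)
print(round(estimate, 2))
```

Here the class 12 mean counts for only 50 / 250 of the estimate because class 12 holds a fifth of the students, even though every class contributed the same number of sampled students.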

Let us extend the previous example by introducing another level of complexity. What if there are both boys and girls in all classes? The weight of girls will, in general, differ from the weight of boys. Random sampling at the class level may not preserve the proportion of boys and girls found in each class. This may bias the estimate of average weight. In such a scenario, forming sub-strata by gender within each class can take us closer to the actual population mean. This is still single stage stratified sampling, because sampling happens only once; only the level at which sampling occurs has moved deeper in the hierarchical division of the population, i.e. from each class to each gender within each class.
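The class-and-gender sub-strata can be sketched by grouping students on a composite (class, gender) key and drawing the same fraction from every group, which keeps the boy/girl mix of each class intact in the sample. The roster, the field names and the 50% sampling fraction below are all hypothetical.

```python
import random
from collections import defaultdict


def sample_by_substrata(students, fraction, seed=None):
    """Sample the same fraction from every (class, gender) sub-stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for student in students:
        groups[(student["cls"], student["gender"])].append(student)

    sample = []
    for members in groups.values():
        # At least one student per sub-stratum; sizes are rounded to integers
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical roster: class 1 has 6 girls and 4 boys, class 2 has 2 girls and 8 boys
roster = (
    [{"cls": 1, "gender": "F"} for _ in range(6)]
    + [{"cls": 1, "gender": "M"} for _ in range(4)]
    + [{"cls": 2, "gender": "F"} for _ in range(2)]
    + [{"cls": 2, "gender": "M"} for _ in range(8)]
)
picked = sample_by_substrata(roster, fraction=0.5, seed=1)

counts = defaultdict(int)
for student in picked:
    counts[(student["cls"], student["gender"])] += 1
print(dict(counts))
```

Sampling is still done only once per student group, which is why this remains a single stage design.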

4. What is a two stage stratified sampling?

In two stage stratified sampling, sampling occurs twice, at two different levels in the hierarchical division of the population. To understand this better, let's consider the sampling process for AIDIS. The first stage units (FSUs) are villages in rural areas and blocks in urban areas. The second stage units (SSUs) are households in both sectors. The sampling procedure for AIDIS is enumerated below:

Stage 1

  1. For sampling purposes, each district is treated as a stratum. Within each district, the rural and urban areas are treated as sub-strata.
  2. Within each rural sub-stratum, First Stage Units (FSUs) are formed according to population, such that each FSU has roughly the same population of 1000–1200 (as per Census 2011). Within each urban sub-stratum, FSUs are likewise formed such that each has roughly 250 households.
  3. From each rural and urban sub-stratum within each stratum, the required number of FSUs is selected by Simple Random Sampling Without Replacement (SRSWOR).

First stage sampling process is illustrated through the following infographic:

AIDIS Stratified Sampling Stage 1: Image by Author

Stage 2

  1. Six second stage strata (SSS) are formed in each selected FSU of the rural/urban sector, based on the monthly per capita consumer expenditure (MPCE) and indebtedness of households.
  2. All the households of the sample FSU are listed and each is allocated to one of the six SSS.
  3. The required number of sample households from each SSS is selected by Simple Random Sampling Without Replacement (SRSWOR).

This second stage sampling process is illustrated through the following infographic:

AIDIS Stratified Sampling Stage 2: Image by Author

To recap, sampling happens in two stages:

1. Selection of FSUs: From each rural and urban sub-stratum within each stratum, the required number of FSUs is selected by Simple Random Sampling Without Replacement (SRSWOR). Under SRSWOR, each FSU in a sub-stratum is selected with probability n / N,

where N is the total number of FSUs in the sub-stratum and n is the number of sample FSUs surveyed in that sub-stratum.

2. Selection of households: The sample households from each second stage stratum (SSS) are also selected by SRSWOR. Each household listed in an SSS of a selected FSU is selected with probability h / H,

where H is the total number of households listed in a particular SSS of a selected FSU and h is the number of households surveyed in that SSS of that FSU.
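The two stages can be sketched for a single sub-stratum as follows. This is a simplified illustration and not the NSSO procedure itself: the function `two_stage_sample`, the toy data with only two SSS per FSU (AIDIS uses six) and the sample sizes are all assumptions. Under SRSWOR an FSU is selected with probability n / N and a listed household with probability h / H, so the sketch also records the product of the two for every sampled household.

```python
import random


def two_stage_sample(fsus, n_fsus, h_per_sss, seed=None):
    """Two stage SRSWOR within one sub-stratum.

    fsus      : dict mapping an FSU id to a dict {SSS label: list of household ids}
    n_fsus    : number of FSUs to select at stage 1 (n)
    h_per_sss : number of households to select per SSS at stage 2 (h)

    Returns a list of (fsu, sss, household, selection_probability) tuples.
    """
    rng = random.Random(seed)
    N = len(fsus)
    # Stage 1: simple random sample of FSUs without replacement
    chosen_fsus = rng.sample(sorted(fsus), n_fsus)

    rows = []
    for fsu in chosen_fsus:
        for sss, households in fsus[fsu].items():
            H = len(households)
            h = min(h_per_sss, H)  # cannot draw more households than are listed
            # Stage 2: simple random sample of households within the SSS
            for household in rng.sample(households, h):
                probability = (n_fsus / N) * (h / H)
                rows.append((fsu, sss, household, probability))
    return rows


# Hypothetical sub-stratum with N = 4 FSUs, each holding 6 + 4 listed households
fsus = {
    f"FSU{i}": {
        "SSS1": [f"hh{i}-{j}" for j in range(6)],
        "SSS2": [f"hh{i}-{j}" for j in range(6, 10)],
    }
    for i in range(1, 5)
}
rows = two_stage_sample(fsus, n_fsus=2, h_per_sss=2, seed=42)
for row in rows:
    print(row)
```

With these toy numbers a household in SSS1 has selection probability (2 / 4) × (2 / 6) and one in SSS2 has (2 / 4) × (2 / 4), so households from different SSS do not carry the same chance of ending up in the sample.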

5. Role of multipliers

All the sample households pooled together give us the sample that we see in the survey data. As the above process suggests, each household in the survey data represents a group of households that have similar characteristics in terms of MPCE and indebtedness.

The multiplier can be thought of as the approximate number of households in the population that each surveyed household represents. An appropriate weight is therefore important when aggregating the survey data to derive population attributes. A simple mean of the surveyed households, for instance, will not reflect the population mean, because it gives equal weight to every household surveyed. A weighted average, using the multipliers as weights, is therefore necessary to estimate the population mean.
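The difference between the two averages is easy to see in a small sketch. The debt figures and multipliers below are invented for illustration; in practice the multipliers are supplied with the survey data.

```python
def weighted_mean(values, multipliers):
    """Weighted average of values, using the multipliers as weights."""
    weighted_total = sum(v * w for v, w in zip(values, multipliers))
    return weighted_total / sum(multipliers)


# Hypothetical survey records: household debt and the multiplier of each household
debt = [100.0, 200.0, 1000.0]
multipliers = [500.0, 300.0, 200.0]

simple = sum(debt) / len(debt)
weighted = weighted_mean(debt, multipliers)
print(round(simple, 2), round(weighted, 2))
```

The household with the largest debt represents the fewest households in the population, so the weighted mean comes out lower than the simple mean, which treats all three records alike.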

6. Conclusion

In this blog post, I discussed the two stage stratified sampling procedure and illustrated it using the practical use case of the All India Debt and Investment Survey. Which other surveys use this sampling technique? Feel free to leave your comments below.


Disclaimer: Views are personal.

