|
The Genesis of Modern Sampling Theory
-
Most target populations in social and
business studies have specific demographic or commercial structures
that are of interest to researchers. Any sample selected to produce
official statistics is naturally expected to be as representative as possible of the population from which it was
selected.
|
What is a representative sample?
A sample is said to be representative of a population
if the sample and population distributions are similar with
respect to some key characteristics. The
activity that aims to ensure the representativeness of the sample is called
Sampling Design. |
-
Samples selected according to a specific design are
referred to as Complex Samples. Depending on the nature of
the population, sampling designs may vary considerably in their level of complexity.
Once a complex sample is selected and the data collected, each
sampled individual is assigned a weight.
|
What is a weight?
Each individual in the sample represents a group of individuals
in the target population. The number of individuals in that
group is the weight associated with the sample
individual. Suppose 100,000 individuals aged 25 or
older are selected randomly from the Northeast region of the
United States, where about 36,000,000 people in the same age
range live. Each of the 100,000 sample
individuals then represents approximately 36,000,000 / 100,000 =
360 individuals in the population. |
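The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of the equal-probability case described in the text, where every sampled individual receives the same weight; the variable names are chosen here for readability and are not part of any survey software.

```python
# Figures taken from the Northeast region example in the text.
population_size = 36_000_000   # people aged 25 or older in the region
sample_size = 100_000          # individuals selected at random

# With simple random selection, every sampled individual gets the same
# weight: the number of population members he or she represents.
weight = population_size / sample_size

print(weight)  # 360.0
```

Under a more complex design the weights would generally differ from one sampled individual to another, but the interpretation stays the same.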
-
All statistics produced from a complex sample, such as sums or
means,
must be calculated as weighted sums or
weighted means using the weights defined above. The question now
is how to evaluate the precision of estimates obtained from complex samples. Standard errors, variances, and confidence intervals
must be calculated, and hypothesis tests must be performed.
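As a minimal sketch of what "weighted" means here, the Python snippet below computes a weighted total and a weighted mean. The incomes and weights are made-up illustrative values, not real survey data, and the function names `weighted_total` and `weighted_mean` are hypothetical helpers rather than the interface of any particular package.

```python
def weighted_total(values, weights):
    """Estimate a population total: sum of weight * value over the sample."""
    return sum(w * y for y, w in zip(values, weights))


def weighted_mean(values, weights):
    """Estimate a population mean: weighted total divided by the sum of weights."""
    return weighted_total(values, weights) / sum(weights)


# Hypothetical sample of three individuals (illustrative numbers only).
incomes = [30_000, 50_000, 90_000]
weights = [500, 300, 200]   # population members each individual represents

total = weighted_total(incomes, weights)
mean = weighted_mean(incomes, weights)
unweighted_mean = sum(incomes) / len(incomes)

print(total)            # 48000000
print(mean)             # 48000.0
print(unweighted_mean)  # about 56666.67, which ignores the weights
```

The gap between the weighted and unweighted means shows why ignoring the weights can distort estimates when sampled individuals represent groups of different sizes.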
-
The problem: Abstract
statistical models do not refer to any specific population of
interest and are therefore inappropriate as a framework for
statistical inference from complex samples. It became apparent in the first half of the
twentieth century that a new framework was needed to address
inferential problems in the context of finite populations. Such a
framework was created in an incredible tour de force by the Polish
mathematician Jerzy Neyman, giving birth to what is known today as
modern sample survey theory, also referred to as Finite
Population Sampling. This theory is well documented in key
reference books such as Cochran (1977), Särndal et al.
(1992), and Kish (1965).
-
Since its creation, the modern theory
of sampling has grown considerably in complexity.
Inferential methods such as the Jackknife and the Bootstrap (and
other replication methods) have been developed. Many vendors now
offer software products (WesVar, SUDAAN, Stata, and more) for handling
complex samples.
|