Sampling

June 5, 2020

Q 268. 'The results are only as good as the sample' and hence it is imperative to select a good sample. What are the key considerations while sampling in order to get a good sample?

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

All questions so far can be seen here - https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/
Please visit the forum home page at https://www.benchmarksixsigma.com/forum/ to respond to the latest question open till the next Tuesday/ Friday evening 5 PM as per Indian Standard Time. Questions launched on Tuesdays are open till Friday and questions launched on Friday are open till Tuesday.
The best answer is always shown at the top among responses, and the author receives an honorable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.

June 5, 2020

Why Use Data Sampling?
Sometimes gathering information on a complete population is simply cost-prohibitive. Think about CNN's TV coverage of an election cycle in the USA. It is not feasible to ask every voter how they voted or whom they are going to vote for, and even if it were, not all would answer. Instead, they use exit polls to draw statistical conclusions about the population as a whole.

Concerns About Data Sampling
When you take a sample from a larger population, you must make sure that the sample is of an appropriate size and is drawn without bias. You should address these concerns while collecting data.

For example, it is very helpful if the sample is large enough for the sample mean to be approximately normally distributed, as this opens the door to a wide array of statistical tools.

How Large Should your Data Sample Be?

The calculation of how large a sample should be depends on:

Type of data (continuous or discrete) being measured.
How precise you want your statistical inferences to be.
An estimate of the standard deviation, or the historical standard deviation, for the entire population.
The confidence level desired.
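
As a rough sketch of how these inputs combine for continuous data, the usual normal-approximation formula n = (z * sigma / E)^2 can be coded as below. The function name, parameter names and example numbers are illustrative assumptions and do not come from the post above; E is the acceptable margin of error (half the width of the confidence interval).

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n for which the confidence interval half-width is at most `margin`.

    Assumes a normal approximation and a known (or historical) standard deviation.
    """
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical example: historical sigma of 10 units, desired precision of +/- 2 units.
print(sample_size_for_mean(sigma=10.0, margin=2.0, confidence=0.95))
```

Tightening the margin or raising the confidence level both increase the required sample size, which matches the list above.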

The points below apply to hypothesis tests.

The sample size needed for a hypothesis test depends on:

Desired risk (both alpha and beta).
Minimum difference to be detected between the population means.
Variation in the characteristic being measured (s or sigma, the sample or population standard deviation).
Sensitivity required to detect a shift in the parameter.
(Population size does NOT come into the determination of how big a sample should be.)
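
To make the hypothesis-test inputs concrete, here is a minimal sketch for one common case: a two-sided comparison of two population means with equal group sizes, using the normal approximation n = 2 * ((z_alpha/2 + z_beta) * sigma / delta)^2 per group. The choice of this particular test, the function name and the example values are my assumptions for illustration only.

```python
import math
from statistics import NormalDist


def sample_size_two_means(sigma, delta, alpha=0.05, beta=0.10):
    """Approximate sample size per group to detect a mean difference `delta`.

    alpha: risk of a false alarm (two-sided); beta: risk of missing the difference.
    Assumes a common standard deviation `sigma` in both groups.
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical example: sigma = 10, smallest difference worth detecting = 5.
print(sample_size_two_means(sigma=10.0, delta=5.0, alpha=0.05, beta=0.10))
```

Note that the population size does not appear anywhere in the calculation, in line with the remark above.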


June 5, 2020

Key considerations in order to get a good sample are:

- Clarity on the end result one wants to achieve.

- Alignment of the sample selection to the organisation’s business & expectations.

- Processes & tools needed to select the right sample.

- Follow up approach to sustain the sampling results.

June 9, 2020

Benchmark Six Sigma Expert View by Venugopal R

Statistical sampling is a long-established method for assessing the characteristics of a population. Though the best option would be to assess the entire population, that may not be practically possible, hence the dependence on sampling to make decisions.

Sampling Risks:

While every method of sampling is associated with a risk of error, it is possible to understand and even quantify these risks and thus make an informed decision. Most of us will be aware of sampling errors, but here is a quick recap:

1. Risk of declaring a good population as bad (alpha risk)

2. Risk of declaring a bad population as good (beta risk)

Any sampling plan is governed by its operating characteristic curve (OC curve) that depicts and quantifies these risks. The OC curves and the sampling plans based on them have been very widely used in business for the purpose of deciding the appropriate acceptance sampling. However, I am not elaborating on this topic further here since there are many other aspects of sampling to be covered.
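
For readers who want a quick feel for how an OC curve quantifies the two risks, the sketch below computes the probability of accepting a lot under a single sampling plan (inspect n items, accept if at most c are defective), using a binomial model. The plan (n = 50, c = 1) and the "good" and "bad" defect rates are hypothetical values chosen only for illustration.

```python
from math import comb


def acceptance_probability(n, c, p):
    """P(accept) = P(at most c defectives among n sampled items) at defect rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


# Points on the OC curve for the hypothetical plan n = 50, c = 1.
for p in (0.01, 0.02, 0.05, 0.10):
    print(f"defect rate {p:.2f}: P(accept) = {acceptance_probability(50, 1, p):.3f}")

# Assumed quality levels: 1% defective counts as "good", 10% as "bad".
alpha_risk = 1 - acceptance_probability(50, 1, 0.01)  # good lot rejected
beta_risk = acceptance_probability(50, 1, 0.10)       # bad lot accepted
print(f"alpha risk = {alpha_risk:.3f}, beta risk = {beta_risk:.3f}")
```

Changing n or c shifts the curve, which is how a plan is tuned to the risks one is willing to accept.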

Sampling Frame:

To obtain a representative sample from a population, it is important to define the ‘sampling frame’. The sampling frame is the set of units that exhaustively represents the universe from which we take a sample. For instance, if we need to pick a sample for assessing customer satisfaction for a certain product and we pick the sample customers based on the credit card details, the sample will not cover the set of customers who paid through other means, and it is possible that their levels of satisfaction could be markedly different. Hence, in this case, the ‘sampling frame’ should incorporate inclusion of customers from all modes of payment.

A sampling frame should be defined in such a manner that it considers and represents all possible stratification of the population. The number of units in the population not covered by the frame is known as the ‘gap’. If the units in the gap are distributed like the units in the frame, then the sample will be a good representation of the population. Samples taken without using a frame are called ‘non-probability’ samples, whereas samples taken using frames are called ‘probability’ samples. It is recommended to use probability sampling whenever possible, so that valid statistical inferences can be derived.

Let us discuss various types of probability samples that could be used for different situations:

Simple Random Sample:

This is one of the most basic sampling methods. In this method every item in a population of N items has an equal chance of being picked. The lot of N items represents the frame. One may use random numbers to pick the samples.
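
A minimal sketch of a simple random sample drawn without replacement from a frame, using Python's standard random module. The frame of 500 serial numbers and the function name are made up for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct items from the frame, each with an equal chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for audits
    return rng.sample(list(frame), n)


# Hypothetical frame: items numbered 1 to 500.
frame = range(1, 501)
print(simple_random_sample(frame, 10, seed=42))
```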

Stratified Sampling:

Here the N items in a population are divided into sub-groups or strata, based on a characteristic of relevance. A simple random sample is selected from each stratum and the combined result is obtained. For instance, if we need to pick a sample to perform a medical test from a population of the state, we can sub-classify them into districts and pick random samples from each district. Stratified sampling technique can help to reduce the overall sample size to obtain the same level of confidence on inferences. Further, it will also help to understand if any heterogeneity is present between the strata.
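
The idea can be sketched as follows, assuming proportional allocation (the same sampling fraction in every stratum). That allocation rule, the function name and the district data are my assumptions; other allocation schemes are equally possible.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, with at least one unit per stratum.
        k = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: (district, person_id) pairs.
people = [("north", i) for i in range(200)] + [("south", i) for i in range(100)]
result = stratified_sample(people, stratum_of=lambda u: u[0], fraction=0.1, seed=1)
print({district: len(chosen) for district, chosen in result.items()})
```

Keeping the per-stratum samples separate, as the dictionary does here, also makes it easy to compare strata and spot heterogeneity between them.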

Systematic Sampling:

In systematic sampling, we divide the total number of items in the frame by the sample size to obtain a sampling interval, and then pick items at that fixed interval. A very simple example of systematic sampling is to pick every nth item from a production line for inspection. While this sampling method gives uniform coverage across the frame, one has to be cautious of certain disadvantages. For instance, imagine this is used for assessing the travel experience of people who got off a flight, and the method followed was to pick every 12th passenger who exits. There is a possibility that you might pick more passengers who were occupying a particular seat location, say, a window seat, which is likely to introduce bias in the sampling.
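
A minimal sketch of systematic sampling: compute the interval from the frame size and the sample size, choose a random starting point within the first interval, then take every item at that interval. The random start is a common practice that I am assuming here, and the frame of 120 passengers is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick n items at a fixed interval k = N // n, starting at a random offset."""
    items = list(frame)
    k = len(items) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return items[start::k][:n]


# Hypothetical frame: 120 passengers in exit order, sample of 10 (every 12th).
print(systematic_sample(range(1, 121), 10, seed=3))
```

A random start does not remove the periodicity risk described above; if the frame has a repeating pattern that matches the interval, the sample can still be biased.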

Cluster Sampling:

All the items in the frame are divided into clusters. Clusters are naturally occurring sub-categories of the frame, for example districts within a state or colleges within a region. Out of the total number of clusters, a few clusters are selected at random and all the items in those clusters are studied. Note that cluster sampling is different from stratified sampling: in stratified sampling every stratum contributes to the sample, whereas in cluster sampling only the selected clusters do. Cluster sampling could result in an increased sample size, but it may sometimes be more convenient and reduce the need to travel.
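
A short sketch of one-stage cluster sampling as described: select a few clusters at random and study every item in them. The college names and student counts are invented for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return every item in the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical frame: students grouped by college.
colleges = {
    "College A": [f"A{i}" for i in range(40)],
    "College B": [f"B{i}" for i in range(25)],
    "College C": [f"C{i}" for i in range(60)],
    "College D": [f"D{i}" for i in range(30)],
}
selected = cluster_sample(colleges, n_clusters=2, seed=7)
print({name: len(items) for name, items in selected.items()})
```

The total number of items studied depends on which clusters are drawn, which is one reason the overall sample size can end up larger than with the other methods.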

Keeping the objective in mind, the sampling strategy and method have to be decided so that the inferences based on the sample are meaningfully and reliably representative of the population.

June 9, 2020

Sampling Strategy

When

Within the scope of each data collection

Goal

Samples save time and effort when data is collected

· when it is impractical, impossible or too expensive to collect all data

· when the data collection is a cumbersome process

Derive a sampling strategy that provides the most accurate information about the population being measured. The objective is to meet the goals of data collection while optimizing the effort and cost.

The sampling strategy comprises the methodology for selecting samples as well as planning the sample size. This basic procedure can be divided into four phases :

1. The selection of samples should be entirely random

2. Choose a selection principle and a selection type

· Different types of selection and selection principles will be driven by cost and effort criteria

· They vary depending on the question being asked

3. Determine a selection technique in case of random selection

Non-random Selection

· Quota Procedure: selection follows a guideline of quotas, e.g. accident repair. Application: if only targeted information is needed

· Cut-off Procedure: only a part of the population is observed, e.g. accident damage. Application: if only one aspect is to be examined

· Haphazard Selection: only the information which can be obtained easily is collected. Application: if only a first impression is to be gained, e.g. for estimation of proportion or standard deviation for a more precise sample size calculation

Random Selection

· Simple Sample: all units have the same chance of being drawn. Advantage: no knowledge about the population is necessary. Disadvantage: high effort

· Cluster Sample: the population is clustered in a logical way and one cluster is selected, e.g. sites. Advantage: lower costs. Disadvantage: information can get lost

· Stratified Sample: the population is stratified according to relevant criteria, e.g. spray-painting type, machine, location etc. Then a representative sample is drawn from each stratum. Advantage: smaller sample. Disadvantage: information on the population must be available to start with

4. Determine the sample size

The bigger the sample, the greater the validity, i.e. the quality of the statistical conclusion about the population

One should therefore revert to available data (e.g. from IT systems): The data is treated like sample since the process to be improved hasn't yet been stopped

When new data is collected (e.g. manual counting, surveys) an assessment of the cost of collection and desired level of confidence and precision must take place

All in all three factors play a role when the sample size is determined:

· the required Confidence Level which indicates the likelihood that the population mean lies within the given Confidence Interval. This value is normally a given for any organization e.g. 95%

· The granularity is an indication of how precise we want to be and is usually half the width of the Confidence Interval

· The costs and the duration of the data measurement increase with the sample size. When the sample sizes are calculated it is important to consider whether the requested precision is worth the costs inc

Rules of thumb for sample size

· Discrete data (e.g. OK / Not OK): 100, with at least 5 per category

· Continuous data: 30
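
These rules of thumb are only starting points. Once the confidence level and the desired precision (half the width of the confidence interval) are fixed, the sample size for discrete OK / Not OK data can be estimated with the normal-approximation formula n = z^2 * p * (1 - p) / E^2. The sketch below assumes that formula; the function name and the default p = 0.5 (the most conservative guess when no pilot estimate of the proportion exists) are my own choices.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Sample size so that a proportion is estimated to within +/- margin.

    p is a prior or pilot estimate of the proportion; 0.5 gives the largest n.
    """
    if not 0 < margin < 1 or not 0 < p < 1:
        raise ValueError("margin and p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


# Hypothetical example: 95% confidence, precision of +/- 5 percentage points.
print(sample_size_for_proportion(margin=0.05, confidence=0.95))
# With a pilot estimate of 10% Not OK, fewer observations are needed.
print(sample_size_for_proportion(margin=0.05, confidence=0.95, p=0.10))
```

This is where a quick haphazard pilot sample can pay off: a rough estimate of the proportion often reduces the calculated sample size and therefore the cost of collection.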


June 9, 2020

There is no best answer for this question.

Sampling is done to derive meaningful inferences about the population. Some of the key considerations for sampling are

1. Purpose of the study

2. Cost and time available for the study

3. Permissible errors in the study (alpha and beta)

4. Population Information (sampling frame) available

5. Sampling method (to ensure a non-biased and representative sample)

6. Sample size

Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.
