
Message added by Mayank Gupta

Bootstrapping is a method of sampling with replacement. Instead of selecting multiple independent samples from the population, a single larger sample is drawn first and then resampled with replacement. This creates multiple simulated samples from that one sample.

An application-oriented question on the topic along with responses can be seen below. The best answer was provided by Shashikant Adlakha on 21st February 2020

Applause for all the respondents - Shashikant Adlakha, Avinash

Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.

Bootstrapping


February 6, 2020

Q 237. Why is Bootstrapping, a method of drawing samples with replacement, used as an alternative to inferential statistics? Explain with simple examples.

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

All questions so far can be seen here - https://www.benchmarksixsigma.com/forum/lean-six-sigma-business-excellence-questions/
Please visit the forum home page at https://www.benchmarksixsigma.com/forum/ to respond to the latest question, which stays open until 5 PM Indian Standard Time on the next Tuesday or Friday evening.
The best answer is always shown at the top among the responses, and its author receives an honorable mention in our Business Excellence dictionary at https://www.benchmarksixsigma.com/forum/business-excellence-dictionary-glossary/ along with the related term.


February 19, 2020

Hi,

Bootstrap sampling is one method of predicting a result accurately.

The starting point is the bootstrap sample: a sample drawn from the given population by simple random sampling with replacement.

Example 1: a population of 1,000.

We draw a sample of 100 from the population of 1,000.

Suppose we are at the 5th observation: it is drawn from the full population again, not from what remains after the earlier draws.

So from this example we can see that a bootstrap sample allows duplicate data.
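To make the duplicate-data point concrete, here is a minimal sketch in Python. The data and the helper name `bootstrap_sample` are assumptions made for illustration; they are not part of the original answer.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement (hypothetical helper)."""
    return rng.choices(data, k=len(data))

rng = random.Random(0)            # fixed seed so the run is repeatable
original = list(range(1, 101))    # assumed sample of 100 distinct observations
resample = bootstrap_sample(original, rng)

# Because every draw is made from the full sample, some values repeat
# and others are left out of this particular resample.
n_unique = len(set(resample))
print(f"{n_unique} distinct values among {len(resample)} draws")
```

The resample has the same size as the original sample, but it contains fewer distinct values because some observations were picked more than once.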

Example 2: a population of 1,000.

We draw 100 samples from it, each with 500 observations:

S1: 500 observations

S2: 500 observations

S3: 500 observations

...

S100: 500 observations

On each sample we build a model on the training data, and each model gives a different output.

Since this is a case of ensemble learning, we then decide by majority voting if it is a classification problem; if it is a regression problem, we take the average of the model outputs.

Since we aggregate the outputs of models built on bootstrap samples, the technique is called "Bagging" (bootstrap aggregating).

One good part of bootstrap sampling is that we can skip the preliminary statistical steps and instead conclude from the majority-voting result. It also tends to give high accuracy, because the outputs of different models are aggregated.
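A rough sketch of the bagging idea in Example 2, using only the Python standard library. The toy data, the one-dimensional threshold "model" and the names `fit_stump` and `bagged_predict` are all assumptions chosen to keep the example short; a real project would use a proper learning algorithm.

```python
import random
from collections import Counter
from statistics import fmean

def fit_stump(points):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    m0 = fmean(x for x, y in points if y == 0)
    m1 = fmean(x for x, y in points if y == 1)
    return (m0 + m1) / 2

def bagged_predict(thresholds, x):
    """Majority vote over all models (class 1 if x is above a model's threshold)."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
# Assumed data set of 1,000 points: class 0 centred at 0, class 1 centred at 4.
train = [(rng.gauss(0, 1), 0) for _ in range(500)] + \
        [(rng.gauss(4, 1), 1) for _ in range(500)]

# S1 ... S100: each bootstrap sample has 500 observations drawn with replacement.
thresholds = [fit_stump(rng.choices(train, k=500)) for _ in range(100)]

print(bagged_predict(thresholds, -1.0), bagged_predict(thresholds, 5.0))
```

Each of the 100 models sees a slightly different sample, so the thresholds differ a little; the majority vote smooths out those differences. For a regression problem the vote would be replaced by the average of the model outputs.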

February 21, 2020

Solution

Bootstrapping is a sampling technique that involves random sampling with replacement; it was introduced by Bradley Efron. It is a simple yet powerful tool for drawing statistical inferences without relying on many assumptions. An entire sampling distribution can be built from just one sample, and the best thing is that no formula is needed for the statistical inference. It is also applied in other statistical work such as confidence intervals, regression models and machine learning.

Bootstrapping evaluates a property of an estimator (such as its variance) by assessing that property while sampling again and again from an approximating distribution. When the observations can be assumed to come from an independent and identically distributed population, a number of resamples of the observed data set can be constructed by drawing with replacement.

Bootstrapping is based on the principle that inference about a population from sample data (sample → population) can be modelled by resampling the sample data and drawing inference about the sample from the resampled data (resampled → sample). The error in a sample statistic relative to the true population value is unknown, because we do not know the entire population. Because we do know the sample, the quality of inference about the original sample from the resampled data (resampled → sample) can be measured by bootstrapping.

I. Confidence Intervals:

There are different tests available to build confidence intervals:

· T-Test

· Two sample t-test

· Z-test

· chi-square Test

A bootstrapping approach can be substituted for any of these. First, we calculate the mean of the original sample, which is presumed to be representative of the entire population. By bootstrapping thousands of samples from the original sample, we obtain the mean of each resample. We can then plot the distribution of these means, compute the 95% confidence interval of the means, and check whether our original mean falls within that interval.
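The procedure just described can be sketched as follows. This is a minimal percentile-interval example, not the only way to build a bootstrap confidence interval; the data (simulated heights) and the function name `bootstrap_ci_mean` are assumptions for illustration.

```python
import random
from statistics import fmean

def bootstrap_ci_mean(data, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    # Mean of each resample, sorted so percentiles can be read off by position.
    means = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - level
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

rng = random.Random(1)
data = [rng.gauss(170, 8) for _ in range(50)]   # assumed sample, e.g. heights in cm
sample_mean = fmean(data)
lo, hi = bootstrap_ci_mean(data)
print(f"mean = {sample_mean:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

No t-table or normality formula is used here: the interval comes directly from the 2.5th and 97.5th percentiles of the resampled means.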

II. Hypothesis Testing with bootstrapped data:

After clearly defining the null and alternative hypotheses, we can test them against the 95% confidence interval of the means of the bootstrapped samples and conclude whether we reject the null hypothesis in favour of the alternative or fail to reject it. We can also compute p-values and reject or retain the null hypothesis on that basis.
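One common way to get a bootstrap p-value for a one-sample test of the mean is sketched below. It is a variation on the interval check described above, not necessarily the method the answer has in mind: the sample is first shifted so that the null hypothesis holds, then resampled. The data, the null value `mu0` and the function name `bootstrap_p_value` are assumptions.

```python
import random
from statistics import fmean

def bootstrap_p_value(data, mu0, n_resamples=5000, seed=0):
    """Two-sided bootstrap p-value for H0: population mean == mu0 (a sketch)."""
    rng = random.Random(seed)
    n = len(data)
    observed = fmean(data)
    # Shift the sample so that its mean equals mu0, i.e. impose H0.
    shifted = [x - observed + mu0 for x in data]
    obs_diff = abs(observed - mu0)
    # Count resampled means that lie at least as far from mu0 as the observed mean.
    extreme = sum(
        abs(fmean(rng.choices(shifted, k=n)) - mu0) >= obs_diff
        for _ in range(n_resamples)
    )
    return extreme / n_resamples

# Assumed measurements; H0 says the population mean is 5.0.
data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5, 6.1, 5.4, 5.9, 5.2]
p = bootstrap_p_value(data, mu0=5.0)
print("reject H0" if p < 0.05 else "fail to reject H0", p)
```

A small p-value means that, if the null hypothesis were true, resampled means would rarely land as far from the hypothesised value as the observed mean did.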

III. Power calculation:

Power and sample size calculations depend mostly on the variance and standard deviation of the statistic of interest. When a small pilot sample is available, bootstrapping can be used to generate a large number of resamples from it and to calculate the variance.
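As a hedged illustration, the variance of a statistic can be estimated from a pilot sample like this. The pilot values and the helper name `bootstrap_variance` are assumptions, and the median is used only as an example of a statistic with no simple variance formula.

```python
import random
from statistics import median, variance

def bootstrap_variance(data, statistic, n_resamples=2000, seed=0):
    """Bootstrap estimate of the variance of `statistic` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return variance(estimates)

# Assumed pilot measurements.
pilot = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.8, 12.6, 9.5, 11.0]
var_of_median = bootstrap_variance(pilot, median)
print(f"bootstrap variance of the median: {var_of_median:.4f}")
```

The resulting variance (or its square root, the standard error) can then be fed into whatever power or sample size calculation is being planned.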

IV. Assessing the distribution of the statistic of interest:

Bootstrapping can be used to evaluate the distribution of a statistic when its theoretical distribution is unknown, and to analyse the parameters arising from the data. Bootstrapping is distribution independent and provides an indirect assessment of the distribution of the data.

February 21, 2020

Benchmark Six Sigma Expert View by Venugopal R

We normally take a sample from a large population to estimate the parameters of the population. For instance, if we want to estimate the average height of the male population of a country, we have to rely on findings based on a random sample. However, we also know that a finding based on one sample would not be an accurate estimate, since the average we obtain from another sample is bound to be different. This necessitates pulling multiple samples from the large population, so that we obtain a sampling distribution from which the population parameters can be derived easily and more accurately. The task of pulling multiple samples and conducting the measurements could prove cumbersome in certain cases.

Bradley Efron, an American statistician, came up with a method in 1979 by which, instead of taking multiple different samples, one large sample subjected to re-sampling with replacement could provide results almost the same as those we would have obtained by using multiple samples. He called this method “Bootstrap re-sampling”.

I will try to provide a brief explanation of this method below:

1. One large random and representative sample set, say of sample size ‘N’, has to be picked from the population being studied.
2. Measure each unit in the sample-set and replace it back into the sample-set.
3. Pick one unit from the sample-set, measure it for the characteristics of interest and replace it into the sample-set. Pick another unit, measure it and replace it. When you repeat this procedure N times, you would have completed one “Bootstrap sample-set”.
4. Keep repeating point no. 3 ‘K’ times to obtain data from ‘K’ Bootstrap sample-sets, each containing ‘N’ samples.
5. It may be noted that since each unit is replaced before picking the next unit, there is a possibility of the same unit getting repeated within a Bootstrap sample-set.

Thus, it is very likely that the composition of each of the K Bootstrap sample-sets would be different and hence the sample means and variances would also be different, mimicking the kind of variation that would have occurred had multiple samples been picked from the population.
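The steps above can be sketched in a few lines of Python. The simulated heights, the values N = 200 and K = 1000, and the function name `bootstrap_sets` are assumptions made for illustration only.

```python
import random
from statistics import fmean, variance

def bootstrap_sets(sample, k, seed=0):
    """Return k Bootstrap sample-sets, each of len(sample) units drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [rng.choices(sample, k=n) for _ in range(k)]

rng = random.Random(1)
# Step 1 (assumed data): one large sample of N = 200 heights in cm.
sample = [rng.gauss(170.0, 7.0) for _ in range(200)]

# Steps 3 and 4: K = 1000 Bootstrap sample-sets of N units each.
sets = bootstrap_sets(sample, k=1000)

# Each sample-set has its own mean and variance, mimicking repeated sampling.
means = [fmean(s) for s in sets]
variances = [variance(s) for s in sets]
print(f"sample mean = {fmean(sample):.2f}, "
      f"mean of bootstrap means = {fmean(means):.2f}")
```

The spread of `means` approximates the sampling distribution of the average that would otherwise require many separate samples from the population.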

Advantages of using Bootstrap re-sampling:

The need for collecting multiple samples and the associated measurement efforts are eliminated. Steps outlined above for the Bootstrap re-sampling method with random picking of units are best performed using computers.

With Bootstrap re-sampling, the estimate of variance will be less biased than that obtained using small samples, and thus more representative of the population.

The applicability of the Central Limit Theorem (by which the distribution of sample averages comes closer to normality as the sample size grows) increases with Bootstrap re-sampling.

Some limitations of the Bootstrap re-sampling method:

Bootstrap re-sampling works best with large sample sizes, and the sample has to be very representative of the population.

This method may not be practical in the absence of computing facilities.

The practice of sample replacement would not be possible when the measurement of characteristic involves destructive methods.


February 21, 2020


The winner for this question on Bootstrapping is Shashikant Adlakha.

Please go through the answer by Benchmark Expert Mr. Venugopal as well, especially the reference to the Central Limit Theorem.
