
Benchmark Six Sigma Forum

Message added by Mayank Gupta

Bootstrapping is a sampling method with replacement. Instead of selecting multiple independent samples from the population, a single larger sample is drawn first and then re-sampled with replacement. This creates multiple simulated samples from the same original sample.

 

An application-oriented question on the topic, along with the responses, can be seen below. The best answer was provided by Shashikant Adlakha on 21st February 2020.

 

Applause for all the respondents - Shashikant Adlakha, Avinash

 

Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.

Bootstrapping


Q 237. Why is Bootstrapping, a method of drawing samples with replacement, used as an alternative to inferential statistics? Explain with simple examples.

 

Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.

Solved by Shashikant Adlakha

Hi, 

 

Bootstrap sampling is one method for predicting results accurately.

The starting point is bootstrap sampling: a sample is drawn from a given population by simple random sampling with replacement.

Eg 1: a population of 1000.

We take a sample of 100 out of the population of 1000.

Each observation, say the 5th, is drawn from the full population rather than from what is left after the earlier draws, so a unit that was already picked can be picked again.

So from this example we can say that a bootstrap sample allows duplicate data.

Eg 2: a population of 1000.

We take multiple samples out of it:

S1    500 observations

S2    500 observations

S3    500 observations

S100  500 observations

 

On each sample we build a model using the training data, and each model gives a different output.

Since this is a case of ensemble learning, the final decision is taken by majority voting if it is a classification problem; if it is a regression problem, we take the average of the model outputs.

Since we aggregate the outputs of the different models, we call it "Bagging" (bootstrap aggregating).

One good part of bootstrap sampling is that we can skip most of the preliminary statistical steps and conclude based on the majority-voting result instead. It also tends to give high accuracy, because the result is aggregated from the outputs of different models.
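To make the bagging idea concrete, here is a minimal sketch in Python. It assumes a toy data set with a single feature and uses a simple threshold rule as the base model; the names `fit_stump` and `bagged_predict`, the data and the number of models are illustrative assumptions, not something taken from the answer above.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a one-feature threshold rule: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        # Count how many training points this threshold misclassifies.
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_predict(points, x_new, n_models=25, seed=0):
    """Train one stump per bootstrap sample and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = rng.choices(points, k=len(points))
        threshold = fit_stump(sample)
        votes.append(int(x_new > threshold))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: the label is 1 when x is 5 or more.
data = [(x, int(x >= 5)) for x in range(10)]
print(bagged_predict(data, 8.5), bagged_predict(data, 0.5))
```

For a regression problem the same loop would collect numeric predictions and return their average instead of a majority vote.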

  • Solution

Bootstrapping is a sampling technique that involves random sampling with replacement; it was introduced by Bradley Efron. It is a simple yet powerful tool for drawing statistical inferences without relying on many assumptions. An entire sampling distribution can be built from just one sample, and no closed-form formula is needed for the inference. It is also applied in other statistical work such as confidence intervals, regression models and machine learning.

Bootstrapping evaluates a property of an estimator (such as its variance) by measuring that property while sampling again and again from an approximating distribution, typically the empirical distribution of the observed data. When the observations can be assumed to be independent and identically distributed, any number of resamples of the observed data set can be constructed by drawing with replacement.

 

Bootstrapping is based on the principle that inference about a population from sample data (sample → population) can be modelled by resampling the sample data and performing inference about the sample from the resampled data (resampled → sample). The error of a sample statistic relative to the true population value is unknown, because we never observe the entire population. The sample, however, is known, so the quality of the inference from the resampled data to the original sample (resampled → sample) can be measured directly.

 

I.   Confidence Intervals: 

Several classical tests are available for building confidence intervals:

·       t-test
·       Two-sample t-test
·       Z-test
·       Chi-square test

 

The bootstrap approach can be substituted for any of these. First, we calculate the mean of the original sample, which is presumed to be representative of the entire population. By drawing thousands of bootstrap samples from the original sample, we obtain the mean of each one. We can then plot the distribution of these means, take its 2.5th and 97.5th percentiles as a 95% confidence interval for the mean, and see where the original sample mean falls within that interval.
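A short sketch of this percentile approach, using only the Python standard library. The function name `bootstrap_mean_ci`, the number of resamples and the `heights_cm` values are assumptions made up for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Mean of each bootstrap resample (same size, drawn with replacement).
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical sample of 20 measured heights in centimetres.
heights_cm = [168.2, 171.5, 165.0, 174.3, 169.8, 172.1, 166.4, 170.9,
              175.6, 167.7, 173.0, 169.1, 164.8, 171.9, 170.2, 168.9,
              176.4, 166.0, 172.7, 169.5]
low, high = bootstrap_mean_ci(heights_cm)
print(f"sample mean = {statistics.fmean(heights_cm):.2f}")
print(f"95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several ways to turn bootstrap means into an interval; it is used here because it matches the description above most directly.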

II. Hypothesis testing with bootstrapped data:

After clearly defining the null and alternative hypotheses, we can check whether the value stated by the null hypothesis lies inside the 95% confidence interval of the bootstrapped means, and conclude whether we reject the null hypothesis in favour of the alternative or fail to reject it. We can also compute p-values from the bootstrap distribution and reach the same kind of decision.
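One common way to get a bootstrap p-value for a one-sample test of the mean is sketched below. It assumes the "shift" approach: the sample is first re-centred so that the null hypothesis holds, and the resampled means are then compared with the gap actually observed. The function name `bootstrap_p_value` and the `fill_weights_g` data are hypothetical.

```python
import random
import statistics

def bootstrap_p_value(sample, null_mean, n_resamples=5000, seed=0):
    """Two-sided bootstrap p-value for H0: population mean == null_mean."""
    rng = random.Random(seed)
    n = len(sample)
    observed = statistics.fmean(sample)
    # Re-centre the data so that its mean equals the null value.
    shifted = [x - observed + null_mean for x in sample]
    observed_gap = abs(observed - null_mean)
    extreme = 0
    for _ in range(n_resamples):
        resample_mean = statistics.fmean(rng.choices(shifted, k=n))
        # Count resamples at least as far from the null as the real data.
        if abs(resample_mean - null_mean) >= observed_gap:
            extreme += 1
    # The +1 terms keep the estimated p-value away from exactly zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical fill weights in grams; H0 says the mean is 50.0 g.
fill_weights_g = [50.2, 49.8, 50.5, 50.1, 49.9, 50.4, 50.3, 49.7, 50.0, 50.6]
p = bootstrap_p_value(fill_weights_g, null_mean=50.0)
print(f"bootstrap p-value = {p:.3f}")
```

A p-value below the chosen significance level (for example 0.05) would lead to rejecting the null hypothesis; otherwise we fail to reject it.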

 

III. Power calculation:

 

Power and sample size calculations depend mostly on the variance and standard deviation of the statistic of interest. When only a small pilot sample is available, bootstrapping can be used to generate a large number of resamples and estimate that variance.
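As a rough illustration, the variance of any statistic can be estimated from a pilot sample as follows. The helper name `bootstrap_variance` and the `pilot` values are assumptions; the resulting estimate would then be fed into whichever power or sample size formula is being used.

```python
import random
import statistics

def bootstrap_variance(sample, statistic, n_resamples=2000, seed=0):
    """Bootstrap estimate of the variance of `statistic` over resamples."""
    rng = random.Random(seed)
    n = len(sample)
    # Evaluate the statistic on each resample drawn with replacement.
    values = [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.variance(values)

# Hypothetical pilot sample of 8 cycle times in minutes.
pilot = [12.1, 14.3, 11.8, 15.2, 13.4, 12.9, 16.0, 13.1]
var_of_median = bootstrap_variance(pilot, statistics.median)
print(f"estimated variance of the median = {var_of_median:.3f}")
print(f"estimated standard error = {var_of_median ** 0.5:.3f}")
```

The median is used here only because it has no simple textbook variance formula, which is the situation where this kind of estimate is most useful.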

 

IV. Assessing the distribution of the statistic of interest: when the theoretical distribution of the data is unknown, bootstrapping can be used to approximate it and to analyse the parameters derived from the data. Bootstrapping does not assume a particular distribution and gives an indirect assessment of the distribution of the data.

 

Benchmark Six Sigma Expert View by Venugopal R

We normally take a sample from a large population to estimate the parameters of the population. For instance, if we are interested in estimating the average height of the male population in a country, we have to rely on findings based on a random sample. However, we also know that a finding based on one sample is not an exact estimate, since the average obtained from another sample is bound to be different. This calls for pulling multiple samples from the large population, so that we obtain a sampling distribution from which the population parameters can be derived more easily and more accurately. Pulling multiple samples and measuring them can prove cumbersome in certain cases.

 

Bradley Efron, an American statistician, came up with a method in 1979 in which, instead of taking multiple different samples, one large sample is re-sampled with replacement to give results almost the same as those we would have obtained from multiple samples. He named this method “Bootstrap re-sampling”.

 

I will try to provide a brief explanation of this method as below:

  1. Pick one large, random and representative sample-set, say of size ‘N’, from the population being studied.
  2. Measure each unit in the sample-set and put the units back into the sample-set.
  3. Pick one unit from the sample-set, record its measurement for the characteristic of interest and put it back into the sample-set. Pick another unit, record its measurement and put it back. When you have repeated this procedure N times, you have completed one “Bootstrap sample-set”.
  4. Repeat step 3 ‘K’ times to obtain data from ‘K’ Bootstrap sample-sets, each containing ‘N’ units.
  5. Note that since each unit is put back before the next unit is picked, the same unit can appear more than once within a Bootstrap sample-set.

Thus, it is very likely that the composition of each of the K Bootstrap sample-sets would be different and hence the sample means and variances would also be different, mimicking the kind of variation that would have occurred had multiple samples been picked from the population.
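The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bootstrap_sample_sets`, the `measurements` values and the choice of K = 1000 are made up, and the measurements are assumed to have been recorded once already (step 2).

```python
import random
import statistics

def bootstrap_sample_sets(sample_set, k, seed=0):
    """Build K bootstrap sample-sets, each with N picks made with replacement."""
    rng = random.Random(seed)
    n = len(sample_set)
    # Step 3 repeated N times gives one sample-set; step 4 repeats that K times.
    return [[rng.choice(sample_set) for _ in range(n)] for _ in range(k)]

# Hypothetical measurements from one sample-set of N = 12 units.
measurements = [10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 10.0, 9.9, 10.3, 10.7, 9.7, 10.1]
sets = bootstrap_sample_sets(measurements, k=1000)

# Each sample-set has its own mean and variance, as noted above.
means = [statistics.fmean(s) for s in sets]
variances = [statistics.variance(s) for s in sets]
print(f"mean of bootstrap means = {statistics.fmean(means):.3f}")
print(f"spread (std dev) of bootstrap means = {statistics.stdev(means):.3f}")
```

The spread of `means` is the quantity that stands in for the variation we would have seen had we drawn many separate samples from the population.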

 

Advantages of using Bootstrap re-sampling:

The need to collect multiple samples, and the associated measurement effort, is eliminated. The steps outlined above for Bootstrap re-sampling, with random picking of units, are best performed using computers.

With Bootstrap re-sampling, the estimate of variance will be less biased than one obtained from small samples, and thus more representative of the population.

The applicability of the Central Limit Theorem (by which the distribution of sample averages comes closer to normal as the sample size grows) increases with Bootstrap re-sampling.

 

Some limitations of the Bootstrap re-sampling method:

Bootstrap re-sampling works best with large sample sizes, and the sample has to be very representative of the population.

This method may not be practical in the absence of computing facilities.

Physically putting units back into the sample-set is not possible when measuring the characteristic involves destructive methods.

  • Author

The winner for this question on Bootstrapping is Shashikant Adlakha.

 

Please go through the answer by Benchmark Expert Mr. Venugopal as well (especially the reference to the Central Limit Theorem).
