How Do We Address the Sample Size Dilemma?

1. Introduction

1.1 General

Sample-size determination is often an important step in planning a statistical study, and it is usually a difficult one. Among the hurdles to be cleared, one must obtain an estimate of one or more error variances and specify an effect size of importance. There is a temptation to take shortcuts. This paper offers some suggestions for successful and meaningful sample-size determination. Ever-changing markets are a difficult environment for testing. You need to control as many sources of variation as you can and then gather enough data for test results to rise above the noise. Sample size calculations are the tool that helps you decide how much is “enough.” A big part of planning to succeed is figuring out how many observations you will need in order to meet the objectives of your project. Taking observations costs time and money, so we want to collect just enough of them to make sound inferences about our outcomes of interest.

1.2 Importance of Right Sample Size

Statistical studies (surveys, experiments, observational studies, etc.) are always better when they are carefully planned. Good planning has many aspects. The problem should be carefully defined and operationalised. Experimental or observational units must be selected from the appropriate population. The sample must be of adequate size, relative to the goals of the study. It must be “big enough” that an effect of such magnitude as to be of scientific significance will also be statistically significant. It is just as important, however, that the sample is not “too big,” where an effect of little scientific importance is nevertheless statistically detectable. Sample size is important for economic reasons: An under-sized sample can be a waste of resources for not having the capability to produce useful results, while an over-sized one uses more resources than are necessary. In an experiment involving human or animal subjects, sample size is a pivotal issue for ethical reasons. Sample size calculation is a more complex topic than can be covered in-depth here, but there are several key items you should start thinking about before you consult with a statistician or other researcher familiar with sample size calculations.

1.3 Previous Literature

For such an important issue, there is a surprisingly small amount of published literature. Important general references include Mace (1964), Kraemer and Thiemann (1987), Cohen (1988), Desu and Raghavarao (1990), Lipsey (1990), Shuster (1990), and Odeh and Fox (1991). There are numerous articles, especially in biostatistics journals, concerning sample-size determination for specific tests. Also of interest are studies of the extent to which sample size is adequate or inadequate in published studies; see Freiman et al. (1986) and Thornley and Adams (1998). There is a growing amount of software for sample-size determination, including nQuery Advisor (Elashoff, 2000), PASS (Hintze, 2000), UnifyPow (O’Brien, 1998), and Power and Precision (Borenstein et al., 1997). Web resources include a comprehensive list of power-analysis software (Thomas, 1998) and online calculators such as Lenth (2000). Wheeler (1974) provides some useful approximations for use in linear models; Castelloe (2000) gives an up-to-date overview of computational methods.

2. Sample Size

2.1 There are several approaches to sample size. For example, one can specify the desired width of a confidence interval and determine the sample size that achieves that goal; or a Bayesian approach can be used where we optimize some utility function, perhaps one that involves both precision of estimation and cost. One of the most popular approaches to sample-size determination involves studying the power of a test of hypothesis. It is the approach emphasized here, although much of the discussion is applicable in other contexts. The power approach involves these elements:

(a) Specify a hypothesis test on a parameter θ (along with the underlying probability model for the data).

(b) Specify the significance level α of the test.

(c) Specify an effect size θ that reflects an alternative of scientific interest.

(d) Obtain historical values or estimates of other parameters needed to compute the power function of the test.
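As a rough illustration of the power approach, the sketch below searches for the smallest n at which a two-sided, one-sample z-test reaches a chosen power. It assumes a known standard deviation and a target power of 0.80. The target power, the function names and the example numbers are assumptions made for illustration and do not come from the elements listed above.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic, in standard errors.
    shift = abs(effect) * math.sqrt(n) / sigma
    # Probability of landing in either rejection region under the alternative.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def smallest_n_for_power(effect, sigma, target_power=0.80, alpha=0.05):
    """Increase n one observation at a time until the target power is reached."""
    n = 2
    while z_test_power(effect, sigma, n, alpha) < target_power:
        n += 1
    return n


# Hypothetical inputs: effect of half a standard deviation, 5% significance level.
print(smallest_n_for_power(effect=0.5, sigma=1.0))
```

When the effect is zero the function returns the significance level itself, which is a quick sanity check that the two rejection tails are handled correctly.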

2.2 Determining the sample size is one of the early steps that must be taken in the planning of a survey. Unfortunately, there is no magic formula that will tell us what the perfect sample is since there are several factors we need to think about.

3. What Drives Our Needed Sample Size?

There are a few concerns that drive the sample size required for a meaningful test:

(a) We want to be reasonably sure that we don’t have a false positive—that there is no real difference, but we detect one anyway. Statisticians call this Type I error.

(b) We want to be reasonably sure that we don’t miss a positive outcome (or get a false negative). This is called Type II error.

We want to know whether a variation is better, worse or the same as the original.
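To make the two concerns concrete, here is a minimal sketch of how both error rates feed into the sample size when an original is compared with a variation on a yes/no outcome. It uses a common normal-approximation formula for comparing two proportions, which is an assumption of this example and not something stated above. The function name, the default of 80% power and the example rates are likewise hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p_original, p_variation, alpha=0.05, power=0.80):
    """Approximate observations needed in each group to compare two proportions."""
    if p_original == p_variation:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type II error rate
    variance_sum = (p_original * (1 - p_original)
                    + p_variation * (1 - p_variation))
    difference = p_variation - p_original
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / difference ** 2)


# Hypothetical rates: the original converts at 10%, the variation at 12%.
print(n_per_group(0.10, 0.12))
```

Tightening either error rate, or looking for a smaller difference, pushes the required sample size up.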

4. Factors for Right Sample Size

4.1 Analytical Plan

The research objectives and planned analytical approach should be the first factor to consider when deciding on sample size. For instance, there are statistical procedures (e.g. regression analysis) that require a certain number of observations per variable. Moreover, if subgroups in the sample are to be compared, each subgroup must be large enough for statistically significant differences between the groups to be detected.

4.2 Population Variability

This refers to the diversity of the target population. If the target population exhibits large variability in the behaviors and attitudes being researched, a larger sample is needed. If 20% or 80% of the population behaves in a certain way, this indicates less variability than if 50% did so. To be conservative, it is standard practice to use 50% (0.5) as the event probability in sample size calculations, since it represents the highest variability that can be expected in the population.
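The claim about 50% can be checked directly: for a yes/no outcome, the variance of a single response is p(1 - p). The short sketch below evaluates it at the three shares mentioned above; the function name is made up for this illustration.

```python
def proportion_variance(p):
    """Variance of a single yes/no response when a share p says yes."""
    return p * (1 - p)


for share in (0.2, 0.5, 0.8):
    print(f"p = {share:.1f}  variance = {proportion_variance(share):.2f}")
```

The variance is 0.16 at both 20% and 80% and peaks at 0.25 when p is 50%, which is why 0.5 gives the most conservative sample size.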

4.3 Level of Confidence

This is the level of risk we are willing to tolerate, usually expressed as a percentage (e.g. 95% confidence level). Although survey results are reported as point estimates (e.g. 75% of respondents like this product), we are working with a sample of the target population, so we can only be confident that the true value in that population falls within a particular range, called the confidence interval. The level of confidence indicates how often intervals built this way would contain the true value if the survey were repeated. How confident can you be? As confident as your tolerance for risk allows, knowing that for a fixed sample size a higher confidence level comes with a wider margin of error. The higher the level of confidence, the larger the sample size needed for a given margin of error.

4.4 Margin of Error

Also known as sampling error, the margin of error indicates the desired level of precision of the estimate. You have probably seen poll results quoted in the media, saying that the margin of error was plus or minus a particular percentage (e.g. +/-3%). This percentage defines the lower and upper bounds of the confidence interval likely to include the population parameter, and it is a measure of the estimate's reliability. The smaller the margin of error, the larger the sample size and the greater the precision of the estimate.

4.5 Cost

Sample cost is often one of the largest items in the budget for market research studies, especially if the target sample includes low-incidence segments or the response rate is low. Many times, we have to make a tradeoff between statistical accuracy and research cost.

4.6 Population Size

Most of the time, the size of the total target population is unknown, and it is assumed to be large (>100,000), but in studies where the sample is a large fraction of the population of interest, some adjustments may be needed.
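To see how the level of confidence and the margin of error interact, the sketch below tabulates sample sizes for a proportion in a large population. It uses the widely quoted approximation n = z² p (1 - p) / ME² with the conservative p = 0.5. The approximation, the function name and the chosen confidence levels and margins are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def survey_sample_size(confidence, margin, p=0.5):
    """Large-population approximation: n = z^2 * p * (1 - p) / margin^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


for confidence in (0.90, 0.95, 0.99):
    for margin in (0.05, 0.03):
        n = survey_sample_size(confidence, margin)
        print(f"{confidence:.0%} confidence, +/-{margin:.0%}: n = {n}")
```

Because the margin of error enters as a square, halving it roughly quadruples the required sample, which is usually where the tradeoff with cost becomes visible.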

5. Sample Size Calculation Check List

As a summary, to determine the sample size needed in a survey, we need to answer the following questions:

(a) What type of data analysis will be conducted? Will subgroups be compared?

(b) What is the probability of the event occurring? If no previous data exist, use 50% for a conservative sample size estimate.

(c) How much error is tolerable (margin of error)? How much precision do we need?

(d) How confident do we need to be that the true population value falls within the confidence interval?

(e) What is the research budget? Can we afford the desired sample?

(f) What is the population size? Large? Small/Finite? If unknown, assume it to be large (>100,000).

6. How to Choose Sample Size for a Simple Random Sample

To choose the right sample size for a simple random sample, you need to define the following inputs.

(a) Specify the desired margin of error ME. This is your measure of precision.

(b) Specify alpha (α).

(i) For a hypothesis test, alpha (α) is the significance level.

(ii) For an estimation problem, alpha (α) is equal to 1 - confidence level.

(c) Find the critical standard score.

(i) For an estimation problem or for a two-tailed hypothesis test, the critical standard score (z) is the value for which the cumulative probability is 1 - alpha/2.

(ii) For a one-tailed hypothesis test, the critical standard score (z) is the value for which the cumulative probability is 1 - alpha.

(d) Unless the population size is very large, you need to specify the size of the population (N).

(e) Given these inputs, the following formulas find the smallest sample size that provides the desired level of precision.

| Sample statistic | Population size | Sample size |
|---|---|---|
| Mean | Known | $n = \dfrac{z^2 \sigma^2 \, [N / (N - 1)]}{ME^2 + z^2 \sigma^2 / (N - 1)}$ |
| Mean | Unknown | $n = \dfrac{z^2 \sigma^2}{ME^2}$ |
| Proportion | Known | $n = \dfrac{z^2 p q + ME^2}{ME^2 + z^2 p q / N}$ |
| Proportion | Unknown | $n = \dfrac{z^2 p q + ME^2}{ME^2}$ |

This approach works when the sample size is relatively large (greater than or equal to 30). Use the first or third formulas when the population size is known. When the population size is large but unknown, use the second or fourth formulas.
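The steps and formulas above can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for a statistics package: it assumes the z, σ and ME terms in the formulas are squared, takes σ to be the population standard deviation and q = 1 - p, and rounds the result up to the next whole observation. The function and parameter names are made up for this illustration.

```python
import math
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Critical standard score: cumulative probability 1 - alpha/2, or 1 - alpha."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


def sample_size_mean(me, sigma, alpha, population=None):
    """Sample size for estimating a mean; pass population when N is known."""
    z2s2 = (critical_z(alpha) * sigma) ** 2
    if population is None:
        n = z2s2 / me ** 2
    else:
        n = (z2s2 * population / (population - 1)) / (me ** 2 + z2s2 / (population - 1))
    return math.ceil(n)


def sample_size_proportion(me, p, alpha, population=None):
    """Sample size for estimating a proportion; pass population when N is known."""
    z2pq = critical_z(alpha) ** 2 * p * (1 - p)
    if population is None:
        n = (z2pq + me ** 2) / me ** 2
    else:
        n = (z2pq + me ** 2) / (me ** 2 + z2pq / population)
    return math.ceil(n)


# Hypothetical inputs: +/-3 points at 95% confidence, p unknown so 0.5 is used.
print(sample_size_proportion(me=0.03, p=0.5, alpha=0.05))
print(sample_size_proportion(me=0.03, p=0.5, alpha=0.05, population=10_000))
# Hypothetical inputs: mean within +/-2 units when sigma is about 15.
print(sample_size_mean(me=2, sigma=15, alpha=0.05))
```

Supplying a known population size always gives a sample no larger than the large-population answer, which matches the adjustment described for cases where the sample is a large fraction of the population.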

For proportions, the sample size requirements vary, based on the value of the proportion. If you are unsure of the right value to use, set p equal to 0.5. This will produce a conservative sample size estimate; that is, the sample size will produce at least the precision called for and may produce better precision. A number of tools, such as Minitab, can also be used to calculate the sample size.

Conclusion

Sample-size planning is often important, and almost always difficult. It requires care in eliciting scientific objectives and in obtaining suitable quantitative information prior to the study. Successful resolution of the sample-size problem requires the close and honest collaboration of statisticians and subject-matter experts. Various types of changes to the study can be recommended if it turns out to be over or under-powered. Sample-size problems are context-dependent. Moreover, sample size is not always the main issue; it is only one aspect of the quality of a study design.

References

1. Odeh, R. E. and Fox, M. (1991), Sample Size Choice: Charts for Experiments with Linear Models, Marcel Dekker, New York, second edn.

2. Mace, A. E. (1964), Sample-size determination, Reinhold, New York.

About the Author

1. Lt Col Hardeep Sandhu is a serving army officer. He has a certificate in advanced computing and Six Sigma and is an Electrical Engineer, MBA in HR & Marketing and M Phil in Management.

2. The officer was third in merit at the national level for selection in the National Defence Academy, Khadakwasla and first in order of merit for commissioning into the Madras Sappers. He has been an Ambassador of the Country in a United Nations Mission in Africa. He has been commended for his role in saving over 100 people over a night in 1997 when the River Brahmaputra was in spate. In addition, he has been commended for his role in the Platinum Jubilee at Indian Military Academy, Dehradun in 2008 and for the Army and Republic Day Parade in 2009.

3. The officer is a qualified Commando. He has represented the Army in a Sailing Expedition and is a keen sailor, rider and golfer. He is an avid traveler and painter and is a gold medalist in hockey, football, volleyball and sailing.
