1. Introduction
1.1 General
Sample-size determination is often an important step in planning a statistical study, and it is usually a difficult one. Among the main hurdles, one must obtain an estimate of one or more error variances and specify an effect size of importance. There is a temptation to take shortcuts. This paper offers some suggestions for successful and meaningful sample-size determination. Ever-changing markets are a difficult environment for testing: you need to control as many sources of variation as you can and then gather enough data for test results to rise above the noise. Sample-size calculations are the tool that helps you decide how much is “enough.” A big part of planning to succeed is figuring out how many observations you will need to meet the objectives of your project. Taking observations costs time and money, so we want to collect just the right amount to make inferences about our outcomes of interest.
1.2 Importance of Right Sample Size
Statistical studies (surveys, experiments, observational studies, etc.) are always better when they are carefully planned. Good planning has many aspects. The problem should be carefully defined and operationalised. Experimental or observational units must be selected from the appropriate population. The sample must be of adequate size relative to the goals of the study. It must be “big enough” that an effect large enough to be of scientific significance will also be statistically significant. It is just as important, however, that the sample not be “too big,” so that an effect of little scientific importance is nevertheless statistically detectable. Sample size matters for economic reasons: an undersized sample wastes resources because it cannot produce useful results, while an oversized one uses more resources than necessary. In experiments involving human or animal subjects, sample size is also a pivotal ethical issue. Sample-size calculation is a more complex topic than can be covered in depth here, but there are several key items you should start thinking about before you consult a statistician or another researcher familiar with sample-size calculations.
1.3 Previous Literature
For such an important issue, there is a surprisingly small amount of published literature. Important general references include Mace (1964), Kraemer and Thiemann (1987), Cohen (1988), Desu and Raghavarao (1990), Lipsey (1990), Shuster (1990), and Odeh and Fox (1991). There are numerous articles, especially in biostatistics journals, concerning sample-size determination for specific tests. Also of interest are studies of the extent to which sample size is adequate or inadequate in published studies; see Freiman et al. (1986) and Thornley and Adams (1998). There is a growing amount of software for sample-size determination, including nQuery Advisor (Elashoff, 2000), PASS (Hintze, 2000), UnifyPow (O’Brien, 1998), and Power and Precision (Borenstein et al., 1997). Web resources include a comprehensive list of power-analysis software (Thomas, 1998) and online calculators such as Lenth (2000). Wheeler (1974) provides some useful approximations for use in linear models; Castelloe (2000) gives an up-to-date overview of computational methods.
2. Sample Size
2.1 There are several approaches to sample-size determination. For example, one can specify the desired width of a confidence interval and determine the sample size that achieves that goal; or a Bayesian approach can be used, in which we optimize some utility function, perhaps one that involves both precision of estimation and cost. One of the most popular approaches to sample-size determination involves studying the power of a test of hypothesis. It is the approach emphasized here, although much of the discussion is applicable in other contexts. The power approach involves these elements:
(a) Specify a hypothesis test on a parameter θ (along with the underlying probability model for the data).
(b) Specify the significance level α of the test.
(c) Specify an effect size θ̃, a value of θ that reflects an alternative of scientific interest.
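(d) Obtain historical values or estimates of other parameters needed to compute the power function of the test.
(e) Specify a target value for the power of the test at the effect size in (c). The sample size is then the smallest n for which the power reaches this target.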
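As a rough illustration of the power approach, the sketch below assumes a two-sided test comparing two group means with a common, known standard deviation σ, and uses the normal approximation n = 2((z_{1−α/2} + z_{1−β})σ/δ)² per group, where δ is the effect size of interest and 1 − β is the target power. The function name `n_per_group` and the default power of 0.80 are illustrative choices, not taken from this paper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    delta: smallest difference in means of scientific interest (the effect size).
    sigma: common standard deviation, e.g. a historical estimate.
    alpha: significance level of the test.
    power: target probability of detecting a true difference of size delta.
    """
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up so the approximate power is not below the target


# Half-standard-deviation difference, alpha = 0.05, 80% power
print(n_per_group(delta=0.5, sigma=1.0))
```

For a difference of half a standard deviation at α = 0.05 and 80% power this gives 63 per group. Calculations based on the t distribution give a slightly larger number, so for small samples the normal approximation is best read as a lower bound.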
2.2 Determining the sample size is one of the early steps in planning a survey. Unfortunately, there is no magic formula that gives the perfect sample size, because several factors must be weighed together.
3. What Drives Our Needed Sample Size?
There are a few concerns that drive the sample size required for a meaningful test:
(a) We want to be reasonably sure that we don’t get a false positive, that is, detect a difference when there is no real one. Statisticians call this a Type I error; its probability is the significance level α.
(b) We want to be reasonably sure that we don’t miss a real effect (a false negative). This is called a Type II error; its probability is denoted β, and 1 − β is the power of the test.
Ultimately, we want to know whether a variation performs better than, worse than, or the same as the original.
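The two error types can be made concrete with the power function. The sketch below assumes a two-sided, two-sample z-test with known σ and n observations per group (the helper `power_two_sample` is hypothetical). It returns the probability of rejecting the null hypothesis when the true difference is δ; that probability is 1 − β, and with δ = 0 it equals the Type I error rate α.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n: int, delta: float, sigma: float, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided two-sample z-test with n per group."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = delta / (sigma * sqrt(2 / n))         # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_alpha) + std_normal.cdf(-shift - z_alpha)


# Type II error shrinks as the sample grows, for a fixed alpha and effect size
for n in (20, 63, 100):
    p = power_two_sample(n, delta=0.5, sigma=1.0)
    print(f"n = {n}: power = {p:.3f}, beta = {1 - p:.3f}")
```

Increasing n raises the power and shrinks β, while α stays fixed by the choice of critical value; this is why the sample size, not the significance level, is the lever for controlling false negatives.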
4. Factors for Right Sample Size
4.1 Analytical Plan
The research objectives and planned analytical approach should be the first factor to consider when deciding on sample size. For instance, some statistical procedures (e.g., regression analysis) require a certain number of observations per variable. Moreover, if comparisons between subgroups in the sample are planned, the sample size should be increased so that statistically significant differences between the groups can be detected.
4.2 Population Variability
This refers to the diversity of the target population. If the target population exhibits large variability in the behaviors and attitudes being researched, a large sample is needed. For a yes/no outcome, a population in which 20% or 80% behave in a certain way has less variability than one in which 50% do. To be conservative, it is standard practice to use 50% (0.5) as the event probability in sample-size calculations, since it represents the highest variability that can be expected in the population.
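For a yes/no outcome with event probability p, the variance of a single observation is p(1 − p), and the sample size needed for a given precision is proportional to it. A short sketch (the function name is illustrative) confirms that this variance peaks at p = 0.5, which is why 50% is the conservative choice.

```python
def bernoulli_variance(p: float) -> float:
    """Variance p(1 - p) of a single yes/no observation with event probability p."""
    return p * (1 - p)


# Variance at the event probabilities mentioned in the text
for p in (0.2, 0.5, 0.8):
    print(f"p = {p:.1f}: p(1 - p) = {bernoulli_variance(p):.2f}")

# Search a grid of probabilities for the one with the largest variance
grid = [i / 100 for i in range(101)]
worst_case = max(grid, key=bernoulli_variance)
print(f"variance is largest at p = {worst_case}")
```

At p = 0.2 or 0.8 the variance is 0.16 versus 0.25 at p = 0.5, so the required sample is about 64% as large as under the conservative assumption.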
4.3 Level of Confidence
The confidence level, usually expressed as a percentage (e.g., 95%), reflects how much risk we are willing to tolerate: a 95% level accepts a 5% risk that the interval misses the true value. Although survey results are reported as point estimates (e.g., 75% of respondents like this product), we are working with a sample of the target population, so we can only be confident that the true population value falls within a particular range, called the confidence interval. The confidence level is the probability that an interval constructed this way contains the true value. How confident can you be? As confident as your tolerance for risk requires, knowing that for a fixed sample size a higher confidence level produces a wider interval, that is, a larger margin of error. The higher the confidence level, the larger the sample size needed for a given margin of error.
4.4 Margin of Error
The margin of error, also known as the sampling error, indicates the desired precision of the estimate. You have probably seen poll results quoted in the media with a margin of error of plus or minus some percentage (e.g., ±3%). This percentage defines the lower and upper bounds of the confidence interval likely to include the true population value, and it is a measure of the estimate's precision. The smaller the margin of error, the greater the precision and the larger the sample size required.
4.5 Cost
The cost of the sample is often one of the largest items in the budget for market research studies, especially if the target sample includes low-incidence segments or the response rate is low. Many times we have to make a trade-off between statistical accuracy and research cost.
4.6 Population Size
Most of the time, the size of the total target population is unknown, and it is assumed to be large (>100,000). In studies where the sample is a large fraction of the population of interest, however, some adjustments (such as the finite population correction) may be needed.
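The level of confidence, margin of error and population size combine in the usual normal-approximation formula for estimating a proportion under simple random sampling: n0 = z²p(1 − p)/E², where z is the two-sided critical value for the chosen confidence level and E is the margin of error, optionally followed by the finite population correction n = n0/(1 + (n0 − 1)/N). The sketch below is one way to code it; the function and parameter names are illustrative, and the formula assumes the normal approximation is adequate.

```python
from math import ceil
from statistics import NormalDist
from typing import Optional


def sample_size_proportion(margin: float, confidence: float = 0.95, p: float = 0.5,
                           population: Optional[int] = None) -> int:
    """Sample size to estimate a proportion within +/- margin at the given confidence.

    margin: desired margin of error as a fraction (0.03 means +/-3 percentage points).
    confidence: confidence level, e.g. 0.95.
    p: anticipated event probability; 0.5 is the conservative default.
    population: finite population size N, or None if the population is large/unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin ** 2              # large-population sample size
    if population is not None:
        # Finite population correction: matters when n0 is a sizeable fraction of N
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)  # always round up to reach the requested precision


# Effect of the confidence level at a +/-5 point margin of error
for level in (0.90, 0.95, 0.99):
    print(level, sample_size_proportion(0.05, confidence=level))

# Tighter margin of error, and a small known population (N = 2,000)
print(sample_size_proportion(0.03))
print(sample_size_proportion(0.05, population=2000))
```

With p = 0.5 and a ±5-point margin, this gives 271, 385 and 664 respondents at 90%, 95% and 99% confidence. Tightening the margin to ±3 points at 95% raises the figure to 1068, while applying the correction for a population of N = 2,000 lowers the 95%, ±5-point figure to 323.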
5. Sample Size Calculation Checklist
As a summary, to determine the sample size needed in a survey, we need to answer the following questions:
(a) What type of data analysis will be conducted? Will subgroups be compared?
(b) What is the probability of the event occurring? If no previous data exist, use 50% for a conservative sample-size estimate.
(c) How much error is tolerable (the margin of error, i.e., the half-width of the confidence interval)? How much precision do we need?
(d) How confident do we need to be that the true population value falls within the confidence interval?
(e) What is the research budget? Can we afford the desired sample?
(f) What is the population size? Large? Small/Finite? If unknown, assume it to be large ( >100,000).
6. How to Choose Sample Size for a Simple Random Sample
To choose the right sample size for a simple random sample, you need to define the following inputs:
(a) Specify the desired margin of error ME. This is your measure of precision.
(b) Specify the significance level α (the confidence level is 1 − α).