Why Use Data Sampling?
Sometimes gathering information on a complete population is simply cost prohibitive. Think about CNN’s TV coverage of an election cycle in the USA. It is not feasible to ask every voter how they voted or whom they are going to vote for. Even if it were, not all would answer. Instead, pollsters use exit polls to derive statistical conclusions about the population as a whole.
Concerns About Data Sampling
When you take a sample from a larger population, you must make sure that the sample is of an appropriate size and is drawn without bias. You should address these concerns while collecting the data.
For example, it is very helpful if the sample size is large enough for sample statistics such as the mean to approximately follow a normal distribution, as this opens the door to a wide array of statistical tools.
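A small simulation can illustrate why sample size matters here. The sketch below is only an illustration: the skewed (exponential) population, the function names, and the sample sizes are assumptions, not part of the original text. It draws many samples, computes each sample's mean, and measures how symmetric the resulting distribution of means is.

```python
import random
import statistics

def sample_means(sample_size, n_samples=2000, seed=0):
    """Draw repeated samples from a skewed (exponential) population
    and return the mean of each sample. The population is an assumption."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

def skewness(values):
    """Skewness: close to 0 for a symmetric, normal-like distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

small = skewness(sample_means(2))    # means of very small samples
large = skewness(sample_means(50))   # means of larger samples

print(f"skewness of sample means, n=2:  {small:.2f}")
print(f"skewness of sample means, n=50: {large:.2f}")
```

With the larger sample size the skewness of the sample means should be noticeably closer to zero, which is the behavior that makes normal-based tools applicable.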
How Large Should Your Data Sample Be?
The calculation for how large a sample data set should be depends on:
The type of data (continuous or discrete) being measured
How precise you want your statistical inferences to be
An estimate of the standard deviation, or the historical standard deviation, for the entire population
The confidence level desired
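The factors listed above can be combined into the standard textbook formulas for estimation. The sketch below is a minimal illustration, assuming a normal approximation: n = (z * sigma / E)^2 for the mean of continuous data and n = z^2 * p * (1 - p) / E^2 for a proportion of discrete data. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_mean(sigma, margin, confidence=0.95):
    """Continuous data: sample size needed to estimate a mean to within
    +/- margin, given an estimated population standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil((z * sigma / margin) ** 2)

def sample_size_proportion(margin, confidence=0.95, p=0.5):
    """Discrete data: sample size needed to estimate a proportion to within
    +/- margin. p=0.5 is the most conservative guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical inputs: sigma = 10 units, desired precision of +/- 2 units
print(sample_size_mean(sigma=10, margin=2))      # 97
# Hypothetical poll: +/- 5 percentage points at 95% confidence
print(sample_size_proportion(margin=0.05))       # 385
```

Tightening the precision or raising the confidence level increases the required sample size, and so does a larger standard deviation.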
The points below apply to hypothesis tests.
The sample size needed for a hypothesis test depends on:
The desired risk (both alpha and beta)
The minimum difference to be detected between the population means
The variation in the characteristic being measured (S or sigma), that is, the population standard deviation
The sensitivity to a shift in the parameter being tested
(Population size does NOT come into the determination of how big the sample should be.)
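As a rough illustration of how these inputs interact, the sketch below uses the common normal-approximation formula for a two-sided, two-sample comparison of means: n per group = 2 * ((z_alpha/2 + z_beta) * sigma / delta)^2. The choice of this particular test, the function name, and the example numbers are assumptions made for illustration. Note that population size is not one of the inputs.

```python
import math
from statistics import NormalDist

def sample_size_two_means(alpha, beta, sigma, delta):
    """Approximate sample size per group for a two-sided, two-sample test
    of means (normal approximation, equal group sizes assumed).

    alpha: risk of a false alarm (Type I error)
    beta:  risk of missing a real difference (Type II error)
    sigma: estimated population standard deviation
    delta: minimum difference between the means worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical inputs: alpha = 5%, beta = 10%, sigma = 10, detect a shift of 5
print(sample_size_two_means(alpha=0.05, beta=0.10, sigma=10, delta=5))  # 85
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the minimum detectable difference is usually the most expensive input to tighten.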