Two statistical approaches are commonly followed in statistical testing: the frequentist approach, which draws its conclusions from the data observed in the experiment, and the Bayesian approach, which combines the observed data with prior information.
The frequentist approach is also described as experimental or inductive, as it relies on observations, while the Bayesian approach is described as theoretical or deductive, as it makes it possible to combine the information provided by the data with a priori knowledge from previous studies or expert opinions.
Let us take a very simple example to understand both the concepts:-
Let us toss a coin 10 times. For a fair coin, the probability of getting a head or a tail on any single toss is 0.5. Now let's say we get heads on 7 out of 10 tosses: under the frequentist approach, the estimated probability of getting heads is the observed frequency, 7/10, i.e. 0.7.
Now let's say we have prior information, from previous experiments or expert experience, that heads will come up 6 out of 10 times, which gives a prior probability of 0.6. Under the Bayesian approach, we compare the outcome of the experiment with this prior probability and combine the two into an updated estimate.
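The coin example can be sketched in a few lines of Python. This is only an illustration: the text gives a prior probability of 0.6 but not how strongly it is held, so the Beta(6, 4) prior below (a prior mean of 0.6 carrying the weight of 10 imaginary tosses) is an assumption made for the sketch.

```python
# Coin example: 7 heads observed in 10 tosses, prior belief of 0.6.
heads, tosses = 7, 10

# Frequentist point estimate: the observed relative frequency.
freq_estimate = heads / tosses

# ASSUMPTION: the prior is Beta(6, 4), i.e. mean 0.6 with the weight of
# 10 imaginary tosses. The text only fixes the mean, not the weight.
prior_alpha, prior_beta = 6.0, 4.0

# Conjugate update: add the heads to alpha and the tails to beta.
post_alpha = prior_alpha + heads
post_beta = prior_beta + (tosses - heads)
posterior_mean = post_alpha / (post_alpha + post_beta)

print(f"frequentist estimate: {freq_estimate:.2f}")  # 0.70
print(f"posterior: Beta({post_alpha:.0f}, {post_beta:.0f}), "
      f"mean {posterior_mean:.2f}")  # Beta(13, 7), mean 0.65
```

With this prior the Bayesian estimate lands between the prior (0.6) and the observed frequency (0.7). A weaker prior would pull it closer to 0.7.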
Thus we can say that the objective of the frequentist approach is to explore the data collected in order to identify a significant effect that can only be explained by the hypothesis of the experiment. For the Bayesian approach, the focus is on comparing two hypotheses by combining the data collected during the experiment with the prior information available, and thereby assessing the chances that one is true compared to the other.
For an organization that performs experiments and relies on statistical analysis to interpret their results, it is therefore important to understand how the two approaches differ on the following parameters:-
In terms of analyzing the test data :-
The frequentist approach requires the experiment to be completed first, with sufficient samples collected, before the data is analyzed. This limits the test to being an offline experiment.
With the Bayesian approach, the analysis can be performed during the experiment, while the data is still being collected. It is an online experiment, as the results are updated whenever a new batch of data is ingested.
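The online updating described above can be sketched as follows, assuming a Beta-Binomial model for a click-through rate. The helper `update_beta`, the uniform Beta(1, 1) prior and the daily batch counts are all hypothetical and chosen only for illustration.

```python
def update_beta(alpha, beta, successes, trials):
    """Return the Beta posterior parameters after observing one batch."""
    return alpha + successes, beta + (trials - successes)

# Hypothetical daily batches of (clicks, impressions).
batches = [(12, 100), (9, 110), (15, 95)]

# ASSUMPTION: a uniform Beta(1, 1) prior on the click-through rate.
alpha, beta = 1.0, 1.0
for day, (clicks, impressions) in enumerate(batches, start=1):
    # Yesterday's posterior becomes today's prior.
    alpha, beta = update_beta(alpha, beta, clicks, impressions)
    print(f"day {day}: posterior mean CTR = {alpha / (alpha + beta):.4f}")
```

Updating batch by batch gives the same posterior as updating once with all the data, which is why the analysis can be refreshed at any point during the experiment.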
Sample Size :-
The frequentist approach requires the sample size to be calculated before the test is conducted, and the number of samples is generally kept balanced across the test groups.
The Bayesian approach does not require a pre-defined sample size, and the test groups do not need to have the same number of samples, so an imbalanced sample size is allowed.
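As an illustration of the up-front sample size calculation in the frequentist case, here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The function name, the baseline and target rates, the significance level and the power are assumptions made for the example, not values from this article.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Hypothetical scenario: detect a lift from a 10% to a 12% conversion rate.
n = sample_size_per_group(0.10, 0.12)
print(f"samples needed per group: {n}")
```

The smaller the effect to be detected, the larger the required sample, which is why the expected effect has to be fixed before the test starts.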
Test results explanation :-
For the frequentist approach, conclusions take the form “We reject / fail to reject the null hypothesis that group A is no better than group B”. This conclusion is based on the data collected during the test. The approach uses the p-value to quantify the confidence in the business conclusions.
For the Bayesian approach, the interpretation of the results is itself stated as a probability, such as “There is a 98% probability that group A is better than group B”. This probabilistic result directly quantifies the confidence in the business conclusions.
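To make the two kinds of statement concrete, the sketch below computes both on the same data: a two-sided p-value from a pooled two-proportion z-test, and a Monte Carlo estimate of the probability that A is better than B. The conversion counts, the Beta(1, 1) priors and both function names are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def prob_a_beats_b(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B).

    ASSUMPTION: independent Beta(1, 1) priors on both conversion rates.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_a > rate_b
    return wins / draws

# Hypothetical results: 130/1000 conversions for A, 100/1000 for B.
p_value = two_proportion_p_value(130, 1000, 100, 1000)
prob = prob_a_beats_b(130, 1000, 100, 1000)
print(f"p-value: {p_value:.4f}")
print(f"P(A > B): {prob:.3f}")
```

The p-value is compared against a threshold to reach a reject / fail to reject decision, whereas the second number can be read directly as the probability that A is better than B under the assumed priors.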
Leveraging Test Results :-
The frequentist approach gives summary statistics of the samples collected during the experiment period, and these do not directly describe future, unseen data.
The Bayesian approach estimates the parameters of the distribution from the data and gives the posterior predictive distribution for unobserved, future values, conditional on the observed data.
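For a Beta posterior on a success probability, the posterior predictive distribution for the number of successes in future trials is the Beta-Binomial distribution. The sketch below assumes a Beta(13, 7) posterior, which is what a hypothetical Beta(6, 4) prior would give after 7 heads in 10 tosses. The function names are illustrative.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """P(k successes in n future trials) under a Beta(alpha, beta) posterior."""
    log_pmf = (math.log(math.comb(n, k))
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))
    return math.exp(log_pmf)

# ASSUMPTION: Beta(13, 7) posterior (hypothetical Beta(6, 4) prior, 7 heads in 10).
alpha, beta = 13.0, 7.0
future_tosses = 5
predictive = [beta_binomial_pmf(k, future_tosses, alpha, beta)
              for k in range(future_tosses + 1)]

for k, p in enumerate(predictive):
    print(f"P({k} heads in the next {future_tosses} tosses) = {p:.3f}")
```

For a single future toss this reduces to the posterior mean, 13/20 = 0.65.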
Duration of the Test :-
In the frequentist approach, the duration of the experiment can be estimated from the designed sample size, since it is easy to work out how long it will take to collect that many samples.
In the Bayesian approach, the duration of the experiment cannot be estimated in advance: the samples coming in every day make the conclusions more confident, but there is no way to tell beforehand how long a specific experiment will take.
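The frequentist duration estimate is simple arithmetic once the sample size is fixed. A minimal sketch, assuming steady daily traffic split evenly across the groups (the function name and all the numbers are hypothetical):

```python
import math

def estimated_duration_days(sample_size_per_group, n_groups, daily_visitors):
    """Days needed to reach the planned sample size, assuming steady traffic."""
    total_needed = sample_size_per_group * n_groups
    return math.ceil(total_needed / daily_visitors)

# Hypothetical plan: 4000 samples per group, 2 groups, 1500 visitors a day.
days = estimated_duration_days(sample_size_per_group=4000,
                               n_groups=2,
                               daily_visitors=1500)
print(days)  # 6
```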
Granularity of input data :-
In the frequentist approach, the input data is at the most granular level, e.g. data collected per user / ID, and it also depends on the duration for which the test is conducted.
In the Bayesian approach, the granularity of the data depends on how frequently the test results are updated. For example, if you are testing the click-through rate and the results are updated every 24 hours, you need to calculate the total number of seen events and the number of click events every day in order to arrive at the daily click-through rate.
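The daily aggregation for the click-through rate can be sketched as below. The event log format, a list of `(date, event_type)` pairs with `"view"` and `"click"` types, is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical event log: one (date, event_type) pair per event.
events = [
    ("2024-01-01", "view"), ("2024-01-01", "click"), ("2024-01-01", "view"),
    ("2024-01-02", "view"), ("2024-01-02", "view"), ("2024-01-02", "view"),
    ("2024-01-02", "click"), ("2024-01-02", "view"),
]

# Count seen events and click events per day.
daily = defaultdict(lambda: {"views": 0, "clicks": 0})
for date, event_type in events:
    if event_type == "view":
        daily[date]["views"] += 1
    elif event_type == "click":
        daily[date]["clicks"] += 1

# Daily click-through rate = clicks / views, skipping days without views.
daily_ctr = {date: counts["clicks"] / counts["views"]
             for date, counts in sorted(daily.items())
             if counts["views"]}

print(daily_ctr)  # {'2024-01-01': 0.5, '2024-01-02': 0.25}
```

Only the two daily counts are needed for the update, so the per-user detail does not have to be kept for this analysis.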
Performing Multiple Comparisons :-
The frequentist approach uses the Bonferroni adjustment when multiple variants need to be tested at the same time.
The Bayesian approach uses hierarchical Bayesian methods for cases involving multiple variants.
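The Bonferroni adjustment mentioned above divides the significance level by the number of comparisons. A minimal sketch, with hypothetical p-values and an illustrative function name:

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold and a reject decision per comparison."""
    m = len(p_values)
    threshold = alpha / m  # each test is held to a stricter level
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values for three variants, each compared against control.
threshold, rejected = bonferroni([0.004, 0.030, 0.200])
print(f"adjusted threshold: {threshold:.4f}")
print(rejected)  # [True, False, False]
```

Note that the second variant would pass at the unadjusted 0.05 level but not after the adjustment.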
Testing Approach :-
The frequentist approach recommends different tests depending on the distribution(s) that the variable(s) follow.
The Bayesian approach uses conjugate families for variables following different distributions. For example, the click-through rate uses the beta distribution as its conjugate prior: the prior parameters of the beta distribution are set, the collected data is used to update them via Bayes' rule to obtain the posterior parameters, samples are then drawn from the posterior distribution, and inferences on the test results are made accordingly.
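The steps above (set the prior, update with the data, sample from the posterior, make inferences) can be sketched for a click-through rate as follows. The Beta(1, 1) prior, the click and view counts and the function name are assumptions made for the illustration, and the inference shown is a simple 95% credible interval.

```python
import random

def posterior_ctr_samples(clicks, views, prior_alpha=1.0, prior_beta=1.0,
                          draws=10_000, seed=0):
    """Draw sorted samples from the Beta posterior of a click-through rate."""
    rng = random.Random(seed)
    # Conjugate update: clicks go to alpha, non-clicks go to beta.
    alpha = prior_alpha + clicks
    beta = prior_beta + (views - clicks)
    return sorted(rng.betavariate(alpha, beta) for _ in range(draws))

# Hypothetical data: 45 clicks out of 1000 views.
samples = posterior_ctr_samples(clicks=45, views=1000)

# Inference: central 95% credible interval from the sample quantiles.
lower = samples[int(0.025 * len(samples))]
upper = samples[int(0.975 * len(samples))]
print(f"95% credible interval for CTR: ({lower:.4f}, {upper:.4f})")
```

Because the beta prior is conjugate to the binomial likelihood, the update is just two additions and no numerical integration is needed.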