Single Factor Experiments - Unbalanced Data. A single-factor experiment is unbalanced when the number of observations differs from treatment to treatment.
Advantages of using a Balanced Design / Disadvantages of Unbalanced Design
The test statistic is relatively insensitive to small departures from the assumption of equal variances across treatments when the sample sizes are equal. This is not the case with unequal sample sizes.
The power of the test is maximized when the samples are of equal size.
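In an unbalanced design ANOVA, the sums of squares formulas are modified slightly: each treatment total is squared and divided by its own sample size ni, and the total number of observations N is the sum of the ni.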
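The modified sums of squares can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name, the list-of-lists input layout and the data are assumptions made for the example.

```python
def one_way_anova_unbalanced(groups):
    """One-way ANOVA sums of squares for treatments with unequal sample sizes.

    groups: one list of observations per treatment (hypothetical input layout).
    """
    n_i = [len(g) for g in groups]            # per-treatment sample sizes
    N = sum(n_i)                              # total number of observations
    a = len(groups)                           # number of treatments
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / N         # (grand total)^2 / N

    ss_total = sum(y ** 2 for g in groups for y in g) - correction
    # Each squared treatment total is divided by its own n_i.
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treat

    df_treat, df_error = a - 1, N - a
    f0 = (ss_treat / df_treat) / (ss_error / df_error)
    return {
        "SS_treat": ss_treat, "SS_error": ss_error, "SS_total": ss_total,
        "df_treat": df_treat, "df_error": df_error, "F0": f0,
    }


# Made-up data: three treatments with 3, 4 and 2 observations.
result = one_way_anova_unbalanced([[7, 9, 8], [12, 11, 13, 14], [6, 5]])
print(result)
```

With equal sample sizes the same function reduces to the usual balanced formulas, since every ni is then the common n.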
Factorial Design – Unbalanced Data
Reasons.
The experiment was designed as balanced, but unforeseen problems in running it resulted in the loss of some observations.
The experiment was intentionally designed as unbalanced.
Certain treatment combinations may be more expensive or more difficult to run, so fewer observations are taken in those cells.
Some treatment combinations may be of greater interest to the experimenter because they represent new or unexplored conditions, so more replicates are run in those cells.
Unbalanced Design Examples
Proportional Data. Here the numbers of observations in any two rows or any two columns are proportional, that is, nij = (ni.)(n.j)/n.. for every cell. In this case the standard ANOVA applies, with minor modifications to the sums of squares formulas.
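Whether a table of cell counts is proportional can be checked directly from the condition nij = (ni.)(n.j)/n.. for every cell. The helper below is a small sketch; its name and the example counts are hypothetical.

```python
def is_proportional(counts):
    """Return True if the cell counts n_ij satisfy n_ij * n.. == n_i. * n_.j.

    counts: rows of a two-factor layout, counts[i][j] = n_ij (hypothetical layout).
    Cross-multiplying keeps the comparison in exact integer arithmetic.
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n_total = sum(row_totals)
    return all(
        counts[i][j] * n_total == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )


# Made-up cell counts for a 2 x 3 layout.
print(is_proportional([[4, 4, 2], [2, 2, 1]]))  # second row is half of the first
print(is_proportional([[4, 4, 2], [2, 2, 2]]))  # last cell breaks the pattern
```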
Approximate Methods
When the unbalanced data are not far from balanced, approximate procedures can be used to convert the problem into a balanced one. Some of these approximations are described below.
Estimating the Missing Observations. If only a few cell counts nij differ, a reasonable procedure is to estimate the missing values. For a model with interaction, the estimate should minimize the Error Sum of Squares; this is achieved by the cell average. For example, if Cell (2,2) has only 3 observations (1 observation missing), the missing value is estimated by the average of those 3 observations.
Setting Data Aside. If, for example, Cell (2,2) has one data point more than the other cells, one observation from Cell (2,2) is set aside to obtain a balanced design.
Method of Unweighted Means. Introduced by Yates (1934), this method treats the cell averages as data and subjects them to a standard balanced-data analysis to obtain the sums of squares for rows, columns and interaction. It is an approximate procedure because these sums of squares are not distributed as chi-square random variables.
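A minimal sketch of the unweighted means computation for a two-factor layout follows. It assumes every cell has at least one observation and that there are more observations than cells. Because the analysis is run on cell averages, whose variance is sigma^2/nij, the within-cell error mean square is rescaled by the average of 1/nij before forming F ratios. The function name, input layout and data are hypothetical.

```python
def unweighted_means_anova(cells):
    """Approximate two-factor ANOVA computed on cell averages.

    cells[i][j]: list of observations in row i, column j (hypothetical layout).
    """
    a, b = len(cells), len(cells[0])
    means = [[sum(c) / len(c) for c in row] for row in cells]
    row_m = [sum(means[i]) / b for i in range(a)]
    col_m = [sum(means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row_m) / a

    # Balanced-data formulas with one "observation" (the average) per cell.
    ss_a = b * sum((m - grand) ** 2 for m in row_m)
    ss_b = a * sum((m - grand) ** 2 for m in col_m)
    ss_ab = sum((means[i][j] - row_m[i] - col_m[j] + grand) ** 2
                for i in range(a) for j in range(b))

    # Error mean square from the raw observations within each cell.
    ss_e = sum((y - means[i][j]) ** 2
               for i in range(a) for j in range(b) for y in cells[i][j])
    n_total = sum(len(c) for row in cells for c in row)
    ms_e = ss_e / (n_total - a * b)
    # Rescale to the variance of a cell average: MS_E * mean(1 / n_ij).
    ms_e_adj = ms_e * sum(1 / len(c) for row in cells for c in row) / (a * b)

    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_AB": ss_ab, "MS_E_adj": ms_e_adj,
        "F_A": (ss_a / (a - 1)) / ms_e_adj,
        "F_B": (ss_b / (b - 1)) / ms_e_adj,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / ms_e_adj,
    }


# Made-up 2 x 2 data with unequal cell counts.
print(unweighted_means_anova([[[1, 3, 2], [5, 7]], [[2, 4], [10, 12, 11, 9]]]))
```

When all cells have the same number of observations, the F ratios from this sketch coincide with those of the usual balanced analysis.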
Weighted Squares of Means Method. Also proposed by Yates (1934). In this method, the terms in the sums of squares are weighted in inverse proportion to their variances.
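The weighting can be illustrated for the main effects as follows. In this sketch each row (or column) value is the unweighted average of its cell means. Its variance is proportional to the sum of 1/nij over the cells it averages, so the weight is taken as the reciprocal of that quantity. The function name, layout and data are assumptions for illustration, and the interaction sum of squares is not covered.

```python
def weighted_squares_of_means(cells):
    """Main-effect sums of squares with weights inversely proportional to variance.

    cells[i][j]: list of observations in row i, column j; no empty cells
    (hypothetical layout).
    """
    a, b = len(cells), len(cells[0])
    n = [[len(c) for c in row] for row in cells]
    means = [[sum(c) / len(c) for c in row] for row in cells]

    def weighted_ss(values, weights):
        # Squared deviations from the weighted average, each times its weight.
        centre = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        return sum(w * (v - centre) ** 2 for w, v in zip(weights, values))

    # Row value: unweighted average of the b cell means in that row.
    row_m = [sum(means[i]) / b for i in range(a)]
    # Var(row_m[i]) = sigma^2 * sum_j(1 / n_ij) / b^2, so the weight is its inverse.
    row_w = [b ** 2 / sum(1 / n[i][j] for j in range(b)) for i in range(a)]

    col_m = [sum(means[i][j] for i in range(a)) / a for j in range(b)]
    col_w = [a ** 2 / sum(1 / n[i][j] for i in range(a)) for j in range(b)]

    return {"SS_A": weighted_ss(row_m, row_w), "SS_B": weighted_ss(col_m, col_w)}


# Made-up 2 x 2 data with unequal cell counts.
print(weighted_squares_of_means([[[1, 3, 2], [5, 7]], [[2, 4], [10, 12, 11, 9]]]))
```

With equal cell counts n, every row weight equals bn and every column weight equals an, so the result reduces to the balanced sums of squares.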
Exact Method
This is used when empty cells occur (nij = 0) or when the nij are very different. Here the sums of squares for testing the main effects and interactions are developed by representing the ANOVA model as a regression model.
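One common way to carry out the regression representation is sketched below with NumPy. The sketch makes several assumptions not stated above: factor levels are coded with sum-to-zero (effects) coding, every cell has at least one observation, and each sum of squares is taken as the increase in residual sum of squares when that effect's columns are dropped from the full model. Function names and data are hypothetical.

```python
import numpy as np


def effect_codes(levels, n_levels):
    """Sum-to-zero coding: n_levels - 1 columns, last level coded -1 throughout."""
    levels = np.asarray(levels)
    codes = np.zeros((len(levels), n_levels - 1))
    for k in range(n_levels - 1):
        codes[levels == k, k] = 1.0
    codes[levels == n_levels - 1, :] = -1.0
    return codes


def residual_ss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def regression_anova(a_idx, b_idx, y, a, b):
    """Sums of squares for A, B and AB by comparing full and reduced models.

    a_idx, b_idx: 0-based level of each factor for every observation.
    """
    y = np.asarray(y, dtype=float)
    ones = np.ones((len(y), 1))
    A = effect_codes(a_idx, a)
    B = effect_codes(b_idx, b)
    # Interaction columns: products of every A column with every B column.
    AB = np.hstack([A[:, [i]] * B for i in range(a - 1)])

    sse_full = residual_ss(np.hstack([ones, A, B, AB]), y)
    return {
        "SS_A": residual_ss(np.hstack([ones, B, AB]), y) - sse_full,
        "SS_B": residual_ss(np.hstack([ones, A, AB]), y) - sse_full,
        "SS_AB": residual_ss(np.hstack([ones, A, B]), y) - sse_full,
        "SS_E": sse_full,
    }


# Made-up unbalanced 2 x 2 data (cell counts 3, 2, 2, 4).
a_idx = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
b_idx = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
y = [1, 3, 2, 5, 7, 2, 4, 10, 12, 11, 9]
print(regression_anova(a_idx, b_idx, y, a=2, b=2))
```

For balanced data the coded columns are mutually orthogonal, so these model comparisons reproduce the usual balanced sums of squares. Layouts with empty cells need additional care that this sketch does not attempt.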
References
Design and Analysis of Experiments by Douglas C. Montgomery, International Students Edition, Eighth Edition