Total variation in experimental data is partitioned into components assignable to specific sources by the analysis of variance. This statistical technique is applicable to data for which (1) effects of sources are additive, (2) uncontrolled or unexplained experimental variations (which are grouped as experimental errors) are independent of other sources of variation, (3) variance of experimental errors is homogeneous, and (4) experimental errors follow a normal distribution. When data depart from these assumptions, one must exercise extreme care in interpreting the results of an analysis of variance. Statistical tests indicate the contribution of the components to the observed variation.
In an illustrative experiment, t methods of treatment are under study, and n samples are measured for each treatment, for a total of nt samples. The measurement $x_{ij}$ of the ith sample that received the jth treatment records an overall effect $\mu$, an effect $\tau_j$ produced by the jth treatment, and an effect $e_{ij}$ produced by experimental error. The three effects are additive, so that Eq. (1)

$$x_{ij} = \mu + \tau_j + e_{ij} \qquad (1)$$

holds, where $i = 1, 2, \ldots, n$; and $j = 1, 2, \ldots, t$. The statistical problem is to test for the existence of the treatment effects.
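The analysis of variance in this example is presented in the table. Entries in the sum of squares column represent the part of the total variation that is attributable to each source. The total sum of squares Q is the sum of the squared deviations of all observations $x_{ij}$ (the ith sample under the jth treatment) from the grand mean $\bar{x}$, Eq. (2).

$$Q = \sum_{j=1}^{t} \sum_{i=1}^{n} (x_{ij} - \bar{x})^2 \qquad (2)$$

Similarly, the within-treatments sum of squares E is the sum of the squared deviations of the observations within each treatment from the mean $\bar{x}_j$ of that treatment, Eq. (3).

$$E = \sum_{j=1}^{t} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \qquad (3)$$

The between-treatments sum of squares T is n times the sum over all treatments of the squared deviations of the treatment means from the grand mean, where the means are those used in Eqs. (2) and (3):

$$T = n \sum_{j=1}^{t} (\bar{x}_j - \bar{x})^2$$

These quantities satisfy $Q = T + E$. The sums of squares are generally computed more easily from the equivalent formulas (4)–(6).

$$Q = \sum_{j=1}^{t} \sum_{i=1}^{n} x_{ij}^2 - \frac{1}{nt} \Bigl( \sum_{j=1}^{t} \sum_{i=1}^{n} x_{ij} \Bigr)^2 \qquad (4)$$

$$T = \frac{1}{n} \sum_{j=1}^{t} \Bigl( \sum_{i=1}^{n} x_{ij} \Bigr)^2 - \frac{1}{nt} \Bigl( \sum_{j=1}^{t} \sum_{i=1}^{n} x_{ij} \Bigr)^2 \qquad (5)$$

$$E = Q - T \qquad (6)$$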
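The agreement between the definitional sums of squares and the easier computational forms can be checked numerically. The following is a minimal sketch in Python, assuming equal numbers of samples per treatment and data stored as one list per treatment. The function names and the data values are hypothetical, chosen only for illustration, and the shortcut expressions are the standard computational forms, assumed here to correspond to the equivalent formulas mentioned above.

```python
def sums_of_squares(data):
    """Definitional forms. data[j][i] is the ith sample under the jth treatment."""
    t = len(data)          # number of treatments
    n = len(data[0])       # samples per treatment (assumed equal for all treatments)
    grand_mean = sum(sum(row) for row in data) / (n * t)
    treatment_means = [sum(row) / n for row in data]

    # Total: squared deviations of every observation from the grand mean.
    Q = sum((x - grand_mean) ** 2 for row in data for x in row)
    # Within treatments: squared deviations from each treatment's own mean.
    E = sum((x - m) ** 2 for row, m in zip(data, treatment_means) for x in row)
    # Between treatments: n times squared deviations of treatment means from the grand mean.
    T = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    return Q, E, T


def sums_of_squares_shortcut(data):
    """Computational forms based on raw totals; E is obtained by subtraction."""
    t = len(data)
    n = len(data[0])
    correction = sum(sum(row) for row in data) ** 2 / (n * t)
    Q = sum(x * x for row in data for x in row) - correction
    T = sum(sum(row) ** 2 for row in data) / n - correction
    E = Q - T
    return Q, E, T


# Hypothetical measurements: t = 3 treatments, n = 4 samples each.
example_data = [[3, 4, 5, 4], [5, 6, 7, 6], [7, 8, 9, 8]]
print(sums_of_squares(example_data))           # Q = 38, E = 6, T = 32
print(sums_of_squares_shortcut(example_data))  # same values from the raw totals
```

Both routes give the same three values, and in each case the total sum of squares equals the sum of the between-treatments and within-treatments parts.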
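The entries under degrees of freedom represent the number of independent comparisons upon which the sum of squares for the source of variation is based. Each mean about which deviations are taken imposes a linear restriction on those deviations and results in the loss of one degree of freedom. The total sum of squares, based on nt deviations from the grand mean, therefore has nt − 1 degrees of freedom, and the between-treatments sum of squares, based on t deviations of treatment means from the grand mean, has t − 1. The within-treatments sum of squares loses one degree of freedom for each of the t treatment means and so has t(n − 1) degrees of freedom.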
The mean squares in the analysis of variance are obtained by dividing each sum of squares by the corresponding degrees of freedom. The within-treatments mean square is an estimate of $\sigma^2$, the variance of the error term in the additive model. It represents the random or unexplained variation in the data. The between-treatments mean square is an estimate of $\sigma^2 + n\sigma_\tau^2$, where $\sigma_\tau^2$ is the variance of the treatment effects $\tau_j$.
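The division of sums of squares by degrees of freedom can be laid out as a small analysis-of-variance table. The sketch below is an illustration only, assuming equal sample sizes per treatment; the function name `anova_table`, the dictionary layout, and the data values are hypothetical and not taken from the source.

```python
def anova_table(data):
    """Return sums of squares, degrees of freedom, and mean squares.

    data[j][i] is the ith sample under the jth treatment; every treatment
    is assumed to have the same number of samples.
    """
    t = len(data)
    n = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * t)
    means = [sum(row) / n for row in data]

    between_ss = n * sum((m - grand_mean) ** 2 for m in means)
    within_ss = sum((x - m) ** 2 for row, m in zip(data, means) for x in row)

    table = {
        "between treatments": {"ss": between_ss, "df": t - 1},
        "within treatments": {"ss": within_ss, "df": t * (n - 1)},
    }
    # A mean square is a sum of squares divided by its degrees of freedom.
    for row in table.values():
        row["ms"] = row["ss"] / row["df"]
    # The total sum of squares is the sum of the two components.
    table["total"] = {"ss": between_ss + within_ss, "df": n * t - 1}
    return table


# Hypothetical measurements: t = 3 treatments, n = 4 samples each.
example_data = [[3, 4, 5, 4], [5, 6, 7, 6], [7, 8, 9, 8]]
for source, row in anova_table(example_data).items():
    print(source, row)
```

For this data the degrees of freedom are 2, 9, and 11, so the between-treatments mean square is 32/2 = 16 and the within-treatments mean square is 6/9.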
If the treatment means differ substantially, the treatment effects estimated by $(\bar{x}_j - \bar{x})$, the deviations of the treatment means from the grand mean, will differ correspondingly and will have a large variance $\sigma_\tau^2$. If, on the other hand, the means do not differ, the treatment effects are zero and $\sigma_\tau^2$ is zero. In this case the treatment mean square and the error mean square have the same expected value, and both are independent estimates of the error variance $\sigma^2$. Comparing the ratio of the between-treatments mean square to the within-treatments mean square with unity therefore compares the variation due to treatments with the variation due to random or unexplained factors. If this ratio, called the F ratio, is close to unity, there is no evidence of a treatment effect. However, if the ratio is substantially greater than unity, there may be a significant treatment effect.
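The behavior of the F ratio in the two situations just described can be illustrated with a short computation. This is a minimal sketch, assuming equal sample sizes per treatment; the function name `f_ratio` and both data sets are hypothetical and were constructed only to contrast well-separated treatment means with nearly equal ones.

```python
def f_ratio(data):
    """Ratio of between-treatments to within-treatments mean square.

    data[j][i] is the ith sample under the jth treatment; every treatment
    is assumed to have the same number of samples.
    """
    t = len(data)
    n = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * t)
    means = [sum(row) / n for row in data]

    between_ms = n * sum((m - grand_mean) ** 2 for m in means) / (t - 1)
    within_ms = sum(
        (x - m) ** 2 for row, m in zip(data, means) for x in row
    ) / (t * (n - 1))
    if within_ms == 0:
        # No within-treatment variation: the ratio is undefined.
        raise ValueError("within-treatments mean square is zero")
    return between_ms / within_ms


# Treatment means 4, 6, and 8: well separated relative to the scatter.
separated = [[3, 4, 5, 4], [5, 6, 7, 6], [7, 8, 9, 8]]
# Treatment means 5.5, 6, and 6.5: small differences relative to the scatter.
similar = [[4, 5, 6, 7], [5, 6, 7, 6], [5, 6, 7, 8]]

print(f_ratio(separated))  # about 24, far above unity
print(f_ratio(similar))    # 0.75, close to unity
```

A ratio near unity, as in the second data set, gives no evidence of a treatment effect, whereas a ratio far above unity suggests that one may be present.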
To compare the mean squares objectively, one uses the F test of significance, in which the statistical hypothesis is that $\sigma_\tau^2 = 0$, where $\sigma_\tau^2$ is the variance of the treatment effects. This hypothesis is rejected, and the treatment effects are concluded to be significantly different from zero at the significance level α, if the calculated F ratio is greater than the value of F at the upper α point of the F distribution with $t - 1$ and $t(n - 1)$ degrees of freedom. See also: Biometrics; Quality control; Statistics