Two-Way Analysis of Variance - University of Notre Dame


Two-Way Analysis of Variance
Note: Much of the math here is tedious but straightforward. We’ll skim over it in class but you should be sure to ask questions if you don’t understand it.
I. OVERVIEW.
A. Sometimes a researcher might want to simultaneously examine the effects of two treatments (where both treatments have nominal-level measurement).
EXAMPLES:
• the effect of sex and race on wages
• the effects of the level of pollution and the level of city services on housing prices
• the effects of religion and region on income
To elaborate: with sex and race, we might wonder
• are there differences because of sex alone?
• are there differences because of race alone?
• are there differences attributable to particular combinations of sex and race, that is, are there interaction effects?
For example, white males, white females, and black males may all have similar wages, but black females could have much lower wages. We’ll discuss interaction effects more shortly.
B. Two-Way Anova with a Balanced Design and the Classic Experimental Approach. We can use Analysis of Variance techniques for these and more complicated problems. These techniques can get fairly involved and offer several different options, each of which has its own strengths and weaknesses. If this were a psychology class, where such techniques are more widely used, we might spend a lot more time going over ANOVA. But in Sociology we are much more likely to use regression and other techniques for our advanced work. Therefore, for our purposes, I will primarily focus on the special case of balanced designs (this is also what Hays, Harnett and other texts focus on). In a balanced design, all cell frequencies are equal, i.e. the number of observations in each combination of treatments is the same. So, for example, there would be 5 white males, 5 black males, 5 white females, and 5 black females. Balanced designs are unlikely in survey research but they are quite common (and often encouraged) in experimental studies. Equal cell frequencies make it easier to disentangle the effects of the row and column variables (e.g. sex and race) and also minimize the effect of non-homogeneous population variances if they exist.
In addition, I’ll note that several programs give you various options for the “Method” to use for Anova. If the design is balanced, I don’t think it matters what method you use. But, if you choose what SPSS calls the Classic Experimental Approach, many of the formulas that follow will be valid even when the design is not balanced. The Regression Approach and the Hierarchical Approach are other options (and several other options, with varying names, are also listed in different procedures). The SPSS manual and other sources have more information if you find yourself needing to know about these.

As noted below, these assumptions are not required for everything we will be talking about. These assumptions will affect how computations are done with the raw data but, once that is done, the hypothesis testing procedures will be largely the same. Ergo, the most critical parts of our discussion will apply even when designs are not balanced.

C. The model. When we have 2 treatments, the model can be written as

$$y_{ijk} = \mu + \tau_j + \lambda_k + (\tau\lambda)_{jk} + \varepsilon_{ijk}$$

where $\mu$ = the grand mean, $\tau_j$ is the treatment effect for the jth category of the row variable, $\lambda_k$ is the treatment effect for the kth category of the column variable, and $(\tau\lambda)_{jk}$ is the interaction effect for the combination of the jth row category and the kth column category.
EXAMPLE: Suppose the overall average income is $20,000, the average black income is $15,000, the average female income is $17,000, and the average black woman’s income is $10,000. This means that µ = $20,000, τB = -$5,000, λW = -$3,000, (τλ)BW = -$2,000.
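The arithmetic behind this example can be written out as a worked equation. This is only a sketch: the symbol $\mu_{BW}$ (the mean income of black women) is a label introduced here for illustration and does not appear in the model above.

```latex
\begin{aligned}
\mu + \tau_B + \lambda_W &= 20{,}000 - 5{,}000 - 3{,}000 = 12{,}000 \\
(\tau\lambda)_{BW} &= \mu_{BW} - (\mu + \tau_B + \lambda_W) = 10{,}000 - 12{,}000 = -2{,}000
\end{aligned}
```

In words, the two main effects alone would predict an average of $12,000 for black women; the interaction effect is whatever is left over between that prediction and the observed $10,000.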
D. As before, we want to partition the variance. Note that

$$s_y^2 = \frac{\sum\sum\sum (y_{ijk} - \bar{y})^2}{N - 1} = \frac{\text{Total SS}}{N - 1} = \frac{\text{TSS}}{N - 1} = \text{MS Total}$$

Further, note that

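$$(y_{ijk} - \bar{y}) = (y_{ijk} - \bar{y}_{jk}) + (\bar{y}_j - \bar{y}) + (\bar{y}_k - \bar{y}) + (\bar{y}_{jk} - \bar{y}_j - \bar{y}_k + \bar{y})$$

Component                                              Description
$y_{ijk} - \bar{y}$                                    Deviation of the individual score from the overall mean
$y_{ijk} - \bar{y}_{jk}$                               Deviation of the individual score from the group mean, i.e. $\hat{\varepsilon}_{ijk}$
$\bar{y}_j - \bar{y}$                                  Deviation of the jth row’s mean from the overall mean, i.e. $\hat{\tau}_j$
$\bar{y}_k - \bar{y}$                                  Deviation of the kth column’s mean from the overall mean, i.e. $\hat{\lambda}_k$
$\bar{y}_{jk} - \bar{y}_j - \bar{y}_k + \bar{y}$       Deviation of “combination” mean from row and column means; the interaction, i.e. $(\hat{\tau}\hat{\lambda})_{jk}$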

Note that we are using the same trick we did before of adding and then subtracting the same terms.
Hence, $\sum\sum\sum (y_{ijk} - \bar{y})^2$ can be broken out as follows (any seemingly omitted terms conveniently work out to be zero):

$$\sum\sum\sum (y_{ijk} - \bar{y}_{jk})^2 = \sum\sum\sum \hat{\varepsilon}_{ijk}^2 = \text{SS Error}, \quad \text{d.f.} = N - JK$$

This is analogous to SS Within from 1-way ANOVA. This represents the deviation of individuals from the means of others who have the same value on the row and column variables (e.g. are of the same sex and race); that is, this represents the component of the scores that cannot be accounted for by group membership. The d.f. arise from the fact that there are N cases, and J*K means have to be estimated. Also,

$$\sum\sum\sum (\bar{y}_j - \bar{y})^2 = \sum\sum\sum \hat{\tau}_j^2 = \text{SS Rows}, \quad \text{d.f.} = J - 1$$

$$\sum\sum\sum (\bar{y}_k - \bar{y})^2 = \sum\sum\sum \hat{\lambda}_k^2 = \text{SS Columns}, \quad \text{d.f.} = K - 1$$

$$\sum\sum\sum (\bar{y}_{jk} - \bar{y}_j - \bar{y}_k + \bar{y})^2 = \sum\sum\sum (\hat{\tau}\hat{\lambda})_{jk}^2 = \text{SS Interaction}, \quad \text{d.f.} = (J - 1)(K - 1)$$
Other useful partitionings include

SS Main = SS Total - SS Interaction - SS Residual, d.f. = J + K - 2

where SS Residual is another name for SS Error. Note also that, when all cell frequencies are equal, i.e. the number of observations in each combination of treatments is the same,

SS Main = SS Columns + SS Rows.

This will not necessarily be true otherwise. The fact that it is true in a balanced design is one of the main advantages of such a design.

Another useful partitioning is

SS Cells = SS Explained = SS Main + SS Interaction = SS Total - SS Error, d.f. = JK - 1

When all cell frequencies are equal, SS Cells = SS Columns + SS Rows + SS Interaction.

Finally, note that

Total SS = SS Main + SS Interaction + SS Error = SS Explained + SS Error
d.f. = (J - 1) + (K - 1) + (JK + 1 - J - K) + (N - JK) = N - 1

Again, when all cell frequencies are equal, Total SS = SS Columns + SS Rows + SS Interaction + SS Error.
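The partition can be checked numerically. The following is a minimal sketch in plain Python, using a small made-up balanced data set (2 rows, 2 columns, 2 scores per cell); the data and the function name `partition_ss` are illustrative assumptions, not part of the handout. Each sum of squares is computed directly from its definition as a sum over all N observations.

```python
# Hypothetical balanced data: data[j][k] holds the scores in row j, column k.
data = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]

def mean(values):
    return sum(values) / len(values)

def partition_ss(data):
    """Compute each sum of squares from its definition (balanced design)."""
    J, K = len(data), len(data[0])
    grand = mean([y for row in data for cell in row for y in cell])
    row_means = [mean([y for cell in data[j] for y in cell]) for j in range(J)]
    col_means = [mean([y for j in range(J) for y in data[j][k]]) for k in range(K)]
    ss = {"total": 0.0, "rows": 0.0, "columns": 0.0, "interaction": 0.0, "error": 0.0}
    for j in range(J):
        for k in range(K):
            cell_mean = mean(data[j][k])
            for y in data[j][k]:
                # One term per observation, matching the triple sums in the text.
                ss["total"] += (y - grand) ** 2
                ss["error"] += (y - cell_mean) ** 2
                ss["rows"] += (row_means[j] - grand) ** 2
                ss["columns"] += (col_means[k] - grand) ** 2
                ss["interaction"] += (cell_mean - row_means[j] - col_means[k] + grand) ** 2
    return ss

ss = partition_ss(data)
print(ss)
# In a balanced design the four components add up to SS Total.
print(ss["rows"] + ss["columns"] + ss["interaction"] + ss["error"], ss["total"])
```

Because the design is balanced, the four components sum to SS Total; with unequal cell sizes this direct approach would not, in general, give components that add up.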

E. When doing statistical inference, we assume that
• for each treatment combination jk, the random error terms $\varepsilon_{ijk}$ are $N(0, \sigma^2)$; the variance $\sigma^2$ is the same for each treatment combination
• the random error terms are independent

II. TESTS OF INTEREST:

A. $H_0$: $(\tau\lambda)_{jk} = 0$ for all j, k
   $H_A$: $(\tau\lambda)_{jk} \neq 0$ for at least one j, k

This is a test of whether there are any interaction effects; the appropriate test statistic is

$$F_{(J-1)(K-1),\,N-JK} = \frac{\text{SS Interaction}/[(J - 1)(K - 1)]}{\text{SS Error}/(N - JK)} = \frac{\text{MS Interaction}}{\text{MS Error}}$$

If the null hypothesis is true, F ~ F([J - 1][K - 1], N - JK).

B. $H_0$: $\tau_1 = \tau_2 = \dots = \tau_J = 0$
   $H_A$: At least one $\tau_j \neq 0$

This tests whether there are any row effects. The appropriate test statistic is

$$F_{J-1,\,N-JK} = \frac{\text{SS Rows}/(J - 1)}{\text{SS Error}/(N - JK)} = \frac{\text{MS Rows}}{\text{MS Error}}$$

If the null hypothesis is true, F ~ F(J - 1, N - JK).

C. $H_0$: $\lambda_1 = \lambda_2 = \dots = \lambda_K = 0$
   $H_A$: At least one $\lambda_k \neq 0$

This tests whether there are any column effects. The appropriate test statistic is

$$F_{K-1,\,N-JK} = \frac{\text{SS Columns}/(K - 1)}{\text{SS Error}/(N - JK)} = \frac{\text{MS Columns}}{\text{MS Error}}$$

If the null hypothesis is true, F ~ F(K - 1, N - JK).

NOTE: The last two tests are primarily of interest if you conclude that interaction effects are not significant. If, on the other hand, you conclude that the interaction effects do not equal zero, then you know that both treatments (i.e. the row and column variables) have effects, because the effect of each one depends on the level of the other.

D. $H_0$: All $\tau$’s and $\lambda$’s = 0
   $H_A$: At least one $\tau$ or $\lambda$ does not equal 0

This tests whether any of the main effects (i.e. row or column effects; or, non-interaction effects) are nonzero. The appropriate test statistic is

$$F_{J+K-2,\,N-JK} = \frac{\text{SS Main}/(J + K - 2)}{\text{SS Error}/(N - JK)} = \frac{\text{MS Main}}{\text{MS Error}}$$

If the null hypothesis is true, F ~ F(J + K - 2, N - JK).

E. $H_0$: All $\tau$’s, $\lambda$’s, and $(\tau\lambda)$’s = 0
   $H_A$: At least one $\tau$, $\lambda$, or $(\tau\lambda)$ does not equal 0

This tests whether there are any effects at all. If the null hypothesis is true, then every cell in the table will have the same true mean. The appropriate test statistic is

$$F_{JK-1,\,N-JK} = \frac{\text{SS Cells}/(JK - 1)}{\text{SS Error}/(N - JK)} = \frac{\text{MS Cells}}{\text{MS Error}}$$

If the null hypothesis is true, F ~ F(JK - 1, N - JK).
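The five test statistics all have the same form: a mean square divided by MS Error. A rough Python sketch follows; the function name `f_statistics` and the input values (sums of squares of 18, 72, 8 and 8 with J = K = 2 and N = 8) are hypothetical and chosen only for illustration. The sketch assumes a balanced design, so that SS Main = SS Rows + SS Columns. It computes the F values and their degrees of freedom only; comparing them with a critical value still requires an F table or a statistics package.

```python
def f_statistics(ss_rows, ss_columns, ss_interaction, ss_error, J, K, N):
    """Return F and its degrees of freedom for tests A through E (balanced design)."""
    df_error = N - J * K
    ms_error = ss_error / df_error
    ss_main = ss_rows + ss_columns        # holds only when cell frequencies are equal
    ss_cells = ss_main + ss_interaction
    effects = {
        "interaction": (ss_interaction, (J - 1) * (K - 1)),  # test A
        "rows": (ss_rows, J - 1),                            # test B
        "columns": (ss_columns, K - 1),                      # test C
        "main": (ss_main, J + K - 2),                        # test D
        "cells": (ss_cells, J * K - 1),                      # test E
    }
    return {
        name: {"F": (ss / df) / ms_error, "df1": df, "df2": df_error}
        for name, (ss, df) in effects.items()
    }

# Hypothetical inputs, for illustration only.
results = f_statistics(18, 72, 8, 8, J=2, K=2, N=8)
for name, r in results.items():
    print(f"{name}: F({r['df1']}, {r['df2']}) = {r['F']:.3f}")
```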

III. ROW, COLUMN, AND INTERACTION EFFECTS – EXAMPLES
What are interaction effects? Here are some substantive examples:
• Medicines A and B may have no effect when either is taken alone. But, the two together may have an effect. “The whole is different from the sum of the parts.”
• Another example: we might find that greater income leads to greater fertility for those who want children, and lower fertility for those who do not want children. We say that the effect of income is dependent on desires, or that desires and income interact in determining fertility.
• Good teachers and small classrooms might both encourage learning. A good teacher in a small classroom might be especially effective. The whole is greater than the sum of the parts.
Following are hypothetical 2-way ANOVA examples. The dependent variable is income (in thousands of dollars), the row variable is gender (Male or Female), the column variable is type of occupation (A, B, or C). Unless otherwise stated, assume that frequencies are equal for all cells.
1. Row (Gender) effects only.

          Occ A                  Occ B                  Occ C                  Row
Male      µMA = 18, τλMA = 0     µMB = 18, τλMB = 0     µMC = 18, τλMC = 0     µM = 18, τM = 2
Female    µFA = 14, τλFA = 0     µFB = 14, τλFB = 0     µFC = 14, τλFC = 0     µF = 14, τF = -2
Column    µA = 16, λA = 0        µB = 16, λB = 0        µC = 16, λC = 0        µ = 16

The 2 rows differ, but the three columns are all the same. Within each occupation, men make $4,000 more on average than do women; each of the three occupations pays equally well.

2. Column (Occupation) effects only.

          Occ A                  Occ B                  Occ C                  Row
Male      µMA = 12, τλMA = 0     µMB = 16, τλMB = 0     µMC = 20, τλMC = 0     µM = 16, τM = 0
Female    µFA = 12, τλFA = 0     µFB = 16, τλFB = 0     µFC = 20, τλFC = 0     µF = 16, τF = 0
Column    µA = 12, λA = -4       µB = 16, λB = 0        µC = 20, λC = 4        µ = 16


The three columns differ, but the two rows are the same. Occupation C pays better than B and B pays better than A. Within each occupation, however, men and women make the same.
3. Row and column effects.

          Occ A                  Occ B                  Occ C                  Row
Male      µMA = 14, τλMA = 0     µMB = 18, τλMB = 0     µMC = 22, τλMC = 0     µM = 18, τM = 2
Female    µFA = 10, τλFA = 0     µFB = 14, τλFB = 0     µFC = 18, τλFC = 0     µF = 14, τF = -2
Column    µA = 12, λA = -4       µB = 16, λB = 0        µC = 20, λC = 4        µ = 16

Both the rows and columns differ. Within each occupation, men make $4,000 more on average than women do. Within each gender, those in occupation C average $4,000 more than those in B, and those in B average $4,000 more than those in A.
4. Interaction effects I.

          Occ A                  Occ B                  Occ C                  Row
Male      µMA = 15, τλMA = -1    µMB = 15, τλMB = -1    µMC = 21, τλMC = 2     µM = 17, τM = 1
Female    µFA = 15, τλFA = 1     µFB = 15, τλFB = 1     µFC = 15, τλFC = -2    µF = 15, τF = -1
Column    µA = 15, λA = -1       µB = 15, λB = -1       µC = 18, λC = 2        µ = 16

Five of the six cells have the same mean. However, for some reason, the combination of males and occupation C results in high male earnings.
5. Interaction effects II - differing magnitudes of effects.

          Occ A                  Occ B                  Occ C                  Row
Male      µMA = 12, τλMA = -1    µMB = 16, τλMB = -1    µMC = 26, τλMC = 2     µM = 18, τM = 2
Female    µFA = 10, τλFA = 1     µFB = 14, τλFB = 1     µFC = 18, τλFC = -2    µF = 14, τF = -2
Column    µA = 11, λA = -5       µB = 15, λB = -1       µC = 22, λC = 6        µ = 16


Men make more than women, and the advantage is especially great in occupation C. Or, those in occupation C make more than those in other occupations, and the advantage is especially great for men.
6. Interaction effects III - differing directions of effects.

          Occ A                  Occ B                  Occ C                  Row
Male      µMA = 18, τλMA = +2    µMB = 16, τλMB = 0     µMC = 14, τλMC = -2    µM = 16, τM = 0
Female    µFA = 14, τλFA = -2    µFB = 16, τλFB = 0     µFC = 18, τλFC = 2     µF = 16, τF = 0
Column    µA = 16, λA = 0        µB = 16, λB = 0        µC = 16, λC = 0        µ = 16

In this example, the effect of gender depends on occupation. Males do better than women in Occupation A but worse in occupation C; in Occupation B there is no difference. Or, occupation C is better paying for women but not for men, whereas for occupation A the opposite is true. Note that, if you only looked at the main effects, you would erroneously conclude that gender and occupation have no effects on income, when in reality they do have effects but the effects work in opposing directions.
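The effects listed in each of these tables can be recovered from the cell means alone. The sketch below is one way to do that in plain Python; the function name `decompose` is made up for illustration, and it assumes equal cell frequencies, so that each marginal mean is a simple average of the cell means in its row or column. It is applied to the cell means of example 5.

```python
def decompose(cell_means):
    """Split a J x K table of cell means into grand mean, row, column and interaction effects.

    Assumes equal cell frequencies, so marginal means are unweighted averages.
    """
    J, K = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (J * K)
    # Row effect: row mean minus grand mean.
    tau = [sum(row) / K - grand for row in cell_means]
    # Column effect: column mean minus grand mean.
    lam = [sum(cell_means[j][k] for j in range(J)) / J - grand for k in range(K)]
    # Interaction: what is left after the grand mean and both main effects.
    inter = [
        [cell_means[j][k] - grand - tau[j] - lam[k] for k in range(K)]
        for j in range(J)
    ]
    return grand, tau, lam, inter

# Example 5: rows are Male, Female; columns are Occ A, B, C.
grand, tau, lam, inter = decompose([[12, 16, 26], [10, 14, 18]])
print(grand)   # grand mean
print(tau)     # row (gender) effects
print(lam)     # column (occupation) effects
print(inter)   # interaction effects
```

Running the same function on example 6 gives zero row and column effects but nonzero interactions, which is the situation the last paragraph warns about.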


IV. Computational Procedures - Two-Way Anova – Balanced Designs
Let
• A = row variable, B = column variable
• J = number of categories for A, K = number of categories for B
• TAj = the sum of the scores in group Aj, TBk = the sum of the scores in group Bk
• TAjBk = the sum of the scores for the observations which fall in both groups Aj and Bk (there are J*K of these totals)
• nAj = number of observations in group Aj, nBk = number of observations in group Bk
• nAjBk = the number of observations which fall in both groups Aj and Bk
[NOTE: While I will show you how to do the raw data calculations, in practice they are tedious enough that I generally would not expect you to do them by hand, at least on an exam. You should know how to do the other formulas, however, as they show how the different parts of the ANOVA table are related to each other.]
Note that many (albeit not all) of the formulas for raw data calculations and Sums of Squares assume a balanced design, i.e. all cell frequencies are equal for each possible combination of values for the row and column variables. Computations are somewhat more complicated when designs are not balanced. The Mean Square formulas and the F tests are accurate regardless of whether the design is balanced or not.

Raw Data Calculations (Balanced Design)

(1) = $(\sum\sum\sum y_{ijk})^2 / N = N\hat{\mu}^2$
    Sum all the observations. Square the result. Divide by the total number of observations.
(2) = $\sum\sum\sum y_{ijk}^2$
    Square each observation. Sum the squared observations.
(3) = $\sum_j T_{A_j}^2 / n_{A_j}$
    Add up the values for the observations for group A1. Square the result. Divide by the number of observations in group A1. Repeat for each category of A. Add the results for each of the J groups together.
(4) = $\sum_k T_{B_k}^2 / n_{B_k}$
    Add up the values for the observations for group B1. Square the result. Divide by the number of observations in group B1. Repeat for each category of B. Add the results for each of the K groups together.
(5) = $\sum_j \sum_k T_{A_jB_k}^2 / n_{A_jB_k}$
    Add up the values for the observations which fall in both group A1 and B1. Square this value, and divide by nA1B1. Repeat for each of the J*K combinations, and sum the results.
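The five raw-data quantities are easy to compute mechanically. Here is a minimal Python sketch under the assumption that the scores are stored as a nested list indexed by row category and then column category; the data values and the function name `raw_quantities` are hypothetical.

```python
# Hypothetical balanced data: data[j][k] holds the scores in group Aj and Bk.
data = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]

def raw_quantities(data):
    """Return quantities (1) through (5) from the raw-data formulas."""
    J, K = len(data), len(data[0])
    scores = [y for row in data for cell in row for y in cell]
    N = len(scores)
    q1 = sum(scores) ** 2 / N                    # (1): squared grand total over N
    q2 = sum(y * y for y in scores)              # (2): sum of squared scores
    q3 = 0.0                                     # (3): squared row totals over row n
    for j in range(J):
        row_scores = [y for cell in data[j] for y in cell]
        q3 += sum(row_scores) ** 2 / len(row_scores)
    q4 = 0.0                                     # (4): squared column totals over column n
    for k in range(K):
        col_scores = [y for j in range(J) for y in data[j][k]]
        q4 += sum(col_scores) ** 2 / len(col_scores)
    q5 = 0.0                                     # (5): squared cell totals over cell n
    for j in range(J):
        for k in range(K):
            q5 += sum(data[j][k]) ** 2 / len(data[j][k])
    return q1, q2, q3, q4, q5

q1, q2, q3, q4, q5 = raw_quantities(data)
print(q1, q2, q3, q4, q5)
```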


Sums of Squares Calculations (Balanced Design)

SS Total = (2) - (1)
    Total sum of squares.
SS Rows = (3) - (1)
    Row sum of squares. This is also sometimes called SSA.
SS Columns = (4) - (1)
    Column sum of squares. Also called SSB.
SS Interaction = (5) + (1) - (3) - (4) = SS Total - SS Rows - SS Columns - SS Error = SS Total - SS Main - SS Error
    Interaction sum of squares. Also called SSAB. It may be easier to use the second formula.
SS Error = (2) - (5) = SS Total - SS Cells
    Error sum of squares. It is analogous to SS Within in one-way ANOVA. Also called SS Residual.
SS Main = (3) + (4) - [2 * (1)] = SS Columns + SS Rows = SS Total - SS Error - SS Interaction
    Main effects sum of squares. Also called SSA+B.
SS Cells = (5) - (1) = SS Main + SS Interaction = SS Total - SS Error
    This is analogous to SS Between in one-way ANOVA. Also called SS Explained.
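As a check on how these formulas fit together, the sketch below plugs in quantities (1) through (5) and confirms the identities listed above. The input values 242, 348, 260, 314 and 340 come from a small hypothetical 2 x 2 data set with two scores per cell (1, 3; 5, 7; 2, 4; 10, 12); the function name `sums_of_squares` is likewise made up for illustration.

```python
def sums_of_squares(q1, q2, q3, q4, q5):
    """Sums of squares from raw-data quantities (1)-(5); balanced design assumed."""
    return {
        "total": q2 - q1,
        "rows": q3 - q1,
        "columns": q4 - q1,
        "interaction": q5 + q1 - q3 - q4,
        "error": q2 - q5,
        "main": q3 + q4 - 2 * q1,
        "cells": q5 - q1,
    }

# Hypothetical values of (1)-(5), for illustration only.
ss = sums_of_squares(242, 348, 260, 314, 340)
print(ss)

# The alternative expressions in the table give the same answers.
print(ss["interaction"] == ss["total"] - ss["rows"] - ss["columns"] - ss["error"])
print(ss["main"] == ss["rows"] + ss["columns"])
print(ss["cells"] == ss["total"] - ss["error"])
```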

Mean Square Calculations (Balanced or unbalanced)

MS Total = s2 = SS Total/(N - 1)
    Remember that MS Total = s2.
MS Rows = SS Rows/(J - 1)
    Also called MSA.
MS Columns = SS Columns/(K - 1)
    Also called MSB.
MS Interaction = SS Interaction/((J - 1)(K - 1))
    Also called MSAB.
MS Main = SS Main/(J + K - 2)
    Also called MSA+B.
MS Cells = SS Cells/((J*K) - 1)
    Also called MS Explained.
MS Error = SS Error/(N - J*K)
    Also called MS Residual.
