# Factor Variables and Marginal Effects in Stata 11

## Transcript of Factor Variables and Marginal Effects in Stata 11


Christopher F Baum

Boston College and DIW Berlin

January 2010

Christopher F Baum (Boston College/DIW) Factor Variables and Marginal Effects

Jan 2010 1 / 18

## Using factor variables

One of the biggest innovations in Stata version 11 is the introduction of factor variables. Just as Stata's time-series operators allow you to refer to lagged variables (L.) or differenced variables (D.), the i. operator allows you to specify factor variables for any non-negative integer-valued variable in your dataset. In the auto.dta dataset, where rep78 takes on values 1, ..., 5, you could list rep78 i.rep78, or summarize i.rep78, or regress mpg i.rep78. Each one of those commands produces the appropriate indicator variables 'on the fly': not as permanent variables in your dataset, but available for the command.
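To see that the indicators are temporary, a minimal sketch using the auto dataset shipped with Stata:

```stata
* Sketch: factor variables are expanded on the fly, not saved to the dataset
sysuse auto, clear
regress mpg i.rep78   // 2.rep78 ... 5.rep78 are generated for this command only
describe              // no new dummy variables appear in the dataset
```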



For the list command, the variables will be named 1b.rep78, 2.rep78, ..., 5.rep78. The b. is the base-level indicator, by default assigned to the smallest value. You can specify other base levels, such as the largest value, the most frequent value, or a particular value. For the summarize command, only levels 2, ..., 5 will be shown; the base level is excluded from the list. Likewise, in a regression on i.rep78, the base level is excluded from the regressor list to prevent perfect collinearity. The conditional mean of the excluded category appears in the constant term.
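The alternative base levels mentioned above are requested with the ib. operator; a sketch of the syntax (see help fvvarlist):

```stata
* Choosing a different base level for rep78
regress mpg ib5.rep78        // a particular value: level 5 as the base
regress mpg ib(last).rep78   // the largest observed level as the base
regress mpg ib(freq).rep78   // the most frequent level as the base
```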


### Interaction effects

If this were the only feature of factor variables (being instantiated when called for), they would not be very useful. The real advantage of these variables is the ability to define interaction effects for both integer-valued and continuous variables. For instance, consider the indicator foreign in the auto dataset. We may use a new operator, #, to define an interaction:

regress mpg i.rep78 i.foreign i.rep78#i.foreign

All combinations of the two categorical variables will be defined and included in the regression as appropriate (omitting base levels and cells with no observations).



In fact, we can specify this model more simply: rather than

regress mpg i.rep78 i.foreign i.rep78#i.foreign

we can use the factorial interaction operator, ##:

regress mpg i.rep78##i.foreign

which will provide exactly the same regression, producing all first-level and second-level interactions. Interactions are not limited to pairs of variables; up to eight factor variables may be included.
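Higher-order interactions follow the same pattern; for example, a full three-way factorial (assuming weight, a continuous variable in auto.dta, as the third term) could be written as:

```stata
* Sketch: full factorial of two factor variables and a continuous variable,
* producing all main effects plus two- and three-way interactions
regress mpg i.rep78##i.foreign##c.weight
```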



Furthermore, factor variables may be interacted with continuous variables to produce analysis of covariance models. The continuous variables are signalled by the new c. operator:

regress mpg i.foreign i.foreign#c.displacement

which essentially estimates two regression lines: one for domestic cars, one for foreign cars. Again, the factorial operator could be used to estimate the same model:

regress mpg i.foreign##c.displacement
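To see the "two regression lines" interpretation, the interacted model can be compared with separate fits; a sketch (the point estimates agree, though the pooled model assumes a common error variance, so standard errors differ):

```stata
* One interacted model versus two separately fitted lines
regress mpg i.foreign##c.displacement        // one model, two lines
regress mpg c.displacement if foreign == 0   // domestic line, fit separately
regress mpg c.displacement if foreign == 1   // foreign line, fit separately
```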



As we will see in discussing marginal effects, it is very advantageous to use this syntax to describe interactions, both among categorical variables and between categorical and continuous variables. Indeed, it is likewise useful to use the same syntax to describe squared (and cubed, ...) terms:

regress mpg i.foreign c.displacement c.displacement#c.displacement

In this model, we allow for an intercept shift for foreign, but constrain the slopes to be equal across foreign and domestic cars. However, by using this syntax, we may ask Stata to calculate the marginal effect ∂mpg/∂displacement, taking account of the squared term as well, since Stata understands the mathematics of the specification in this explicit form.
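That marginal effect can be requested with the dydx() option of margins, described in the next section; a sketch:

```stata
* Marginal effect of displacement, including the contribution of its square
regress mpg i.foreign c.displacement c.displacement#c.displacement
margins, dydx(displacement)                                  // average marginal effect
margins, dydx(displacement) at(displacement = (100 200 300)) // at chosen values
```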


## Computing marginal effects

With the introduction of factor variables in Stata 11, a powerful new command has been added: margins, which supersedes earlier versions' mfx and adjust commands. Those commands remain available, but the new command has many advantages. Like those commands, margins is used after an estimation command. In the simplest case, margins applied after a simple one-way ANOVA estimated with regress mpg i.rep78, via margins i.rep78, merely displays the conditional means for each category of rep78.



```
. regress mpg i.rep78

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    64) =    4.91
       Model |  549.415777     4  137.353944           Prob > F      =  0.0016
    Residual |  1790.78712    64  27.9810488           R-squared     =  0.2348
-------------+------------------------------           Adj R-squared =  0.1869
       Total |   2340.2029    68  34.4147485           Root MSE      =  5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     -1.875   4.181884    -0.45   0.655    -10.22927    6.479274
          3  |  -1.566667   3.863059    -0.41   0.686    -9.284014    6.150681
          4  |   .6666667   3.942718     0.17   0.866    -7.209818    8.543152
          5  |   6.363636   4.066234     1.56   0.123    -1.759599    14.48687
             |
       _cons |         21   3.740391     5.61   0.000     13.52771    28.47229
------------------------------------------------------------------------------
```



```
. margins i.rep78

Adjusted predictions                              Number of obs   =         69
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |         21   3.740391     5.61   0.000     13.66897    28.33103
          2  |     19.125   1.870195    10.23   0.000     15.45948    22.79052
          3  |   19.43333   .9657648    20.12   0.000     17.54047     21.3262
          4  |   21.66667   1.246797    17.38   0.000     19.22299    24.11034
          5  |   27.36364   1.594908    17.16   0.000     24.23767     30.4896
------------------------------------------------------------------------------
```

