Regression modeling with a two-level categorical variable
Suppose that Z is a two-level categorical variable such that Z = A or B.
Define
$$X = \begin{cases} 1, & \text{if } Z = A \\ 0, & \text{otherwise} \end{cases}$$
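Then, writing $\mu_A$ and $\mu_B$ for the mean of $Y$ when $Z = A$ and $Z = B$, we can use the following regression model:
$$Y = \beta_0 + \beta_1 X + \epsilon$$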
- $\beta_0 = \mu_B$ (called the baseline)
- $\beta_1 = \mu_A - \mu_B$
- Consequently, $\beta_0 + \beta_1 = \mu_A$
Since $E(Y) = \beta_0 + \beta_1 X$,
- if $Z = A$, then $X = 1$ and $E(Y) = \beta_0 + \beta_1 = \mu_A$;
- if $Z = B$, then $X = 0$ and $E(Y) = \beta_0 = \mu_B$.
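To check this interpretation numerically, here is a minimal sketch in plain Python (standard library only) on hypothetical toy data. It dummy-codes $Z$, fits the simple regression by least squares using exact `Fraction` arithmetic, and compares the coefficients with the group means.

```python
from fractions import Fraction

# Hypothetical toy data: (level of Z, response Y)
data = [("A", 12), ("A", 14), ("A", 13), ("B", 8), ("B", 10), ("B", 9), ("B", 11)]

# Dummy coding: X = 1 if Z = A, 0 otherwise, so B is the baseline
x = [1 if z == "A" else 0 for z, _ in data]
y = [Fraction(v) for _, v in data]

# Least-squares fit of Y = b0 + b1*X with exact rational arithmetic
n = len(x)
x_bar = Fraction(sum(x), n)
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Group means of Y
mu_a = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
mu_b = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)

print(b0, b1, b0 + b1)          # 19/2 7/2 13
print(mu_b, mu_a - mu_b, mu_a)  # 19/2 7/2 13
```

Using `Fraction` keeps the equalities exact; with floating point the same comparison would need a tolerance.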
Suppose that Z is a three-level categorical variable such that Z = A, B or C.
Define
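$$X_1 = \begin{cases} 1, & \text{if } Z = A \\ 0, & \text{otherwise} \end{cases}$$
$$X_2 = \begin{cases} 1, & \text{if } Z = B \\ 0, & \text{otherwise} \end{cases}$$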
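Then, writing $\mu_A$, $\mu_B$, and $\mu_C$ for the mean of $Y$ at each level, we can use the following regression model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$$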
- $\beta_0 = \mu_C$ (called the baseline)
- $\beta_1 = \mu_A - \mu_C$
- $\beta_2 = \mu_B - \mu_C$
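Since $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$,
- if $Z = A$, then $(X_1, X_2) = (1, 0)$ and $E(Y) = \beta_0 + \beta_1 = \mu_A$;
- if $Z = B$, then $(X_1, X_2) = (0, 1)$ and $E(Y) = \beta_0 + \beta_2 = \mu_B$;
- if $Z = C$, then $(X_1, X_2) = (0, 0)$ and $E(Y) = \beta_0 = \mu_C$.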
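The same check extends to three levels. The sketch below is a plain-Python illustration on hypothetical toy data: it builds the rows $(1, X_1, X_2)$ with $C$ as the baseline, solves the normal equations $X^\top X b = X^\top y$ exactly, and compares the result with the group means. The helpers `solve` and `ols` are hypothetical names written for this example, not a library API.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a @ x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)  # assumes full rank
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical toy data: (level of Z, response Y)
data = [("A", 20), ("A", 22), ("B", 15), ("B", 17), ("B", 16), ("C", 10), ("C", 12)]

# Rows (1, X1, X2): X1 = 1 if Z = A, X2 = 1 if Z = B; C is the baseline
X = [[1, int(z == "A"), int(z == "B")] for z, _ in data]
y = [v for _, v in data]
b0, b1, b2 = ols(X, y)

# Group means of Y for each level
mu = {lvl: Fraction(sum(v for z, v in data if z == lvl),
                    sum(1 for z, _ in data if z == lvl)) for lvl in "ABC"}

print(b0, b1, b2)                                     # 11 10 5
print(mu["C"], mu["A"] - mu["C"], mu["B"] - mu["C"])  # 11 10 5
```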
Two categorical variables
Consider two categorical variables: one with 3 levels ($F_1, F_2, F_3$) and the other with 2 levels ($B_1, B_2$).
Then the model can be written as
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon,$$
where
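$$X_1 = \begin{cases} 1, & \text{if } F_2 \\ 0, & \text{otherwise} \end{cases}$$
$$X_2 = \begin{cases} 1, & \text{if } F_3 \\ 0, & \text{otherwise} \end{cases}$$
$$X_3 = \begin{cases} 1, & \text{if } B_2 \\ 0, & \text{otherwise} \end{cases}$$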
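Note that $F_1$ and $B_1$ are the base levels. Let $\mu_{ij}$ denote the mean of $Y$ at level $F_i$ of the first variable and level $B_j$ of the second.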
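- $\beta_0 = \mu_{11}$ (the mean at the combination of base levels, $F_1$ and $B_1$)
- $\beta_1 = \mu_{2j} - \mu_{1j}$ for any level $B_j$ ($j = 1, 2$)
- $\beta_2 = \mu_{3j} - \mu_{1j}$ for any level $B_j$ ($j = 1, 2$)
- $\beta_3 = \mu_{i2} - \mu_{i1}$ for any level $F_i$ ($i = 1, 2, 3$)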
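These identities follow from plugging each combination of dummy values into the model. A minimal plain-Python sketch, assuming hypothetical coefficient values, computes the six implied cell means $\mu_{ij}$ and checks each identity at every level of the other variable. The function `cell_mean` is a hypothetical helper for this example.

```python
# Hypothetical coefficient values for Y = b0 + b1*X1 + b2*X2 + b3*X3 (no interaction)
b0, b1, b2, b3 = 10, 3, 5, 2

def cell_mean(i, j):
    """E(Y) at level F_i of the first variable and B_j of the second (1-based)."""
    x1 = 1 if i == 2 else 0  # X1 = 1 if F2
    x2 = 1 if i == 3 else 0  # X2 = 1 if F3
    x3 = 1 if j == 2 else 0  # X3 = 1 if B2
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

mu = {(i, j): cell_mean(i, j) for i in (1, 2, 3) for j in (1, 2)}

# Each coefficient is the same difference of cell means at every level of the other variable
assert mu[(1, 1)] == b0
for j in (1, 2):
    assert mu[(2, j)] - mu[(1, j)] == b1
    assert mu[(3, j)] - mu[(1, j)] == b2
for i in (1, 2, 3):
    assert mu[(i, 2)] - mu[(i, 1)] == b3

for i in (1, 2, 3):
    print(f"F{i}: B1 = {mu[(i, 1)]}, B2 = {mu[(i, 2)]}")
```

Because the model has no interaction term, the gap between $B_2$ and $B_1$ is the same ($\beta_3$) in every row, which is exactly the restriction the interaction model below relaxes.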
Interaction model with two categorical variables
Consider an extended model as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \epsilon,$$
where
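$$X_1 = \begin{cases} 1, & \text{if } F_2 \\ 0, & \text{otherwise} \end{cases}$$
$$X_2 = \begin{cases} 1, & \text{if } F_3 \\ 0, & \text{otherwise} \end{cases}$$
$$X_3 = \begin{cases} 1, & \text{if } B_2 \\ 0, & \text{otherwise} \end{cases}$$
and $X_4 = X_1 X_3$, $X_5 = X_2 X_3$.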
Note that $F_1$ and $B_1$ are the base levels.
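- $\beta_0 = \mu_{11}$ (the mean at the combination of base levels)
- $\beta_1 = \mu_{21} - \mu_{11}$, the effect of $F_2$ vs. $F_1$ at level $B_1$ only
- $\beta_2 = \mu_{31} - \mu_{11}$, the effect of $F_3$ vs. $F_1$ at level $B_1$ only
- $\beta_3 = \mu_{12} - \mu_{11}$, the effect of $B_2$ vs. $B_1$ at level $F_1$ only
- $\beta_4 = (\mu_{22} - \mu_{12}) - (\mu_{21} - \mu_{11})$
- $\beta_5 = (\mu_{32} - \mu_{12}) - (\mu_{31} - \mu_{11})$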
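For example, at $(F_2, B_1)$ we have $X_1 = 1$ and $X_2 = X_3 = X_4 = X_5 = 0$, so $\mu_{21} = \beta_0 + \beta_1$; since $\beta_0 = \mu_{11}$, this gives $\beta_1 = \mu_{21} - \mu_{11}$. Likewise, at $(F_2, B_2)$ we have $X_1 = X_3 = X_4 = 1$, so $\mu_{22} = \beta_0 + \beta_1 + \beta_3 + \beta_4$, and subtracting $\mu_{12} = \beta_0 + \beta_3$ and $\mu_{21} - \mu_{11} = \beta_1$ leaves $\beta_4$.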
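Since the interaction model has six coefficients for six cells, its least-squares fit reproduces the sample cell means exactly. A minimal plain-Python sketch on hypothetical toy data (two observations per cell, at $\mu_{ij} \pm 1$) fits the model through the normal equations and checks each coefficient against the cell-mean formulas in the list above. The helpers `solve`, `ols`, and `design_row` are hypothetical names for this example.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a @ x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)  # assumes full rank
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical cell means mu_ij (level F_i, level B_j); two observations per cell
true_means = {(1, 1): 10, (2, 1): 14, (3, 1): 11, (1, 2): 12, (2, 2): 20, (3, 2): 16}
data = [(i, j, m + d) for (i, j), m in true_means.items() for d in (-1, 1)]

def design_row(i, j):
    """(1, X1, X2, X3, X4, X5) with X4 = X1*X3 and X5 = X2*X3."""
    x1, x2, x3 = int(i == 2), int(i == 3), int(j == 2)
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

X = [design_row(i, j) for i, j, _ in data]
y = [v for _, _, v in data]
beta = ols(X, y)

# Sample cell means computed from the data
mu = {}
for cell in true_means:
    vals = [v for i, j, v in data if (i, j) == cell]
    mu[cell] = Fraction(sum(vals), len(vals))

# Coefficients predicted by the cell-mean formulas
expected = [
    mu[(1, 1)],
    mu[(2, 1)] - mu[(1, 1)],
    mu[(3, 1)] - mu[(1, 1)],
    mu[(1, 2)] - mu[(1, 1)],
    (mu[(2, 2)] - mu[(1, 2)]) - (mu[(2, 1)] - mu[(1, 1)]),
    (mu[(3, 2)] - mu[(1, 2)]) - (mu[(3, 1)] - mu[(1, 1)]),
]

print(*beta)  # 10 4 1 2 4 3
assert beta == expected
```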
Example (Two categorical variables with interaction)

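Looking at this example, we can ask the following questions:
- Is the interaction significant?
- $H_0: \beta_4 = \beta_5 = 0$ vs. $H_1: \beta_4 \neq 0$ or $\beta_5 \neq 0$
- In SAS, this can be checked by adding a TEST statement for these coefficients; the individual t-tests for $\beta_4$ and $\beta_5$ also give a rough indication, but the joint hypothesis is formally a partial F-test.
- To compare against the model without interaction, compare the adjusted $R^2$ ($R_a^2$) of the two models.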
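Both checks can be sketched by fitting the reduced model ($X_1, X_2, X_3$ only) and the full model (with $X_4, X_5$), then computing the partial F statistic $F = \frac{(SSE_R - SSE_F)/(p_F - p_R)}{SSE_F/(n - p_F)}$ and each model's adjusted $R^2$. This is a plain-Python illustration, not SAS output, on hypothetical toy data with two observations per cell; the helper names are hypothetical, and the p-value step is only noted in a comment.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a @ x = b exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)  # assumes full rank
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: two observations per cell at mu_ij - 1 and mu_ij + 1
cell_means = {(1, 1): 10, (2, 1): 14, (3, 1): 11, (1, 2): 12, (2, 2): 20, (3, 2): 16}
data = [(i, j, m + d) for (i, j), m in cell_means.items() for d in (-1, 1)]
y = [Fraction(v) for _, _, v in data]
n = len(y)

def design_row(i, j, interaction):
    """(1, X1, X2, X3), plus (X4, X5) = (X1*X3, X2*X3) when interaction is True."""
    x1, x2, x3 = int(i == 2), int(i == 3), int(j == 2)
    row = [1, x1, x2, x3]
    return row + [x1 * x3, x2 * x3] if interaction else row

def fit_sse(interaction):
    """Return (SSE, number of coefficients p) for the full or reduced model."""
    X = [design_row(i, j, interaction) for i, j, _ in data]
    b = ols(X, y)
    fitted = [sum(bk * xk for bk, xk in zip(b, r)) for r in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)), len(b)

sse_full, p_full = fit_sse(True)   # with X4 and X5
sse_red, p_red = fit_sse(False)    # without interaction

# Partial F for H0: beta4 = beta5 = 0 on (p_full - p_red, n - p_full) degrees of freedom.
# A p-value needs the F distribution, e.g. scipy.stats.f.sf(F, 2, n - 6) if SciPy is available.
F = ((sse_red - sse_full) / (p_full - p_red)) / (sse_full / (n - p_full))

# Adjusted R^2 = 1 - [SSE / (n - p)] / [SST / (n - 1)], where p counts the intercept
y_bar = sum(y) / n
sst = sum((yi - y_bar) ** 2 for yi in y)

def adj_r2(sse, p):
    return 1 - (sse / (n - p)) / (sst / (n - 1))

print("F =", F)  # F = 13/6
print("adj R^2 full:", float(adj_r2(sse_full, p_full)))
print("adj R^2 reduced:", float(adj_r2(sse_red, p_red)))
```

The F statistic answers the joint hypothesis directly, while the adjusted $R^2$ comparison penalizes the two extra coefficients of the interaction model.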
Conclusion: interaction terms can be considered in regression with categorical variables as well.