class: center, middle, inverse, title-slide

# Multiple Regression: Omitted Variable Bias and Regression Anatomy
### Fernando Hoces la Guardia
### 07/19/2022

---

<style type="text/css">
.remark-slide-content {
  font-size: 30px;
  padding: 1em 1em 1em 1em;
}
</style>

<style type="text/css">
@media print {
  .has-continuation {
    display: block !important;
  }
}
</style>

# Regression Journey

- Regression as Matching on Groups. Ch2 of MM up to page 68 (not included).
- Regression as Line Fitting and Conditional Expectation. Ch2 of MM, Appendix.
- Multiple Regression and Omitted Variable Bias. Ch2 of MM pages 68-79 and Appendix.
- Regression Inference, Binary Variables and Logarithms. Ch2 of MM, Appendix + others.

---

# Regression Journey

- Regression as Matching on Groups. Ch2 of MM up to page 68 (not included).
- Regression as Line Fitting and Conditional Expectation. Ch2 of MM, Appendix.
- **Multiple Regression and Omitted Variable Bias. Ch2 of MM pages 68-79 and Appendix.**
- Regression Inference, Binary Variables and Logarithms. Ch2 of MM, Appendix + others.

---

# Today's Lecture

- Omitted Variable Bias (very important)
- Regression Anatomy (not essential)

---

class: inverse, middle

# Omitted Variable Bias (OVB)

---

# Omitted Variable Bias (OVB)

- We are back to the focus on causality!
- The most common regression version of selection bias is called omitted variable bias (OVB).
- Let's go back to the causal question from Dale and Krueger (2002) to motivate this concept.

---

background-image: url("Images/MMtbl23_emphA_OVB.png"), url("Images/MMtbl23_emphB.png")
background-size: 50%, 50%
background-position: 100% 5%, 100% 95%

# Back to Earnings and Private/Public College Choice

.font90[
.pull-left[
- In moving from (1) to (2) we were controlling for `\(SAT\)`.
- Including `\(SAT\)` had an effect on the coefficient of `\(P_i\)`.
- Let's review the change from (4) to (5).
- Including `\(SAT\)`, after controlling for selectivity, seems not to change our causal estimates.
- Today we will formalize this relationship, which will help us understand how other unobservables might affect our causal estimates.
]
]

---

# Can We Control for Everything?

- In our regressions we would like to control for how many resources each student's family had.
- A proxy for resources is parental income, but it does not capture other aspects of being rich or poor in resources.
- For example, two families could have the same income but different family sizes.
- Imagine a family of 3 and a family of 6 with the same parental income. The larger family has far fewer resources to pay for higher tuition fees than the smaller family.
- So even controlling for parental income, we would not have Other Things Equal.
- OVB helps us describe what happens when a relevant variable is omitted.

---

# What Can We Say About This Bias?

- To understand OVB, let's go back to the simple example of 5 students and two selectivity groups (A and B) for the effect of private college on earnings.
- First, assume that we have all the variables we need, and then explore how omitting the group variable `\((A_i)\)` will bias our estimates.

--

- Let's label the regression that includes the variable `\(A_i\)` as the "long" regression `\((l)\)` and the regression that omits it as the "short" regression `\((s)\)`.

$$
`\begin{aligned}
Y_i &= \alpha^{l} + \beta^{l} P_i + \gamma A_i + e^{l}_i\\
Y_i &= \alpha^{s} + \beta^{s} P_i + e^{s}_i
\end{aligned}`
$$
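---

# Optional: The Long Regression in R

As a minimal sketch (hand-coding the earnings of the five group A and B students from Table 2.1 of MM), we can fit the long regression with `lm()` and verify the coefficients reported on the next slide:

```r
# Toy data: the five students in selectivity groups A and B
# (values hand-coded from Table 2.1 of MM)
toy <- data.frame(
  earnings = c(110000, 100000, 110000, 60000, 30000),
  private  = c(1, 1, 0, 1, 0),   # P_i: attended a private college
  groupA   = c(1, 1, 1, 0, 0)    # A_i: in selectivity group A
)

# Long regression: Y_i = alpha^l + beta^l P_i + gamma A_i + e^l_i
long <- lm(earnings ~ private + groupA, data = toy)
coef(long)   # alpha^l = 40,000; beta^l = 10,000; gamma = 60,000
```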
---
background-image: url("Images/MMtbl21.png")
background-size: 50%
background-position: 100% 50%
count:false

# Short and Long Regressions: Simple Example 1/2

.font90[
.pull-left[
- From the toy example data in Table 2.1 of MM, we have already computed the regression estimates `\(\alpha^{l} = 40,000\)`, `\(\beta^{l} = 10,000\)`, and `\(\gamma = 60,000\)`.
- Any ideas on how to compute the regression coefficient `\(\beta^{s}\)`?
]
]

---
background-image: url("Images/MMtbl21.png")
background-size: 50%
background-position: 100% 50%

# Short and Long Regressions: Simple Example 1/2

.font90[
.pull-left[
- From the toy example data in Table 2.1 of MM, we have already computed the regression estimates `\(\alpha^{l} = 40,000\)`, `\(\mathbf{\beta^{l} = 10,000}\)`, and `\(\gamma = 60,000\)`.
- Any ideas on how to compute the regression coefficient `\(\beta^{s}\)`?
- As we saw yesterday, `\(\beta^{s}\)` is the simple difference in earnings `\((Y_i)\)` between treatment `\((P_i = 1)\)` and control `\((P_i = 0)\)`. From Table 2.1 (focusing only on groups A and B) we have that `\(\mathbf{\beta^{s} = 20,000}\)`.
- Omitting `\(A_i\)` leads to bias = `\(\beta^{s} - \beta^{l} = 10,000\)`.
]
]

---

# Short and Long Regressions: Simple Example 2/2

- OVB is defined as the effect when omitting (in the short regression) minus the effect when not omitting (in the long regression): `\(OVB \equiv \beta^{s} - \beta^{l}\)`. In this toy example it is 10k.
- The source of this bias is in attributing to `\(P_i\)` the difference between groups (A and B) that is captured by `\(A_i\)`.
- We can now establish more formally the two components that connect the coefficients from the long and short regressions:
  1. The relationship between the omitted variable `\((A_i)\)` and the treatment `\((P_i)\)`.
  2. The relationship between the outcome `\((Y_i)\)` and the omitted variable `\((A_i)\)`. This relationship is given by the parameter `\(\gamma\)` in the long regression.
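---

# Optional: The Short Regression in R

Continuing the earlier sketch with the `toy` data: the short regression recovers the raw difference in mean earnings, and the bias is the gap between the two coefficients on `private`:

```r
# Short regression: Y_i = alpha^s + beta^s P_i + e^s_i
short <- lm(earnings ~ private, data = toy)
coef(short)["private"]   # beta^s = 20,000 (difference in means)

# Bias from omitting A_i:
coef(short)["private"] - coef(long)["private"]   # 10,000
```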
---

# OVB Formula: General

.font140[
$$
`\begin{equation}
\text{Effect of included in short} = \text{Effect of included in long} + \\
\text{Effect of omitted on outcome, in long} \times \\
\text{Relationship between omitted and included}
\end{equation}`
$$
]

--

> *"Short equals long plus effect of omitted in long (on outcome) times the regression of omitted on included"*

---

# OVB Formula: General (Causal)

.font140[
$$
`\begin{equation}
\text{Effect of treatment in short} = \text{Effect of treatment in long} + \\
\text{Effect of omitted on outcome, in long} \times \\
\text{Relationship between omitted and treatment}
\end{equation}`
$$
]

> *"Short equals long plus effect of omitted in long (on outcome) times the regression of omitted on included"*

---

# OVB Formula in Example 1/3

.font140[
$$
`\begin{equation}
\text{Effect of } P_i \text{ in short} = \text{Effect of } P_i \text{ in long} + \\
\text{Effect of } A_i \text{ on } Y_i \text{ (in long)} \times \\
\text{Relationship between } A_i \text{ and } P_i
\end{equation}`
$$
]

> *"Short equals long plus effect of omitted in long (on outcome) times the regression of omitted on included"*

---

# OVB Formula in Example 2/3

$$
`\begin{equation}
\text{Effect of } P_i \text{ in short} = \text{Effect of } P_i \text{ in long} + \\
\text{Effect of } A_i \text{ on } Y_i \text{ (in long)} \times \\
\text{Relationship between } A_i \text{ and } P_i
\end{equation}`
$$

--

$$
`\begin{equation}
\beta^{s} = \beta^{l} + \\
\text{Relationship between } A_i \text{ and } P_i \times \\
\gamma
\end{equation}`
$$

$$
`\begin{equation}
OVB = \beta^{s} - \beta^{l} = \text{Relationship between } A_i \text{ and } P_i \times \gamma
\end{equation}`
$$

--

The relationship between `\(A_i\)` and `\(P_i\)` can be estimated using an auxiliary regression:

$$
`\begin{equation}
A_i = \pi_0 + \pi_1 P_i + u_i
\end{equation}`
$$

---

# OVB Formula in Example 3/3

.font140[
$$
`\begin{equation}
OVB = \beta^{s} - \beta^{l} = \pi_1 \times \gamma
\end{equation}`
$$
]

--

- We know `\(\gamma = 60,000\)`; how could we estimate `\(\pi_1\)`?

--

- Since `\(P_i\)` is binary, `\(\pi_1\)` is the difference in the mean of `\(A_i\)` between treatment and control: `\(\pi_1 = \overline A_1 - \overline A_0 = 2/3 - 1/2 = 0.1667\)`
- `\(OVB = \beta^{s} - \beta^{l} = 0.1667 \times 60,000 = 10,000\)`
- The same as we obtained by computing `\(\beta^{s} - \beta^{l}\)` before!
- The key idea is that we care about a bias that we cannot observe `\((\beta^{s} - \beta^{l})\)`, but we can investigate it by thinking about plausible values for the relationship between omitted and included `\((\pi_1)\)` and the effect of omitted in long `\((\gamma)\)`.
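---

# Optional: The OVB Formula in R

Still using the `toy` sketch: the auxiliary regression of the omitted `groupA` on the included `private` gives `\(\pi_1\)`, and `\(\pi_1 \times \gamma\)` reproduces the bias computed directly:

```r
# Auxiliary regression: A_i = pi_0 + pi_1 P_i + u_i
pi_1  <- coef(lm(groupA ~ private, data = toy))["private"]  # 2/3 - 1/2 = 1/6
gamma <- coef(long)["groupA"]                               # 60,000

unname(pi_1 * gamma)                             # OVB formula: 10,000
coef(short)["private"] - coef(long)["private"]   # direct:      10,000
```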
---

# OVB in Dale and Krueger Study 1/3

- Let's discuss how the omitted variable "Family Size" `\((FS_i)\)` could be generating some OVB.
- What would be the short equation in this case (hint: it is not that short)?

--

$$
`\begin{equation}
\ln Y_i = \alpha^{s} + \beta^{s} P_i + \sum_{j=1}^{150} \gamma^{s}_{j} GROUP_{ji} + \delta^{s}_1 SAT_i + \delta^{s}_2 \ln PI_{i} + e^{s}_i
\end{equation}`
$$

- What would be the long equation in this case (hint: long basically means "longer" than short)?

--

$$
`\begin{equation}
\ln Y_i = \alpha^{l} + \beta^{l} P_i + \sum_{j=1}^{150} \gamma^{l}_{j} GROUP_{ji} + \delta^{l}_1 SAT_i + \delta^{l}_2 \ln PI_{i} + \lambda FS_i + e^{l}_i
\end{equation}`
$$

---
count:false

# OVB in Dale and Krueger Study 2/3

.font90[
$$
`\begin{equation}
\ln Y_i = \alpha^{s} + \beta^{s} P_i + \sum_{j=1}^{150} \gamma^{s}_{j} GROUP_{ji} + \delta^{s}_1 SAT_i + \delta^{s}_2 \ln PI_{i} + e^{s}_i
\end{equation}`
$$

$$
`\begin{equation}
\ln Y_i = \alpha^{l} + \beta^{l} P_i + \sum_{j=1}^{150} \gamma^{l}_{j} GROUP_{ji} + \delta^{l}_1 SAT_i + \delta^{l}_2 \ln PI_{i} + \lambda FS_i + e^{l}_i
\end{equation}`
$$

- What would be the auxiliary regression in this case?
]

--

.font90[
$$
`\begin{equation}
FS_i = \pi_0 + \pi_1 P_i + \sum_{j=1}^{150} \pi_{3,j} GROUP_{ji} + \pi_4 SAT_i + \pi_5 \ln PI_{i} + u_i
\end{equation}`
$$

$$
`\begin{equation}
OVB = \beta^{s} - \beta^{l} = \pi_1 \times \lambda
\end{equation}`
$$

- Time to think about the sign and magnitude of `\(\pi_1\)` and `\(\lambda\)` in this case.
]

---

# OVB in Dale and Krueger Study 2/3

.font90[
$$
`\begin{equation}
\ln Y_i = \alpha^{s} + \mathbf{\beta^{s}} P_i + \sum_{j=1}^{150} \gamma^{s}_{j} GROUP_{ji} + \delta^{s}_1 SAT_i + \delta^{s}_2 \ln PI_{i} + e^{s}_i
\end{equation}`
$$

$$
`\begin{equation}
\ln Y_i = \alpha^{l} + \mathbf{\beta^{l}} P_i + \sum_{j=1}^{150} \gamma^{l}_{j} GROUP_{ji} + \delta^{l}_1 SAT_i + \delta^{l}_2 \ln PI_{i} + \mathbf{\lambda} FS_i + e^{l}_i
\end{equation}`
$$

- What would be the auxiliary regression in this case?
]

.font90[
$$
`\begin{equation}
FS_i = \pi_0 + \mathbf{\pi_1} P_i + \sum_{j=1}^{150} \pi_{3,j} GROUP_{ji} + \pi_4 SAT_i + \pi_5 \ln PI_{i} + u_i
\end{equation}`
$$

$$
`\begin{equation}
OVB = \mathbf{\beta^{s} - \beta^{l} = \pi_1 \times \lambda}
\end{equation}`
$$

- Time to think about the sign and magnitude of `\(\pi_1\)` and `\(\lambda\)` in this case.
]

---

# OVB in Dale and Krueger Study 3/3

- `\(\pi_1\)` is likely to be negative and large in magnitude.
- `\(\lambda\)`: larger family sizes might mean fewer resources per child, which could have a negative effect on future earnings. Hence `\(\lambda < 0\)`.
- Hence omitting `\(FS_i\)` will probably lead to an OVB that is positive (estimated effects are larger than true effects).

--

- Let's think of other potentially omitted variables: received tutoring? parental education?

--

- One interesting thing about this particular example is that most stories you can think of have either `\(\lambda < 0, \pi_1 < 0\)` or `\(\lambda > 0, \pi_1 > 0\)`, leading us to suspect that the estimated effect of private college in a regression is likely to be overestimated.

---

# Robustness to Inclusion/Exclusion of Regressors

- In regression, we can never know if we have controlled for enough variables to eliminate OVB/selection bias.
- Given this, we should always ask how much the estimated coefficients change when new variables are included.
- Confidence in regression estimates of causal effects grows when treatment effects are insensitive to the inclusion of new variables.
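---

# Optional: A Generic OVB Check in R

This robustness check can be wrapped in a small helper. `ovb_check()` below is a hypothetical function (not from MM or Dale and Krueger), sketched under the assumption that the treatment and the omitted variable enter both regressions linearly:

```r
# Compare the treatment coefficient across short and long regressions
# and decompose the gap with the OVB formula (hypothetical helper)
ovb_check <- function(short_f, long_f, treatment, omitted, data) {
  b_short  <- coef(lm(short_f, data = data))[treatment]
  long_fit <- lm(long_f, data = data)
  b_long   <- coef(long_fit)[treatment]
  gamma    <- coef(long_fit)[omitted]
  # Auxiliary regression: omitted variable on treatment + short controls
  aux_f <- update(short_f, as.formula(paste(omitted, "~ .")))
  pi_1  <- coef(lm(aux_f, data = data))[treatment]
  c(direct = unname(b_short - b_long), formula = unname(pi_1 * gamma))
}

# On the toy data both entries equal 10,000:
ovb_check(earnings ~ private, earnings ~ private + groupA,
          treatment = "private", omitted = "groupA", data = toy)
```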
---
background-image: url("Images/MMtbl23_emphA_OVB.png"), url("Images/MMtbl23_emphB.png")
background-size: 50%, 50%
background-position: 100% 5%, 100% 95%
count:false

# Robustness: Dale and Krueger Study 1/2

.font80[
.pull-left[
- Moving from column (1) to (2):
  - (1) was omitting `\(SAT_i\)`, and (2) is the long version of (1).
- `\(OVB = \beta^{s} - \beta^{l} = 0.212 - 0.152 = 0.06\)`
- How about computing the same quantity using the OVB formula?
- We need the auxiliary regression (page 76 of MM): `\(\pi_1 = 1.165\)`
- Where is the "effect of omitted in long" `\((\lambda)\)`?
]
]

---
background-image: url("Images/MMtbl23_emphA_OVB.png"), url("Images/MMtbl23_emphB.png")
background-size: 50%, 50%
background-position: 100% 5%, 100% 95%

# Robustness: Dale and Krueger Study 1/2

.font80[
.pull-left[
- Moving from column (1) to (2):
  - (1) was omitting `\(SAT_i\)`, and (2) is the long version of (1).
- `\(OVB = \beta^{s} - \beta^{l} = 0.212 - 0.152 = 0.06\)`
- How about computing the same quantity using the OVB formula?
- We need the auxiliary regression (page 76 of MM): `\(\pi_1 = 1.165\)`
- Where is the "effect of omitted in long" `\((\lambda)\)`?
- `\(\lambda = 0.051\)`
- `\(OVB = \pi_1 \times \lambda = 1.165 \times 0.051 \approx 0.06\)`!
]
]

---
background-image: url("Images/MMtbl23_emphA_OVB.png"), url("Images/MMtbl23_emphB.png")
background-size: 50%, 50%
background-position: 100% 5%, 100% 95%
count: false

# Robustness: Dale and Krueger Study 2/2

.font80[
.pull-left[
- Moving from column (4) to (5):
  - (4) was omitting `\(SAT_i\)`, and (5) is the long version of (4).
- `\(OVB = \beta^{s} - \beta^{l} = 0.034 - 0.031 = 0.003\)`
- How about computing the same quantity using the OVB formula?
- We need the auxiliary regression (page 76 of MM): `\(\pi_1 = 0.066\)`
- Where is `\(\lambda\)`?
]
]

---
background-image: url("Images/MMtbl23_emphA_OVB.png"), url("Images/MMtbl23_emphB.png")
background-size: 50%, 50%
background-position: 100% 5%, 100% 95%

# Robustness: Dale and Krueger Study 2/2

.font80[
.pull-left[
- Moving from column (4) to (5):
  - (4) was omitting `\(SAT_i\)`, and (5) is the long version of (4).
- `\(OVB = \beta^{s} - \beta^{l} = 0.034 - 0.031 = 0.003\)`
- How about computing the same quantity using the OVB formula?
- We need the auxiliary regression (page 76 of MM): `\(\pi_1 = 0.066\)`
- Where is `\(\lambda\)`?
- `\(\lambda = 0.036\)`
- `\(OVB = \pi_1 \times \lambda = 0.066 \times 0.036 \approx 0.0024\)`!
- Differences are due to the rounding of small numbers.
- Most of the change comes from `\(\pi_1\)`.
]
]

---
count:false

# Proof of OVB Formula

.font90[
.pull-left[
$$
`\begin{aligned}
\beta^{s} &= \frac{Cov(X_{1i}, Y_i)}{Var(X_{1i})}
\end{aligned}`
$$
]
.pull-right[
.right[
Substitute for `\(Y_i\)` using the equation for the long regression.
]
]
]

---
count:false

# Proof of OVB Formula

.font90[
.pull-left[
$$
`\begin{aligned}
\beta^{s} &= \frac{Cov(X_{1i}, Y_i)}{Var(X_{1i})}\\
\beta^{s} &= \frac{Cov(X_{1i}, \alpha^{l} + \beta^{l} X_{1i} + \gamma X_{2i} + e^{l}_i)}{Var(X_{1i})} \\
&= \frac{\beta^{l} Var(X_{1i}) + \gamma Cov(X_{1i}, X_{2i}) + Cov(X_{1i}, e^{l}_{i})}{Var(X_{1i})}
\end{aligned}`
$$
]
.pull-right[
.right[
Substitute for `\(Y_i\)` using the equation for the long regression.
<br> But what is a key <br> property of any residuals?
]
]
]

---
count:false

# Proof of OVB Formula

.font90[
.pull-left[
$$
`\begin{aligned}
\beta^{s} &= \frac{Cov(X_{1i}, Y_i)}{Var(X_{1i})}\\
\beta^{s} &= \frac{Cov(X_{1i}, \alpha^{l} + \beta^{l} X_{1i} + \gamma X_{2i} + e^{l}_i)}{Var(X_{1i})} \\
&= \frac{\beta^{l} Var(X_{1i}) + \gamma Cov(X_{1i}, X_{2i}) + Cov(X_{1i}, e^{l}_{i})}{Var(X_{1i})}\\
\beta^{s} &= \frac{\beta^{l} Var(X_{1i}) + \gamma Cov(X_{1i}, X_{2i})}{Var(X_{1i})}\\
&= \beta^{l} + \gamma\frac{Cov(X_{1i}, X_{2i})}{Var(X_{1i})}
\end{aligned}`
$$
]
.pull-right[
.right[
Substitute for `\(Y_i\)` using the equation for the long regression.
<br> But what is a key <br> property of any residuals?
<br><br> What is that last term?
<br> (think auxiliary regression)
]
]
]

---
count:true

# Proof of OVB Formula

.font90[
.pull-left[
$$
`\begin{aligned}
\beta^{s} &= \frac{Cov(X_{1i}, Y_i)}{Var(X_{1i})}\\
\beta^{s} &= \frac{Cov(X_{1i}, \alpha^{l} + \beta^{l} X_{1i} + \gamma X_{2i} + e^{l}_i)}{Var(X_{1i})} \\
&= \frac{\beta^{l} Var(X_{1i}) + \gamma Cov(X_{1i}, X_{2i}) + Cov(X_{1i}, e^{l}_{i})}{Var(X_{1i})}\\
\beta^{s} &= \frac{\beta^{l} Var(X_{1i}) + \gamma Cov(X_{1i}, X_{2i})}{Var(X_{1i})}\\
&= \beta^{l} + \gamma\frac{Cov(X_{1i}, X_{2i})}{Var(X_{1i})}\\
&= \beta^{l} + \gamma \pi_1
\end{aligned}`
$$
]
.pull-right[
.right[
Substitute for `\(Y_i\)` using the equation for the long regression.
<br> But what is a key <br> property of any residuals?
<br><br> What is that last term?
<br> (think auxiliary regression)
]
]
]
---

# Acknowledgments

.pull-left[
- [Ed Rubin's Graduate Econometrics](https://github.com/edrubin/EC607S21)
- MM
]

.pull-right[
]