class: center, middle, inverse, title-slide

# All Things Regression
## Part I
### Fernando Hoces la Guardia
### 07/19/2022

---

# Regression Journey

- Regression as Matching on Groups. Ch2 of MM up to page 68 (not included).

- Regression as Line Fitting and Conditional Expectation. Ch2 of MM, Appendix.

- Multiple Regression and Omitted Variable Bias. Ch2 of MM pages 68-79 and Appendix.

- All Things Regression: Anatomy, Inference, Logarithms, Binary Outcomes, and `$R^2$`. Ch2 of MM, Appendix + others.

---
# Regression Journey

- Regression as Matching on Groups. Ch2 of MM up to page 68 (not included).

- Regression as Line Fitting and Conditional Expectation. Ch2 of MM, Appendix.

- Multiple Regression and Omitted Variable Bias. Ch2 of MM pages 68-79 and Appendix.

- **All Things Regression: Anatomy, Inference, Logarithms, Binary Outcomes, and `$R^2$`. Ch2 of MM, Appendix + others.**

---

# Today's and Tomorrow's Lectures

- Regression Anatomy

- Regression Inference

- Non-linearities: 
   - Logarithms
   - Others

- Binary Outcomes

- `$R^2$`

---
class: inverse, middle

# Regression Anatomy

---
# Regression Anatomy

- In addition to the intuition of regression as matching within subgroups, here we explore another interpretation of what it means to control for multiple variables (regressors).

- We started our exploration of regression with just one regressor:
`$Y_i = \alpha + \beta P_i + e_i$`

- We then added multiple regressors and interpreted the coefficient `$\beta$` as a weighted average of differences within subgroups.

- The first regression is sometimes called a bivariate regression (or bivariate analysis, also known as univariate analysis, in the sense that there is only one independent variable).

- The second is called a multiple regression (also loosely called a multivariate regression, or multivariate analysis).

---
#  "Controlling For" a Second Interpretation 1/2

.font90[
- In a **multiple** regression like the following:

$$
`\begin{equation}
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i
\end{equation}`
$$

- The coefficient on `$X_{1i}$`, `$\beta_1$`, is the same as the one obtained from a **bivariate** regression of the outcome variable `$Y_i$` on the residual `$\widetilde X_{1i}$` from the following (auxiliary) regression:

$$
`\begin{equation}
X_{1i} = \pi_0 + \pi_1 X_{2i} + \widetilde X_{1i}
\end{equation}`
$$
Meaning: 
$$
`\begin{align}
  \beta_1 &= \dfrac{\mathop{\text{Cov}} \left( Y_{i},\, \widetilde{X}_{1i} \right)}{\mathop{\text{Var}} \left( \widetilde{X}_{1i} \right)}
\end{align}`
$$
]
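A minimal numerical check of the regression anatomy formula, sketched in Python with NumPy (the slides themselves show no code). It assumes a hypothetical simulated data-generating process. `ols` is a hypothetical helper that fits least squares with an intercept. The sketch fits the multiple regression, runs the auxiliary regression of `$X_{1i}$` on `$X_{2i}$`, and computes `$\text{Cov}(Y_i, \widetilde X_{1i}) / \text{Var}(\widetilde X_{1i})$`.

```python
import numpy as np

def ols(y, *regressors):
    """Hypothetical helper: OLS coefficients of y on an intercept plus the regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (hypothetical data-generating process, for illustration only)
rng = np.random.default_rng(0)
n = 1_000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)             # X1 is correlated with X2
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# Multiple regression: Y on X1 and X2
beta0, beta1, beta2 = ols(y, x1, x2)

# Auxiliary regression: X1 on X2, keep the residual X1-tilde
pi0, pi1 = ols(x1, x2)
x1_tilde = x1 - (pi0 + pi1 * x2)

# Regression anatomy: Cov(Y, X1-tilde) / Var(X1-tilde), both with ddof=1
beta1_anatomy = np.cov(y, x1_tilde)[0, 1] / np.var(x1_tilde, ddof=1)

print(beta1, beta1_anatomy)  # the two numbers agree up to floating-point error
```

The equality is exact in any sample, not only on average, because the auxiliary residual has mean zero and is uncorrelated with `$X_{2i}$` by construction.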

---
#  "Controlling For" a Second Interpretation 2/2

$$
`\begin{equation}
X_{1i} = \pi_0 + \pi_1 X_{2i} + \widetilde X_{1i}
\end{equation}`
$$

- Let’s think about what this residual means:
  - It is all the variation (information) in `$X_{1i}$` that cannot be explained by variation (information) in `$X_{2i}$`.
  - The bivariate regression of `$Y_i$` on `$\widetilde X_{1i}$` therefore regresses `$Y_i$` on “all of `$X_{1i}$` that is not explained by `$X_{2i}$`,” or, equivalently, on “all of `$X_{1i}$` after removing, or controlling for, the variation in `$X_{2i}$`.”
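A short sketch of the claim that the residual `$\widetilde X_{1i}$` contains nothing that `$X_{2i}$` can explain. It uses Python with NumPy and hypothetical simulated data. By construction, the residual has a mean of (numerically) zero and zero sample covariance with `$X_{2i}$`. The part of `$X_{1i}$` that is unrelated to `$X_{2i}$` is still present in it.

```python
import numpy as np

# Hypothetical simulated data: X1 is partly driven by X2
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)

# Auxiliary regression of X1 on X2 (intercept + slope) via least squares
X = np.column_stack([np.ones(n), x2])
(pi0, pi1), *_ = np.linalg.lstsq(X, x1, rcond=None)
x1_tilde = x1 - (pi0 + pi1 * x2)       # the residual X1-tilde

# Nothing of X2 is left in the residual: zero mean, zero covariance with X2
print(x1_tilde.mean())                  # ~0 (floating-point noise)
print(np.cov(x1_tilde, x2)[0, 1])       # ~0 (floating-point noise)

# Share of the variance of X1 that X2 does not explain (what X1-tilde keeps)
print(np.var(x1_tilde) / np.var(x1))
```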

---

# Regression Anatomy: Visually

- This formula also holds if we replace `$Y_i$` with the residual from regressing `$Y_i$` on `$X_{2i}$`, and this last version has a nice visual interpretation.

- (Regression anatomy, as presented here, is a simplified version of a more general result called the Frisch-Waugh-Lovell theorem. The general theorem is outside the scope of this course, but if you learn linear algebra, it has a really cool interpretation.)

- Graphical example (again from the great slides of [Ed Rubin](https://github.com/edrubin/EC607S21)) for the case where `$X_{2i}$` is a binary variable:
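A sketch of the binary-`$X_{2i}$` case in Python with NumPy, using hypothetical simulated data. When `$X_{2i}$` is a 0/1 indicator, regressing any variable on `$X_{2i}$` (with an intercept) fits the two group means. The residual is therefore the variable minus its own group mean. `demean_by_group` is a hypothetical helper name. Regressing the demeaned `$Y_i$` on the demeaned `$X_{1i}$` recovers the same `$\beta_1$` as the multiple regression.

```python
import numpy as np

# Hypothetical data with a binary X2 (a 0/1 group indicator)
rng = np.random.default_rng(2)
n = 2_000
x2 = rng.integers(0, 2, size=n).astype(float)
x1 = 1.5 * x2 + rng.normal(size=n)
y = 0.5 + 2.0 * x1 + 4.0 * x2 + rng.normal(size=n)

# Multiple regression of Y on an intercept, X1 and X2
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

def demean_by_group(v, g):
    """Hypothetical helper: subtract from each value the mean of its group in g."""
    out = v.copy()
    for level in np.unique(g):
        mask = g == level
        out[mask] = v[mask] - v[mask].mean()
    return out

# With binary X2, these are exactly the residuals from regressing on X2
x1_tilde = demean_by_group(x1, x2)
y_tilde = demean_by_group(y, x2)

# Slope of Y-tilde on X1-tilde (both have mean zero, so no intercept is needed)
beta1_resid = (y_tilde @ x1_tilde) / (x1_tilde @ x1_tilde)
print(beta[1], beta1_resid)  # the two slopes agree up to floating-point error
```

Plotting `y_tilde` against `x1_tilde` centers each group's cloud of points at the origin. The slope of the line through the pooled, centered cloud is `$\beta_1$`.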

---
# Regression Anatomy: Visually

`$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$`

<img src="16_all_things_reg_files/figure-html/fig_anatomy1-1.svg" style="display: block; margin: auto;" />
---
count: true
# Regression Anatomy: Visually