class: center, middle, inverse, title-slide

# Ec140 - Regression as Line Fitting and Conditional Expectation
## (Part I)
### Fernando Hoces la Guardia
### 07/14/2022

---

<style type="text/css">
.remark-slide-content {
  font-size: 30px;
  padding: 1em 1em 1em 1em;
}
</style>

# Regression Journey

- Regression as Matching on Groups. Ch2 of MM up to page 68 (not included).
- Regression as Line Fitting and Conditional Expectation. Ch2 of MM, Appendix + [ScPo Econometrics](https://github.com/ScPoEcon/ScPoEconometrics-Slides). (Part I today)
- Multiple Regression and Omitted Variable Bias. Ch2 of MM pages 68-79.
- Regression Inference, Binary Variables and Logarithms. Ch2 of MM, Appendix + others.

---

# Regression Journey

- Regression as Matching on Groups. Ch2 of MM up to page 68 (not included).
- **Regression as Line Fitting and Conditional Expectation. Ch2 of MM, Appendix + [ScPo Econometrics](https://github.com/ScPoEcon/ScPoEconometrics-Slides). (Part I today)**
- Multiple Regression and Omitted Variable Bias. Ch2 of MM pages 68-79.
- Regression Inference, Binary Variables and Logarithms. Ch2 of MM, Appendix + others.

---

class: inverse, middle

# Regression as Line Fitting and Conditional Expectation

---

# Regression as Line Fitting: Today's Goal

.font90[

- Today's class has two goals:
  1. Explain what regression does when it "generates fitted values" (or "fits a line"), and
  2. Provide some insight into a useful formula that represents the main coefficient of interest `\((\beta)\)`.

- Today's class will be a bit more technical than previous ones.

- For this reason, it is important to always keep in mind what the goal is.

- Even if you end up completely lost in today's material, these explanations are not essential for doing well in this class.

- They are meant to solidify your intuition, not to discourage you from continuing to explore causal inference.
]

---

# Regression as Line Fitting

- Example: Class size and student performance (Slides adapted from the [ScPo Econometrics](https://github.com/ScPoEcon/ScPoEconometrics-Slides) course, and [data from Raj Chetty and Greg Bruich's course](https://opportunityinsights.org/course/))

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

# Class size and student performance: Regression line

How to visually summarize the relationship: **a line through the scatter plot**

--

.left-wide[
<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto auto auto 0;" />
]

--

.right-thin[
<br>
* A *line*! Great. But **which** line? This one?
* That's a *flat* line, but the average mathematics score is somewhat *increasing* with class size.
]

---

# Class size and student performance: Regression line

How to visually summarize the relationship: **a line through the scatter plot**

.left-wide[
<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto auto auto 0;" />
]

.right-thin[
<br>
* **That** one?
* Slightly better! It has a **slope** and an **intercept**.
* We need a rule to decide!
]

---

# It's All About the Residuals

- In *Regression as Matching* we defined the residuals, `\(e_i\)`, as the difference between the observed values `\((Y_i)\)` and the fitted values `\((\widehat Y_i)\)`:

$$ e_i \equiv Y_i - \widehat{Y}_i $$

- By fitted values, we mean a line (for now) that summarizes the relationship between `\(X\)` and `\(Y\)`.
- The equation for such a line with an intercept `\(a\)` and a slope `\(b\)` is:

$$ \widehat{Y}_i = a + b X_i $$

---

# What's A Line: A Refresher

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---

# What's A Line: A Refresher

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---

# What's A Line: A Refresher

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Residual

* If all the data points were __on__ the line, then `\(\widehat{Y}_i = Y_i\)`.

--

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Residual

* If all the data points were __on__ the line, then `\(\widehat{Y}_i = Y_i\)`.

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Graphically

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Graphically

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Graphically

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Graphically

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Graphically

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-15-1.svg" style="display: block; margin: auto;" />

---

# Simple Linear Regression: Graphically

.left-wide[
<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-16-1.svg" width="100%" style="display: block; margin: auto;" />
]

.right-thin[
<br>
<br>
<p
style="text-align: center; font-weight: bold; font-size: 35px; color: #d90502;">Which "minimisation" criterion should (can) be used?</p>
]

---

# **O**rdinary **L**east **S**quares (OLS) Estimation

* Errors of different sign `\((+/-)\)` cancel out, so we consider the **squared residuals**
`$$e_i^2 = (Y_i - \widehat Y_i)^2 = (Y_i - a - b X_i)^2$$`

* Choose `\((a,b)\)` such that `\(e_1^2 + \dots + e_N^2 = \sum_{i = 1}^N e_i^2\)` is **as small as possible**.

--

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" />

---

# **O**rdinary **L**east **S**quares (OLS) Estimation

* Errors of different sign `\((+/-)\)` cancel out, so we consider the **squared residuals**
`$$e_i^2 = (Y_i - \widehat Y_i)^2 = (Y_i - a - b X_i)^2$$`

* Choose `\((a,b)\)` such that `\(e_1^2 + \dots + e_N^2 = \sum_{i = 1}^N e_i^2\)` is **as small as possible**.

<img src="13_reg_line_fit_ce_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" />

---

# **O**rdinary **L**east **S**quares (OLS) Estimation

<iframe src="https://gustavek.shinyapps.io/reg_simple/" width="100%" height="400px" data-external="1" style="border: none;"></iframe>

[Link](https://gustavek.shinyapps.io/reg_simple/)

---

# **O**rdinary **L**east **S**quares (OLS) Estimation

<iframe src="https://gustavek.shinyapps.io/SSR_cone/" width="100%" height="400px" data-external="1" style="border: none;"></iframe>

[Link](https://gustavek.shinyapps.io/SSR_cone/)

---

# Covariance: Brief Explainer 1/2

- The covariance is a measure of co-movement between two random variables `\((X_i, Y_i)\)`:

$$
`\begin{equation}
Cov(X_i, Y_i) = \sigma_{XY} = \mathop{\mathbb{E}}\left[ (X_i - \mathop{\mathbb{E}}[X_i]) (Y_i - \mathop{\mathbb{E}}[Y_i]) \right]
\end{equation}`
$$

- With its sample counterpart (for the case of equally likely observations):

$$
`\begin{equation}
\widehat \sigma_{XY} = \frac{\sum(X_i - \overline{X})(Y_i - \overline{Y})}{n}
\end{equation}`
$$

- If either formula looks weird,
think of the variance as the covariance of `\(X_i\)` with itself, and the above should look more familiar: `\(\sigma_{XX} = \mathop{\mathbb{E}}\left[ (X_i - \mathop{\mathbb{E}}[X_i]) (X_i - \mathop{\mathbb{E}}[X_i]) \right] = \mathop{\mathbb{E}}\left[ (X_i - \mathop{\mathbb{E}}[X_i])^2 \right] = \sigma_X^2\)`

---

# Covariance: Brief Explainer 2/2

In addition to `\(\sigma_{XX} = \sigma_X^2\)`, we will use two other properties of the covariance:

- If the expectation of either `\(X_i\)` or `\(Y_i\)` is 0, the covariance between them is the expectation of their product: `\(Cov(X_i, Y_i) = E(X_i Y_i)\)`

- The covariance of linear functions of `\(X_i\)` and `\(Y_i\)` -- written as `\(W_i = c_1 + c_2 X_i\)` and `\(Z_i = c_3 + c_4 Y_i\)` for constants `\(c_1, c_2, c_3, c_4\)` -- is given by:

$$
`\begin{equation}
Cov(W_i, Z_i) = c_2 c_4 Cov(X_i, Y_i)
\end{equation}`
$$

- You are not asked to memorize any of these formulas. Just use them to understand many concepts in regression.

---

# .font90[**O**rdinary **L**east **S**quares (OLS): Coefficient Formulas 1/4]

* **OLS**: an *estimation* method that consists of choosing `\(a\)` and `\(b\)` to minimize the sum of squared residuals.

* In the case of one regressor (and a constant), this minimization yields the formulas below (derivation [in this video](https://www.youtube.com/watch?v=Hi5EJnBHFB4) and [these slides](https://raw.githack.com/edrubin/EC421W19/master/LectureNotes/02Review/02_review.html#25)).

* So what are the formulas for `\(a\)` (intercept) and `\(b\)` (slope)?

* We can solve this problem for the population or for a random sample.

* Warning: the next 3 slides are heavy on notation. If you lose track, the main takeaway is that we want an intuitive formula for the solution to this problem.
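
---

# .font90[Aside: Checking the OLS Solution Numerically]

Before the notation-heavy slides, here is a sanity check of the same idea in code. This is a minimal sketch in Python (the course itself uses R; the numbers below are made up for illustration): compute the slope as a sample covariance over a sample variance, then verify that nudging the coefficients in any direction only increases the sum of squared residuals.

```python
# Sketch, not from the course materials -- data are made up for illustration.
# Claim to check: the (a, b) minimizing the sum of squared residuals equal
# b = Cov(X, Y) / Var(X) and a = mean(Y) - b * mean(X).
X = [20.0, 25.0, 30.0, 35.0, 40.0]   # e.g. class sizes
Y = [58.0, 60.0, 63.0, 62.0, 66.0]   # e.g. average math scores
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / n
var_x = sum((x - mean_x) ** 2 for x in X) / n
b = cov_xy / var_x          # slope: covariance over variance
a = mean_y - b * mean_x     # intercept

def ssr(a, b):
    """Sum of squared residuals of the line Yhat = a + b X."""
    return sum((y - a - b * x) ** 2 for x, y in zip(X, Y))

# Nudging (a, b) in any direction can only increase the SSR
assert all(ssr(a, b) <= ssr(a + da, b + db)
           for da in (-0.1, 0.0, 0.1) for db in (-0.01, 0.0, 0.01))
print(b)  # 0.36 with these made-up numbers
```

The same check could be run against R's `lm` on real data; only the covariance-over-variance formula matters here.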
---

# .font90[**O**rdinary **L**east **S**quares (OLS): Coefficient Formulas 2/4]

.font90[
.pull-left[
.center[**Population**]

Problem to solve:

$$
`\begin{equation}
\arg \min_{a,b} \left\{ \mathop{\mathbb{E}}[(Y_i - a - b X_i)^2] \right\}
\end{equation}`
$$

Solution:

$$
`\begin{equation}
b = \beta = \frac{\mathop{\mathbb{E}}\left[ (X_i - \mathop{\mathbb{E}}[X_i]) (Y_i - \mathop{\mathbb{E}}[Y_i]) \right]}{\mathop{\mathbb{E}}\left[ (X_i - \mathop{\mathbb{E}}[X_i])^2 \right]}
\end{equation}`
$$

$$
`\begin{equation}
a = \alpha = \mathop{\mathbb{E}}[Y_i] - b\mathop{\mathbb{E}}[X_i]
\end{equation}`
$$
]

.pull-right[
.center[**Sample**]

Problem to solve:

$$
`\begin{equation}
\arg \min_{a,b} \left\{ \sum (Y_i - a - b X_i)^2 \right\}
\end{equation}`
$$

Solution:

$$
`\begin{equation}
b = \widehat \beta = \frac{\sum (Y_i-\overline{Y}) (X_i-\overline{X}) }{\sum(X_i - \overline{X})^2}
\end{equation}`
$$

$$
`\begin{equation}
a = \widehat \alpha = \overline{Y} - b\overline{X}
\end{equation}`
$$
]

- Let's bring in the concept of covariance to make these formulas more intuitive
]

---

# .font90[**O**rdinary **L**east **S**quares (OLS): Coefficient Formulas 3/4]

.font100[
.pull-left[
.center[**Population**]

$$
`\begin{equation}
b = \beta = \frac{Cov(X_i, Y_i)}{Var(X_i)} = \frac{\sigma_{XY}}{\sigma_{X}^2}
\end{equation}`
$$

$$
`\begin{equation}
a = \alpha = \mathop{\mathbb{E}}[Y_i] - b\mathop{\mathbb{E}}[X_i]
\end{equation}`
$$
]

.pull-right[
.center[**Sample**]

$$
`\begin{equation}
b = \widehat \beta = \frac{ \frac{\sum (Y_i-\overline{Y}) (X_i-\overline{X})}{n} }{\frac{\sum(X_i - \overline{X})^2}{n}}
\end{equation}`
$$

$$
`\begin{equation}
a = \widehat \alpha = \overline{Y} - b\overline{X}
\end{equation}`
$$
]
]

---

# .font90[**O**rdinary **L**east **S**quares (OLS): Coefficient Formulas 3/4]

.font100[
.pull-left[
.center[**Population**]

$$
`\begin{equation}
b = \frac{Cov(X_i, Y_i)}{Var(X_i)} = \frac{\sigma_{XY}}{\sigma_{X}^2}
\end{equation}`
$$

$$
`\begin{equation}
a = \alpha =
\mathop{\mathbb{E}}[Y_i] - b\mathop{\mathbb{E}}[X_i]
\end{equation}`
$$
]

.pull-right[
.center[**Sample**]

$$
`\begin{equation}
b = \widehat \beta = \frac{\widehat{Cov}(X_i, Y_i)}{\widehat{Var}(X_i)} = \frac{\widehat\sigma_{XY}}{\widehat\sigma_{X}^2}
\end{equation}`
$$

$$
`\begin{equation}
a = \widehat \alpha = \overline{Y} - b\overline{X}
\end{equation}`
$$
]
]

---

# .font90[**O**rdinary **L**east **S**quares (OLS): Coefficient Formulas 4/4]

<br><br>

- The main takeaway:

.font200[
$$
`\begin{equation}
b = \frac{Cov(X_i, Y_i)}{Var(X_i)}
\end{equation}`
$$
]

---

# Properties of Residuals 1/2

- As we saw at the beginning of this class, in a regression the observed outcome `\((Y_i)\)` can be separated into a component "explained" by the regression equation (aka the model) and a residual component:

$$
`\begin{equation}
Y_i = \underbrace{\widehat Y_i}_{\text{fitted values (explained)}} + \underbrace{e_i}_{\text{residuals}}
\end{equation}`
$$

- Two important properties of the residuals:

  1. They have expectation 0: `\(E(e_i) = 0\)`
  1. They are uncorrelated with all the regressors that made them and with the corresponding fitted values. For each regressor `\(X_{ki}\)`: `\(E[X_{ki} e_i] = 0\)` and `\(E[\widehat Y_{i} e_i] = 0\)`

---

# Properties of Residuals 2/2

- We take these properties as given in this course (they come from the calculus of the minimization problem).

- One important point is that these properties always hold, even when the coefficients are biased.

- This does not imply, however, that we have solved the selection bias problem.

- In the traditional way of teaching econometrics, these two concepts are mixed together (hence the need to distinguish between residuals `\((e_i)\)` and unobservables `\((u_i)\)`).

---

# (OLS with R)

.font90[

* In `R`, OLS regressions are estimated using the `lm` function.
* This is how it works:

```r
lm(formula = dependent variable ~ independent variable)
```

Let's estimate the following model by OLS: `\(\textrm{average math score}_i = a + b \, \textrm{class size}_i + e_i\)`

.pull-left[

```r
# OLS regression of average math score on class size
lm(avgmath_cs ~ classize, grades_avg_cs)
```
]

.pull-right[

```
#> 
#> Call:
#> lm(formula = avgmath_cs ~ classize, data = grades_avg_cs)
#> 
#> Coefficients:
#> (Intercept)     classize  
#>     61.1092       0.1913
```
]
]

---

# Acknowledgments

.pull-left[
- [Ed Rubin's Undergraduate Econometrics II](https://github.com/edrubin/EC421W19)
- [ScPoEconometrics](https://raw.githack.com/ScPoEcon/ScPoEconometrics-Slides/master/chapter_causality/chapter_causality.html#1)
- MM
]

.pull-right[
]
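
---

# (Appendix: Residual Properties, Numerically)

The two residual properties stated earlier can be checked on a toy example. A minimal sketch in Python (the course itself uses R; the data below are made up for illustration): fit the line with `\(b = \widehat\sigma_{XY} / \widehat\sigma_X^2\)`, then confirm that the residuals average to zero and are uncorrelated with the regressor and the fitted values.

```python
# Sketch, not from the course materials -- data are made up for illustration.
# Properties to check: E(e_i) = 0, E[X_i e_i] = 0, and E[Yhat_i e_i] = 0.
X = [20.0, 25.0, 30.0, 35.0, 40.0]
Y = [58.0, 60.0, 63.0, 62.0, 66.0]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# OLS coefficients via b = Cov(X, Y) / Var(X)
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
     / sum((x - mean_x) ** 2 for x in X))
a = mean_y - b * mean_x

fitted = [a + b * x for x in X]
resid = [y - yhat for y, yhat in zip(Y, fitted)]

mean_e = sum(resid) / n                                  # E(e_i)
mean_xe = sum(x * e for x, e in zip(X, resid)) / n       # E[X_i e_i]
mean_ye = sum(f * e for f, e in zip(fitted, resid)) / n  # E[Yhat_i e_i]

print(abs(mean_e) < 1e-9, abs(mean_xe) < 1e-9, abs(mean_ye) < 1e-9)
```

These hold regardless of whether the coefficients are biased, which is exactly the point made on the "Properties of Residuals 2/2" slide.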