class: center, middle, inverse, title-slide

# Ec140 - Causality and Selection Bias
### Fernando Hoces la Guardia
### 06/29/2022

---

<style type="text/css">
.remark-slide-content {
    font-size: 30px;
    padding: 1em 1em 1em 1em;
}
</style>

# Today's Lecture

- Our First Causal Question in Real Life
  - Causality
  - Correlation vs Causation
  - Other things equal

- Selection Bias

---
# Causal Inference to Inform Policy: Setting

Access to health insurance is a huge political issue in the US. Subsidizing the provision of insurance and mandating its adoption were at the core of the heavily debated Affordable Care Act, also known as *Obamacare*.

__Policy:__ Subsidize and/or mandate health insurance for the entire population.

__Rationale:__ Increasing access to health care (through insurance) can improve the health outcomes of the population. 
  - Can you think of another rationale? 
    
Let's look at some data to investigate this rationale.

---
# National Health Interview Survey, 2009

.pull-left[
- This is just a random sample of 100 observations from the real dataset. The complete data contains 80634 observations (individuals).

]

.pull-right[
.font70[
<div id="htmlwidget-c0d9436d036f0f3c1032" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-c0d9436d036f0f3c1032">{"x":{"filter":"none","vertical":false,"caption":"<caption>2009 National Health Interview Survey<\/caption>","fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100"],[1,1,1,1,0,0,1,0,1,0,1,1,0,1,0,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,0,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1],[1,1,1,1,1,1,0,0,0,0,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,1,1,0,1,1,1,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,0,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,1,0,0,1,1,1],[24,26,15,37,43,22,71,19,62,45,65,42,23,80,23,50,18,64,30,22,45,43,42,79,37,4,75,33,3,44,54,39,16,20,9,19,8,21,36,11,12,6,53,20,53,48,44,74,20,50,22,32,22,60,33,20,1,39,3,60,34,43,8,33,25,22,8,22,30,15,31,29,63,47,22,36,12,21,17,19,27,84,31,55,32,24,56,66,52,29,42,85,23,44,29,52,13,70,53,59],[3,3,4,3,2,3,3,4,5,3,3,5,3,3,4,5,5,5,5,5,5,5,5,1,5,5,2,3,3,5,3,3,5,5,5,4,5,5,5,4,5,5,3,3,4,4,5,5,5,5,4,4,4,3,3,5,5,4,4,3,5,5,4,5,5,1,4,4,5,3,2,5,5,4,4,5,4,3,2,4,5,2,4,3,5,4,4,4,3,4,5,2,4,4,3,3,4,4,2,4],[1134,5514,3321,3466,6609,2360,4210,1584,2885,2077,1608,2631,1879,2584,6855,2413,2181,2403,2772,2052,6950,1681,3434,1561,3090,5336,4400,1271,949,4395,1192,4864,2022,6472,14225,6447,1613,2946,2529,1651,825,3886,1946,5021,2715,4828,2990,1108,4026,4431,3325,4866,5782,3401,1263,1247,2170,2907,1184,3425,12017,3421,4937,3436,3745,4382,2447,8627,4208,3175,3583,2042,3202,3889,1946,1816,3224,4673,1329,1268,3745,4614,5583,3722,2850,4548,2241,1293,3218,4349,3167,4562,4523,1694,2390,1573,4366,2714,2491,6573]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th><span style=\"color: #007935 !important\">Insurance?<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Female?<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Age<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Health<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Weight<\/span><\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":6,"lengthChange":false,"searching":false,"columnDefs":[{"className":"dt-right","targets":[1,2,3,4,5]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"rowCallback":"function(row, data, displayNum, displayIndex, dataIndex) {\nvar value=data[1]; $(this.api().cell(row, 1).node()).css({'color':'#9370DB'});\nvar value=data[2]; $(this.api().cell(row, 2).node()).css({'color':'#9370DB'});\nvar value=data[3]; $(this.api().cell(row, 3).node()).css({'color':'#9370DB'});\nvar value=data[4]; $(this.api().cell(row, 4).node()).css({'color':'#9370DB'});\nvar value=data[5]; $(this.api().cell(row, 5).node()).css({'color':'#9370DB'});\nvar value=data[0]; $(this.api().cell(row, 0).node()).css({'color':'#FD5F00'});\n}"}},"evals":["options.rowCallback"],"jsHooks":[]}</script>
]
]

---
# National Health Interview Survey, 2009

.pull-left[
- This is just a random sample of 100 observations from the real dataset. The complete data contains 80634 observations (individuals).

- What tools from the course (so far) should we use to look at this data?
]

.pull-right[
.font70[
<div id="htmlwidget-c0d9436d036f0f3c1032" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-c0d9436d036f0f3c1032">{"x":{"filter":"none","vertical":false,"caption":"<caption>2009 National Health Interview Survey<\/caption>","fillContainer":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60","61","62","63","64","65","66","67","68","69","70","71","72","73","74","75","76","77","78","79","80","81","82","83","84","85","86","87","88","89","90","91","92","93","94","95","96","97","98","99","100"],[1,1,1,1,0,0,1,0,1,0,1,1,0,1,0,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,0,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1],[1,1,1,1,1,1,0,0,0,0,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,1,1,0,1,1,1,0,0,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,0,1,1,0,0,0,1,0,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,1,0,0,1,1,1],[24,26,15,37,43,22,71,19,62,45,65,42,23,80,23,50,18,64,30,22,45,43,42,79,37,4,75,33,3,44,54,39,16,20,9,19,8,21,36,11,12,6,53,20,53,48,44,74,20,50,22,32,22,60,33,20,1,39,3,60,34,43,8,33,25,22,8,22,30,15,31,29,63,47,22,36,12,21,17,19,27,84,31,55,32,24,56,66,52,29,42,85,23,44,29,52,13,70,53,59],[3,3,4,3,2,3,3,4,5,3,3,5,3,3,4,5,5,5,5,5,5,5,5,1,5,5,2,3,3,5,3,3,5,5,5,4,5,5,5,4,5,5,3,3,4,4,5,5,5,5,4,4,4,3,3,5,5,4,4,3,5,5,4,5,5,1,4,4,5,3,2,5,5,4,4,5,4,3,2,4,5,2,4,3,5,4,4,4,3,4,5,2,4,4,3,3,4,4,2,4],[1134,5514,3321,3466,6609,2360,4210,1584,2885,2077,1608,2631,1879,2584,6855,2413,2181,2403,2772,2052,6950,1681,3434,1561,3090,5336,4400,1271,949,4395,1192,4864,2022,6472,14225,6447,1613,2946,2529,1651,825,3886,1946,5021,2715,4828,2990,1108,4026,4431,3325,4866,5782,3401,1263,1247,2170,2907,1184,3425,12017,3421,4937,3436,3745,4382,2447,8627,4208,3175,3583,2042,3202,3889,1946,1816,3224,4673,1329,1268,3745,4614,5583,3722,2850,4548,2241,1293,3218,4349,3167,4562,4523,1694,2390,1573,4366,2714,2491,6573]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th><span style=\"color: #007935 !important\">Insurance?<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Female?<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Age<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Health<\/span><\/th>\n      <th><span style=\"color: #007935 !important\">Weight<\/span><\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":6,"lengthChange":false,"searching":false,"columnDefs":[{"className":"dt-right","targets":[1,2,3,4,5]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false,"rowCallback":"function(row, data, displayNum, displayIndex, dataIndex) {\nvar value=data[1]; $(this.api().cell(row, 1).node()).css({'color':'#9370DB'});\nvar value=data[2]; $(this.api().cell(row, 2).node()).css({'color':'#9370DB'});\nvar value=data[3]; $(this.api().cell(row, 3).node()).css({'color':'#9370DB'});\nvar value=data[4]; $(this.api().cell(row, 4).node()).css({'color':'#9370DB'});\nvar value=data[5]; $(this.api().cell(row, 5).node()).css({'color':'#9370DB'});\nvar value=data[0]; $(this.api().cell(row, 0).node()).css({'color':'#FD5F00'});\n}"}},"evals":["options.rowCallback"],"jsHooks":[]}</script>
]
]

---
background-image: url("Images/MMtbl11.png")
background-size: contain

# National Health Interview Survey, 2009 (MM, Ch1)

---
background-image: url("Images/MMtbl11_notes.png")
background-size: contain

# National Health Interview Survey, 2009: Notes

---
background-image: url("Images/MMtbl11_notes.png")
background-size: 100%
background-position: 50% 100%

# National Health Interview Survey, 2009: Notes

---
background-image: url("Images/MMtbl11.png")
background-size: 70%
background-position: 60% 20%

# Let's Read These Summary Statistics

.pull-left[
.font110[
- `\(\mathop{\mathbb{E}}(Y|X)\)` ?
- `\(\sigma\)` ?
]
]
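
---

# Reading Summary Statistics: A Small Sketch

A minimal sketch, not the course's own code, of how the conditional means `\(\mathop{\mathbb{E}}(Y|X)\)` and standard deviations `\(\sigma\)` in a table like this can be computed. It uses only the first ten rows of the 100-observation sample shown earlier, so the numbers are purely illustrative; the variable and function names are assumptions.

```python
from statistics import mean, stdev

# First ten rows of the 100-observation NHIS sample shown earlier.
insurance = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]  # 1 = has health insurance
health = [3, 3, 4, 3, 2, 3, 3, 4, 5, 3]     # health index

def health_given(insured):
    """Health values of the rows whose insurance indicator equals `insured`."""
    return [h for h, i in zip(health, insurance) if i == insured]

# Sample analogues of E(Health | Insurance = 1) and E(Health | Insurance = 0).
mean_insured = mean(health_given(1))
mean_uninsured = mean(health_given(0))

# Sample standard deviations within each group.
sd_insured = stdev(health_given(1))
sd_uninsured = stdev(health_given(0))

print("means:", mean_insured, mean_uninsured)
print("difference in means:", mean_insured - mean_uninsured)
print("standard deviations:", round(sd_insured, 2), round(sd_uninsured, 2))
```

Ten rows are far too few to learn anything about the population; the point is only to show what each cell of the table measures.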

---
background-image: url("Images/MMtbl11_health.png")
background-size: 60%
background-position: 100% 50%

# National Health Interview Survey, 2009 (MM, Ch1)

.pull-left[
.font130[
- Can we interpret <br>
these differences <br>
**causally**?
]
]

---

# The Concept of Causality

__Causality__: what are we talking about?

- We say that `\(X\)` *causes* `\(Y\)`

--

- if we were to intervene and *change* the value of `\(X\)` ***without changing anything else***...
    
--

- then `\(Y\)` would also change ***as a result***.
  
--

- The key phrase here is ***without changing anything else***, often referred to as the **other things equal** assumption (or *ceteris paribus* if you want to sound fancy).

--

- ⚠️ It does **NOT** mean that `\(X\)` is the only factor that causes `\(Y\)`.

---
# Correlation vs Causation

***Correlation does not equal causation*** has become a ubiquitous mantra, but can you tell why it is true?

--

Some correlations obviously don't imply causation ([e.g. spurious correlation website](https://www.tylervigen.com/spurious-correlations)).

--

<img src="Images/spurious.png" width="800px" style="display: block; margin: auto;" />

---
background-image: url("Images/Smoking_lung_cancer.png")
background-size: 40%
background-position: 10% 70%

# Correlation vs Causation: Smoking and Lung Cancer
.pull-left[
.font90[But not all correlations are so easy to rule out.]
]

.pull-right[
.font90[
***Does smoking cause lung cancer?***

- Today, we know the answer is *YES*!

- But let's go back to the 1950s.

- We are at the start of a big increase in deaths from lung cancer...
  
  - ... which is happening after a fast growth in cigarette consumption

- It's very tempting to claim that smoking causes lung cancer based on this graph.
]
]

---

# Correlation vs Causation: Smoking and Lung Cancer

At the time many people were still skeptical, including some famous statisticians:

.pull-left[
.font90[
Macro confounding factors:

Other macro factors that can cause cancer also changed between 1900 and 1950:

- Tarring of roads,

- Inhalation of motor exhausts (leaded gasoline fumes),

- Generally greater air pollution.
  ]
]

.pull-right[
.font90[
Self-selection:

Smokers and non-smokers may be different in the first place: 
  
  - __Selection on observable characteristics__: age, education, income, etc.
  
  - __Selection on unobservable characteristics__: genes (the hypothetical genetic confounder proposed by [Fisher](https://en.wikipedia.org/wiki/Ronald_Aylmer_Fisher)). 
  ]
]
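
---

# Correlation vs Causation: A Simulated Confounder

To make the self-selection argument concrete, here is a rough simulation of a made-up world in the spirit of Fisher's objection. By construction (and contrary to what we know today), smoking has *no* effect on cancer here; a hidden gene raises both. All probabilities and names are invented for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000

rows = []
for _ in range(n):
    gene = rng.random() < 0.5                         # hidden confounder
    smokes = rng.random() < (0.8 if gene else 0.2)    # the gene raises smoking
    cancer = rng.random() < (0.30 if gene else 0.05)  # the gene raises cancer; smoking plays no role
    rows.append((gene, smokes, cancer))

def cancer_rate(smokes, gene=None):
    """Share with cancer among people with this smoking status (and gene status, if given)."""
    group = [c for g, s, c in rows if s == smokes and (gene is None or g == gene)]
    return sum(group) / len(group)

# Naive comparison: smokers vs non-smokers, ignoring the gene.
naive_gap = cancer_rate(True) - cancer_rate(False)

# Same comparison, holding the gene fixed ("other things equal").
gap_with_gene = cancer_rate(True, gene=True) - cancer_rate(False, gene=True)
gap_without_gene = cancer_rate(True, gene=False) - cancer_rate(False, gene=False)

print(f"naive gap:                 {naive_gap:.3f}")
print(f"gap among gene carriers:   {gap_with_gene:.3f}")
print(f"gap among non-carriers:    {gap_without_gene:.3f}")
```

The naive gap is clearly positive even though smoking does nothing in this simulated world, while the gaps within each gene group are close to zero. With real data the gene is unobserved, so the second comparison is not available.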

---
background-image: url("Images/MMtbl11_health.png")
background-size: 60%
background-position: 100% 50%

# .font90[Back to Our Original Example: Health and Health Insurance]

.pull-left[
.font130[
- Can we interpret <br>
these differences <br>
**causally**?

- Are all **other** <br>
**things equal** between <br>
insured and uninsured?
]
]

---
# Selection Bias 
[Wikipedia Definition](https://en.wikipedia.org/wiki/Selection_bias):
>Selection bias is the bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed.

- Econometrics textbooks tend to define selection bias in terms of a regression or (as in MM) a randomized controlled trial.

- We will start from this more general definition to connect with the concept of **conditional expectation**.

- Then we will connect with regression and experiments.

---
background-image: url("Images/Survivorship-bias.png")
background-size: contain

# SB Example 1: Airplanes in World War II

---
background-image: url("Images/Survivorship-bias.png")
background-size: 40%
background-position: 100% 50%

# .font80[SB Example 1: Airplanes in World War II. Using Expectation 1/2]

.pull-left[
.font70[

- How would you use conditional expectations to characterize this problem?

- Let's start by simplifying the problem: assume that each plane has only two sections. Now define two binary (Bernoulli) random variables that indicate whether the plane was damaged in location 1 and in location 2: `\(DL1:\{\text{not damaged in location 1, damaged in location 1}\} \to \{0,1\}\)`, and likewise for `\(DL2\)`.

- We also need to define the random variable that we are conditioning on. In this case, let's use a binary variable for return: `\(R:\{\text{plane didn't return, plane returned} \}\to\{0,1\}\)`.

]
]

---
background-image: url("Images/Survivorship-bias.png")
background-size: 40%
background-position: 100% 50%

# .font80[SB Example 1: Airplanes in World War II. Using Expectation 2/2]

.pull-left[
.font70[

- One way of characterizing the problem is that the engineers thought they were observing `\(\mathop{\mathbb{E}}(DL1)\)` and `\(\mathop{\mathbb{E}}(DL2)\)`, and concluded that `\(\mathop{\mathbb{E}}(DL1) > \mathop{\mathbb{E}}(DL2)\)`.

- But in fact they were observing `\(\mathop{\mathbb{E}}(DL1|R=1)\)` and `\(\mathop{\mathbb{E}}(DL2|R=1)\)`, and most likely `\(\mathop{\mathbb{E}}(DL1|R=0) < \mathop{\mathbb{E}}(DL2|R=0)\)`.

- If you don't like the math notation, you can provide the same answer, but in narrative form.

- This is called [survivorship bias](https://en.wikipedia.org/wiki/Survivorship_bias), and is a type of selection bias. 
]
]
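
---

# SB Example 1: A Numerical Sketch

A small worked example of the argument above, using invented numbers rather than historical data. It assumes both locations are hit with the same probability, independently, and that a hit in location 2 makes a plane much less likely to return. The function names and probabilities are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

P_HIT = Fraction(1, 2)  # assumed: each location is hit with probability 1/2

def p_pattern(dl1, dl2):
    """Probability of a damage pattern, with independent hits."""
    p1 = P_HIT if dl1 else 1 - P_HIT
    p2 = P_HIT if dl2 else 1 - P_HIT
    return p1 * p2

def p_return(dl1, dl2):
    """Assumed chance of returning: a hit in location 2 is far more dangerous."""
    return Fraction(9, 10) - Fraction(1, 10) * dl1 - Fraction(7, 10) * dl2

def cond_mean(location, returned):
    """E(DL_location | R = returned), by enumerating the four damage patterns."""
    num = den = Fraction(0)
    for dl1, dl2 in product((0, 1), repeat=2):
        p_r = p_return(dl1, dl2) if returned else 1 - p_return(dl1, dl2)
        weight = p_pattern(dl1, dl2) * p_r
        den += weight
        num += weight * (dl1, dl2)[location - 1]
    return num / den

print("returned planes:", cond_mean(1, returned=1), cond_mean(2, returned=1))
print("lost planes:    ", cond_mean(1, returned=0), cond_mean(2, returned=0))
```

Under these assumed numbers, `\(\mathop{\mathbb{E}}(DL1|R=1) = 9/20\)` exceeds `\(\mathop{\mathbb{E}}(DL2|R=1) = 3/20\)`, while `\(\mathop{\mathbb{E}}(DL1|R=0) = 11/20\)` is below `\(\mathop{\mathbb{E}}(DL2|R=0) = 17/20\)`, even though both locations are hit equally often overall.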

---
background-image: url("Images/MMtbl11_health.png")
background-size: 50%
background-position: 100% 50%
# SB Example 2: Health Insurance 1/2
.font70[
.pull-left[
- We can do something similar for our health insurance example. 
- The "hidden" information could be many things. For example: maybe insured and uninsured people have different standards for what constitutes good health, so that, for the same true health status, insured people tend to report much higher scores than uninsured people (thanks Andy!). 
]
]
---
background-image: url("Images/MMtbl11_health.png")
background-size: 50%
background-position: 100% 50%
# SB Example 2: Health Insurance 2/2
.font70[
.pull-left[
- Define a binary random variable that indicates whether an individual tends to over-report good health `\((ORep:\{\text{does not over-report, over-reports} \}\to\{0,1\})\)`. In this case, the previous comparison translates into: 
- `\(\mathop{\mathbb{E}}(H|HI=1, \color{#FD5F00}{ORep = 1} )\)`  for column (4), and `\(\mathop{\mathbb{E}}(H|HI=0, \color{#FD5F00}{ORep = 0} )\)`  for column (5).
- This is a violation of the *other things equal* assumption. 
]
]
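---
# SB Example 2: Splitting the Comparison

A sketch of why the comparison between columns (4) and (5) mixes two things, taking `\(H\)` as reported health and `\(HI\)` as the insurance indicator. It adds and subtracts `\(\mathop{\mathbb{E}}(H|HI=0, ORep=1)\)`, a quantity that the table does not report.

- Observed gap: `\(\mathop{\mathbb{E}}(H|HI=1, ORep=1) - \mathop{\mathbb{E}}(H|HI=0, ORep=0)\)`

- equals the insurance gap with reporting held fixed: `\(\mathop{\mathbb{E}}(H|HI=1, ORep=1) - \mathop{\mathbb{E}}(H|HI=0, ORep=1)\)`

- plus the reporting gap among the uninsured: `\(\mathop{\mathbb{E}}(H|HI=0, ORep=1) - \mathop{\mathbb{E}}(H|HI=0, ORep=0)\)`

Only the first piece compares insured and uninsured people with other things equal; the second piece is selection bias.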
---
# .font90[SB Example 3: Country Characterization by Foreign Visitors]

- Characterization of Americans according to foreigners visiting Berkeley.

- Characterization of Chinese people according to foreigners visiting a specific city.

---
background-image: url("Images/selection_bias_2x.png")
background-size: contain
background-position: 90% 50%

# More Examples
.pull-left[
- Convention of Statisticians. [XKCD](https://xkcd.wtf/2618/)
- [Heike Crabs](https://www.youtube.com/watch?v=dIeYPHCJ1B8)
- Appearance and Intelligence of Movie Stars (From [Causal Inference, The Mixtape](https://mixtape.scunning.com/03-directed_acyclical_graphs#sample-selection-and-collider-bias))
- Think of at least two examples yourself!
- ([Hernán Casciari on Surveys](https://www.youtube.com/watch?v=_wHXjs7PPTw) <br> [in Spanish, and strong language warning])
]

---
class: title-slide-final
background-image: url("Images/correlation.png")
background-size: 60%
background-position: 50% 100%

# Acknowledgments

.pull-left[
- [Kyle Raze's Undergraduate Econometrics 1](https://github.com/kyleraze/EC320_Econometrics)
- SoPo
- XKCD
- MM
]
.pull-right[
- [Matt Holian](http://mattholian.blogspot.com/2015/01/econometrics-and-kung-fu.html#more) 
- Causal Mixtape (Also Hannah Fry)
- The plane pic
- MM bookdown and MM blog post
]
