
Ec140 - Causality and Selection Bias

Fernando Hoces la Guardia

06/29/2022

1 / 24

Today's Lecture

  • Our First Causal Question in Real Life

    • Causality
    • Correlation v. Causation
    • Other things equal
  • Selection Bias

2 / 24

Causal Inference to Inform Policy: Setting

Access to health insurance is a huge political issue in the US. Subsidizing the provision and mandating the adoption of insurance was at the core of the heavily debated Affordable Care Act, also known as Obamacare.

Policy: Subsidize and/or mandate health insurance for the entire population.

Rationale: Increasing access to health care (through insurance) can improve the health outcomes of the population.

  • Can you think of another rationale?

Let's look at some data to investigate this rationale.

3 / 24

National Health Interview Survey, 2009

  • This is just a random sample of 100 observations from the real dataset. The complete data contains 80634 observations (individuals).

  • What tools from the course (so far) should we use to look at this data?

5 / 24

National Health Interview Survey, 2009 (MM, Ch1)

6 / 24

National Health Interview Survey, 2009: Notes

7 / 24

Let's Read These Summary Statistics

  • E(Y|X) ?
  • σ ?
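
As a rough illustration of the two quantities above, the sketch below computes a conditional mean E(Y|X) and a standard deviation σ within each group. The records are made up for illustration (they are not the NHIS data), and the helper name `conditional_summary` is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical records, NOT the NHIS data: (has_insurance, health_index on a 1-5 scale)
records = [(1, 4), (1, 5), (1, 3), (1, 4), (0, 3), (0, 4), (0, 2), (0, 3)]

def conditional_summary(data, x_value):
    """Return the mean and sample standard deviation of Y among rows with X == x_value."""
    ys = [y for x, y in data if x == x_value]
    return mean(ys), stdev(ys)

mean_insured, sd_insured = conditional_summary(records, 1)      # E(Y|X=1) and its sigma
mean_uninsured, sd_uninsured = conditional_summary(records, 0)  # E(Y|X=0) and its sigma
difference = mean_insured - mean_uninsured

print(f"E(Y|X=1) = {mean_insured:.2f} (sd {sd_insured:.2f})")
print(f"E(Y|X=0) = {mean_uninsured:.2f} (sd {sd_uninsured:.2f})")
print(f"Difference in means = {difference:.2f}")
```

The difference in conditional means is a purely descriptive number; nothing in this calculation says whether it can be read causally.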
9 / 24

National Health Interview Survey, 2009 (MM, Ch1)

  • Can we interpret these differences causally?
10 / 24

The Concept of Causality

Causality: what are we talking about?

  • We say that X causes Y

    • if we were to intervene and change the value of X without changing anything else...

    • then Y would also change as a result.

  • The key point here is the "without changing anything else" part, often referred to as the other things equal assumption (or ceteris paribus if you want to sound fancy).

  • ⚠️ It does NOT mean that X is the only factor that causes Y.
11 / 24

Correlation vs Causation

"Correlation does not equal causation" has become a ubiquitous mantra, but can you explain why it is true?

Some correlations obviously don't imply causation (e.g., the spurious correlations website).

12 / 24

Correlation vs Causation: Smoking and Lung Cancer

But not all correlations are so easy to rule out.

Does smoking cause lung cancer?

  • Today, we know the answer is YES!

  • But let's go back to the 1950s.

    • We are at the start of a big increase in deaths from lung cancer...

    • ... which is happening after a fast growth in cigarette consumption

  • It's very tempting to claim that smoking causes lung cancer based on this graph.

13 / 24

Correlation vs Causation: Smoking and Lung Cancer

At the time many people were still skeptical, including some famous statisticians:

Macro confounding factors:

Other macro factors which can cause cancers also changed between 1900 and 1950:

  • Tarring of roads,

  • Inhalation of motor exhausts (leaded gasoline fumes),

  • Generally greater air pollution.

Self-selection:

Smokers and non-smokers may be different in the first place:

  • Selection on observable characteristics: age, education, income, etc.

  • Selection on unobservable characteristics: genes (the hypothetical confounding genome theory of Fisher).
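
To see why the self-selection argument was hard to dismiss at the time, here is a minimal simulation sketch of the skeptics' hypothetical world. It assumes, purely for illustration, that an unobserved gene raises both the chance of smoking and the chance of cancer, while smoking itself has no effect. All probabilities and names (`simulate_person`, `cancer_gap`) are invented, and we now know this hypothetical world is not the real one.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_person():
    """One hypothetical person: (has_gene, smokes, gets_cancer)."""
    has_gene = random.random() < 0.3
    # Assumption: the gene makes smoking more likely...
    smokes = random.random() < (0.7 if has_gene else 0.2)
    # ...and the gene alone drives cancer risk; smoking has NO effect here.
    gets_cancer = random.random() < (0.20 if has_gene else 0.02)
    return has_gene, smokes, gets_cancer

people = [simulate_person() for _ in range(200_000)]

def cancer_gap(rows):
    """Cancer rate of smokers minus cancer rate of non-smokers."""
    smokers = [cancer for _, smokes, cancer in rows if smokes]
    non_smokers = [cancer for _, smokes, cancer in rows if not smokes]
    return sum(smokers) / len(smokers) - sum(non_smokers) / len(non_smokers)

raw_gap = cancer_gap(people)                                    # ignores the gene
gap_with_gene = cancer_gap([p for p in people if p[0]])         # gene carriers only
gap_without_gene = cancer_gap([p for p in people if not p[0]])  # non-carriers only

print(f"Raw smoker vs non-smoker gap:  {raw_gap:.3f}")
print(f"Gap among gene carriers:       {gap_with_gene:.3f}")
print(f"Gap among non-carriers:        {gap_without_gene:.3f}")
```

In this made-up world the raw comparison shows smokers with clearly more cancer, yet the gap roughly vanishes once we compare people with the same gene status. The difficulty is that such a characteristic is unobservable, so the raw data alone cannot tell the two stories apart.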

14 / 24

Back to Our Original Example: Health and Health Insurance

  • Can we interpret these differences causally?

  • Are all other things equal between insured and uninsured?

15 / 24

Selection Bias

Wikipedia Definition:

Selection bias is the bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed.

  • Econometrics textbooks tend to define selection bias in terms of a regression or (as MM does) a randomized controlled trial.

  • We will start from this more general definition to connect it with the concept of conditional expectation.

  • Then we will connect it with regression and experiments.

16 / 24

SB Example 1: Airplanes in World War II

17 / 24

SB Example 1: Airplanes in World War II. Using Expectation 1/2

  • How would you use conditional expectations to characterize this problem?

  • Let's start by simplifying the problem: assume that each plane has only two sections. Now define two binary (Bernoulli) random variables that indicate whether the plane received damage in location 1 and in location 2 (DL1: {not damaged in location 1, damaged in location 1} → {0, 1}, and the same for DL2).

  • We also need to define the random variable that we are conditioning on. In this case, let's use a binary variable for return (R: {plane didn't return, plane returned} → {0, 1}).

18 / 24

SB Example 1: Airplanes in World War II. Using Expectation 2/2

  • One way of characterizing the problem is that the engineers thought they were observing E(DL1) and E(DL2), and concluded that E(DL1) > E(DL2).

  • But in fact they were observing E(DL1|R=1) and E(DL2|R=1), and most likely E(DL1|R=0) < E(DL2|R=0).

  • If you don't like the math notation, you can provide the same answer, but in narrative form.

  • This is called survivorship bias, and is a type of selection bias.
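
The two-location setup above can be sketched as a small simulation. The damage and return probabilities are assumptions chosen only for illustration: both locations are hit equally often, but a hit in location 2 makes returning much less likely. The names `simulate_plane` and `mean_where` are hypothetical helpers.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def simulate_plane():
    """One hypothetical plane: (DL1, DL2, R), each coded 0 or 1."""
    dl1 = int(random.random() < 0.5)  # damaged in location 1
    dl2 = int(random.random() < 0.5)  # damaged in location 2
    # Assumption: damage in location 2 is far more dangerous than in location 1.
    p_return = 0.9 - 0.1 * dl1 - 0.7 * dl2
    returned = int(random.random() < p_return)
    return dl1, dl2, returned

planes = [simulate_plane() for _ in range(100_000)]

def mean_where(rows, index, returned=None):
    """Average of column `index`, optionally only over planes with R == returned."""
    selected = [row[index] for row in rows if returned is None or row[2] == returned]
    return sum(selected) / len(selected)

e_dl1, e_dl2 = mean_where(planes, 0), mean_where(planes, 1)                 # E(DL1), E(DL2)
e_dl1_ret, e_dl2_ret = mean_where(planes, 0, 1), mean_where(planes, 1, 1)   # given R=1
e_dl1_lost, e_dl2_lost = mean_where(planes, 0, 0), mean_where(planes, 1, 0) # given R=0

print(f"All planes:     E(DL1)={e_dl1:.2f}  E(DL2)={e_dl2:.2f}")
print(f"Returned (R=1): E(DL1|R=1)={e_dl1_ret:.2f}  E(DL2|R=1)={e_dl2_ret:.2f}")
print(f"Lost (R=0):     E(DL1|R=0)={e_dl1_lost:.2f}  E(DL2|R=0)={e_dl2_lost:.2f}")
```

Under these assumptions the returned planes show far more damage in location 1 than in location 2, even though both locations are hit equally often overall. Looking only at R=1 would point the armor at exactly the wrong place.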

19 / 24

SB Example 2: Health Insurance 1/2

  • We can do something similar for our health insurance example.
  • The "hidden" information could be many things. For example: maybe uninsured people have different standards of what constitutes good health, and for the same true health status, the uninsured tend to report much higher scores than the insured (thanks Andy!).
20 / 24

SB Example 2: Health Insurance 2/2

  • Define a binary random variable that indicates whether an individual tends to over-report good health (ORep: {does not over-report, over-reports} → {0, 1}). In this case the previous comparison translates into:
  • E(H|HI=1,ORep=0) for column (4), and E(H|HI=0,ORep=1) for column (5).
  • This is a violation of the other things equal assumption.
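
A minimal sketch of this example follows, taking the case described earlier in which the uninsured over-report (so ORep = 1 exactly when HI = 0). The health scale, the size of the true gap, and the size of the over-reporting bump are all invented for illustration, and `simulate_person` and `group_gap` are hypothetical helpers.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_GAP = 0.5          # assumed difference in true health, insured minus uninsured
OVER_REPORT_BUMP = 0.3  # assumed amount that over-reporters add to their score

def simulate_person():
    """One hypothetical person: (HI, ORep, true_health, reported_health)."""
    hi = int(random.random() < 0.8)
    orep = 1 - hi  # assumption: over-reporting lines up exactly with being uninsured
    true_health = random.gauss(3.5 + TRUE_GAP * hi, 1.0)
    reported_health = true_health + OVER_REPORT_BUMP * orep
    return hi, orep, true_health, reported_health

people = [simulate_person() for _ in range(100_000)]

def group_gap(rows, index):
    """Mean of column `index` for the insured minus the mean for the uninsured."""
    insured = [row[index] for row in rows if row[0] == 1]
    uninsured = [row[index] for row in rows if row[0] == 0]
    return sum(insured) / len(insured) - sum(uninsured) / len(uninsured)

true_gap = group_gap(people, 2)      # gap in true health (never observed in a survey)
reported_gap = group_gap(people, 3)  # E(H|HI=1,ORep=0) - E(H|HI=0,ORep=1)

print(f"Gap in true health:     {true_gap:.2f}")
print(f"Gap in reported health: {reported_gap:.2f}")
```

Because ORep changes together with HI, the reported gap mixes the difference in health with the difference in reporting standards. In this sketch the reported gap is smaller than the true gap by exactly the assumed bump.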
21 / 24

SB Example 3: Country Characterization by Foreign Visitors

  • Characterization of Americans according to foreigners visiting Berkeley.

  • Characterization of Chinese according to foreigners visiting a specific city.

22 / 24

More Examples

23 / 24

Acknowledgments

  • Matt Hollian
  • Causal Mixtape (Also Hanny Fry)
  • The plane pic
  • MM bookdown and MM blog post
24 / 24
