This video is on causal assumptions.
The primary learning objectives here are to understand some of the causal
assumptions that we need to make to link potential outcomes to observed data.
In particular, we aim to understand the following four assumptions,
what's known as SUTVA, consistency, ignorability, and positivity.
So identifiability of causal effects is going to require making some untestable assumptions.
Statistical identifiability in general has to do with identifying
some parameter from actual data.
So a parameter is considered identifiable if you are able to estimate it from data.
And in the causal inference area, there's this fundamental problem of
causal inference where we don't see both potential outcomes, and therefore we're
going to have to make some assumptions if we want to identify causal effects.
And, in particular, in the causal inference world,
some of the assumptions we have to make are untestable, and
these untestable kinds of assumptions are called causal assumptions.
And the most common assumptions are the following:
there's the Stable Unit Treatment Value Assumption, which is also known as SUTVA,
there's consistency, ignorability, and positivity,
and these assumptions will have to do with the observed data.
And we're going to assume, as we talk about these assumptions,
that our observed data consists of an outcome, Y, a treatment variable A,
and then some set of pre-treatment covariates X.
So X you could think of as the kinds of information
you might want to collect for your particular study.
So if you are in a medical setting, it could be demographics,
age, race, and so on, it could be clinical variables,
diagnosis, laboratory values, and so on.
But just think of these as a collection of variables that you might want to
control for, for example.
So our data consists of Y, A, and X. The first assumption,
known as SUTVA, you could actually think of as being two assumptions,
the first one being no interference.
What we mean by that is that units do not interfere with each other,
and units here would typically refer to people,
or whatever population your study is targeting.
So typically in biomedical research, we're talking about patients,
but what do we mean by interfere?
So that would have to do with whether the treatment assignment of one
person affects the outcome of another unit,
or the treatment effectiveness for another person.
And another word for this is either spillover or contagion, so
you could imagine a couple of scenarios where there would be interference.
So imagine you were doing some kind of behavioral intervention, so
your treatment is some kind of behavioral intervention, but
the people in your study interact with each other.
So, how effective the intervention is on one person might depend on
what intervention the other people that they interact with got, so
that would be an example where there is interference.
So, how effective a treatment is on a person,
on one person, might depend on what treatment other people got.
You could also think of this in vaccine studies where how effective
a vaccine is for
one person might depend on what proportion of the population receive the vaccine.
So typically we're going to assume no interference,
that this isn't happening, that when we assign somebody a treatment how
effective that is isn't dependent on what is happening with other people.
There are causal inference methods that can handle interference,
but we're not covering those in this course.
Another part of the SUTVA assumption is that there is one version of treatment,
and this is important because it's going to have to do with
having your potential outcomes being linked effectively to your observed data.
If there's multiple versions of treatment, it becomes difficult to understand
even what a causal effect means and it causes a number of other problems.
So, we think of one version of treatment where there's one variable that we can
hypothetically intervene on and it's very well defined what we mean by treatment.
So, if you make the SUTVA assumption,
the advantage of it is that you can write potential outcomes for
the ith person in terms of only their own treatments.
So, when we define potential outcomes,
we talked about the potential outcome as an outcome if the person
hypothetically received treatment, equal little a, for example.
We didn't define it in terms of the outcome that would be observed
given what everybody in the whole population received as treatment.
So it's the SUTVA assumption that allows us to do that, we don't need to write
potential outcomes in terms of the treatments of everybody else, we only need
to write the potential outcomes in terms of this particular person's treatment.
So this really simplifies the problem quite a bit, and
that's the reason this assumption is usually made; in many situations,
it will be a reasonable assumption.
The consistency assumption is the next one we'll talk about, and
in principle it's a pretty obvious or simple assumption: here we're
directly linking potential outcomes and observed data.
So we're saying that the potential outcome under treatment A equal little a,
which we define as Y superscript little a, that's just equal to
the observed outcome if the actual treatment received was A equal little a.
So when treatment is actually equal to little a,
our observed outcome is directly equal,
or corresponds, to the potential outcome, Y superscript little a.
So, if you remember, for
potential outcomes what we imagine Y superscript little a is the outcome
that would be observed if treatment actually took value little a.
So then if treatment does actually take value little a, then we're
saying that the observed outcome is equal to that potential outcome, so
this is just directly linking potential outcomes and observed outcomes.
So, in other words, our observed outcome, Y, is equal to potential outcome,
Y superscript little a, if treatment is equal to little a, and
that's true for all a, for any possible treatment.
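In symbols (writing out the notation described verbally above), the consistency assumption says:

```latex
Y = Y^{a} \quad \text{if } A = a, \text{ for all } a,
\qquad \text{or, for a binary treatment,} \qquad
Y = A\,Y^{1} + (1 - A)\,Y^{0} .
```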
Next we'll get into the ignorability assumption,
which is probably the most important assumption that we'll discuss, and
the one that people usually give the most attention to.
And this is also sometimes what we refer to as the no unmeasured confounders assumption.
So now we're going to have to involve these other kinds of variables,
these pre-treatment covariates X.
And the basic idea is that treatment assignment is assumed to be independent
from potential outcomes, conditional on these pre-treatment variables.
So these pre-treatment variables are a set of variables and
if we have the right ones and enough of them,
then we are assuming that effectively treatment is randomly assigned.
So, this notation here, the symbol in the middle there,
it means independence, so it's saying that potential outcomes Y zero
comma Y one are independent of treatment variable A conditional on X,
so conditional on these sort of baseline pre-treatment variables.
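Written out, with the double vertical bars denoting independence, that notation is:

```latex
(Y^{0}, Y^{1}) \perp\!\!\!\perp A \mid X
```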
So you could think of these X variables as
what people typically call confounding variables: treatment might
in practice be assigned, for example, to people who are older, or sicker.
So it's not randomly assigned, but once you control for things like age and
health, then we might be able to think of treatment as being randomly assigned.
So X is that collection of variables that are going to sort of create this
kind of independence.
So among people with the same values of X,
then we could essentially think of treatment as being randomly assigned.
And what we mean by random here is strictly that it's independent
of the potential outcomes, so it might not be random in some other sense.
So suppose, for example, your outcome here is blood pressure,
let's say systolic blood pressure. If treatment was assigned
completely independently of the expected response to treatment,
that is, if the clinician's determination of who should get treated
was independent of who would benefit from treatment,
then treatment assignment would be independent of the potential outcomes.
Of course, that's unrealistic, but now imagine that the clinicians are
basing the treatment decision on some variables that we've observed.
So, they might be more likely to give treatment to people who are older or
to people who have history of higher blood pressure, or those sorts of things,
things that we can capture on our dataset, those are a collection of X's.
And if we have enough of those, the idea is that now treatment is effectively
randomized, so that's the important assumption known as ignorability.
And it's called ignorability, because what we mean is that treatment assignment
itself becomes ignorable, it becomes a non-factor, as long as we have
enough of these X's, if we have the right covariates, now treatment assignment,
we don't have to worry about anymore, it's effectively randomized.
So this is, again, the notation, and we'll consider a simple example.
Suppose X is just a single variable, and we'll say that it's age;
to really simplify things, let's say it takes just two values.
So let's just say, older or younger is really the X variable that matters,
and let's just say, older people are more likely to get treatment A equal one,
but older people might be more likely to have the outcome,
let's say hip fracture, regardless of treatment.
So in this case, age is related to who gets treatment,
and age is also related to the risk of the outcome, regardless of treatment.
So, in that case, treatment is not randomly assigned, right, because people
who are older are more likely to get treated; it's not random.
So this is what we mean by marginally, so Y zero and Y one are not independent from
A marginally, meaning not conditional on X, so it's not random in general,
but within levels of X, we might have random treatment assignment.
Imagine that X is the only variable like this, so
that people who are older are more likely to get treated than people younger and
that's the only variable that's taken into account.
In that case, we could say that within levels of X, in other words,
among people who are younger and among people who are older,
treatment is effectively randomly assigned at that point, so,
in that case, treatment assignment is ignorable given age.
And so that's clearly an assumption, and
one of the things that we will try to do is figure out which X variables
we need to collect to make the ignorability assumption hold.
Next we'll move on to the positivity assumption, so
positivity is referring to this idea that everybody had
some chance of getting either treatment, and
that's sort of conditional on these X's.
So, at every level of X, and for every treatment, people had a nonzero
probability, a greater than zero chance of getting treatment, so in other words,
treatment is not deterministic as a function of X, for example.
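In symbols, the positivity assumption described here is:

```latex
P(A = a \mid X = x) > 0
\quad \text{for all } a \text{ and all } x \text{ with } P(X = x) > 0 .
```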
So in the previous example when we talked about older versus younger,
it would be a violation of the positivity assumption if everybody who was older got
treated, but it's not a violation of the positivity assumption if older people
are just more likely to get treated, but everybody still could get treated.
And hopefully the reason that we need this assumption is clear because,
remember, we're going to need to have data where we can
learn about what would happen under either treatment scenario.
So if for a given value of X, everybody is treated, then there's really no way for
us to learn what would've happened if they weren't treated.
But as long as we have some people who are treated and
some who aren't within every level of X, then there's some hope of
learning about the sort of causal effects of treatment within levels of X.
So we need this positivity assumption just so we can have some
data at every level of X for people who are treated and not treated.
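As a sketch of what checking this looks like in practice, here is a minimal Python example; the function name and data are invented for illustration, and this only detects levels of X where some treatment value never appears in the sample:

```python
def check_positivity(a, x, treatments=(0, 1)):
    """Return the levels of x at which some treatment value is never observed."""
    seen = {}  # maps each level of x to the set of treatment values observed there
    for ai, xi in zip(a, x):
        seen.setdefault(xi, set()).add(ai)
    # A level violates positivity (empirically) if it is missing some treatment value
    return {xi for xi, vals in seen.items() if not set(treatments) <= vals}

# Hypothetical data: every "older" person is treated -> empirical violation
a = [1, 1, 1, 0, 1, 0]
x = ["older", "older", "older", "younger", "younger", "younger"]
print(check_positivity(a, x))  # {'older'}
```

Note that an empty sample cell only suggests a possible violation; whether positivity truly fails is a question about the treatment assignment mechanism, not just the observed data.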
So this is just reiterating that we need treatment
not to be deterministic. If it is deterministic,
there are some cases where people with certain diseases
might be ineligible, in a sense, for a particular treatment.
Well, in that case, we don't want to make inference about that population, so
we would probably exclude them from the study and
make sure that when we have our inference, our causal effect results,
that we're not thinking of them as part of this population, right?
So positivity assumption is also helping us sort of just find who our population of
interest is, if there are people who could never get the treatment,
then typically we would want to exclude them and only make inference about
the population of people who have some chance of getting the treatment.
And so, in general, of course, we need variability in treatment
assignment if we're going to have identification.
So if everything is deterministic, then we're just not going to have the data that
we'll need to identify causal effects.
So that's the positivity assumption that at every level of X,
everybody has some chance of getting either treatment.
Next, we're going to move from just defining assumptions to
actually using them to link observed data and potential outcomes.
So here's one expected value that only involves observed data, so here
we have the expected value of Y, given A equal little a and X equal little x.
So this is the expected value of Y among the subpopulation of people who have
treatment equal to little a, and whose covariates are equal to little x.
So here we observe capital Y, capital A, and capital X,
those are all observed data, there are no potential outcomes there,
so we can use some of these causal assumptions then to link
the observed data to potential outcomes.
So we started with this thing, this expected value that only involves
observed data, so that's on the left, but that is actually equal
to the expected value of Y superscript A given A equal little a and
X equal little x, by the consistency assumption.
So if you remember, the consistency assumption said that
the outcome that we observe when treatment is equal to little a is
the same as the potential outcome, y superscript little a.
So that's why we can, as long as the consistency assumption holds,
then we have this equation here, we can just link these two.
So you'll notice we already went from something only involving observed data
to now something that involves potential outcomes, and
we did it just from this consistency assumption.
Next we can think about the ignorability assumption, and what the ignorability
assumption allows us to do is to drop this conditioning on treatment.
So, from the previous line to this line, well,
all we did was drop the conditioning on A equal little a,
what allows us to do that is the ignorability assumption.
If you remember, the ignorability assumption said that conditional on X,
conditional on these covariates,
the treatment assignment mechanism doesn't matter, it's just random.
So, in other words, conditioning on A isn't providing us any
additional information about the mean of the potential outcome here,
because as long as you condition on X, it's randomly assigned.
So we're able to drop this conditioning on A here, so
that's by the ignorability assumption.
So now we've gone from our original statement, which was the expected value of Y
given A and X, to something involving a potential outcome where we're only
conditioning on X, and that's strictly from consistency and ignorability.
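The two steps just described can be written as:

```latex
\begin{aligned}
E[Y \mid A = a, X = x]
  &= E[Y^{a} \mid A = a, X = x] && \text{(consistency)} \\
  &= E[Y^{a} \mid X = x]        && \text{(ignorability)}
\end{aligned}
```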
And now if we want what's known as a marginal causal effect, which is a kind
of causal effect we've talked about previously, something involving,
let's say, a difference in potential outcomes where we don't condition on X,
what we have to do then is we have to average over the distribution of X.
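To make that averaging step concrete, here is a small simulation sketch; all of the numbers and the data-generating process are invented for illustration. It computes the standardized means, averaging E[Y | A = a, X = x] over the distribution of X, and recovers a marginal effect close to the true value built into the simulation:

```python
import random

random.seed(0)

# Hypothetical data-generating process: X = 1 means "older".
# Older people are more likely to be treated and more likely to have the
# outcome regardless of treatment; the true effect of treatment is to
# reduce the outcome probability by 0.1 in both age groups.
n = 200_000
data = []
for _ in range(n):
    x = int(random.random() < 0.5)                  # age group
    a = int(random.random() < (0.8 if x else 0.2))  # treatment depends on age
    base = 0.5 if x else 0.2                        # baseline risk depends on age
    y = int(random.random() < base - 0.1 * a)       # outcome
    data.append((x, a, y))

def mean_y(a_val, x_val):
    """Sample version of E[Y | A = a, X = x]."""
    ys = [y for x, a, y in data if a == a_val and x == x_val]
    return sum(ys) / len(ys)

p_x1 = sum(x for x, _, _ in data) / n  # P(X = 1)

# Standardized (marginal) means: average over the distribution of X
ey1 = mean_y(1, 1) * p_x1 + mean_y(1, 0) * (1 - p_x1)
ey0 = mean_y(0, 1) * p_x1 + mean_y(0, 0) * (1 - p_x1)
print(round(ey1 - ey0, 2))  # close to the true marginal effect of -0.1
```

The naive comparison of treated versus untreated means would be badly biased here, because older people are both more treated and at higher risk; averaging the conditional means over X is exactly the step the ignorability assumption licenses.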