# Section 1 A ride

## 1.1 Introduction

Our ambition is to present a gentle introduction to the field of targeted learning. As an example, we consider statistical inference on a simple causal quantity that is ubiquitous in the causal literature. We use this exemplar parameter to introduce key concepts that can be applied to more complicated problems. The introduction weaves together two main threads, one theoretical and the other computational.

### 1.1.1 A causal story

We focus on a causal story where a random reward (a real number between 0 and 1) is given based on an action undertaken (one among two) and the random context where the action is performed (summarized by a real number between 0 and 1). The causal quantity of interest is the average difference of the two counterfactual rewards. This is a story as old as time. Should we take the red pill or the blue pill? Should we show our customers advertisement A or advertisement B? Should we require individuals to undergo cancer screening? At their core, each of these questions is asking what action should be taken to maximize a “reward.”

We will build several estimators and discuss their respective merits,
theoretically and computationally. The construction of the most involved
estimator will unfold in *targeted learning territory*, at the frontier of
machine learning and semiparametrics, the statistical theory of inference
based on semiparametric models.

### 1.1.2 The `tlrider`

package

The computational illustrations will be developed based on the companion
package `tlrider`

. The package can be installed by
running the following code:

`::install_github("achambaz/tlride/tlrider") devtools`

The version used in this document is 1.1.1.

Additional packages are also required, including `tidyverse`

(Wickham and Grolemund 2016), `caret`

(Kuhn 2020) and `ggdag`

(Barrett 2018). Assuming that these are installed too, we can
run the next chunk of code:

```
library(tidyverse)
library(caret)
library(ggdag)
library(tlrider)
```

### 1.1.3 What we will discuss

To begin, we discuss the nature of the parameter of interest, viewing it as the value of a statistical mapping evaluated at the law of the data (Section 2), with an emphasis on the smoothness and double-robustness properties inherited from the mapping (Sections 3 and 4). We then turn to the estimation of the parameter of interest. We first introduce and comment upon a simple inference strategy assuming provisionally that a relevant feature of the law of the data is known to us (Section 6). Second, we present the notion of nuisance parameters and adopt an algorithmic stance on their estimation (Section 7). Third, we introduce and comment upon two “naive” inference strategies (Section 8), the one-step correction procedure (Section 9) and, finally, the targeted minimum loss estimation procedure tailored to the inference of the parameter of main interest. In the appendix, we collect our notation (Section A), and present some results that are used in the main text and their proofs (Sections B and C).

## 1.2 A simulation study

### 1.2.1 Reproducible experiment as a law

We are interested in a reproducible experiment. Every time this experiment is
run, it generates an observation that we call \(O\). We view \(O\) as a random
variable drawn from *the law of the experiment* that we denote by \(P_{0}\).

We view \(P_{0}\) as an element of *the model* \(\calM\). The model is a
collection of laws. In particular, the model contains all laws that we think
may plausibly describe the law of the experiment. Thus, the choice of model
is based on our scientific knowledge of the experiment. The more we know about
the experiment, the smaller is \(\calM\). In all our examples, we use large
models that reflect a lack of knowledge about many aspects of the experiment.

### 1.2.2 A synthetic reproducible experiment

Instead of considering a real-life reproducible experiment, we focus for
pedagogical purposes on a *synthetic* reproducible experiment. Thus we can
from now on take on two different roles: that of an *oracle* knowing
completely the nature of the experiment, and that of a *statistician* eager to
know more about the experiment by observing some of its outputs.

Let us run the example built into the `tlrider`

package:

`example(tlrider)`

A few objects have been defined:

```
ls()
#> [1] "another_experiment" "experiment" "expit"
#> [4] "filter" "logit" "sigma0"
```

The function `expit`

implements the link function \(\expit : \bbR \to ]0,1[\) given
by \(\expit(x) \defq (1 + e^{-x})^{-1}\). The function `logit`

implements its
inverse function \(\logit : ]0,1[ \to \bbR\) given by \(\logit(p) \defq \log [p/(1-p)]\).

Let us take a look at `experiment`

:

```
experiment#> A law for (W,A,Y) in [0,1] x {0,1} x [0,1].
#>
#> If the law is fully characterized, you can use method
#> 'sample_from' to sample from it.
#>
#> If you built the law, or if you are an _oracle_, you can also
#> use methods 'reveal' to reveal its relevant features (QW, Gbar,
#> Qbar, qY -- see '?reveal'), and 'alter' to change some of them.
#>
#> If all its relevant features are characterized, you can use
#> methods 'evaluate_psi' to obtain the value of 'Psi' at this law
#> (see '?evaluate_psi') and 'evaluate_eic' to obtain the efficient
#> influence curve of 'Psi' at this law (see '?evaluate_eic').
```

The law \(P_{0}\) of the synthetic experiment `experiment`

built by us generates
a generic observation \(O\) that decomposes as \[\begin{equation*} O \defq (W, A,
Y) \in [0,1] \times \{0,1\} \times [0,1]. \end{equation*}\] We interpret \(W\) as
a real valued summary measure of a random context where an action \(A\) chosen
among two is undertaken, leading to a real valued reward \(Y\).

We can sample from the experiment (simply run `?sample_from`

to see the man
page of method `sample_from`

). The next chunk of code runs the experiment five
times, independently:

```
<- sample_from(experiment, n = 5))
(five_obs #> W A Y
#> [1,] 0.393 1 0.789
#> [2,] 0.399 1 0.782
#> [3,] 0.373 1 0.942
#> [4,] 0.371 1 0.838
#> [5,] 0.372 1 0.779
```

### 1.2.3 Revealing `experiment`

Acting as oracles, we can peek into `experiment`

and *reveal* a selection of
relevant features (simply run `?reveal`

to see the man page of method
`reveal`

). Made by us, the selection exhibits features that will play an
important role in the text.

```
<- reveal(experiment)
relevant_features names(relevant_features)
#> [1] "QW" "Gbar" "Qbar" "qY" "sample_from"
```

We have an oracular knowledge of `experiment`

and can thus comment upon the
features of \(P_{0}\) revealed in `relevant_features`

.

`QW`

The `QW`

feature describes the marginal law of \(W\), that we call \(Q_{0,W}\).^{1}

```
$QW
relevant_features#> function(W,
#> mixture_weights = c(1/10, 9/10, 0),
#> mins = c(0, 11/30, 0),
#> maxs = c(1, 14/30, 1)) {
#> out <- sapply(1:length(mixture_weights),
#> function(ii){
#> mixture_weights[ii] *
#> stats::dunif(W,
#> min = mins[ii],
#> max = maxs[ii])
#> })
#> return(rowSums(out))
#> }
#> <environment: 0x56115038ae00>
```

It appears that \(Q_{0,W}\) is a mixture of the uniform laws over \([0,1]\)
(weight \(1/10\)) and \([11/30,14/30]\) (weight \(9/10\)).^{2}

`Gbar`

The `Gbar`

feature describes the conditional probability of action \(A = 1\)
given \(W\). For each \(a \in \{0,1\}\), we denote \[\begin{align*} \Gbar_0(W)
&\defq \Pr_{P_0}(A = 1 | W), \\\ell\Gbar_0(a,W) &\defq \Pr_{P_0}(A = a |
W).\end{align*}\] Obviously, \[\begin{equation*}\ell\Gbar_{0}(A,W) =
A\Gbar_{0}(W) + (1-A) (1-\Gbar_{0}(W)).\end{equation*}\]

```
$Gbar
relevant_features#> function(W) {
#> expit(1 + 2 * W - 4 * sqrt(abs((W - 5/12))))
#> }
#> <environment: 0x56115038ae00>
```

Note how real numbers of the form \(1 + 2W - 4 * \sqrt{|W - 5/12|})\) are mapped into the interval \([0,1]\) by the \(\expit\) link function. We refer the reader to Figure 7.2 for a visualization of \(\Gbar_{0}\).

`qY`

The `qY`

feature describes the conditional density of \(Y\) given \(A\) and \(W\).
For each \(y\in ]0,1[\), we denote by \(q_{0,Y}(y, A, W)\) the conditional density
evaluated at \(y\) of \(Y\) given \(A\) and \(W\).

```
$qY
relevant_features#> function(obs, Qbar, shape10 = 2, shape11 = 3){
#> A <- obs[, "A"]
#> AW <- obs[, c("A", "W")]
#> QAW <- Qbar(AW)
#> shape1 <- ifelse(A == 0, shape10, shape11)
#> stats::dbeta(Y,
#> shape1 = shape1,
#> shape2 = shape1 * (1 - QAW) / QAW)
#> }
#> <environment: 0x56115038ae00>
```

It appears that the conditional law of \(Y\) given \(A\) and \(W\) is the Beta law
with conditional mean and variance characterized by the `Qbar`

feature of
`experiment`

(see below) and the `shape10`

and `shape11`

parameters.

`Qbar`

As for the `Qbar`

feature, it describes the conditional mean of \(Y\) given \(A\)
and \(W\).

```
$Qbar
relevant_features#> function(AW) {
#> A <- AW[, "A"]
#> W <- AW[, "W"]
#> A * (cos((-1/2 + W) * pi) * 2/5 + 1/5 +
#> (1/3 <= W & W <= 1/2) / 5 +
#> (W >= 3/4) * (W - 3/4) * 2) +
#> (1 - A) * (sin(4 * W^2 * pi) / 4 + 1/2)
#> }
#> <bytecode: 0x5611585c3d40>
#> <environment: 0x56115038ae00>
```

We denote \(\Qbar_0(A,W) = \Exp_{P_{0}}(Y|A,W)\) the conditional mean of \(Y\) given \(A\) and \(W\). Note how \(\Qbar_0(A,W)\) does depend heavily on \(A\) and \(W\). We refer the reader to Section 1.3 for a visualization of \(\Qbar_{0}\).

`sample_from`

Finally, the `sample_from`

feature is the function called by method
`sample_from`

when it is applied to an object of class `LAW`

, like
`experiment`

.

```
$sample_from
relevant_features#> function(n, ideal = FALSE) {
#> ## preliminary
#> n <- R.utils::Arguments$getInteger(n, c(1, Inf))
#> ideal <- R.utils::Arguments$getLogical(ideal)
#> ## ## 'Gbar' and 'Qbar' factors
#> Gbar <- experiment$.Gbar
#> Qbar <- experiment$.Qbar
#> ## sampling
#> ## ## context
#> params <- formals(experiment$.QW)
#> mixture_weights <- eval(params$mixture_weights)
#> mins <- eval(params$mins)
#> maxs <- eval(params$maxs)
#> W <- sample_from_mixture_of_uniforms(n, mixture_weights,
#> mins, maxs)
#> ## ## counterfactual rewards
#> zeroW <- cbind(A = 0, W)
#> oneW <- cbind(A = 1, W)
#> Qbar_zeroW <- Qbar(zeroW)
#> Qbar_oneW <- Qbar(oneW)
#> Yzero <- stats::rbeta(n,
#> shape1 = 2,
#> shape2 = 2 * (1 - Qbar_zeroW) / Qbar_zeroW)
#> Yone <- stats::rbeta(n,
#> shape1 = 3,
#> shape2 = 3 * (1 - Qbar_oneW) / Qbar_oneW)
#> ## ## action undertaken
#> A <- stats::rbinom(n, size = 1, prob = Gbar(W))
#> ## ## actual reward
#> Y <- A * Yone + (1 - A) * Yzero
#> ## ## observation
#> if (ideal) {
#> obs <- cbind(W = W, Yzero = Yzero, Yone = Yone, A = A, Y = Y)
#> } else {
#> obs <- cbind(W = W, A = A, Y = Y)
#> }
#> return(obs)
#> }
#> <bytecode: 0x561159a2efa0>
#> <environment: 0x56115038ae00>
```

We will comment upon the `ideal`

argument in the above `sample_from`

feature
in Section 2.1.

## 1.3 ⚙ Visualization

- Run the following chunk of code. It visualizes the conditional mean \(\Qbar_{0}\).

```
<- relevant_features$Gbar
Gbar <- relevant_features$Qbar
Qbar <- relevant_features$QW
QW
<- tibble(w = seq(0, 1, length.out = 1e3)) %>%
features mutate(Qw = QW(w),
Gw = Gbar(w),
Q1w = Qbar(cbind(A = 1, W = w)),
Q0w = Qbar(cbind(A = 0, W = w)),
blip_Qw = Q1w - Q0w)
%>% select(-Qw, -Gw) %>%
features rename("Q(1,.)" = Q1w,
"Q(0,.)" = Q0w,
"Q(1,.) - Q(0,.)" = blip_Qw) %>%
pivot_longer(-w, names_to = "f", values_to = "value") %>%
ggplot() +
geom_line(aes(x = w, y = value, color = f), size = 1) +
labs(y = "f(w)", title = bquote("Visualizing" ~ bar(Q)[0])) +
ylim(NA, 1)
```

- Adapt the above chunk of code to visualize the marginal density \(Q_{0,W}\) and conditional probability \(\Gbar_{0}\).

## 1.4 ⚙ Make your own experiment

You can easily make your own experiment.

Check out the man page of method

`alter`

by running`?alter`

.Run the following chunk of code:

```
<- LAW() ## creates an object of class 'LAW'
my_experiment alter(my_experiment, ## characterize its relevant features
QW = tribble(
~value, ~weight,
0, 1/4, ## the probabilities must...
1, 3/4 ## ... sum up to 1
),Gbar = function(W) {
<- rep_len(0, length(W))
out == 0] <- 1/3
out[W == 1] <- 3/5
out[W return(out)
},Qbar = function(AW) {
<- matrix(c(1/2, 2/3, 7/8, 4/5), ncol = 2,
probs dimnames = list(c("A=0", "A=1"),
c("W=0", "W=1")))
cbind(AW[, "A"] + 1, AW[, "W"] + 1)]
probs[
},qY = function(obs) {
<- matrix(c(1/2, 2/3, 7/8, 4/5), ncol = 2,
probs dimnames = list(c("A=0", "A=1"),
c("W=0", "W=1")))
<- probs[cbind(obs[, "A"] + 1, obs[, "W"] + 1)]
probs "Y"] * probs + (1 - obs[, "Y"]) * (1 - probs)
obs[,
},sample_from = function(n) {
## preliminary
<- R.utils::Arguments$getInteger(n, c(1, Inf))
n ## 'QW', 'Gbar' and 'Qbar' features
<- my_experiment$.QW
QW <- my_experiment$.Gbar
Gbar <- my_experiment$.Qbar
Qbar ## sampling
<- rbinom(n, size = 1, prob = QW$weight[QW$value == 1])
W <- rbinom(n, size = 1, prob = Gbar(W))
A <- cbind(W = W, A = A)
AW <- rbinom(n, size = 1, Qbar(AW))
Y return(cbind(AW, Y = Y))
})
```

Note that the `QW`

feature of `my_experiment`

is a `tibble`

, contrary to the
`QW`

feature of `experiment`

which is a `function`

. The former describes a
discrete law and the latter is a density. Those are the two configurations
handled by the package.

- What does the next chunk do?

```
sample_from(my_experiment, 3))
(#> W A Y
#> [1,] 0 1 1
#> [2,] 1 0 1
#> [3,] 1 0 1
```

- Characterize entirely the law of
`my_experiment`

. Hint:

```
<- sample_from(my_experiment, 1e4)
obs %>% as_tibble %>% group_by(W, A, Y) %>%
obs summarize(how_many = n(), .groups = "drop")
#> # A tibble: 8 × 4
#> W A Y how_many
#> <int> <int> <int> <int>
#> 1 0 0 0 841
#> 2 0 0 1 876
#> 3 0 1 0 264
#> 4 0 1 1 497
#> 5 1 0 0 370
#> 6 1 0 1 2668
#> # … with 2 more rows
%>% as_tibble %>%
obs summarize("P(W=1)" = mean(W))
#> # A tibble: 1 × 1
#> `P(W=1)`
#> <dbl>
#> 1 0.752
%>% as_tibble %>% group_by(W) %>%
obs summarize("P(A=1|W)" = mean(A))
#> # A tibble: 2 × 2
#> W `P(A=1|W)`
#> <int> <dbl>
#> 1 0 0.307
#> 2 1 0.596
%>% as_tibble %>% group_by(W, A) %>%
obs summarize("P(Y=1|A,W)" = mean(Y), .groups = "drop")
#> # A tibble: 4 × 3
#> W A `P(Y=1|A,W)`
#> <int> <int> <dbl>
#> 1 0 0 0.510
#> 2 0 1 0.653
#> 3 1 0 0.878
#> 4 1 1 0.806
```

- Now, make your own experiment.