# Section 4 Double-robustness

## 4.1 Linear approximations of parameters

### 4.1.1 From gradients to estimators

We learned in Section 3 that the stochastic behavior of a regular, asymptotically linear estimator of \(\Psi(P)\) can be characterized by its influence curve. Moreover, we said that this influence curve must in fact be a gradient of \(\Psi\) at \(P\).

In this section, we show that the converse is also true: given a gradient
\(D^*\) of \(\Psi\) at \(P\), under so-called *regularity conditions*, it is
possible to construct an estimator with influence curve equal to
\(D^*(P)\). This fact will suggest concrete strategies for generating efficient
estimators of smooth parameters. We take here the first step towards
generating such estimators: linearizing the parameter.

### 4.1.2 A Euclidean perspective

As in Section 3.3.3, drawing a parallel to Euclidean
geometry is helpful. We recall that if \(f\) is a differentiable mapping from
\(\bbR^p\) to \(\bbR\), then a first-order Taylor expansion approximates \(f\) at a point \(x_0 \in \bbR^p\): \[\begin{equation*} f(x_0) \approx f(x) + \langle(x_0 - x), \nabla
f(x)\rangle,\end{equation*}\] where \(x\) is a point in \(\bbR^p\), \(\nabla f(x)\) is
the gradient of \(f\) evaluated at \(x\) and \(\langle u,v\rangle\) is the scalar
product of \(u,v \in \bbR^{p}\). As the squared distance \(\|x-x_{0}\|^{2} = \langle x-x_{0}, x-x_{0}\rangle\) between \(x\) and \(x_0\) decreases, the *linear
approximation* to \(f(x_0)\) becomes more accurate.
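A quick numerical sanity check can make this concrete. The following self-contained sketch (ours, not the text's; the function \(f\) is an arbitrary smooth toy example) compares \(f(x_0)\) with its linear approximation as \(x_0\) approaches \(x\); the error shrinks roughly like the squared distance:

```python
import math

# A toy smooth map f from R^2 to R, its gradient, and the first-order
# approximation f(x0) ~ f(x) + <x0 - x, grad f(x)>.
def f(x):
    return math.exp(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return (math.exp(x[0]) + x[1] ** 2, 2 * x[0] * x[1])

def linear_approx(x0, x):
    return f(x) + sum((u - v) * g for u, v, g in zip(x0, x, grad_f(x)))

x0 = (1.0, 2.0)
for eps in (1.0, 0.1, 0.01):
    x = (x0[0] - eps, x0[1] - eps)   # ||x - x0||^2 is proportional to eps^2
    err = abs(f(x0) - linear_approx(x0, x))
    print(f"eps = {eps:<4}  approximation error = {err:.2e}")
```

Dividing `eps` by 10 divides the error by roughly 100, the quadratic behavior that the remainder term below formalizes.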

### 4.1.3 The remainder term

Returning to the present problem with this in mind, we find that indeed a similar approximation strategy may be applied.

For clarity, let us introduce a new shorthand notation. For any measurable function \(f\) of the observed data \(O\), we write from now on \(P f \defq \Exp_P(f(O))\). The notation is valuable beyond merely saving space. For instance, (3.8)

\[\begin{equation*} \sqrt{n} (\psi_n - \Psi(P)) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \IC(O_i) + o_P(1) \end{equation*}\]

can be rewritten as

\[\begin{equation*} \sqrt{n} (\psi_n - \Psi(P)) = \sqrt{n} (P_{n} - P) \IC + o_P(1), \end{equation*}\]

thus suggesting more clearly the importance of the so-called *empirical
process* \(\sqrt{n} (P_{n} - P)\).
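In this notation, \(P_n f\) is nothing but the average of \(f(O_1), \ldots, f(O_n)\). A minimal numerical sketch (with a toy law of our own choosing, not the book's objects) of \(P f\), \(P_n f\), and the empirical process evaluated at \(f\):

```python
import math
import random

random.seed(1)

# Toy law: O ~ Uniform(0, 1), and f(o) = o^2, so P f = E(O^2) = 1/3.
f = lambda o: o ** 2
P_f = 1 / 3

n = 100_000
sample = [random.random() for _ in range(n)]
Pn_f = sum(map(f, sample)) / n              # P_n f: an empirical mean
emp_proc_f = math.sqrt(n) * (Pn_f - P_f)    # sqrt(n) (P_n - P) f

print(Pn_f)          # close to 1/3
print(emp_proc_f)    # of order 1, not of order sqrt(n)
```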

In particular, if \(\Psi\) is smooth uniformly over directions, then for any given \(P \in \calM\), we can write

\[\begin{equation} \Psi(P_0) = \Psi(P) + (P_0 - P) D^*(P) - \Rem_{P_0}(P), \tag{4.1} \end{equation}\]

where \(\Rem_{P_0}(P)\) (defined *implicitly* by (4.1) –
see (4.2)) is a
*remainder term* satisfying that \[\begin{equation*} \frac{\Rem_{P_0}(P)}{d(P,
P_0)} \rightarrow 0 \ \mbox{as} \ d(P, P_0) \rightarrow 0 , \end{equation*}\]
with \(d\) a measure of discrepancy for distributions in \(\calM\). Note that
(4.1) can be equivalently written as \[\begin{equation*}
\Psi(P_0) = \Psi(P) + \Exp_{P_0}(D^*(P)(O)) - \Exp_P(D^*(P)(O)) -
\Rem_{P_0}(P). \end{equation*}\] The remainder term formalizes the notion that
if \(P\) is *close* to \(P_0\) (*i.e.*, if \(d(P,P_0)\) is small), then the linear
approximation of \(\Psi(P_0)\) is more accurate. In light of
the Euclidean perspective of Section 4.1.2, the
remainder term \(\Rem_{P_0}(P)\) plays the role of the squared distance \(\|x-x_0\|^{2}\).
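For a concrete illustration (an example of ours, not from the text), take a real-valued observation \(O\) and the linear parameter \(\Psi(P) \defq \Exp_P(O)\), whose gradient is \(D^*(P)(O) \defq O - \Exp_P(O)\). Then

\[\begin{equation*} \Psi(P) + (P_0 - P) D^*(P) = \Exp_P(O) + \left(\Exp_{P_0}(O) - \Exp_P(O)\right) = \Psi(P_0), \end{equation*}\]

so \(\Rem_{P_0}(P) = 0\) for every \(P\): a linear parameter coincides with its linear approximation, and a nontrivial remainder term only arises for genuinely nonlinear parameters such as (2.6).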

### 4.1.4 Expressing the remainder term as a function of the relevant features

The equations defining the parameter (2.6), the form of the canonical gradient (3.4), and the linearization of the parameter (4.1) combine to determine the remainder:

\[\begin{equation} \Rem_{P_0}(P) \defq \Psi(P) - \Psi(P_0) + (P_0 - P)D^*(P) \tag{4.2} \end{equation}\]

hence

\[\begin{multline} \Rem_{P_0}(P)= \Exp_{P_0} \Bigg[ \left(\Gbar_0(W) - \Gbar(W)\right) \\ \times \left(\frac{\Qbar_0(1,W) - \Qbar(1,W)}{\ell\Gbar(1,W)} + \frac{\Qbar_0(0,W) - \Qbar(0,W)}{\ell\Gbar(0,W)} \right) \Bigg]. \tag{4.3} \end{multline}\]
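One can check this identity numerically. The following self-contained Python sketch (toy features of our own choosing, not the book's `experiment` objects) computes the remainder both by solving (4.1) for \(\Rem_{P_0}(P)\) and via the explicit expression (4.3), for a candidate \(P\) sharing \(P_0\)'s marginal law of \(W\); here \(D^*(P)\) is the efficient influence curve of (3.4) and \(\ell\Gbar(a,w) \defq a\Gbar(w) + (1-a)(1-\Gbar(w))\):

```python
import random

random.seed(0)

# Toy features (ours): W ~ Uniform(0,1) under both P0 and P.
Gbar0 = lambda w: 0.5 + 0.2 * w                  # P0's Gbar
Qbar0 = lambda a, w: a * w + (1 - a) * w ** 2    # P0's Qbar
Gbar = lambda w: 0.6                             # P's Gbar, misspecified
Qbar = lambda a, w: 0.5 * a + (1 - a) * w        # P's Qbar, misspecified

def lGbar(G, a, w):
    # Conditional likelihood of A = a given W = w.
    return a * G(w) + (1 - a) * (1 - G(w))

n = 200_000
ws = [random.random() for _ in range(n)]
psi0 = sum(Qbar0(1, w) - Qbar0(0, w) for w in ws) / n    # Psi(P0)
psi = sum(Qbar(1, w) - Qbar(0, w) for w in ws) / n       # Psi(P)

# Remainder obtained by solving (4.1): Psi(P) - Psi(P0) + (P0 - P) D*(P),
# where P D*(P) = 0, so only P0 D*(P) needs approximating.
p0_Dstar = 0.0
for w in ws:
    a = 1 if random.random() < Gbar0(w) else 0
    y = Qbar0(a, w)   # replacing Y by its conditional mean is harmless
                      # here because D*(P) is linear in y
    p0_Dstar += ((2 * a - 1) / lGbar(Gbar, a, w) * (y - Qbar(a, w))
                 + Qbar(1, w) - Qbar(0, w) - psi)
p0_Dstar /= n
rem_implicit = psi - psi0 + p0_Dstar

# Remainder via the explicit expression (4.3).
rem_explicit = sum(
    (Gbar0(w) - Gbar(w))
    * ((Qbar0(1, w) - Qbar(1, w)) / lGbar(Gbar, 1, w)
       + (Qbar0(0, w) - Qbar(0, w)) / lGbar(Gbar, 0, w))
    for w in ws) / n

print(rem_implicit, rem_explicit)   # agree up to Monte Carlo error
```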

Acting as oracles, we can compute explicitly the remainder term
\(\Rem_{P_0}(P)\). The `evaluate_remainder` method makes it very easy (simply
run `?evaluate_remainder` to see the man page of the method):

```
(evaluate_remainder(experiment, experiment))
#> [1] 0
(rem <- evaluate_remainder(experiment, another_experiment,
                           list(list(), list(h = 0))))
#> [1] 0.199
```

We recover the equality \(\Rem_{P_{0}} (P_{0}) = 0\), which is fairly obvious given (4.1). In addition, we learn that \(\Rem_{P_{0}} (\Pi_{0})\) equals approximately 0.199. In the next section, we invite you to become better acquainted with the remainder term by playing around with it numerically.

## 4.2 ⚙ The remainder term

Compute numerically \(\Rem_{\Pi_0}(\Pi_h)\) for \(h \in [-1,1]\) and plot your results. What do you notice?

☡ Approximate \(\Rem_{P_{0}} (\Pi_{0})\) numerically without relying on the `evaluate_remainder` method, and compare the value you get with that of `rem`. (Hint: use (4.2) and a large sample of observations drawn independently from \(P_{0}\).)

## 4.3 ☡ Double-robustness

### 4.3.1 The key property

Let us denote by \(\|f\|_{P}^{2}\) the square of the \(L^{2}(P)\)-norm of any
function \(f\) from \(\calO\) to \(\bbR\), *i.e.*, using a recently introduced
notation, \(\|f\|_{P}^{2} \defq Pf^{2}\). For instance, \(\|\Qbar_{1} - \Qbar_{0}\|_{P}\) or \(\|\Gbar_{1} - \Gbar_{0}\|_{P}\) is a distance separating
the features \(\Qbar_{1}\) and \(\Qbar_{0}\) or \(\Gbar_{1}\) and \(\Gbar_{0}\).

The efficient influence curve \(D^{*}(P)\) at \(P \in \calM\) enjoys a rather
remarkable property: it is *double-robust*. Specifically, for every \(P \in \calM\), the remainder term \(\Rem_{P_{0}} (P)\) satisfies

\[\begin{equation} \Rem_{P_{0}} (P)^{2} \leq \|\Qbar - \Qbar_{0}\|_{P_0}^{2} \times \|(\Gbar - \Gbar_{0})/\ell\Gbar_{0}\|_{P_0}^{2}, \tag{4.4} \end{equation}\]

where \(\Qbar\) and \(\Gbar\) are the counterparts under \(P\) to \(\Qbar_{0}\) and \(\Gbar_{0}\). The proof consists of a straightforward application of the Cauchy-Schwarz inequality to the right-hand side expression in (4.3).
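Spelled out (a standard fact, recalled here for completeness): for any functions \(f, g\) such that \(P_0 f^{2}\) and \(P_0 g^{2}\) are finite, the Cauchy-Schwarz inequality states that

\[\begin{equation*} \left(P_0 (fg)\right)^{2} \leq P_0 f^{2} \times P_0 g^{2} = \|f\|_{P_0}^{2} \times \|g\|_{P_0}^{2}. \end{equation*}\]

Viewing the expectation in (4.3) as \(P_{0}(fg)\) for a suitable pair \((f,g)\) built from \(\Gbar_0 - \Gbar\) and \(\Qbar_0 - \Qbar\) yields the bound (4.4).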

### 4.3.2 Its direct consequence

It may not be clear yet why (4.4) is an important property, nor
why \(D^{*}\) is said to be *double-robust* because of it. To answer the latter
question, let us consider a law \(P\in \calM\) such that *either* \(\Qbar = \Qbar_{0}\) *or* \(\Gbar = \Gbar_{0}\).

It is then the case that *either* \(\|\Qbar - \Qbar_{0}\|_{P} = 0\) *or*
\(\|\Gbar - \Gbar_{0}\|_{P} = 0\). Therefore, in light of (4.4), it
also holds that \(\Rem_{P_{0}} (P) = 0\).^{9} It thus appears that (4.1)
simplifies to

\[\begin{align*} \Psi(P_0) &= \Psi(P) + (P_0 - P) D^*(P)\\ &= \Psi(P) + P_0 D^*(P),\end{align*}\]

where the second equality holds because \(PD^{*}(P) = 0\) for all \(P\in \calM\) by definition of \(D^{*}(P)\).

It is now clear that for such a law \(P\in \calM\), \(\Psi(P) = \Psi(P_{0})\) is equivalent to

\[\begin{equation} P_{0} D^{*}(P) = 0. \tag{4.5} \end{equation}\]

Most importantly, in words, if \(P\) solves the so-called \(P_{0}\)-specific
efficient influence curve equation (4.5) and if, in addition,
\(P\) has the same \(\Qbar\)-feature *or* \(\Gbar\)-feature as \(P_{0}\), then
\(\Psi(P) = \Psi(P_{0})\).

The conclusion is valid no matter how \(P\) may otherwise differ from \(P_{0}\),
hence the qualifier *double-robust*. This property is useful for building
consistent estimators of \(\Psi(P_{0})\), as we shall see in Section
5.
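To see double-robustness at work numerically, here is a self-contained Python sketch (with toy features of our own choosing, not the book's objects): for a law \(P\) sharing \(P_0\)'s marginal law of \(W\), the quantity \(\Psi(P) + P_0 D^{*}(P)\) recovers \(\Psi(P_0)\) as soon as *either* the \(\Qbar\)-feature *or* the \(\Gbar\)-feature of \(P\) is correctly specified, however wrong the other one may be:

```python
import random

random.seed(42)

# Toy law P0 (ours): W ~ Uniform(0,1), A | W ~ Bernoulli(Gbar0(W)),
# Y | A, W ~ Normal(Qbar0(A, W), 0.1).
Gbar0 = lambda w: 0.5 + 0.2 * w
Qbar0 = lambda a, w: a * w + (1 - a) * w ** 2
wrong_Gbar = lambda w: 0.5            # a misspecified Gbar-feature
wrong_Qbar = lambda a, w: 0.3 * a     # a misspecified Qbar-feature

def lGbar(G, a, w):
    # Conditional likelihood of A = a given W = w.
    return a * G(w) + (1 - a) * (1 - G(w))

def psi_plus_correction(Qbar, Gbar, n=200_000):
    """Monte Carlo approximation of Psi(P) + P0 D*(P) for a law P with
    features (Qbar, Gbar) and the same marginal law of W as P0."""
    total = 0.0
    for _ in range(n):
        w = random.random()
        a = 1 if random.random() < Gbar0(w) else 0
        y = Qbar0(a, w) + random.gauss(0, 0.1)
        # Psi(P) and the constant -Psi(P) inside D*(P) cancel, leaving:
        total += (Qbar(1, w) - Qbar(0, w)
                  + (2 * a - 1) / lGbar(Gbar, a, w) * (y - Qbar(a, w)))
    return total / n

psi0 = 1 / 2 - 1 / 3   # Psi(P0) = E(W) - E(W^2), in closed form here

# Either feature correct => Psi(P0) is recovered despite the wrong one.
print(psi_plus_correction(Qbar0, wrong_Gbar), "vs", psi0)
print(psi_plus_correction(wrong_Qbar, Gbar0), "vs", psi0)
```

Both printed values match \(\Psi(P_0)\) up to Monte Carlo error, even though each candidate law gets one of its two features badly wrong.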

## 4.4 ⚙ Double-robustness

Go back to Problem 1 in 4.2. In light of Section 4.3, what is happening?

Create a copy of `experiment` and replace its `Gbar` feature with some other function of \(W\) (see `?copy`, `?alter` and Problem 2 in Section 3.2). Call \(P'\) the element of model \(\calM\) thus characterized. Can you guess the values of \(\Rem_{P_{0}}(P')\), \(\Psi(P')\) and \(P_{0} D^{*}(P')\)? Support your argument.