\(\newcommand{\bbO}{\mathbb{O}}\) \(\newcommand{\bbD}{\mathbb{D}}\) \(\newcommand{\bbP}{\mathbb{P}}\) \(\newcommand{\bbR}{\mathbb{R}}\) \(\newcommand{\Algo}{\widehat{\mathcal{A}}}\) \(\newcommand{\Algora}{\widetilde{\mathcal{A}}}\) \(\newcommand{\calF}{\mathcal{F}}\) \(\newcommand{\calM}{\mathcal{M}}\) \(\newcommand{\calP}{\mathcal{P}}\) \(\newcommand{\calO}{\mathcal{O}}\) \(\newcommand{\calQ}{\mathcal{Q}}\) \(\newcommand{\defq}{\doteq}\) \(\newcommand{\Exp}{\textrm{E}}\) \(\newcommand{\IC}{\textrm{IC}}\) \(\newcommand{\Gbar}{\bar{G}}\) \(\newcommand{\one}{\textbf{1}}\) \(\newcommand{\psinos}{\psi_{n}^{\textrm{os}}}\) \(\renewcommand{\Pr}{\textrm{Pr}}\) \(\newcommand{\Phat}{P^{\circ}}\) \(\newcommand{\Psihat}{\widehat{\Psi}}\) \(\newcommand{\Qbar}{\bar{Q}}\) \(\newcommand{\tcg}[1]{\textcolor{olive}{#1}}\) \(\DeclareMathOperator{\Dirac}{Dirac}\) \(\DeclareMathOperator{\expit}{expit}\) \(\DeclareMathOperator{\logit}{logit}\) \(\DeclareMathOperator{\Rem}{Rem}\) \(\DeclareMathOperator{\Var}{Var}\)

Section 4 Double-robustness

4.1 Linear approximations of parameters

4.1.1 From gradients to estimators

We learned in Section 3 that the stochastic behavior of a regular, asymptotically linear estimator of \(\Psi(P)\) can be characterized by its influence curve. Moreover, we said that this influence curve must in fact be a gradient of \(\Psi\) at \(P\).

In this section, we show that the converse is also true: given a gradient \(D^*\) of \(\Psi\) at \(P\), under so-called regularity conditions, it is possible to construct an estimator with influence curve equal to \(D^*(P)\). This fact will suggest concrete strategies for generating efficient estimators of smooth parameters. We take here the first step towards generating such estimators: linearizing the parameter.

4.1.2 A Euclidean perspective

As in Section 3.3.3, drawing a parallel to Euclidean geometry is helpful. We recall that if \(f\) is a differentiable mapping from \(\bbR^p\) to \(\bbR\), then a Taylor series approximates \(f\) at a point \(x_0 \in \bbR^p\): \[\begin{equation*} f(x_0) \approx f(x) + \langle(x_0 - x), \nabla f(x)\rangle,\end{equation*}\] where \(x\) is a point in \(\bbR^p\), \(\nabla f(x)\) is the gradient of \(f\) evaluated at \(x\) and \(\langle u,v\rangle\) is the scalar product of \(u,v \in \bbR^{p}\). As the squared distance \(\|x-x_{0}\|^{2} = \langle x-x_{0}, x-x_{0}\rangle\) between \(x\) and \(x_0\) decreases, the linear approximation to \(f(x_0)\) becomes more accurate.
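To make the Euclidean picture concrete, here is a minimal sketch in base R. The function `f`, its gradient `grad_f`, the point `x` and the direction `u` are all hypothetical choices made for illustration only. The sketch evaluates the error of the linear approximation at points \(x_0\) that get closer and closer to \(x\).

```r
# Hypothetical smooth function from R^2 to R and its gradient
f <- function(x) exp(x[1]) + x[1] * x[2]^2
grad_f <- function(x) c(exp(x[1]) + x[2]^2, 2 * x[1] * x[2])

x <- c(0.5, 1)            # point where the gradient is evaluated
u <- c(1, -1) / sqrt(2)   # unit direction along which x0 approaches x
steps <- c(1e-1, 1e-2, 1e-3)

# Error of the linear approximation f(x) + <x0 - x, grad f(x)> to f(x0)
approx_error <- sapply(steps, function(s) {
  x0 <- x + s * u
  linear <- f(x) + sum((x0 - x) * grad_f(x))
  f(x0) - linear
})

data.frame(distance = steps,
           error = approx_error,
           error_over_sq_distance = approx_error / steps^2)
```

For this twice-differentiable choice of `f`, the error shrinks much faster than the distance itself, while the ratio of the error to the squared distance stays bounded.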

4.1.3 The remainder term

Returning to the present problem with this in mind, we find that indeed a similar approximation strategy may be applied.

For clarity, let us introduce a new shorthand notation. For any measurable function \(f\) of the observed data \(O\), we may write from now on \(P f \defq \Exp_P(f(O))\). One may argue that the notation is valuable beyond the gain of space. For instance, (3.8)

\[\begin{equation*} \sqrt{n} (\psi_n - \Psi(P)) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \IC(O_i) + o_P(1) \end{equation*}\]

can be rewritten as

\[\begin{equation*} \sqrt{n} (\psi_n - \Psi(P)) = \sqrt{n} (P_{n} - P) \IC + o_P(1), \end{equation*}\]

thus suggesting more clearly the importance of the so-called empirical process \(\sqrt{n} (P_{n} - P)\).
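The notation can be illustrated with a minimal simulation sketch in base R. Everything in it is a hypothetical choice made for illustration: \(O\) is drawn from the standard normal law and \(f(o) = o^2\), so that \(Pf = 1\) and \(\Var_P(f(O)) = 2\). We write `Pn_f` for \(P_n f\), the average of \(f\) over the sample.

```r
set.seed(2718)
f <- function(o) o^2   # hypothetical f; P f = E_P(O^2) = 1 when O ~ N(0, 1)
Pf <- 1
n <- 500

# One realization of sqrt(n) (P_n - P) f per replicate
emp_process <- replicate(2000, {
  O <- rnorm(n)
  Pn_f <- mean(f(O))   # P_n f: empirical mean of f over the sample
  sqrt(n) * (Pn_f - Pf)
})

c(mean = mean(emp_process), variance = var(emp_process))
```

Across replicates, the realizations of \(\sqrt{n}(P_n - P) f\) should be centered near 0 with a variance close to \(\Var_P(f(O))\), in line with the central limit theorem.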

In particular, if \(\Psi\) is smooth uniformly over directions, then for any given \(P \in \calM\), we can write

\[\begin{equation} \Psi(P_0) = \Psi(P) + (P_0 - P) D^*(P) - \Rem_{P_0}(P), \tag{4.1} \end{equation}\]

where \(\Rem_{P_0}(P)\) (defined implicitly by (4.1) – see (4.2)) is a remainder term satisfying that \[\begin{equation*} \frac{\Rem_{P_0}(P)}{d(P, P_0)} \rightarrow 0 \ \mbox{as} \ d(P, P_0) \rightarrow 0 , \end{equation*}\] with \(d\) a measure of discrepancy for distributions in \(\calM\). Note that (4.1) can be equivalently written as \[\begin{equation*} \Psi(P_0) = \Psi(P) + \Exp_{P_0}(D^*(P)(O)) - \Exp_P(D^*(P)(O)) - \Rem_{P_0}(P). \end{equation*}\] The remainder term formalizes the notion that if \(P\) is close to \(P_0\) (i.e., if \(d(P,P_0)\) is small), then the linear approximation of \(\Psi(P_0)\) is more accurate. In light of the Euclidean perspective of Section 4.1.2, the remainder term \(\Rem_{P_0}(P)\) plays the role of the squared distance \(\|x-x_0\|^{2}\).

4.1.4 Expressing the remainder term as a function of the relevant features

The equations for the definition of the parameter (2.6), form of the canonical gradient (3.4), and linearization of parameter (4.1) combine to determine the remainder:

\[\begin{equation} \Rem_{P_0}(P) \defq \Psi(P) - \Psi(P_0) - (P - P_0)D^*(P) \tag{4.2} \end{equation}\]

hence

\[\begin{multline} \Rem_{P_0}(P)= \Exp_{P_0} \Bigg[ \left(\Gbar_0(W) - \Gbar(W)\right) \\ \times \left(\frac{\Qbar_0(1,W) - \Qbar(1,W)}{\ell\Gbar(1,W)} + \frac{\Qbar_0(0,W) - \Qbar(0,W)}{\ell\Gbar(0,W)} \right) \Bigg]. \tag{4.3} \end{multline}\]
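For readers who wish to see where (4.3) comes from, here is a sketch of the computation. It assumes that the canonical gradient (3.4) takes the form \(D^*(P)(O) = \Qbar(1,W) - \Qbar(0,W) - \Psi(P) + \frac{2A-1}{\ell\Gbar(A,W)}(Y - \Qbar(A,W))\) with \(\ell\Gbar(A,W) = A\Gbar(W) + (1-A)(1-\Gbar(W))\), and that (2.6) reads \(\Psi(P_0) = \Exp_{P_0}(\Qbar_0(1,W) - \Qbar_0(0,W))\). These expressions are not restated in this section, so they should be checked against (2.6) and (3.4). Since \(PD^*(P) = 0\), rearranging (4.1) gives \(\Rem_{P_0}(P) = \Psi(P) - \Psi(P_0) + P_0 D^*(P)\). Conditioning on \(W\) under \(P_0\) then yields

\[\begin{align*} \Exp_{P_0}\left(\frac{2A-1}{\ell\Gbar(A,W)}(Y - \Qbar(A,W)) \,\middle|\, W\right) &= \frac{\Gbar_0(W)}{\Gbar(W)} \left(\Qbar_0(1,W) - \Qbar(1,W)\right) \\ &\quad - \frac{1-\Gbar_0(W)}{1-\Gbar(W)} \left(\Qbar_0(0,W) - \Qbar(0,W)\right), \end{align*}\]

so that

\[\begin{align*} \Rem_{P_0}(P) &= \Exp_{P_0}\Bigg[\left(\frac{\Gbar_0(W)}{\Gbar(W)} - 1\right)\left(\Qbar_0(1,W) - \Qbar(1,W)\right) \\ &\qquad\qquad + \left(1 - \frac{1-\Gbar_0(W)}{1-\Gbar(W)}\right)\left(\Qbar_0(0,W) - \Qbar(0,W)\right)\Bigg]. \end{align*}\]

The two factors in parentheses equal \((\Gbar_0(W) - \Gbar(W))/\ell\Gbar(1,W)\) and \((\Gbar_0(W) - \Gbar(W))/\ell\Gbar(0,W)\) respectively, which is (4.3).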

Acting as oracles, we can compute explicitly the remainder term \(\Rem_{P_0}(P)\). The evaluate_remainder method makes it very easy (simply run ?evaluate_remainder to see the man page of the method):

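# Remainder of the truth relative to itself: Rem_{P_0}(P_0)
(evaluate_remainder(experiment, experiment))
#> [1] 0
# Remainder of another_experiment taken at h = 0 relative to the truth: Rem_{P_0}(Pi_0)
(rem <- evaluate_remainder(experiment, another_experiment,
                           list(list(), list(h = 0))))
#> [1] 0.199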

We recover the equality \(\Rem_{P_{0}} (P_{0}) = 0\), which is fairly obvious given (4.1). In addition, we learn that \(\Rem_{P_{0}} (\Pi_{0})\) equals 0.199. In the next subsection, we invite you to make better acquaintance with the remainder term by playing around with it numerically.
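As a complement, the following self-contained sketch computes a remainder term by Monte Carlo without relying on any package method. It is only an illustration under stated assumptions: the laws below are hypothetical toy laws (not the laws of experiment and another_experiment), \(Y\) is taken binary for simplicity, the two laws share the same marginal law of \(W\), and the efficient influence curve is assumed to have the form \(D^*(P)(O) = \Qbar(1,W) - \Qbar(0,W) - \Psi(P) + \frac{2A-1}{\ell\Gbar(A,W)}(Y - \Qbar(A,W))\). The remainder is computed twice: once by rearranging (4.1) with \(PD^*(P) = 0\), and once with the closed form (4.3).

```r
set.seed(54321)
expit <- function(x) 1 / (1 + exp(-x))

# Hypothetical truth P0: W ~ Uniform(0, 1), A | W ~ Bernoulli(Gbar0(W)),
# Y | A, W ~ Bernoulli(Qbar0(A, W))
Gbar0 <- function(w) expit(1 - 2 * w)
Qbar0 <- function(a, w) expit(-0.5 + a + w)
# Hypothetical alternative P: same marginal law of W, different features
Gbar <- function(w) expit(0.5 - w)
Qbar <- function(a, w) expit(0.3 * a + 0.5 * w)

# Large sample drawn independently from P0
n <- 2e5
W <- runif(n)
A <- rbinom(n, 1, Gbar0(W))
Y <- rbinom(n, 1, Qbar0(A, W))

psi0 <- mean(Qbar0(1, W) - Qbar0(0, W))  # approximates Psi(P0)
psi <- mean(Qbar(1, W) - Qbar(0, W))     # approximates Psi(P)

# Assumed form of D*(P), evaluated at the sampled observations
lGbar <- A * Gbar(W) + (1 - A) * (1 - Gbar(W))
Dstar <- Qbar(1, W) - Qbar(0, W) - psi +
  (2 * A - 1) / lGbar * (Y - Qbar(A, W))

# Remainder from (4.1) rearranged: Psi(P) - Psi(P0) + P0 D*(P)
rem_from_linearization <- psi - psi0 + mean(Dstar)
# Remainder from the closed form (4.3)
rem_closed_form <- mean((Gbar0(W) - Gbar(W)) *
  ((Qbar0(1, W) - Qbar(1, W)) / Gbar(W) +
   (Qbar0(0, W) - Qbar(0, W)) / (1 - Gbar(W))))

c(linearization = rem_from_linearization, closed_form = rem_closed_form)
```

The two approximations should agree up to Monte Carlo error, which shrinks as the sample size `n` grows.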

4.2 ⚙ The remainder term

1. Compute numerically \(\Rem_{\Pi_0}(\Pi_h)\) for \(h \in [-1,1]\) and plot your results. What do you notice?
2. ☡ Approximate \(\Rem_{P_{0}} (\Pi_{0})\) numerically without relying on method evaluate_remainder and compare the value you get with that of rem. (Hint: use (4.2) and a large sample of observations drawn independently from \(P_{0}\).)

4.3 ☡ Double-robustness

4.3.1 The key property

Let us denote by \(\|f\|_{P}^{2}\) the square of the \(L^{2}(P)\)-norm of any function \(f\) from \(\calO\) to \(\bbR\) i.e., using a recently introduced notation, \(\|f\|_{P}^{2} \defq Pf^{2}\). For instance, \(\|\Qbar_{1} - \Qbar_{0}\|_{P}\) or \(\|\Gbar_{1} - \Gbar_{0}\|_{P}\) is a distance separating the features \(\Qbar_{1}\) and \(\Qbar_{0}\) or \(\Gbar_{1}\) and \(\Gbar_{0}\).

The efficient influence curve \(D^{*}(P)\) at \(P \in \calM\) enjoys a rather remarkable property: it is double-robust. Specifically, for every \(P \in \calM\), the remainder term \(\Rem_{P_{0}} (P)\) satisfies

\[\begin{equation} \Rem_{P_{0}} (P)^{2} \leq \|\Qbar - \Qbar_{0}\|_{P_0}^{2} \times \|(\Gbar - \Gbar_{0})/\ell\Gbar_{0}\|_{P_0}^{2}, \tag{4.4} \end{equation}\]

where \(\Qbar\) and \(\Gbar\) are the counterparts under \(P\) to \(\Qbar_{0}\) and \(\Gbar_{0}\). The proof consists in a straightforward application of the Cauchy-Schwarz inequality to the right-hand side expression in (4.3).

4.3.2 Its direct consequence

It may not be clear yet why (4.4) is an important property, and why \(D^{*}\) is said to be double-robust because of it. To answer the latter question, let us consider a law \(P\in \calM\) such that either \(\Qbar = \Qbar_{0}\) or \(\Gbar = \Gbar_{0}\).

It is then the case that either \(\|\Qbar - \Qbar_{0}\|_{P_0} = 0\) or \(\|\Gbar - \Gbar_{0}\|_{P_0} = 0\). Therefore, in light of (4.4), it also holds that \(\Rem_{P_{0}} (P) = 0\).⁹ It thus appears that (4.1) simplifies to

\[\begin{align*} \Psi(P_0) &= \Psi(P) + (P_0 - P) D^*(P)\\ &= \Psi(P) + P_0 D^*(P),\end{align*}\]

where the second equality holds because \(PD^{*}(P) = 0\) for all \(P\in \calM\) by definition of \(D^{*}(P)\).

It is now clear that for such a law \(P\in \calM\), \(\Psi(P) = \Psi(P_{0})\) is equivalent to

\[\begin{equation} P_{0} D^{*}(P) = 0. \tag{4.5} \end{equation}\]

Most importantly, in words, if \(P\) solves the so-called \(P_{0}\)-specific efficient influence curve equation (4.5) and if, in addition, \(P\) has the same \(\Qbar\)-feature or \(\Gbar\)-feature as \(P_{0}\), then \(\Psi(P) = \Psi(P_{0})\).

The conclusion is valid no matter how \(P\) may differ from \(P_{0}\) otherwise, hence the notion of being double-robust. This property is useful to build consistent estimators of \(\Psi(P_{0})\), as we shall see in Section 5.
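The vanishing of the remainder term can be checked numerically on a toy example. The sketch below is only illustrative: the features `Gbar0`, `Qbar0`, `Gbar_wrong` and `Qbar_wrong` are hypothetical, and \(W\) is assumed uniform on \([0,1]\) under \(P_0\), so that the expectation in (4.3) reduces to an integral over \([0,1]\).

```r
expit <- function(x) 1 / (1 + exp(-x))

# Hypothetical features of the truth P0
Gbar0 <- function(w) expit(1 - 2 * w)
Qbar0 <- function(a, w) expit(-0.5 + a + w)
# Hypothetical misspecified features
Gbar_wrong <- function(w) expit(0.5 - w)
Qbar_wrong <- function(a, w) expit(0.3 * a + 0.5 * w)

# Remainder term (4.3), assuming W ~ Uniform(0, 1) under P0
remainder <- function(Qbar, Gbar) {
  integrand <- function(w) {
    (Gbar0(w) - Gbar(w)) *
      ((Qbar0(1, w) - Qbar(1, w)) / Gbar(w) +
       (Qbar0(0, w) - Qbar(0, w)) / (1 - Gbar(w)))
  }
  integrate(integrand, lower = 0, upper = 1)$value
}

rems <- c(both_wrong = remainder(Qbar_wrong, Gbar_wrong),
          Qbar_right = remainder(Qbar0, Gbar_wrong),
          Gbar_right = remainder(Qbar_wrong, Gbar0))
rems
```

In this toy setting, the remainder is nonzero when both features are misspecified and vanishes as soon as one of them coincides with its counterpart under \(P_0\).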

4.4 ⚙ Double-robustness

1. Go back to Problem 1 in 4.2. In light of Section 4.3, what is happening?
2. Create a copy of experiment and replace its Gbar feature with some other function of \(W\) (see ?copy, ?alter and Problem 2 in Section 3.2). Call \(P'\) the element of model \(\calM\) thus characterized. Can you guess the values of \(\Rem_{P_{0}}(P')\), \(\Psi(P')\) and \(P_{0} D^{*}(P')\)? Support your argument.

⁹ This also trivially follows from (4.3).