\(\newcommand{\bbO}{\mathbb{O}}\) \(\newcommand{\bbD}{\mathbb{D}}\) \(\newcommand{\bbP}{\mathbb{P}}\) \(\newcommand{\bbR}{\mathbb{R}}\) \(\newcommand{\Algo}{\widehat{\mathcal{A}}}\) \(\newcommand{\Algora}{\widetilde{\mathcal{A}}}\) \(\newcommand{\calF}{\mathcal{F}}\) \(\newcommand{\calM}{\mathcal{M}}\) \(\newcommand{\calP}{\mathcal{P}}\) \(\newcommand{\calO}{\mathcal{O}}\) \(\newcommand{\calQ}{\mathcal{Q}}\) \(\newcommand{\defq}{\doteq}\) \(\newcommand{\Exp}{\textrm{E}}\) \(\newcommand{\IC}{\textrm{IC}}\) \(\newcommand{\Gbar}{\bar{G}}\) \(\newcommand{\one}{\textbf{1}}\) \(\newcommand{\psinos}{\psi_{n}^{\textrm{os}}}\) \(\renewcommand{\Pr}{\textrm{Pr}}\) \(\newcommand{\Phat}{P^{\circ}}\) \(\newcommand{\Psihat}{\widehat{\Psi}}\) \(\newcommand{\Qbar}{\bar{Q}}\) \(\newcommand{\tcg}[1]{\textcolor{olive}{#1}}\) \(\DeclareMathOperator{\Dirac}{Dirac}\) \(\DeclareMathOperator{\expit}{expit}\) \(\DeclareMathOperator{\logit}{logit}\) \(\DeclareMathOperator{\Rem}{Rem}\) \(\DeclareMathOperator{\Var}{Var}\)

Section 5 Inference

5.1 Where we stand

In the previous sections, we analyzed our target parameter and presented theory relevant to understanding the statistical properties of certain types of estimators of this parameter. The same theory is also relevant for building and comparing a variety of estimators.

We assume from now on that we have at our disposal a sample \(O_{1}, \ldots, O_{B}\) of independent observations drawn from \(P_{0}\). This is literally the case! The observations are stored in the object obs that we created in Section 3.6.

iter <- 1e3 # number of disjoint subsamples, hence of independent replications

Equal to one million (that is, iter times 1000), the sample size B is very large. We will in fact use iter = 1000 disjoint subsamples, each composed of \(n\) independent observations among \(O_{1}, \ldots, O_{B}\), where \(n\) equals B/iter, i.e., 1000. We will thus be in a position to investigate the statistical properties of every estimation procedure by replicating it independently 1000 times; see the sketch below.
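To fix ideas, here is a minimal sketch, of our own making, of how the B observations stored in obs could be partitioned into iter disjoint subsamples of size \(n\). It assumes that obs is a matrix or data frame with one row per observation, as created in Section 3.6.

B <- nrow(obs)  # overall sample size, one million here
n <- B / iter   # size of each subsample, 1000 here
## row indices of the b-th subsample, for b = 1, ..., iter
subsample_idx <- split(seq_len(B), rep(seq_len(iter), each = n))
## for instance, the observations making up the first subsample
head(obs[subsample_idx[[1]], ])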

5.2 Where we are going

The following sections explore different statistical paths to inferring \(\psi_{0}\) or, rather (though equivalently), \(\Psi(P_{0})\).

  • Section 6 presents a simple inference strategy. It can be carried out in situations where \(\Gbar_{0}\) is already known to the statistician.

  • Section 7 discusses the estimation of some infinite-dimensional features of \(P_{0}\). The resulting estimators are later used to estimate \(\psi_{0}\).

  • Section 8 extends the inference strategy discussed in Section 6 to the case where \(\Gbar_{0}\) is not known to the statistician but estimated by her. It also presents another inference strategy that relies upon the estimation of \(\Qbar_{0}\). A theoretical analysis reveals that both strategies, called the inverse probability of treatment weighted and G-computation estimation methodologies, suffer from an inherent flaw.

  • Section 9 builds upon the aforementioned analysis and develops a methodological workaround to circumvent the problem revealed by the analysis. Indeed, it turns out that the flawed estimators can be corrected. However, the so-called one-step correction comes at a price that may be high in small samples.

  • Section 10 also builds on the aforementioned analysis but draws a radically different conclusion from it. Instead of trying to circumvent the problem by correcting the flawed estimators of \(\psi_{0}\) themselves, it corrects the estimators of the infinite-dimensional features of \(P_{0}\) that are combined to estimate \(\psi_{0}\). The section thus presents an instantiation of the general targeted minimum loss estimation procedure tailored to the estimation of \(\psi_{0}\). It is the main destination of this ride in targeted learning territory: far from its outposts, yet well into this exciting territory.