# B Basic results and their proofs

## B.1 NPSEM

The experiment can also be summarized by a *nonparametric system of structural
equations*: for some deterministic functions \(f_w\), \(f_a\), \(f_y\) and
independent sources of randomness \(U_w\), \(U_a\), \(U_y\),

1. sample the context where the counterfactual rewards will be generated, the action will be undertaken and the actual reward will be obtained, \(W = f_{w}(U_w)\);

2. sample the two counterfactual rewards of the two actions that can be undertaken, \(Y_{0} = f_{y}(0, W, U_y)\) and \(Y_{1} = f_{y}(1, W, U_y)\);

3. sample which action is carried out in the given context, \(A = f_{a}(W, U_a)\);

4. define the corresponding reward, \(Y = A Y_{1} + (1-A) Y_{0}\);

5. summarize the course of the experiment with the observation \(O = (W, A, Y)\), thus concealing \(Y_{0}\) and \(Y_{1}\).
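For concreteness, the five steps above can be sketched in Python. The particular structural functions \(f_w\), \(f_a\), \(f_y\) below are illustrative choices, not prescribed by the text:

```python
import numpy as np

def sample_npsem(n, rng):
    # Independent sources of randomness U_w, U_a, U_y.
    U_w = rng.uniform(size=n)
    U_a = rng.uniform(size=n)
    U_y = rng.uniform(size=n)
    W = U_w                                    # f_w: context
    Y0 = (U_y <= 0.3 + 0.2 * W).astype(float)  # f_y(0, W, U_y)
    Y1 = (U_y <= 0.5 + 0.3 * W).astype(float)  # f_y(1, W, U_y)
    A = (U_a <= 0.4 + 0.2 * W).astype(float)   # f_a(W, U_a)
    Y = A * Y1 + (1 - A) * Y0                  # reward actually received
    return W, A, Y                             # O = (W, A, Y); Y0, Y1 concealed

rng = np.random.default_rng(1)
W, A, Y = sample_npsem(10_000, rng)
```

Note that the same source \(U_y\) drives both counterfactual rewards, while only \(O = (W, A, Y)\) is returned.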

## B.2 Identification

Let \(\bbP_{0}\) be an experiment that generates \(\bbO \defq (W, Y_{0}, Y_{1}, A, Y)\). We think of \(W\) as the context where an action is undertaken, of \(Y_{0}\) and \(Y_{1}\) as the counterfactual (potential) rewards that actions \(a=0\) and \(a=1\) would entail, of \(A\) as the action carried out, and of \(Y\) as the reward received in response to action \(A\). Consider the following assumptions:

**Randomization**: under \(\bbP_{0}\), the counterfactual rewards \(Y_0\), \(Y_1\) and the action \(A\) are conditionally independent given \(W\), *i.e.*, \(Y_a \perp A \mid W\) for \(a=0,1\).

**Consistency**: under \(\bbP_{0}\), if action \(A\) is undertaken then reward \(Y_{A}\) is received, *i.e.*, \(Y = Y_{A}\) (equivalently, \(Y=Y_{a}\) given that \(A=a\)).

**Positivity**: under \(\bbP_{0}\), both actions \(a=0\) and \(a=1\) have (\(\bbP_{0}\)-almost surely) a positive probability of being undertaken given \(W\), *i.e.*, \(\Pr_{\bbP_0}(\ell\Gbar_0(a,W) > 0) = 1\) for \(a=0,1\).

**Proposition B.1 (Identification).** Under the above assumptions, it holds that \[\begin{equation*} \psi_{0} =
\Exp_{\bbP_{0}} \left(Y_{1} - Y_{0}\right) = \Exp_{\bbP_{0}}(Y_1) -
\Exp_{\bbP_{0}}(Y_0). \end{equation*}\]

*Proof*. Fix an arbitrary \(a \in \{0,1\}\). By the randomization assumption on the one
hand (second equality) and by the consistency and positivity assumptions on
the other hand (third equality), it holds that \[\begin{align*}
\Exp_{\bbP_0}(Y_a) &= \int \Exp_{\bbP_0}(Y_a \mid W = w) dQ_{0,W}(w) = \int
\Exp_{\bbP_0}(Y_a \mid A = a, W = w) dQ_{0,W}(w) \\ &= \int \Exp_{\bbP_0}(Y \mid
A = a, W = w) dQ_{0,W}(w) = \int \Qbar_0(a,w) dQ_{0,W}(w). \end{align*}\] The
stated result easily follows.

**Remark.** The positivity assumption is needed for \(\Exp_{\bbP_0}(Y \mid A = a, W) \defq \Qbar_{0}(a,W)\) to be well-defined.
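Proposition B.1 can be checked by simulation. In the Python sketch below, the choices of \(\Qbar_0\) and of the action mechanism are arbitrary, but randomization, consistency and positivity hold by construction; the average of the concealed difference \(Y_1 - Y_0\) then agrees with the average of \(\Qbar_0(1,W) - \Qbar_0(0,W)\), which involves identified objects only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
W = rng.uniform(size=n)

def Qbar0(a, w):
    # Qbar_0(a, w) = P(Y = 1 | A = a, W = w); an illustrative choice
    return 0.3 + 0.2 * w + a * (0.2 + 0.1 * w)

U_y = rng.uniform(size=n)
Y0 = (U_y <= Qbar0(0, W)).astype(float)  # counterfactual rewards share U_y,
Y1 = (U_y <= Qbar0(1, W)).astype(float)  # as in the NPSEM
A = rng.binomial(1, 0.4 + 0.2 * W)       # positivity: P(A = 1 | W) in [0.4, 0.6]
Y = A * Y1 + (1 - A) * Y0                # consistency

psi_true = np.mean(Y1 - Y0)                      # uses the concealed rewards
psi_ident = np.mean(Qbar0(1, W) - Qbar0(0, W))   # uses identified objects only
```

Both quantities estimate \(\psi_0 = 0.25\) here, up to Monte Carlo error.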

## B.3 Building a confidence interval

Let \(\Phi\) be the standard normal distribution function. Let \(X_{1}\), \(\ldots\), \(X_{n}\) be independently drawn from a given law.

### B.3.1 CLT & Slutsky’s lemma

Assume that \(\sigma^{2} \defq \Var(X_{1})\) is finite. Let \(m \defq \Exp(X_{1})\) be the mean of \(X_{1}\) and \(\bar{X}_{n} \defq n^{-1} \sum_{i=1}^{n} X_{i}\) be the empirical mean. By the central limit theorem (CLT), it holds that \(\sqrt{n} (\bar{X}_{n} - m)\) converges in law as \(n\) grows to the centered Gaussian law with variance \(\sigma^{2}\).

Moreover, if \(\sigma_{n}^{2}\) is a (positive) consistent estimator of \(\sigma^{2}\) then, by Slutsky’s lemma, \(\sqrt{n} (\bar{X}_{n} - m)/\sigma_{n}\) converges in law to the standard normal law. The empirical variance \(n^{-1} \sum_{i=1}^{n}(X_{i} - \bar{X}_{n})^{2}\) is such an estimator.

**Proposition B.2.** Under the above assumptions, \[\begin{equation*} \left[\bar{X}_{n} \pm
\Phi^{-1}(1-\alpha) \frac{\sigma_{n}}{\sqrt{n}}\right] \end{equation*}\] is a
confidence interval for \(m\) with asymptotic level \((1-2\alpha)\).
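A minimal Python illustration of Proposition B.2; the exponential law and its parameters are arbitrary choices (its mean is \(m = 2\)):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
alpha = 0.025                            # asymptotic level 1 - 2*alpha = 95%
n = 10_000
X = rng.exponential(scale=2.0, size=n)   # illustrative law, mean m = 2

xbar = X.mean()                          # empirical mean
sigma_n = X.std()                        # root of the empirical variance
q = NormalDist().inv_cdf(1 - alpha)      # Phi^{-1}(1 - alpha)
ci = (xbar - q * sigma_n / np.sqrt(n), xbar + q * sigma_n / np.sqrt(n))
```

The interval `ci` covers \(m = 2\) with probability approximately \(95\%\) over repetitions of the experiment.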

### B.3.2 CLT and order statistics

Suppose that the law of \(X_{1}\) admits a continuous distribution function \(F\). Set \(p \in ]0,1[\) and, assuming that \(n\) is large, find \(k\geq 1\) and \(l \geq 1\) such that \[\begin{equation*} \frac{k}{n} \approx p - \Phi^{-1}(1-\alpha) \sqrt{\frac{p(1-p)}{n}} \end{equation*}\] and \[\begin{equation*} \frac{l}{n} \approx p + \Phi^{-1}(1-\alpha) \sqrt{\frac{p(1-p)}{n}}. \end{equation*}\]

**Proposition B.3.** Under the above assumptions, \([X_{(k)},X_{(l)}]\) is a confidence interval for
\(F^{-1}(p)\) with asymptotic level \(1 - 2\alpha\).
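Proposition B.3 can be tried out numerically. The Python sketch below builds \([X_{(k)}, X_{(l)}]\) for standard normal data with \(p = 1/2\), so that \(F^{-1}(p) = 0\); all distributional choices are illustrative:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
alpha, p, n = 0.025, 0.5, 10_000         # CI for the median, level 1 - 2*alpha
X = rng.normal(size=n)                   # F continuous; F^{-1}(1/2) = 0

z = NormalDist().inv_cdf(1 - alpha)      # Phi^{-1}(1 - alpha)
k = int(np.floor(n * (p - z * np.sqrt(p * (1 - p) / n))))
l = int(np.ceil(n * (p + z * np.sqrt(p * (1 - p) / n))))
X_sorted = np.sort(X)
ci = (X_sorted[k - 1], X_sorted[l - 1])  # X_(k), X_(l): 1-based order statistics
```

Here \(k \approx 4902\) and \(l \approx 5098\), and the interval hugs the true median \(0\).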

## B.4 Another representation of the parameter of interest

For notational simplicity, note that \((2a-1)\) equals \(1\) if \(a=1\) and \(-1\) if
\(a=0\). Now, for each \(a = 0,1\), \[\begin{align*}
\Exp_{P_{0}}\left(\frac{\one\{A = a\}Y}{\ell\Gbar_{0}(a,W)}\right) &=
\Exp_{P_{0}}\left(\Exp_{P_{0}}\left(\frac{\one\{A = a\}Y}{\ell\Gbar_{0}(a,W)}
\middle| A, W \right) \right) \\ &= \Exp_{P_{0}}\left(\frac{\one\{A =
a\}}{\ell\Gbar_{0}(a,W)} \Qbar_{0}(A, W) \right) \\ &=
\Exp_{P_{0}}\left(\frac{\one\{A = a\}}{\ell\Gbar_{0}(a,W)} \Qbar_{0}(a,
W)\right) \\ &= \Exp_{P_{0}}\left(\Exp_{P_{0}}\left(\frac{\one\{A =
a\}}{\ell\Gbar_{0}(a,W)} \Qbar_{0}(a, W) \middle| W \right) \right) \\& =
\Exp_{P_{0}}\left(\frac{\ell\Gbar_{0}(a,W)}{\ell\Gbar_{0}(a,W)} \Qbar_{0}(a,
W) \right) \\& = \Exp_{P_{0}} \left( \Qbar_{0}(a, W) \right),
\end{align*}\] where the first, fourth and sixth equalities follow from the
tower rule[^25], and
the second and fifth hold by definition of the conditional expectation. This
completes the proof.
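The identity \(\Exp_{P_0}(\one\{A=a\} Y / \ell\Gbar_0(a,W)) = \Exp_{P_0}(\Qbar_0(a,W))\) lends itself to a Monte Carlo check. In the Python sketch below, the particular \(\ell\Gbar_0\) and \(\Qbar_0\) are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
W = rng.uniform(size=n)

def lGbar0(a, w):                     # P(A = a | W = w) (illustrative)
    g1 = 0.3 + 0.4 * w                # bounded away from 0 and 1: positivity
    return np.where(a == 1, g1, 1 - g1)

def Qbar0(a, w):                      # E(Y | A = a, W = w) (illustrative)
    return 0.2 + 0.3 * w + 0.25 * a

A = rng.binomial(1, lGbar0(1, W))
Y = rng.binomial(1, Qbar0(A, W)).astype(float)

# Left- and right-hand sides of the identity, for each action a.
ipw = {a: np.mean((A == a) * Y / lGbar0(a, W)) for a in (0, 1)}
gcomp = {a: np.mean(Qbar0(a, W)) for a in (0, 1)}
```

For each \(a\), `ipw[a]` and `gcomp[a]` agree up to Monte Carlo error.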

## B.5 The delta-method

Let \(f\) be a map from \(\Theta \subset \bbR^{p}\) to \(\bbR^{q}\) that is differentiable at \(\theta\in \Theta\). Let \(X_{n}\) be a random vector taking its values in \(\Theta\).

**Proposition B.4.** If \(\sqrt{n} (X_{n} - \theta)\) converges in law to the Gaussian law with mean
\(\mu\) and covariance matrix \(\Sigma\), then \(\sqrt{n} (f(X_{n}) - f(\theta))\)
converges in law to the Gaussian law with mean \(\nabla f(\theta) \times \mu\)
and covariance matrix \(\nabla f(\theta) \times \Sigma \times \nabla f(\theta)^{\top}\). In addition, if \(\Sigma_{n}\) estimates \(\Sigma\)
consistently then, by Slutsky’s lemma, the asymptotic variance of \(\sqrt{n} (f(X_{n}) - f(\theta))\) is consistently estimated by \(\nabla f(X_{n}) \times \Sigma_{n} \times \nabla f(X_{n})^{\top}\).
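As an illustration of Proposition B.4 in dimension \(p = q = 1\), take \(f(x) = x^{2}\), so that \(\nabla f(\theta) = 2\theta\). The Python sketch below (exponential data with arbitrary parameters, \(X_n = \bar{X}_n\), \(\theta = m\)) compares the variance of \(\sqrt{n}(f(\bar{X}_n) - f(m))\) across replications with the delta-method variance \((2m)^2 \sigma^2\):

```python
import numpy as np

rng = np.random.default_rng(5)
m, sigma2 = 2.0, 4.0            # exponential(scale=2): mean 2, variance 4
n, reps = 400, 20_000

X = rng.exponential(scale=2.0, size=(reps, n))
xbar = X.mean(axis=1)           # one empirical mean per replication
# f(x) = x^2, f'(m) = 2m, so the limit variance is (2m)^2 * sigma^2 = 64.
Z = np.sqrt(n) * (xbar**2 - m**2)
var_emp = Z.var()               # Monte Carlo variance of sqrt(n)(f(Xbar) - f(m))
var_delta = (2 * m) ** 2 * sigma2
```

The two variances agree up to Monte Carlo and higher-order error.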

## B.6 The oracle logistic risk

First, let us recall the definition of the Kullback-Leibler divergence between Bernoulli laws of parameters \(p,q\in]0,1[\): \[\begin{equation*}\text{KL}(p,q) \defq p \log\left(\frac{p}{q}\right) + (1-p) \log \left(\frac{1-p}{1-q}\right).\end{equation*}\] It satisfies \(\text{KL}(p,q) \geq 0\), with equality if and only if \(p=q\).

Let \(f:[0,1] \times \{0,1\} \times [0,1] \to [0,1]\) be a (measurable) function. Applying the tower rule shows that the oracle logistic risk satisfies \[\begin{align} \Exp_{P_{0}} \left(L_{y} (f)(O)\right)&=\Exp_{P_{0}} \left(-\Qbar_{0}(A,W) \log f(A,W) - \left(1 - \Qbar_{0} (A,W)\right) \log \left(1 - f(A,W)\right)\right)\notag\\&=\Exp_{P_{0}} \left(\text{KL}\left(\Qbar_{0}(A,W), f(A,W)\right)\right) + \text{constant},\tag{B.1} \end{align}\] where the above constant equals \[\begin{equation*} -\Exp_{P_{0}}\left(\Qbar_{0}(A,W) \log \Qbar_{0}(A,W) + \left(1 - \Qbar_{0} (A,W)\right) \log \left(1 - \Qbar_{0}(A,W)\right)\right). \end{equation*}\]

In light of (B.1), \(\Qbar_{0}\) minimizes \(f \mapsto \Exp_{P_{0}} \left(L_{y} (f)(O)\right)\) over the set of (measurable) functions mapping \([0,1] \times \{0,1\} \times [0,1]\) to \([0,1]\). Moreover, being (up to a constant) an average of the discrepancies \(\text{KL}(\Qbar_{0}(A,W), f(A,W))\), the oracle risk \(\Exp_{P_{0}} \left(L_{y} (f)(O)\right)\) itself measures the discrepancy between \(f\) and \(\Qbar_{0}\).
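To make (B.1) concrete, a small Python sketch (with an arbitrary constant stand-in \(\Qbar_0 \equiv 0.3\)) evaluates the Kullback-Leibler divergence and locates the minimizer of the oracle logistic risk over a grid of constant candidates \(f\):

```python
import numpy as np

def KL(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# With a constant Qbar_0(A, W) = 0.3 (arbitrary stand-in), the oracle
# logistic risk of a constant candidate f reduces to the function below,
# which (B.1) says is minimized at f = 0.3.
def oracle_risk(f, qbar=0.3):
    return -qbar * np.log(f) - (1 - qbar) * np.log(1 - f)

fs = np.linspace(0.01, 0.99, 99)
f_best = fs[np.argmin(oracle_risk(fs))]   # grid minimizer, approximately 0.3
```

The grid minimizer sits at \(f = 0.3 = \Qbar_0\), as predicted.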

[^25]: For any random variables \(U\) and \(V\) such that \(\Exp(U \mid V)\) and \(\Exp(U)\) are well defined, it holds that \(\Exp(\Exp(U \mid V)) = \Exp(U)\).