My goal is to prove the following inequality, known as the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality (1956):
Let $(X_i)_{i \geqslant 1}$ be i.i.d. random variables. Let $\displaystyle F_n(x)= \frac{1}{n}\sum _{i=1}^n 1_{X_i \leqslant x}$ and let $F$ be the distribution function of $X_1$. Then there exists a constant $C>0$ such that for every $\varepsilon >0$: $$\mathbb{P} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| > \varepsilon \right) \leqslant C e^{-2n\varepsilon ^2}$$
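As a quick numerical sanity check of this statement (a sketch only: it assumes Uniform(0,1) samples, so that $F(x)=x$, and uses the sharp constant $C=2$ from Massart's refinement; the helper name `sup_deviation` is just for illustration):

```python
import numpy as np

def sup_deviation(samples):
    """sup_x |F_n(x) - F(x)| for Uniform(0,1) data, where F(x) = x.
    The supremum is attained at the order statistics of the sample."""
    s = np.sort(samples)
    n = len(s)
    i = np.arange(1, n + 1)
    # F_n jumps to i/n at s_i; just before s_i it equals (i-1)/n.
    return max(np.max(i / n - s), np.max(s - (i - 1) / n))

# Estimate the tail probability by simulation and compare to 2 e^{-2 n eps^2}.
rng = np.random.default_rng(0)
n, trials, eps = 100, 2000, 0.1
devs = [sup_deviation(rng.random(n)) for _ in range(trials)]
empirical = np.mean([d > eps for d in devs])
bound = 2 * np.exp(-2 * n * eps**2)
print(f"empirical tail: {empirical:.3f}, DKW bound: {bound:.3f}")
```

Note that with $\sqrt{n}\,\varepsilon = 1$ the bound $2e^{-2}$ is close to tight, so the empirical tail should come out near (but below) the bound.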
I did not find any proof on the web (only the original 1956 article of DKW, which I cannot follow due to its notation). The only thing I found was a proof that $$\mathbb{E} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \right) \leqslant \frac{c}{\sqrt{n}}$$ in this paper:
(Theorem 3.3)
a result which is also sometimes called the DKW inequality.
I was not able to prove the DKW inequality from this result, but here is my try:
By Markov's inequality, for every nonnegative nondecreasing function $f$:
$$\mathbb{P} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| > \varepsilon \right) \leqslant \frac{\mathbb{E} \left( f \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)|\right) \right)}{f(\varepsilon)}.$$ With $f(x)=e^{tx}$, $t>0$, and using the convexity of $f$, Jensen's inequality gives: $$\mathbb{P} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| > \varepsilon \right) \leqslant \frac{e^{ctn^{-1/2}}}{e^{t \varepsilon}}.$$
But that does not give the correct result. (Indeed, Jensen's inequality bounds $\mathbb{E}\left(e^{tZ}\right)$ from *below* by $e^{t\mathbb{E}(Z)}$, not from above, so this last step is not even justified.)
Question. Can anyone help me prove the DKW inequality, starting from the estimate $\mathbb{E} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \right) \leqslant \frac{c}{\sqrt{n}}$? Or just point me to a paper with the proof in modern language.
Thank you.
2 Answers
I know this question is old, but just in case anyone else comes here looking for an answer, the following pdf might be helpful:
The result you want (a high-probability bound) follows from the given result (an expectation bound) by McDiarmid's inequality.
Let $X_1, \cdots, X_n$ be random variables drawn i.i.d. from the cumulative distribution function $F$; i.e., independently for each $i \in [n]$, we have $\forall x \in \mathbb{R}~~\mathbb{P}[X_i \le x] = F(x)$. Let $F_n$ denote the empirical CDF, i.e. $F_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}[X_i \le x]$.
Define a function $$g(X_1, \cdots, X_n) = \sup_{x \in \mathbb{R}} |F_n(x)-F(x)|.$$
The inequality you have is $$\mu := \mathbb{E}[g(X_1, \cdots, X_n)] = \mathbb{E}\left[\sup_{x \in \mathbb{R}} |F_n(x)-F(x)|\right] \le \frac{c}{\sqrt{n}}.$$
For any given $x \in \mathbb{R}$ and any $i \in [n]$, changing one $X_i$ can change $F_n(x)$ by at most $1/n$. It follows that, for any $i \in [n]$, changing one $X_i$ can only change $g(X_1,\cdots,X_n)$ by at most $1/n$. (This is known as the sensitivity of the function and the supremum doesn't increase the sensitivity.) This means McDiarmid's inequality can be applied to the function $g$. We have $$\forall \varepsilon \ge 0 ~~~~~ \mathbb{P}\left[g(X_1, \cdots, X_n) \ge \mu + \varepsilon\right] \le \exp(-2\varepsilon^2n).$$
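To spell out the substitution (this is the standard bounded-differences form of McDiarmid's inequality, with bounded-difference constants $c_i$): $$\mathbb{P}\left[g \ge \mu + \varepsilon\right] \le \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right), \qquad \text{and here} \quad \sum_{i=1}^n c_i^2 = n \cdot \frac{1}{n^2} = \frac{1}{n},$$ which yields the exponent $-2\varepsilon^2 n$ above.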
This is essentially the form you want, up to the constant in the exponent. One caveat: no constant $C$ can satisfy $\exp(-2(\varepsilon-c/\sqrt{n})^2 n) \le C \cdot \exp(-2\varepsilon^2 n)$ for all $\varepsilon \ge 0$, since the ratio grows like $e^{4c\varepsilon\sqrt{n}}$. Instead, split into cases. For $\varepsilon \le 2c/\sqrt{n}$, we have $n\varepsilon^2/2 \le 2c^2$, so trivially $\mathbb{P}[g \ge \varepsilon] \le 1 \le e^{2c^2} e^{-n\varepsilon^2/2}$. For $\varepsilon > 2c/\sqrt{n}$, we have $\varepsilon - \mu \ge \varepsilon - c/\sqrt{n} > \varepsilon/2$, so McDiarmid gives $\mathbb{P}[g \ge \varepsilon] \le \exp(-2n(\varepsilon/2)^2) = e^{-n\varepsilon^2/2}$. In either case, $$\mathbb{P}\left[g(X_1, \cdots, X_n) \ge \varepsilon\right] \le e^{2c^2} \, e^{-n\varepsilon^2/2},$$ which is the DKW inequality with a worse constant in the exponent. (Obtaining the sharp exponent $2n\varepsilon^2$ with $C = 2$ is harder; that is Massart's 1990 theorem.)