Proof of the DKW inequality


My goal is to prove the following inequality, known as the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality (1956):

Let $(X_i)_{i \geqslant 1}$ be i.i.d. random variables, let $\displaystyle F_n(x)= \frac{1}{n}\sum _{i=1}^n 1_{X_i \leqslant x}$, and let $F$ be the distribution function of $X_1$. Then there exists a constant $C>0$ such that for every $\varepsilon >0$: $$\mathbb{P} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| > \varepsilon \right) \leqslant C e^{-2n\varepsilon ^2}$$
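As a numerical illustration of the quantity involved (a sketch in plain Python, not part of any proof; for $X_i \sim \mathrm{Unif}[0,1]$ one has $F(x)=x$, and the supremum is attained at the order statistics):

```python
import math
import random

def dkw_statistic(samples):
    """sup_x |F_n(x) - F(x)| for Unif[0,1] samples (so F(x) = x).
    The step function F_n jumps only at the sorted samples
    u_(1) <= ... <= u_(n), so the supremum equals
    max_i max(i/n - u_(i), u_(i) - (i-1)/n)."""
    u = sorted(samples)
    n = len(u)
    return max(max(i / n - u[i - 1], u[i - 1] - (i - 1) / n)
               for i in range(1, n + 1))

random.seed(0)
n = 1000
d_n = dkw_statistic([random.random() for _ in range(n)])
# DKW tail bound at eps = 0.05 (up to the constant C):
bound = math.exp(-2 * n * 0.05 ** 2)
```

Typically $d_n$ comes out on the order of $1/\sqrt{n}$, consistent with the expectation bound $c/\sqrt{n}$ mentioned below.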

I did not find any proof on the web (only the original 1956 DKW article, which I cannot follow because of its notation). The only thing I found was a proof that $$\mathbb{E} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \right) \leqslant \frac{c}{\sqrt{n}}$$ in this paper:
(theorem 3.3)

which is also referred to as the DKW inequality.

I was not able to prove the DKW inequality from this result, but here is my try:

By Markov's inequality, for every nonnegative, nondecreasing function $f$:

$$\mathbb{P} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| > \varepsilon \right) \leqslant \frac{\mathbb{E} \left( f \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)|\right) \right)}{f(\varepsilon)}$$ With $f(x)=e^{tx}$, using the convexity of $f$, Jensen's inequality gives: $$\mathbb{P} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| > \varepsilon \right) \leqslant \frac{e^{ctn^{-1/2}}}{e^{t \varepsilon}}$$

But that does not give the correct result.
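In fact the Jensen step goes the wrong way: for the convex function $f(x)=e^{tx}$, Jensen's inequality reads

$$\mathbb{E}\left[ e^{tZ} \right] \geqslant e^{t\,\mathbb{E}[Z]}, \qquad Z := \sup_{x \in \mathbb{R}} |F_n(x)-F(x)|,$$

so the estimate $\mathbb{E}[Z] \leqslant c/\sqrt{n}$ only controls the right-hand side; it gives no upper bound on the moment generating function $\mathbb{E}[e^{tZ}]$, which is what the Markov step would require.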

Question. Can anyone help me prove the DKW inequality starting from the estimate $\mathbb{E} \left( \sup_{x \in \mathbb{R}} |F_n(x)-F(x)| \right) \leqslant \frac{c}{\sqrt{n}}$? Or, failing that, point me to a paper with the proof in modern notation?

Thank you.


2 Answers


I know this question is old, but just in case anyone else comes here looking for an answer, the following pdf might be helpful:


The result you want (a high probability bound) follows from the given result (an expectation bound) by McDiarmid's inequality.
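For reference, the one-sided form of McDiarmid's (bounded-differences) inequality states: if changing the $i$-th argument of $g$ (with all other arguments held fixed) changes its value by at most $c_i$, then for independent $X_1, \dots, X_n$ and all $t \ge 0$,

$$\mathbb{P}\left[ g(X_1, \dots, X_n) \ge \mathbb{E}[g(X_1, \dots, X_n)] + t \right] \le \exp\left( -\frac{2t^2}{\sum_{i=1}^n c_i^2} \right).$$

With $c_i = 1/n$ for every $i$, the denominator is $\sum_i c_i^2 = 1/n$ and the right-hand side becomes $\exp(-2t^2 n)$.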

Let $X_1, \cdots, X_n$ be random variables drawn i.i.d. from the distribution with cumulative distribution function $F$; i.e., independently for each $i \in [n]$, we have $\mathbb{P}[X_i \le x] = F(x)$ for all $x \in \mathbb{R}$. Let $F_n$ denote the empirical CDF, i.e. $F_n(x) = \frac1n \sum_{i=1}^n \mathbb{I}[X_i \le x]$.

Define a function $$g(X_1, \cdots, X_n) = \sup_{x \in \mathbb{R}} |F_n(x)-F(x)|.$$

The inequality you have is $$\mu := \mathbb{E}[g(X_1, \cdots, X_n)] = \mathbb{E}\left[\sup_{x \in \mathbb{R}} |F_n(x)-F(x)|\right] \le \frac{c}{\sqrt{n}}.$$

For any given $x \in \mathbb{R}$ and any $i \in [n]$, changing a single $X_i$ changes $F_n(x)$ by at most $1/n$. It follows that changing a single $X_i$ changes $g(X_1,\cdots,X_n)$ by at most $1/n$ as well. (This is the bounded-differences, or sensitivity, property of $g$; taking the supremum over $x$ does not increase it.) McDiarmid's inequality therefore applies to $g$, giving $$\forall \varepsilon \ge 0 ~~~~~ \mathbb{P}\left[g(X_1, \cdots, X_n) \ge \mu + \varepsilon\right] \le \exp(-2\varepsilon^2n).$$
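A quick Monte Carlo sanity check of this tail bound (a sketch in plain Python; the choices $n = 50$, $2000$ trials, and $\varepsilon = 0.1$ are arbitrary, and $\mathrm{Unif}[0,1]$ stands in for a generic continuous $F$):

```python
import math
import random

def sup_deviation(samples):
    """sup_x |F_n(x) - F(x)| for Unif[0,1] samples (F(x) = x),
    computed exactly from the order statistics."""
    u = sorted(samples)
    m = len(u)
    return max(max(i / m - u[i - 1], u[i - 1] - (i - 1) / m)
               for i in range(1, m + 1))

random.seed(1)
n, trials, eps = 50, 2000, 0.10
devs = [sup_deviation([random.random() for _ in range(n)])
        for _ in range(trials)]
mu = sum(devs) / trials                      # Monte Carlo estimate of E[g]
tail = sum(d >= mu + eps for d in devs) / trials
bound = math.exp(-2 * eps ** 2 * n)          # McDiarmid bound exp(-2 eps^2 n)
# Empirically, `tail` should land well below `bound`.
```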

This is essentially the form you want, except that no finite constant works with the exponent $-2n\varepsilon^2$: expanding the square, $\exp(-2(\varepsilon-c/\sqrt{n})^2 n) = e^{-2n\varepsilon^2}\, e^{4c\varepsilon\sqrt{n} - 2c^2}$, and the second factor is unbounded in $\varepsilon$. Splitting into the cases $\varepsilon \le 2c/\sqrt{n}$ and $\varepsilon > 2c/\sqrt{n}$, however, yields $\mathbb{P}[g \ge \varepsilon] \le e^{2c^2} e^{-n\varepsilon^2/2}$ for all $\varepsilon \ge 0$, which is the DKW shape with $C = e^{2c^2}$ at the price of a weaker constant in the exponent.
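One way to finish (a sketch; the exponent constant weakens from $2$ to $1/2$ in the process): if $\varepsilon \le 2c/\sqrt{n}$, then $n\varepsilon^2 \le 4c^2$, so trivially

$$\mathbb{P}[g \ge \varepsilon] \le 1 \le e^{2c^2} e^{-n\varepsilon^2/2},$$

while if $\varepsilon > 2c/\sqrt{n}$, then $\varepsilon - c/\sqrt{n} > \varepsilon/2$, and McDiarmid's inequality together with $\mu \le c/\sqrt{n}$ gives

$$\mathbb{P}[g \ge \varepsilon] \le \exp\left(-2n\left(\varepsilon - \frac{c}{\sqrt{n}}\right)^2\right) \le e^{-n\varepsilon^2/2}.$$

Combining the two cases, $\mathbb{P}[g \ge \varepsilon] \le e^{2c^2} e^{-n\varepsilon^2/2}$ for every $\varepsilon \ge 0$.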

