
Limit Theorems

Interview prep for law of large numbers, central limit theorem, and concentration inequalities

Fundamental Concepts

Convergence of Random Variables

There are several modes of convergence for sequences of random variables:

1. Convergence in Distribution (Weak Convergence)

$X_n \xrightarrow{d} X$ if:

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

for all $x$ where $F_X$ is continuous.

Notation: Also written as $X_n \Rightarrow X$ or $X_n \xrightarrow{\mathcal{L}} X$.

2. Convergence in Probability

$X_n \xrightarrow{P} X$ if for all $\epsilon > 0$:

$$\lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0$$

Interpretation: The probability that $X_n$ is far from $X$ goes to zero.

3. Convergence Almost Surely (Strong Convergence)

$X_n \xrightarrow{a.s.} X$ if:

$$P\left(\lim_{n \to \infty} X_n = X\right) = 1$$

Interpretation: $X_n$ converges to $X$ along almost every sample path.

4. Convergence in $L^p$ (Mean)

$X_n \xrightarrow{L^p} X$ if:

$$\lim_{n \to \infty} E[|X_n - X|^p] = 0$$

Special cases:

  • $p = 1$: Convergence in mean
  • $p = 2$: Convergence in mean square

Hierarchy of Convergence

Relationships (stronger → weaker):

$$\text{a.s.} \Rightarrow \text{in probability} \Rightarrow \text{in distribution}$$

$$L^p \text{ convergence} \Rightarrow \text{in probability}$$

Key facts:

  • Almost sure convergence is strongest
  • Convergence in distribution is weakest
  • $L^p$ convergence for $p > q$ implies $L^q$ convergence
  • Convergence in distribution to a constant implies convergence in probability to that constant

Law of Large Numbers (LLN)

Weak Law of Large Numbers (WLLN)

Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$.

Define the sample mean:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$$

WLLN: $\bar{X}_n \xrightarrow{P} \mu$

Interpretation: For large $n$, the sample mean is close to the population mean with high probability.

Formal statement: For all $\epsilon > 0$:

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0$$

Proof technique: Use Chebyshev's inequality:

$$P(|\bar{X}_n - \mu| > \epsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$$

Strong Law of Large Numbers (SLLN)

Under the same conditions as WLLN:

SLLN: $\bar{X}_n \xrightarrow{a.s.} \mu$

Interpretation: The sample mean converges to the population mean along almost every sample path (not just in probability).

Formal statement:

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$

Comparison with WLLN:

  • SLLN is stronger than WLLN
  • SLLN requires only $E[|X_i|] < \infty$ (finite mean), not finite variance
  • SLLN guarantees convergence along almost every realization

Applications of LLN

Monte Carlo simulation: Sample mean approximates expected value
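
A minimal simulation sketch (assumptions: Python with numpy; Exponential(1) data and the seed are illustrative choices) showing the sample mean settling near the true mean $\mu = 1$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed

# LLN demo: the sample mean of Exponential(1) draws approaches the true mean 1.
true_mean = 1.0
for n in [10, 100, 10_000, 1_000_000]:
    sample = rng.exponential(scale=1.0, size=n)
    print(f"n = {n:>9}: sample mean = {sample.mean():.4f} (true mean = {true_mean})")
```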

Frequency interpretation: Relative frequency converges to probability

Casino/insurance: Average payout converges to expected payout (law of averages)

Polling: Sample proportion converges to population proportion

Central Limit Theorem (CLT)

Classical Central Limit Theorem

Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$.

Define:

$$S_n = \sum_{i=1}^n X_i, \quad \bar{X}_n = \frac{S_n}{n}$$

CLT: The standardized sum converges in distribution to standard normal:

$$\frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Informally, for large $n$:

$$S_n \approx \mathcal{N}(n\mu, n\sigma^2), \qquad \bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$

(These are heuristic approximations; the precise limit statement is the standardized one above, since a limiting distribution cannot depend on $n$.)

Interpretation: For large $n$, the distribution of $\bar{X}_n$ is approximately normal, regardless of the distribution of $X_i$.

Practical form: For large $n$:

$$P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \leq z\right) \approx \Phi(z)$$

where $\Phi$ is the standard normal CDF.
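
A quick simulation sketch (assuming numpy and scipy; the exponential distribution, sample size, and seed are illustrative choices) comparing the standardized sample mean of a skewed distribution against $\Phi$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)  # illustrative seed

n, reps = 50, 100_000      # sample size per mean, number of simulated means
mu, sigma = 1.0, 1.0       # mean and standard deviation of Exponential(1)

# Standardize each of `reps` sample means of size n.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

# The empirical CDF of the standardized means should be close to Phi.
print("empirical P(Z <= 1):", np.mean(z <= 1.0))
print("Phi(1):             ", norm.cdf(1.0))
```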

Berry-Esseen Theorem (Rate of Convergence)

Provides bounds on how fast the CLT convergence occurs.

If $\rho = E[|X_i - \mu|^3] < \infty$:

$$\sup_x \left|P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \leq x\right) - \Phi(x)\right| \leq \frac{C\rho}{\sigma^3\sqrt{n}}$$

where $C \leq 0.4748$ is an absolute constant.

Interpretation: The error in the normal approximation is $O(1/\sqrt{n})$.

Lindeberg-Feller CLT (Independent, Not Identically Distributed)

For independent (but not necessarily identically distributed) random variables $X_1, \ldots, X_n$ with $E[X_i] = \mu_i$ and $\text{Var}(X_i) = \sigma_i^2$, write $s_n^2 = \sum_{i=1}^n \sigma_i^2$.

If the Lindeberg condition holds, i.e., for every $\epsilon > 0$,

$$\frac{1}{s_n^2}\sum_{i=1}^n E\left[(X_i - \mu_i)^2 \mathbb{1}_{|X_i - \mu_i| > \epsilon s_n}\right] \to 0,$$

then:

$$\frac{\sum_{i=1}^n X_i - \sum_{i=1}^n \mu_i}{\sqrt{\sum_{i=1}^n \sigma_i^2}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Lyapunov CLT

A sufficient condition for CLT to hold for independent (not identical) RVs.

If for some $\delta > 0$:

$$\lim_{n \to \infty} \frac{\sum_{i=1}^n E[|X_i - \mu_i|^{2+\delta}]}{\left(\sum_{i=1}^n \sigma_i^2\right)^{1 + \delta/2}} = 0$$

then the CLT holds.

Multivariate CLT

Let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be i.i.d. random vectors with $E[\mathbf{X}_i] = \boldsymbol{\mu}$ and $\text{Cov}(\mathbf{X}_i) = \Sigma$.

Multivariate CLT:

$$\sqrt{n}(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \Sigma)$$

where $\bar{\mathbf{X}}_n = \frac{1}{n}\sum_{i=1}^n \mathbf{X}_i$.

Delta Method

Problem: Find the limiting distribution of $g(\bar{X}_n)$ where $g$ is a smooth function.

Delta Method: If $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$ and $g'(\mu) \neq 0$:

$$\sqrt{n}(g(\bar{X}_n) - g(\mu)) \xrightarrow{d} \mathcal{N}(0, \sigma^2 [g'(\mu)]^2)$$

Multivariate Delta Method: If $\sqrt{n}(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \Sigma)$:

$$\sqrt{n}(g(\bar{\mathbf{X}}_n) - g(\boldsymbol{\mu})) \xrightarrow{d} \mathcal{N}\left(0, \nabla g(\boldsymbol{\mu})^T \Sigma \nabla g(\boldsymbol{\mu})\right)$$

When to use: Finding asymptotic distributions of statistics like sample variance, correlation, etc.
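
A simulation sketch of the delta method (assuming numpy; $g(x) = \log x$ and Exponential(1) data are illustrative choices, giving $\mu = \sigma = 1$ and $g'(\mu) = 1$, so the predicted asymptotic variance is $\sigma^2[g'(\mu)]^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)  # illustrative seed

# Delta method check for g(x) = log(x) with X_i ~ Exponential(1):
# sqrt(n) * (log(Xbar_n) - log(mu)) should be approximately N(0, sigma^2 * g'(mu)^2) = N(0, 1).
n, reps = 500, 20_000
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
stat = np.sqrt(n) * (np.log(xbar) - np.log(1.0))

print("simulated variance of sqrt(n)(g(Xbar) - g(mu)):", stat.var())  # close to 1
print("delta-method prediction:                        ", 1.0)
```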

Continuous Mapping Theorem

If $X_n \xrightarrow{d} X$ and $g$ is continuous, then:

$$g(X_n) \xrightarrow{d} g(X)$$

Use: Transforming convergent sequences while preserving convergence in distribution.

Slutsky's Theorem

If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ (a constant), then:

  1. $X_n + Y_n \xrightarrow{d} X + c$
  2. $X_n Y_n \xrightarrow{d} cX$
  3. $X_n / Y_n \xrightarrow{d} X/c$ (if $c \neq 0$)

Use: Combining convergent sequences, working with estimated variances.
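
A typical application is the studentized mean: replacing the unknown $\sigma$ with the sample standard deviation $S$ still gives a standard normal limit, because $S \xrightarrow{P} \sigma$. A simulation sketch (assuming numpy and scipy; Exponential(1) data and the parameters are illustrative choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)  # illustrative seed

# Slutsky in action: (Xbar - mu) / (S / sqrt(n)) is asymptotically N(0, 1)
# because the sample standard deviation S converges in probability to sigma.
n, reps, mu = 200, 50_000, 1.0
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)  # sample standard deviation
t_stat = (xbar - mu) / (s / np.sqrt(n))

print("empirical P(T <= 1.96):", np.mean(t_stat <= 1.96))
print("Phi(1.96):             ", norm.cdf(1.96))
```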

Concentration Inequalities

Concentration inequalities provide bounds on how much a random variable deviates from its mean.

Markov's Inequality

For a non-negative random variable $X$ and $a > 0$:

$$P(X \geq a) \leq \frac{E[X]}{a}$$

Proof (continuous case):

$$E[X] = \int_0^\infty x f(x)\, dx \geq \int_a^\infty x f(x)\, dx \geq a \int_a^\infty f(x)\, dx = a \cdot P(X \geq a)$$

When to use: Only know the mean, want rough bound.

Limitations: Often very loose; only uses first moment.

Generalization: For any non-decreasing, non-negative function $\phi$ with $\phi(a) > 0$:

$$P(X \geq a) \leq P(\phi(X) \geq \phi(a)) \leq \frac{E[\phi(X)]}{\phi(a)}$$

Chebyshev's Inequality

For random variable $X$ with mean $\mu$ and variance $\sigma^2$, for any $k > 0$:

$$P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}$$

Alternative form: For any $t > 0$:

$$P(|X - \mu| \geq t\sigma) \leq \frac{1}{t^2}$$

Proof: Apply Markov's inequality to $(X - \mu)^2$:

$$P(|X - \mu| \geq k) = P((X-\mu)^2 \geq k^2) \leq \frac{E[(X-\mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$$

When to use: Know mean and variance, tighter than Markov.

Key insight: Uses second moment (variance) to get better bound than Markov.

Chernoff Bound (General Method)

For any random variable $X$ and $a \in \mathbb{R}$:

$$P(X \geq a) \leq \inf_{t > 0} \frac{E[e^{tX}]}{e^{ta}} = \inf_{t > 0} e^{-ta} M_X(t)$$

where $M_X(t) = E[e^{tX}]$ is the MGF.

Proof: For $t > 0$:

$$P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq \frac{E[e^{tX}]}{e^{ta}}$$

Then optimize over tt.

When to use: MGF is available, want exponentially decaying bound.

Key advantage: Provides exponential tail bounds (much stronger than polynomial bounds).
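
Worked example (a standard calculation, included for concreteness): for $X \sim \mathcal{N}(0, \sigma^2)$, the MGF is $M_X(t) = e^{\sigma^2 t^2/2}$, so for $a > 0$:

$$P(X \geq a) \leq \inf_{t > 0} e^{-ta + \sigma^2 t^2/2} = e^{-a^2/(2\sigma^2)},$$

with the infimum attained at $t^* = a/\sigma^2$.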

Chernoff-Hoeffding Bound

For i.i.d. Bernoulli random variables $X_1, \ldots, X_n$ with $P(X_i = 1) = p$, let $S_n = \sum_{i=1}^n X_i$.

Upper tail: For $\delta > 0$:

$$P(S_n \geq (1+\delta)np) \leq \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^{np}$$

Simplified (for $0 < \delta < 1$):

$$P(S_n \geq (1+\delta)np) \leq e^{-\frac{\delta^2 np}{3}}$$

Lower tail: For $0 < \delta < 1$:

$$P(S_n \leq (1-\delta)np) \leq e^{-\frac{\delta^2 np}{2}}$$

Two-sided: For $0 < \delta < 1$:

$$P(|S_n - np| \geq \delta np) \leq 2e^{-\frac{\delta^2 np}{3}}$$

When to use: Sums of bounded independent random variables, concentration around mean.

Hoeffding's Inequality

For independent random variables $X_1, \ldots, X_n$ with $a_i \leq X_i \leq b_i$, let $S_n = \sum_{i=1}^n X_i$.

Hoeffding's inequality: For any $t > 0$:

$$P(S_n - E[S_n] \geq t) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$

$$P(S_n - E[S_n] \leq -t) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$

For the sample mean: If $X_i \in [a, b]$ and $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$:

$$P(|\bar{X}_n - E[X]| \geq t) \leq 2\exp\left(-\frac{2nt^2}{(b-a)^2}\right)$$

When to use: Bounded random variables, exponential concentration for sample means.
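
A small numeric comparison (assuming numpy; the Bernoulli(0.5) setup, $n$, and $t$ are illustrative choices) of how loose Markov and Chebyshev are next to Hoeffding for a sample-mean deviation:

```python
import numpy as np

# Bound P(Xbar_n - p >= t) for i.i.d. X_i ~ Bernoulli(p).
n, p, t = 1000, 0.5, 0.05
mean, var = p, p * (1 - p) / n           # mean and variance of Xbar_n

markov = mean / (mean + t)               # Markov applied to Xbar_n at level p + t
chebyshev = var / t**2                   # Chebyshev (two-sided, hence also one-sided)
hoeffding = np.exp(-2 * n * t**2)        # Hoeffding, one-sided, with (b - a) = 1

print(f"Markov:    {markov:.4f}")
print(f"Chebyshev: {chebyshev:.4f}")
print(f"Hoeffding: {hoeffding:.6f}")
```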

Bennett's Inequality

For independent random variables $X_1, \ldots, X_n$ with $E[X_i] = 0$, $\text{Var}(X_i) = \sigma_i^2$, and $X_i \leq b$:

$$P\left(\sum_{i=1}^n X_i \geq t\right) \leq \exp\left(-\frac{\sigma^2}{b^2} h\left(\frac{bt}{\sigma^2}\right)\right)$$

where $\sigma^2 = \sum_{i=1}^n \sigma_i^2$ and $h(u) = (1+u)\ln(1+u) - u$.

When to use: Sharper than Hoeffding when variance is small relative to range.

Bernstein's Inequality

For independent random variables $X_1, \ldots, X_n$ with $E[X_i] = 0$, $E[X_i^2] = \sigma_i^2$, and $|X_i| \leq M$:

$$P\left(\left|\sum_{i=1}^n X_i\right| \geq t\right) \leq 2\exp\left(-\frac{t^2/2}{\sigma^2 + Mt/3}\right)$$

where $\sigma^2 = \sum_{i=1}^n \sigma_i^2$.

When to use: Similar to Bennett's, interpolates between Gaussian and exponential regimes.
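
A sketch (assuming numpy; the sparse ±1 distribution, $n$, and $t$ are illustrative choices) of how Bernstein beats Hoeffding when the total variance is much smaller than the range suggests:

```python
import numpy as np

# Zero-mean X_i in [-1, 1]: X_i = +/-1 with probability q/2 each, 0 otherwise.
n, q, t = 1000, 0.01, 50.0
M = 1.0                  # |X_i| <= M
var_sum = n * q          # sigma^2 = sum_i Var(X_i) = n * q

hoeffding = 2 * np.exp(-2 * t**2 / (n * (2 * M) ** 2))       # (b_i - a_i) = 2M
bernstein = 2 * np.exp(-(t**2 / 2) / (var_sum + M * t / 3))

print(f"Hoeffding bound: {hoeffding:.4g}")
print(f"Bernstein bound: {bernstein:.4g}")
```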

Azuma-Hoeffding Inequality (Martingale Concentration)

Let $X_0, X_1, \ldots, X_n$ be a martingale with $|X_i - X_{i-1}| \leq c_i$ for all $i$.

Azuma's inequality:

$$P(X_n - X_0 \geq t) \leq \exp\left(-\frac{t^2}{2\sum_{i=1}^n c_i^2}\right)$$

When to use: Martingales with bounded differences (e.g., Doob martingale, random walks).

McDiarmid's Inequality (Method of Bounded Differences)

Let $f: \mathcal{X}^n \to \mathbb{R}$ satisfy the bounded difference condition:

$$|f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n)| \leq c_i$$

for all $x_1, \ldots, x_n, x_i' \in \mathcal{X}$.

If $X_1, \ldots, X_n$ are independent:

$$P(f(X_1, \ldots, X_n) - E[f(X_1, \ldots, X_n)] \geq t) \leq \exp\left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)$$

When to use: Functions of many independent variables where changing one variable has bounded effect.

Examples: Longest increasing subsequence, chromatic number of a random graph.

Mill's Inequality (Gaussian Tail Bounds)

For $Z \sim \mathcal{N}(0, 1)$ and $x > 0$:

$$\frac{1}{x\sqrt{2\pi}}e^{-x^2/2} \left(1 - \frac{1}{x^2}\right) < P(Z > x) < \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}$$

Simple upper bound:

$$P(Z > x) \leq \frac{1}{2}e^{-x^2/2}$$

When to use: Normal distribution tail probabilities.

Cantelli's Inequality (One-Sided Chebyshev)

For random variable $X$ with mean $\mu$ and variance $\sigma^2$, for $t > 0$:

$$P(X - \mu \geq t) \leq \frac{\sigma^2}{\sigma^2 + t^2}$$

When to use: One-sided bound, tighter than two-sided Chebyshev.

Summary Table: Concentration Inequalities

| Inequality | Conditions | Tail bound (typical form) | Decay Rate | Best Use Case |
|---|---|---|---|---|
| Markov | $X \geq 0$ | $\frac{E[X]}{t}$ | $O(1/t)$ | Only mean known |
| Chebyshev | Finite variance | $\frac{\text{Var}(X)}{t^2}$ | $O(1/t^2)$ | Mean + variance known |
| Chernoff | MGF exists | $\inf_s e^{-st}M_X(s)$ | Exponential | MGF available |
| Hoeffding | Bounded, independent | $\exp\left(-\frac{2t^2}{n(b-a)^2}\right)$ | Exponential | Bounded variables |
| Bernstein | Bounded, variance known | $\exp\left(-\frac{t^2/2}{\sigma^2 + Mt/3}\right)$ | Exponential | Small variance |
| Azuma | Martingale, bounded diff. | $\exp\left(-\frac{t^2}{2\sum c_i^2}\right)$ | Exponential | Martingales |
| McDiarmid | Bounded differences | $\exp\left(-\frac{2t^2}{\sum c_i^2}\right)$ | Exponential | Functions of indep. vars |

Advanced Limit Theorems

Law of the Iterated Logarithm (LIL)

For i.i.d. $X_1, X_2, \ldots$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$:

$$\limsup_{n \to \infty} \frac{S_n - n\mu}{\sigma\sqrt{2n \ln \ln n}} = 1 \quad \text{a.s.}$$

$$\liminf_{n \to \infty} \frac{S_n - n\mu}{\sigma\sqrt{2n \ln \ln n}} = -1 \quad \text{a.s.}$$

Interpretation: Almost surely, the fluctuations of $S_n - n\mu$ are of exact order $\sigma\sqrt{2n \ln \ln n}$, slightly larger than the $\sqrt{n}$ scale suggested by the CLT.

Glivenko-Cantelli Theorem

Let $X_1, \ldots, X_n$ be i.i.d. with CDF $F$. Define the empirical CDF:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{X_i \leq x}$$

Glivenko-Cantelli:

$$\sup_x |F_n(x) - F(x)| \xrightarrow{a.s.} 0$$

Interpretation: The empirical CDF converges uniformly to the true CDF.
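
A simulation sketch (assuming numpy and scipy; Exponential(1) data and the seed are illustrative choices) showing $\sup_x |F_n(x) - F(x)|$ shrinking as $n$ grows:

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(4)  # illustrative seed

# For a continuous F, the supremum of |F_n - F| is attained at the sample points,
# comparing F against the empirical CDF from the right and from the left.
for n in [100, 1_000, 10_000, 100_000]:
    x = np.sort(rng.exponential(scale=1.0, size=n))
    ecdf_right = np.arange(1, n + 1) / n   # F_n(x_i)
    ecdf_left = np.arange(0, n) / n        # F_n(x_i^-)
    f = expon.cdf(x)
    d_n = max(np.max(np.abs(ecdf_right - f)), np.max(np.abs(ecdf_left - f)))
    print(f"n = {n:>7}: sup |F_n - F| = {d_n:.4f}")
```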

Donsker's Theorem (Functional CLT)

The empirical process:

$$\sqrt{n}(F_n(x) - F(x))$$

converges in distribution (as a process) to $B(F(x))$, where $B$ is a Brownian bridge.

Use: Asymptotic distribution theory for goodness-of-fit tests (Kolmogorov-Smirnov).

Poisson Approximation to Binomial

For $X \sim \text{Binomial}(n, p)$ with $np = \lambda$ fixed as $n \to \infty$, $p \to 0$:

$$P(X = k) \to \frac{\lambda^k e^{-\lambda}}{k!}$$

When to use: Large $n$, small $p$, moderate $np$.

Rule of thumb: Use when $n \geq 20$ and $p \leq 0.05$.

Normal Approximation to Binomial

For $X \sim \text{Binomial}(n, p)$:

$$\frac{X - np}{\sqrt{np(1-p)}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Continuity correction: For better approximation:

$$P(X \leq k) \approx \Phi\left(\frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$$

Rule of thumb: Use when $np \geq 5$ and $n(1-p) \geq 5$.
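
A quick check (assuming scipy and numpy; the parameters are illustrative) of the continuity correction against the exact binomial CDF:

```python
import numpy as np
from scipy.stats import binom, norm

# Compare the exact Binomial(n, p) CDF with the normal approximation,
# with and without the continuity correction.
n, p, k = 40, 0.3, 15
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.cdf(k, n, p)
no_correction = norm.cdf((k - mu) / sd)
with_correction = norm.cdf((k + 0.5 - mu) / sd)

print(f"exact P(X <= {k}):        {exact:.4f}")
print(f"normal, no correction:    {no_correction:.4f}")
print(f"normal, with correction:  {with_correction:.4f}")
```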

Normal Approximation to Poisson

For $X \sim \text{Poisson}(\lambda)$ with large $\lambda$:

$$\frac{X - \lambda}{\sqrt{\lambda}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Rule of thumb: Use when $\lambda \geq 10$.

Key Techniques and Tricks

Trick 1: Choosing the Right Inequality

Hierarchy (loose to tight):

  1. Markov (only mean)
  2. Chebyshev (mean + variance)
  3. Chernoff/Hoeffding (exponential, for bounded variables)

When variance is small: Use Bernstein over Hoeffding.

Trick 2: Optimizing Chernoff Bound

To find the best Chernoff bound:

  1. Write $P(X \geq a) \leq e^{-ta}M_X(t)$
  2. Minimize over tt: take derivative, set to 0
  3. Solve for optimal tt^*
  4. Substitute back

Trick 3: Union Bound with Concentration

For events $A_1, \ldots, A_n$:

$$P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i)$$

Combine with concentration: Use Hoeffding/Chernoff for each $P(A_i)$.
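
A sketch (assuming numpy; the numbers and seed are illustrative choices) bounding the probability that any of $m$ independent sample means deviates from $p$ by more than $t$, with a simulation for comparison:

```python
import numpy as np

# Union bound + Hoeffding: probability that ANY of m sample means (each from
# n Bernoulli(p) draws) deviates from p by more than t.
m, n, p, t = 100, 500, 0.5, 0.1

per_mean = 2 * np.exp(-2 * n * t**2)     # Hoeffding for a single sample mean
union_bound = min(1.0, m * per_mean)     # union bound over all m means

# Empirical check by simulation.
rng = np.random.default_rng(5)  # illustrative seed
means = rng.binomial(n, p, size=(100_000, m)) / n
empirical = np.mean(np.any(np.abs(means - p) > t, axis=1))

print(f"union bound:         {union_bound:.4f}")
print(f"empirical frequency: {empirical:.4f}")
```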

Trick 4: Symmetrization

Replace $X_i$ with $X_i - X_i'$, where $X_i'$ is an independent copy of $X_i$, to get symmetric (and hence zero-mean) variables.

Trick 5: Truncation

For unbounded variables, truncate at appropriate level, bound separately:

$$E[X] = E[X \mathbb{1}_{|X| \leq M}] + E[X \mathbb{1}_{|X| > M}]$$

Trick 6: Moment Generating Function Composition

For sum of independent RVs:

$$M_{\sum X_i}(t) = \prod M_{X_i}(t)$$

Trick 7: Taylor Expansion for Bounds

Use $e^x \leq 1 + x + x^2$ for $x \in [0, 1]$ to simplify Chernoff bounds.

Trick 8: Borel-Cantelli Lemmas

First Borel-Cantelli: If $\sum_{n=1}^\infty P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.

Second Borel-Cantelli: If the events are independent and $\sum_{n=1}^\infty P(A_n) = \infty$, then $P(A_n \text{ i.o.}) = 1$.

Use: Proving almost sure convergence or divergence.

Trick 9: Skorohod Representation

Convergence in distribution can be represented as almost sure convergence on a different probability space: if $X_n \xrightarrow{d} X$, there exist random variables $Y_n$ and $Y$ on a common probability space with $Y_n \overset{d}{=} X_n$, $Y \overset{d}{=} X$, and $Y_n \xrightarrow{a.s.} Y$.

Trick 10: Cramér-Wold Device

For multivariate convergence in distribution, it suffices to show all linear combinations converge:

$$\mathbf{X}_n \xrightarrow{d} \mathbf{X} \iff \mathbf{a}^T\mathbf{X}_n \xrightarrow{d} \mathbf{a}^T\mathbf{X} \text{ for all } \mathbf{a}$$

Interview Problem Types

Type 1: Identifying Convergence Mode

| Given | Find | Approach |
|---|---|---|
| Sequence of random variables | Type of convergence | Check definitions; use hierarchy (a.s. $\Rightarrow$ in prob. $\Rightarrow$ in dist.) |

Type 2: Applying Law of Large Numbers

| Given | Find | Approach |
|---|---|---|
| i.i.d. sequence with finite mean (and variance for WLLN) | Convergence of sample mean | Use $\bar{X}_n \xrightarrow{P} \mu$ (WLLN) or $\bar{X}_n \xrightarrow{a.s.} \mu$ (SLLN) |

Type 3: Central Limit Theorem Applications

| Given | Find | Approach |
|---|---|---|
| Sum or mean of i.i.d. RVs, large $n$ | Approximate distribution or probability | Use $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \approx \mathcal{N}(0,1)$ for large $n$ |

Type 4: Delta Method

| Given | Find | Approach |
|---|---|---|
| Asymptotic distribution of $\bar{X}_n$, smooth function $g$ | Asymptotic distribution of $g(\bar{X}_n)$ | Apply delta method: $\sqrt{n}(g(\bar{X}_n) - g(\mu)) \xrightarrow{d} \mathcal{N}(0, \sigma^2[g'(\mu)]^2)$ |

Type 5: Markov's Inequality

| Given | Find | Approach |
|---|---|---|
| Non-negative RV with known mean | Upper bound on tail probability | Use $P(X \geq a) \leq E[X]/a$ |

Type 6: Chebyshev's Inequality

| Given | Find | Approach |
|---|---|---|
| RV with known mean and variance | Bound on $P(\lvert X - \mu\rvert \geq k)$ | Use $P(\lvert X - \mu\rvert \geq k) \leq \sigma^2/k^2$ |

Type 7: Chernoff/Hoeffding Bounds

| Given | Find | Approach |
|---|---|---|
| Sum of bounded independent RVs | Exponential tail bound | Apply Hoeffding: $P(\lvert S_n - E[S_n]\rvert \geq t) \leq 2\exp(-2t^2/[n(b-a)^2])$ |

Type 8: Approximating Distributions

| Given | Find | Approach |
|---|---|---|
| Binomial, Poisson, or other distribution; large parameters | Normal or Poisson approximation | Check conditions: CLT for normal, rare events for Poisson |

Type 9: Proving Convergence

| Given | Find | Approach |
|---|---|---|
| Sequence definition | Prove convergence in some mode | Use appropriate inequality/theorem; check definitions |

Type 10: Concentration in Specific Problems

| Given | Find | Approach |
|---|---|---|
| Problem requiring tight probability bounds | Choose and apply concentration inequality | Select based on what's known: Markov → Chebyshev → Hoeffding/Chernoff |

Common Pitfalls

Pitfall 1: Confusing Convergence Types

Wrong: Assuming convergence in distribution implies convergence in probability

Correct: Only true when limiting to a constant

Pitfall 2: Misapplying CLT

Wrong: Using CLT when $n$ is small or variance is infinite

Check: Need $n$ large (typically $n \geq 30$) and finite variance

Pitfall 3: Forgetting Independence

Wrong: Applying Hoeffding/Chernoff to dependent variables

Check: Most concentration inequalities require independence

Pitfall 4: Wrong Chebyshev Application

Wrong: Using two-sided Chebyshev for one-sided bound

Better: Use Cantelli's inequality for one-sided bounds

Pitfall 5: Ignoring Continuity Correction

Wrong: Using normal approximation to binomial without correction for small $n$

Check: Add 0.5 for better discrete-to-continuous approximation

Pitfall 6: Loose Bounds

Wrong: Using Markov when variance is known

Better: Use Chebyshev (or stronger) when more information available

Pitfall 7: Incorrect Delta Method

Wrong: Applying delta method when $g'(\mu) = 0$

Check: Need $g'(\mu) \neq 0$; use second-order delta method otherwise

Quick Reference: Key Theorems

Weak Law of Large Numbers:

$$\bar{X}_n \xrightarrow{P} \mu$$

Strong Law of Large Numbers:

$$\bar{X}_n \xrightarrow{a.s.} \mu$$

Central Limit Theorem:

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Delta Method:

$$\sqrt{n}(g(\bar{X}_n) - g(\mu)) \xrightarrow{d} \mathcal{N}(0, \sigma^2[g'(\mu)]^2)$$

Markov's Inequality:

$$P(X \geq a) \leq \frac{E[X]}{a}$$

Chebyshev's Inequality:

$$P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}$$

Hoeffding's Inequality (for $X_i \in [a,b]$):

$$P(|\bar{X}_n - \mu| \geq t) \leq 2\exp\left(-\frac{2nt^2}{(b-a)^2}\right)$$

Chernoff Bound:

$$P(X \geq a) \leq \inf_{t>0} e^{-ta}M_X(t)$$

Practice Problem Categories

  • Proving convergence in probability vs. almost sure
  • Sample mean convergence rates
  • Normal approximation to binomial/Poisson
  • Delta method for variance, correlation, ratios
  • Slutsky's theorem applications
  • Continuous mapping theorem
  • Tail probability bounds
  • Concentration for sums
  • Union bounds with concentration
  • Martingale concentration
  • Empirical process convergence
  • Monte Carlo error bounds
  • Confidence intervals via CLT
  • Sample size calculations
  • Hypothesis testing via asymptotic normality
  • Bootstrap consistency
  • Method of moments consistency
  • Maximum likelihood asymptotics
