Limit Theorems
Interview prep for law of large numbers, central limit theorem, and concentration inequalities
Fundamental Concepts
Convergence of Random Variables
There are several modes of convergence for sequences of random variables:
1. Convergence in Distribution (Weak Convergence)
$X_n \xrightarrow{d} X$ if:
$F_{X_n}(x) \to F_X(x)$ as $n \to \infty$, for all $x$ at which $F_X$ is continuous.
Notation: Also written as $X_n \Rightarrow X$ or $X_n \rightsquigarrow X$.
2. Convergence in Probability
$X_n \xrightarrow{p} X$ if for all $\epsilon > 0$:
$P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$.
Interpretation: The probability that $X_n$ is far from $X$ goes to zero.
3. Convergence Almost Surely (Strong Convergence)
$X_n \xrightarrow{a.s.} X$ if:
$P\left(\lim_{n \to \infty} X_n = X\right) = 1$
Interpretation: $X_n(\omega)$ converges to $X(\omega)$ along almost every sample path $\omega$.
4. Convergence in $L^p$ (Mean)
$X_n \xrightarrow{L^p} X$ if:
$E\left[|X_n - X|^p\right] \to 0$ as $n \to \infty$.
Special cases:
- $p = 1$: Convergence in mean
- $p = 2$: Convergence in mean square
Hierarchy of Convergence
Relationships (stronger → weaker):
$X_n \xrightarrow{a.s.} X \;\Rightarrow\; X_n \xrightarrow{p} X \;\Rightarrow\; X_n \xrightarrow{d} X$, and also $X_n \xrightarrow{L^p} X \;\Rightarrow\; X_n \xrightarrow{p} X$.
Key facts:
- Almost sure convergence is strongest
- Convergence in distribution is weakest
- $L^q$ convergence for $q \ge p \ge 1$ implies $L^p$ convergence
- Convergence in distribution to a constant implies convergence in probability to that constant
Law of Large Numbers (LLN)
Weak Law of Large Numbers (WLLN)
Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$.
Define the sample mean:
$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$
WLLN: $\bar{X}_n \xrightarrow{p} \mu$ as $n \to \infty$.
Interpretation: For large $n$, the sample mean is close to the population mean with high probability.
Formal statement: For all $\epsilon > 0$:
$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \epsilon\right) = 0$
Proof technique: Use Chebyshev's inequality:
$P\left(|\bar{X}_n - \mu| > \epsilon\right) \le \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$
Strong Law of Large Numbers (SLLN)
Under the same setup (in fact, only $E[|X_i|] < \infty$ is needed):
SLLN: $\bar{X}_n \xrightarrow{a.s.} \mu$ as $n \to \infty$.
Interpretation: The sample mean converges to the population mean along almost every sample path (not just in probability).
Formal statement:
$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$
Comparison with WLLN:
- SLLN is stronger than WLLN
- SLLN requires only $E[|X_i|] < \infty$ (finite mean), not finite variance
- SLLN guarantees convergence along almost every realization
Applications of LLN
Monte Carlo simulation: Sample mean approximates expected value (see the sketch after this list)
Frequency interpretation: Relative frequency converges to probability
Casino/insurance: Average payout converges to expected payout (law of averages)
Polling: Sample proportion converges to population proportion
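A minimal Monte Carlo sketch of the LLN, assuming NumPy is available; the Exponential(1) target and the sample sizes are illustrative choices, not part of any standard recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Running sample mean of Exponential(1) draws; the true mean is 1.
true_mean = 1.0
samples = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)

for n in [100, 1_000, 10_000, 100_000]:
    err = abs(running_mean[n - 1] - true_mean)
    print(f"n={n:>7}  sample mean={running_mean[n - 1]:.4f}  |error|={err:.4f}")
```

The printed errors shrink roughly like $1/\sqrt{n}$, which previews the CLT scale discussed next.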
Central Limit Theorem (CLT)
Classical Central Limit Theorem
Let $X_1, X_2, \ldots$ be i.i.d. random variables with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 \in (0, \infty)$.
Define:
$S_n = \sum_{i=1}^{n} X_i, \qquad \bar{X}_n = \frac{S_n}{n}$
CLT: The standardized sum converges in distribution to standard normal:
$\frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$
Equivalently:
$\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$
Interpretation: For large $n$, the distribution of $\bar{X}_n$ (or $S_n$) is approximately normal, regardless of the distribution of the $X_i$.
Practical form: For large $n$:
$P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) \approx \Phi(z)$
where $\Phi$ is the standard normal CDF.
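A quick simulation sketch of the practical form above, assuming NumPy and SciPy are available; sample means of a skewed Exponential(1) population are standardized and a tail frequency is compared with $\Phi$. The sample size and replication count are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Standardized sample means of a skewed Exponential(1) population (mu = sigma = 1).
n, reps = 50, 200_000
mu, sigma = 1.0, 1.0
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (means - mu) / sigma

print(f"P(Z <= 1.0): empirical {np.mean(z <= 1.0):.4f}  vs  Phi(1.0) = {stats.norm.cdf(1.0):.4f}")
```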
Berry-Esseen Theorem (Rate of Convergence)
Provides bounds on how fast the CLT convergence occurs.
If $\rho = E\left[|X_i - \mu|^3\right] < \infty$:
$\sup_z \left| P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le z\right) - \Phi(z) \right| \le \frac{C\rho}{\sigma^3 \sqrt{n}}$
where $C$ is a universal constant.
Interpretation: The error in the normal approximation is $O(1/\sqrt{n})$.
Lindeberg-Feller CLT (Independent, Not Identical)
For independent (but not necessarily identically distributed) random variables $X_1, X_2, \ldots$ with $E[X_i] = \mu_i$ and $\text{Var}(X_i) = \sigma_i^2$, define $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$.
If the Lindeberg condition holds, i.e. for all $\epsilon > 0$: $\frac{1}{s_n^2} \sum_{i=1}^{n} E\left[(X_i - \mu_i)^2 \mathbf{1}\{|X_i - \mu_i| > \epsilon s_n\}\right] \to 0$, then:
$\frac{\sum_{i=1}^{n} (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0, 1)$
Lyapunov CLT
A sufficient condition for CLT to hold for independent (not identical) RVs.
If for some $\delta > 0$:
$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E\left[|X_i - \mu_i|^{2+\delta}\right] = 0$
then the CLT holds: $\frac{\sum_{i=1}^{n} (X_i - \mu_i)}{s_n} \xrightarrow{d} N(0, 1)$ (the Lyapunov condition implies the Lindeberg condition).
Multivariate CLT
Let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be i.i.d. random vectors in $\mathbb{R}^d$ with $E[\mathbf{X}_i] = \boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$.
Multivariate CLT:
$\sqrt{n}\left(\bar{\mathbf{X}}_n - \boldsymbol{\mu}\right) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma})$
where $\bar{\mathbf{X}}_n = \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i$.
Delta Method
Problem: Find the limiting distribution of $g(\hat{\theta}_n)$ where $g$ is a smooth function.
Delta Method: If $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and $g'(\theta) \ne 0$:
$\sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \xrightarrow{d} N\left(0, [g'(\theta)]^2 \sigma^2\right)$
Multivariate Delta Method: If $\sqrt{n}(\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta}) \xrightarrow{d} N(\mathbf{0}, \boldsymbol{\Sigma})$:
$\sqrt{n}\left(g(\hat{\boldsymbol{\theta}}_n) - g(\boldsymbol{\theta})\right) \xrightarrow{d} N\left(0, \nabla g(\boldsymbol{\theta})^\top \boldsymbol{\Sigma}\, \nabla g(\boldsymbol{\theta})\right)$
When to use: Finding asymptotic distributions of statistics like sample variance, correlation, etc.
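A simulation sketch of the delta method, assuming NumPy is available; the choice $g(x) = \log x$ applied to the sample mean of Exponential draws is purely illustrative. The delta-method variance $[g'(\mu)]^2 \sigma^2$ is compared with the simulated variance of $\sqrt{n}(g(\bar{X}_n) - g(\mu))$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Delta method for g(x) = log(x) applied to sample means of Exponential(scale=2):
# mu = 2, sigma^2 = 4, so [g'(mu)]^2 * sigma^2 = (1/2)^2 * 4 = 1.
n, reps = 500, 50_000
scale = 2.0
mu, var = scale, scale**2
xbar = rng.exponential(scale, size=(reps, n)).mean(axis=1)

delta_var = (1.0 / mu) ** 2 * var
empirical = np.var(np.sqrt(n) * (np.log(xbar) - np.log(mu)))
print(f"delta-method variance {delta_var:.3f}  vs  simulated variance {empirical:.3f}")
```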
Continuous Mapping Theorem
If $X_n \xrightarrow{d} X$ and $g$ is continuous, then:
$g(X_n) \xrightarrow{d} g(X)$ (the same holds for convergence in probability and almost surely).
Use: Transforming convergent sequences while preserving convergence in distribution.
Slutsky's Theorem
If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ (a constant), then:
- $X_n + Y_n \xrightarrow{d} X + c$
- $X_n Y_n \xrightarrow{d} cX$
- $X_n / Y_n \xrightarrow{d} X / c$ (if $c \ne 0$)
Use: Combining convergent sequences, working with estimated variances.
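A sketch of a standard Slutsky application, assuming NumPy and SciPy are available: replacing the unknown $\sigma$ with the sample standard deviation $s_n$ leaves the $N(0,1)$ limit intact because $s_n \xrightarrow{p} \sigma$. The Uniform(0,1) population and sample size are arbitrary examples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# t-like statistic: sqrt(n) * (Xbar - mu) / s_n, with s_n the sample sd.
n, reps = 200, 100_000
mu = 0.5                                   # mean of Uniform(0, 1)
x = rng.uniform(0, 1, size=(reps, n))
t_stat = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

print(f"P(T <= 1.645): empirical {np.mean(t_stat <= 1.645):.4f}  vs  Phi(1.645) = {stats.norm.cdf(1.645):.4f}")
```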
Concentration Inequalities
Concentration inequalities provide bounds on how much a random variable deviates from its mean.
Markov's Inequality
For a non-negative random variable $X$ and any $a > 0$:
$P(X \ge a) \le \frac{E[X]}{a}$
Proof: $E[X] \ge E\left[X \mathbf{1}\{X \ge a\}\right] \ge a\, P(X \ge a)$.
When to use: Only know the mean, want rough bound.
Limitations: Often very loose; only uses first moment.
Generalization: For any non-negative, increasing function $\phi$ and any $a$ with $\phi(a) > 0$:
$P(X \ge a) \le \frac{E[\phi(X)]}{\phi(a)}$
Chebyshev's Inequality
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, for any $\epsilon > 0$:
$P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}$
Alternative form: For any $k > 0$:
$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$
Proof: Apply Markov's inequality to $(X - \mu)^2$:
$P(|X - \mu| \ge \epsilon) = P\left((X - \mu)^2 \ge \epsilon^2\right) \le \frac{E[(X - \mu)^2]}{\epsilon^2} = \frac{\sigma^2}{\epsilon^2}$
When to use: Know mean and variance, tighter than Markov.
Key insight: Uses second moment (variance) to get better bound than Markov.
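A numerical comparison of the Markov and Chebyshev bounds against the exact tail, assuming NumPy is available; the case $X \sim \text{Exponential}(1)$ and the threshold $a = 5$ are illustrative. Both bounds are valid, but Chebyshev is tighter here because it uses the variance.

```python
import numpy as np

# X ~ Exponential(1): mean = 1, variance = 1, exact tail P(X >= a) = exp(-a).
a = 5.0
mean, var = 1.0, 1.0

markov = mean / a                          # P(X >= a) <= E[X] / a
chebyshev = var / (a - mean) ** 2          # P(X >= a) <= P(|X - mu| >= a - mu) <= var / (a - mu)^2
exact = np.exp(-a)

print(f"Markov    {markov:.4f}")
print(f"Chebyshev {chebyshev:.4f}")
print(f"Exact     {exact:.4f}")
```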
Chernoff Bound (General Method)
For any random variable $X$ and any $t > 0$:
$P(X \ge a) \le e^{-ta}\, E[e^{tX}] = e^{-ta} M_X(t)$
where $M_X(t) = E[e^{tX}]$ is the MGF.
Proof: For $t > 0$, $x \mapsto e^{tx}$ is increasing, so by Markov's inequality:
$P(X \ge a) = P\left(e^{tX} \ge e^{ta}\right) \le \frac{E[e^{tX}]}{e^{ta}}$
Then optimize over $t > 0$: $P(X \ge a) \le \inf_{t > 0} e^{-ta} M_X(t)$.
When to use: MGF is available, want exponentially decaying bound.
Key advantage: Provides exponential tail bounds (much stronger than polynomial bounds).
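A small sketch of the Chernoff bound for a standard normal, assuming NumPy and SciPy are available. Here the optimization over $t$ has a closed form ($M_Z(t) = e^{t^2/2}$, optimal $t^* = a$, bound $e^{-a^2/2}$), and the output shows the exponential decay promised above.

```python
import numpy as np
from scipy import stats

# Chernoff bound for Z ~ N(0,1): inf_t exp(-t*a + t^2/2) = exp(-a^2/2), attained at t = a.
for a in [1.0, 2.0, 3.0, 4.0]:
    chernoff = np.exp(-a**2 / 2)
    exact = stats.norm.sf(a)               # true tail P(Z >= a)
    print(f"a={a:.0f}  Chernoff {chernoff:.2e}  exact {exact:.2e}")
```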
Chernoff-Hoeffding Bound
For i.i.d. Bernoulli random variables $X_1, \ldots, X_n$ with $P(X_i = 1) = p$, let $S_n = \sum_{i=1}^{n} X_i$ and $\mu = E[S_n] = np$.
Upper tail: For $\delta > 0$:
$P\left(S_n \ge (1+\delta)\mu\right) \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$
Simplified (for $0 < \delta \le 1$):
$P\left(S_n \ge (1+\delta)\mu\right) \le e^{-\mu\delta^2/3}$
Lower tail: For $0 < \delta < 1$:
$P\left(S_n \le (1-\delta)\mu\right) \le e^{-\mu\delta^2/2}$
Two-sided: For $0 < \delta < 1$:
$P\left(|S_n - \mu| \ge \delta\mu\right) \le 2e^{-\mu\delta^2/3}$
When to use: Sums of bounded independent random variables, concentration around mean.
Hoeffding's Inequality
For independent random variables $X_1, \ldots, X_n$ with $X_i \in [a_i, b_i]$ almost surely, let $S_n = \sum_{i=1}^{n} X_i$.
Hoeffding's inequality: For any $t > 0$:
$P\left(S_n - E[S_n] \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right)$
For the sample mean: If $X_i \in [a, b]$ and $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$:
$P\left(|\bar{X}_n - E[\bar{X}_n]| \ge \epsilon\right) \le 2\exp\left(-\frac{2n\epsilon^2}{(b - a)^2}\right)$
When to use: Bounded random variables, exponential concentration for sample means.
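A sketch comparing the Hoeffding bound for a Bernoulli sample mean with a simulated deviation frequency, assuming NumPy is available; the parameters $n$, $p$, and $\epsilon$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Mean of n Bernoulli(p) variables (bounded in [0, 1]):
# P(|Xbar - p| >= eps) <= 2 exp(-2 n eps^2).
n, eps, p, reps = 1_000, 0.05, 0.5, 200_000
xbar = rng.binomial(n, p, size=reps) / n

hoeffding = 2 * np.exp(-2 * n * eps**2)
empirical = np.mean(np.abs(xbar - p) >= eps)
print(f"Hoeffding bound {hoeffding:.4f}   simulated probability {empirical:.4f}")
```

The bound holds but is conservative, which is typical: it pays for making no distributional assumptions beyond boundedness.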
Bennett's Inequality
For independent random variables $X_1, \ldots, X_n$ with $E[X_i] = 0$, $|X_i| \le M$ almost surely, and $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} E[X_i^2]$, for any $t > 0$:
$P\left(\sum_{i=1}^{n} X_i \ge t\right) \le \exp\left(-\frac{n\sigma^2}{M^2}\, h\!\left(\frac{Mt}{n\sigma^2}\right)\right)$
where $h(u) = (1+u)\log(1+u) - u$.
When to use: Sharper than Hoeffding when variance is small relative to range.
Bernstein's Inequality
For independent random variables $X_1, \ldots, X_n$ with $E[X_i] = 0$ and $|X_i| \le M$ almost surely, for any $t > 0$:
$P\left(\sum_{i=1}^{n} X_i \ge t\right) \le \exp\left(-\frac{t^2/2}{v + Mt/3}\right)$
where $v = \sum_{i=1}^{n} E[X_i^2]$ is the total variance.
When to use: Similar to Bennett's, interpolates between Gaussian and exponential regimes.
Azuma-Hoeffding Inequality (Martingale Concentration)
Let $(M_k)_{k \ge 0}$ be a martingale with bounded differences $|M_k - M_{k-1}| \le c_k$ for all $k$.
Azuma's inequality:
$P\left(|M_n - M_0| \ge t\right) \le 2\exp\left(-\frac{t^2}{2\sum_{k=1}^{n} c_k^2}\right)$
When to use: Martingales with bounded differences (e.g., Doob martingale, random walks).
McDiarmid's Inequality (Method of Bounded Differences)
Let $f: \mathcal{X}^n \to \mathbb{R}$ satisfy the bounded difference condition:
$\sup_{x_1, \ldots, x_n,\, x_i'} \left| f(x_1, \ldots, x_i, \ldots, x_n) - f(x_1, \ldots, x_i', \ldots, x_n) \right| \le c_i$ for all $i$.
If $X_1, \ldots, X_n$ are independent:
$P\left(\left|f(X_1, \ldots, X_n) - E[f(X_1, \ldots, X_n)]\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^{n} c_i^2}\right)$
When to use: Functions of many independent variables where changing one variable has bounded effect.
Examples: Longest increasing subsequence, chromatic number of a random graph.
Mill's Inequality (Gaussian Tail Bounds)
For $Z \sim N(0, 1)$ and $t > 0$:
$P(|Z| > t) \le \sqrt{\frac{2}{\pi}}\, \frac{e^{-t^2/2}}{t}$
Simple upper bound: $P(Z > t) \le e^{-t^2/2}$ (from the Chernoff bound).
When to use: Normal distribution tail probabilities.
Cantelli's Inequality (One-Sided Chebyshev)
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, for any $t > 0$:
$P(X - \mu \ge t) \le \frac{\sigma^2}{\sigma^2 + t^2}$
When to use: One-sided bound, tighter than two-sided Chebyshev.
Summary Table: Concentration Inequalities
| Inequality | Conditions | Tail Bound | Decay Rate | Best Use Case |
|---|---|---|---|---|
| Markov | $X \ge 0$, mean known | $E[X]/a$ | Polynomial ($1/a$) | Only mean known |
| Chebyshev | Finite variance | $\sigma^2/\epsilon^2$ | Polynomial ($1/\epsilon^2$) | Mean + variance known |
| Chernoff | MGF exists | $\inf_{t>0} e^{-ta} M_X(t)$ | Exponential | MGF available |
| Hoeffding | Bounded, independent | $\exp\left(-2t^2 / \sum (b_i - a_i)^2\right)$ | Exponential | Bounded variables |
| Bernstein | Bounded, variance known | $\exp\left(-\frac{t^2/2}{v + Mt/3}\right)$ | Exponential | Small variance |
| Azuma | Martingale, bounded diff. | $\exp\left(-t^2 / 2\sum c_k^2\right)$ | Exponential | Martingales |
| McDiarmid | Bounded differences, independent | $\exp\left(-2t^2 / \sum c_i^2\right)$ | Exponential | Functions of indep. vars |
Advanced Limit Theorems
Law of the Iterated Logarithm (LIL)
For i.i.d. $X_1, X_2, \ldots$ with $E[X_i] = 0$ and $\text{Var}(X_i) = \sigma^2 < \infty$:
$\limsup_{n \to \infty} \frac{S_n}{\sigma\sqrt{2n \log\log n}} = 1$ almost surely (and the $\liminf$ is $-1$).
Interpretation: The fluctuations of $S_n$ are almost surely of order $\sqrt{2n \log\log n}$, a more precise envelope than the $\sqrt{n}$ scale suggested by the CLT.
Glivenko-Cantelli Theorem
Let $X_1, \ldots, X_n$ be i.i.d. with CDF $F$. Define the empirical CDF:
$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$
Glivenko-Cantelli:
$\sup_x \left|\hat{F}_n(x) - F(x)\right| \xrightarrow{a.s.} 0$
Interpretation: The empirical CDF converges uniformly to the true CDF.
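A sketch of Glivenko-Cantelli for Uniform(0,1) samples, assuming NumPy is available: the sup-distance between the empirical CDF and $F(x) = x$ (the Kolmogorov-Smirnov statistic) shrinks as $n$ grows. The sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# sup |Fhat_n - F| for Uniform(0,1), evaluated at the sorted sample points.
for n in [100, 1_000, 10_000, 100_000]:
    x = np.sort(rng.uniform(0, 1, size=n))
    ecdf_hi = np.arange(1, n + 1) / n            # Fhat at each sorted point
    ecdf_lo = np.arange(0, n) / n                # Fhat just below each sorted point
    sup_dist = max(np.max(np.abs(ecdf_hi - x)), np.max(np.abs(ecdf_lo - x)))
    print(f"n={n:>7}  sup |Fhat - F| = {sup_dist:.4f}")
```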
Donsker's Theorem (Functional CLT)
The empirical process:
$G_n(x) = \sqrt{n}\left(\hat{F}_n(x) - F(x)\right)$
converges in distribution (as a process) to a Brownian bridge evaluated at $F(x)$.
Use: Asymptotic distribution theory for goodness-of-fit tests (Kolmogorov-Smirnov).
Poisson Approximation to Binomial
For $X \sim \text{Binomial}(n, p)$ with $\lambda = np$ held fixed as $n \to \infty$, $p \to 0$:
$P(X = k) \to \frac{e^{-\lambda} \lambda^k}{k!}$
When to use: Large $n$, small $p$, moderate $np$.
Rule of thumb: Use when $n \ge 20$ and $p \le 0.05$ (a common textbook guideline).
Normal Approximation to Binomial
For $X \sim \text{Binomial}(n, p)$ with $np$ and $n(1-p)$ both large:
$X \approx N\left(np,\; np(1-p)\right)$
Continuity correction: For better approximation:
$P(X \le k) \approx \Phi\left(\frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$
Rule of thumb: Use when $np \ge 10$ and $n(1-p) \ge 10$ (some texts use 5).
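A numerical check of the continuity correction, assuming SciPy is available; the case $P(X \le 55)$ with $X \sim \text{Binomial}(100, 0.5)$ is an illustrative example.

```python
import numpy as np
from scipy import stats

# Normal approximation to P(X <= 55) for X ~ Binomial(100, 0.5).
n, p, k = 100, 0.5, 55
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = stats.binom.cdf(k, n, p)
no_correction = stats.norm.cdf((k - mu) / sd)
corrected = stats.norm.cdf((k + 0.5 - mu) / sd)
print(f"exact {exact:.4f}  normal {no_correction:.4f}  with correction {corrected:.4f}")
```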
Normal Approximation to Poisson
For $X \sim \text{Poisson}(\lambda)$ with large $\lambda$:
$\frac{X - \lambda}{\sqrt{\lambda}} \xrightarrow{d} N(0, 1), \quad \text{i.e. } X \approx N(\lambda, \lambda)$
Rule of thumb: Use when $\lambda \ge 20$.
Key Techniques and Tricks
Trick 1: Choosing the Right Inequality
Hierarchy (loose to tight):
- Markov (only mean)
- Chebyshev (mean + variance)
- Chernoff/Hoeffding (exponential, for bounded variables)
When variance is small: Use Bernstein over Hoeffding.
Trick 2: Optimizing Chernoff Bound
To find the best Chernoff bound:
- Write $P(X \ge a) \le e^{-ta} M_X(t)$
- Minimize the exponent over $t > 0$: take the derivative with respect to $t$, set it to 0
- Solve for the optimal $t^*$
- Substitute $t^*$ back into the bound (a numerical version of this recipe is sketched below)
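A sketch that carries out the recipe numerically for a Binomial tail, assuming NumPy and SciPy are available; the parameters $n$, $p$, $a$ are illustrative, and the minimization is done on the log of the bound for numerical stability.

```python
import numpy as np
from scipy import optimize, stats

# Chernoff recipe for S_n ~ Binomial(n, p): P(S_n >= a) <= min_{t>0} exp(-t*a) * M(t),
# with MGF M(t) = (1 - p + p e^t)^n.
n, p, a = 100, 0.3, 45

def log_bound(t):
    # log of the Chernoff bound at parameter t
    return -t * a + n * np.log(1 - p + p * np.exp(t))

res = optimize.minimize_scalar(log_bound, bounds=(1e-6, 10), method="bounded")
print(f"optimal t = {res.x:.3f}")
print(f"Chernoff bound  {np.exp(res.fun):.2e}")
print(f"exact tail      {stats.binom.sf(a - 1, n, p):.2e}")   # P(S_n >= a)
```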
Trick 3: Union Bound with Concentration
For events $A_1, \ldots, A_n$:
$P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$
Combine with concentration: Use Hoeffding/Chernoff to bound each $P(A_i)$.
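A sketch of the union-bound-plus-Hoeffding pattern, assuming NumPy is available: solve for the per-experiment sample size $n$ so that $m$ sample means of $[0,1]$-valued variables are all within $\epsilon$ of their expectations with probability at least $1 - \delta$. The numbers are illustrative.

```python
import numpy as np

# Each failure event A_i = {|Xbar_i - mu_i| >= eps} has P(A_i) <= 2 exp(-2 n eps^2)
# by Hoeffding; the union bound needs m * 2 exp(-2 n eps^2) <= delta.
m, eps, delta = 50, 0.05, 0.01

per_experiment = delta / m                         # probability budget per experiment
n_required = int(np.ceil(np.log(2 / per_experiment) / (2 * eps**2)))
print(f"n >= {n_required} samples per experiment keep all {m} means "
      f"within {eps} with probability >= {1 - delta}")
```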
Trick 4: Symmetrization
Replace $X_i$ with $X_i - X_i'$, where $X_i'$ is an independent copy of $X_i$, to get symmetric zero-mean variables.
Trick 5: Truncation
For unbounded variables, truncate at an appropriate level $M$ and bound the two pieces separately:
$X_i = X_i \mathbf{1}\{|X_i| \le M\} + X_i \mathbf{1}\{|X_i| > M\}$
Trick 6: Moment Generating Function Composition
For a sum $S_n = \sum_{i=1}^{n} X_i$ of independent RVs:
$M_{S_n}(t) = \prod_{i=1}^{n} M_{X_i}(t)$
Trick 7: Taylor Expansion for Bounds
Use $1 + x \le e^x$ (valid for all $x$) to simplify Chernoff bounds.
Trick 8: Borel-Cantelli Lemmas
First Borel-Cantelli: If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A_n \text{ infinitely often}) = 0$.
Second Borel-Cantelli: If the events $A_n$ are independent and $\sum_{n=1}^{\infty} P(A_n) = \infty$, then $P(A_n \text{ infinitely often}) = 1$.
Use: Proving almost sure convergence or divergence.
Trick 9: Skorohod Representation
Convergence in distribution can be represented as almost sure convergence on a different probability space.
Trick 10: Cramér-Wold Device
For multivariate convergence in distribution, it suffices to show that all linear combinations converge:
$\mathbf{X}_n \xrightarrow{d} \mathbf{X} \iff \mathbf{t}^\top \mathbf{X}_n \xrightarrow{d} \mathbf{t}^\top \mathbf{X}$ for all $\mathbf{t} \in \mathbb{R}^d$.
Interview Problem Types
Type 1: Identifying Convergence Mode
| Given | Find | Approach |
|---|---|---|
| Sequence of random variables | Type of convergence | Check definitions; use hierarchy (a.s. $\Rightarrow$ in prob. $\Rightarrow$ in dist.) |
Type 2: Applying Law of Large Numbers
| Given | Find | Approach |
|---|---|---|
| i.i.d. sequence with finite mean (and variance for WLLN) | Convergence of sample mean | Use $\bar{X}_n \xrightarrow{p} \mu$ (WLLN) or $\bar{X}_n \xrightarrow{a.s.} \mu$ (SLLN) |
Type 3: Central Limit Theorem Applications
| Given | Find | Approach |
|---|---|---|
| Sum or mean of i.i.d. RVs, large $n$ | Approximate distribution or probability | Use $\bar{X}_n \approx N(\mu, \sigma^2/n)$ (equivalently $S_n \approx N(n\mu, n\sigma^2)$) for large $n$ |
Type 4: Delta Method
| Given | Find | Approach |
|---|---|---|
| Asymptotic distribution of $\hat{\theta}_n$, smooth function $g$ | Asymptotic distribution of $g(\hat{\theta}_n)$ | Apply delta method: $\sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \xrightarrow{d} N\left(0, [g'(\theta)]^2 \sigma^2\right)$ |
Type 5: Markov's Inequality
| Given | Find | Approach |
|---|---|---|
| Non-negative RV with known mean | Upper bound on tail probability | Use $P(X \ge a) \le E[X]/a$ |
Type 6: Chebyshev's Inequality
| Given | Find | Approach |
|---|---|---|
| RV with known mean and variance | Bound on $P(|X - \mu| \ge k\sigma)$ | Use $P(|X - \mu| \ge k\sigma) \le 1/k^2$ |
Type 7: Chernoff/Hoeffding Bounds
| Given | Find | Approach |
|---|---|---|
| Sum of bounded independent RVs | Exponential tail bound | Apply Hoeffding: $P(|\bar{X}_n - \mu| \ge \epsilon) \le 2e^{-2n\epsilon^2/(b-a)^2}$ |
Type 8: Approximating Distributions
| Given | Find | Approach |
|---|---|---|
| Binomial, Poisson, or other distribution; large parameters | Normal or Poisson approximation | Check conditions: CLT for normal, rare events for Poisson |
Type 9: Proving Convergence
| Given | Find | Approach |
|---|---|---|
| Sequence definition | Prove convergence in some mode | Use appropriate inequality/theorem; check definitions |
Type 10: Concentration in Specific Problems
| Given | Find | Approach |
|---|---|---|
| Problem requiring tight probability bounds | Choose and apply concentration inequality | Select based on what's known: Markov → Chebyshev → Hoeffding/Chernoff |
Common Pitfalls
Pitfall 1: Confusing Convergence Types
Wrong: Assuming convergence in distribution implies convergence in probability
Correct: Only true when limiting to a constant
Pitfall 2: Misapplying CLT
Wrong: Using CLT when $n$ is small or the variance is infinite
Check: Need large $n$ (typically $n \ge 30$) and finite variance
Pitfall 3: Forgetting Independence
Wrong: Applying Hoeffding/Chernoff to dependent variables
Check: Most concentration inequalities require independence
Pitfall 4: Wrong Chebyshev Application
Wrong: Using two-sided Chebyshev for one-sided bound
Better: Use Cantelli's inequality for one-sided bounds
Pitfall 5: Ignoring Continuity Correction
Wrong: Using the normal approximation to the binomial without a continuity correction for small $n$
Check: Add 0.5 for better discrete-to-continuous approximation
Pitfall 6: Loose Bounds
Wrong: Using Markov when variance is known
Better: Use Chebyshev (or stronger) when more information available
Pitfall 7: Incorrect Delta Method
Wrong: Applying the delta method when $g'(\theta) = 0$
Check: Need $g'(\theta) \ne 0$; use the second-order delta method otherwise
Quick Reference: Key Theorems
Weak Law of Large Numbers: $\bar{X}_n \xrightarrow{p} \mu$
Strong Law of Large Numbers: $\bar{X}_n \xrightarrow{a.s.} \mu$
Central Limit Theorem: $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$
Delta Method: $\sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \xrightarrow{d} N\left(0, [g'(\theta)]^2 \sigma^2\right)$
Markov's Inequality: $P(X \ge a) \le \frac{E[X]}{a}$ for $X \ge 0$, $a > 0$
Chebyshev's Inequality: $P(|X - \mu| \ge \epsilon) \le \frac{\sigma^2}{\epsilon^2}$
Hoeffding's Inequality (for $X_i \in [a, b]$): $P(|\bar{X}_n - \mu| \ge \epsilon) \le 2\exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right)$
Chernoff Bound: $P(X \ge a) \le \inf_{t > 0} e^{-ta} M_X(t)$
Practice Problem Categories
- Proving convergence in probability vs. almost sure
- Sample mean convergence rates
- Normal approximation to binomial/Poisson
- Delta method for variance, correlation, ratios
- Slutsky's theorem applications
- Continuous mapping theorem
- Tail probability bounds
- Concentration for sums
- Union bounds with concentration
- Martingale concentration
- Empirical process convergence
- Monte Carlo error bounds
- Confidence intervals via CLT
- Sample size calculations
- Hypothesis testing via asymptotic normality
- Bootstrap consistency
- Method of moments consistency
- Maximum likelihood asymptotics