Interview prep for random variables, moments, MGFs, characteristic functions, and cumulants
A random variable is a function from the sample space to the real numbers:
X:Ω→R
Discrete random variable: Takes countable values {x1,x2,x3,...}
Continuous random variable: Takes values in an interval or union of intervals
Mixed random variable: Has both discrete and continuous components
For a discrete random variable X, the probability mass function (PMF) is:
pX(x)=P(X=x)
Properties:
- pX(x)≥0 for all x
- ∑_x pX(x) = 1
For a continuous random variable X, the probability density function (PDF) satisfies:
P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
Properties:
- fX(x)≥0 for all x
- ∫_{−∞}^{∞} fX(x) dx = 1
- Note: P(X=x)=0 for continuous X (but fX(x) can be > 1)
The cumulative distribution function (CDF):
FX(x) = P(X ≤ x)
Properties:
- Non-decreasing: FX(x1)≤FX(x2) if x1≤x2
- Right-continuous
- lim_{x→−∞} FX(x) = 0 and lim_{x→∞} FX(x) = 1
Relationships:
- Discrete: FX(x) = ∑_{xi ≤ x} pX(xi)
- Continuous: FX(x) = ∫_{−∞}^{x} fX(t) dt and fX(x) = dFX(x)/dx
The n-th raw moment (or moment about the origin):
μn′=E[Xn]
Discrete: μn′ = ∑_x x^n ⋅ pX(x)
Continuous: μn′ = ∫_{−∞}^{∞} x^n ⋅ fX(x) dx
Special cases:
- μ1′=E[X] (mean)
- μ2′=E[X2]
The n-th central moment (moment about the mean):
μn=E[(X−μ)n]
where μ=E[X].
Special cases:
- μ0=1 (always)
- μ1=0 (always)
- μ2=E[(X−μ)2]=Var(X) (variance)
- μ3=E[(X−μ)3] (related to skewness)
- μ4=E[(X−μ)4] (related to kurtosis)
Skewness (third standardized moment):
γ1 = μ3/σ^3 = E[(X−μ)^3] / (Var(X))^{3/2}
Interpretation:
- γ1=0: symmetric distribution
- γ1>0: right-skewed (long right tail)
- γ1<0: left-skewed (long left tail)
Kurtosis (fourth standardized moment):
γ2 = μ4/σ^4 = E[(X−μ)^4] / (Var(X))^2
Excess kurtosis:
Excess kurtosis=γ2−3
Interpretation:
- γ2=3 (excess = 0): mesokurtic (normal distribution)
- γ2>3 (excess > 0): leptokurtic (heavy tails, sharp peak)
- γ2<3 (excess < 0): platykurtic (light tails, flat peak)
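A quick numerical sanity check of these definitions, assuming numpy and scipy are available (the seed and sample size are arbitrary). For Exponential(1) the true skewness is 2 and the true excess kurtosis is 6, so the sample estimates should land near those values.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

print(skew(x))                   # ~2.0: third standardized moment
print(kurtosis(x, fisher=True))  # ~6.0: excess kurtosis (gamma_2 - 3)
```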
Variance formula:
Var(X)=E[X2]−(E[X])2=μ2′−(μ1′)2
General relationship: Central moments can be expressed in terms of raw moments using binomial expansion:
μn = ∑_{k=0}^{n} (n choose k) (−μ)^{n−k} μk′
Example:
μ2=μ2′−(μ1′)2
μ3=μ3′−3μ1′μ2′+2(μ1′)3
μ4=μ4′−4μ1′μ3′+6(μ1′)2μ2′−3(μ1′)4
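A minimal check of the first two identities on simulated data, assuming numpy is available; the Gamma(2, 1) choice and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)

m1, m2, m3 = (np.mean(x**k) for k in (1, 2, 3))  # raw moments mu_1', mu_2', mu_3'
central2 = m2 - m1**2                            # mu_2 = Var(X)
central3 = m3 - 3*m1*m2 + 2*m1**3                # mu_3

print(central2, np.var(x))                       # should agree
print(central3, np.mean((x - m1)**3))            # should agree
```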
The moment generating function of X:
MX(t)=E[etX]
Discrete: MX(t) = ∑_x e^{tx} pX(x)
Continuous: MX(t) = ∫_{−∞}^{∞} e^{tx} fX(x) dx
Domain: Defined for t in some neighborhood of 0 (may not exist for all distributions)
Moments from MGF:
E[X^n] = MX(n)(0) = (d^n/dt^n) MX(t) |_{t=0}
Why "moment generating": Taylor expansion around t=0 gives:
MX(t) = ∑_{n=0}^{∞} E[X^n] t^n / n! = 1 + E[X]t + E[X^2]t^2/2! + E[X^3]t^3/3! + ⋯
Uniqueness: If MGF exists in a neighborhood of 0, it uniquely determines the distribution.
Sum of independent RVs: If X and Y are independent:
MX+Y(t)=MX(t)⋅MY(t)
Scaling and shifting:
M_{aX+b}(t) = e^{bt} MX(at)
| Distribution | MGF MX(t) |
|---|---|
| Bernoulli(p) | 1 − p + pe^t |
| Binomial(n, p) | (1 − p + pe^t)^n |
| Geometric(p) | pe^t / (1 − (1 − p)e^t) for t < −ln(1 − p) |
| Poisson(λ) | e^{λ(e^t − 1)} |
| Exponential(λ) | λ/(λ − t) for t < λ |
| Normal(μ, σ^2) | e^{μt + σ^2 t^2/2} |
| Gamma(α, β) | (β/(β − t))^α for t < β |
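A short symbolic sketch of pulling moments out of an MGF, assuming sympy is available; it uses the Normal(μ, σ^2) entry from the table above.

```python
import sympy as sp

t = sp.symbols('t', real=True)
mu = sp.symbols('mu', real=True)
sigma = sp.symbols('sigma', positive=True)

M = sp.exp(mu*t + sp.Rational(1, 2)*sigma**2*t**2)  # Normal(mu, sigma^2) MGF

EX  = sp.diff(M, t, 1).subs(t, 0)  # E[X]   = mu
EX2 = sp.diff(M, t, 2).subs(t, 0)  # E[X^2] = mu^2 + sigma^2

print(sp.simplify(EX), sp.simplify(EX2))
```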
The characteristic function of X:
ϕX(t)=E[eitX]
where i = √(−1).
Always exists (unlike MGF) because ∣eitX∣=1.
Relationship to MGF: ϕX(t)=MX(it) when MGF exists.
Uniqueness: Uniquely determines the distribution (Lévy's continuity theorem).
Moments: If moments exist:
E[X^n] = (1/i^n) ϕX(n)(0)
Sum of independent RVs:
ϕX+Y(t)=ϕX(t)⋅ϕY(t)
Inversion formula: Can recover PDF/PMF from characteristic function (Fourier inversion).
Why characteristic functions are useful:
- Always exist (the MGF may not)
- Uniquely determine distributions
- Simplify convolution (sum of independent RVs)
- Central to proving Central Limit Theorem
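A minimal Monte Carlo check of a known characteristic function, assuming numpy is available: for standard normal X, ϕX(t) = e^{−t^2/2}, and averaging e^{itX} over a large sample approximates E[e^{itX}]. The seed, sample size, and t values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)

for t in (0.5, 1.0, 2.0):
    phi_hat = np.mean(np.exp(1j * t * x))     # Monte Carlo estimate of E[e^{itX}]
    print(t, phi_hat.real, np.exp(-t**2 / 2)) # imaginary part ~ 0 by symmetry
```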
Cumulants are alternative descriptors of a distribution, defined via the cumulant generating function:
KX(t)=lnMX(t)
The n-th cumulant κn is:
κn = KX(n)(0) = (d^n/dt^n) KX(t) |_{t=0}
Taylor expansion:
KX(t) = ∑_{n=1}^{∞} κn t^n / n!
First few cumulants:
- κ1=μ (mean)
- κ2=σ2=Var(X) (variance)
- κ3=μ3=E[(X−μ)3] (third central moment)
- κ4 = μ4 − 3σ^4 (equal to σ^4 times the excess kurtosis)
General relationship: Cumulants can be expressed in terms of moments (and vice versa) using Bell polynomials.
Additivity for independent variables: If X and Y are independent:
κn(X+Y)=κn(X)+κn(Y)
This is much simpler than for moments (which require convolution).
Scaling:
κn(aX)=anκn(X)
Translation:
κ1(X+b)=κ1(X)+b
κn(X+b)=κn(X) for n≥2
Normal distribution: For X∼N(μ,σ2):
κ1=μ,κ2=σ2,κn=0 for n≥3
This characterizes the normal distribution!
Why cumulants are useful:
- Cumulants of sums of independent RVs add (moments don't)
- Easier algebra for sums and convolutions
- Normal distribution has only two non-zero cumulants
- Natural for Central Limit Theorem analysis
- Appear in statistical physics (connected correlation functions)
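A symbolic sketch of cumulants via KX(t) = ln MX(t), assuming sympy is available; it confirms that for Normal(μ, σ^2) every cumulant beyond the second is zero.

```python
import sympy as sp

t = sp.symbols('t', real=True)
mu = sp.symbols('mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Cumulant generating function: log of the Normal(mu, sigma^2) MGF.
K = sp.log(sp.exp(mu*t + sp.Rational(1, 2)*sigma**2*t**2))

for n in range(1, 6):
    kappa_n = sp.diff(K, t, n).subs(t, 0)
    print(n, sp.simplify(kappa_n))  # mu, sigma^2, 0, 0, 0
```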
For random variables X and Y:
Joint PMF (discrete):
pX,Y(x,y)=P(X=x,Y=y)
Joint PDF (continuous):
P((X,Y) ∈ A) = ∬_A fX,Y(x,y) dx dy
Joint CDF:
FX,Y(x,y)=P(X≤x,Y≤y)
Marginal distributions:
Discrete:
pX(x) = ∑_y pX,Y(x,y)
Continuous:
fX(x) = ∫_{−∞}^{∞} fX,Y(x,y) dy
Conditional distributions:
Discrete:
pX∣Y(x∣y) = pX,Y(x,y) / pY(y)
Continuous:
fX∣Y(x∣y) = fX,Y(x,y) / fY(y)
X and Y are independent if and only if:
fX,Y(x,y) = fX(x) ⋅ fY(y) for all x, y
Equivalently:
- P(X∈A,Y∈B)=P(X∈A)⋅P(Y∈B) for all sets A,B
- E[g(X)h(Y)]=E[g(X)]⋅E[h(Y)] for all functions g,h
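A small numerical illustration of the factorization test for a discrete pair, assuming numpy is available; the joint PMF below is an arbitrary example and happens to be dependent.

```python
import numpy as np

# Joint PMF of (X, Y) as a matrix: rows index values of X, columns values of Y.
p_xy = np.array([[0.10, 0.20],
                 [0.15, 0.30],
                 [0.05, 0.20]])

p_x = p_xy.sum(axis=1)  # marginal PMF of X
p_y = p_xy.sum(axis=0)  # marginal PMF of Y

# Independence holds iff the joint equals the outer product of the marginals.
print(np.allclose(p_xy, np.outer(p_x, p_y)))  # False for this joint PMF
```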
Covariance:
Cov(X,Y) = E[(X−E[X])(Y−E[Y])] = E[XY] − E[X]E[Y]
Properties:
- Cov(X,X)=Var(X)
- Cov(X,Y)=Cov(Y,X) (symmetric)
- Cov(aX+b,Y)=a⋅Cov(X,Y)
- If X and Y are independent: Cov(X,Y)=0 (but converse not always true)
Variance of sum:
Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)
General sum:
Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var(Xi) + 2 ∑_{i<j} Cov(Xi, Xj)
Correlation coefficient:
ρ(X,Y) = Cov(X,Y) / √(Var(X) ⋅ Var(Y)) = Cov(X,Y) / (σX σY)
Properties:
- −1≤ρ(X,Y)≤1 (Cauchy-Schwarz inequality)
- ∣ρ∣=1 if and only if Y=aX+b for some constants a,b (perfect linear relationship)
- ρ=0: uncorrelated (but not necessarily independent)
Note:
- Independence ⇒ uncorrelated
- Uncorrelated ⇏ independent (e.g., Y = X^2 where X ∼ N(0,1) is uncorrelated with X but fully dependent on it)
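A quick simulation of that counterexample, assuming numpy is available; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2

print(np.corrcoef(x, y)[0, 1])                # ~0: X and Y are uncorrelated
print(np.mean(y[np.abs(x) > 1]), np.mean(y))  # conditional mean of Y differs from E[Y]: dependent
```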
For random vector X=(X1,X2,...,Xn)T:
Σ=Cov(X)=E[(X−μ)(X−μ)T]
where μ=E[X].
Matrix elements:
Σij=Cov(Xi,Xj)
Properties:
- Symmetric: ΣT=Σ
- Positive semi-definite: vTΣv≥0 for all v
- Diagonal elements are variances: Σii=Var(Xi)
Correlation matrix:
Rij = ρ(Xi, Xj) = Cov(Xi, Xj) / (σi σj)
Relationship to covariance matrix: If D is diagonal with Dii=σi:
R=D−1ΣD−1
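A minimal sketch of the R = D^{−1} Σ D^{−1} conversion, assuming numpy is available; the covariance matrix below is an arbitrary positive semi-definite example.

```python
import numpy as np

sigma_mat = np.array([[4.0,  1.2,  0.0],
                      [1.2,  1.0, -0.3],
                      [0.0, -0.3,  2.25]])

d_inv = np.diag(1.0 / np.sqrt(np.diag(sigma_mat)))  # D^{-1}, D = diag(sigma_i)
corr = d_inv @ sigma_mat @ d_inv

print(corr)                                   # unit diagonal, entries in [-1, 1]
print(np.allclose(np.diag(corr), 1.0))        # True
```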
Multivariate MGF:
MX(t) = E[e^{t^T X}] = E[e^{t1X1 + t2X2 + ⋯ + tnXn}]
For independent components: MX(t) = ∏_{i=1}^{n} MXi(ti)
Multivariate characteristic function:
ϕX(t) = E[e^{i t^T X}]
Discrete case: If Y=g(X):
pY(y) = ∑_{x: g(x)=y} pX(x)
Continuous case (monotonic g): If Y=g(X) where g is monotonic and differentiable:
fY(y) = fX(g^{−1}(y)) |d g^{−1}(y)/dy| = fX(x) |dx/dy|
Non-monotonic case: Sum over all inverse images.
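A minimal check of the monotone change-of-variables formula, assuming scipy is available. For Y = 2X + 3 with X ∼ N(0,1), g^{−1}(y) = (y − 3)/2 and |dx/dy| = 1/2, so fY(y) should equal the N(3, 4) density; the evaluation point is arbitrary.

```python
from scipy.stats import norm

y = 4.5
x = (y - 3.0) / 2.0                      # g^{-1}(y)
f_y = norm.pdf(x) * 0.5                  # f_X(g^{-1}(y)) * |dx/dy|

print(f_y, norm.pdf(y, loc=3, scale=2))  # should match
```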
For transformation (U,V)=g(X,Y):
fU,V(u,v)=fX,Y(x,y)∣J∣
where ∣J∣ is the absolute value of the Jacobian determinant:
J = det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ]
Steps:
- Express (x,y) in terms of (u,v)
- Compute the Jacobian determinant
- Substitute into the formula
Linear transformation: If Y=aX+b:
- E[Y]=aE[X]+b
- Var(Y)=a2Var(X)
Sum of independent RVs: If Z=X+Y and X, Y independent:
fZ(z) = ∫_{−∞}^{∞} fX(x) fY(z−x) dx (convolution)
Product: If Z=XY and X, Y independent:
fZ(z) = ∫_{−∞}^{∞} (1/|x|) fX(x) fY(z/x) dx
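A short numerical check of the convolution formula, assuming scipy is available: for Z = X + Y with X, Y i.i.d. Exponential(1), the convolution integral should reproduce the Gamma(2, 1) density; the evaluation point is arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon, gamma

z = 2.5
# Convolution integral; the integrand vanishes outside [0, z] for exponentials.
f_z, _ = quad(lambda x: expon.pdf(x) * expon.pdf(z - x), 0.0, z)

print(f_z, gamma.pdf(z, a=2))  # both ~ z * exp(-z)
```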
Bivariate normal: For (X,Y) ∼ N(μX, μY, σX^2, σY^2, ρ):
fX,Y(x,y) = 1 / (2π σX σY √(1−ρ^2)) ⋅ exp( −1/(2(1−ρ^2)) [ (x−μX)^2/σX^2 − 2ρ(x−μX)(y−μY)/(σX σY) + (y−μY)^2/σY^2 ] )
Properties:
- Marginals are normal: X∼N(μX,σX2), Y∼N(μY,σY2)
- If ρ=0, then X and Y are independent (special to normal!)
- Linear combinations are normal
Multivariate normal: For X ∼ N(μ, Σ):
fX(x) = 1 / ((2π)^{n/2} |Σ|^{1/2}) ⋅ exp( −(1/2)(x−μ)^T Σ^{−1} (x−μ) )
MGF:
MX(t) = exp( μ^T t + (1/2) t^T Σ t )
Properties:
- Linear transformations are normal: If Y=AX+b, then Y∼N(Aμ+b,AΣAT)
- Marginals are normal
- Conditionals are normal
- Uncorrelated components are independent
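A simulation sketch of the linear-transformation property, assuming numpy is available: the empirical covariance of Y = AX + b should match A Σ A^T. The mean, covariance, A, b, seed, and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
b = np.array([5.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = X @ A.T + b

print(np.cov(Y, rowvar=False))  # empirical covariance of Y
print(A @ Sigma @ A.T)          # theoretical A Sigma A^T
```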
For i.i.d. random variables X1,...,Xn, let X(1)≤X(2)≤⋯≤X(n) be the order statistics.
PDF of k-th order statistic:
fX(k)(x) = n! / ((k−1)!(n−k)!) ⋅ [F(x)]^{k−1} [1−F(x)]^{n−k} f(x)
Joint PDF of (X(i),X(j)) for i<j:
fX(i),X(j)(x,y) = n! / ((i−1)!(j−i−1)!(n−j)!) ⋅ [F(x)]^{i−1} [F(y)−F(x)]^{j−i−1} [1−F(y)]^{n−j} f(x) f(y)
Minimum and Maximum:
- FX(1)(x) = 1 − [1−F(x)]^n
- FX(n)(x) = [F(x)]^n
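A quick Monte Carlo check of the maximum's CDF, assuming numpy is available: for n i.i.d. Uniform(0,1) variables, F_{X(n)}(x) = x^n. The choice n = 5 and x = 0.8 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
samples = rng.uniform(size=(200_000, n))
maxima = samples.max(axis=1)

x = 0.8
print(np.mean(maxima <= x), x**n)  # empirical vs. theoretical CDF: ~0.32768
```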
Conditional expectation E[X∣Y] is a random variable (function of Y):
E[X∣Y]=g(Y)
where g(y)=E[X∣Y=y].
Tower property:
E[E[X∣Y]]=E[X]
Taking out what's known:
E[g(Y)X∣Y]=g(Y)E[X∣Y]
Law of total variance:
Var(X)=E[Var(X∣Y)]+Var(E[X∣Y])
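A Monte Carlo check of the law of total variance, assuming numpy is available: with Y ∼ Exponential(1) and X | Y ∼ N(Y, 1), Var(X) = E[Var(X|Y)] + Var(E[X|Y]) = 1 + Var(Y) = 2. The mixture is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.exponential(scale=1.0, size=1_000_000)  # Y ~ Exponential(1)
x = rng.normal(loc=y, scale=1.0)                # X | Y ~ N(Y, 1)

print(np.var(x))  # ~2.0, matching E[Var(X|Y)] + Var(E[X|Y])
```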
A copula is a function that links univariate marginal distributions to form a multivariate distribution.
Sklar's theorem: For any joint distribution FX,Y(x,y) with marginals FX(x) and FY(y), there exists a copula C such that:
FX,Y(x,y)=C(FX(x),FY(y))
Use: Separate modeling of marginals and dependence structure.
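A minimal Gaussian-copula sketch of that separation, assuming numpy and scipy are available: correlated normals are pushed through Φ to get dependent uniforms, then through inverse marginal CDFs to get chosen marginals. The marginals (Exponential(1), Uniform(0,1)), ρ, seed, and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import norm, expon, uniform

rng = np.random.default_rng(7)
rho = 0.7
cov = np.array([[1.0, rho], [rho, 1.0]])

z = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
u = norm.cdf(z)                 # dependent Uniform(0,1) pairs: the copula part

x = expon.ppf(u[:, 0])          # Exponential(1) marginal
y = uniform.ppf(u[:, 1])        # Uniform(0,1) marginal

print(np.corrcoef(x, y)[0, 1])  # positive dependence inherited from rho
```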
Variance shortcut: Var(X) = E[X^2] − (E[X])^2 is usually easier than computing E[(X−μ)^2] directly.
Use MX+Y(t)=MX(t)⋅MY(t) for independent X, Y.
Example: Sum of independent normals is normal (check MGF).
E[X] = ∑_{k=1}^{∞} P(X ≥ k) for non-negative integer-valued X.
If (X,Y) is symmetric (e.g., i.i.d.), then E[X]=E[Y] and Var(X)=Var(Y).
For Y=g(X), find FY(y)=P(Y≤y)=P(g(X)≤y), then differentiate.
Exponential (continuous) and geometric (discrete) are the only memoryless distributions.
If MXn(t) → e^{at + bt^2/2} for t in a neighborhood of 0, then Xn → N(a, b) in distribution.
If Y=AX, then Cov(Y)=A⋅Cov(X)⋅AT.
Check if joint PDF/PMF factorizes: fX,Y(x,y)=g(x)⋅h(y).
For sums of independent RVs, cumulants add: κn(X+Y)=κn(X)+κn(Y).
| Given | Find | Approach |
|---|---|---|
| PMF or PDF of X | E[X], Var(X), or higher moments | Use E[X^n] = ∑ x^n p(x) or ∫ x^n f(x) dx, then Var(X) = E[X^2] − (E[X])^2 |
| MGF MX(t) | Moments, or identify the distribution | Take derivatives: E[X^n] = MX(n)(0), or match known MGF forms |
| Distribution of X, transformation Y = g(X) | Distribution of Y | Use the CDF method or a Jacobian; for monotonic g: fY(y) = fX(x) ∣dx/dy∣ |
| Joint PDF/PMF of (X, Y) | Marginals, conditionals, or P((X,Y) ∈ A) | Integrate/sum out variables for marginals; use fX∣Y(x∣y) = fX,Y(x,y)/fY(y) |
| Joint distribution or moments | Cov(X,Y) or ρ(X,Y) | Use Cov(X,Y) = E[XY] − E[X]E[Y], then ρ = Cov(X,Y)/(σX σY) |
| Joint distribution | Whether X and Y are independent | Check whether fX,Y(x,y) = fX(x) ⋅ fY(y); for jointly normal (X,Y), Cov(X,Y) = 0 suffices |
| Mean vector μ and covariance matrix Σ | Marginals, conditionals, or linear transformations | Use multivariate normal properties: marginals/conditionals are normal; AX + b is normal |
| Distribution of i.i.d. X1, ..., Xn | Distribution of X(k) (min, max, median) | Use the order statistic formulas with F(x) and f(x) |
| MGF or distribution | Cumulants κn | Use KX(t) = ln MX(t), then κn = KX(n)(0) |
| Independent X and Y | PDF/PMF of Z = X + Y | Use convolution: fZ(z) = ∫ fX(x) fY(z−x) dx, or MGFs: MZ(t) = MX(t) MY(t) |
Wrong: Assuming E[XY]=E[X]E[Y] when X and Y are dependent
Check: This only holds when X and Y are independent
Wrong: Using fY(y)=fX(g−1(y)) without the derivative term
Correct: fY(y)=fX(x)∣dx/dy∣ where x=g−1(y)
Wrong: Assuming Cov(X,Y)=0 implies independence
Correct: Independence implies uncorrelated, but not vice versa (except in the jointly normal case)
Wrong: Thinking PDF must be ≤ 1
Correct: PDF can exceed 1 (but integrates to 1); it's P(X=x)=0 that holds
Wrong: Var(X+Y)=Var(X)+Var(Y) when X and Y are dependent
Correct: Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y)
Wrong: Confusing fX(x) with fX∣Y(x∣y)
Check: Marginal integrates over all Y; conditional fixes Y
Wrong: Assuming every distribution has an MGF
Example: Cauchy distribution has no MGF (but has characteristic function)
Variance:
Var(X)=E[X2]−(E[X])2
Covariance:
Cov(X,Y)=E[XY]−E[X]E[Y]
Correlation:
ρ(X,Y) = Cov(X,Y) / (σX σY)
MGF:
MX(t) = E[e^{tX}], E[X^n] = MX(n)(0)
Cumulant generating function:
KX(t) = ln MX(t), κn = KX(n)(0)
Transformation (monotonic):
fY(y) = fX(g^{−1}(y)) |dx/dy|
Convolution:
fX+Y(z) = ∫_{−∞}^{∞} fX(x) fY(z−x) dx
Multivariate normal MGF:
MX(t) = exp( μ^T t + (1/2) t^T Σ t )
Law of total variance:
Var(X)=E[Var(X∣Y)]+Var(E[X∣Y])
- Variance calculations
- MGF identification and manipulation
- Characteristic functions
- Transformation of variables
- Jacobian determinants
- Order statistics
- Bivariate normal
- Covariance matrices
- Linear transformations
- Convolutions
- Skewness and kurtosis
- Cumulant calculations
- Conditional distributions
- Sum of independent RVs
- Mixture distributions
- Copulas
- Moment inequalities (Markov, Chebyshev)
- Probability integral transform