Random Variables

Interview prep for random variables, moments, MGFs, characteristic functions, and cumulants

Fundamental Concepts

Random Variables

A random variable is a function from the sample space to the real numbers:

X: \Omega \to \mathbb{R}

Discrete random variable: Takes countably many values \{x_1, x_2, x_3, \ldots\}

Continuous random variable: Takes values in an interval or union of intervals

Mixed random variable: Has both discrete and continuous components

Probability Mass Function (PMF)

For discrete random variable X:

p_X(x) = P(X = x)

Properties:

  • p_X(x) \geq 0 for all x
  • \sum_{x} p_X(x) = 1

Probability Density Function (PDF)

For continuous random variable X:

P(a \leq X \leq b) = \int_a^b f_X(x) \, dx

Properties:

  • f_X(x) \geq 0 for all x
  • \int_{-\infty}^{\infty} f_X(x) \, dx = 1
  • Note: P(X = x) = 0 for continuous X (but f_X(x) can be greater than 1)

Cumulative Distribution Function (CDF)

F_X(x) = P(X \leq x)

Properties:

  • Non-decreasing: F_X(x_1) \leq F_X(x_2) if x_1 \leq x_2
  • Right-continuous
  • \lim_{x \to -\infty} F_X(x) = 0 and \lim_{x \to \infty} F_X(x) = 1

Relationships:

  • Discrete: F_X(x) = \sum_{x_i \leq x} p_X(x_i)
  • Continuous: F_X(x) = \int_{-\infty}^x f_X(t) \, dt and f_X(x) = \frac{d}{dx} F_X(x)

Moments

Raw Moments

The n-th raw moment (or moment about the origin):

\mu_n' = E[X^n]

Discrete: \mu_n' = \sum_x x^n \cdot p_X(x)

Continuous: \mu_n' = \int_{-\infty}^{\infty} x^n \cdot f_X(x) \, dx

Special cases:

  • \mu_1' = E[X] (mean)
  • \mu_2' = E[X^2]

Central Moments

The n-th central moment (moment about the mean):

\mu_n = E[(X - \mu)^n]

where \mu = E[X].

Special cases:

  • \mu_0 = 1 (always)
  • \mu_1 = 0 (always)
  • \mu_2 = E[(X - \mu)^2] = \text{Var}(X) (variance)
  • \mu_3 = E[(X - \mu)^3] (related to skewness)
  • \mu_4 = E[(X - \mu)^4] (related to kurtosis)

Standardized Moments

Skewness (third standardized moment):

\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{E[(X - \mu)^3]}{(\text{Var}(X))^{3/2}}

Interpretation:

  • \gamma_1 = 0: symmetric distribution
  • \gamma_1 > 0: right-skewed (long right tail)
  • \gamma_1 < 0: left-skewed (long left tail)

Kurtosis (fourth standardized moment):

\gamma_2 = \frac{\mu_4}{\sigma^4} = \frac{E[(X - \mu)^4]}{(\text{Var}(X))^2}

Excess kurtosis:

\text{Excess kurtosis} = \gamma_2 - 3

Interpretation:

  • \gamma_2 = 3 (excess = 0): mesokurtic (normal distribution)
  • \gamma_2 > 3 (excess > 0): leptokurtic (heavy tails, sharp peak)
  • \gamma_2 < 3 (excess < 0): platykurtic (light tails, flat peak)
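As a quick numerical check (a minimal sketch assuming NumPy and SciPy are available), the sample skewness and excess kurtosis of Exponential(1) draws should land near the theoretical values 2 and 6:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)   # Exp(1): skewness = 2, excess kurtosis = 6

print("sample skewness:", stats.skew(x))            # third standardized moment
print("sample excess kurtosis:", stats.kurtosis(x)) # Fisher definition: gamma_2 - 3
```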

Moment Relationships

Variance formula:

\text{Var}(X) = E[X^2] - (E[X])^2 = \mu_2' - (\mu_1')^2

General relationship: Central moments can be expressed in terms of raw moments using binomial expansion:

\mu_n = \sum_{k=0}^{n} \binom{n}{k} (-\mu)^{n-k} \mu_k'

Example:

\mu_2 = \mu_2' - (\mu_1')^2

\mu_3 = \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3

\mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4
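To see these identities in action, here is a small sketch (assuming NumPy) that estimates raw moments from a sample and checks the \mu_2 and \mu_3 relationships against directly computed central moments:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)

m1, m2, m3 = (np.mean(x**k) for k in (1, 2, 3))   # raw moments mu'_k
mu2_direct = np.mean((x - m1) ** 2)               # central moments computed directly
mu3_direct = np.mean((x - m1) ** 3)

print(mu2_direct, m2 - m1**2)                     # mu_2 = mu'_2 - (mu'_1)^2
print(mu3_direct, m3 - 3 * m1 * m2 + 2 * m1**3)   # mu_3 = mu'_3 - 3 mu'_1 mu'_2 + 2 (mu'_1)^3
```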

Moment Generating Functions (MGF)

Definition

The moment generating function of X:

M_X(t) = E[e^{tX}]

Discrete: M_X(t) = \sum_x e^{tx} p_X(x)

Continuous: M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx

Domain: Defined for t in some neighborhood of 0 (the MGF may not exist for all distributions)

Properties of MGF

Moments from MGF:

E[X^n] = M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}

Why "moment generating": Taylor expansion around t = 0 gives:

M_X(t) = \sum_{n=0}^{\infty} \frac{E[X^n]}{n!} t^n = 1 + E[X]t + \frac{E[X^2]}{2!}t^2 + \frac{E[X^3]}{3!}t^3 + \cdots

Uniqueness: If MGF exists in a neighborhood of 0, it uniquely determines the distribution.

Sum of independent RVs: If X and Y are independent:

M_{X+Y}(t) = M_X(t) \cdot M_Y(t)

Scaling and shifting:

M_{aX+b}(t) = e^{bt} M_X(at)
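As a sketch of the moments-from-MGF property (assuming SymPy is available), differentiating the Exponential(\lambda) MGF \frac{\lambda}{\lambda - t} at t = 0 recovers E[X] = 1/\lambda and E[X^2] = 2/\lambda^2:

```python
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)
M = lam / (lam - t)                # MGF of Exponential(lambda), valid for t < lambda

EX  = sp.diff(M, t, 1).subs(t, 0)  # first derivative at 0 -> E[X]
EX2 = sp.diff(M, t, 2).subs(t, 0)  # second derivative at 0 -> E[X^2]

print(sp.simplify(EX))             # 1/lambda
print(sp.simplify(EX2))            # 2/lambda**2
print(sp.simplify(EX2 - EX**2))    # variance: 1/lambda**2
```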

Common MGFs

Distribution and its MGF M_X(t):

  • Bernoulli(p): 1 - p + pe^t
  • Binomial(n, p): (1 - p + pe^t)^n
  • Geometric(p): \frac{pe^t}{1 - (1-p)e^t} for t < -\ln(1-p)
  • Poisson(\lambda): e^{\lambda(e^t - 1)}
  • Exponential(\lambda): \frac{\lambda}{\lambda - t} for t < \lambda
  • Normal(\mu, \sigma^2): e^{\mu t + \frac{1}{2}\sigma^2 t^2}
  • Gamma(\alpha, \beta): \left(\frac{\beta}{\beta - t}\right)^\alpha for t < \beta

Characteristic Functions

Definition

The characteristic function of X:

\phi_X(t) = E[e^{itX}]

where i = \sqrt{-1}.

Always exists (unlike the MGF) because |e^{itX}| = 1.

Relationship to MGF: \phi_X(t) = M_X(it) when the MGF exists.

Properties of Characteristic Functions

Uniqueness: Uniquely determines the distribution (Fourier inversion/uniqueness theorem; Lévy's continuity theorem then links convergence of characteristic functions to convergence in distribution).

Moments: If moments exist:

E[X^n] = \frac{1}{i^n} \phi_X^{(n)}(0)

Sum of independent RVs:

\phi_{X+Y}(t) = \phi_X(t) \cdot \phi_Y(t)

Inversion formula: Can recover PDF/PMF from characteristic function (Fourier inversion).

Why Characteristic Functions Are Useful

  • Always exist (MGF may not)
  • Uniquely determine distributions
  • Simplify convolution (sum of independent RVs)
  • Central to proving Central Limit Theorem
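A minimal Monte Carlo sketch of the characteristic function (assuming NumPy): the empirical average of e^{itX} over standard normal draws should be close to the theoretical \phi(t) = e^{-t^2/2}:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)

for t in (0.5, 1.0, 2.0):
    phi_hat = np.mean(np.exp(1j * t * x))   # empirical E[e^{itX}] (complex-valued)
    phi_true = np.exp(-t**2 / 2)            # N(0,1) characteristic function
    print(t, phi_hat, phi_true)
```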

Cumulants

Definition

Cumulants are alternative descriptors of a distribution, defined via the cumulant generating function:

K_X(t) = \ln M_X(t)

The n-th cumulant \kappa_n is:

\kappa_n = \left. \frac{d^n}{dt^n} K_X(t) \right|_{t=0}

Taylor expansion:

K_X(t) = \sum_{n=1}^{\infty} \kappa_n \frac{t^n}{n!}

Cumulants vs. Moments

First few cumulants:

  • \kappa_1 = \mu (mean)
  • \kappa_2 = \sigma^2 = \text{Var}(X) (variance)
  • \kappa_3 = \mu_3 = E[(X - \mu)^3] (third central moment)
  • \kappa_4 = \mu_4 - 3\sigma^4 (excess kurtosis times \sigma^4)

General relationship: Cumulants can be expressed in terms of moments (and vice versa) using Bell polynomials.
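For a numerical check (a sketch assuming SciPy), scipy.stats.kstat returns unbiased sample estimates of the first four cumulants; for Exponential(1) the theoretical values are \kappa_n = (n-1)!, i.e. 1, 1, 2, 6:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=200_000)

# kstat(x, n) is an unbiased estimator of the n-th cumulant (n = 1..4)
for n, target in zip(range(1, 5), (1, 1, 2, 6)):
    print(n, stats.kstat(x, n), target)
```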

Properties of Cumulants

Additivity for independent variables: If X and Y are independent:

\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y)

This is much simpler than for moments (which require convolution).

Scaling:

\kappa_n(aX) = a^n \kappa_n(X)

Translation:

\kappa_1(X + b) = \kappa_1(X) + b

\kappa_n(X + b) = \kappa_n(X) \text{ for } n \geq 2

Normal distribution: For X \sim \mathcal{N}(\mu, \sigma^2):

\kappa_1 = \mu, \quad \kappa_2 = \sigma^2, \quad \kappa_n = 0 \text{ for } n \geq 3

This characterizes the normal distribution!

Why Cumulants Are Useful

  • Cumulants of sums of independent RVs add (moments don't)
  • Easier algebra for sums and convolutions
  • Normal distribution has only two non-zero cumulants
  • Natural for Central Limit Theorem analysis
  • Appear in statistical physics (connected correlation functions)

Multivariate Random Variables

Joint Distributions

For random variables X and Y:

Joint PMF (discrete):

p_{X,Y}(x,y) = P(X = x, Y = y)

Joint PDF (continuous):

P((X,Y) \in A) = \iint_A f_{X,Y}(x,y) \, dx \, dy

Joint CDF:

F_{X,Y}(x,y) = P(X \leq x, Y \leq y)

Marginal Distributions

Discrete:

p_X(x) = \sum_y p_{X,Y}(x,y)

Continuous:

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy

Conditional Distributions

Discrete:

p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}

Continuous:

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}

Independence

X and Y are independent if and only if:

f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for all } x, y

Equivalently:

  • P(X \in A, Y \in B) = P(X \in A) \cdot P(Y \in B) for all sets A, B
  • E[g(X)h(Y)] = E[g(X)] \cdot E[h(Y)] for all functions g, h

Covariance and Correlation

Covariance

\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]

Properties:

  • \text{Cov}(X, X) = \text{Var}(X)
  • \text{Cov}(X, Y) = \text{Cov}(Y, X) (symmetric)
  • \text{Cov}(aX + b, Y) = a \cdot \text{Cov}(X, Y)
  • If X and Y are independent: \text{Cov}(X, Y) = 0 (but the converse is not always true)

Variance of sum:

\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)

General sum:

\text{Var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \text{Var}(X_i) + 2\sum_{i<j} \text{Cov}(X_i, X_j)

Correlation

\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X) \cdot \text{Var}(Y)}} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}

Properties:

  • -1 \leq \rho(X, Y) \leq 1 (Cauchy-Schwarz inequality)
  • |\rho| = 1 if and only if Y = aX + b for some constants a, b (perfect linear relationship)
  • \rho = 0: uncorrelated (but not necessarily independent)

Note:

  • Independence \Rightarrow uncorrelated
  • Uncorrelated \not\Rightarrow independent (e.g., Y = X^2 where X \sim \mathcal{N}(0,1); see the sketch below)
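A quick sketch of this classic counterexample (assuming NumPy): with X standard normal and Y = X^2, the sample correlation is near zero even though Y is a deterministic function of X:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(200_000)
y = x**2                                        # completely dependent on x

print("corr(X, Y) ~", np.corrcoef(x, y)[0, 1])  # close to 0: uncorrelated, not independent
```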

Multivariate Moments

Covariance Matrix

For a random vector \mathbf{X} = (X_1, X_2, \ldots, X_n)^T:

\Sigma = \text{Cov}(\mathbf{X}) = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]

where \boldsymbol{\mu} = E[\mathbf{X}].

Matrix elements:

\Sigma_{ij} = \text{Cov}(X_i, X_j)

Properties:

  • Symmetric: \Sigma^T = \Sigma
  • Positive semi-definite: \mathbf{v}^T \Sigma \mathbf{v} \geq 0 for all \mathbf{v}
  • Diagonal elements are variances: \Sigma_{ii} = \text{Var}(X_i)

Correlation Matrix

\mathbf{R}_{ij} = \rho(X_i, X_j) = \frac{\text{Cov}(X_i, X_j)}{\sigma_i \sigma_j}

Relationship to covariance matrix: If \mathbf{D} is diagonal with D_{ii} = \sigma_i:

\mathbf{R} = \mathbf{D}^{-1} \Sigma \mathbf{D}^{-1}
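The relation \mathbf{R} = \mathbf{D}^{-1} \Sigma \mathbf{D}^{-1} is easy to check numerically (a sketch assuming NumPy, with an example covariance matrix):

```python
import numpy as np

Sigma = np.array([[ 4.0, 1.2, -0.8],
                  [ 1.2, 2.0,  0.5],
                  [-0.8, 0.5,  1.0]])            # example covariance matrix

D_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma)))   # D^{-1} with D_ii = sigma_i
R = D_inv @ Sigma @ D_inv                        # correlation matrix

print(R)
print(np.allclose(np.diag(R), 1.0))              # unit diagonal, as expected
```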

Multivariate MGF

M_{\mathbf{X}}(\mathbf{t}) = E[e^{\mathbf{t}^T \mathbf{X}}] = E[e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n}]

For independent components: M_{\mathbf{X}}(\mathbf{t}) = \prod_{i=1}^n M_{X_i}(t_i)

Multivariate Characteristic Function

\phi_{\mathbf{X}}(\mathbf{t}) = E[e^{i\mathbf{t}^T \mathbf{X}}]

Transformations of Random Variables

Univariate Transformations

Discrete case: If Y = g(X):

p_Y(y) = \sum_{x: g(x) = y} p_X(x)

Continuous case (monotonic g): If Y = g(X) where g is monotonic and differentiable:

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| = f_X(x) \left| \frac{dx}{dy} \right|

Non-monotonic case: Sum over all inverse images.
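A sketch of the monotonic change-of-variables formula (assuming NumPy and SciPy): with X ~ Exp(1) and Y = X^2, the formula gives f_Y(y) = f_X(\sqrt{y}) / (2\sqrt{y}), which should roughly match an empirical density estimate from transformed samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=500_000)
y = x**2                                   # monotonic transform on [0, inf)

# Change of variables: f_Y(y) = f_X(sqrt(y)) * |d/dy sqrt(y)| = f_X(sqrt(y)) / (2 sqrt(y))
grid = np.linspace(0.5, 4.0, 5)
f_y = stats.expon.pdf(np.sqrt(grid)) / (2 * np.sqrt(grid))

# Empirical density from a histogram of the transformed samples
hist, edges = np.histogram(y, bins=200, range=(0, 10), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
empirical = np.interp(grid, centers, hist)

print(np.round(f_y, 3))
print(np.round(empirical, 3))
```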

Multivariate Transformations (Jacobian)

For a transformation (U, V) = g(X, Y):

f_{U,V}(u, v) = f_{X,Y}(x, y) \, |J|

where |J| is the absolute value of the Jacobian determinant:

J = \det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix}

Steps:

  1. Express (x, y) in terms of (u, v)
  2. Compute the Jacobian determinant
  3. Substitute into the formula

Common Transformations

Linear transformation: If Y = aX + b:

  • E[Y] = aE[X] + b
  • \text{Var}(Y) = a^2 \text{Var}(X)

Sum of independent RVs: If Z = X + Y with X, Y independent:

f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x) \, dx \quad \text{(convolution)}

Product: If Z = XY with X, Y independent:

f_Z(z) = \int_{-\infty}^{\infty} \frac{1}{|x|} f_X(x) f_Y(z/x) \, dx
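The convolution formula for a sum can be sketched numerically (assuming NumPy): convolving two Uniform(0,1) densities on a grid yields the triangular density of the sum, peaking at 1:

```python
import numpy as np

dx = 0.001
x = np.arange(0, 1, dx)
f = np.ones_like(x)                    # Uniform(0,1) density on [0, 1)

f_sum = np.convolve(f, f) * dx         # numerical convolution: density of X + Y
z = np.arange(len(f_sum)) * dx         # grid for the sum, covering [0, 2)

# The sum of two independent Uniform(0,1) RVs has the triangular density on [0, 2]
for zi in (0.5, 1.0, 1.5):
    print(zi, f_sum[int(zi / dx)])     # approximately 0.5, 1.0, 0.5
```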

Multivariate Normal Distribution

Bivariate Normal

For (X, Y) \sim \mathcal{N}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho):

f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)

Properties:

  • Marginals are normal: X \sim \mathcal{N}(\mu_X, \sigma_X^2), Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)
  • If \rho = 0, then X and Y are independent (special to the normal!)
  • Linear combinations are normal

Multivariate Normal (General)

For \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma):

f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)

MGF:

M_{\mathbf{X}}(\mathbf{t}) = \exp\left(\boldsymbol{\mu}^T \mathbf{t} + \frac{1}{2}\mathbf{t}^T \Sigma \mathbf{t}\right)

Properties:

  • Linear transformations are normal: If \mathbf{Y} = A\mathbf{X} + \mathbf{b}, then \mathbf{Y} \sim \mathcal{N}(A\boldsymbol{\mu} + \mathbf{b}, A\Sigma A^T)
  • Marginals are normal
  • Conditionals are normal
  • Uncorrelated components are independent
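A sketch of the linear-transformation property (assuming NumPy): sample \mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma), apply \mathbf{Y} = A\mathbf{X} + \mathbf{b}, and compare the empirical covariance of \mathbf{Y} to A\Sigma A^T:

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([5.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are samples of X
Y = X @ A.T + b                                        # Y = A X + b for each sample

print(np.cov(Y, rowvar=False))                         # empirical covariance of Y
print(A @ Sigma @ A.T)                                 # theoretical A Sigma A^T
```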

Advanced Topics

Order Statistics

For i.i.d. random variables X_1, \ldots, X_n, let X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)} be the order statistics.

PDF of the k-th order statistic:

f_{X_{(k)}}(x) = \frac{n!}{(k-1)!(n-k)!} [F(x)]^{k-1} [1-F(x)]^{n-k} f(x)

Joint PDF of (X_{(i)}, X_{(j)}) for i < j:

f_{X_{(i)},X_{(j)}}(x,y) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!} [F(x)]^{i-1} [F(y)-F(x)]^{j-i-1} [1-F(y)]^{n-j} f(x)f(y)

Minimum and Maximum:

  • F_{X_{(1)}}(x) = 1 - [1-F(x)]^n
  • F_{X_{(n)}}(x) = [F(x)]^n
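For instance (a sketch assuming NumPy), the maximum of n i.i.d. Uniform(0,1) variables has CDF x^n, so with n = 5 the probability P(\max \leq 0.9) should be about 0.9^5 \approx 0.590:

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 5, 200_000
samples = rng.uniform(size=(trials, n))

max_vals = samples.max(axis=1)       # X_(n) for each trial
print((max_vals <= 0.9).mean())      # empirical P(max <= 0.9)
print(0.9**n)                        # theoretical [F(0.9)]^n = 0.9^5
```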

Conditional Expectation (Advanced)

Conditional expectation E[X|Y] is a random variable (a function of Y):

E[X|Y] = g(Y)

where g(y) = E[X|Y=y].

Tower property:

E[E[X|Y]] = E[X]

Taking out what's known:

E[g(Y)X|Y] = g(Y)E[X|Y]

Law of total variance:

\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])
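A simulation sketch of the law of total variance (assuming NumPy): if Y \sim \mathcal{N}(0, 4) and X \mid Y \sim \mathcal{N}(Y, 1), then \text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y]) = 1 + 4 = 5:

```python
import numpy as np

rng = np.random.default_rng(8)
N = 500_000

y = rng.normal(loc=0.0, scale=2.0, size=N)   # Y ~ N(0, 4)
x = rng.normal(loc=y, scale=1.0)             # X | Y ~ N(Y, 1)

# Law of total variance: Var(X) = E[Var(X|Y)] + Var(E[X|Y]) = 1 + 4 = 5
print(np.var(x))                             # about 5
print(1.0 + np.var(y))                       # E[Var(X|Y)] + Var(E[X|Y]), also about 5
```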

Copulas

A copula is a function that links univariate marginal distributions to form a multivariate distribution.

Sklar's theorem: For any joint distribution F_{X,Y}(x,y) with marginals F_X(x) and F_Y(y), there exists a copula C such that:

F_{X,Y}(x,y) = C(F_X(x), F_Y(y))

Use: Separate modeling of marginals and dependence structure.
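A minimal sketch of this idea (assuming NumPy and SciPy): a Gaussian copula couples two exponential marginals by pushing correlated normals through the normal CDF (giving dependent uniforms) and then through each marginal's inverse CDF:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
rho = 0.7
cov = np.array([[1.0, rho],
                [rho, 1.0]])

z = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)  # correlated standard normals
u = stats.norm.cdf(z)                                       # dependent Uniform(0,1) pairs (the copula)
x = stats.expon.ppf(u[:, 0], scale=1.0)                     # exponential marginal with mean 1
y = stats.expon.ppf(u[:, 1], scale=2.0)                     # exponential marginal with mean 2

print(np.corrcoef(x, y)[0, 1])    # dependence induced by the copula
print(x.mean(), y.mean())         # marginals keep their own means (~1 and ~2)
```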

Important Tricks and Techniques

Trick 1: Variance via E[X^2] - (E[X])^2

Usually easier than computing E[(X - \mu)^2] directly.

Trick 2: MGF for Sums

Use M_{X+Y}(t) = M_X(t) \cdot M_Y(t) for independent X, Y.

Example: Sum of independent normals is normal (check MGF).

Trick 3: Indicator Functions for Discrete Distributions

E[X] = \sum_{k=1}^{\infty} P(X \geq k) \quad \text{for } X \in \{1, 2, 3, \ldots\}

Trick 4: Symmetry in Joint Distributions

If (X, Y) is symmetric in its components (e.g., i.i.d.), then E[X] = E[Y] and \text{Var}(X) = \text{Var}(Y).

Trick 5: Transformation via CDF

For Y = g(X), find F_Y(y) = P(Y \leq y) = P(g(X) \leq y), then differentiate.

Trick 6: Memoryless Property

Exponential (continuous) and geometric (discrete) are the only memoryless distributions.

Trick 7: Normal Approximation via MGF/CLT

If M_{X_n}(t) \to e^{at + bt^2/2}, then X_n \to \mathcal{N}(a, b) in distribution.

Trick 8: Covariance Matrix for Linear Combinations

If \mathbf{Y} = A\mathbf{X}, then \text{Cov}(\mathbf{Y}) = A \, \text{Cov}(\mathbf{X}) \, A^T.

Trick 9: Independence Checking

Check if the joint PDF/PMF factorizes: f_{X,Y}(x,y) = g(x) \cdot h(y).

Trick 10: Cumulant Additivity

For sums of independent RVs, cumulants add: \kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y).

Interview Problem Types

Type 1: Computing Moments from Distribution

Given: PMF or PDF of X
Find: E[X], \text{Var}(X), or higher moments
Approach: Use E[X^n] = \sum x^n p(x) or \int x^n f(x) \, dx, then \text{Var}(X) = E[X^2] - (E[X])^2

Type 2: MGF Applications

Given: The MGF M_X(t), or a distribution whose moments are needed
Find: Moments, or identify the distribution
Approach: Take derivatives: E[X^n] = M_X^{(n)}(0), or match known MGF forms

Type 3: Transformations of Random Variables

Given: Distribution of X and a transformation Y = g(X)
Find: Distribution of Y
Approach: Use the CDF method or the Jacobian; for monotonic g: f_Y(y) = f_X(x) |dx/dy|

Type 4: Joint Distribution Problems

Given: Joint PDF/PMF of (X, Y)
Find: Marginals, conditionals, or P((X,Y) \in A)
Approach: Integrate/sum out variables for marginals; use f_{X|Y}(x|y) = f_{X,Y}(x,y)/f_Y(y)

Type 5: Covariance and Correlation

Given: Joint distribution or moments
Find: \text{Cov}(X,Y) or \rho(X,Y)
Approach: Use \text{Cov}(X,Y) = E[XY] - E[X]E[Y], then \rho = \text{Cov}/(\sigma_X \sigma_Y)

Type 6: Independence Testing

Given: Joint distribution
Find: Whether X and Y are independent
Approach: Check if f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y); for jointly normal variables, \text{Cov}(X,Y) = 0 suffices

Type 7: Multivariate Normal Problems

Given: Mean vector \boldsymbol{\mu} and covariance matrix \Sigma
Find: Marginals, conditionals, or linear transformations
Approach: Use normal properties; marginals/conditionals are normal; A\mathbf{X} + \mathbf{b} is normal

Type 8: Order Statistics

Given: Distribution of i.i.d. X_1, \ldots, X_n
Find: Distribution of X_{(k)} (min, max, median)
Approach: Use the order statistic formulas with F(x) and f(x)

Type 9: Cumulant Calculations

Given: MGF or distribution
Find: Cumulants \kappa_n
Approach: Use K_X(t) = \ln M_X(t), then \kappa_n = K_X^{(n)}(0)

Type 10: Convolution Problems

Given: Independent X and Y; need the distribution of Z = X + Y
Find: PDF/PMF of Z
Approach: Use convolution: f_Z(z) = \int f_X(x)f_Y(z-x)\,dx, or MGFs: M_Z(t) = M_X(t)M_Y(t)

Common Pitfalls

Pitfall 1: Confusing E[XY] with E[X]E[Y]

Wrong: Assuming E[XY] = E[X]E[Y] when X and Y are dependent

Check: This only holds when X and Y are independent

Pitfall 2: Forgetting Jacobian in Transformations

Wrong: Using f_Y(y) = f_X(g^{-1}(y)) without the derivative term

Correct: f_Y(y) = f_X(x) \, |dx/dy| where x = g^{-1}(y)

Pitfall 3: Uncorrelated ≠ Independent

Wrong: Assuming \text{Cov}(X,Y) = 0 implies independence

Correct: Independence implies uncorrelated, but not vice versa (except for jointly normal variables)

Pitfall 4: PDF Values > 1

Wrong: Thinking PDF must be ≤ 1

Correct: PDF can exceed 1 (but it integrates to 1); what holds is P(X = x) = 0

Pitfall 5: Wrong Variance Formula for Sums

Wrong: \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) when X and Y are dependent

Correct: \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X,Y)

Pitfall 6: Marginal ≠ Conditional

Wrong: Confusing f_X(x) with f_{X|Y}(x|y)

Check: The marginal integrates over all values of Y; the conditional fixes Y

Pitfall 7: MGF Doesn't Always Exist

Wrong: Assuming every distribution has an MGF

Example: The Cauchy distribution has no MGF (but it does have a characteristic function)

Quick Reference: Key Formulas

Variance:

\text{Var}(X) = E[X^2] - (E[X])^2

Covariance:

\text{Cov}(X,Y) = E[XY] - E[X]E[Y]

Correlation:

\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y}

MGF:

M_X(t) = E[e^{tX}], \quad E[X^n] = M_X^{(n)}(0)

Cumulant generating function:

K_X(t) = \ln M_X(t), \quad \kappa_n = K_X^{(n)}(0)

Transformation (monotonic):

f_Y(y) = f_X(g^{-1}(y)) \left|\frac{dx}{dy}\right|

Convolution:

f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z-x) \, dx

Multivariate normal MGF:

M_{\mathbf{X}}(\mathbf{t}) = \exp\left(\boldsymbol{\mu}^T\mathbf{t} + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}\right)

Law of total variance:

\text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])

Practice Problem Categories

  • Variance calculations
  • MGF identification and manipulation
  • Characteristic functions
  • Transformation of variables
  • Jacobian determinants
  • Order statistics
  • Bivariate normal
  • Covariance matrices
  • Linear transformations
  • Convolutions
  • Skewness and kurtosis
  • Cumulant calculations
  • Conditional distributions
  • Sum of independent RVs
  • Mixture distributions
  • Copulas
  • Moment inequalities (Markov, Chebyshev)
  • Probability integral transform
