
Conditional Probability

Interview prep for conditional probability, joint and marginal distributions, Bayes' theorem, and independence

Fundamental Concepts

Definition of Conditional Probability

The probability of event A given that event B has occurred:

P(A|B) = \frac{P(A \cap B)}{P(B)}

where P(B) > 0.

Intuition: We're restricting our sample space to only cases where B occurs, then finding what fraction of those cases also have A.

The Multiplication Rule

Rearranging the definition gives us:

P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)

This is crucial for building probability trees and sequential events.
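
For a quick sanity check, consider drawing two cards from a standard deck without replacement: P(\text{both aces}) = P(\text{first is an ace}) \cdot P(\text{second is an ace}|\text{first is an ace}) = \frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221}.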

Joint, Marginal, and Conditional Distributions

Joint Distributions

The joint distribution P(X, Y) or P(X = x, Y = y) gives the probability of both events occurring together.

Discrete case: Joint PMF p_{X,Y}(x,y) = P(X = x, Y = y)

Continuous case: Joint PDF f_{X,Y}(x,y), where P((X,Y) \in A) = \iint_A f_{X,Y}(x,y) \, dx \, dy

Key property: Must sum/integrate to 1: \sum_x \sum_y p_{X,Y}(x,y) = 1 \quad \text{or} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx \, dy = 1

Marginal Distributions

The marginal distribution is the distribution of one variable regardless of the other(s).

Discrete case (marginal PMF): P(X = x) = \sum_y P(X = x, Y = y) \quad \text{and} \quad P(Y = y) = \sum_x P(X = x, Y = y)

Continuous case (marginal PDF): f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx

Intuition: Sum/integrate out the variable(s) you don't care about. This is called "marginalization."

Memory trick: Think of a contingency table—marginals are the row/column totals.
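
To make the contingency-table picture concrete, here is a minimal Python sketch (the joint PMF values are made up for illustration); the marginals are just the row and column totals.

```python
import numpy as np

# Hypothetical joint PMF for illustration: rows index X, columns index Y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])

assert np.isclose(joint.sum(), 1.0)   # a joint PMF must sum to 1

p_X = joint.sum(axis=1)               # marginal of X: sum over Y (row totals)
p_Y = joint.sum(axis=0)               # marginal of Y: sum over X (column totals)

print(p_X)   # [0.4 0.6]
print(p_Y)   # [0.25 0.45 0.3]
```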

Conditional Distributions

The conditional distribution gives the distribution of one variable given a specific value of another.

Discrete case: P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}

Continuous case: f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}

Key insight: The conditional distribution is itself a valid probability distribution (sums/integrates to 1).
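
Continuing in the same spirit, a short sketch (again with a made-up joint table) that forms a conditional distribution by dividing each column of the joint by the corresponding marginal, and checks that the result sums to 1:

```python
import numpy as np

# Made-up joint PMF: rows index X, columns index Y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])

p_Y = joint.sum(axis=0)          # marginal of Y (column totals)

# Conditional PMF of X given Y = y: divide each column of the joint by P(Y = y).
p_X_given_Y = joint / p_Y        # broadcasting divides column j by p_Y[j]

# Every column is a valid distribution over X: it sums to 1.
print(p_X_given_Y.sum(axis=0))   # [1. 1. 1.]
print(p_X_given_Y[:, 0])         # P(X = x | Y = y_0) = [0.4 0.6]
```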

Essential Identities and Formulas

The Fundamental Relationship

Key Formula:

\text{Joint} = \text{Conditional} \times \text{Marginal}

P(X, Y) = P(X|Y) \cdot P(Y) = P(Y|X) \cdot P(X)

f_{X,Y}(x,y) = f_{X|Y}(x|y) \cdot f_Y(y) = f_{Y|X}(y|x) \cdot f_X(x)

This is the most important formula—it connects all three types of distributions.

Bayes' Theorem (Distribution Form)

P(X|Y) = \frac{P(Y|X) \cdot P(X)}{P(Y)}

For continuous variables: f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) \cdot f_X(x)}{f_Y(y)}

Extended form with the law of total probability: f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) \cdot f_X(x)}{\int_{-\infty}^{\infty} f_{Y|X}(y|x') \cdot f_X(x') \, dx'}

Law of Total Probability (Continuous)

f_Y(y) = \int_{-\infty}^{\infty} f_{Y|X}(y|x) \cdot f_X(x) \, dx

Discrete version: P(Y = y) = \sum_x P(Y = y | X = x) \cdot P(X = x)

When to use: Computing marginals from conditionals, or finding normalizing constants.

Independence

X and Y are independent if and only if:

P(X, Y) = P(X) \cdot P(Y) \quad \text{for all } x, y

Equivalently:

  • P(X|Y) = P(X) (conditioning doesn't matter)
  • f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)
  • The joint "factorizes" into the product of marginals
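
As a quick illustration, here is a minimal sketch (with hypothetical marginals) that builds an independent joint as the outer product of its marginals and then verifies the factorization numerically:

```python
import numpy as np

# Hypothetical marginals; the joint is built to be independent on purpose.
p_X = np.array([0.4, 0.6])
p_Y = np.array([0.25, 0.45, 0.30])
joint = np.outer(p_X, p_Y)            # P(X = x, Y = y) = P(X = x) * P(Y = y)

# Independence check: does the joint equal the outer product of its own marginals?
marg_X = joint.sum(axis=1)
marg_Y = joint.sum(axis=0)
print(np.allclose(joint, np.outer(marg_X, marg_Y)))   # True
```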

Marginal Distribution Tricks and Properties

Trick 1: Sum/Integrate Out Variables

Problem: Given joint distribution, find marginal.

Solution: Sum (discrete) or integrate (continuous) over all values of the other variable(s).

Example: For a joint PMF table, the marginal is the corresponding row/column sum.

Trick 2: Reverse via Conditional × Marginal

Problem: Given conditionals and one marginal, find joint or other marginal.

Solution: Use P(X,Y) = P(X|Y) \cdot P(Y), then marginalize.

Example:

  • Given: P(Y|X) and P(X)
  • Find: P(Y)
  • Method: P(Y) = \sum_x P(Y|X=x) \cdot P(X=x) (law of total probability)
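
A minimal sketch of this recipe, with made-up numbers: P(Y|X) stored as a row-stochastic matrix, P(X) as a vector, and the law of total probability computed as a matrix product.

```python
import numpy as np

# Made-up example: P(X) over two states, and P(Y|X) as a row-stochastic matrix
# (row i is the distribution of Y given X = x_i).
p_X = np.array([0.3, 0.7])
p_Y_given_X = np.array([[0.9, 0.1],    # P(Y = . | X = x_0)
                        [0.2, 0.8]])   # P(Y = . | X = x_1)

# Law of total probability: P(Y = y) = sum_x P(Y = y | X = x) * P(X = x)
p_Y = p_X @ p_Y_given_X
print(p_Y)                  # [0.41 0.59]

# Equivalent route: build the joint as conditional * marginal, then take column sums.
joint = p_Y_given_X * p_X[:, None]
print(joint.sum(axis=0))    # [0.41 0.59]
```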

Trick 3: Marginal of a Function

If Z = g(X, Y), the marginal distribution of Z is:

P(Z = z) = P(g(X,Y) = z) = \sum_{(x,y): g(x,y)=z} P(X=x, Y=y)

Continuous: Use transformation techniques or integration over level sets.

Trick 4: Symmetry in Marginals

If the joint distribution is symmetric in X and Y, then f_X(x) = f_Y(x).

Example: If (X,Y) is uniformly distributed on a circle, both marginals are identical.

Trick 5: Marginal Expectations

E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x) \, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \cdot f_{X,Y}(x,y) \, dy \, dx

Shortcut: You can compute E[X] from the joint distribution without explicitly finding the marginal first.

E[X] = E[E[X|Y]] (law of iterated expectations)

Trick 6: Indicator Functions for Marginals

For complex regions in the joint distribution:

f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \cdot \mathbb{1}_{\{(x,y) \in \text{Region}\}} \, dy

Use case: When the joint has support on a restricted region (e.g., x + y < 1).

Conditional Distribution Tricks and Properties

Trick 7: Conditional Independence

X and Y are conditionally independent given Z if:

P(X, Y | Z) = P(X|Z) \cdot P(Y|Z)

Equivalently: P(X|Y,Z) = P(X|Z)

Intuition: Given Z, knowing Y doesn't give additional information about X.

Common in: Bayesian networks, hierarchical models, Markov chains.

Trick 8: Conditional Expectations

E[X|Y=y] = \int_{-\infty}^{\infty} x \cdot f_{X|Y}(x|y) \, dx

Properties:

  • E[X] = E[E[X|Y]] (law of iterated expectations / tower property)
  • \text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y]) (law of total variance)

Interview gold: These properties solve many complex expectation problems.
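
A Monte Carlo sanity check of both identities, using the hierarchical example that appears later in this guide (Y ~ Exponential(λ), X|Y ~ Normal(Y, 1)); the parameter values below are chosen arbitrarily for illustration.

```python
import numpy as np

# Monte Carlo sanity check: Y ~ Exponential(lam), X | Y ~ Normal(Y, 1).
# Then E[X] = E[E[X|Y]] = E[Y] = 1/lam, and
# Var(X) = E[Var(X|Y)] + Var(E[X|Y]) = 1 + 1/lam**2.
rng = np.random.default_rng(0)
lam, n = 2.0, 1_000_000

Y = rng.exponential(scale=1 / lam, size=n)   # E[Y] = 0.5, Var(Y) = 0.25
X = rng.normal(loc=Y, scale=1.0)             # one X draw per Y draw

print(X.mean())   # ~0.5  (tower property)
print(X.var())    # ~1.25 (law of total variance)
```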

Trick 9: Conditioning Reduces Variance

E[\text{Var}(X|Y)] \leq \text{Var}(X)

Intuition: On average, additional information (conditioning) reduces uncertainty. Note that \text{Var}(X|Y=y) can exceed \text{Var}(X) for particular values of y; the inequality holds in expectation, as a direct consequence of the law of total variance.

Trick 10: Bayes' Update Pattern

In Bayesian statistics:

  • Prior: P(X) or f_X(x)
  • Likelihood: P(Y|X) or f_{Y|X}(y|x)
  • Posterior: P(X|Y) or f_{X|Y}(x|y)

\text{Posterior} \propto \text{Likelihood} \times \text{Prior}

Trick: Often you can ignore the normalizing constant (marginal) until the end:

f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) \cdot f_X(x)}{\text{constant w.r.t. } x}

Find the constant by ensuring the posterior integrates to 1.
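
A minimal grid-based sketch of this pattern (the data and prior are made up): compute likelihood × prior on a grid of parameter values, ignore the constant, and normalize only at the end.

```python
import numpy as np
from math import comb

# Grid sketch: unknown coin bias p, uniform prior, observe 7 heads in 10 flips.
grid = np.linspace(0.001, 0.999, 999)                 # candidate values of p
prior = np.ones_like(grid)                            # uniform prior, unnormalized
likelihood = comb(10, 7) * grid**7 * (1 - grid)**3    # P(data | p)

unnormalized = likelihood * prior                     # posterior up to a constant in p
posterior = unnormalized / unnormalized.sum()         # normalize only at the end

print(posterior.sum())             # 1.0
print(grid[np.argmax(posterior)])  # posterior mode, approximately 0.7
```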

Common Joint Distribution Patterns

Pattern 1: Uniform on a Region

If (X,Y) is uniform on region R with area A:

f_{X,Y}(x,y) = \frac{1}{A} \cdot \mathbb{1}_{\{(x,y) \in R\}}

Marginal strategy: f_X(x) = \int_{y: (x,y) \in R} \frac{1}{A} \, dy = \frac{\text{length of slice at } x}{A}
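
As a sketch of the slicing idea, take the triangle {x > 0, y > 0, x + y < 1} (area 1/2), where slicing gives f_X(x) = 2(1 - x); the rejection-sampling check below is illustrative only.

```python
import numpy as np

# (X, Y) uniform on the triangle {x > 0, y > 0, x + y < 1}; area A = 1/2,
# so slicing gives f_X(x) = (1 - x) / (1/2) = 2 * (1 - x).
rng = np.random.default_rng(0)
pts = rng.uniform(size=(2_000_000, 2))   # uniform on the unit square
x, y = pts[:, 0], pts[:, 1]
x = x[x + y < 1]                         # rejection step: keep points in the triangle

# Compare an empirical density estimate of X with 2 * (1 - x) at a few points.
h = 0.01
for x0 in (0.1, 0.5, 0.9):
    est = np.mean(np.abs(x - x0) < h) / (2 * h)
    print(x0, round(est, 2), round(2 * (1 - x0), 2))   # estimate vs exact
```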

Pattern 2: Product Form (Independence)

f_{X,Y}(x,y) = g(x) \cdot h(y)

Immediate insight: X and Y are independent!

Marginals: f_X(x) = g(x) \cdot \int h(y) \, dy = C \cdot g(x)

Normalize if needed.

Pattern 3: Conditional Structure

f_{X,Y}(x,y) = f_Y(y) \cdot f_{X|Y}(x|y)

Use when: One variable clearly "comes first" (hierarchical structure).

Example: Y \sim \text{Exponential}(\lambda), then X|Y \sim \text{Normal}(Y, 1)

Pattern 4: Mixture Distributions

f_X(x) = \sum_{i=1}^{k} \pi_i \cdot f_i(x)

where \pi_i are mixing weights with \sum_i \pi_i = 1.

Interpretation: X comes from one of k distributions, the i-th with probability \pi_i.

Connection to marginals: This IS the law of total probability in distribution form.
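
A minimal sampling sketch of a mixture (weights and component parameters are made up): pick a component with probability \pi_i, then draw from that component; the sample mean matches \sum_i \pi_i \cdot E[X | \text{component } i].

```python
import numpy as np

# Two-component Gaussian mixture with made-up weights and parameters:
# pick a component i with probability pi_i, then draw X from f_i.
rng = np.random.default_rng(0)
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 0.5])

n = 100_000
component = rng.choice(len(weights), size=n, p=weights)   # which f_i generated each draw
x = rng.normal(loc=means[component], scale=stds[component])

# Marginally, E[X] = sum_i pi_i * E[X | component i] = 0.3 * (-2) + 0.7 * 3 = 1.5
print(x.mean())   # ~1.5
```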

Quick Reference: Essential Equalities

Converting Between Distributions

  1. Joint → Marginal:

    f_X(x) = \int f_{X,Y}(x,y) \, dy

  2. Joint → Conditional:

    f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}

  3. Conditional + Marginal → Joint:

    f_{X,Y}(x,y) = f_{X|Y}(x|y) \cdot f_Y(y)

  4. Conditional + Marginal → Other Marginal:

    f_Y(y) = \int f_{Y|X}(y|x) \cdot f_X(x) \, dx

  5. Reverse Conditional (Bayes):

    f_{Y|X}(y|x) = \frac{f_{X|Y}(x|y) \cdot f_Y(y)}{f_X(x)}

Expectation Identities

  1. Marginal expectation from joint:

    E[X] = \iint x \cdot f_{X,Y}(x,y) \, dx \, dy

  2. Law of iterated expectations:

    E[X] = E[E[X|Y]]

  3. Conditional expectation as a function:

    E[X|Y] \text{ is a function of } Y, \text{ say } g(Y)

  4. Law of total variance:

    \text{Var}(X) = E[\text{Var}(X|Y)] + \text{Var}(E[X|Y])

  5. Conditional variance:

    \text{Var}(X|Y=y) = E[X^2|Y=y] - (E[X|Y=y])^2

Problem-Solving Strategies

Strategy 1: Draw the Dependency Graph

Identify which variables depend on which:

  • Arrows show conditional dependence
  • Helps identify which conditionals you know
  • Guides application of chain rule

Strategy 2: Fill in a Contingency Table

For discrete distributions with few values:

  1. Create table with rows/columns for each variable's values
  2. Fill in what you know (joint, marginal, or conditional)
  3. Use row/column sums for marginals
  4. Use division for conditionals

Strategy 3: Identify the Question Type

  • Have joint, want marginal: Integrate/sum out
  • Have conditionals + one marginal, want joint: Multiply
  • Have conditionals + one marginal, want other marginal: Law of total probability
  • Want to reverse conditioning: Bayes' theorem

Strategy 4: Look for Factorization

Can you write f_{X,Y}(x,y) = g(x) \cdot h(y)?

  • If yes: Independent! Marginals are easy.
  • If no: Need full marginalization.

Strategy 5: Use Symmetry

  • Is the joint symmetric in the variables?
  • Are there symmetries in the support region?
  • Can you swap variables to make the problem easier?

Interview Problem Types

Type 1: Contingency Table Problems

  • Given: Joint distribution in table form
  • Find: Marginals, conditionals, independence check
  • Approach: Row/column sums for marginals, division for conditionals

Type 2: Uniform on Region

  • Given: (X,Y) uniform on a geometric region
  • Find: Marginal distribution of X or Y
  • Approach: "Slice" the region, find the length/area of the slices

Type 3: Hierarchical Models

  • Given: X \sim f_X(x), then Y|X \sim f_{Y|X}(y|x)
  • Find: Marginal f_Y(y) or joint f_{X,Y}(x,y)
  • Approach: Use f_{X,Y}(x,y) = f_{Y|X}(y|x) \cdot f_X(x), then marginalize if needed

Type 4: Order Statistics

  • Given: X_1, \ldots, X_n i.i.d.; let X_{(1)} \leq \ldots \leq X_{(n)} be the order statistics
  • Find: Distribution of X_{(k)} or joint of (X_{(i)}, X_{(j)})
  • Approach: Use the joint density of the order statistics, marginalize carefully

Type 5: Transformation of Variables

  • Given: Joint distribution of (X,Y), transformation (U,V) = g(X,Y)
  • Find: Joint distribution of (U,V) or marginal of U
  • Approach: Jacobian transformation, then marginalize if needed

Type 6: Bayesian Inference

  • Given: Prior f_X(x), likelihood f_{Y|X}(y|x), observed data Y = y_0
  • Find: Posterior f_{X|Y}(x|y_0)
  • Approach: Bayes' theorem; often you can ignore the normalizing constant

Common Pitfalls

Pitfall 1: Confusing Joint and Conditional

Wrong: Using P(X,Y) when you mean P(X|Y)

Check: Does the formula involve division by a marginal?

Pitfall 2: Forgetting to Marginalize

Wrong: Treating a joint distribution as a marginal

Check: Do you have extra variables you need to integrate/sum out?

Pitfall 3: Independence Assumption

Wrong: Assuming f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) without checking

Check: Does the joint factorize? Does P(X|Y) = P(X)?

Pitfall 4: Wrong Order in Conditional

Wrong: Computing P(X|Y) when you need P(Y|X)

Fix: Use Bayes' theorem to reverse

Pitfall 5: Ignoring Support/Domain

Wrong: Integrating over wrong limits

Check: What values can (x,y) actually take? Are there constraints?

Additional Helpful Tricks

The Complement Trick

Often easier to calculate the probability of "not happening":

P(A|B) = 1 - P(A^c|B)

Example: "At least one success" problems are easier as 1 - P(\text{all failures})

Symmetry Arguments

Look for symmetry in problems to simplify calculations.

Example: In a random shuffle, any specific position is equally likely to contain any specific card.

Conditioning on the First Step

For sequential processes, condition on what happens first:

P(A) = P(A|\text{first step}_1) \cdot P(\text{first step}_1) + P(A|\text{first step}_2) \cdot P(\text{first step}_2)
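
The same idea works for expectations. For example, if N is the number of flips of a coin with heads probability p until the first head, conditioning on the first flip gives E[N] = p \cdot 1 + (1-p)(1 + E[N]), which solves to E[N] = 1/p.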

The "False Positive" Framework

For diagnostic/testing problems:

  • Sensitivity: P(\text{positive test}|\text{disease})
  • Specificity: P(\text{negative test}|\text{no disease})
  • Goal: Usually find P(\text{disease}|\text{positive test}) using Bayes' theorem
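
A minimal numeric sketch with made-up numbers (1% prevalence, 95% sensitivity, 90% specificity), showing why the answer is often surprisingly small:

```python
# Made-up numbers: 1% prevalence, 95% sensitivity, 90% specificity.
prevalence = 0.01
sensitivity = 0.95                 # P(positive | disease)
specificity = 0.90                 # P(negative | no disease)

# Denominator via the law of total probability: P(positive)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' theorem: P(disease | positive)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))   # 0.088 -- small despite a "good" test
```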

Probability Trees

Draw a tree for sequential events:

  • Each branch represents a conditional probability
  • Multiply along branches for joint probabilities
  • Add across branches for total probabilities

Common Mistakes to Avoid

The Prosecutor's Fallacy

Wrong: Confusing P(A|B) with P(B|A)

Example: P(\text{match}|\text{innocent}) = 0.001 does NOT mean P(\text{innocent}|\text{match}) = 0.001

Assuming Independence

Wrong: Assuming events are independent when they're not

Check: Does knowing B occurred change the probability of A? If yes, they're dependent.

Ignoring Base Rates

Wrong: Neglecting the prior P(A) when using Bayes' Theorem

Example: Rare diseases remain unlikely even with positive tests if the base rate is very low.

Quick Reference: Chain Rule

Chain Rule: P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap \cdots \cap A_{n-1})

Partition Formula: For any events A and B: P(A) = P(A|B) \cdot P(B) + P(A|B^c) \cdot P(B^c)

Conditional Complement: P(A^c|B) = 1 - P(A|B)

Practice Problem Categories

  • Balls and urns
  • Disease testing
  • Game shows (Monty Hall)
  • Card/dice problems
  • Random walks
  • Matching problems
  • Uniform on triangular/circular regions
  • Hierarchical/mixture models
