Conditional Probability
Interview prep for conditional probability, joint and marginal distributions, Bayes' theorem, and independence
Fundamental Concepts
Definition of Conditional Probability
The probability of event A given that event B has occurred:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

where $P(B) > 0$.
Intuition: We're restricting our sample space to only cases where B occurs, then finding what fraction of those cases also have A.
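A quick sanity check of the definition, using a small hypothetical two-dice example (events A and B below are chosen for illustration) and exact fractions:

```python
from fractions import Fraction

# Two fair six-sided dice. A = "sum is 8", B = "first die is even".
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {o for o in outcomes if o[0] + o[1] == 8}
B = {o for o in outcomes if o[0] % 2 == 0}

# P(A | B) = P(A ∩ B) / P(B): restrict the sample space to B.
p_B = Fraction(len(B), len(outcomes))
p_A_and_B = Fraction(len(A & B), len(outcomes))
p_A_given_B = p_A_and_B / p_B

print(p_A_given_B)  # 1/6
```

Note that the result equals `len(A & B) / len(B)` directly: counting within the restricted sample space is the same computation.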
The Multiplication Rule
Rearranging the definition gives us:

$$P(A \cap B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)$$
This is crucial for building probability trees and sequential events.
Joint, Marginal, and Conditional Distributions
Joint Distributions
The joint distribution $P(X = x, Y = y)$ or $f_{X,Y}(x, y)$ gives the probability of both events occurring together.
Discrete case: Joint PMF $p_{X,Y}(x, y) = P(X = x, Y = y)$
Continuous case: Joint PDF $f_{X,Y}(x, y)$ where $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\, dx\, dy$
Key property: Must sum/integrate to 1: $\sum_x \sum_y p_{X,Y}(x, y) = 1$ or $\iint f_{X,Y}(x, y)\, dx\, dy = 1$
Marginal Distributions
The marginal distribution is the distribution of one variable regardless of the other(s).
Discrete case (marginal PMF): $p_X(x) = \sum_y p_{X,Y}(x, y)$
Continuous case (marginal PDF): $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$
Intuition: Sum/integrate out the variable(s) you don't care about. This is called "marginalization."
Memory trick: Think of a contingency table—marginals are the row/column totals.
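In code, marginalization is literally taking row and column totals. A minimal sketch with a made-up 2×2 joint PMF:

```python
# Hypothetical joint PMF stored as a nested dict: joint[x][y] = P(X=x, Y=y).
joint = {
    0: {0: 0.10, 1: 0.20},
    1: {0: 0.30, 1: 0.40},
}

# Marginals: sum out the variable you don't care about.
p_X = {x: sum(row.values()) for x, row in joint.items()}      # row totals
p_Y = {y: sum(joint[x][y] for x in joint) for y in joint[0]}  # column totals

print(p_X, p_Y)  # X is ~{0: 0.3, 1: 0.7}; Y is ~{0: 0.4, 1: 0.6}
```

Both marginals sum to 1, as any valid distribution must.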
Conditional Distributions
The conditional distribution gives the distribution of one variable given a specific value of another.
Discrete case: $p_{X \mid Y}(x \mid y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$
Continuous case: $f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$, for $f_Y(y) > 0$
Key insight: The conditional distribution is itself a valid probability distribution (sums/integrates to 1).
Essential Identities and Formulas
The Fundamental Relationship
Key Formula:

$$f_{X,Y}(x, y) = f_{X \mid Y}(x \mid y)\, f_Y(y) = f_{Y \mid X}(y \mid x)\, f_X(x)$$
This is the most important formula—it connects all three types of distributions.
Bayes' Theorem (Distribution Form)
For continuous variables:

$$f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)}$$

Extended form with law of total probability:

$$f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{\int f_{Y \mid X}(y \mid x')\, f_X(x')\, dx'}$$
Law of Total Probability (Continuous)

$$f_Y(y) = \int f_{Y \mid X}(y \mid x)\, f_X(x)\, dx$$

Discrete version: $P(Y = y) = \sum_x P(Y = y \mid X = x)\, P(X = x)$
When to use: Computing marginals from conditionals, or finding normalizing constants.
Independence
X and Y are independent if and only if:

$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y$$

Equivalently:
- $f_{X \mid Y}(x \mid y) = f_X(x)$ (conditioning doesn't matter)
- The joint "factorizes" into the product of marginals
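A factorization check can be written directly from this definition. A sketch: `is_independent` and both example tables below are made up for illustration.

```python
def is_independent(joint, tol=1e-9):
    """joint maps (x, y) -> P(X=x, Y=y). True iff the joint equals the
    product of its marginals at every point (within tol)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in xs}
    p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in ys}
    return all(abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
               for x in xs for y in ys)

# Independent example: joint built as an explicit product of marginals.
indep = {(x, y): px * py
         for x, px in [(0, 0.3), (1, 0.7)]
         for y, py in [(0, 0.4), (1, 0.6)]}

# Dependent example: all mass on the diagonal (knowing X pins down Y).
dep = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

print(is_independent(indep), is_independent(dep))  # True False
```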
Marginal Distribution Tricks and Properties
Trick 1: Sum/Integrate Out Variables
Problem: Given joint distribution, find marginal.
Solution: Sum (discrete) or integrate (continuous) over all values of the other variable(s).
Example: For a joint PMF table, each marginal is the corresponding row/column sum.
Trick 2: Reverse via Conditional × Marginal
Problem: Given conditionals and one marginal, find joint or other marginal.
Solution: Use $f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x)\, f_X(x)$, then marginalize.
Example:
- Given: $P(X = x)$ and $P(Y = y \mid X = x)$
- Find: $P(Y = y)$
- Method: $P(Y = y) = \sum_x P(Y = y \mid X = x)\, P(X = x)$ (law of total probability)
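Numerically this is one line. The fair/biased coin setup below is a hypothetical example: pick a coin (X), then flip it (Y).

```python
# Hypothetical two-stage experiment: pick a coin (X), then flip it (Y).
p_X = {"fair": 0.5, "biased": 0.5}
p_Y_given_X = {
    "fair":   {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}

# Law of total probability: P(Y=y) = sum over x of P(Y=y | X=x) P(X=x)
p_Y = {y: sum(p_Y_given_X[x][y] * p_X[x] for x in p_X) for y in ("H", "T")}

print(p_Y["H"])  # 0.5*0.5 + 0.9*0.5 = 0.7
```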
Trick 3: Marginal of a Function
If $Z = g(X, Y)$, finding the marginal distribution of Z:
Discrete: $P(Z = z) = \sum_{(x, y):\, g(x, y) = z} p_{X,Y}(x, y)$
Continuous: Use transformation techniques or integration over level sets.
Trick 4: Symmetry in Marginals
If the joint distribution is symmetric in X and Y, then:

$$f_X(t) = f_Y(t) \quad \text{for all } t$$

Example: If $(X, Y)$ is uniformly distributed on a circle, both marginals are identical.
Trick 5: Marginal Expectations
Shortcut: You can compute $E[X]$ from the joint distribution without explicitly finding the marginal first: $E[X] = \sum_x \sum_y x\, p_{X,Y}(x, y)$.

$E[X] = E[E[X \mid Y]]$ (law of iterated expectations)
Trick 6: Indicator Functions for Marginals
For complex regions in the joint distribution:

$$f_X(x) = \int f_{X,Y}(x, y)\, \mathbf{1}\{(x, y) \in S\}\, dy$$

Use case: When the joint has support on a restricted region $S$ (e.g., $0 < x < y < 1$).
Conditional Distribution Tricks and Properties
Trick 7: Conditional Independence
X and Y are conditionally independent given Z if:

$$f_{X,Y \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z)\, f_{Y \mid Z}(y \mid z)$$

Equivalently: $f_{X \mid Y, Z}(x \mid y, z) = f_{X \mid Z}(x \mid z)$
Intuition: Given Z, knowing Y doesn't give additional information about X.
Common in: Bayesian networks, hierarchical models, Markov chains.
Trick 8: Conditional Expectations
Properties:
- $E[X] = E[E[X \mid Y]]$ (law of iterated expectations / tower property)
- $\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$ (law of total variance)
Interview gold: These properties solve many complex expectation problems.
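Both laws are easy to verify by simulation. A sketch with an assumed hierarchy: roll a die to get $Y$, then draw $X$ uniformly from $\{1, \dots, Y\}$.

```python
import random

random.seed(0)

# Y ~ Uniform{1..6}, then X | Y ~ Uniform{1..Y} (illustrative hierarchy).
xs = []
for _ in range(200_000):
    y = random.randint(1, 6)
    xs.append(random.randint(1, y))

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

# Tower property:  E[X] = E[E[X|Y]] = E[(Y+1)/2] = (3.5 + 1)/2 = 2.25
# Total variance:  E[Var(X|Y)] + Var(E[X|Y]) = 85/72 + 35/48 ≈ 1.91
print(mean, var)
```

Here $E[X \mid Y] = (Y+1)/2$ and $\mathrm{Var}(X \mid Y) = (Y^2 - 1)/12$ (discrete uniform), so both theoretical values come straight from the two laws without ever computing the marginal of X.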
Trick 9: Conditioning Reduces Variance

$$E[\mathrm{Var}(X \mid Y)] \le \mathrm{Var}(X)$$

Intuition: Additional information (conditioning) reduces uncertainty, on average.
Trick 10: Bayes' Update Pattern
In Bayesian statistics:
- Prior: $P(\theta)$ or $f(\theta)$
- Likelihood: $P(D \mid \theta)$ or $f(x \mid \theta)$
- Posterior: $P(\theta \mid D)$ or $f(\theta \mid x)$
Trick: Often you can ignore the normalizing constant (marginal) until the end: $f(\theta \mid x) \propto f(x \mid \theta)\, f(\theta)$
Find the constant by ensuring the posterior integrates to 1.
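A minimal sketch of this pattern on a grid, assuming a coin-bias problem with 7 heads in 10 flips (grid, prior, and data all chosen for illustration):

```python
# Grid of candidate biases theta = P(heads); uniform prior.
thetas = [i / 10 for i in range(11)]
prior = [1 / len(thetas)] * len(thetas)

heads, n = 7, 10  # assumed observed data

# Unnormalized posterior: likelihood * prior.
# (The binomial coefficient is constant in theta, so it can be dropped.)
unnorm = [th ** heads * (1 - th) ** (n - heads) * p
          for th, p in zip(thetas, prior)]

# Normalize only at the end so the posterior sums to 1.
Z = sum(unnorm)
posterior = [u / Z for u in unnorm]

mode = thetas[max(range(len(thetas)), key=lambda i: posterior[i])]
print(mode)  # 0.7
```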
Common Joint Distribution Patterns
Pattern 1: Uniform on a Region
If $(X, Y)$ is uniform on region $R$ with area $|R|$: $f_{X,Y}(x, y) = \frac{1}{|R|}$ for $(x, y) \in R$.
Marginal strategy: $f_X(x) = \frac{\text{length of the slice of } R \text{ at } x}{|R|}$
Pattern 2: Product Form (Independence)
If the joint factors as $f_{X,Y}(x, y) = g(x)\, h(y)$ on a product support:
Immediate insight: X and Y are independent!
Marginals: $f_X(x) \propto g(x)$ and $f_Y(y) \propto h(y)$.
Normalize if needed.
Pattern 3: Conditional Structure

$$f_{X,Y}(x, y) = f_X(x)\, f_{Y \mid X}(y \mid x)$$

Use when: One variable clearly "comes first" (hierarchical structure).
Example: $X \sim f_X$ first, then $Y \mid X = x \sim f_{Y \mid X}(\cdot \mid x)$ (e.g., $N \sim \text{Poisson}(\lambda)$, then $Y \mid N \sim \text{Binomial}(N, p)$)
Pattern 4: Mixture Distributions

$$f_X(x) = \sum_{i=1}^{k} w_i\, f_i(x)$$

where $w_i$ are mixing weights with $\sum_{i=1}^{k} w_i = 1$.
Interpretation: X comes from one of k distributions, from $f_i$ with probability $w_i$.
Connection to marginals: This IS the law of total probability in distribution form.
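Sampling from a mixture makes the interpretation concrete: first pick the component, then draw from it. The weights and components below are made up for illustration.

```python
import random

random.seed(1)

# Hypothetical k=2 mixture: two Gaussian components with weights (0.3, 0.7).
weights = [0.3, 0.7]
components = [(-2.0, 1.0), (3.0, 1.0)]  # (mean, std) for each f_i

def sample_mixture():
    # Step 1: pick a component i with probability w_i (the latent label).
    i = random.choices(range(len(weights)), weights=weights)[0]
    # Step 2: draw from that component's distribution.
    mu, sigma = components[i]
    return random.gauss(mu, sigma)

draws = [sample_mixture() for _ in range(100_000)]
mean = sum(draws) / len(draws)

# Law of total expectation: E[X] = 0.3*(-2) + 0.7*3 = 1.5
print(mean)
```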
Quick Reference: Essential Equalities
Converting Between Distributions
- Joint → Marginal: $f_X(x) = \int f_{X,Y}(x, y)\, dy$
- Joint → Conditional: $f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$
- Conditional + Marginal → Joint: $f_{X,Y}(x, y) = f_{X \mid Y}(x \mid y)\, f_Y(y)$
- Conditional + Marginal → Other Marginal: $f_Y(y) = \int f_{Y \mid X}(y \mid x)\, f_X(x)\, dx$
- Reverse Conditional (Bayes): $f_{X \mid Y}(x \mid y) = \frac{f_{Y \mid X}(y \mid x)\, f_X(x)}{f_Y(y)}$
Expectation Identities
- Marginal expectation from joint: $E[X] = \sum_x \sum_y x\, p_{X,Y}(x, y)$
- Law of iterated expectations: $E[X] = E[E[X \mid Y]]$
- Conditional expectation as function: $E[X \mid Y]$ is a random variable, a function of $Y$
- Law of total variance: $\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$
- Conditional variance: $\mathrm{Var}(X \mid Y) = E[X^2 \mid Y] - (E[X \mid Y])^2$
Problem-Solving Strategies
Strategy 1: Draw the Dependency Graph
Identify which variables depend on which:
- Arrows show conditional dependence
- Helps identify which conditionals you know
- Guides application of chain rule
Strategy 2: Fill in a Contingency Table
For discrete distributions with few values:
- Create table with rows/columns for each variable's values
- Fill in what you know (joint, marginal, or conditional)
- Use row/column sums for marginals
- Use division for conditionals
Strategy 3: Identify the Question Type
- Have joint, want marginal: Integrate/sum out
- Have conditionals + one marginal, want joint: Multiply
- Have conditionals + one marginal, want other marginal: Law of total probability
- Want to reverse conditioning: Bayes' theorem
Strategy 4: Look for Factorization
Can you write $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$?
- If yes: Independent! Marginals are easy.
- If no: Need full marginalization.
Strategy 5: Use Symmetry
- Is the joint symmetric in the variables?
- Are there symmetries in the support region?
- Can you swap variables to make the problem easier?
Interview Problem Types
Type 1: Contingency Table Problems
| Given | Find | Approach |
|---|---|---|
| Joint distribution in table form | Marginals, conditionals, independence check | Row/column sums for marginals, division for conditionals |
Type 2: Uniform on Region
| Given | Find | Approach |
|---|---|---|
| $(X, Y)$ uniform on a geometric region | Marginal distribution of X or Y | "Slice" the region, find length/area of slices |
Type 3: Hierarchical Models
| Given | Find | Approach |
|---|---|---|
| $X \sim f_X$, then $Y \mid X \sim f_{Y \mid X}$ | Marginal of Y or joint | Use $f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x)\, f_X(x)$, then marginalize if needed |
Type 4: Order Statistics
| Given | Find | Approach |
|---|---|---|
| $X_1, \dots, X_n$ i.i.d., let $X_{(1)} \le \dots \le X_{(n)}$ be order statistics | Distribution of $X_{(k)}$ or joint of $(X_{(i)}, X_{(j)})$ | Use joint density of order statistics, marginalize carefully |
Type 5: Transformation of Variables
| Given | Find | Approach |
|---|---|---|
| Joint distribution of $(X, Y)$, transformation $(U, V) = g(X, Y)$ | Joint distribution of $(U, V)$ or marginal of $U$ | Jacobian transformation, then marginalize if needed |
Type 6: Bayesian Inference
| Given | Find | Approach |
|---|---|---|
| Prior $f(\theta)$, likelihood $f(x \mid \theta)$, observed data $x$ | Posterior $f(\theta \mid x)$ | Bayes' theorem; often can ignore the normalizing constant |
Common Pitfalls
Pitfall 1: Confusing Joint and Conditional
Wrong: Using $P(A, B)$ when you mean $P(A \mid B)$, or vice versa
Check: Does the formula involve division by a marginal?
Pitfall 2: Forgetting to Marginalize
Wrong: Treating a joint distribution as a marginal
Check: Do you have extra variables you need to integrate/sum out?
Pitfall 3: Independence Assumption
Wrong: Assuming $f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$ without checking
Check: Does the joint factorize? Does $f_{X \mid Y}(x \mid y) = f_X(x)$?
Pitfall 4: Wrong Order in Conditional
Wrong: Computing $P(A \mid B)$ when you need $P(B \mid A)$
Fix: Use Bayes' theorem to reverse
Pitfall 5: Ignoring Support/Domain
Wrong: Integrating over wrong limits
Check: What values can $(X, Y)$ actually take? Are there constraints (e.g., $x < y$)?
Additional Helpful Tricks
The Complement Trick
Often easier to calculate the probability of "not happening": $P(A) = 1 - P(A^c)$
Example: "At least one success" problems are easier as $P(\text{at least one}) = 1 - P(\text{none})$
Symmetry Arguments
Look for symmetry in problems to simplify calculations.
Example: In a random shuffle, any specific position is equally likely to contain any specific card.
Conditioning on the First Step
For sequential processes, condition on what happens first: $P(A) = P(A \mid F)\, P(F) + P(A \mid F^c)\, P(F^c)$, where $F$ is the first-step event.
The "False Positive" Framework
For diagnostic/testing problems:
- Sensitivity: $P(+ \mid \text{disease})$
- Specificity: $P(- \mid \text{no disease})$
- Goal: Usually find $P(\text{disease} \mid +)$ using Bayes' Theorem
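This framework is a few lines of code. A sketch: `posterior_disease` and the numbers below are illustrative.

```python
def posterior_disease(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_d = sensitivity             # P(+ | disease)
    p_pos_given_no_d = 1 - specificity      # P(+ | no disease) = false positive rate
    # Law of total probability for the marginal P(+):
    p_pos = p_pos_given_d * prevalence + p_pos_given_no_d * (1 - prevalence)
    return p_pos_given_d * prevalence / p_pos

# Rare disease, accurate test (illustrative numbers):
print(posterior_disease(prevalence=0.001, sensitivity=0.99, specificity=0.95))
# ≈ 0.019: still under 2%, because the base rate dominates.
```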
Probability Trees
Draw a tree for sequential events:
- Each branch represents a conditional probability
- Multiply along branches for joint probabilities
- Add across branches for total probabilities
Common Mistakes to Avoid
The Prosecutor's Fallacy
Wrong: Confusing $P(A \mid B)$ with $P(B \mid A)$
Example: $P(\text{evidence} \mid \text{innocent})$ being small does NOT mean $P(\text{innocent} \mid \text{evidence})$ is small
Assuming Independence
Wrong: Assuming events are independent when they're not
Check: Does knowing B occurred change the probability of A? If yes, they're dependent.
Ignoring Base Rates
Wrong: Neglecting the prior $P(A)$ (the base rate) when using Bayes' Theorem
Example: Rare diseases remain unlikely even with positive tests if the base rate is very low.
Quick Reference: Chain Rule
Chain Rule: $P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1})$
Partition Formula: For any events A and B: $P(A) = P(A \mid B)\, P(B) + P(A \mid B^c)\, P(B^c)$
Conditional Complement: $P(A^c \mid B) = 1 - P(A \mid B)$
Practice Problem Categories
- Balls and urns
- Disease testing
- Game shows (Monty Hall)
- Card/dice problems
- Random walks
- Matching problems
- Uniform on triangular/circular regions
- Hierarchical/mixture models