Using conjugate Beta-Binomial models and Monte Carlo simulation to determine whether a redesigned checkout page increases purchase conversions, with a full posterior analysis and a decision-theoretic framework.
Traditional frequentist A/B testing relies on p-values and fixed sample sizes, often leading to early peeking problems and difficulty interpreting results in business terms. This project applies Bayesian inference to an e-commerce A/B test comparing two checkout page designs, producing direct probability statements about which variant performs better and by how much.
Working with simulated data modeled after realistic e-commerce conversion rates, I built a full Bayesian pipeline: specifying a weakly informative prior from historical data, computing posterior distributions analytically via conjugacy, running Monte Carlo simulations to estimate the probability that the new design wins, and translating the result into expected revenue lift to inform the business decision.
An e-commerce company redesigned its checkout page (Variant B) and wants to know whether it improves purchase conversion rate over the original (Variant A). Over a 14-day test period, traffic was randomly split between the two variants:
| Variant | Visitors | Conversions | Observed Rate |
|---|---|---|---|
| A (Control) | 4,821 | 362 | 7.51% |
| B (Redesign) | 4,756 | 408 | 8.58% |
The observed lift is approximately 1.07 percentage points (a 14.2% relative increase). But is this difference real, or could it be due to random variation? Rather than computing a p-value, we want to answer the question directly: What is the probability that Variant B is truly better than Variant A?
Each visitor either converts or doesn't, so the data follows a Binomial distribution. For the conversion rate parameter θ, we use a Beta prior — the conjugate prior for binomial data, which gives us a closed-form posterior.
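Concretely, if θ ~ Beta(α, β) and we observe x conversions among n visitors, the posterior is θ | data ~ Beta(α + x, β + n − x): the update is nothing more than adding observed counts to the prior's parameters.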
Using historical data from the previous quarter (conversion rate around 7.2% ± 1.5%), I calibrated a weakly informative prior of Beta(7, 90). This encodes our prior belief while letting the data dominate — the effective prior sample size is only 97 versus thousands of real observations.
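As a quick sanity check (a minimal sketch, using the same scipy.stats API as the main analysis below), we can verify the moments implied by Beta(7, 90). Note that the implied standard deviation, about 2.6 percentage points, is deliberately wider than the historical ±1.5%: that extra width is what keeps the prior weakly informative.

```python
from scipy import stats

# Moments implied by the Beta(7, 90) prior
prior = stats.beta(7, 90)
print(f"Prior mean: {prior.mean():.4f}")  # ~0.0722, matching the historical 7.2%
print(f"Prior std:  {prior.std():.4f}")   # ~0.0261, wider than the historical ±1.5%
```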
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Data
n_a, x_a = 4821, 362 # Control
n_b, x_b = 4756, 408 # Redesign
# Weakly informative prior: Beta(7, 90)
alpha_prior, beta_prior = 7, 90
# Posterior parameters (conjugate update)
alpha_a = alpha_prior + x_a # 369
beta_a = beta_prior + n_a - x_a # 4549
alpha_b = alpha_prior + x_b # 415
beta_b = beta_prior + n_b - x_b # 4438
# Posterior distributions
posterior_a = stats.beta(alpha_a, beta_a)
posterior_b = stats.beta(alpha_b, beta_b)
print(f"Posterior A: Beta({alpha_a}, {beta_a})")
print(f" Mean: {posterior_a.mean():.4f}, 95% CI: [{posterior_a.ppf(0.025):.4f}, {posterior_a.ppf(0.975):.4f}]")
print(f"Posterior B: Beta({alpha_b}, {beta_b})")
print(f" Mean: {posterior_b.mean():.4f}, 95% CI: [{posterior_b.ppf(0.025):.4f}, {posterior_b.ppf(0.975):.4f}]")
To compute the probability that B is better than A, I drew 500,000 samples from each posterior and compared them elementwise. This Monte Carlo approach also gives us the full distribution of the difference (θ_B − θ_A), which is far more informative than a single point estimate.
```python
# Monte Carlo comparison
n_simulations = 500_000
np.random.seed(42)
samples_a = np.random.beta(alpha_a, beta_a, size=n_simulations)
samples_b = np.random.beta(alpha_b, beta_b, size=n_simulations)
# Probability B > A
prob_b_wins = (samples_b > samples_a).mean()
print(f"P(θ_B > θ_A) = {prob_b_wins:.4f}")
# Distribution of the lift
lift = samples_b - samples_a
print(f"Expected lift: {lift.mean()*100:.2f} pp")
print(f"95% CI of lift: [{np.percentile(lift, 2.5)*100:.2f}, {np.percentile(lift, 97.5)*100:.2f}] pp")
print(f"P(lift > 0.5 pp) = {(lift > 0.005).mean():.3f}")
Posterior distributions of conversion rates. The separation between the two densities visually confirms the high probability that B outperforms A.
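For reference, a minimal sketch of how a figure like this can be produced, reusing posterior_a, posterior_b, and the matplotlib import from the first code block:

```python
# Plot both posterior densities over the region where they carry mass
theta = np.linspace(0.06, 0.10, 500)
plt.plot(theta, posterior_a.pdf(theta), label="A (control)")
plt.plot(theta, posterior_b.pdf(theta), label="B (redesign)")
plt.xlabel("Conversion rate θ")
plt.ylabel("Posterior density")
plt.legend()
plt.show()
```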
To translate the statistical result into a business decision, I computed the expected revenue impact. With an average order value of $67.50 and approximately 10,000 weekly visitors to the checkout page:
```python
# Decision-theoretic analysis
avg_order_value = 67.50
weekly_visitors = 10_000
revenue_a = samples_a * avg_order_value * weekly_visitors
revenue_b = samples_b * avg_order_value * weekly_visitors
weekly_gain = revenue_b - revenue_a
print(f"Expected weekly revenue gain: ${weekly_gain.mean():,.0f}")
print(f"Annual projected gain: ${weekly_gain.mean() * 52:,.0f}")
print(f"P(annual gain > $20,000) = {(weekly_gain * 52 > 20000).mean():.1%}")
Recommendation: Deploy Variant B. There is a 97.2% posterior probability that the redesigned checkout page outperforms the control, with an expected annual revenue gain of roughly $368,000 under the traffic and order-value assumptions above. The 95% credible interval for the lift, roughly −0.03 to +2.13 percentage points, only just brushes zero at its lower end (consistent with the 97.2% win probability), and the probability that the annual gain exceeds $20,000 is about 96%. The Bayesian framework gave us direct, interpretable probability statements, with no p-value gymnastics required.
This project demonstrates how Bayesian A/B testing provides richer, more decision-relevant output than traditional hypothesis testing. Instead of a binary "significant or not" answer, we obtained a full probability distribution over the effect size, enabling risk-aware decision making. The conjugate Beta-Binomial model made computation trivial while the Monte Carlo simulation extended the analysis to derived quantities like revenue impact.