Sum of uniform random variables until sum is greater than one

Category: math

Author: R package build

Published: November 10, 2022

Introduction

I saw this problem on r/probabilitytheory:

If you pick a uniformly random real number on [0,1] and repeat this until the sum of numbers picked is greater than 1, you’ll on average pick \(e \approx 2.718\) numbers

Posed more formally, so that we can actually begin to solve it:

Sample independent uniform random variables \(U_1, U_2, \ldots\) and let \(S_n = \sum_{i = 1}^n U_i\). Let \(N\) be the first integer \(n\) such that \(S_n > 1\). Find \(E[N]\).

Bonus question: What is \(E[S_N]\), the value of the sum when we stop? Can we find its distribution?

Of course, the first step is to replicate the simple simulation.

Simulation

# Sample independent uniform U(0,1) draws and stop as soon as the running sum
# exceeds 1. Let N be the number of draws needed. Then E[N] = e.

sim_sum <- function() {
  s_n <- 0  # running sum S_n
  n <- 0    # number of draws so far
  while (s_n <= 1) {
    s_n <- s_n + runif(1)
    n <- n + 1
  }
  c(n = n, s_n = s_n)
}


reps <- 1e4
N_sim <- data.frame(t(replicate(reps, sim_sum())))
N_sim_table <- table(N_sim$n)
N_sim_prob <- N_sim_table / reps

K <- as.numeric(names(N_sim_table))

mean_N <- mean(N_sim$n)
plot(K, N_sim_prob, type = 'b', ylim = c(0, 1), ylab = 'P(N = n)', xlab = 'N')
abline(v = mean_N, lty = 2)
text(x = mean_N + 1, y = 0.75, sprintf('Mean(n) = %.03f', mean_N))
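The same simulation also answers the bonus question empirically: the `s_n` column records the stopped sum \(S_N\), so its mean estimates \(E[S_N]\). A minimal sketch, rebuilding the simulation so the block is self-contained:

```r
# Estimate E[S_N], the value of the running sum when it first exceeds 1.
set.seed(1)
sim_sum <- function() {
  s_n <- 0
  n <- 0
  while (s_n <= 1) {
    s_n <- s_n + runif(1)
    n <- n + 1
  }
  c(n = n, s_n = s_n)
}
sims <- data.frame(t(replicate(1e4, sim_sum())))
mean(sims$s_n)  # Monte Carlo estimate of E[S_N]
```

The estimate lands near 1.36, consistent with Wald's identity \(E[S_N] = E[N]\,E[U_1] = e/2 \approx 1.359\).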

Proof

The idea of the proof is to directly solve:

\[ E[N] = \sum_{n = 1}^\infty n \cdot P(N = n) \]

So we need to calculate \(P(N = n)\). This is the same as \(P(S_{n - 1} \leq 1 \text{ and } S_n > 1)\). If we can calculate \(P(S_n \leq x) = F_{S_n}(x)\), we can complete the proof.

\(P(S_n \leq t)\)

Using the proof from this answer on math.stackexchange,

When \(t \in [0,1]\), the claim is that \(P(S_n \leq t) = \frac{t^n}{n!}\). For \(n = 1\), \(P(S_1 \leq t) = P(U_1 \leq t) = t\). Assume the claim holds for \(n\). Then,

\[ \begin{aligned} P(S_{n + 1} \leq t) &= P(S_n + U_{n + 1} \leq t)\\ &= \int_0^1 P(S_n + u \leq t) \underbrace{f(u)}_{1} ~du && S_n \text{ and } U_{n + 1} \text{ independent}\\ &= \int_0^1 P(S_n \leq t - u) ~du\\ &= \int_0^t \underbrace{P(S_n \leq t - u)}_{\text{Induction Hypothesis}} ~du && \text{Since } P(S_n \leq t - u) = 0 \text{ when } u \in [t, 1]\\ &= \int_0^t \frac{(t - u)^n}{n!} ~du \\ &= \frac{1}{n!} \left( \frac{-1}{n + 1} (t - u)^{n + 1}\Big|^{u = t}_{u = 0} \right)\\ &= \frac{t^{n + 1}}{(n + 1)!} \end{aligned} \]
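The induction result can be checked numerically: for a fixed \(n\) and \(t \in [0, 1]\), simulate many copies of \(S_n\) and compare the empirical proportion below \(t\) with \(t^n / n!\). A quick sketch (the choices of \(n = 3\) and \(t = 0.8\) are arbitrary):

```r
# Monte Carlo check of P(S_n <= t) = t^n / n! for t in [0, 1].
set.seed(1)
reps <- 1e5
n <- 3
t <- 0.8
# Each row of the matrix is one realization of (U_1, ..., U_n).
S_n <- rowSums(matrix(runif(reps * n), nrow = reps))
c(simulated = mean(S_n <= t), analytic = t^n / factorial(n))
```

The two numbers agree to within Monte Carlo error.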

PDF of \(S_n\)

Since \(F_{S_n}(x) = \frac{x^n}{n!}\) for \(x \in [0,1]\), we have \[ \begin{aligned} f_{S_n}(x) &= \frac{dF_{S_n}}{dx}\\ &= n \frac{x^{n - 1}}{n!} = \frac{x^{n - 1}}{(n - 1)!} \end{aligned} \]
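This density can also be sanity-checked by simulation. Restricted to \([0, 1]\), \(S_n\) given \(S_n \leq 1\) has conditional density \(f_{S_n}(x) / P(S_n \leq 1) = n x^{n - 1}\), which we can lay over a histogram of the simulated values that landed in the unit interval:

```r
# Compare the conditional density of S_n given S_n <= 1, which is n x^(n-1),
# against simulated draws of S_n restricted to [0, 1].
set.seed(1)
n <- 3
S_n <- rowSums(matrix(runif(1e5 * n), nrow = 1e5))
in_unit <- S_n[S_n <= 1]
hist(in_unit, freq = FALSE, breaks = 30,
     xlab = 'x', main = 'S_3 given S_3 <= 1')
curve(n * x^(n - 1), add = TRUE, col = 'red')  # f(x) / P(S_n <= 1)
```

As a side check, the fraction of draws in \([0, 1]\) should be close to \(P(S_n \leq 1) = 1/n!\).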

\(P(N = n)\)

\[ \begin{aligned} P(N = n) &= P(S_{n - 1} \leq 1 \text{ and } S_n > 1)\\ &= \int_0^1 P(U_n + x > 1 \mid S_{n - 1} = x)~f_{S_{n - 1}}(x)~dx && \text{condition on } S_{n - 1} = x\\ &= \int_0^1 P(U_n > 1 - x)~f_{S_{n - 1}}(x)~dx && U_n \text{ and } S_{n - 1} \text{ independent}\\ &= \int_0^1 (1 - F_{U_n}(1 - x))~f_{S_{n - 1}}(x)~dx\\ &= \int_0^1 x \,(n - 1)\frac{x^{n - 2}}{(n - 1)!}~dx && 1 - F_{U_n}(1 - x) = x\\ &= \frac{n - 1}{(n - 1)!} \int_0^1 x^{n - 1}~dx\\ &= \frac{n - 1}{n (n - 1)!} = \frac{n - 1}{n!} \end{aligned} \]

To double-check this formula against the simulation:

# The formula we computed analytically: P(N = n) = (n - 1) / n!
K <- as.numeric(names(N_sim_table))
expected_probs <- (K - 1) / factorial(K)

plot(K, N_sim_prob, type = 'b', xlab = 'n', ylab = 'Prob(N = n)', ylim = c(0, 1), yaxp = c(0, 1, 4))
lines(K, expected_probs, type = 'b', col = 'red')

E[N] = e

Since a single uniform draw can never exceed 1, \(P(N = 1) = 0\) and the sum starts at \(n = 2\):

\[ \begin{aligned} E[N] &= \sum_{n = 2}^\infty n \cdot P(N = n)\\ &= \sum_{n = 2}^\infty \frac{n (n - 1)}{n!}\\ &= \sum_{n = 2}^\infty \frac{1}{(n - 2)!}\\ &= \sum_{n = 0}^\infty \frac{1}{n!}\\ &= e && \text{since } \sum_{n = 0}^\infty \frac{x^n}{n!} = e^x \text{ at } x = 1 \end{aligned} \]
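As a final numerical check, the partial sums of \(\sum_{n \geq 2} n(n - 1)/n!\) converge to \(e\) within a handful of terms:

```r
# Partial sums of sum_{n >= 2} n (n - 1) / n!, which the derivation above
# shows equals e.
n <- 2:12
terms <- n * (n - 1) / factorial(n)
cumsum(terms)        # approaches exp(1)
sum(terms) - exp(1)  # remaining error after 11 terms
```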