1. Introduction
- Bootstrap methods are a class of nonparametric Monte Carlo methods that estimate the sampling distribution of a statistic by resampling from the observed data.
- Resampling methods treat an observed sample as a finite population, and random samples are generated (resampled) from it to estimate population characteristics and make inferences about the sampled population.
2. Understanding by Example
Let $X_1, X_2, \dots, X_{100} \overset{ i.i.d }{\sim} N(0,1)$.
Then, we can easily deduce that $\bar{X} = \frac{X_1 + X_2 + \cdots + X_{100}}{100} \sim N(0, \frac{1}{100})$.
But what should we do if we want to know the distribution of a more complex statistic of the random sample?
Or, if we do not know the distribution of the population, how can we make inferences?
For example, suppose we want to know the distribution of $Y = \frac{ X_{(n)} }{X_{(1)} + X_{(n)}}$, where $X_{(1)}$ is the minimum value and $X_{(n)}$ is the maximum value among $X_1, \cdots, X_{100}$.
In this situation, it is difficult to derive the distribution of $Y$, because the statistic is complicated and we do not know its distribution in closed form.
However, we can easily compute a realized value of $Y = \frac{ X_{(n)} }{X_{(1)} + X_{(n)}}$:
just sort the random sample $X_1, X_2, \dots, X_{100}$ in ascending order, and plug $X_{(1)}$ and $X_{(n)}$ into $Y = \frac{ X_{(n)} }{X_{(1)} + X_{(n)}}$.
However, this is just a single number, i.e. a point estimate. Our goal is to know the distribution of $Y$ over repeated samples of size 100.
The bootstrap provides a solution to this problem.
The following is the bootstrap algorithm.
3. Algorithm
Suppose $Y$ is the parameter of interest and $\hat{Y}$ is an estimator of $Y$.
Then the bootstrap estimate of the distribution of $\hat{Y}$ is obtained as follows.
- Repeat for $b = 1,\dots,B$:
  - Generate sample $x^{*(b)} = (x_1^{*}, x_2^{*}, \dots, x_n^{*} )$ by sampling with replacement from the observed sample $x_1, \dots, x_n$.
  - Compute $\hat{y}^{(b)}$ from the $b$-th sample, $x^{*(b)}$.
- Compute the empirical distribution of $\{\hat{y}^{(b)}: b = 1, \dots, B\}$.
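As a sketch, the algorithm above can be written in a few lines of Python. The observed sample, the number of replicates $B$, and the seed below are illustrative assumptions, not part of the original example; the statistic is the $Y$ defined earlier.

```python
import random

def bootstrap_replicates(data, statistic, B=2000, seed=0):
    """Draw B bootstrap samples (with replacement) from `data` and
    return the statistic computed on each one."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(B):
        # Sample n points with replacement from the observed sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(statistic(resample))
    return reps

# The statistic from the example: Y = X_(n) / (X_(1) + X_(n)).
def y_stat(xs):
    return max(xs) / (min(xs) + max(xs))

# Hypothetical observed sample standing in for 100 N(0,1) draws.
rng = random.Random(42)
observed = [rng.gauss(0, 1) for _ in range(100)]
reps = bootstrap_replicates(observed, y_stat)  # empirical distribution of Y
```

The list `reps` is the bootstrap approximation to the distribution of $Y$; a histogram of it is exactly the kind of plot described below.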
The following histogram presents the bootstrap distribution of $Y$ when we draw 15 observations from the law school data in the R package bootstrap.

In this way, we obtain the distribution of $Y$ empirically.
So, if we want to know $E(Y)$ and $Var(Y)$, we simply compute the sample mean and the sample variance of $\hat{y}^{(1)}, \dots, \hat{y}^{(B)}$.
4. Theoretical Background of Bootstrap Method
The bootstrap seems very simple and useful, but some care is needed: why is it possible to infer the distribution of a statistic by bootstrapping at all?
In the example above, our goal was to obtain the distribution of $Y = \frac{ X_{(n)} }{X_{(1)} + X_{(n)}}$.
For now, suppose we aim to get $Var(Y)$ from the random sample $X_1, \dots, X_{100}$.
In reality, $Y$ is generated by the following process.
- First, sample $X_1, \dots, X_{100}$ from the population distribution $F$.
- From the observed random sample, set $Y = \frac{ X_{(n)} }{X_{(1)} + X_{(n)}}$.
$$F \rightarrow \boldsymbol{X} = (X_1, \dots, X_{100}) \rightarrow Y$$
Considering all possible random samples generated by this process is what determines $Var(Y)$.
The bootstrap imitates this procedure. But we do not know the population distribution $F$,
so we cannot sample from it.
Instead, we sample from the empirical distribution $\hat{F}$.
What is the empirical distribution?
Given observations $X_1, \dots, X_n$, the empirical distribution is the discrete probability distribution in which each data point has probability $\frac{1}{n}$.
It is usually presented as a cumulative distribution function:
the empirical distribution function is a step function that increases by $\frac{1}{n}$ at each observation.
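As a minimal sketch, the empirical distribution function can be computed directly from this definition (the sample values below are arbitrary illustrations):

```python
import bisect

def ecdf(data):
    """Empirical distribution function: F_hat(t) = #{x_i <= t} / n,
    a step function that jumps by 1/n at each observation."""
    xs = sorted(data)
    n = len(xs)
    def F_hat(t):
        # bisect_right counts how many sorted observations are <= t
        return bisect.bisect_right(xs, t) / n
    return F_hat

F_hat = ecdf([3, 1, 4, 1, 5])
# F_hat(0) = 0.0, F_hat(1) = 0.4 (two observations equal 1), F_hat(5) = 1.0
```

Note that ties simply stack: a value appearing twice contributes a jump of $\frac{2}{n}$.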
The empirical distribution function $\hat{F}$ converges to $F$ almost surely.
More precisely, for $\hat{F}$ to converge to $F$, the observations must be an i.i.d. random sample from the population distribution; the Glivenko-Cantelli theorem then gives $\sup_x |\hat{F}(x) - F(x)| \rightarrow 0$ almost surely.
Anyway, we want to know $Var(Y)$.
Let $Y = g(\boldsymbol{X}) = g(X_1, X_2, \dots, X_{100})$.
To do this, we have to calculate $$Var(Y) = Var(g(\boldsymbol{X})) = E_{F_1 \cdots F_{100}}[[g(X_1,\dots, X_{100})- E[g(X_1, \dots, X_{100})]]^2] $$
\[
= \int \cdots \int
\left[
g(X_1, X_2, \dots, X_{100})
-
\mathbb{E}\big[g(X_1, X_2, \dots, X_{100})\big]
\right]^2
\, dF_1 \cdots dF_{100}
\]
As mentioned, the bootstrap substitutes $\hat{F}$ for $F$. So the following is the bootstrap estimate of $Var(Y)$.
\[
\widehat{\mathrm{Var}}_{\text{ideal}}(Y)
=
\mathbb{E}_{\widehat{F}_1 \cdots \widehat{F}_{100}}
\left[
\left(
g(X_1^*, X_2^*, \dots, X_{100}^*)
-
\mathbb{E}\big[g(X_1^*, X_2^*, \dots, X_{100}^*)\big]
\right)^2
\right]
\]
\[
= \int \cdots \int
\left(
g(X_1^*, X_2^*, \dots, X_{100}^*)
-
\mathbb{E}\big[g(X_1^*, X_2^*, \dots, X_{100}^*)\big]
\right)^2
\, d\widehat{F}_1 \cdots d\widehat{F}_{100}
\]
But it is not simple to evaluate this integral when $n$ is not small: sampling $n$ points with replacement from $n$ observations gives $n^n$ equally likely ordered samples.
Therefore, we do not compute this ideal bootstrap estimate directly. Instead, we approximate it using Monte Carlo integration.
Monte Carlo integration
Monte Carlo integration is a numerical method that approximates the value of an integral by generating random numbers.
Here is an example.
Suppose we want to compute the integral $F$ of a function $f$ over $D$:
$$F = \int_{D}f(x)dx$$
Suppose $p(x)$ is a probability density with support $D$ from which we can generate random numbers. Using it, the integral above can be rewritten as the expectation below.
$$F = \int_{D} \frac{f(x)}{p(x)} p(x) dx = E_{p}\left[\frac{f(X)}{p(X)}\right]$$
Since the law of large numbers guarantees that the sample mean of a sufficiently large random sample converges to the population mean, the expectation above can be approximated as follows.
\[
F = \mathbb{E}_p \left[ \frac{f(X)}{p(X)} \right]
\approx
\frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}
\]
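For instance, the approximation above can be checked on a simple integral. Here $\int_0^1 e^x\,dx = e - 1$, with $p$ taken as the Uniform(0,1) density so that $p(x) = 1$; this is a sketch, and the sample size $N$ is an arbitrary assumption.

```python
import math
import random

def mc_integral(f, draw, density, N=200_000, seed=0):
    """Approximate F = E_p[f(X)/p(X)] by the sample mean (1/N) * sum f(x_i)/p(x_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        x = draw(rng)              # x_i ~ p
        total += f(x) / density(x)
    return total / N

# Integrate e^x over (0, 1) using p = Uniform(0, 1), i.e. p(x) = 1.
est = mc_integral(math.exp, lambda rng: rng.random(), lambda x: 1.0)
# est is close to e - 1 ≈ 1.71828
```

The error of such an estimate shrinks at rate $O(1/\sqrt{N})$, which is why a moderately large $N$ already gives a usable answer.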
The following is the bootstrap estimate of $Var(Y)$ using Monte Carlo integration. In practice, the inner expectation $\mathbb{E}[g(X^*)]$ is itself estimated by the bootstrap sample mean $\bar{g}^* = \frac{1}{B}\sum_{b=1}^{B} g(X_b^*)$.
\[
\mathbb{E}_{\widehat{F}_1 \cdots \widehat{F}_{100}}
\left[
\left(
g(X^*) - \mathbb{E}[g(X^*)]
\right)^2
\right]
\approx
\frac{1}{B}\sum_{b=1}^{B}
\left(
g(X_b^*) - \bar{g}^*
\right)^2
\]
\[
\text{where }
X_b^* = (X_{b1}^*, X_{b2}^*, \dots, X_{b,100}^*)
\text{ is the } b\text{-th bootstrap sample, }
X_b^* \sim \widehat{F}
\]
\[
\widehat{F} : \text{empirical distribution}
\]
Namely, we generate bootstrap samples by resampling the observed data, compute the squared deviation of each replicate $g(\boldsymbol{X}^{*})$ from the bootstrap mean of the replicates, and take the average.
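Putting the pieces together, the bootstrap estimate of $Var(Y)$ can be sketched as follows. The data, replicate count $B$, and seed are illustrative assumptions; the bootstrap mean of the replicates stands in for the unknown inner expectation.

```python
import random

def bootstrap_variance(data, g, B=2000, seed=1):
    """Bootstrap estimate of Var(g(X)): resample B times, compute g on
    each resample, and take the sample variance of the B replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(B):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(g(resample))
    g_bar = sum(reps) / B                      # estimates E[g(X*)]
    return sum((r - g_bar) ** 2 for r in reps) / (B - 1)

# Sanity check with a statistic whose variance is known: for the sample
# mean, the bootstrap variance should be near the plug-in value
# (1/n) * sum((x_i - x_bar)**2) / n, which is 0.825 for this data.
data = list(range(10))
v = bootstrap_variance(data, lambda xs: sum(xs) / len(xs))
```

Using $B-1$ in the denominator is a minor design choice (the unbiased sample variance of the replicates); dividing by $B$ works equally well for large $B$.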