Skip to contents

Performs bootstrap resampling for Principal Component Analysis (PCA) based on the method described by Fisher et al. (2016), optimized for high-dimensional data (p >> n). This version is specifically adapted to work with the output object generated by the provided pca function (which returns a bi_projector object of class 'pca').

Usage

bootstrap_pca(
  x,
  nboot = 100,
  k = NULL,
  parallel = FALSE,
  cores = NULL,
  seed = NULL,
  epsilon = 1e-15,
  ...
)

Arguments

x

An object of class 'pca' as returned by the provided pca function. It's expected to contain loadings (v), scores (s), singular values (sdev), left singular vectors (u), and pre-processing info (preproc).

nboot

The number of bootstrap resamples to perform. Must be a positive integer (default: 100).

k

The number of principal components to bootstrap (default: all components available in the fitted PCA model x). Must be less than or equal to the number of components in x.

parallel

Logical flag indicating whether to use parallel processing via the future framework (default: FALSE). Requires the future.apply package and a configured future backend (e.g., future::plan(future::multisession)).

cores

The number of cores to use for parallel processing if parallel = TRUE (default: future::availableCores()). This is used if no future plan is set.

seed

An integer value for the random number generator seed for reproducibility (default: NULL, no seed is set).

epsilon

A small positive value added to standard deviations before division to prevent division by zero or instability (default: 1e-15).

...

Additional arguments (currently ignored).

Value

A list object of class bootstrap_pca_result containing:

E_Vb

Matrix (p x k) of the estimated bootstrap means of the principal components (loadings V^b = coefficients).

sd_Vb

Matrix (p x k) of the estimated bootstrap standard deviations of the principal components (loadings V^b).

z_loadings

Matrix (p x k) of the bootstrap Z-scores for the loadings, calculated as E_Vb / sd_Vb.

E_Scores

Matrix (n x k) of the estimated bootstrap means of the principal component scores (S^b).

sd_Scores

Matrix (n x k) of the estimated bootstrap standard deviations of the principal component scores (S^b).

z_scores

Matrix (n x k) of the bootstrap Z-scores for the scores, calculated as E_Scores / sd_Scores.

E_Ab

Matrix (k x k) of the estimated bootstrap means of the internal rotation matrices A^b.

Ab_array

Array (k x k x nboot) containing all the bootstrap rotation matrices A^b.

Scores_array

Array (n x k x nboot) containing all the bootstrap score matrices (S^b, with NAs for non-sampled subjects).

nboot

The number of bootstrap samples used (successful ones).

k

The number of components bootstrapped.

call

The matched call to the function.

Details

This function implements the fast bootstrap PCA algorithm proposed by Fisher et al. (2016), adapted for the output structure of the provided pca function. The pca function returns an object containing:

  • v: Loadings (coefficients, p x k) - equivalent to V in SVD Y = U D V'. Note the transpose difference from prcomp.

  • s: Scores (n x k) - calculated as U %*% D.

  • sdev: Singular values (vector of length k) - equivalent to d.

  • u: Left singular vectors (n x k).

The bootstrap algorithm works by resampling the subjects (rows) and recomputing the SVD on a low-dimensional representation. Specifically, it computes the SVD of the resampled matrix D U' P^b, where Y = U D V' is the SVD of the original (pre-processed) data, and P^b is a resampling matrix operating on the subjects (columns of U').

The SVD of the resampled low-dimensional matrix is svd(D U' P^b) = A^b S^b (R^b)'. The bootstrap principal components (loadings) are then calculated as V^b = V A^b, and the bootstrap scores are Scores^b = R^b S^b.

Z-scores are provided as mean / sd.

Important Note: The algorithm assumes the data Y used for the original SVD (Y = U D V') was appropriately centered (or pre-processed according to x$preproc). The bootstrap samples are generated based on the components derived from this pre-processed data.

References

Fisher, Aaron, Brian Caffo, Brian Schwartz, and Vadim Zipunnikov. 2016. "Fast, Exact Bootstrap Principal Component Analysis for P > 1 Million." Journal of the American Statistical Association 111 (514): 846–60. doi:10.1080/01621459.2015.1062383 .

Examples

# Simulate data (p=50, n=20)
set.seed(123)
p_dim <- 50
n_obs <- 20
Y_mat <- matrix(rnorm(p_dim * n_obs), nrow = p_dim, ncol = n_obs)
# Transpose for pca function input (n x p)
X_mat <- t(Y_mat)

# Perform PCA using the provided pca function
# Use center() pre-processing
pca_res <- pca(X_mat, ncomp = 5, preproc = center(), method = "fast")

# Run bootstrap on the pca result
boot_res <- bootstrap_pca(pca_res, nboot = 5, k = 5, seed = 456)

# Explore results
print(dim(boot_res$z_loadings)) # p x k Z-scores for loadings (coefficients)
#> [1] 50  5
print(dim(boot_res$z_scores))   # n x k Z-scores for scores
#> [1] 20  5