Fast, Exact Bootstrap for PCA Results from pca function
bootstrap_pca.RdPerforms bootstrap resampling for Principal Component Analysis (PCA) based on
the method described by Fisher et al. (2016), optimized for high-dimensional
data (p >> n). This version is specifically adapted to work with the output
object generated by the provided pca function (which returns a bi_projector
object of class 'pca').
Usage
bootstrap_pca(
x,
nboot = 100,
k = NULL,
parallel = FALSE,
cores = NULL,
seed = NULL,
epsilon = 1e-15,
...
)Arguments
- x
An object of class 'pca' as returned by the provided
pcafunction. It's expected to contain loadings (v), scores (s), singular values (sdev), left singular vectors (u), and pre-processing info (preproc).- nboot
The number of bootstrap resamples to perform. Must be a positive integer (default: 100).
- k
The number of principal components to bootstrap (default: all components available in the fitted PCA model
x). Must be less than or equal to the number of components inx.- parallel
Logical flag indicating whether to use parallel processing via the
futureframework (default: FALSE). Requires thefuture.applypackage and a configuredfuturebackend (e.g.,future::plan(future::multisession)).- cores
The number of cores to use for parallel processing if
parallel = TRUE(default:future::availableCores()). This is used if nofutureplan is set.- seed
An integer value for the random number generator seed for reproducibility (default: NULL, no seed is set).
- epsilon
A small positive value added to standard deviations before division to prevent division by zero or instability (default: 1e-15).
- ...
Additional arguments (currently ignored).
Value
A list object of class bootstrap_pca_result containing:
- E_Vb
Matrix (p x k) of the estimated bootstrap means of the principal components (loadings V^b = coefficients).
- sd_Vb
Matrix (p x k) of the estimated bootstrap standard deviations of the principal components (loadings V^b).
- z_loadings
Matrix (p x k) of the bootstrap Z-scores for the loadings, calculated as
E_Vb / sd_Vb.- E_Scores
Matrix (n x k) of the estimated bootstrap means of the principal component scores (S^b).
- sd_Scores
Matrix (n x k) of the estimated bootstrap standard deviations of the principal component scores (S^b).
- z_scores
Matrix (n x k) of the bootstrap Z-scores for the scores, calculated as
E_Scores / sd_Scores.- E_Ab
Matrix (k x k) of the estimated bootstrap means of the internal rotation matrices A^b.
- Ab_array
Array (k x k x nboot) containing all the bootstrap rotation matrices A^b.
- Scores_array
Array (n x k x nboot) containing all the bootstrap score matrices (S^b, with NAs for non-sampled subjects).
- nboot
The number of bootstrap samples used (successful ones).
- k
The number of components bootstrapped.
- call
The matched call to the function.
Details
This function implements the fast bootstrap PCA algorithm proposed by
Fisher et al. (2016), adapted for the output structure of the provided pca function.
The pca function returns an object containing:
v: Loadings (coefficients, p x k) - equivalent to V in SVD Y = U D V'. Note the transpose difference fromprcomp.s: Scores (n x k) - calculated as U %*% D.sdev: Singular values (vector of length k) - equivalent to d.u: Left singular vectors (n x k).
The bootstrap algorithm works by resampling the subjects (rows) and recomputing
the SVD on a low-dimensional representation. Specifically, it computes the SVD
of the resampled matrix D U' P^b, where Y = U D V' is the SVD of the original
(pre-processed) data, and P^b is a resampling matrix operating on the subjects (columns of U').
The SVD of the resampled low-dimensional matrix is svd(D U' P^b) = A^b S^b (R^b)'.
The bootstrap principal components (loadings) are then calculated as V^b = V A^b,
and the bootstrap scores are Scores^b = R^b S^b.
Z-scores are provided as mean / sd.
Important Note: The algorithm assumes the data Y used for the original SVD (Y = U D V')
was appropriately centered (or pre-processed according to x$preproc). The bootstrap
samples are generated based on the components derived from this pre-processed data.
References
Fisher, Aaron, Brian Caffo, Brian Schwartz, and Vadim Zipunnikov. 2016. "Fast, Exact Bootstrap Principal Component Analysis for P > 1 Million." Journal of the American Statistical Association 111 (514): 846–60. doi:10.1080/01621459.2015.1062383 .
Examples
# Simulate data (p=50, n=20)
set.seed(123)
p_dim <- 50
n_obs <- 20
Y_mat <- matrix(rnorm(p_dim * n_obs), nrow = p_dim, ncol = n_obs)
# Transpose for pca function input (n x p)
X_mat <- t(Y_mat)
# Perform PCA using the provided pca function
# Use center() pre-processing
pca_res <- pca(X_mat, ncomp = 5, preproc = center(), method = "fast")
# Run bootstrap on the pca result
boot_res <- bootstrap_pca(pca_res, nboot = 5, k = 5, seed = 456)
# Explore results
print(dim(boot_res$z_loadings)) # p x k Z-scores for loadings (coefficients)
#> [1] 50 5
print(dim(boot_res$z_scores)) # n x k Z-scores for scores
#> [1] 20 5