Generic Permutation-Based Test

This generic function implements a permutation-based test to assess the significance of components or statistics in a fitted model. The actual procedure depends on the method defined for the specific model class; the general steps are listed under Details.

Arguments

x: A fitted model object (e.g. pca, cross_projector, discriminant_projector, multiblock_biprojector).
...: Additional arguments passed down to shuffle_fun or measure_fun (if applicable). Note: For multiblock methods, Xlist, comps, alpha, and use_rspectra (for biprojector) are handled as direct named arguments, not via ....
X: (Used by pca, cross_projector, discriminant_projector) The original primary data matrix used to fit x. Ignored by the multiblock_biprojector method.
Y: (Used by cross_projector) The secondary data block (n x pY). Ignored by other methods.
Xlist: (Used by multiblock_biprojector [optional, default NULL] and multiblock_projector [required]) List of data blocks.
nperm: Integer number of permutations (Default: 1000 for PCA, 500 for multiblock methods, 100 otherwise).
measure_fun: (Optional; Used by pca, cross_projector, discriminant_projector, multiblock_projector) A function for computing the statistic(s) of interest. Ignored by multiblock_biprojector. Signature/default varies by method (see Details).
shuffle_fun: (Optional; Used by all methods) A function for permuting the data appropriately. Signature/default varies by method (see Details).
fit_fun: (Optional; Used by cross_projector, discriminant_projector) A function for re-fitting a new model. Ignored by PCA and multiblock methods. Signature/default varies by method (see Details).
stepwise: (Used by pca) Logical indicating whether sequential testing (P3 projection) should be performed (default TRUE). Ignored by all other methods; the multiblock methods also test sequentially, based on alpha and comps, but do not consult this argument.
parallel: (Used by all methods) Logical; if TRUE, attempt parallel execution via future.apply::future_lapply.
alternative: (Used by all methods) Character string for the alternative hypothesis: "greater" (default), "less", or "two.sided".
alpha: (Used by pca, multiblock_biprojector, multiblock_projector) Significance level for sequential stopping rule (default 0.05). Passed directly as a named argument to these methods.
comps: (Used by pca, multiblock_biprojector, multiblock_projector) Maximum number of components to test sequentially (default 4). Passed directly as a named argument to these methods.
use_svd_solver: (Used by pca) Optional string specifying the SVD solver (default "fast").
use_rspectra: (Used by multiblock_biprojector) Logical indicating whether to use RSpectra for eigenvalue calculation (default TRUE). Passed directly as a named argument.
predict_method: (Used by discriminant_projector) Prediction method ("lda" or "euclid") used by the default measure function (default "lda").
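
As a rough illustration of how alternative changes the empirical p-value, the base R sketch below uses a hypothetical helper, empirical_p, which is not part of the package. It assumes the (1 + count) / (nperm + 1) correction, which is consistent with the p-values printed in the Examples (for instance 1/51 with nperm = 50). The two-sided rule shown is one convention among several and may differ from what the methods actually do.

```r
# Hypothetical helper (not a package function): empirical p-value of an
# observed statistic against a vector of permuted statistics.
empirical_p <- function(observed, perm_values,
                        alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  n <- length(perm_values)
  p_greater <- (1 + sum(perm_values >= observed)) / (n + 1)
  p_less    <- (1 + sum(perm_values <= observed)) / (n + 1)
  switch(alternative,
         greater   = p_greater,
         less      = p_less,
         # Assumed convention: double the smaller tail, capped at 1.
         two.sided = min(1, 2 * min(p_greater, p_less)))
}

perm_values <- c(0.10, 0.20, 0.30, 0.40)
empirical_p(0.50, perm_values, "greater")    # (1 + 0) / 5 = 0.2
empirical_p(0.50, perm_values, "less")       # (1 + 4) / 5 = 1
empirical_p(0.50, perm_values, "two.sided")  # min(1, 2 * 0.2) = 0.4
```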

Value

The structure of the return value depends on the method:

cross_projector and discriminant_projector: Returns an object of class perm_test, a list containing: statistic, perm_values, p.value, alternative, method, nperm, call.
pca, multiblock_biprojector, and multiblock_projector: Returns an object inheriting from perm_test (class perm_test_pca for pca, perm_test_multiblock for multiblock_biprojector, and plain perm_test for multiblock_projector), a list containing: component_results (data frame with the observed statistic, p-value, and confidence interval bounds per component), perm_values (matrix of permuted statistics), alpha (if applicable), alternative, method, nperm (vector of successful permutations per component), call.

Details

A typical permutation test proceeds in four steps:

1. Shuffle or permute the data in a way that breaks the structure of interest (e.g., shuffle labels for supervised methods, shuffle columns/rows for unsupervised).
2. Re-fit or re-project the model on the permuted data. Depending on the class, this can be done via a fit_fun or a class-specific approach.
3. Measure the statistic of interest (e.g., variance explained, classification accuracy, canonical correlation).
4. Compare the distribution of permuted statistics to the observed statistic to compute an empirical p-value.
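
The four steps above can be sketched in base R without the package. The example below is only an illustration on simulated data: the statistic (R-squared of a linear model), the variable names, and the p-value formula (1 + count) / (nperm + 1) are assumptions chosen for clarity, not the package's internals.

```r
set.seed(1)
n <- 60
x <- rnorm(n)
y <- 0.8 * x + rnorm(n)      # simulated data with a real x-y relationship

# Steps 2-3 as one function: refit and measure (R^2 of a linear model).
measure <- function(x, y) summary(lm(y ~ x))$r.squared

observed <- measure(x, y)

nperm <- 199
perm_values <- replicate(nperm, {
  y_perm <- sample(y)        # Step 1: shuffle y to break the x-y link
  measure(x, y_perm)         # Steps 2-3: refit and measure
})

# Step 4: empirical p-value for alternative = "greater"
p_value <- (1 + sum(perm_values >= observed)) / (nperm + 1)
p_value
```

Because the observed fit is counted alongside the permutations, the smallest attainable p-value is 1 / (nperm + 1), which is why nperm bounds the resolution of the test.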

S3 methods define the specific defaults and required signatures for the functions involved in shuffling, fitting, and measuring.

This function provides a framework for permutation testing in various multivariate models. The specific implementation details, default functions, and relevant arguments vary by method.

PCA Method (perm_test.pca): Relevant arguments: X, nperm, measure_fun, shuffle_fun, stepwise, parallel, alternative, alpha, comps, use_svd_solver, .... Assesses significance of variance explained by each PC (Vitale et al., 2017). Default statistic: F_a. Default shuffle: column-wise. Default uses P3 projection and sequential stopping with alpha.
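
To make the F_a statistic and the column-wise shuffle concrete, here is a minimal base R sketch. It reads F_a as the variance of component a divided by the variance remaining in components a and later, which reproduces the observed values printed in the Examples. The helpers f_a and col_shuffle are hypothetical names, prcomp stands in for the package's pca, and the P3 projection used for stepwise testing is not reproduced, so only the first component is tested.

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# F_a: variance of component a as a fraction of the variance remaining
# in components a, a + 1, ..., i.e. lambda[a] / sum(lambda[a:length(lambda)]).
f_a <- function(X) {
  lambda <- prcomp(X, center = FALSE)$sdev^2
  lambda / rev(cumsum(rev(lambda)))
}

# Column-wise shuffle: permute each column independently, which keeps
# the marginal distributions but destroys correlations between columns.
col_shuffle <- function(dat, ...) apply(dat, 2, sample)

observed <- f_a(X)   # first three values: ~0.925, ~0.704, ~0.766

set.seed(1)
nperm <- 99
perm_f1 <- replicate(nperm, f_a(col_shuffle(X))[1])

# Empirical p-value for component 1, alternative = "greater"
p_comp1 <- (1 + sum(perm_f1 >= observed[1])) / (nperm + 1)
```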

Cross Projector Method (perm_test.cross_projector): Relevant arguments: X, Y, nperm, measure_fun, shuffle_fun, fit_fun, parallel, alternative, .... Tests the X-Y relationship. Default statistic: x2y.mse. Default shuffle: rows of Y. Default fit: stats::cancor.
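
The idea of shuffling the rows of Y and testing with alternative = "less" can be sketched as follows. This is a stand-in, not the package's procedure: ordinary least squares (lm) replaces the cancor-based refit, and a plain mean squared error (the hypothetical mse_x2y) replaces x2y.mse.

```r
X <- as.matrix(iris[, 1:2])
Y <- as.matrix(iris[, 3:4])

# Stand-in statistic: mean squared error when predicting Y from X by
# ordinary least squares (not the package's x2y.mse implementation).
mse_x2y <- function(X, Y) mean(residuals(lm(Y ~ X))^2)

observed <- mse_x2y(X, Y)

set.seed(1)
nperm <- 99
perm_values <- replicate(nperm, {
  Y_perm <- Y[sample(nrow(Y)), , drop = FALSE]  # shuffle rows of Y only
  mse_x2y(X, Y_perm)
})

# alternative = "less": is the observed error smaller than under the null?
p_value <- (1 + sum(perm_values <= observed)) / (nperm + 1)
```

Shuffling whole rows of Y keeps the correlation structure within Y intact while breaking its pairing with X, which is the relationship under test.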

Discriminant Projector Method (perm_test.discriminant_projector): Relevant arguments: X, nperm, measure_fun, shuffle_fun, fit_fun, predict_method, parallel, alternative, .... Tests class separation. Default statistic: prediction accuracy. Default shuffle: labels. Default fit: MASS::lda.
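
A label-shuffling test with an LDA refit can be sketched directly with MASS::lda. The helper accuracy is a hypothetical name, and training-set (resubstitution) accuracy is an assumption made for brevity; the package's default measure function may compute accuracy differently.

```r
library(MASS)  # provides lda()

X <- as.matrix(iris[, 1:4])
labels <- iris$Species

# Refit LDA and return the resubstitution (training) accuracy.
accuracy <- function(X, labels) {
  fit <- lda(X, grouping = labels)
  mean(predict(fit, X)$class == labels)
}

observed <- accuracy(X, labels)

set.seed(1)
nperm <- 99
# Shuffling labels breaks the link between features and classes.
perm_values <- replicate(nperm, accuracy(X, sample(labels)))

# alternative = "greater": is accuracy higher than chance?
p_value <- (1 + sum(perm_values >= observed)) / (nperm + 1)
```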

Multiblock Bi-Projector Method (perm_test.multiblock_biprojector): Relevant arguments: Xlist (optional), nperm, shuffle_fun, parallel, alternative, alpha, comps, use_rspectra, .... Tests consensus using a fixed internal statistic (eigenvalue) on the scores for each component. The statistic is the leading eigenvalue of the covariance matrix of block scores for a given component (T^T T, where the columns of T are the scores of each block b on component k). By default, it shuffles rows within each block independently (using Xlist if provided, otherwise the internally stored scores). It performs sequential testing for the components specified by comps using the stopping rule defined by alpha; Xlist, comps, and alpha are passed as direct named arguments, not via ....
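
The eigenvalue statistic can be illustrated on simulated block scores. In this sketch the matrix T_k, the helper lead_eig, and the choice to center the columns without rescaling are all assumptions; the method's internal normalization is not documented here and may differ.

```r
set.seed(1)
n <- 50
shared <- rnorm(n)                       # common signal across blocks

# Simulated scores of three blocks on one component (columns of T_k).
T_k <- cbind(shared + rnorm(n, sd = 0.5),
             shared + rnorm(n, sd = 0.5),
             shared + rnorm(n, sd = 0.5))

# Statistic: leading eigenvalue of t(T) %*% T after centering columns.
lead_eig <- function(T_k) {
  Tc <- scale(T_k, center = TRUE, scale = FALSE)
  eigen(crossprod(Tc), symmetric = TRUE, only.values = TRUE)$values[1]
}

observed <- lead_eig(T_k)

# Null: permute the rows of each block's scores independently.
nperm <- 199
perm_values <- replicate(nperm, lead_eig(apply(T_k, 2, sample)))
p_value <- (1 + sum(perm_values >= observed)) / (nperm + 1)
```

When the blocks share a common signal, their score columns are nearly collinear and the leading eigenvalue absorbs most of the total variance; independent row shuffles remove that alignment.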

Multiblock Projector Method (perm_test.multiblock_projector): Relevant arguments: Xlist (required), nperm, measure_fun, shuffle_fun, parallel, alternative, alpha, comps, .... Tests consensus using measure_fun (default: mean absolute correlation) on scores projected from Xlist using the original model x. The model is not refit on the permuted data.
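
A project-without-refit test with a mean absolute correlation statistic might look like the sketch below. The loadings v1 and v2 are hypothetical fixed weights standing in for the fitted model x, mean_abs_cor and project are illustrative helper names, and the independent row shuffle per block is an assumed default.

```r
set.seed(1)
n <- 50
X1 <- matrix(rnorm(n * 10), n, 10)
X2 <- X1[, 1:5] + matrix(rnorm(n * 5, sd = 0.5), n, 5)  # related block

# Hypothetical fixed loadings; they are never re-estimated below.
v1 <- c(rep(1, 5), rep(0, 5)) / sqrt(5)
v2 <- rep(1, 5) / sqrt(5)
project <- function(Xlist) cbind(Xlist[[1]] %*% v1, Xlist[[2]] %*% v2)

# Mean absolute correlation between the score columns of all block pairs.
mean_abs_cor <- function(scores) {
  r <- cor(scores)
  mean(abs(r[upper.tri(r)]))
}

observed <- mean_abs_cor(project(list(X1, X2)))

nperm <- 199
perm_values <- replicate(nperm, {
  # Shuffle rows within each block independently, then project only.
  Xperm <- lapply(list(X1, X2),
                  function(B) B[sample(nrow(B)), , drop = FALSE])
  mean_abs_cor(project(Xperm))
})
p_value <- (1 + sum(perm_values >= observed)) / (nperm + 1)
```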

References

Buja, A., & Eyuboglu, N. (1992). Remarks on parallel analysis. Multivariate Behavioral Research, 27(4), 509-540. (Relevant for PCA permutation concepts)

Vitale, R., Westerhuis, J. A., Næs, T., Smilde, A. K., de Noord, O. E., & Ferrer, A. (2017). Selecting the number of factors in principal component analysis by permutation testing— Numerical and practical aspects. Journal of Chemometrics, 31(10), e2937. doi:10.1002/cem.2937 (Specific to perm_test.pca)

Examples

# PCA Example
data(iris)
X_iris <- as.matrix(iris[,1:4])
mod_pca <- pca(X_iris, ncomp=4, preproc=center()) # Ensure centering

# Test first 3 components sequentially (small nperm for speed; use more in practice)
# Ensure a future plan is set for parallel=TRUE, e.g., future::plan("multisession")
res_pca <- perm_test(mod_pca, X_iris, nperm=50, comps=3, parallel=FALSE)
#> Pre-calculating reconstructions for stepwise testing...
#> Running 50 permutations sequentially for up to 3 PCA components (alpha=0.050, serial)...
#>   Testing Component 1/3...
#>   Testing Component 2/3...
#>   Testing Component 3/3...
#>   Component 3 p-value (0.05882) > alpha (0.050). Stopping sequential testing.
print(res_pca)
#> 
#> PCA Permutation Test Results
#> 
#> Method:  Permutation test for PCA (Vitale et al. 2017 P3) (statistic: F_a (Fraction of Remaining Variance), stepwise: TRUE, shuffle: column-wise) 
#> Alternative:  greater 
#> 
#> Component Results:
#>   comp  observed       pval  lower_ci  upper_ci
#> 1    1 0.9246187 0.01960784 0.6817051 0.6895225
#> 2    2 0.7039743 0.01960784 0.6045818 0.6920858
#> 3    3 0.7664247 0.05882353 0.6674626 0.7726917
#> 
#> Number of successful permutations per component: 50, 50, 50 

# PCA Example with row shuffling (tests different null hypothesis)
row_shuffle <- function(dat, ...) dat[sample(nrow(dat)), ]
res_pca_row <- perm_test(mod_pca, X_iris, nperm=50, comps=3,
                         shuffle_fun=row_shuffle, parallel=FALSE)
#> Pre-calculating reconstructions for stepwise testing...
#> Running 50 permutations sequentially for up to 3 PCA components (alpha=0.050, serial)...
#>   Testing Component 1/3...
#>   Component 1 p-value (0.4314) > alpha (0.050). Stopping sequential testing.
print(res_pca_row)
#> 
#> PCA Permutation Test Results
#> 
#> Method:  Permutation test for PCA (Vitale et al. 2017 P3) (statistic: F_a (Fraction of Remaining Variance), stepwise: TRUE, shuffle: custom) 
#> Alternative:  greater 
#> 
#> Component Results:
#>   comp  observed      pval  lower_ci  upper_ci
#> 1    1 0.9246187 0.4313725 0.9246187 0.9246187
#> 
#> Number of successful permutations per component: 50 

if (FALSE) { # \dontrun{
# Cross Projector Example (using cancor)
X <- as.matrix(iris[,1:2])
Y <- as.matrix(iris[,3:4])
ccr <- cancor(X, Y)
mod_cp <- cross_projector(ccr$xcoef, ccr$ycoef)

# Perm test (is x2y.mse lower than chance?)
res_cp <- perm_test(mod_cp, X, Y=Y, nperm=50, alternative="less")
print(res_cp)

# Discriminant Projector Example (using LDA)
library(MASS)
lda_fit <- lda(X_iris, grouping=iris$Species)
mod_dp <- discriminant_projector(
  v = lda_fit$scaling,
  s = X_iris %*% lda_fit$scaling,
  sdev = lda_fit$svd,
  labels = iris$Species,
  preproc = prep(center()), # Assuming center() was intended for LDA
  Sigma = lda_fit$covariance # Needed for LDA prediction method
)

# Perm test (is accuracy higher than chance?)
res_dp <- perm_test(mod_dp, X_iris, nperm=50, alternative="greater")
print(res_dp)

# Multiblock Bi-Projector Example
# (Requires a multiblock model 'mod_mb' from e.g. MFA or ComDim)
# Assuming 'mod_mb' exists and has 2 blocks:
# res_mb <- perm_test(mod_mb, nperm=50, comps=3) 
# print(res_mb)
# Example using provided Xlist (list of matrices X1, X2):
# X1 <- matrix(rnorm(50*10), 50, 10)
# X2 <- matrix(rnorm(50*15), 50, 15)
# Assume mod_mb was fit on cbind(X1, X2) with block_indices=list(1:10, 11:25)
# res_mb_xlist <- perm_test(mod_mb, Xlist=list(X1, X2), nperm=50, comps=3)
# print(res_mb_xlist)
} # }
