Implements the Generalised Least‑Squares Matrix Decomposition of Allen, Grosenick & Taylor (2014) for data observed in a **row** inner‑product space M and a **column** inner‑product space A. Setting M = I_n, A = I_p recovers ordinary PCA.
Usage
genpca(
X,
A = NULL,
M = NULL,
ncomp = NULL,
method = c("eigen", "auto", "spectra", "randomized", "deflation"),
constraints_remedy = c("ridge", "error", "clip", "identity"),
preproc = multivarious::pass(),
threshold = 1e-06,
maxit_deflation = 500L,
use_cpp = TRUE,
maxeig = 800,
warn_approx = TRUE,
maxit_spectra = 1000,
tol_spectra = 1e-09,
oversample = 20L,
n_power = 1L,
n_polish = 0L,
jitter_metric = 1e-10,
seed_randomized = 1234L,
tol_polish_randomized = 1e-04,
verbose = FALSE
)
gmdLA(
X,
Q,
R,
k = min(n_orig, p_orig),
n_orig,
p_orig,
maxeig = 800,
tol = 1e-08,
use_dual = FALSE,
warn_approx = TRUE,
verbose = FALSE
)
gmd_deflationR(X, Q, R, k, thr = 1e-06, maxit = 500L, verbose = FALSE)
# S3 method for class 'genpca'
truncate(x, ncomp)
# S3 method for class 'genpca'
reconstruct(
x,
comp = 1:multivarious::ncomp(x),
rowind = NULL,
colind = NULL,
...
)
# S3 method for class 'genpca'
ncomp(x)Arguments
- X
Numeric matrix n x p.
- A
Column constraint: vector (implies diagonal), dense matrix, or sparse symmetric p x p PSD matrix. If `NULL`, defaults to identity.
- M
Row constraint: vector (implies diagonal), dense matrix, or sparse symmetric n x n PSD matrix. If `NULL`, defaults to identity.
- ncomp
Number of components to extract. Defaults to `min(dim(X))`. Must be positive.
- method
Character string specifying the computation method. One of
"eigen"(default, usesgmdLA),"auto"(heuristic choice among"eigen","spectra", and"randomized"),"spectra"(uses matrix-free C++/Spectra implementationgmd_fast_cpp),"randomized"(approximate randomized block solvergmd_randomized), or"deflation"(usesgmd_deflationRorgmd_deflation_cpp).- constraints_remedy
Character string specifying the remedy for constraints. One of
"error","ridge","clip", or"identity".- preproc
Pre‑processing transformer object from the **multivarious** package (default `multivarious::pass()`). Use `multivarious::center()` for centered GPCA. See `?multivarious::prep` for options.
- threshold
Convergence tolerance for the
"deflation"method's inner loop. Default `1e-6`.- maxit_deflation
Maximum iterations per component for the
"deflation"method. Default `500`.- use_cpp
Logical. If `TRUE` (default) and package was compiled with C++ support, use faster C++ implementation for
method = "deflation". Fallback to R otherwise. (Ignored formethod = "eigen"andmethod = "spectra").- maxeig
Upper bound on subspace dimension for eigen/SVD calculations, primarily for
method = "eigen". If a constraint matrix dimension is<= maxeiga full eigen decomposition is used. Otherwise only the leadingmaxeigeigencomponents are computed viaRSpectra::eigs_sym, so results may be approximate. Default `800`.- warn_approx
Logical. If
TRUE(default) a warning is emitted when an approximate eigen decomposition is used because the dimension exceedsmaxeig.- maxit_spectra
Maximum iterations for the Spectra iterative solver when
method = "spectra". Default `1000`.- tol_spectra
Tolerance for the Spectra iterative solver when
method = "spectra". Default `1e-9`.- oversample
Oversampling for
method = "randomized"(sketch size =ncomp + oversample). Default `20`.- n_power
Number of power iterations for
method = "randomized". Default `1`.- n_polish
Number of optional block-polish iterations for
method = "randomized". Default `0`.- jitter_metric
Small jitter used in metric orthonormalization for
method = "randomized". Default `1e-10`.- seed_randomized
Optional seed for
method = "randomized". Default `1234`.- tol_polish_randomized
Relative tolerance used for early stopping of polish iterations in
method = "randomized". Set `0` to disable early stop. Default `1e-4`.- verbose
Logical. If `TRUE`, print progress messages. Default `FALSE`.
- maxit
Maximum number of iterations for deflation convergence. Default `500`.
Value
An object of class `c("genpca", "bi_projector")` inheriting from `multivarious::bi_projector`, with slots including:
- u,v
Left/right singular vectors scaled by the constraint metrics (MU, AV). These correspond to components in the original space's geometry. Use `components(fit)`.
- ou,ov
Orthonormal singular vectors in the constraint metric (U, V such that UT M U = I, VT AV = I). These are the core mathematical factors.
- sdev
Generalised singular values d_k.
- s
Scores ( X V or equivalently MU D). Represent projection of rows onto components. Use `scores(fit)`.
- preproc
The `multivarious` pre‑processing object used.
- A, M
The constraint matrices used (potentially after coercion to sparse format).
- propv
Proportion of generalized variance explained by each component.
- cumv
Cumulative proportion of generalized variance explained.
Details
`gmdLA` caches the eigen decomposition of the constraint matrices by storing it as an attribute on the matrix. `compute_sqrtm()` returns this modified matrix so callers can reassign it (e.g. `R <- sqrtm_res$matrix`) to reuse the cached decomposition in subsequent calls.
Method
We compute the rank‑ncomp factors UDVT that minimise $$ \|X - UDV^\top\|_{M,A}^2 = \mathrm{tr}\!\bigl(M\, (X-UDV^\top)\,A\,(X-UDV^\top)^\top\bigr) $$ subject to UT M U = I, VT AV = I. (Allen et al., 2014). Three methods are available via the `method` argument:
"eigen"(Default): Uses a one-shot eigen decomposition strategy based ongmdLA. It explicitly forms and decomposes a \(p \times p\) or \(n \times n\) matrix (depending onnvsp)."auto": Chooses among"eigen","spectra", and"randomized"using heuristics on shape, rank ratio (ncomp / min(n,p)), and constraint structure."spectra": Uses a matrix-free iterative approach via the RcppSpectra package to solve the same eigen problem as"eigen"but without forming the large intermediate matrix. Generally faster and uses less memory for largenorp. Requires C++ compiler and RcppSpectra."randomized": Uses a randomized block range finder and small projected eigendecomposition. This is an approximate low-pass method that is often much faster for wide dense matrices with sparse metrics when only top components are needed."deflation": Uses an iterative power/deflation algorithm. Can be slower but potentially uses less memory than"eigen"for very large dense problems wherencompis small.
Backend Guidance
Use
method = "auto"as the default in production pipelines.Use
"eigen"when you need a stable reference solution on small/medium problems.Use
"spectra"for larger matrix-free iterative solves where memory pressure is a concern.Use
"randomized"for wide low-rank settings (p >> n) with sparse metrics when throughput matters most.Use
"deflation"when you only need a few components and can tolerate iterative convergence behavior.
For pre-computed covariance matrices C = X'MX, see genpca_cov which
performs GPCA directly on C with column constraint R (equivalent to A).
References
Allen, G. I., Grosenick, L., & Taylor, J. (2014). *A Generalized Least‑Squares Matrix Decomposition.* Journal of the American Statistical Association, 109(505), 145‑159. arXiv:1102.3074.
See also
genpca_cov for GPCA on pre-computed covariance matrices,
truncate.genpca, reconstruct.genpca,
`multivarious::bi_projector`, `multivarious::project`, `multivarious::scores`,
`multivarious::components`, `multivarious::reconstruct`.
Examples
if (requireNamespace("RSpectra", quietly = TRUE) &&
requireNamespace("multivarious", quietly = TRUE)) {
set.seed(123)
X <- matrix(stats::rnorm(200 * 100), 200, 100)
rownames(X) <- paste0("R", 1:200)
colnames(X) <- paste0("C", 1:100)
# Standard PCA (A=I, M=I, centered) - using default method="eigen"
gpca_std_eigen <- genpca(X, ncomp = 5, preproc = multivarious::center(), verbose = FALSE)
# Standard PCA using Spectra method (requires C++ build)
# gpca_std_spectra <- try(genpca(X, ncomp = 5,
# preproc = multivarious::center(),
# method = "spectra", verbose = TRUE))
# if (!inherits(gpca_std_spectra, "try-error")) {
# print(head(gpca_std_spectra$sdev))
# }
# Compare singular values with prcomp
pr_std <- stats::prcomp(X, center = TRUE, scale. = FALSE)
print("Eigen Method Sdev:")
print(head(gpca_std_eigen$sdev))
print("prcomp Sdev:")
print(head(pr_std$sdev))
print(paste("Total Var Explained (Eigen):",
round(sum(gpca_std_eigen$propv) * 100), "%"))
# Weighted column PCA (diagonal A, no centering)
col_weights <- stats::runif(100, 0.5, 1.5)
gpca_weighted <- genpca(X, A = col_weights, ncomp = 3,
preproc = multivarious::pass(), verbose = FALSE)
print("Weighted GPCA Sdev:")
print(gpca_weighted$sdev)
print(head(components(gpca_weighted)))
}
#> [1] "Eigen Method Sdev:"
#> [1] 23.38656 23.16950 23.00393 22.56341 22.20862
#> [1] "prcomp Sdev:"
#> [1] 1.657829 1.642442 1.630705 1.599478 1.574327 1.543299
#> [1] "Total Var Explained (Eigen): 13 %"
#> [1] "Weighted GPCA Sdev:"
#> [1] 24.79929 24.23576 23.57776
#> PC1 PC2 PC3
#> C1 -0.02654255 0.04444090 -0.063163088
#> C2 0.38534643 -0.13459026 -0.000021882
#> C3 -0.06053140 0.07895713 -0.093587789
#> C4 0.04387758 -0.01422663 0.003160916
#> C5 0.06087768 0.01772540 -0.071200767
#> C6 -0.00294182 -0.08799976 -0.174320612