Performs Generalized PCA directly on a pre-computed covariance matrix C with a single
variable-side constraint/metric R. This is useful when you already have C = X'MX
or when X is too large to store but C is manageable. Supports two methods:
"gmd" (Allen et al.'s GMD approach, default) which exactly matches the two-sided
genpca, and "geigen" (generalized eigenvalue approach) which solves
C v = lambda R v.
Arguments
- C
A p x p symmetric positive semi-definite covariance matrix. Typically C = X'MX where X is the data matrix and M is a row metric.
- R
Variable-side constraint/metric. Can be:
NULL: Identity matrix (standard PCA on C)
A numeric vector of length p: Interpreted as diagonal weights (must be non-negative)
A p x p symmetric PSD matrix: General metric/smoothing/structure penalties
- ncomp
Number of components to return. Default is all positive eigenvalues.
- method
Character string specifying the method. One of:
"gmd" (default): Allen et al.'s GMD approach via eigen decomposition of \(R^{1/2} C R^{1/2}\)
"geigen": Generalized eigenvalue approach solving C v = lambda R v
- constraints_remedy
How to handle slightly non-PSD inputs (for geigen method). One of:
"error": Stop with an error if constraints are not PSD
"ridge": Add a small ridge to the diagonal to make PSD
"clip": Clip negative eigenvalues to zero
"identity": Replace with identity matrix
- tol
Numerical tolerance for PSD checks and filtering small eigenvalues. Default 1e-8.
- verbose
Logical. If TRUE, print progress messages. Default FALSE.
Value
A list with components:
- v
p x k matrix of loadings (R-orthonormal eigenvectors)
- d
Singular values (square root of eigenvalues lambda)
- lambda
Eigenvalues (variances under the R-metric)
- k
Number of components returned
- propv
Proportion of variance explained by each component
- cumv
Cumulative proportion of variance explained
- R_rank
Rank of the constraint matrix R
- method
The method used ("gmd" or "geigen")
Details
Method Selection Guide:
Use method = "gmd" when:
You need exact equivalence with
genpca(X, M, A)You're following Allen et al.'s GMD framework
You want consistent results with the two-sided decomposition
Use method = "geigen" when:
You specifically need the generalized eigenvalue formulation
You're working with legacy code that expects this approach
Computational efficiency is critical and R is well-conditioned
Method "gmd" (default):
This method implements Allen et al.'s GMD approach and exactly matches the two-sided genpca when C = X'MX. It computes the eigendecomposition of \(R^{1/2} C R^{1/2}\) and maps back with \(V = R^{-1/2} Z\), ensuring V'RV = I. The total variance is tr(CR) as in Allen's GPCA (Corollary 5).
Method "geigen":
This method solves the generalized eigenproblem C v = lambda R v directly. While mathematically valid, it solves a different optimization than Allen's GMD and will not, in general, match the two-sided genpca unless R = I or special commutation conditions hold.
For exact equivalence with genpca(X, M, A), use method="gmd" with C = X'MX and R = A.
References
Allen, G. I., Grosenick, L., & Taylor, J. (2014). A Generalized Least-Squares Matrix Decomposition. Journal of the American Statistical Association, 109(505), 145-159.
Examples
# Example 1: Standard PCA on covariance (no constraint)
C <- cov(scale(iris[,1:4], center=TRUE, scale=FALSE))
fit0 <- genpca_cov(C, R=NULL, ncomp=3)
print(fit0$d[1:3]) # first 3 singular values
#> [1] 2.0562689 0.4926162 0.2796596
print(fit0$propv[1:3]) # variance explained by first 3 components
#> [1] 0.92461872 0.05306648 0.01710261
# Example 2: Demonstrating equivalence with genpca
set.seed(123)
X <- matrix(rnorm(50 * 10), 50, 10)
M_diag <- runif(50, 0.5, 1.5) # row weights
A_diag <- runif(10, 0.5, 2) # column weights
# Two-sided GPCA
fit_gpca <- genpca(X, M = M_diag, A = A_diag, ncomp = 5,
preproc = multivarious::pass())
# Equivalent covariance-based GPCA
C <- crossprod(X, diag(M_diag) %*% X) # C = X'MX
fit_cov <- genpca_cov(C, R = A_diag, ncomp = 5, method = "gmd")
# These should match exactly
all.equal(fit_gpca$sdev, fit_cov$d, tolerance = 1e-10)
#> [1] TRUE
# Example 3: Variable weights via a diagonal metric (using iris covariance)
C_iris <- cov(scale(iris[,1:4], center=TRUE, scale=FALSE))
w <- c(1, 1, 0.5, 2) # emphasize Sepal.Width less, Petal.Width more
fitW <- genpca_cov(C_iris, R = w, ncomp=3, method="gmd")
print(fitW$d[1:3])
#> [1] 1.7972881 0.4903437 0.3173048
# Example 4: Compare GMD and generalized eigenvalue approaches
fit_gmd <- genpca_cov(C_iris, R = w, ncomp=2, method="gmd")
fit_geigen <- genpca_cov(C_iris, R = w, ncomp=2, method="geigen")
# These will generally differ unless R = I
print(paste("GMD singular values:", paste(round(fit_gmd$d, 3), collapse=", ")))
#> [1] "GMD singular values: 1.797, 0.49"
print(paste("GEigen singular values:", paste(round(fit_geigen$d, 3), collapse=", ")))
#> [1] "GEigen singular values: 2.658, 0.497"