This function implements a penalized Multiple Factor Analysis decomposition that encourages similarity among block-specific loading matrices. The method uses block-coordinate descent (BCD) with Riemannian optimization on the Stiefel manifold to estimate orthonormal loading matrices for each data block while applying a regularization penalty that promotes structural similarity across blocks.
Usage
penalized_mfa(data, ...)
# S3 method for class 'list'
penalized_mfa(
data,
ncomp = 2,
lambda = 1,
penalty_method = c("projection", "pairwise", "global_mean"),
max_iter = 10,
nsteps_inner = 5,
learning_rate = 0.01,
optimizer = c("adam", "gradient"),
preproc = multivarious::center(),
beta1 = 0.9,
beta2 = 0.999,
adam_epsilon = 1e-08,
tol_obj = 1e-07,
tol_inner = NULL,
compute_consensus = FALSE,
verbose = FALSE,
...
)
# S3 method for class 'multiblock'
penalized_mfa(data, ...)
# S3 method for class 'multidesign'
penalized_mfa(data, subject, ...)
Arguments
- data
A list of matrices, a `multiblock` object, or a `multidesign` object. For lists, each element should be a numeric matrix with the same number of rows (observations). Columns represent features and can differ across blocks.
- ...
Additional arguments passed to methods. Currently unused but included for S3 method consistency.
- ncomp
Integer number of latent components to extract (default: 2). This is automatically capped at the minimum number of columns across all blocks after preprocessing. Must be at least 1.
- lambda
Non-negative scalar controlling the strength of the penalty (default: 1). Larger values encourage more similarity among block loadings. When `lambda = 0`, the method reduces to independent PCA on each block. Typical range: 0 to 10.
- penalty_method
Character string specifying the penalty type (default: "projection"). Options are:
"projection": Rotation-invariant penalty on projection matrices (recommended)"pairwise": Pairwise Euclidean distances between loading matrices"global_mean": Distance from mean loading matrix
- max_iter
Maximum number of outer block-coordinate descent iterations (default: 10). Typical range: 10 to 100. More iterations allow better convergence but increase computation time.
- nsteps_inner
Number of gradient update steps per block in each outer iteration (default: 5). Higher values perform more thorough optimization of each block before moving to the next. Typical range: 1 to 20.
- learning_rate
Step size for the optimizer (default: 0.01). Controls how far to move in the gradient direction. Too large may cause divergence; too small slows convergence. Typical range for gradient descent: 0.001 to 0.1. Adam is less sensitive to this parameter.
- optimizer
Character string specifying the optimization algorithm (default: "adam"). Options are:
"adam": Adaptive Moment Estimation with momentum and adaptive learning rates"gradient": Standard gradient descent with fixed learning rate
Adam is generally more robust and converges faster with default settings.
- preproc
A preprocessing specification (default: `multivarious::center()`). Can be:
- A single `pre_processor` object applied to all blocks
- A list of `pre_processor` objects (one per block)
- `NULL` for no preprocessing (identity transformation)
Common options: `center()`, `standardize()`, `pass()` (no-op).
- beta1
Adam hyperparameter for first moment decay (default: 0.9). Controls the exponential decay rate for gradient moving average. Typical range: 0.8 to 0.95. Only used when `optimizer = "adam"`.
- beta2
Adam hyperparameter for second moment decay (default: 0.999). Controls the exponential decay rate for squared gradient moving average. Typical range: 0.99 to 0.9999. Only used when `optimizer = "adam"`.
- adam_epsilon
Small constant added to denominator in Adam for numerical stability (default: 1e-8). Prevents division by zero. Only used when `optimizer = "adam"`.
- tol_obj
Numeric tolerance for outer loop convergence based on relative change in objective function (default: 1e-7). Smaller values require more precise convergence. Typical range: 1e-8 to 1e-5.
- tol_inner
Optional numeric tolerance for inner loop convergence based on Frobenius norm of change in loading matrix (default: NULL, no early stopping). When specified, inner loop stops if \(\|\mathbf{V}_i^{\text{new}} - \mathbf{V}_i\|_F < \text{tol_inner}\).
- compute_consensus
Logical indicating whether to compute a consensus loading matrix (default: FALSE). Only computed when all blocks have the same number of features after preprocessing. The consensus is the orthonormalized sum of block-specific loadings.
- verbose
Logical indicating whether to print iteration progress (default: FALSE). When TRUE, displays objective values and relative changes at each iteration using the `cli` package.
- subject
Required for `multidesign` method: the name (unquoted) of the subject variable used to split data into blocks. Each subject's data becomes a separate block in the analysis.
Value
A `multiblock_projector` object of class `penalized_mfa` with the following components:
- v
Concatenated loading matrix (total_features × ncomp) formed by vertically stacking all block-specific loading matrices.
- preproc
A `concat_pre_processor` object containing the preprocessing transformations applied to each block. Can be used to transform new data.
- block_indices
A named list indicating which rows of `v` correspond to each block. Each element is an integer vector of row indices.
Additional information is stored as attributes:
- V_list
List of length S containing the final orthonormal loading matrices for each block. Each element is a (p_i × ncomp) matrix.
- obj_values
Numeric vector of objective function values at each iteration, including the initial value. Length is (iterations_run + 1).
- consensus
The consensus loading matrix (p_post × ncomp) if `compute_consensus = TRUE`, otherwise `NULL`. This is the orthonormalized sum of block loadings over blocks with matching feature dimension.
- lambda
The penalty strength used in the optimization.
- penalty_method
Character string indicating the penalty type used. May be "none" if the penalty was disabled due to inconsistent block dimensions.
- iterations_run
Integer indicating how many outer iterations were completed before convergence or reaching `max_iter`.
Details
## Optimization Problem
For \(S\) data blocks \(\mathbf{X}_i \in \mathbb{R}^{n \times p_i}\) (where \(i = 1, \ldots, S\)), the method estimates loading matrices \(\mathbf{V}_i \in \mathbb{R}^{p_i \times k}\) with orthonormal columns (\(\mathbf{V}_i^\top \mathbf{V}_i = \mathbf{I}_k\)) by minimizing:
$$ \min_{\{\mathbf{V}_i\}} \sum_{i=1}^S \|\mathbf{X}_i - \mathbf{X}_i \mathbf{V}_i \mathbf{V}_i^\top\|_F^2 \;+\; \lambda \,\text{Penalty}(\{\mathbf{V}_i\}) $$
The first term is the reconstruction error (sum of squared residuals) for each block, and the second term is a penalty that encourages similarity among the loading matrices.
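As a concrete illustration, the following R sketch evaluates the reconstruction term for a list of blocks `X_list` and loadings `V_list` with orthonormal columns. The helper name is illustrative, not a function exported by the package:

# Minimal sketch of the reconstruction term, assuming each V_list[[i]]
# has orthonormal columns (V'V = I); illustrative only.
reconstruction_error <- function(X_list, V_list) {
  sum(mapply(function(X, V) {
    R <- X - X %*% V %*% t(V)   # residual after rank-k projection
    sum(R^2)                    # squared Frobenius norm
  }, X_list, V_list))
}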
## Penalty Options
Three penalty formulations are available; a small sketch of each appears after this list:
- Projection (default, rotation-invariant):
$$\sum_{i=1}^S \|\mathbf{P}_i - \bar{\mathbf{P}}\|_F^2$$ where \(\mathbf{P}_i = \mathbf{V}_i \mathbf{V}_i^\top\) is the projection matrix for block \(i\) and \(\bar{\mathbf{P}} = \frac{1}{S}\sum_{i=1}^S \mathbf{P}_i\). This penalty is invariant to rotations of the loading matrices and is recommended when the orientation of the latent space is arbitrary.
- Global Mean (not rotation-invariant):
$$\sum_{i=1}^S \|\mathbf{V}_i - \bar{\mathbf{V}}\|_F^2$$ where \(\bar{\mathbf{V}} = \frac{1}{S}\sum_{i=1}^S \mathbf{V}_i\). This directly penalizes Euclidean distance from the mean loading matrix. Computationally simpler than pairwise but not rotation-invariant.
- Pairwise (not rotation-invariant):
$$\sum_{i < j} \|\mathbf{V}_i - \mathbf{V}_j\|_F^2$$ This penalizes all pairwise Euclidean distances between loading matrices. Equivalent to global mean up to a scaling factor but conceptually different.
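For concreteness, here are hedged R sketches of the three penalties for a list of loading matrices `V_list` (all of equal dimension, with orthonormal columns). The function names are illustrative, not part of the package:

# Projection penalty: sum_i ||P_i - P_bar||_F^2 with P_i = V_i V_i'
penalty_projection <- function(V_list) {
  P_list <- lapply(V_list, tcrossprod)            # P_i = V_i V_i'
  P_bar  <- Reduce(`+`, P_list) / length(P_list)  # mean projection matrix
  sum(vapply(P_list, function(P) sum((P - P_bar)^2), numeric(1)))
}
# Global-mean penalty: sum_i ||V_i - V_bar||_F^2
penalty_global_mean <- function(V_list) {
  V_bar <- Reduce(`+`, V_list) / length(V_list)
  sum(vapply(V_list, function(V) sum((V - V_bar)^2), numeric(1)))
}
# Pairwise penalty: sum over i < j of ||V_i - V_j||_F^2
penalty_pairwise <- function(V_list) {
  S <- length(V_list); total <- 0
  for (i in seq_len(S - 1)) for (j in (i + 1):S)
    total <- total + sum((V_list[[i]] - V_list[[j]])^2)
  total
}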
## Optimization Algorithm
The algorithm uses block-coordinate descent with Riemannian gradient descent on the Stiefel manifold:
1. **Outer loop** (`max_iter` iterations): cycles through all blocks.
2. **Inner loop** (`nsteps_inner` steps per block): updates each block's loading matrix.
3. **Gradient computation**: combines the reconstruction gradient and the penalty gradient.
4. **Tangent-space projection**: projects the gradient onto the tangent space of the Stiefel manifold.
5. **Update step**: applies either gradient descent or the Adam optimizer.
6. **Retraction**: re-orthonormalizes via QR decomposition to maintain the manifold constraint.
The Stiefel manifold constraint ensures \(\mathbf{V}_i^\top \mathbf{V}_i = \mathbf{I}_k\) at all iterations, which is critical for identifiability and interpretability.
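The following R sketch illustrates steps 3 to 6 for a single block, using the gradient of the simplified reconstruction term under the orthonormality constraint. It omits the penalty gradient and the Adam update, and is not the package's internal implementation:

# One Riemannian gradient step on the Stiefel manifold for one block,
# assuming V'V = I; names are illustrative, not package internals.
riemannian_step <- function(X, V, lr = 0.01) {
  # Euclidean gradient of -tr(V' X'X V), the reconstruction term
  # simplified under the orthonormality constraint
  G <- -2 * crossprod(X) %*% V
  # Tangent-space projection: Proj_V(G) = G - V * sym(V'G)
  VtG <- crossprod(V, G)
  G_tan <- G - V %*% ((VtG + t(VtG)) / 2)
  # Descent step followed by QR retraction back onto the manifold
  qr.Q(qr(V - lr * G_tan))
}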
## Convergence Criteria
The outer loop stops when the relative change in the objective function falls below `tol_obj`: $$\frac{|f^{(t+1)} - f^{(t)}|}{|f^{(t)}| + \epsilon} < \text{tol_obj}$$
Optionally, the inner loop can stop early if the Frobenius norm of the change in \(\mathbf{V}_i\) falls below `tol_inner`.
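A minimal sketch of the outer-loop stopping rule (the exact guard constant used by the package may differ):

# Relative-change test on the objective; eps guards against division by zero
converged <- function(f_new, f_old, tol_obj = 1e-7, eps = 1e-12) {
  abs(f_new - f_old) / (abs(f_old) + eps) < tol_obj
}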
## Preprocessing
Data preprocessing is handled via the `multivarious` package. Common preprocessing steps include centering (default) and scaling. Each block can have its own preprocessor, or a single preprocessor can be applied to all blocks. Preprocessing is "learned" from the input data and stored in the result object for later use on new data.
## Consensus Loading Matrix
When `compute_consensus = TRUE` and blocks have equal feature dimensions, a consensus loading matrix is computed by orthonormalizing the sum of block-specific loadings: $$\mathbf{V}_{\text{consensus}} = \text{orth}\left(\sum_{i=1}^S \mathbf{V}_i\right)$$
This provides a single "average" loading matrix that summarizes the common structure.
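A hedged sketch of this computation via QR, assuming all blocks share the same feature dimension (illustrative only; column signs from QR are arbitrary):

# Orthonormalize the sum of block loadings to obtain the consensus
consensus_loadings <- function(V_list) {
  V_sum <- Reduce(`+`, V_list)
  qr.Q(qr(V_sum))   # columns span the same subspace as V_sum
}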
Input types
The function supports three input types via S3 methods:
- list
A list of numeric matrices, each representing a data block.
- multiblock
A `multiblock` object from the `multivarious` package.
- multidesign
A `multidesign` object where each subject is treated as a separate block.
Examples
if (FALSE) { # \dontrun{
# Example 1: Basic usage with simulated data
set.seed(123)
data_list <- lapply(1:3, function(i) {
matrix(rnorm(100), 10, 10)
})
res <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="projection",
optimizer="adam", max_iter=50, verbose=TRUE)
print(res)
# Access block-specific loadings
V_list <- attr(res, "V_list")
print(V_list[[1]]) # Loadings for first block
# Plot convergence
obj_vals <- attr(res, "obj_values")
plot(obj_vals, type='b', xlab='Iteration', ylab='Objective',
main='Convergence of Penalized MFA')
# Example 2: With consensus and custom preprocessing
library(multivarious)
res2 <- penalized_mfa(data_list, ncomp=3, lambda=2,
preproc=standardize(), # Center and scale
compute_consensus=TRUE,
verbose=FALSE)
consensus_loadings <- attr(res2, "consensus")
# Example 3: Comparing penalty methods
res_proj <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="projection")
res_mean <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="global_mean")
res_pair <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="pairwise")
# Example 4: Lambda selection via objective values
lambdas <- c(0, 0.1, 0.5, 1, 2, 5, 10)
results <- lapply(lambdas, function(lam) {
fit <- penalized_mfa(data_list, ncomp=2, lambda=lam, verbose=FALSE)
list(lambda=lam, final_obj=tail(attr(fit, "obj_values"), 1))
})
# Example 5: Using with multiblock object
mb <- multiblock(data_list)
res_mb <- penalized_mfa(mb, ncomp=2, lambda=1)
# Example 6: Different preprocessors per block
preproc_list <- list(
center(),
standardize(),
pass() # No preprocessing for third block
)
res_custom <- penalized_mfa(data_list, ncomp=2, lambda=1,
preproc=preproc_list)
# Example 7: Gradient descent instead of Adam
res_gd <- penalized_mfa(data_list, ncomp=2, lambda=1,
optimizer="gradient",
learning_rate=0.001,
max_iter=100)
} # }