This function implements a penalized Multiple Factor Analysis decomposition that encourages similarity among block-specific loading matrices. The method uses block-coordinate descent (BCD) with Riemannian optimization on the Stiefel manifold to estimate orthonormal loading matrices for each data block while applying a regularization penalty that promotes structural similarity across blocks.
Usage
penalized_mfa(data, ...)
# S3 method for class 'list'
penalized_mfa(
data,
ncomp = 2,
lambda = 1,
penalty_method = c("projection", "pairwise", "global_mean"),
max_iter = 10,
nsteps_inner = 5,
learning_rate = 0.01,
optimizer = c("adam", "gradient"),
preproc = multivarious::center(),
beta1 = 0.9,
beta2 = 0.999,
adam_epsilon = 1e-08,
tol_obj = 1e-07,
tol_inner = NULL,
compute_consensus = FALSE,
verbose = FALSE,
...
)
# S3 method for class 'multiblock'
penalized_mfa(data, ...)
# S3 method for class 'multidesign'
penalized_mfa(data, subject, ...)
Arguments
- data
A list of matrices, a `multiblock` object, or a `multidesign` object. For lists, each element should be a numeric matrix with the same number of rows (observations). Columns represent features and can differ across blocks.
- ...
Additional arguments passed to methods. Currently unused but included for S3 method consistency.
- ncomp
Integer number of latent components to extract (default: 2). This is automatically capped at the minimum number of columns across all blocks after preprocessing. Must be at least 1.
- lambda
Non-negative scalar controlling the strength of the penalty (default: 1). Larger values encourage more similarity among block loadings. When `lambda = 0`, the method reduces to independent PCA on each block. Typical range: 0 to 10.
- penalty_method
Character string specifying the penalty type (default: "projection"). Options are:
"projection": Rotation-invariant penalty on projection matrices (recommended)"pairwise": Pairwise Euclidean distances between loading matrices"global_mean": Distance from mean loading matrix
- max_iter
Maximum number of outer block-coordinate descent iterations (default: 10). Typical range: 10 to 100. More iterations allow better convergence but increase computation time.
- nsteps_inner
Number of gradient update steps per block in each outer iteration (default: 5). Higher values perform more thorough optimization of each block before moving to the next. Typical range: 1 to 20.
- learning_rate
Step size for the optimizer (default: 0.01). Controls how far to move in the gradient direction. Too large may cause divergence; too small slows convergence. Typical range for gradient descent: 0.001 to 0.1. Adam is less sensitive to this parameter.
- optimizer
Character string specifying the optimization algorithm (default: "adam"). Options are:
"adam": Adaptive Moment Estimation with momentum and adaptive learning rates"gradient": Standard gradient descent with fixed learning rate
Adam is generally more robust and converges faster with default settings.
- preproc
A preprocessing specification (default: `multivarious::center()`). Can be:
- A single `pre_processor` object applied to all blocks
- A list of `pre_processor` objects (one per block)
- `NULL` for no preprocessing (identity transformation)
Common options: `center()`, `standardize()`, `pass()` (no-op).
- beta1
Adam hyperparameter for first moment decay (default: 0.9). Controls the exponential decay rate for gradient moving average. Typical range: 0.8 to 0.95. Only used when `optimizer = "adam"`.
- beta2
Adam hyperparameter for second moment decay (default: 0.999). Controls the exponential decay rate for squared gradient moving average. Typical range: 0.99 to 0.9999. Only used when `optimizer = "adam"`.
- adam_epsilon
Small constant added to denominator in Adam for numerical stability (default: 1e-8). Prevents division by zero. Only used when `optimizer = "adam"`.
- tol_obj
Numeric tolerance for outer loop convergence based on relative change in objective function (default: 1e-7). Smaller values require more precise convergence. Typical range: 1e-8 to 1e-5.
- tol_inner
Optional numeric tolerance for inner loop convergence based on Frobenius norm of change in loading matrix (default: NULL, no early stopping). When specified, inner loop stops if \(\|\mathbf{V}_i^{\text{new}} - \mathbf{V}_i\|_F < \text{tol_inner}\).
- compute_consensus
Logical indicating whether to compute a consensus loading matrix (default: FALSE). Only computed when all blocks have the same number of features after preprocessing. The consensus is the orthonormalized sum of block-specific loadings.
- verbose
Logical indicating whether to print iteration progress (default: FALSE). When TRUE, displays objective values and relative changes at each iteration using the `cli` package.
- subject
Required for `multidesign` method: the name (unquoted) of the subject variable used to split data into blocks. Each subject's data becomes a separate block in the analysis.
Value
A `multiblock_projector` object of class `penalized_mfa` with the following components:
- v
Concatenated loading matrix (total_features × ncomp) formed by vertically stacking all block-specific loading matrices.
- preproc
A `concat_pre_processor` object containing the preprocessing transformations applied to each block. Can be used to transform new data.
- block_indices
A named list indicating which rows of `v` correspond to each block. Each element is an integer vector of row indices.
Additional information is stored as attributes:
- V_list
List of length S containing the final orthonormal loading matrices for each block. Each element is a (p_i × ncomp) matrix.
- obj_values
Numeric vector of objective function values at each iteration, including the initial value. Length is (iterations_run + 1).
- consensus
The consensus loading matrix (p_post × ncomp) if `compute_consensus = TRUE`, otherwise `NULL`. This is the orthonormalized sum of block loadings over blocks with matching feature dimension.
- lambda
The penalty strength used in the optimization.
- penalty_method
Character string indicating the penalty type used. May be "none" if the penalty was disabled due to inconsistent block dimensions.
- iterations_run
Integer indicating how many outer iterations were completed before convergence or reaching `max_iter`.
Details
## Optimization Problem
For \(S\) data blocks \(\mathbf{X}_i \in \mathbb{R}^{n \times p_i}\) (where \(i = 1, \ldots, S\)), the method estimates loading matrices \(\mathbf{V}_i \in \mathbb{R}^{p_i \times k}\) with orthonormal columns (\(\mathbf{V}_i^\top \mathbf{V}_i = \mathbf{I}_k\)) by minimizing:
$$ \min_{\{\mathbf{V}_i\}} \sum_{i=1}^S \|\mathbf{X}_i - \mathbf{X}_i \mathbf{V}_i \mathbf{V}_i^\top\|_F^2 \;+\; \lambda \,\text{Penalty}(\{\mathbf{V}_i\}) $$
The first term is the reconstruction error (sum of squared residuals) for each block, and the second term is a penalty that encourages similarity among the loading matrices.
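As a concrete illustration, the following R sketch evaluates the reconstruction term for a list of blocks `X_list` and loadings `V_list` with orthonormal columns. The helper name is illustrative, not a function exported by the package:

# Minimal sketch of the reconstruction term, assuming each V_list[[i]]
# has orthonormal columns (V'V = I); illustrative only.
reconstruction_error <- function(X_list, V_list) {
  sum(mapply(function(X, V) {
    R <- X - X %*% V %*% t(V)   # residual after rank-k projection
    sum(R^2)                    # squared Frobenius norm
  }, X_list, V_list))
}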
## Penalty Options
Three penalty formulations are available; a small sketch of each appears after this list:
- Projection (default, rotation-invariant):
$$\sum_{i=1}^S \|\mathbf{P}_i - \bar{\mathbf{P}}\|_F^2$$ where \(\mathbf{P}_i = \mathbf{V}_i \mathbf{V}_i^\top\) is the projection matrix for block \(i\) and \(\bar{\mathbf{P}} = \frac{1}{S}\sum_{i=1}^S \mathbf{P}_i\). This penalty is invariant to rotations of the loading matrices and is recommended when the orientation of the latent space is arbitrary.
- Global Mean (not rotation-invariant):
$$\sum_{i=1}^S \|\mathbf{V}_i - \bar{\mathbf{V}}\|_F^2$$ where \(\bar{\mathbf{V}} = \frac{1}{S}\sum_{i=1}^S \mathbf{V}_i\). This directly penalizes Euclidean distance from the mean loading matrix. Computationally simpler than pairwise but not rotation-invariant.
- Pairwise (not rotation-invariant):
$$\sum_{i < j} \|\mathbf{V}_i - \mathbf{V}_j\|_F^2$$ This penalizes all pairwise Euclidean distances between loading matrices. Equivalent to global mean up to a scaling factor but conceptually different.
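For concreteness, here are hedged R sketches of the three penalties for a list of loading matrices `V_list` (all of equal dimension, with orthonormal columns). The function names are illustrative, not part of the package:

# Projection penalty: sum_i ||P_i - P_bar||_F^2 with P_i = V_i V_i'
penalty_projection <- function(V_list) {
  P_list <- lapply(V_list, tcrossprod)            # P_i = V_i V_i'
  P_bar  <- Reduce(`+`, P_list) / length(P_list)  # mean projection matrix
  sum(vapply(P_list, function(P) sum((P - P_bar)^2), numeric(1)))
}
# Global-mean penalty: sum_i ||V_i - V_bar||_F^2
penalty_global_mean <- function(V_list) {
  V_bar <- Reduce(`+`, V_list) / length(V_list)
  sum(vapply(V_list, function(V) sum((V - V_bar)^2), numeric(1)))
}
# Pairwise penalty: sum over i < j of ||V_i - V_j||_F^2
penalty_pairwise <- function(V_list) {
  S <- length(V_list); total <- 0
  for (i in seq_len(S - 1)) for (j in (i + 1):S)
    total <- total + sum((V_list[[i]] - V_list[[j]])^2)
  total
}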
## Optimization Algorithm
The algorithm uses block-coordinate descent with Riemannian gradient descent on the Stiefel manifold:
1. **Outer loop** (`max_iter` iterations): cycles through all blocks.
2. **Inner loop** (`nsteps_inner` steps per block): updates each block's loading matrix.
3. **Gradient computation**: combines the reconstruction gradient and the penalty gradient.
4. **Tangent-space projection**: projects the gradient onto the tangent space of the Stiefel manifold.
5. **Update step**: applies either gradient descent or the Adam optimizer.
6. **Retraction**: re-orthonormalizes via QR decomposition to maintain the manifold constraint.
The Stiefel manifold constraint ensures \(\mathbf{V}_i^\top \mathbf{V}_i = \mathbf{I}_k\) at all iterations, which is critical for identifiability and interpretability.
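The following R sketch illustrates steps 3 to 6 for a single block, using the gradient of the simplified reconstruction term under the orthonormality constraint. It omits the penalty gradient and the Adam update, and is not the package's internal implementation:

# One Riemannian gradient step on the Stiefel manifold for one block,
# assuming V'V = I; names are illustrative, not package internals.
riemannian_step <- function(X, V, lr = 0.01) {
  # Euclidean gradient of -tr(V' X'X V), the reconstruction term
  # simplified under the orthonormality constraint
  G <- -2 * crossprod(X) %*% V
  # Tangent-space projection: Proj_V(G) = G - V * sym(V'G)
  VtG <- crossprod(V, G)
  G_tan <- G - V %*% ((VtG + t(VtG)) / 2)
  # Descent step followed by QR retraction back onto the manifold
  qr.Q(qr(V - lr * G_tan))
}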
## Convergence Criteria
The outer loop stops when the relative change in the objective function falls below `tol_obj`: $$\frac{|f^{(t+1)} - f^{(t)}|}{|f^{(t)}| + \epsilon} < \text{tol_obj}$$
Optionally, the inner loop can stop early if the Frobenius norm of the change in \(\mathbf{V}_i\) falls below `tol_inner`.
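A minimal sketch of the outer-loop stopping rule (the exact guard constant used by the package may differ):

# Relative-change test on the objective; eps guards against division by zero
converged <- function(f_new, f_old, tol_obj = 1e-7, eps = 1e-12) {
  abs(f_new - f_old) / (abs(f_old) + eps) < tol_obj
}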
## Preprocessing
Data preprocessing is handled via the `multivarious` package. Common preprocessing steps include centering (default) and scaling. Each block can have its own preprocessor, or a single preprocessor can be applied to all blocks. Preprocessing is "learned" from the input data and stored in the result object for later use on new data.
## Consensus Loading Matrix
When `compute_consensus = TRUE` and blocks have equal feature dimensions, a consensus loading matrix is computed by orthonormalizing the sum of block-specific loadings: $$\mathbf{V}_{\text{consensus}} = \text{orth}\left(\sum_{i=1}^S \mathbf{V}_i\right)$$
This provides a single "average" loading matrix that summarizes the common structure.
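A hedged sketch of this computation via QR, assuming all blocks share the same feature dimension (illustrative only; column signs from QR are arbitrary):

# Orthonormalize the sum of block loadings to obtain the consensus
consensus_loadings <- function(V_list) {
  V_sum <- Reduce(`+`, V_list)
  qr.Q(qr(V_sum))   # columns span the same subspace as V_sum
}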
Input types
The function supports three input types via S3 methods:
- list
A list of numeric matrices, each representing a data block.
- multiblock
A `multiblock` object from the `multivarious` package.
- multidesign
A `multidesign` object where each subject is treated as a separate block.
Examples
if (FALSE) { # \dontrun{
# Example 1: Basic usage with simulated data
set.seed(123)
data_list <- lapply(1:3, function(i) {
matrix(rnorm(100), 10, 10)
})
res <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="projection",
optimizer="adam", max_iter=50, verbose=TRUE)
print(res)
# Access block-specific loadings
V_list <- attr(res, "V_list")
print(V_list[[1]]) # Loadings for first block
# Plot convergence
obj_vals <- attr(res, "obj_values")
plot(obj_vals, type='b', xlab='Iteration', ylab='Objective',
main='Convergence of Penalized MFA')
# Example 2: With consensus and custom preprocessing
library(multivarious)
res2 <- penalized_mfa(data_list, ncomp=3, lambda=2,
preproc=standardize(), # Center and scale
compute_consensus=TRUE,
verbose=FALSE)
consensus_loadings <- attr(res2, "consensus")
# Example 3: Comparing penalty methods
res_proj <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="projection")
res_mean <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="global_mean")
res_pair <- penalized_mfa(data_list, ncomp=2, lambda=1, penalty_method="pairwise")
# Example 4: Lambda selection via objective values
lambdas <- c(0, 0.1, 0.5, 1, 2, 5, 10)
results <- lapply(lambdas, function(lam) {
fit <- penalized_mfa(data_list, ncomp=2, lambda=lam, verbose=FALSE)
list(lambda=lam, final_obj=tail(attr(fit, "obj_values"), 1))
})
# Example 5: Using with multiblock object
mb <- multiblock(data_list)
res_mb <- penalized_mfa(mb, ncomp=2, lambda=1)
# Example 6: Different preprocessors per block
preproc_list <- list(
center(),
standardize(),
pass() # No preprocessing for third block
)
res_custom <- penalized_mfa(data_list, ncomp=2, lambda=1,
preproc=preproc_list)
# Example 7: Gradient descent instead of Adam
res_gd <- penalized_mfa(data_list, ncomp=2, lambda=1,
optimizer="gradient",
learning_rate=0.001,
max_iter=100)
} # }