Performs Barycentric Multiple Factor Analysis (BaMFA) using an alternating
optimization approach based on a two-level factor model.
It decomposes multi-block data (e.g., multiple subjects) into a shared global
subspace (G) and block-specific subspaces (B_i).
Usage
bamfa(
data,
k_g = 2,
k_l = 2,
niter = 10,
preproc = multivarious::center(),
lambda_l = 0,
tol = 1e-05,
subject = NULL,
...
)
# Default S3 method
bamfa(
data,
k_g = 2,
k_l = 2,
niter = 10,
preproc = multivarious::center(),
lambda_l = 0,
tol = 1e-05,
...
)
# S3 method for class 'list'
bamfa(
data,
k_g = 2,
k_l = 2,
niter = 10,
preproc = multivarious::center(),
lambda_l = 0,
tol = 1e-05,
...
)
# S3 method for class 'multiblock'
bamfa(
data,
k_g = 2,
k_l = 2,
niter = 10,
preproc = multivarious::center(),
lambda_l = 0,
tol = 1e-05,
...
)
# S3 method for class 'multidesign'
bamfa(
data,
k_g = 2,
k_l = 2,
niter = 10,
preproc = multivarious::center(),
lambda_l = 0,
tol = 1e-05,
subject,
...
)Arguments
- data
A
listof matrices (each a block, e.g., subject), amultiblockobject, or amultidesignobject. If a list or multiblock, all blocks must have the same number of rows (observations, e.g., conditions). If a multidesign, thesubjectparameter must be specified to indicate how to split the data.- k_g
Integer: Number of global components to extract (shared across blocks).
- k_l
Integer: Number of local components to extract (specific to each block).
- niter
Integer: Maximum number of iterations (default: 10).
- preproc
A preprocessing pipeline object from
multivarious(default:multivarious::center()).- lambda_l
Numeric: Non-negative ridge penalty applied when estimating the local scores
U_iusing the internalls_ridgefunction. Default: 0 (no penalty).- tol
Numeric: Convergence tolerance. Stops if the relative change in the objective function (Mean Squared Error) and/or the global loadings
Gis less thantol. Default: 1e-5.- subject
Optional: A variable name identifying the blocking/subject variable when using a multidesign object. Required only for the multidesign method.
- ...
Additional arguments (currently unused).
Value
A multiblock_projector object with class "bamfa".
The base projector stores the global loading matrix in v, the
concatenated preprocessor in preproc, and block mappings in
block_indices. Additional named elements passed via ... include:
B_list– block-specific local loading matrices.S_list– block-specific global score matrices.U_list– block-specific local score matrices.k_g– number of global components in the final model.k_l– requested number of local components.lambda_l– regularization parameter used for local scoresU_i.niter_actual– actual number of iterations performed.objective_trace– objective function value at each iteration.g_change_trace– relative change in global loadings at each iteration.data_names– names of the input blocks/subjects.proclist_fitted– list of fitted preprocessors for each block.
Details
The algorithm models each data block \(X_i\) as: $$X_i = S_i G^T + U_i B_i^T + E_i$$ where \(G\) represents shared global loadings, \(B_i\) represents block-specific local loadings, \(S_i\) and \(U_i\) are the corresponding scores, and \(E_i\) is noise. Loadings are constrained to be orthonormal (\(G^T G = I, B_i^T B_i = I\)).
The algorithm aims to minimize the total reconstruction error: $$\sum_{i=1}^{m} ||X_i - S_i G^T - U_i B_i^T||_F^2$$ using an iterative alternating optimization strategy (similar to Expectation-Maximization):
Preprocessing: Each block is preprocessed using the provided
preprocpipeline.Initialization: Initialize global loadings
G(via SVD on the mean block).Iterate (E-step like): For each block
i, holdingGfixed, update scoresS_i(global projection), calculate residuals, find the local basisB_ifrom residuals (via SVD), and estimate local scoresU_i(via projection, potentially regularized bylambda_l).Iterate (M-step like): Holding scores
S_ifixed, update the global loadingsGby performing SVD on an aggregated cross-product matrixsum(X_i^T S_i).Repeat steps 3-4 for
niteriterations or until convergence based ontol.
Caveats and Limitations
Model Choice: Assumes a linear factor model with orthogonal global and local components.
Initialization: Results can be sensitive to the initialization of
G.Local Minima: The alternating optimization algorithm may converge to a local minimum.
Interpretation: The separation into global and local components depends on the model fit and ranks (
k_g,k_l).Regularization (
lambda_l): Penalizes the squared Frobenius norm of the local scoresU_ivia ridge regression.Inference: The method does not provide p-values or confidence intervals.
Examples
# Generate example multi-block data (e.g., 3 subjects, 10 conditions, 50 features)
set.seed(123)
n_obs <- 10
n_features <- 50
n_subjects <- 3
data_list <- lapply(1:n_subjects, function(i) {
matrix(rnorm(n_obs * n_features), n_obs, n_features) +
matrix(rnorm(n_obs * 1, mean=i), n_obs, n_features) # Add subject offset
})
names(data_list) <- paste0("Subject_", 1:n_subjects)
# Run BaMFA with k_g=3 global, k_l=2 local components
result <- bamfa(data_list, k_g = 3, k_l = 2, niter = 10)
print(result)
#> Barycentric Multiple Factor Analysis (BaMFA)
#>
#> Model Structure:
#> Global components (k_g): 3
#> Local components (k_l requested): 2
#> Convergence after 10 iterations
#> Local score regularization (lambda_l): 0
#>
#> Block Information:
#> Block 1 ( Subject_1 ):
#> Features: 50
#> Local components retained: 2
#> Block 2 ( Subject_2 ):
#> Features: 50
#> Local components retained: 2
#> Block 3 ( Subject_3 ):
#> Features: 50
#> Local components retained: 2
#>
#> Final reconstruction error (per feature): 0.458139