Performs low-rank alignment using eigenvalue decomposition. Balances low-rank structure with similarity-based constraints. Supports semi-supervised learning with missing labels.
Arguments
- data
Input data object
- y
Variable name for labels (unquoted). Can contain NA values for unlabeled samples in semi-supervised learning scenarios.
- ...
Additional arguments passed to specific methods
- preproc
Preprocessing function (default: center())
- ncomp
Number of components to extract (default: 2)
- simfun
Function to compute the similarity matrix from labels. It should handle NA labels gracefully (e.g., a function created with createSimFun())
- mu
Balance parameter between low-rank (μ=0) and similarity (μ=1) terms (default: 0.5)
- lambda
Regularization parameter for glmnet. If NULL (default), uses cross-validation to select optimal lambda via cv.glmnet. If specified, uses the provided value directly.
- scale_M
Logical. If TRUE, scales the M matrix so that its eigenvalue magnitude is similar to that of L. This can improve numerical conditioning but changes the mathematical objective. When enabled, consider adjusting the mu parameter accordingly (default: FALSE)
- n_cores
Number of threads for PRIMME eigenvalue computations. If NULL (default), uses system default. Set to 1 for reproducible results across systems.
- sv_thresh
Singular value threshold used when forming R (default: 1). Values at or below the threshold are discarded, matching Eq. 11 of the original Low-Rank Alignment paper.
- solver
Eigen solver backend. `"explicit"` (default) forms the dense matrices and uses PRIMME, matching the original implementation. `"operator"` keeps low-rank factors and uses RSpectra with a matrix-vector operator to reduce memory and runtime.
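The role of sv_thresh can be illustrated with a small base R sketch. It assumes that Eq. 11 takes the closed form R = V1 (I - S1^-2) V1^T, where V1 and S1 hold the right singular vectors and singular values above the threshold, and that samples sit in the columns of X. The helper name lowrank_R is hypothetical and is not part of the package; how the package adapts the shrinkage factor for thresholds other than 1 is not stated here, so the sketch only changes which singular values are kept.

```r
# Hypothetical helper, not the package's implementation.
# Assumes samples are in the columns of X and the closed form
# R = V1 (I - S1^-2) V1^T, keeping singular values above sv_thresh.
lowrank_R <- function(X, sv_thresh = 1) {
  sv <- svd(X)
  keep <- sv$d > sv_thresh            # values at or below the threshold are discarded
  V1 <- sv$v[, keep, drop = FALSE]    # right singular vectors that survive
  s1 <- sv$d[keep]                    # matching singular values
  V1 %*% diag(1 - 1 / s1^2, nrow = length(s1)) %*% t(V1)
}

# Toy input with known singular values 2 and 0.5: only the first survives
X_toy <- diag(c(2, 0.5))
R_toy <- lowrank_R(X_toy)
```

With the default threshold, the singular value 0.5 is dropped and the surviving direction is shrunk by 1 - 1/2^2.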
Value
The return value depends on the specific method. For hyperdesign objects, returns a multiblock_biprojector object containing alignment results, eigenvectors, preprocessing information, and metadata.
Details
Low-rank alignment builds the matrix Z = (1-μ) * M + 2μ * L, where M captures the low-rank structure and L is the graph Laplacian of the similarity matrix, and derives the embedding from an eigendecomposition of Z. The balance parameter μ trades off preserving low-rank structure (μ=0) against enforcing similarity constraints (μ=1).
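The construction of Z can be sketched in base R on toy matrices. All object names below are hypothetical, the reconstruction matrix R is a random stand-in rather than a fitted one, and the sketch assumes the embedding is read from the eigenvectors with the smallest eigenvalues, as is common for Laplacian-type problems. It uses eigen() instead of PRIMME or RSpectra, so it illustrates the algebra only, not the package's solver.

```r
# Toy sketch in base R; names are hypothetical and not part of the package.
set.seed(1)
n <- 6
mu <- 0.5

# Stand-in for the reconstruction matrix R (symmetric here for simplicity)
A <- matrix(rnorm(n * n), n, n)
R <- (A + t(A)) / (4 * n)

# Low-rank term: M = (I - R)^T (I - R)
I_n <- diag(n)
M <- crossprod(I_n - R)

# Similarity matrix C (symmetric, zero diagonal) and graph Laplacian L = D - C
C <- matrix(runif(n * n), n, n)
C <- (C + t(C)) / 2
diag(C) <- 0
L <- diag(rowSums(C)) - C

# Combined matrix Z = (1 - mu) * M + 2 * mu * L
Z <- (1 - mu) * M + 2 * mu * L

# Z is symmetric, so eigen() can use its symmetric solver
eig <- eigen(Z, symmetric = TRUE)

# eigen() sorts eigenvalues in decreasing order; pick the ncomp smallest.
# Skipping of zero modes is omitted in this sketch.
ncomp <- 2
idx <- rev(seq_len(n))[seq_len(ncomp)]
embedding <- eig$vectors[, idx, drop = FALSE]
```

Setting mu to 0 leaves only the M term in Z, and setting it to 1 leaves only the Laplacian term, which mirrors the two extremes of the balance parameter.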
**Semi-supervised Learning Support:** The algorithm handles NA labels gracefully. Unlabeled samples:
- Still contribute to the low-rank structure term M through their data
- Do not participate in the similarity constraints (L term)
- Receive coordinates in the joint embedding space
- Create isolated nodes that produce zero eigenvalues (automatically skipped)
The scale_M parameter controls whether to apply eigenvalue-based scaling:
- scale_M = FALSE (default): Uses original formulation Z = (1-μ) * M + 2μ * L
- scale_M = TRUE: Applies scaling M := M * (λ₁(L)/λ₁(M)), changing the objective
When scale_M = TRUE, the mu parameter no longer has its original mathematical meaning for balancing the two terms, as the relative scales have been artificially adjusted.
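The effect of the scaling can be seen in a short base R sketch. It assumes that λ₁ denotes the largest eigenvalue of each matrix, which the text above does not state explicitly, and the helper scale_to_match is hypothetical rather than the package's code.

```r
# Hypothetical helper, not the package's implementation.
# Assumes lambda_1 is the largest eigenvalue of each symmetric matrix.
scale_to_match <- function(M, L) {
  lam_M <- eigen(M, symmetric = TRUE, only.values = TRUE)$values[1]
  lam_L <- eigen(L, symmetric = TRUE, only.values = TRUE)$values[1]
  M * (lam_L / lam_M)
}

# Toy matrices: M has a much larger spectrum than L
M <- diag(c(100, 10, 1))
L <- matrix(c( 1, -1,  0,
              -1,  2, -1,
               0, -1,  1), 3, 3)   # Laplacian of a 3-node path graph

M_scaled <- scale_to_match(M, L)

# Leading eigenvalues before and after scaling
lead_before <- eigen(M, symmetric = TRUE, only.values = TRUE)$values[1]
lead_after  <- eigen(M_scaled, symmetric = TRUE, only.values = TRUE)$values[1]
lead_L      <- eigen(L, symmetric = TRUE, only.values = TRUE)$values[1]
```

In this toy case M initially dominates L by a wide margin, so a given mu weights the two terms very differently before and after scaling.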
For reproducibility across different systems, set n_cores = 1 to ensure deterministic results from PRIMME eigenvalue computations.
**Handling NA Labels:** Samples with NA labels are supported through the following mechanism:
- They contribute to the low-rank reconstruction term M = (I-R)ᵀ(I-R)
- They do not participate in similarity constraints (zero rows/columns in C)
- They create isolated nodes with zero degree, producing zero eigenvalues
- The algorithm automatically detects and skips these zero modes
- Final embedding includes coordinates for all samples (labeled and unlabeled)
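The mechanism above can be illustrated in base R. The similarity rule used here (1 for matching labels, 0 otherwise, 0 whenever a label is NA) is a hypothetical stand-in for whatever createSimFun() produces, and the object names are illustrative only.

```r
# Toy labels with two unlabeled samples; not package code.
y <- c("a", "a", "b", NA, "b", NA)

# Hypothetical NA-tolerant similarity: 1 if both labels are present and equal,
# 0 otherwise (including whenever either label is NA).
C <- outer(y, y, function(u, v) as.numeric(!is.na(u) & !is.na(v) & u == v))
diag(C) <- 0

# Degrees and graph Laplacian L = D - C
degree <- rowSums(C)
L <- diag(degree) - C

# Unlabeled samples have zero degree, i.e. they are isolated nodes
isolated <- which(degree == 0)

# For an isolated node i, the indicator vector e_i satisfies L %*% e_i = 0,
# so it is an eigenvector of L with eigenvalue zero.
e_i <- as.numeric(seq_along(y) == isolated[1])
zero_mode <- L %*% e_i
```

Each isolated node contributes one such zero mode to L, which is why those modes need to be detected and skipped when reading off the embedding.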
Examples
# \donttest{
# Example with hyperdesign data
library(multidesign)
# Create synthetic data
set.seed(123)
d1 <- multidesign(matrix(rnorm(10*20), 10, 20),
                  data.frame(y=1:10, subject=1, run=rep(1:5, 2)))
d2 <- multidesign(matrix(rnorm(10*20), 10, 20),
                  data.frame(y=1:10, subject=2, run=rep(1:5, 2)))
d3 <- multidesign(matrix(rnorm(10*20), 10, 20),
                  data.frame(y=1:10, subject=3, run=rep(1:5, 2)))
# Create similarity function (NA-tolerant)
S <- matrix(runif(10*10), 10, 10)
S <- abs(cor(S))
row.names(S) <- colnames(S) <- 1:10
simfun <- createSimFun(S) # Handles NA labels automatically
# Create hyperdesign and run alignment
hd <- hyperdesign(list(d1, d2, d3))
result <- lowrank_align(hd, y, simfun=simfun)
# Semi-supervised learning with missing labels
d1_semi <- d1
d1_semi$design$y[1:3] <- NA # Mark some samples as unlabeled
d2_semi <- d2
d2_semi$design$y[1:2] <- NA
hd_semi <- hyperdesign(list(d1_semi, d2_semi, d3))
result_semi <- lowrank_align(hd_semi, y, simfun=simfun)
#> Semi-supervised low-rank alignment: 25 labeled samples, 5 unlabeled samples
#> Detected 5 isolated nodes (unlabeled samples). Will skip corresponding zero eigenvalue modes.
# }