Low-rank Alignment — lowrank_align • manifoldalign

Performs low-rank alignment using eigenvalue decomposition. Balances low-rank structure with similarity-based constraints. Supports semi-supervised learning with missing labels.

Usage

lowrank_align(data, y, ...)

# S3 method for class 'hyperdesign'
lowrank_align(
  data,
  y,
  preproc = center(),
  ncomp = 2,
  simfun,
  mu = 0.5,
  lambda = NULL,
  scale_M = FALSE,
  n_cores = NULL,
  sv_thresh = 1,
  solver = c("explicit", "operator"),
  ...
)

Arguments

data: Input data object
y: Variable name for labels (unquoted). Can contain NA values for unlabeled samples in semi-supervised learning scenarios.
...: Additional arguments passed to specific methods
preproc: Preprocessing function (default: center())
ncomp: Number of components to extract (default: 2)
simfun: Function to compute similarity matrix from labels. Should handle NA labels gracefully (e.g., created with createSimFun())
mu: Balance parameter between low-rank (μ=0) and similarity (μ=1) terms (default: 0.5)
lambda: Regularization parameter for glmnet. If NULL (default), uses cross-validation to select optimal lambda via cv.glmnet. If specified, uses the provided value directly.
scale_M: Logical. If TRUE, scales the M matrix so that its eigenvalue magnitude is similar to that of L. This can improve numerical conditioning but changes the mathematical objective. When enabled, consider adjusting the mu parameter accordingly (default: FALSE)
n_cores: Number of threads for PRIMME eigenvalue computations. If NULL (default), uses system default. Set to 1 for reproducible results across systems.
sv_thresh: Singular value threshold used when forming R (default: 1). Values at or below the threshold are discarded, matching Eq. 11 of the original Low-Rank Alignment paper.
solver: Eigen solver backend. `"explicit"` (default) forms the dense matrices and uses PRIMME, matching the original implementation. `"operator"` keeps low-rank factors and uses RSpectra with a matrix-vector operator to reduce memory and runtime.

Value

The return value depends on the specific method. For hyperdesign objects, returns a multiblock_biprojector object containing alignment results, eigenvectors, preprocessing information, and metadata.

Details

Low-rank alignment is built around the matrix Z = (1-μ) * M + 2μ * L, where M captures low-rank structure and L is the graph Laplacian of the similarity matrix. The parameter μ trades off preserving low-rank structure (μ=0, only M contributes) against enforcing similarity constraints (μ=1, only L contributes).
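The weighting in Z can be sketched in base R on toy matrices. This is an illustration only, not the package internals: `combine_terms`, the random `M`, and the toy similarity matrix `C` are hypothetical stand-ins, and taking the eigenvectors with the smallest eigenvalues as the embedding is an assumption borrowed from standard spectral embedding practice.

```r
# Toy illustration of Z = (1 - mu) * M + 2 * mu * L (not the package internals).
set.seed(1)
n <- 6

# Stand-in for the low-rank term: any symmetric positive semi-definite matrix.
A <- matrix(rnorm(n * n), n, n)
M <- crossprod(A)

# Graph Laplacian L = D - C from a symmetric, non-negative similarity matrix C.
C <- abs(cor(matrix(rnorm(n * n), n, n)))
diag(C) <- 0
L <- diag(rowSums(C)) - C

# Hypothetical helper: weight the two terms by mu.
combine_terms <- function(M, L, mu) (1 - mu) * M + 2 * mu * L

Z <- combine_terms(M, L, mu = 0.5)

# eigen() returns eigenvalues in decreasing order, so the last columns
# hold the eigenvectors with the smallest eigenvalues (assumed embedding).
eig <- eigen(Z, symmetric = TRUE)
ncomp <- 2
embedding <- eig$vectors[, rev(seq_len(n))[seq_len(ncomp)], drop = FALSE]
```

At the endpoints, `combine_terms(M, L, 0)` returns M unchanged and `combine_terms(M, L, 1)` returns 2 * L, which is the sense in which μ interpolates between the two terms.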

**Semi-supervised Learning Support:** The algorithm handles NA labels gracefully. Unlabeled samples:

- Still contribute to the low-rank structure term M through their data
- Do not participate in the similarity constraints (L term)
- Receive coordinates in the joint embedding space
- Create isolated nodes that produce zero eigenvalues (automatically skipped)
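The effect of NA labels on the Laplacian term can be reproduced on a small example. The sketch below is a hedged illustration: the similarity rule `exp(-abs(a - b))` and the zeroing of NA pairs are assumptions standing in for whatever `simfun` actually computes.

```r
# Toy labels; NA marks unlabeled samples.
labels <- c(1, 2, NA, 1, NA, 2)
n <- length(labels)

# Hypothetical NA-aware similarity: positive for labeled pairs,
# exactly 0 whenever either label is NA.
C <- outer(labels, labels, function(a, b) {
  ifelse(is.na(a) | is.na(b), 0, exp(-abs(a - b)))
})
diag(C) <- 0

# Degree vector and graph Laplacian L = D - C.
deg <- rowSums(C)
L <- diag(deg) - C

# Unlabeled samples have zero degree, so their rows/columns of L are all zero.
isolated <- which(deg == 0)

# Count numerically zero eigenvalues of L.
vals <- eigen(L, symmetric = TRUE, only.values = TRUE)$values
n_zero <- sum(abs(vals) < 1e-8)
```

Here `isolated` picks out samples 3 and 5, and L has three zero eigenvalues: one for the connected group of labeled samples and one for each isolated node. These are the kind of zero modes that the text above describes as automatically skipped.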

The scale_M parameter controls whether to apply eigenvalue-based scaling:

- scale_M = FALSE (default): Uses original formulation Z = (1-μ) * M + 2μ * L
- scale_M = TRUE: Applies scaling M := M * (λ₁(L)/λ₁(M)), changing the objective

When scale_M = TRUE, the mu parameter no longer has its original mathematical meaning for balancing the two terms, as the relative scales have been artificially adjusted.
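A minimal sketch of the rescaling M := M * (λ₁(L)/λ₁(M)) follows, assuming λ₁ denotes the largest eigenvalue (the text does not say so explicitly). The helpers `top_eig` and `scale_to_match` are hypothetical names used only for this illustration.

```r
set.seed(2)
n <- 5

# M on a deliberately large scale, L from a toy similarity matrix.
M <- crossprod(matrix(rnorm(n * n), n, n)) * 100
C <- abs(cor(matrix(rnorm(n * n), n, n)))
diag(C) <- 0
L <- diag(rowSums(C)) - C

# Largest eigenvalue of a symmetric matrix (assumed meaning of lambda_1).
top_eig <- function(X) {
  max(eigen(X, symmetric = TRUE, only.values = TRUE)$values)
}

# Rescale M so that its largest eigenvalue equals that of L.
scale_to_match <- function(M, L) M * (top_eig(L) / top_eig(M))

M_scaled <- scale_to_match(M, L)
ratio_before <- top_eig(M) / top_eig(L)
ratio_after <- top_eig(M_scaled) / top_eig(L)
```

After scaling, `ratio_after` equals 1 up to floating-point error, whereas `ratio_before` reflects the original scale mismatch. This is why a given mu weights the two terms differently once scale_M = TRUE.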

For reproducibility across different systems, set n_cores = 1 to ensure deterministic results from PRIMME eigenvalue computations.

**Handling NA Labels:** Samples with NA labels are supported through the following mechanism:

- They contribute to the low-rank reconstruction term M = (I-R)ᵀ(I-R)
- They do not participate in similarity constraints (zero rows/columns in C)
- They create isolated nodes with zero degree, producing zero eigenvalues
- The algorithm automatically detects and skips these zero modes
- Final embedding includes coordinates for all samples (labeled and unlabeled)
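The reconstruction term M = (I-R)ᵀ(I-R) depends only on the data, which is why unlabeled samples still contribute to it. The sketch below assumes the closed form commonly used for low-rank representation, R = U₁ (I - S₁⁻²) U₁ᵀ, built from the singular vectors on the sample side whose singular values exceed `sv_thresh`. Whether `lowrank_align` forms R exactly this way is not stated here; `low_rank_R` is a hypothetical helper.

```r
set.seed(3)
n <- 8                                   # samples (rows)
X <- matrix(rnorm(n * 5), n, 5)          # toy data block, samples x features

# Hypothetical helper: sample-by-sample reconstruction matrix R.
low_rank_R <- function(X, sv_thresh = 1) {
  s <- svd(X)
  keep <- s$d > sv_thresh                # values at or below the threshold are discarded
  U1 <- s$u[, keep, drop = FALSE]        # sample-side singular vectors
  w <- 1 - 1 / s$d[keep]^2               # assumed shrinkage weights
  U1 %*% diag(w, nrow = length(w)) %*% t(U1)
}

R <- low_rank_R(X, sv_thresh = 1)

# M = (I - R)^T (I - R); no labels are involved at any point.
M <- crossprod(diag(n) - R)
M_eigs <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
```

Both R and M are n x n, with one row and column per sample regardless of whether that sample has a label. M is symmetric positive semi-definite by construction.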

Examples

# \donttest{
# Example with hyperdesign data
library(multidesign)

# Create synthetic data
set.seed(123)
d1 <- multidesign(matrix(rnorm(10*20), 10, 20), 
                  data.frame(y=1:10, subject=1, run=rep(1:5, 2)))
d2 <- multidesign(matrix(rnorm(10*20), 10, 20), 
                  data.frame(y=1:10, subject=2, run=rep(1:5, 2)))
d3 <- multidesign(matrix(rnorm(10*20), 10, 20), 
                  data.frame(y=1:10, subject=3, run=rep(1:5, 2)))

# Create similarity function (NA-tolerant)
# S is a symmetric 10 x 10 matrix with entries in [0, 1]
S <- matrix(runif(10*10), 10, 10)
S <- abs(cor(S))
row.names(S) <- colnames(S) <- 1:10
simfun <- createSimFun(S)  # Handles NA labels automatically

# Create hyperdesign and run alignment
hd <- hyperdesign(list(d1, d2, d3))
result <- lowrank_align(hd, y, simfun=simfun)

# Semi-supervised learning with missing labels
d1_semi <- d1
d1_semi$design$y[1:3] <- NA  # Mark some samples as unlabeled
d2_semi <- d2
d2_semi$design$y[1:2] <- NA
hd_semi <- hyperdesign(list(d1_semi, d2_semi, d3))
result_semi <- lowrank_align(hd_semi, y, simfun=simfun)
#> Semi-supervised low-rank alignment: 25 labeled samples, 5 unlabeled samples
#> Detected 5 isolated nodes (unlabeled samples). Will skip corresponding zero eigenvalue modes.
# }