Graph-Anchored Multiple Factor Analysis — graph_anchored

`graph_anchored_mfa()` extends [anchored_mfa()] to settings where auxiliary blocks do not share aligned columns and borrowing across blocks is induced by sparse graphs over both auxiliary features and anchor rows. Rows of each auxiliary block are linked to a common anchor matrix `Y` through `row_index`, while graph Laplacian penalties can encourage both similar auxiliary features across blocks and similar anchor rows in `Y` to share latent structure.

Missing domains are handled by omission: each observed subject-domain pair is treated as one auxiliary block. This supports subjects with only `D1`, subjects with `D1` and `D2`, and mixtures of observed-domain patterns.

Usage

graph_anchored_mfa(
  Y,
  X,
  row_index,
  block_info = NULL,
  preproc = multivarious::center(),
  ncomp = 2,
  normalization = c("MFA", "None", "custom"),
  alpha = NULL,
  score_constraint = c("none", "orthonormal"),
  feature_graph = NULL,
  graph_lambda = 0,
  graph_form = c("laplacian", "adjacency", "normalized_laplacian"),
  score_graph = NULL,
  score_graph_lambda = 0,
  score_graph_form = c("laplacian", "adjacency", "normalized_laplacian"),
  score_graph_k = 10,
  score_graph_weight_mode = c("heat", "binary"),
  score_graph_sigma = NULL,
  max_iter = 50,
  tol = 1e-06,
  ridge = 1e-08,
  verbose = FALSE,
  use_future = FALSE,
  ...
)

Arguments

Y: Numeric matrix/data.frame (`N × q`) serving as the anchored target space.
X: Auxiliary blocks. Either a flat named list of matrices/data frames, or a nested list `X[[subject]][[domain]]`.
row_index: A structure mirroring `X`. Each vector maps rows of the corresponding auxiliary block to rows of `Y`.
block_info: Optional data frame describing flattened blocks. If supplied, it must have one row per flattened block. Recommended columns are `block`, `subject`, and `domain`.
preproc: A `multivarious` preprocessing pipeline (a `pre_processor` or `prepper`) or a list of them. If a list, it must have length `1 + length(flattened_X)` and will be applied to `c(list(Y), flattened_X)`.
ncomp: Integer number of latent components.
normalization: Block weighting scheme. `"MFA"` uses inverse squared first singular values; `"None"` uses uniform weights; `"custom"` uses `alpha`.
alpha: Optional numeric vector of per-block weights. When `normalization = "custom"`, it must have length `1 + length(flattened_X)`, with the first weight corresponding to `Y`.
score_constraint: Identification strategy for the anchored score matrix. `"none"` uses the historical unconstrained update followed by QR normalization inside each ALS iteration. `"orthonormal"` enforces `S transpose S = I` with a constrained majorization/polar update.
feature_graph: Feature-graph specification; see Details.
graph_lambda: Non-negative scalar controlling graph-penalty strength.
graph_form: Interpretation of `feature_graph` when it is matrix-like, or the Laplacian construction used for edge-based inputs.
score_graph: Optional score-graph specification; see Details.
score_graph_lambda: Non-negative scalar controlling row/score-graph smoothing strength.
score_graph_form: Interpretation of `score_graph` when it is matrix-like, or the Laplacian construction used for edge-based inputs.
score_graph_k: Integer number of neighbors used when `score_graph = "knn"`.
score_graph_weight_mode: Weighting applied when `score_graph = "knn"`. `"heat"` uses a Gaussian similarity kernel on neighbor distances; `"binary"` assigns weight 1 to every retained neighbor edge.
score_graph_sigma: Optional positive bandwidth used when `score_graph = "knn"` and `score_graph_weight_mode = "heat"`. If `NULL`, a robust value is estimated from the k-th neighbor distances.
max_iter: Maximum ALS iterations.
tol: Relative convergence tolerance on the objective.
ridge: Non-negative ridge stabilization applied to loading and score updates.
verbose: Logical; if `TRUE`, prints iteration diagnostics.
use_future: Logical; if `TRUE`, block-wise computations that do not depend on one another are performed via `furrr::future_map()` when available.
...: Unused (reserved for future extensions).

Value

An object inheriting from `multivarious::multiblock_biprojector` with additional classes `graph_anchored_mfa`, `anchored_mfa`, and `linked_mfa`. The object contains anchored scores in `s`, auxiliary loadings in `V_list`, anchor loadings in `B`, block metadata in `block_info`, and graph metadata in `graph_laplacian`, `graph_lambda`, `score_graph_laplacian`, and `score_graph_lambda`.

Details

## Model The fitted model has the form: $$Y \approx S B^\top$$ $$X_k \approx S[\mathrm{idx}_k,] V_k^\top$$ where `S` is `N × ncomp`, `B` is `q × ncomp`, and each `V_k` is `p_k × ncomp`.

Let `V` denote the row-wise concatenation of the auxiliary loading matrices `V_k`, let `L_G` be a feature-graph Laplacian over all auxiliary features, and let `L_S` be a score-graph Laplacian over anchor rows of `Y`. The estimator minimizes the anchored-MFA reconstruction loss plus the graph smoothness terms $$\lambda_G \mathrm{tr}(V^\top L_G V) + \lambda_S \mathrm{tr}(S^\top L_S S)$$ and ridge penalties used to identify and stabilize the fitted factors, $$\mathrm{ridge} \left(\|S\|_F^2 + \|B\|_F^2 + \sum_k \|V_k\|_F^2\right).$$ The anchored score matrix can be fit either with the historical unconstrained/QR update (`score_constraint = "none"`) or with an explicit orthonormal constraint (`score_constraint = "orthonormal"`). The score-graph term is equivalent to $$\frac{\lambda_S}{2} \sum_{i,j} w_{ij} \|S_{i\cdot} - S_{j\cdot}\|_2^2$$ for adjacency weights $w_{ij}$, so nearby rows in `Y` are encouraged to have nearby latent scores.

When both `graph_lambda = 0` (or `feature_graph = NULL`) and `score_graph_lambda = 0` (or `score_graph = NULL`), the method reduces to [anchored_mfa()] up to numerical tolerance.

## Input organization `X` and `row_index` may be supplied either as: * flat lists of observed blocks, or * nested subject/domain lists, e.g. `X[[subject]][[domain]]`.

Nested input is flattened internally into one block per observed subject-domain pair. The resulting mapping is recorded in `block_info`.

## Feature graph `feature_graph` may be: * `NULL` (no graph penalty), * `"colnames"` to connect identical auxiliary column names across blocks, * a data frame with columns `block1`, `feature1`, `block2`, `feature2` and optional `weight`, or * a square sparse/dense matrix interpreted according to `graph_form`.

## Score graph `score_graph` may be: * `NULL` (no score penalty), * `"knn"` to construct a symmetric k-nearest-neighbor graph on preprocessed rows of `Y`, * a data frame with columns `row1`, `row2`, and optional `weight`, or * a square sparse/dense matrix interpreted according to `score_graph_form`.

For more control over row-graph construction, an external graph builder such as the `adjoin` package can be used to create a weighted kNN adjacency or Laplacian from `Y`, then supplied here as `score_graph`.

Examples

set.seed(1)
N <- 30
Y <- matrix(rnorm(N * 3), N, 3)
X1 <- matrix(rnorm(15 * 5), 15, 5)
X2 <- matrix(rnorm(15 * 4), 15, 4)
idx <- list(X1 = sample.int(N, 15), X2 = sample.int(N, 15))

fit <- graph_anchored_mfa(
  Y = Y,
  X = list(X1 = X1, X2 = X2),
  row_index = idx,
  ncomp = 2,
  score_graph = "knn",
  score_graph_k = 5,
  score_graph_weight_mode = "heat",
  score_graph_lambda = 1
)
#> Applying the same preprocessor definition independently to each block.