`graph_anchored_mfa()` extends [anchored_mfa()] to settings where auxiliary blocks do not share aligned columns and borrowing across blocks is induced by sparse graphs over both auxiliary features and anchor rows. Rows of each auxiliary block are linked to a common anchor matrix `Y` through `row_index`, while graph Laplacian penalties can encourage both similar auxiliary features across blocks and similar anchor rows in `Y` to share latent structure.
Missing domains are handled by omission: each observed subject-domain pair is treated as one auxiliary block. This supports subjects with only `D1`, subjects with `D1` and `D2`, and mixtures of observed-domain patterns.
Usage
graph_anchored_mfa(
Y,
X,
row_index,
block_info = NULL,
preproc = multivarious::center(),
ncomp = 2,
normalization = c("MFA", "None", "custom"),
alpha = NULL,
score_constraint = c("none", "orthonormal"),
feature_graph = NULL,
graph_lambda = 0,
graph_form = c("laplacian", "adjacency", "normalized_laplacian"),
score_graph = NULL,
score_graph_lambda = 0,
score_graph_form = c("laplacian", "adjacency", "normalized_laplacian"),
score_graph_k = 10,
score_graph_weight_mode = c("heat", "binary"),
score_graph_sigma = NULL,
max_iter = 50,
tol = 1e-06,
ridge = 1e-08,
verbose = FALSE,
use_future = FALSE,
...
)
Arguments
- Y
Numeric matrix/data.frame (`N × q`) serving as the anchored target space.
- X
Auxiliary blocks. Either a flat named list of matrices/data frames, or a nested list `X[[subject]][[domain]]`.
- row_index
A structure mirroring `X`. Each vector maps rows of the corresponding auxiliary block to rows of `Y`.
- block_info
Optional data frame describing flattened blocks. If supplied, it must have one row per flattened block. Recommended columns are `block`, `subject`, and `domain`.
- preproc
A `multivarious` preprocessing pipeline (a `pre_processor` or `prepper`) or a list of them. If a list, it must have length `1 + length(flattened_X)` and will be applied to `c(list(Y), flattened_X)`.
- ncomp
Integer number of latent components.
- normalization
Block weighting scheme. `"MFA"` uses inverse squared first singular values; `"None"` uses uniform weights; `"custom"` uses `alpha`.
- alpha
Optional numeric vector of per-block weights. When `normalization = "custom"`, it must have length `1 + length(flattened_X)`, with the first weight corresponding to `Y`.
- score_constraint
Identification strategy for the anchored score matrix. `"none"` uses the historical unconstrained update followed by QR normalization inside each ALS iteration. `"orthonormal"` enforces \(S^\top S = I\) with a constrained majorization/polar update.
- feature_graph
Feature-graph specification; see Details.
- graph_lambda
Non-negative scalar controlling graph-penalty strength.
- graph_form
Interpretation of `feature_graph` when it is matrix-like, or the Laplacian construction used for edge-based inputs.
- score_graph
Optional score-graph specification; see Details.
- score_graph_lambda
Non-negative scalar controlling row/score-graph smoothing strength.
- score_graph_form
Interpretation of `score_graph` when it is matrix-like, or the Laplacian construction used for edge-based inputs.
- score_graph_k
Integer number of neighbors used when `score_graph = "knn"`.
- score_graph_weight_mode
Weighting applied when `score_graph = "knn"`. `"heat"` uses a Gaussian similarity kernel on neighbor distances; `"binary"` assigns weight 1 to every retained neighbor edge.
- score_graph_sigma
Optional positive bandwidth used when `score_graph = "knn"` and `score_graph_weight_mode = "heat"`. If `NULL`, a robust value is estimated from the k-th neighbor distances.
- max_iter
Maximum ALS iterations.
- tol
Relative convergence tolerance on the objective.
- ridge
Non-negative ridge stabilization applied to loading and score updates.
- verbose
Logical; if `TRUE`, prints iteration diagnostics.
- use_future
Logical; if `TRUE`, block-wise computations that do not depend on one another are performed via `furrr::future_map()` when available.
- ...
Unused (reserved for future extensions).
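The `"MFA"` option of `normalization` is described above as using inverse squared first singular values. The following is a minimal sketch of that weighting in base R, not the package's internal code: the toy blocks, the plain column centering, and the variable names are all assumptions made for illustration.

```r
set.seed(1)
# Toy blocks (hypothetical): Y first, then the flattened auxiliary blocks,
# matching the ordering documented for `alpha`.
blocks <- list(
  Y  = matrix(rnorm(30 * 3), 30, 3),
  X1 = matrix(rnorm(15 * 5), 15, 5),
  X2 = matrix(rnorm(15 * 4), 15, 4)
)

# Column-center each block (assumed stand-in for the default preprocessing)
blocks <- lapply(blocks, scale, center = TRUE, scale = FALSE)

# MFA-style weight: inverse of the squared leading singular value
alpha <- vapply(
  blocks,
  function(b) 1 / svd(b, nu = 0, nv = 0)$d[1]^2,
  numeric(1)
)
alpha
```

Scaling each block by `sqrt(alpha)` gives it a leading singular value of 1, so no single block dominates the first component. A vector built this way could in principle be passed as `alpha` with `normalization = "custom"`.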
Value
An object inheriting from `multivarious::multiblock_biprojector` with additional classes `graph_anchored_mfa`, `anchored_mfa`, and `linked_mfa`. The object contains anchored scores in `s`, auxiliary loadings in `V_list`, anchor loadings in `B`, block metadata in `block_info`, and graph metadata in `graph_laplacian`, `graph_lambda`, `score_graph_laplacian`, and `score_graph_lambda`.
Details
## Model
The fitted model has the form:
$$Y \approx S B^\top$$
$$X_k \approx S[\mathrm{idx}_k,] V_k^\top$$
where `S` is `N × ncomp`, `B` is `q × ncomp`, each `V_k` is `p_k × ncomp`, and \(\mathrm{idx}_k\) is the `row_index` vector mapping rows of block `k` to rows of `Y`.
Let `V` denote the row-wise concatenation of the auxiliary loading matrices `V_k`, let `L_G` be a feature-graph Laplacian over all auxiliary features, and let `L_S` be a score-graph Laplacian over anchor rows of `Y`. The estimator minimizes the anchored-MFA reconstruction loss plus the graph smoothness terms
$$\lambda_G \mathrm{tr}(V^\top L_G V) + \lambda_S \mathrm{tr}(S^\top L_S S)$$
and ridge penalties used to identify and stabilize the fitted factors,
$$\mathrm{ridge} \left(\|S\|_F^2 + \|B\|_F^2 + \sum_k \|V_k\|_F^2\right).$$
The anchored score matrix can be fit either with the historical unconstrained/QR update (`score_constraint = "none"`) or with an explicit orthonormal constraint (`score_constraint = "orthonormal"`).
The score-graph term is equivalent to
$$\frac{\lambda_S}{2} \sum_{i,j} w_{ij} \|S_{i\cdot} - S_{j\cdot}\|_2^2$$
for adjacency weights \(w_{ij}\), so nearby rows in `Y` are encouraged to have nearby latent scores.
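The equivalence between the trace form and the pairwise form of the score-graph penalty can be checked numerically. The sketch below uses base R only and assumes a symmetric adjacency matrix `W` with the unnormalized Laplacian `L = D - W`; the toy `W` and `S` are made up for the check and are not produced by the package.

```r
set.seed(1)
n <- 6
k <- 2

# Toy symmetric, non-negative adjacency weights with a zero diagonal
W <- matrix(runif(n * n), n, n)
W <- (W + t(W)) / 2
diag(W) <- 0

# Unnormalized graph Laplacian L = D - W
L <- diag(rowSums(W)) - W

# Toy score matrix (n rows, k components)
S <- matrix(rnorm(n * k), n, k)

# Trace form: tr(S' L S)
trace_form <- sum(diag(t(S) %*% L %*% S))

# Pairwise form: (1/2) * sum_{i,j} w_ij * ||S_i - S_j||^2
sq_dist <- as.matrix(dist(S))^2
pairwise_form <- 0.5 * sum(W * sq_dist)

all.equal(trace_form, pairwise_form)
```

The identity relies on `W` being symmetric and on the unnormalized Laplacian; with a normalized Laplacian the pairwise differences are taken between degree-scaled rows instead.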
When both `graph_lambda = 0` (or `feature_graph = NULL`) and `score_graph_lambda = 0` (or `score_graph = NULL`), the method reduces to [anchored_mfa()] up to numerical tolerance.
## Input organization
`X` and `row_index` may be supplied either as:
* flat lists of observed blocks, or
* nested subject/domain lists, e.g. `X[[subject]][[domain]]`.
Nested input is flattened internally into one block per observed subject-domain pair. The resulting mapping is recorded in `block_info`.
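To make the flattening concrete, here is a rough base R sketch of how nested input with missing domains maps to one block per observed subject-domain pair. It only illustrates the idea: the subject and domain names, the `subject_domain` block naming, and the helper variables are assumptions, and the function's own flattening may label blocks differently.

```r
set.seed(1)
# Hypothetical nested input: subject s1 has D1 and D2, subject s2 has only D1
X <- list(
  s1 = list(D1 = matrix(rnorm(20), 5, 4), D2 = matrix(rnorm(15), 5, 3)),
  s2 = list(D1 = matrix(rnorm(16), 4, 4))
)
# row_index mirrors X: each vector maps block rows to rows of Y
row_index <- list(
  s1 = list(D1 = 1:5, D2 = 1:5),
  s2 = list(D1 = 6:9)
)

# One metadata row per observed subject-domain pair
block_info <- do.call(rbind, lapply(names(X), function(s) {
  data.frame(
    block   = paste(s, names(X[[s]]), sep = "_"),  # assumed naming scheme
    subject = s,
    domain  = names(X[[s]])
  )
}))

# Flatten one level so each observed pair becomes its own block
flat_X <- unlist(X, recursive = FALSE)
flat_idx <- unlist(row_index, recursive = FALSE)
names(flat_X) <- block_info$block
names(flat_idx) <- block_info$block

block_info
```

The unobserved pair (`s2`, `D2`) simply never appears, which is what handling missing domains by omission amounts to.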
## Feature graph
`feature_graph` may be:
* `NULL` (no graph penalty),
* `"colnames"` to connect identical auxiliary column names across blocks,
* a data frame with columns `block1`, `feature1`, `block2`, `feature2` and optional `weight`, or
* a square sparse/dense matrix interpreted according to `graph_form`.
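As an illustration of the data-frame form, the sketch below builds an edge list linking identically named columns across two blocks, which is roughly what the `"colnames"` shortcut is described as doing. The toy blocks and column names are hypothetical, and identifying features by column name (rather than by index) in `feature1`/`feature2` is an assumption not confirmed by this page.

```r
# Toy auxiliary blocks with partially overlapping column names (hypothetical)
X <- list(
  X1 = matrix(0, 2, 3, dimnames = list(NULL, c("age", "bmi", "hr"))),
  X2 = matrix(0, 2, 2, dimnames = list(NULL, c("bmi", "hr")))
)

# All unordered pairs of blocks
block_pairs <- combn(names(X), 2, simplify = FALSE)

# One edge per column name shared by a pair of blocks, with unit weight
edges <- do.call(rbind, lapply(block_pairs, function(p) {
  shared <- intersect(colnames(X[[p[1]]]), colnames(X[[p[2]]]))
  data.frame(
    block1   = rep(p[1], length(shared)),
    feature1 = shared,
    block2   = rep(p[2], length(shared)),
    feature2 = shared,
    weight   = rep(1, length(shared))
  )
}))

edges
```

Non-unit values in `weight` could be used to express partial similarity between features that are related but not identical.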
## Score graph
`score_graph` may be:
* `NULL` (no score penalty),
* `"knn"` to construct a symmetric k-nearest-neighbor graph on preprocessed rows of `Y`,
* a data frame with columns `row1`, `row2`, and optional `weight`, or
* a square sparse/dense matrix interpreted according to `score_graph_form`.
For more control over row-graph construction, an external graph builder such as the `adjoin` package can be used to create a weighted kNN adjacency or Laplacian from `Y`, then supplied here as `score_graph`.
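For readers who want to see what an externally built row graph might look like without an extra dependency, here is a minimal base R sketch of a symmetric kNN adjacency with heat-kernel weights. It is not the package's `"knn"` implementation: the centering step, the median-of-k-th-distance bandwidth, and the "keep an edge if either endpoint selects it" symmetrization are all assumptions chosen for illustration.

```r
set.seed(1)
Y <- matrix(rnorm(30 * 3), 30, 3)
k <- 5

# Center columns (assumed stand-in for the default preprocessing of Y)
Yc <- scale(Y, center = TRUE, scale = FALSE)

# Pairwise Euclidean distances; exclude self-matches
D <- as.matrix(dist(Yc))
diag(D) <- Inf
n <- nrow(D)

# n x k matrix of nearest-neighbor indices for each row
nn <- t(apply(D, 1, function(d) order(d)[seq_len(k)]))

# Bandwidth heuristic (assumed): median distance to the k-th neighbor
sigma <- median(D[cbind(seq_len(n), nn[, k])])

# Heat-kernel weights on the retained neighbor edges
rows <- rep(seq_len(n), times = k)
cols <- as.vector(nn)
A <- matrix(0, n, n)
A[cbind(rows, cols)] <- exp(-D[cbind(rows, cols)]^2 / (2 * sigma^2))

# Symmetrize: keep an edge if either endpoint selected the other
A <- pmax(A, t(A))

# Hypothetical usage, with X and row_index defined as in the Examples:
# fit <- graph_anchored_mfa(Y, X, row_index, score_graph = A,
#                           score_graph_form = "adjacency",
#                           score_graph_lambda = 1)
```

Passing the matrix with `score_graph_form = "adjacency"` follows the description of matrix-like input above; a Laplacian computed from `A` could be supplied instead with the corresponding form.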
Examples
set.seed(1)
N <- 30
Y <- matrix(rnorm(N * 3), N, 3)
X1 <- matrix(rnorm(15 * 5), 15, 5)
X2 <- matrix(rnorm(15 * 4), 15, 4)
idx <- list(X1 = sample.int(N, 15), X2 = sample.int(N, 15))
fit <- graph_anchored_mfa(
Y = Y,
X = list(X1 = X1, X2 = X2),
row_index = idx,
ncomp = 2,
score_graph = "knn",
score_graph_k = 5,
score_graph_weight_mode = "heat",
score_graph_lambda = 1
)
#> Applying the same preprocessor definition independently to each block.