
Why this comparison matters

Sometimes the response table really defines a small set of shared states across blocks. Sometimes every block has its own paired multivariate response with no exact row sharing. Sometimes you are in between: some rows map back to known anchor states, while others are novel.

This vignette is about the cases where one side is the privileged response surface. Within that scope, muscal has three honest answers:

  • anchored_mfa(), when the response is an exact anchor-state table shared across blocks.
  • response_aligned_mfa(), when some rows map back to known anchor states and others are novel.
  • response_aligned_mfa() with an aligned_rrr() baseline, when there are no anchor states and only paired blockwise responses.

This vignette shows the same prediction problem under each regime and focuses on the contract you actually need, not on forcing every case through repeated rows of Y.

If neither side is privileged and you want symmetric prediction or completion between paired X and Y block families, use aligned_interbattery() instead; see vignette("aligned_interbattery").

Quick start: paired responses, no exact anchor assumption

If every block has its own paired multivariate response, start with response_aligned_mfa(). If you want the cleaner supervised baseline with no predictor reconstruction term, fit aligned_rrr() alongside it.

fit_response <- response_aligned_mfa(
  Y = blockwise$train$Y,
  X = blockwise$train$X,
  ncomp = 2,
  preproc = multivarious::pass(),
  response_preproc = multivarious::pass(),
  normalization = "None",
  ridge = 1e-8,
  max_iter = 80,
  tol = 1e-9
)

fit_rrr <- aligned_rrr(
  Y = blockwise$train$Y,
  X = blockwise$train$X,
  ncomp = 2,
  preproc = multivarious::pass(),
  response_preproc = multivarious::pass(),
  ridge = 1e-8,
  max_iter = 80,
  tol = 1e-9
)
kable(quick_summary, align = c("l", "r"))
model                    mean_test_mse
response_aligned_mfa()          0.0093
aligned_rrr()                   0.0092

Both models predict from block-specific X into a shared multivariate response space. The practical difference is structural: response_aligned_mfa() also models the predictor blocks through a shared latent space, while aligned_rrr() is the lean reduced-rank baseline.
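The reduced-rank idea behind the `aligned_rrr()` baseline can be sketched in base R. This is a simplified single-block illustration, not the package's implementation: fit a ridge coefficient matrix, then constrain it to rank `ncomp` by projecting the fitted values onto their top singular directions.

```r
# Simplified single-block reduced-rank ridge regression:
# B_full = (X'X + ridge I)^{-1} X'Y, then restrict to rank r by
# projecting fitted values onto their top-r right singular vectors.
set.seed(42)
n <- 50; p <- 8; q <- 5; r <- 2
S <- matrix(rnorm(n * r), n, r)                                   # rank-2 signal
X <- S %*% matrix(rnorm(r * p), r, p) + 0.05 * matrix(rnorm(n * p), n, p)
Y <- S %*% matrix(rnorm(r * q), r, q) + 0.05 * matrix(rnorm(n * q), n, q)

ridge  <- 1e-8
B_full <- solve(crossprod(X) + ridge * diag(p), crossprod(X, Y))  # full ridge fit
Yhat   <- X %*% B_full
V      <- svd(Yhat)$v[, 1:r]          # top-r directions of the fitted responses
B_rrr  <- B_full %*% V %*% t(V)       # rank-r coefficient matrix

mean((Y - X %*% B_rrr)^2)             # small: rank-2 structure carries the signal
```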

Which data contract do you actually have?

chooser <- data.frame(
  Data_regime = c(
    "Exact shared anchor states",
    "Some anchored rows, some novel rows",
    "No anchor states, paired blockwise responses"
  ),
  What_is_observed = c(
    "One anchor-level Y table plus row maps from each X block",
    "Blockwise Y_k plus optional anchor_map for the rows you can link",
    "Only blockwise paired (X_k, Y_k)"
  ),
  Recommended_fit = c(
    "anchored_mfa()",
    "response_aligned_mfa()",
    "response_aligned_mfa() plus aligned_rrr() baseline"
  )
)

kable(chooser, align = "l")
Data_regime                                   What_is_observed                                                   Recommended_fit
Exact shared anchor states                    One anchor-level Y table plus row maps from each X block           anchored_mfa()
Some anchored rows, some novel rows           Blockwise Y_k plus optional anchor_map for the rows you can link   response_aligned_mfa()
No anchor states, paired blockwise responses  Only blockwise paired (X_k, Y_k)                                   response_aligned_mfa() plus aligned_rrr() baseline

The model choice should follow that contract directly. The point is not to force every problem into repeated rows of Y, but to keep the common-space assumption honest.

When the response really is an anchor table

Use anchored_mfa() when the scientifically meaningful response lives at the anchor-state level and each predictor block only tells you which anchor rows it touches.

fit_anchor <- anchored_mfa(
  Y = exact_anchor$Y_anchor,
  X = exact_anchor$train$X,
  row_index = exact_anchor$train$row_index,
  ncomp = 2,
  preproc = multivarious::pass(),
  normalization = "None",
  ridge = 1e-8,
  max_iter = 80,
  tol = 1e-9
)
kable(anchor_summary, align = c("l", "r", "r", "r"))
model           mean_test_mse  n_anchor_states  learned_score_rows
anchored_mfa()          0.116               24                  24

That is the clean anchored contract: the shared score table lives over anchor states, not over block rows, and new X rows are projected back to that anchor-level response surface.
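One way to picture that contract is a toy pooling step (a sketch, with `row_index` interpreted as in the call above; none of this is package code): each block row points at one anchor state, and an indicator matrix pools block rows back to the anchor level.

```r
# Toy sketch: pooling block rows back to anchor states via row_index.
set.seed(7)
n_anchor  <- 4; p <- 3
row_index <- c(1, 1, 2, 3, 3, 4)   # which anchor state each block row touches
X_block   <- matrix(rnorm(length(row_index) * p), ncol = p)

# Indicator matrix M: rows = block rows, columns = anchor states
M <- outer(row_index, seq_len(n_anchor), `==`) * 1

# Average the block rows that map to the same anchor state
X_anchor <- sweep(crossprod(M, X_block), 1, colSums(M), `/`)
dim(X_anchor)   # n_anchor x p: one pooled row per anchor state
```

The shared score table then lives over these `n_anchor` rows, never over the raw block rows.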

When some rows are anchored but others are novel

This is the regime response_aligned_mfa() was generalized to handle. You keep the blockwise responses Y_k, pass anchor information only where it is real, and let the novel rows stay free.

fit_hybrid <- response_aligned_mfa(
  Y = blockwise$train$Y,
  X = blockwise$train$X,
  ncomp = 2,
  preproc = multivarious::pass(),
  response_preproc = multivarious::pass(),
  normalization = "None",
  anchor_response = blockwise$anchor_response,
  anchor_map = blockwise$train$anchor_map,
  coupling_lambda = 2,
  ridge = 1e-8,
  max_iter = 80,
  tol = 1e-9
)
kable(hybrid_summary, align = c("l", "r"))
prediction_path             mean_test_mse
X only                             0.0093
X + test-time anchor_map           0.0093

The important point is not just that the fit stores both Z_list and S. It is that prediction stays honest: default prediction uses only X, while known test-time anchor information can refine the score solve when you truly have it.
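The refinement in the second row can be sketched as a penalized score solve. This is a toy single-row version under stated assumptions (a known loading matrix `V` and a `lambda` playing the role of `coupling_lambda`; it is not the package's solver): by default the scores are solved from the response alone, and when a row's anchor is genuinely known, a quadratic penalty pulls its scores toward the anchor scores.

```r
# Toy score solve with soft anchor coupling for one row:
# minimize ||y - V s||^2 + lambda ||s - s_anchor||^2
# closed form: s = (V'V + lambda I)^{-1} (V'y + lambda s_anchor)
set.seed(3)
q <- 5; r <- 2
V        <- matrix(rnorm(q * r), q, r)            # response loadings (assumed known)
s_true   <- c(1, -0.5)
y        <- as.vector(V %*% s_true) + 0.01 * rnorm(q)
s_anchor <- s_true                                 # genuine anchor info for this row
lambda   <- 2

solve_score <- function(y, V, lambda = 0, s_anchor = 0) {
  solve(crossprod(V) + lambda * diag(ncol(V)),
        crossprod(V, y) + lambda * s_anchor)
}

s_free    <- solve_score(y, V)                    # default: no anchor term
s_coupled <- solve_score(y, V, lambda, s_anchor)  # refined with anchor info

# The coupled solution sits at least as close to the anchor scores
sum((s_coupled - s_anchor)^2) <= sum((s_free - s_anchor)^2)
```

Setting `lambda = 0` recovers the default path, which is why prediction stays honest when no anchor information exists at test time.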

How should you choose?

  • Use anchored_mfa() when the shared response really is an anchor-state object and your blocks connect to it through row maps.
  • Use response_aligned_mfa() when each block has its own paired multivariate response and you want a shared latent geometry for the predictor blocks.
  • Add anchor_map and anchor_response to response_aligned_mfa() only when those anchors are genuine; missing anchor rows should stay missing.
  • Fit aligned_rrr() alongside response_aligned_mfa() when prediction is the main target and you want to know whether the extra predictor-reconstruction structure is actually buying anything.

Where next?