CPCA Filtering

CPCA filtering in DKGE decomposes task-related variance into interpretable subcomponents before fitting latent bases. When your experimental design includes multiple types of effects (main effects versus interactions, experimental conditions versus control baselines, or different cognitive domains), CPCA lets you analyze these effect types separately while maintaining their mathematical relationships.

The method operates directly in the effect space defined by the design kernel $K$, ensuring that the resulting bases remain $K$-orthogonal and preserve the interpretability of your experimental design structure. This vignette demonstrates what CPCA filtering accomplishes, provides guidance on when it proves most valuable, and walks through practical implementation strategies.

When and Why to Use CPCA Filtering

CPCA filtering becomes valuable when your experimental design naturally contains multiple types of task effects that warrant separate analysis. Rather than analyzing all effects together in a single latent space, CPCA allows you to focus your analysis on specific effect types while cleanly separating others.

Common scenarios where CPCA proves beneficial:

Factorial designs: Separate main effects from interaction terms to understand how basic experimental manipulations differ from their combined effects. For example, in a 2×2 design studying attention and working memory, you might isolate the main effects of each factor from their interaction.

Experimental versus control conditions: Focus analysis on your experimental manipulations while factoring out baseline or control conditions. This approach can reveal cleaner patterns in your conditions of interest.

Multi-domain studies: When studying different cognitive processes within the same experiment, separate effects related to different domains (e.g., working memory versus attention) to understand domain-specific versus shared neural mechanisms.

Planned versus exploratory contrasts: Isolate the effects tied to your primary hypotheses from exploratory or secondary ones, so that the fitted basis is driven by the effects you planned to test.

How CPCA Filtering Works

The compressed covariance $\hat C$ that DKGE analyzes contains variance from all experimental effects mixed together. CPCA filtering mathematically separates this total variance into distinct subcomponents before eigendecomposition. The process involves several key steps that preserve the mathematical relationships defined by your design kernel $K$.

First, DKGE constructs a projector onto your chosen effect subspace using the $K$ metric through dkge_projector_K(). This projector respects the similarity relationships between experimental effects that are encoded in your design kernel. Next, dkge_cpca_split_chat() applies this projector to split the compressed covariance into design and residual components. Finally, DKGE fits separate bases for the components you request while preserving $K$-orthogonality between the returned bases, ensuring they can be analyzed jointly.
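For reference, the projector step can be written with the standard $K$-orthogonal projector onto the span of a basis matrix $T$ (one column per design-aligned direction). This is textbook linear algebra offered as a sketch: the symbols $T$, $P_K$, and $I$ are notation introduced here, and whether dkge_projector_K() and dkge_cpca_split_chat() use exactly these forms is an assumption, not a statement about the package internals.

$$
P_K = T\,(T^\top K T)^{-1}\,T^\top K, \qquad P_K^2 = P_K, \qquad K P_K = P_K^\top K .
$$

In the identity-kernel case ($K = I$), $P_K$ reduces to the ordinary orthogonal projector $P$, and a split of the assumed form

$$
\hat C_{\text{design}} = P\,\hat C\,P, \qquad \hat C_{\text{resid}} = (I - P)\,\hat C\,(I - P)
$$

keeps the variance inside and outside the design subspace while discarding the cross terms $P\,\hat C\,(I - P)$ and $(I - P)\,\hat C\,P$.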

You can specify the design-aligned subspace either by naming specific effects through cpca_blocks or by providing an explicit basis matrix via cpca_T. The cpca_part argument determines which filtered components are returned ("design", "resid", or "both"), while cpca_ridge optionally adds regularization to stabilize small eigenvalues during decomposition.

Simulated Experiment: Attention-Working Memory Study

To demonstrate CPCA filtering with realistic neuroimaging scenarios, we simulate data from an attention-working memory experiment. The design has six effects: two attention cue conditions (valid and invalid), two working memory load conditions (high and low), an interaction term, and a control condition. This structure lends itself to CPCA analysis, where we might want to isolate a primary set of effects from the rest.

The simulated dataset places strong signals (sd = 1.5) in the first two effects, the attention cue conditions, and weaker but non-negligible variance (sd = 0.4) in the remaining four: working memory load, the interaction term, and the control condition. This mirrors real neuroimaging studies where primary effects of interest typically show stronger and more consistent patterns than secondary effects.

S <- 8
q <- 6
P <- 16
Tlen <- 80
effects <- c("attn_valid", "attn_invalid", "wmem_high", "wmem_low", "interact", "control")

betas <- replicate(S, {
  # Strong signals for the first two effects (attn_valid, attn_invalid)
  main_effects <- matrix(rnorm(2 * P, sd = 1.5), 2, P)
  # Weaker signals for the remaining four effects (working memory, interaction, control)
  secondary_effects <- matrix(rnorm((q - 2) * P, sd = 0.4), q - 2, P)
  mat <- rbind(main_effects, secondary_effects)
  rownames(mat) <- effects
  mat
}, simplify = FALSE)

designs <- replicate(S, {
  X <- matrix(rnorm(Tlen * q), Tlen, q)
  X <- qr.Q(qr(X))
  colnames(X) <- effects
  X
}, simplify = FALSE)

subjects <- Map(function(b, X, id) dkge_subject(b, X, id = id),
                betas, designs, paste0("sub", seq_len(S)))
bundle <- dkge_data(subjects)
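Before fitting, it can help to confirm which rows of the simulated betas actually carry the strong signal. The sketch below regenerates betas with the same recipe in base R and pools them across subjects. The seed and the names betas_chk, pooled, row_sd, and gram_error are introduced here for illustration only; the vignette code above sets no seed, so the exact numbers will differ from the fits shown later.

```r
set.seed(42)  # seed chosen for this sketch only
S <- 8; q <- 6; P <- 16
effects <- c("attn_valid", "attn_invalid", "wmem_high", "wmem_low", "interact", "control")

# Same recipe as above: rows 1:2 drawn with sd = 1.5, rows 3:6 with sd = 0.4
betas_chk <- replicate(S, {
  mat <- rbind(matrix(rnorm(2 * P, sd = 1.5), 2, P),
               matrix(rnorm((q - 2) * P, sd = 0.4), q - 2, P))
  rownames(mat) <- effects
  mat
}, simplify = FALSE)

# Pool subjects column-wise and compute the standard deviation of each effect row
pooled <- do.call(cbind, betas_chk)   # q x (S * P)
row_sd <- apply(pooled, 1, sd)
print(round(row_sd, 2))

# qr.Q(qr(X)) returns orthonormal columns, so crossprod(X) is the identity
X <- qr.Q(qr(matrix(rnorm(80 * q), 80, q)))
gram_error <- max(abs(crossprod(X) - diag(q)))
```

The two rows named attn_valid and attn_invalid should show a standard deviation near 1.5 and the other four near 0.4, which is the contrast that cpca_blocks = 1:2 targets later on.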

When we fit the standard DKGE model without CPCA filtering, all task-related variance is compressed into a single latent basis. The leading eigenvalues reflect the combined influence of both our primary experimental effects and secondary conditions, making it difficult to isolate the specific patterns we want to study.

fit_plain <- dkge(bundle, K = diag(q), rank = 3)
round(fit_plain$evals[1:4], 3)
#> [1] 2507 2107  209  180

These eigenvalues represent the mixed signal from all experimental conditions. While this standard approach captures the dominant patterns in the data, it doesn’t allow us to focus specifically on our primary experimental manipulations versus their interactions and control conditions.

Isolating Primary Experimental Effects

Now we apply CPCA filtering to separate the two strong-signal effects (attn_valid and attn_invalid, the first two rows of the design) from the remaining effects (working memory load, interaction, and control). We achieve this by identifying the first two effects in our design as the “design-aligned” subspace through cpca_blocks = 1:2.

Setting cpca_part = "both" instructs DKGE to return both the design-aligned basis (focused on the selected effects) and the residual basis (containing everything else). This dual analysis lets us examine both effect types while keeping the two bases $K$-orthogonal.

fit_cpca <- dkge(bundle,
                 K = diag(q),
                 cpca_blocks = 1:2,
                 cpca_part = "both",
                 rank = 3)
#> Warning: Requested rank 3 exceeds effective rank 2. Reducing to 2 components.

fit_cpca$cpca$part
#> [1] "both"
round(fit_cpca$cpca$evals_design[1:3], 3)
#> [1] 2503 2095    0
round(fit_cpca$cpca$evals_resid[1:3], 3)
#> [1] 213 182 157

Notice how CPCA filtering has separated the variance components. The design eigenvalues now reflect only the two effects we selected, while the residual eigenvalues capture the remaining four. The third design eigenvalue is zero because the design-aligned subspace has only two dimensions, which is also why dkge() reduced the requested rank from 3 to 2.

The design and residual bases are $K$-orthogonal: their cross-product in the design kernel metric is zero, so the two sets of components do not overlap in that metric. The check below confirms this numerically:

Ud <- fit_cpca$cpca$U_design
Ur <- fit_cpca$cpca$U_resid
round(max(abs(t(Ud) %*% fit_cpca$K %*% Ur)), 6)
#> [1] 0
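One way to see why the cross-product vanishes is to work in coordinates whitened by $K^{1/2}$. The base-R sketch below assumes that the compressed covariance lives in those whitened coordinates and that bases are mapped back with $K^{-1/2}$; both are assumptions made for illustration, and every name in the block (K_half, K_ihalf, P_hat, and so on) is hypothetical rather than part of dkge.

```r
set.seed(1)
q <- 6
K <- outer(seq_len(q), seq_len(q), function(i, j) 0.7^abs(i - j))

# Symmetric square root and inverse square root of K via its eigendecomposition
eK      <- eigen(K, symmetric = TRUE)
K_half  <- eK$vectors %*% diag(sqrt(eK$values)) %*% t(eK$vectors)
K_ihalf <- eK$vectors %*% diag(1 / sqrt(eK$values)) %*% t(eK$vectors)

# Toy symmetric positive-definite matrix standing in for the compressed covariance
A    <- matrix(rnorm(40 * q), 40, q)
Chat <- crossprod(A)

# Euclidean projector onto span(K^{1/2} T) in the whitened coordinates
T_mat <- diag(q)[, 1:2]
B     <- K_half %*% T_mat
P_hat <- B %*% solve(crossprod(B), t(B))
R_hat <- diag(q) - P_hat

Chat_design <- P_hat %*% Chat %*% P_hat
Chat_resid  <- R_hat %*% Chat %*% R_hat

# Leading eigenvectors: 2 for the rank-2 design part, 3 of the 4 residual ones
Vd <- eigen(Chat_design, symmetric = TRUE)$vectors[, 1:2]
Vr <- eigen(Chat_resid,  symmetric = TRUE)$vectors[, 1:3]

# Map back to effect space; orthogonality of Vd and Vr becomes K-orthogonality
Ud <- K_ihalf %*% Vd
Ur <- K_ihalf %*% Vr
cross_term <- max(abs(t(Ud) %*% K %*% Ur))
```

Because Vd lies in the range of P_hat and Vr in the range of its complement, t(Vd) %*% Vr is zero, and t(Ud) %*% K %*% Ur reduces to the same product, so cross_term is zero up to floating-point error.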

Influence of the Design Kernel

The split is metric-aware: changing the kernel alters which directions count as “design-aligned.” A smooth kernel diffuses the projector across neighbouring rows, so design energy leaks into adjacent effects.

K_smooth <- outer(seq_len(q), seq_len(q), function(i, j) 0.7^abs(i - j))
fit_kernel <- dkge(bundle,
                   K = K_smooth,
                   cpca_blocks = 1:2,
                   cpca_part = "both",
                   rank = 3)
round(fit_kernel$cpca$evals_design[1:3], 3)
#> [1] 3273  819    0
round(fit_kernel$cpca$evals_resid[1:3], 3)
#> [1] 309.0  97.9  47.6

Compared with the identity kernel, the smooth kernel concentrates the design-aligned variance in the leading component (the first design eigenvalue rises from 2503 to 3273 while the second falls from 2095 to 819) and spreads the corresponding loadings across adjacent effects. The projector honours the correlation structure encoded by K_smooth, so your choice of kernel directly shapes which latent directions are considered design-driven. When effects do not align with coordinate axes, pass a custom cpca_T that expresses the intended $K$-weighted span explicitly.
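The leakage into adjacent effects can be made concrete by comparing the projector under the two kernels. The sketch below uses the standard $K$-orthogonal projector formula in base R; k_projector() is a hypothetical helper written for this illustration, and it is an assumption that dkge_projector_K() computes an equivalent matrix.

```r
# Hypothetical helper (not a dkge function): K-orthogonal projector onto span(T_mat)
k_projector <- function(T_mat, K) {
  G <- crossprod(T_mat, K %*% T_mat)        # T' K T, Gram matrix in the K metric
  T_mat %*% solve(G, crossprod(T_mat, K))   # T (T' K T)^{-1} T' K
}

q <- 6
T_mat    <- diag(q)[, 1:2]                  # first two effects as the design basis
K_smooth <- outer(seq_len(q), seq_len(q), function(i, j) 0.7^abs(i - j))

P_identity <- k_projector(T_mat, diag(q))
P_smooth   <- k_projector(T_mat, K_smooth)

# Rows 1:2 show how each design coordinate is assembled from all six effects
print(round(P_identity[1:2, ], 3))
print(round(P_smooth[1:2, ], 3))
```

Under the identity kernel, rows 1 and 2 pick out effects 1 and 2 only. Under K_smooth, the second row additionally carries weights 0.7, 0.49, 0.343, and 0.2401 on effects 3 to 6, so variance in those neighbouring effects is pulled into the design-aligned part in proportion to their kernel similarity.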

The split itself can be inspected directly. dkge_cpca_split_chat() applies the $K$-metric projector to the compressed covariance matrix before eigendecomposition. The example below applies it to the plain fit, which uses the identity kernel, with the first two coordinate axes as the design basis, so you can see how the total variance is partitioned:

# Columns 1:2 of the identity select the first two effects, matching cpca_blocks = 1:2
T_design <- diag(1, q)[, 1:2]
split_plain <- dkge_cpca_split_chat(fit_plain$Chat, T_design, fit_plain$K)
round(diag(split_plain$Chat_design), 3)
#> [1] 2301 2297    0    0    0    0
round(diag(split_plain$Chat_resid), 3)
#> [1]   0   0 145 209 158 175

The diagonal elements show how variance is distributed between design and residual components. Notice that when cpca_part = "design" or "both", the fitted model’s compressed covariance matrix (fit_cpca$Chat) equals the design-filtered component, confirming that the analysis focuses specifically on your chosen effects:

max(abs(fit_cpca$Chat - fit_cpca$cpca$Chat_design))
#> [1] 0
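The block pattern of the diagonals above follows from the identity-kernel projector being a 0/1 diagonal matrix. The base-R sketch below reproduces that bookkeeping on a toy covariance, assuming a split of the form P Chat P and (I - P) Chat (I - P); this form matches the printed pattern but is an assumption about dkge_cpca_split_chat(), and all names in the block are illustrative.

```r
set.seed(7)
q <- 6
A    <- matrix(rnorm(50 * q), 50, q)
Chat <- crossprod(A)                  # toy stand-in for the compressed covariance

P <- diag(c(1, 1, 0, 0, 0, 0))        # identity-kernel projector onto effects 1:2
R <- diag(q) - P                      # complementary projector

Chat_design <- P %*% Chat %*% P
Chat_resid  <- R %*% Chat %*% R
cross_terms <- P %*% Chat %*% R + R %*% Chat %*% P   # dropped by the split

# The discarded cross terms are purely off-diagonal, so the traces still add up
trace_gap   <- sum(diag(Chat)) - sum(diag(Chat_design)) - sum(diag(Chat_resid))
cross_trace <- sum(diag(cross_terms))
```

Here the design part is nonzero only in the first two diagonal entries and the residual part only in the last four, and the total trace is preserved even though the covariance between the two blocks is discarded.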

Using Custom Effect Combinations

Sometimes your effects of interest don’t correspond to simple subsets of experimental conditions. In such cases, you can provide a custom basis matrix through cpca_T to specify exactly which combinations of effects should be treated as “design-aligned.”

The columns of your custom basis matrix span the linear combinations of effects you want to analyze together. For example, you might want to combine specific experimental conditions or weight certain effects more heavily than others. Here we demonstrate by creating a custom basis that combines multiple conditions with differential weighting:

T_custom <- qr.Q(qr(cbind(c(1, 1, 0, 0, 0, 0),
                          c(0, 0, 2, 1, 0, 0))))
fit_custom <- dkge(bundle,
                   K = diag(q),
                   cpca_T = T_custom,
                   cpca_part = "design",
                   rank = 2)
round(fit_custom$cpca$evals_design[1:2], 3)
#> [1] 2095  150

This custom basis creates two design components: the first combines the attention conditions equally, while the second emphasizes the working memory conditions with higher weight on the high-load condition. The fitted loadings reflect these chosen combinations, while the residual component is omitted since we requested only the design-aligned analysis.
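The call qr.Q(qr(...)) used to build T_custom orthonormalizes the supplied columns without changing their span. The base-R sketch below checks this for the same two columns; raw and the *_error names are introduced here for illustration. Signs are compared in absolute value because the QR factorization may flip the sign of a column.

```r
raw <- cbind(c(1, 1, 0, 0, 0, 0),
             c(0, 0, 2, 1, 0, 0))
T_custom <- qr.Q(qr(raw))

# Columns are orthonormal: t(T_custom) %*% T_custom is the 2 x 2 identity
gram_error <- max(abs(crossprod(T_custom) - diag(2)))

# Same span: projecting raw onto the columns of T_custom reproduces raw
span_error <- max(abs(T_custom %*% crossprod(T_custom, raw) - raw))

# The input columns are already orthogonal, so each one is only rescaled to unit length
col1_error <- max(abs(abs(T_custom[, 1]) - c(1, 1, 0, 0, 0, 0) / sqrt(2)))
col2_error <- max(abs(abs(T_custom[, 2]) - c(0, 0, 2, 1, 0, 0) / sqrt(5)))
```

Since only the span and the relative weights within each column survive the normalization, multiplying a column of raw by a constant leaves the resulting design subspace unchanged.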

Numerical Stabilization with Ridge Regularization

When working with real neuroimaging data, you may encounter situations where the filtered covariance matrix becomes nearly rank-deficient, leading to numerical instability during eigendecomposition. The optional cpca_ridge parameter addresses this issue by adding a small diagonal ridge term before eigendecomposition, improving numerical stability without substantially altering the results.

fit_ridge <- dkge(bundle,
                  K = diag(q),
                  cpca_blocks = 1:2,
                  cpca_part = "design",
                  cpca_ridge = 1e-3,
                  rank = 3)
diag_shift <- diag(fit_ridge$cpca$Chat_design - fit_ridge$cpca$Chat_design_raw)
round(head(diag_shift), 6)
#> [1] 0.001 0.001 0.001 0.001 0.001 0.001

The ridge regularization adds the specified value to each diagonal element of the covariance matrix, as shown by the consistent shift across diagonal entries. The original unregularized matrix remains available in Chat_design_raw for comparison and diagnostic purposes.
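The effect on the spectrum can be illustrated in base R. Adding ridge times the identity to a symmetric matrix shifts every eigenvalue by exactly that amount and leaves the eigenvectors unchanged, so zero eigenvalues of a rank-deficient filtered covariance become small positive ones. The sketch assumes the ridge enters as an additive multiple of the identity, which matches the diagonal shift printed above; the toy matrices only mirror the names used by dkge.

```r
set.seed(2)
q <- 6
A <- matrix(rnorm(40 * q), 40, q)
P <- diag(c(1, 1, 0, 0, 0, 0))

# Rank-2 toy matrix standing in for the unregularized design-filtered covariance
Chat_design_raw <- P %*% crossprod(A) %*% P

ridge       <- 1e-3
Chat_design <- Chat_design_raw + ridge * diag(q)

ev_raw <- eigen(Chat_design_raw, symmetric = TRUE)$values
ev_reg <- eigen(Chat_design,     symmetric = TRUE)$values

# Every eigenvalue moves up by the ridge, including the four that were zero
shift <- ev_reg - ev_raw
print(signif(shift, 3))
```

With a ridge this small relative to the leading eigenvalues, the dominant components are essentially unchanged while the smallest eigenvalue is bounded away from zero.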

Simplified Interface for CPCA Analysis

For workflows that focus exclusively on CPCA filtering, the dkge_cpca_fit() function provides a dedicated entry point. This convenience wrapper handles the CPCA-specific arguments while forwarding all other parameters to the main dkge() function. In the example below, the basis returned by the wrapper matches the one from fit_cpca to six decimal places.

fit_wrapper <- dkge_cpca_fit(bundle,
                             K = diag(q),
                             cpca_blocks = 1:2,
                             cpca_part = "design",
                             rank = 3)
#> Warning: Requested rank 3 exceeds effective rank 2. Reducing to 2 components.
identical(round(fit_wrapper$U, 6), round(fit_cpca$U, 6))
#> [1] TRUE

Summary: Strategic Applications of CPCA Filtering

CPCA filtering proves most valuable when your research questions naturally call for decomposing task-related variance into distinct components. The method excels in several key scenarios:

Factorial experimental designs benefit from CPCA when you need to separate main effects from their interactions, allowing for cleaner interpretation of basic experimental manipulations versus their combined effects.

Multi-domain cognitive studies can use CPCA to isolate domain-specific effects (such as working memory versus attention) while keeping the resulting components $K$-orthogonal to one another.

Hypothesis-driven analyses can become more focused when CPCA restricts the decomposition to planned contrasts and factors out exploratory or control conditions that might otherwise dilute the signal of interest.

Comparative studies across datasets can be easier to interpret when CPCA is used to analyze the same types of effects in each dataset, which supports more consistent cross-study comparisons.

The mathematical foundation of CPCA filtering ensures that this decomposition preserves interpretability while maintaining compatibility with all other DKGE tools. Whether you proceed with contrast testing, bootstrap inference, or visualization, the $K$-orthogonal components can be analyzed using the full DKGE toolkit without modification.

By directing the eigendecomposition toward your specific research questions, CPCA filtering transforms a general-purpose dimension reduction into a targeted analytical tool that respects both your experimental design and your theoretical hypotheses.