Generate Contrast Matrices

Creates a numeric contrast matrix for use in RSA or encoding models, based on condition labels and a specification.

Usage

contrasts(
  labels = NULL,
  spec,
  metadata = NULL,
  data = NULL,
  centre = TRUE,
  scale = c("none", "sd", "l2"),
  orth = FALSE,
  keep_attr = TRUE
)

Arguments

labels: Character vector. Required if `metadata` is NULL. Specifies all unique condition labels in the desired order for the rows of the contrast matrix.
spec: Formula. Defines the contrasts. If `metadata` is NULL, uses the mini-DSL (see Details). If `metadata` is provided, uses standard R formula syntax referencing columns in `metadata` (excluding the `label` column).
metadata: Optional tibble/data.frame. If provided, it must contain a `label` column matching the conditions, plus other columns representing the features or factors used in the `spec` formula. The `labels` argument is ignored if `metadata` is provided.
data: Ignored in this version. Reserved for future extensions allowing direct input of feature matrices or RDMs for PCA/MDS contrasts.
centre: Logical. If TRUE (default), columns of the resulting matrix are mean-centered.
scale: Character string specifying the scaling method applied after centering (only when `orth = FALSE`). Options: `"none"` (default), `"sd"` (divide by the sample standard deviation), `"l2"` (divide by the L2 norm, i.e. the Euclidean length, to get unit vectors). This argument is *ignored* if `orth = TRUE`.
orth: Logical. If FALSE (default), the matrix columns represent the specified contrasts directly (after centering/scaling). If TRUE, an orthonormal basis for the column space is computed via QR decomposition. Resulting columns will be orthogonal and have unit length (L2 norm = 1).
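keep_attr: Logical. If TRUE (default) and `orth = TRUE`, the names of the original columns (before orthogonalization) are stored in `attr(C, "source")`, and the names of any linearly dependent columns that were dropped are stored in `attr(C, "dropped")`.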

Value

A numeric matrix (K x Q), where K is the number of labels and Q is the number of contrasts or orthogonal components. If `orth = TRUE` and `keep_attr = TRUE`, the matrix also carries a `"source"` attribute naming the original columns that formed the basis and a `"dropped"` attribute naming any columns removed because of rank deficiency.

Details

This function provides two main ways to define contrasts:

Via a `labels` vector and a `spec` formula using a mini-DSL like `~ factor1(levelA + levelB ~ levelC + .) + factor2(...)`.
Via a `metadata` tibble (containing condition labels and predictor columns) and a standard R formula `spec` (e.g., `~ pred1 + pred2 + pred1:pred2`).

The function automatically handles centering, scaling, and optional orthogonalization.

**Mini-DSL for `spec` (when `metadata` is NULL):** The formula should be of the form `~ name1(levelsA ~ levelsB) + name2(...)`.

`name1`, `name2`, etc., become the factor/contrast names. These are used to generate initial binary (+1/-1/0) columns.
`levelsA` are condition labels (from `labels` argument) separated by `+`. These get coded +1 for the named factor.
`levelsB` are condition labels separated by `+`, or `.` (period). These get coded -1 for the named factor. `.` means "all labels not listed in `levelsA`".
Labels not mentioned in a factor definition get coded 0 for that factor.
Interaction terms (e.g., `factorName1:factorName2`) can be included in `spec`. These are passed to `model.matrix` which computes them based on the previously generated factor columns.
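
The coding rules above can be sketched in base R. `code_factor` is a hypothetical helper written only for this illustration; it is not exported by the package, and the package's actual parser may work differently. The sketch assumes each DSL factor becomes one numeric +1/-1/0 column and that interactions are then formed by `model.matrix` as products of those columns.

```r
# Hypothetical helper (not part of the package): code one DSL-style factor.
# `plus` holds the labels coded +1; `minus` is either a character vector of
# labels coded -1, or "." meaning "every label not listed in `plus`".
code_factor <- function(labels, plus, minus = ".") {
  if (identical(minus, ".")) minus <- setdiff(labels, plus)
  x <- numeric(length(labels))        # unmentioned labels stay 0
  x[labels %in% plus]  <-  1
  x[labels %in% minus] <- -1
  names(x) <- labels
  x
}

labs <- c("faces", "animals", "plants", "tools")

# In the spirit of anim(faces + animals ~ .)
anim <- code_factor(labs, plus = c("faces", "animals"))
# In the spirit of fa(faces ~ tools): plants and animals are coded 0
fa <- code_factor(labs, plus = "faces", minus = "tools")

# For numeric columns, model.matrix builds `anim:fa` as an elementwise product
C <- model.matrix(~ anim + fa + anim:fa - 1, data = data.frame(anim, fa))
C
```

Here `anim` is coded `1, 1, -1, -1` and `fa` is coded `1, 0, 0, -1`, so the interaction column is nonzero only for labels that are nonzero in both factors.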

If `centre = TRUE` (default), the columns produced by `model.matrix` are mean-centered. Binary factors created by the DSL (+1/-1/0 coding) already have zero mean when the +1 and -1 groups are the same size; the explicit centering step guarantees zero-mean columns regardless of input or balance.

Orthogonalization

If `orth = TRUE`, the function uses `qr.Q(qr(C))` to find an orthonormal basis. The number of columns in the output equals the rank of the input matrix. Columns are renamed `Orth1`, `Orth2`, etc. The `scale` argument is ignored because the columns already have unit L2 norm. If `keep_attr = TRUE`, `attr(C_orth, "source")` stores the names of the original columns that formed the basis for the orthogonalized matrix, and `attr(C_orth, "dropped")` stores the names of any original columns that were linearly dependent and therefore not part of the basis.
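
The orthogonalization step can be illustrated with base R alone. This is a minimal sketch on a hand-built matrix, not the package's code: in particular, reading the kept and dropped columns from the QR pivot vector is an assumption about how `"source"` and `"dropped"` could be derived.

```r
# Three contrast columns over six conditions; the third is the sum of the
# first two, so the matrix has rank 2.
a <- c(1, 1, 1, -1, -1, -1)
b <- c(1, -1, 0, 1, -1, 0)
C <- cbind(a = a, b = b, ab_sum = a + b)

qrC <- qr(C)
r   <- qrC$rank                               # numerical rank
Q   <- qr.Q(qrC)[, seq_len(r), drop = FALSE]  # keep the first r basis vectors
colnames(Q) <- paste0("Orth", seq_len(r))

# qr() moves linearly dependent columns to the end of the pivot order, so
# the first r pivots index the columns that span the basis.
kept    <- colnames(C)[qrC$pivot[seq_len(r)]]
dropped <- colnames(C)[qrC$pivot[-seq_len(r)]]

round(crossprod(Q), 10)   # identity matrix: columns are orthonormal
kept
dropped
```

In this example `kept` is `"a", "b"` and `dropped` is `"ab_sum"`, mirroring what the text describes for the `"source"` and `"dropped"` attributes.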

Scaling

Applied *after* centering if `orth=FALSE`.

`"none"`: No scaling.
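`"sd"`: `scale(..., center=FALSE, scale=TRUE)`. With `center=FALSE`, base R's `scale()` divides each column by its root mean square, `sqrt(sum(x^2) / (N - 1))`, which equals the sample standard deviation (N-1 denominator) when the column has already been mean-centered. For columns with few unique values (e.g., a centered +/-1 contrast), the SD differs slightly depending on whether the number of items is even or odd, because of the N-1 denominator. This can lead to minor differences in the norms of the scaled columns.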
`"l2"`: Divides each column by its L2 norm (`sqrt(sum(x^2))`).
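
The two non-trivial scaling options can be compared on a single column in base R. This sketch only reproduces the arithmetic described above and does not call the package.

```r
x  <- c(1, 1, 1, -1, -1)          # unbalanced +1/-1 contrast, N = 5
xc <- x - mean(x)                 # centering step

# "sd": scale() with center = FALSE divides by sqrt(sum(x^2) / (N - 1)),
# which equals sd(x) once the column has been centered.
x_sd <- as.numeric(scale(xc, center = FALSE, scale = TRUE))

# "l2": divide by the Euclidean norm.
x_l2 <- xc / sqrt(sum(xc^2))

sd(x_sd)              # 1
sqrt(sum(x_sd^2))     # sqrt(N - 1) = 2
sqrt(sum(x_l2^2))     # 1
```

An SD-scaled centered column has unit standard deviation but L2 norm `sqrt(N - 1)`, whereas an L2-scaled column has unit norm, so the two options differ only by a constant factor per column.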

Specific Behaviors

If `orth = TRUE` and the input matrix has only one column after potential centering, that column is scaled to unit L2 norm. Centering still depends on the `centre` argument.
If `centre = FALSE` and `orth = TRUE`, the QR decomposition is performed on the *uncentered* columns.
If the mini-DSL `.` notation is used for `levelsB` and `levelsA` already contains all `labels`, `levelsB` becomes empty, potentially resulting in a constant (zero) column before centering. A warning is issued in this case.

Masking

This function masks the `stats::contrasts` function. To use the base R function, explicitly call `stats::contrasts()`.
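
As a short illustration of the fully qualified call, the following uses only base R; the factor `f` is made up for the example.

```r
f <- factor(c("a", "b", "c"))

# The stats:: prefix always reaches base R's contrast-coding function,
# even when an attached package defines its own contrasts().
base_contr <- stats::contrasts(f)
base_contr
```

The result is the default treatment-contrast matrix for a three-level factor, with one row per level and one column per non-reference level.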

Examples

labs <- c("faces","animals","plants","tools",
          "vehicles","furniture","buildings","food")

# 1) Mini-DSL: 2x2 Factorial (Animacy x Size) + Interaction, Orthonormal
C1 <- contrasts(
        labels = labs,
        spec   = ~ anim( faces + animals + plants + food ~ . )
                 + size( faces + animals + tools + furniture ~ . )
                 + anim:size,
        orth   = TRUE)
print(colnames(C1))
#> [1] "Orth1" "Orth2" "Orth3"
print(attr(C1, "source"))
#> [1] "anim"      "size"      "anim:size"
print(round(crossprod(C1), 5))
#>       Orth1 Orth2 Orth3
#> Orth1     1     0     0
#> Orth2     0     1     0
#> Orth3     0     0     1

# 2) Mini-DSL: One-vs-rest, Centered, Unit Length (L2)
C2 <- contrasts(labels = labs,
                spec   = ~ faces( faces ~ . ) + tools( tools ~ . ),
                scale = "l2")
print(round(colSums(C2^2), 5)) # Should be 1
#> faces tools 
#>     1     1 

# 3) Metadata + Formula: Centered, Scaled (SD)
meta <- tibble::tribble(
  ~label,      ~anim, ~size,
  "faces",        1,    0,
  "animals",      1,    0,
  "plants",       1,    1,
  "tools",        0,    0,
  "vehicles",     0,    1,
  "furniture",    0,    0,
  "buildings",    0,    1,
  "food",         1,    1)
# Note: labels argument is ignored here, order comes from meta$label
# Also note: This function masks stats::contrasts
C3 <- contrasts(metadata = meta,
                spec     = ~ anim + size + anim:size,
                scale    = "sd")
print(round(colMeans(C3), 5)) # Should be 0
#>      anim      size anim:size 
#>         0         0         0 
print(round(apply(C3, 2, sd), 5)) # Should be 1
#>      anim      size anim:size 
#>         1         1         1