The YAML file is the durable contract for the non-GUI pipeline. It is
what prepare_firstlevel(), prepare_pls(),
run_pls(), plscli, and the Shiny import/export
path all share.
This article is a reference for that contract. It focuses on
structure, required fields, and common variants. For the scripted
workflow itself, see vignette("scripted-workflows").
The intended starting point is a scaffold, not a blank file:
write_pipeline_template("study.yml")
What sections does the spec contain?
The scaffold written by write_pipeline_template() gives
the intended shape.
required_sections
#> [1] "dataset" "design" "first_level" "pls" "execution"
#> [6] "outputs"
Those sections have distinct roles:
- dataset: where the BIDS data live and how subjects/tasks are selected
- design: how first-level regressors are built
- first_level: what first-level outputs to write
- pls: how those outputs are mapped into a PLS analysis
- execution: local versus array execution settings
- outputs: where artifacts are written
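As a rough illustration of the section list above, the following base-R sketch reports which top-level sections a parsed spec lacks. The helper `missing_sections()` and the `partial_spec` list are hypothetical and not part of plsrri; the package's own validation may also fill some sections from defaults rather than reject the file.

```r
# Section names copied from the required_sections output above.
required_sections <- c("dataset", "design", "first_level",
                       "pls", "execution", "outputs")

# Hypothetical helper (not part of plsrri): report which top-level
# sections are absent from a spec held as a named list.
missing_sections <- function(spec) {
  setdiff(required_sections, names(spec))
}

# A deliberately incomplete spec, written as a plain R list.
partial_spec <- list(
  dataset = list(task = "stroop"),
  pls     = list(method = "task")
)

missing_sections(partial_spec)
```

A check like this only inspects section names; it says nothing about whether the fields inside each section are valid.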
What is the smallest valid spec?
The minimum viable spec is deliberately small. You need:
- a BIDS directory
- at least one task label
- a first-level design formula
- a PLS method
- an output root
cat(as.yaml(minimal_spec))
#> dataset:
#>   bids_dir: /tmp/RtmpKpXFeS/plsrri-yaml-2900d81dad9/bids
#>   task: stroop
#> design:
#>   formula: onset ~ hrf(condition, basis = 'spmg1')
#> first_level:
#>   output:
#>     type: estimates
#>     statistics: estimate
#> pls:
#>   method: task
#>   nperm: 0
#>   nboot: 0
#> outputs:
#>   root: /tmp/RtmpKpXFeS/plsrri-yaml-2900d81dad9/out
That small object already validates and picks up defaults for
execution.mode, execution.parallelism,
first_level.strategy, and pls.input.
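The article prints minimal_spec without showing how it was built. One plausible construction, sketched here under that assumption, is a nested R list with one list level per YAML level. The temporary directories are placeholders for a real BIDS directory and output root, and the nesting follows the printed YAML rather than any documented constructor.

```r
# Throwaway paths stand in for a real BIDS dataset and output root.
bids_dir <- file.path(tempdir(), "bids")
out_root <- file.path(tempdir(), "out")

# Nested list mirroring the printed YAML: one list level per YAML level.
# Note there is no execution section; the article says defaults cover it.
minimal_spec <- list(
  dataset     = list(bids_dir = bids_dir, task = "stroop"),
  design      = list(formula = "onset ~ hrf(condition, basis = 'spmg1')"),
  first_level = list(output = list(type = "estimates",
                                   statistics = "estimate")),
  pls         = list(method = "task", nperm = 0L, nboot = 0L),
  outputs     = list(root = out_root)
)

# Print as YAML only when the yaml package is installed.
if (requireNamespace("yaml", quietly = TRUE)) {
  cat(yaml::as.yaml(minimal_spec))
}
```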
How should you read the top-level sections?
dataset
dataset answers: what study is this, and which task/run
space should be used?
Common fields:
- bids_dir
- task
- space
- group_column
- optional subject/session/run filters
design
design defines the first-level model, not the PLS
contrast. The key field is formula.
design:
  formula: onset ~ hrf(condition, basis = 'spmg1')
  block: ~ run
That formula is where you choose single-df HRFs versus basis expansions such as FIR or tent-style models.
first_level
first_level controls how the GLM stage is run and what
maps are written.
The important distinction is:
- type = estimates: write condition-level beta-like maps
- type = contrasts: write named contrast maps
- type = F: write F-statistic outputs
pls
pls tells the second stage how to interpret first-level
outputs.
pls:
  method: task
  input:
    type: estimates
    statistic: estimate
  nperm: 1000
  nboot: 500
The method field maps onto supported plsrri
methods such as:
- task
- task_nonrotated
- behavior
- behavior_nonrotated
- multiblock
- multiblock_nonrotated
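Because pls.method is a free-text field in the YAML, a typo is easy to make. The sketch below shows one way to catch that before a run, using only the method names listed in this article. `check_method()` is a hypothetical helper, not a plsrri function, and the package may accept methods beyond this list.

```r
# Method names as listed in this article; the package may accept others.
known_methods <- c("task", "task_nonrotated",
                   "behavior", "behavior_nonrotated",
                   "multiblock", "multiblock_nonrotated")

# Hypothetical helper: stop early on an unrecognised pls.method value.
check_method <- function(method) {
  if (!is.character(method) || length(method) != 1L ||
      !method %in% known_methods) {
    stop("Unknown pls.method: ", paste(method, collapse = ", "))
  }
  invisible(method)
}

check_method("behavior")  # passes silently

# The British spelling is not in the list, so this raises an error,
# which is captured here as a message string.
msg <- tryCatch(check_method("behaviour"),
                error = function(e) conditionMessage(e))
msg
```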
How do basis-expanded first-level outputs fit into the spec?
The YAML needs two pieces of information when first-level output labels encode basis functions such as FIR bins or tent functions:
- how first-level labels are written
- how PLS should fold those labels back into a basis-aware manifest
cat(as.yaml(basis_spec))
#> dataset:
#>   bids_dir: /tmp/RtmpKpXFeS/plsrri-yaml-2900d81dad9/bids
#>   task: stroop
#> design:
#>   formula: onset ~ hrf(condition, basis = 'fir', K = 4)
#> first_level:
#>   output:
#>     type: estimates
#>     statistics: estimate
#>     basis_pattern: ^(.*)_bin([0-9]+)$
#>     basis_order:
#>     - bin1
#>     - bin2
#>     - bin3
#>     - bin4
#> pls:
#>   method: task
#>   input:
#>     type: estimates
#>     statistic: estimate
#>     basis_pattern: ^(.*)_bin([0-9]+)$
#>     condition_group: 1
#>     basis_group: 2
#>     basis_order:
#>     - '1'
#>     - '2'
#>     - '3'
#>     - '4'
#>   nperm: 0
#>   nboot: 0
#> outputs:
#>   root: /tmp/RtmpKpXFeS/plsrri-yaml-2900d81dad9/fir-out
The important point is that basis handling belongs in both places:
- design.formula determines what the first-level model estimates
- pls.input.* tells the PLS stage how to reinterpret those basis-labelled maps
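To make the roles of basis_pattern, condition_group, and basis_group concrete, here is a base-R sketch that applies the same regular expression to a few labels. The condition names are invented for illustration, and the sketch only demonstrates how the two capture groups split a label; it is not the package's own parsing code.

```r
pattern <- "^(.*)_bin([0-9]+)$"

# Hypothetical first-level labels; the condition names are made up.
labels <- c("congruent_bin1", "congruent_bin2",
            "incongruent_bin1", "incongruent_bin2")

# regexec() returns the full match followed by one entry per capture group,
# so element 2 is capture group 1 and element 3 is capture group 2.
m <- regmatches(labels, regexec(pattern, labels))

manifest <- data.frame(
  label     = labels,
  condition = vapply(m, `[`, character(1), 2),  # condition_group: 1
  basis     = vapply(m, `[`, character(1), 3),  # basis_group: 2
  stringsAsFactors = FALSE
)
manifest
```

The second capture group holds only the digits, which would be consistent with pls.input.basis_order listing '1' to '4' while the first-level basis_order lists bin1 to bin4.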
How does the spec map to CLI stages?
The YAML is consumed incrementally. Not every stage needs every section.
| CLI stage | Main sections used |
|---|---|
| plscli validate | all |
| plscli discover | dataset, outputs |
| plscli firstlevel-plan | dataset, design, first_level, outputs |
| plscli firstlevel-run | first-level plan artifacts plus execution |
| plscli pls-plan | pls, outputs |
| plscli pls-run | pls, planned manifests, outputs |
| plscli report | outputs or an existing artifact root |
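One practical use of the stage table is working out which stages are affected when a single section of the YAML is edited. The sketch below transcribes the table into a base-R lookup. `stage_sections` and `stages_using()` are hypothetical names, only YAML sections are recorded (plan artifacts and manifests are left out), and validate and report are omitted because they do not map to a fixed subset of sections.

```r
# Stage-to-section lookup transcribed from the table above.
# Only YAML sections are listed; plan artifacts and manifests are omitted.
stage_sections <- list(
  "discover"        = c("dataset", "outputs"),
  "firstlevel-plan" = c("dataset", "design", "first_level", "outputs"),
  "firstlevel-run"  = c("execution"),
  "pls-plan"        = c("pls", "outputs"),
  "pls-run"         = c("pls", "outputs")
)

# Hypothetical helper: which stages read a given top-level section?
stages_using <- function(section) {
  names(Filter(function(s) section %in% s, stage_sections))
}

stages_using("design")
stages_using("outputs")
```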
That split is why the same YAML can drive:
- a local end-to-end run
- an HPC array workflow
- a Shiny-exported pipeline configuration
What should you treat as stable?
The stable contract is:
- the YAML structure described here
- the artifact root under outputs.root
- the public helpers:
Where should you go next?
Use vignette("scripted-workflows") for the staged R and
CLI workflow, and ?write_pipeline_template /
?read_pipeline_spec for the corresponding help pages.