Add a script-based stage to a parade flow
script_stage.RdWraps an external script as a first-class pipeline stage with declarative
output file templates and automatic parameter wiring. The script is expected
to write its output files; script_stage() verifies they exist and returns
file reference tibbles compatible with artifact() / file_ref().
Arguments
- fl
A
parade_flowobject- id
Unique stage identifier (character)
- script
Path to the script file (R, Python, bash, etc.)
- produces
Character vector of output declarations. Two forms:
Templates (contain glue braces): glue-style path templates resolved per grid row. Can be named or unnamed (unnamed single defaults to
"output").Names only (no braces): output names only. The script must call
script_returns()to declare actual paths.
- needs
Character vector of upstream stage IDs this stage depends on. For the
sourceengine, upstream outputs are injected into the script environment as{stage}.{field}variables.- engine
Execution engine:
"source"(default, usesbase::source()) or"system"(usesbase::system2()).- interpreter
For
engine = "system", the interpreter command. IfNULL, guessed from the script file extension.- prefix
Whether to prefix output columns with stage ID (default
TRUE).- skip_when
Optional function (or formula) that receives the row's variables (grid columns, upstream outputs, constants). If it returns
TRUE, the stage is skipped for that row and output columns are filled withNA. Useful for avoiding redundant work when the same outputs are shared across multiple grid rows.- skip_if_exists
Logical (default
FALSE). IfTRUE, checks whether all resolved output files already exist before running the script. When they do, the script is skipped and validfile_ref()tibbles are returned (withwritten = FALSE, existed = TRUE). Requires template mode (produces with glue placeholders); using it with manifest mode will error.- use_manifest
Logical (defaults to
skip_if_exists). WhenTRUE, enables a completion manifest that records{params} -> {output_paths}after each successful execution. On subsequent runs the manifest is consulted first, which allows skipping even when theproducestemplate has changed (e.g., after adding a new grid parameter). Seecompletion_manifest(),manifest_adopt(),manifest_clear().- ...
Additional constant arguments passed through to the stage.
Two modes for produces
Template mode — values contain glue placeholders (curly braces).
script_stage() resolves paths, injects them as variables
(output_path, <name>_path), and verifies the files after the
script finishes:
produces = c(model = "results/\{subject\}/model.rds")
Manifest mode — values are plain output names (no braces).
The script decides where to write and calls script_returns() to
declare the paths. script_stage() reads the manifest and verifies:
produces = c("model", "metrics")
Portable scripts with get_arg()
Scripts can use get_arg() to read parameters. It works transparently
with both the source and system engines:
Caching with skip_if_exists
Unlike skip_when, which runs before template resolution and fills
outputs with NA, skip_if_exists runs inside the wrapper after
paths are resolved. This means:
Downstream stages receive real
file_ref()tibbles they can read.The check uses the actual resolved file paths, so it works correctly when multiple grid rows map to the same output files.
If any declared output is missing, the script runs normally.
skip_when and skip_if_exists are orthogonal and can be combined:
skip_when is evaluated first; if it doesn't skip, the wrapper runs
and skip_if_exists is checked next.
Examples
if (FALSE) { # \dontrun{
# Template mode: caller declares paths
flow(grid) |>
script_stage("fit",
script = "scripts/fit_model.R",
produces = "results/{subject}/model.rds"
)
# Template mode: multiple named outputs
flow(grid) |>
script_stage("fit",
script = "scripts/fit_model.R",
produces = c(
model = "results/{subject}/model.rds",
metrics = "results/{subject}/metrics.csv"
)
)
# Manifest mode: script declares paths via script_returns()
flow(grid) |>
script_stage("fit",
script = "scripts/fit_model.R",
produces = c("model", "metrics")
)
# System engine
flow(grid) |>
script_stage("preproc",
script = "scripts/preprocess.py",
engine = "system",
produces = "output/{subject}.nii.gz"
)
} # }