Declarative parallel dataflow for R — from laptop to HPC. Define what to compute, not how to loop. Parade builds typed, parallel workflows; persists large outputs as artifacts (sinks); and talks to SLURM directly (submit, monitor, cancel) so you rarely have to leave R.
- Website & docs: https://bbuchsbaum.github.io/parade/
- Source: https://github.com/bbuchsbaum/parade
Why parade? Clean, composable pipelines with explicit types and lazily persisted outputs, plus first-class HPC ergonomics (portable paths, SLURM defaults, live monitoring).
Install
# development version
# install.packages("remotes")
remotes::install_github("bbuchsbaum/parade")
Note: The CRAN package named parade is unrelated (economic “income parades”). This project is currently GitHub-only.
60-second tour
library(parade)
library(progressr)
handlers(global = TRUE) # progress bars everywhere
paths_init() # portable paths: artifacts://, data://, etc.
# Declare the parameter space
grid <- param_grid(subject = c("s01", "s02"), session = 1:2)
# Build a typed, composable pipeline
fl <- flow(grid) |>
  stage(
    id = "fit",
    f = function(subject, session) {
      model <- lm(rnorm(1000) ~ rnorm(1000))
      list(model = model, rmse = runif(1))
    },
    schema = schema(model = artifact(), rmse = dbl()),  # big → artifact, small → memory
    sink = sink_spec(fields = "model",
                     dir = "artifacts://fits",
                     template = "{.stage}/{subject}/ses{session}-{.row_key}")
  )
# Execute locally or with futures/mirai/SLURM
res <- collect(fl, engine = "future", workers = 4)
res$model[[1]] # file-ref (path, bytes, sha256, written/existed)
res$rmse       # numeric, in memory
- Artifacts (sinks) keep memory tiny and runs resumable — see the Artifacts vignette.
- Portable paths like artifacts:// resolve to scratch on HPC or temp on laptops — see Smart Path Management.
- Typed returns (dbl(), int(), lst(), artifact()) catch mistakes early — see Core concepts.
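For orientation, the grid in the tour above is fully crossed: two subjects times two sessions gives four rows, and each row becomes one unit of work. As a mental model only (param_grid() presumably builds an analogous data frame; inspect str(grid) for the exact structure), the base-R equivalent looks like this:

```r
# Mental model of the crossed parameter space from the tour above.
# param_grid() presumably yields an analogous data frame; check str(grid).
grid_sketch <- expand.grid(
  subject = c("s01", "s02"),
  session = 1:2,
  stringsAsFactors = FALSE
)
grid_sketch
#>   subject session
#> 1     s01       1
#> 2     s02       1
#> 3     s01       2
#> 4     s02       2
```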
Submit & monitor SLURM jobs from R
# One-command HPC setup (recommended for clusters)
parade_init_hpc(persist = TRUE)
slurm_defaults_set(
  partition = "general",
  time = "2h",        # accepts 2h / 120min / H:MM:SS
  cpus_per_task = 8,
  mem = NA,           # omit --mem if your site forbids it
  persist = TRUE
)
job <- submit_slurm("scripts/train.R", args = c("--fold", "1"))
parade_dashboard(job) # unified summary (or action = "top" for live UI)
script_status(job) # quick check
script_tail(job, 80)
script_top(job) # live CPU/RSS and logs
# Multiple jobs together:
jobs_top(list(job1, job2, job3))
- Defaults & omit-by-NA are built in — see Using SLURM Defaults.
- script_top() / jobs_top() give a text dashboard for CPU/RSS + logs — see SLURM script submission & monitoring (multi-job sketch below).
- Distribution options (grouping/barriers, throttling, chunking) are declarative — see Distribution: local & SLURM.
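As a concrete multi-job pattern, built only from the calls shown above (the exact shape of the handle returned by submit_slurm() is an assumption to verify against the monitoring vignette), you can submit one job per fold and watch them together:

```r
# Submit the same training script once per fold.
# Assumes submit_slurm() returns a job handle usable by jobs_top()/script_status().
jobs <- lapply(1:3, function(fold) {
  submit_slurm("scripts/train.R", args = c("--fold", as.character(fold)))
})

# One text dashboard for CPU/RSS and logs across all three jobs.
jobs_top(jobs)

# Or poll each job individually.
lapply(jobs, script_status)
```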
Mirai backend (optional)
Standard future/furrr parallelism is capped at roughly 125 workers by R's fixed connection limit. The mirai backend lifts that limit by running persistent daemon workers that pull tasks from a central dispatcher — giving you low-latency fan-out, automatic load balancing, and optional SSH/TLS transport.
# Local: spin up 8 daemon workers on this machine.
# The dispatcher feeds tasks to whichever worker is free next.
fl |>
  distribute(dist_mirai(n = 8, dispatcher = TRUE)) |>
  collect()
# HPC: launch 32 daemon workers as SLURM jobs.
# Each worker is a persistent R process that pulls work from the dispatcher,
# so you get load balancing across nodes without pre-partitioning the grid.
handle <- fl |>
  distribute(use_mirai_slurm(n = 32, partition = "compute", time = "2h")) |>
  submit()
See Mirai backend for patterns and tradeoffs.
Portable paths (laptop ↔︎ HPC without edits)
Hard-coded paths break when you move between your laptop and a cluster. Parade solves this with protocol-style aliases that resolve to the right directory on each machine:
sink_spec(fields = "model", dir = "artifacts://fits")
# on laptop → /tmp/parade-artifacts/fits
# on HPC    → $SCRATCH/parade-artifacts/fits
The aliases — artifacts://, data://, scratch://, registry://, config://, cache:// — check environment variables first (PARADE_ARTIFACTS, PARADE_SCRATCH, …), then fall back to sensible defaults (shared scratch on SLURM, tempdir locally). Override any of them with paths_set() or parade_init_hpc().
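For example, to pin the artifacts root to a shared project directory instead of the defaults, set the documented environment variable before initialising paths, or override the alias from R (a sketch; PARADE_ARTIFACTS comes from the text above, while the exact argument name accepted by paths_set() is an assumption to verify with ?paths_set):

```r
# Option 1: environment variable, read when paths are initialised.
Sys.setenv(PARADE_ARTIFACTS = "/project/mylab/parade-artifacts")
paths_init()

# Option 2: override the alias directly (argument name assumed; see ?paths_set).
paths_set(artifacts = "/project/mylab/parade-artifacts")

# Either way, sinks written to artifacts:// now land under the chosen root.
```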
Artifact catalog (discoverability)
# List artifacts under your artifacts root (uses sink sidecars when present)
artifact_catalog()
# Search by stage/field/row_key/path substring
artifact_catalog_search(query = "fit")
Why not {targets} / {drake} / {furrr}?
Parade is deliberately small and compositional:
- Dataframe-shaped param grids vs. global DAG caches
- Pseudo-typed returns for crisp contracts
- Built-in sinks for large results
- HPC ergonomics: SLURM submission, defaults, monitoring, path aliases
They play nicely together; parade focuses on elegant, fast fan-out/fan-in.
Learn more
- Parade core: flows, stages, schemas → https://bbuchsbaum.github.io/parade/articles/parade-core.html
- Artifacts & sinks → https://bbuchsbaum.github.io/parade/articles/parade-artifacts.html
- Distribution (local/SLURM) → https://bbuchsbaum.github.io/parade/articles/parade-slurm-distribution.html
- SLURM monitoring → https://bbuchsbaum.github.io/parade/articles/parade-scripts-monitoring.html
- Mirai backend → https://bbuchsbaum.github.io/parade/articles/parade-mirai.html
- Paths → https://bbuchsbaum.github.io/parade/articles/parade-paths.html
Contributing
PRs welcome! Please:
- follow tidyverse style (lintr + styler),
- add tests for new user-facing behavior,
- update roxygen and a NEWS entry.
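A typical local check sequence before opening a PR might look like the sketch below (standard tooling implied by the points above; the maintainers' exact workflow may differ):

```r
# Restyle, lint, regenerate roxygen docs, and run checks locally before a PR.
styler::style_pkg()
lintr::lint_package()
devtools::document()   # refresh man/ pages and NAMESPACE from roxygen comments
devtools::test()
devtools::check()      # full R CMD check
```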