
Declarative parallel dataflow for R — from laptop to HPC. Define what to compute, not how to loop. Parade builds typed, parallel workflows; persists large outputs as artifacts (sinks); and talks to SLURM directly (submit, monitor, cancel) so you rarely have to leave R.

Why parade? Clean, composable pipelines with explicit types and lazily persisted outputs, plus first-class HPC ergonomics (portable paths, SLURM defaults, live monitoring).

Install

# development version
# install.packages("remotes")
remotes::install_github("bbuchsbaum/parade")

Note: The CRAN package named parade is unrelated (economic “income parades”). This project is currently GitHub-only.

60-second tour

library(parade)
library(progressr)
handlers(global = TRUE)   # progress bars everywhere

paths_init()              # portable paths: artifacts://, data://, etc.

# Declare the parameter space
grid <- param_grid(subject = c("s01", "s02"), session = 1:2)

# Build a typed, composable pipeline
fl <- flow(grid) |>
  stage(
    id = "fit",
    f = function(subject, session) {
      x <- rnorm(1000)
      y <- x + rnorm(1000)   # toy data
      model <- lm(y ~ x)
      list(model = model, rmse = runif(1))
    },
    schema = schema(model = artifact(), rmse = dbl()),   # big → artifact, small → memory
    sink   = sink_spec(fields = "model",
                       dir = "artifacts://fits",
                       template = "{.stage}/{subject}/ses{session}-{.row_key}")
  )

# Execute locally or with futures/mirai/SLURM
res <- collect(fl, engine = "future", workers = 4)

res$model[[1]]   # file-ref (path, bytes, sha256, written/existed)
res$rmse         # numeric in-memory

Submit & monitor SLURM jobs from R

# One-command HPC setup (recommended for clusters)
parade_init_hpc(persist = TRUE)

slurm_defaults_set(
  partition = "general",
  time = "2h",           # accepts 2h / 120min / H:MM:SS
  cpus_per_task = 8,
  mem = NA,              # omit --mem if your site forbids it
  persist = TRUE
)

job <- submit_slurm("scripts/train.R", args = c("--fold", "1"))

parade_dashboard(job)  # unified summary (or action = "top" for live UI)
script_status(job)     # quick check
script_tail(job, 80)
script_top(job)     # live CPU/RSS and logs

# Multiple jobs together:
jobs_top(list(job1, job2, job3))
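The pattern above extends naturally to programmatic fan-out. As a hedged sketch using only the functions shown here (`submit_slurm()`, `jobs_top()`), you could submit one job per cross-validation fold and monitor them together; looping this way is an assumed usage pattern, not a documented one:

```r
library(parade)

# Sketch: one SLURM job per CV fold (assumes scripts/train.R exists
# and accepts a --fold argument, as in the example above).
jobs <- lapply(1:5, function(fold) {
  submit_slurm("scripts/train.R", args = c("--fold", as.character(fold)))
})

jobs_top(jobs)   # live view across all five jobs
```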

Mirai backend (optional)

Standard future/furrr parallelism is capped at ~125 connections. The mirai backend lifts that limit by running persistent daemon workers that pull tasks from a central dispatcher — giving you low-latency fan-out, automatic load balancing, and optional SSH/TLS transport.

# Local: spin up 8 daemon workers on this machine.
# The dispatcher feeds tasks to whichever worker is free next.
fl |>
  distribute(dist_mirai(n = 8, dispatcher = TRUE)) |>
  collect()

# HPC: launch 32 daemon workers as SLURM jobs.
# Each worker is a persistent R process that pulls work from the dispatcher,
# so you get load balancing across nodes without pre-partitioning the grid.
handle <- fl |>
  distribute(use_mirai_slurm(n = 32, partition = "compute", time = "2h")) |>
  submit()

See Mirai backend for patterns and tradeoffs.

Portable paths (laptop ↔︎ HPC without edits)

Hard-coded paths break when you move between your laptop and a cluster. Parade solves this with protocol-style aliases that resolve to the right directory on each machine:

sink_spec(fields = "model", dir = "artifacts://fits")
#  on laptop → /tmp/parade-artifacts/fits
#  on HPC    → $SCRATCH/parade-artifacts/fits

The aliases — artifacts://, data://, scratch://, registry://, config://, cache:// — check environment variables first (PARADE_ARTIFACTS, PARADE_SCRATCH, …), then fall back to sensible defaults (shared scratch on SLURM, tempdir locally). Override any of them with paths_set() or parade_init_hpc().

See Smart Path Management.
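Both override mechanisms can be sketched together. `paths_set()`, `paths_init()`, and the `PARADE_ARTIFACTS` variable are all named above; the exact argument name passed to `paths_set()` is an assumption about its interface:

```r
library(parade)
paths_init()

# Option 1: environment variable, checked first by the alias resolver.
Sys.setenv(PARADE_ARTIFACTS = "/project/mylab/parade-artifacts")

# Option 2: set the alias in-session (argument name assumed).
paths_set(artifacts = "/project/mylab/parade-artifacts")

# After either, "artifacts://fits" resolves under the new root.
```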

Artifact catalog (discoverability)

# List artifacts under your artifacts root (uses sink sidecars when present)
artifact_catalog()

# Search by stage/field/row_key/path substring
artifact_catalog_search(query = "fit")

Why not {targets} / {drake} / {furrr}?

Parade is deliberately small and compositional:

- Dataframe-shaped param grids vs. global DAG caches
- Pseudo-typed returns for crisp contracts
- Built-in sinks for large results
- HPC ergonomics: SLURM submission, defaults, monitoring, path aliases

They play nicely together; parade focuses on elegant, fast fan-out/fan-in.

Contributing

PRs welcome! Please:

- follow tidyverse style (lintr + styler),
- add tests for new user-facing behavior,
- update the roxygen docs and add a NEWS entry.

Albers theme

This package uses the albersdown theme. Vignettes are styled with vignettes/albers.css and a local vignettes/albers.js; existing vignette theme hooks are replaced so these render consistently on CRAN and GitHub Pages. The palette family is provided via params$family (default ‘red’). The pkgdown site uses template: { package: albersdown }.