Unified API: Functions and Scripts, One Surface
This vignette shows how the same Parade ergonomics apply whether you submit a script or a function. You get one mental model and one set of verbs.
See also:
- Overview: high-level orientation
- Quickstart: your first pipeline
- Sinks & Artifacts: robust I/O
Quick start: function first
library(parade)
# Local dev:
paths_init(quiet = TRUE)
# HPC (recommended):
# parade_init_hpc(persist = TRUE)
# Submit a single function call
job <- slurm_call(
function(x, y) x^2 + y^2,
x = 3,
y = 4,
name_by = "index",
engine = "local"
)
print(job)
cat("Result:", job$result, "\n")Map over inputs (function or script)
files <- c("data1.csv", "data2.csv", "data3.csv")
# Function path
jobs <- slurm_map(files, ~ paste("Processing", .x), .name_by = "stem", .engine = "local")
results <- collect(jobs)
# Script path (CLI flags via args_cli)
jobs <- slurm_map(files, "scripts/process_one.R",
.args = args_cli(input = .x),
.name_by = stem())
Packed jobs (HPC-friendly) and “one big machine” best-effort
When you have many tasks, HPC admins often prefer fewer jobs that fully use a node. Use packed mode:
jobs <- slurm_map(
1:10000,
~ .x^2,
.packed = TRUE,
.workers_per_node = 32,
.target_jobs = 200, # optional: choose chunk size from a target job count
.resources = list(nodes = 1, ntasks = 1, cpus_per_task = 32, time = "2h", mem = "16G")
)
If you want to think in terms of a cluster shape (e.g., 10 nodes × 196 cores), use slurm_map_cluster():
jobs <- slurm_map_cluster(
1:10000,
~ .x^2,
nodes = 10,
cpus_per_node = 196,
oversubscribe = 2,
.resources = list(time = "2h", mem = "64G")
)
Flows (DAGs): express an allocation shape once
If you’re using flow() + submit(), you can express “10 nodes × 196 cores” as a distribution spec and keep the DAG itself unchanged:
fl <- fl |>
distribute(dist_slurm_allocation(
nodes = 10,
cores_per_node = 196,
within = "multicore",
target_jobs = 20, # optional oversubscription for heterogeneous runtimes
resources = list(time = "2h", mem = "64G")
))
This is still static partitioning at submit time. If task durations vary widely, consider dispatcher-style backends (dist_mirai(..., dispatcher = TRUE) or dist_crew(...)) for more dynamic load balancing; a minimal sketch of the swap follows.
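As a hedged sketch of that swap: dist_mirai() and dispatcher = TRUE come from the note above, but the worker-count argument name used here is an assumption, so check your installed version for the exact signature.
# Hypothetical: same DAG, dispatcher backend for dynamic load balancing
# (`n_workers` is an assumed argument name, not documented in this vignette)
fl <- fl |>
  distribute(dist_mirai(n_workers = 64, dispatcher = TRUE))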
Elegant naming and paths
# Naming helpers: stem/index/digest/glue_name
jobs <- slurm_map(files, ~ tools::file_path_sans_ext(.x),
.name_by = stem("sample_(\\d+)"), .engine = "local")
# Path macros in write_result
tmp <- tempdir()
jobs <- slurm_map(1:3, ~ .x^3,
.name_by = index("cube"),
.write_result = file.path(tmp, "results_{name}_{index}.rds"),
.engine = "local")Jobset verbs everywhere
All jobset verbs (await, status, collect, cancel, progress, open_logs) work on results from slurm_map(). If you submit a single function with slurm_call(), add .as_jobset = TRUE to opt into the same surface:
# Wrap a single job as a one‑element jobset
jobs <- slurm_call(
function(file) { Sys.sleep(1); read.csv(file)[1:5, ] },
file = "data/example.csv",
name = "proc-example",
write_result = path$artifacts("results/{run}/{stem}.rds"),
.as_jobset = TRUE
)
# Same verbs as slurm_map()
jobs |> progress() |> collect()
open_logs(jobs, selection = "all")
Parallel arguments with pmap
df <- data.frame(x = 1:4, y = 5:8, method = c("add","multiply","add","multiply"))
jobs <- slurm_pmap(df, function(x, y, method) if (method == "add") x + y else x * y,
.name_by = glue_name("{method}-{x}-{y}"), .engine = "local")
collect(jobs)
Key arguments at a glance
To keep the surface consistent across functions and scripts, a few arguments show up repeatedly. Here’s what they mean and how to pick values with confidence:
- name_by / .name_by: how job names are generated (a short sketch follows this list item)
- "auto": sensible default; uses a file stem when it looks like a path, otherwise uses the index (job-1, job-2, …).
- stem(): extracts the filename stem. You can pass a regex to keep just a capture group, e.g., stem("sample_(\\d+)").
- index(prefix = "job", width = 0): generates names like job-1 or, with a custom prefix and width, task-001.
- digest(prefix = "job", length = 8): short, content-based names.
- glue_name("template"): template-based; accepts pmap arguments (e.g., "{method}-{x}-{y}") or the map element/index via .x/.i.
- slurm_call() uses name_by = ...; slurm_map()/slurm_pmap() use .name_by = ....
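To see the helpers side by side, a quick local sketch (files is the vector defined earlier; the comments show the names the helpers are described to produce):
# Naming helpers compared (local engine for fast inspection)
jobs <- slurm_map(files, ~ .x, .name_by = stem(), .engine = "local")                  # data1, data2, data3
jobs <- slurm_map(1:3, ~ .x, .name_by = index("task", width = 3), .engine = "local")  # task-001, task-002, ...
jobs <- slurm_map(files, ~ .x, .name_by = digest(), .engine = "local")                # short content-based names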
- engine / .engine: where the work runs (sketch below)
- "slurm" (default): submits to SLURM (scripts via submit_slurm(); functions via slurm_call() with serialization).
- "local": runs the function in the current R session (fast for debugging); still honors write_result so you can poke at outputs.
- For slurm_map(), .engine affects function submissions only. Script submissions always go through SLURM (no local script runner is provided).
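As a sketch, the same map can be flipped between engines while debugging (function path only, per the note above; the function body is illustrative):
# Local run for debugging, then drop .engine to submit to SLURM
jobs <- slurm_map(files, ~ nrow(read.csv(.x)), .engine = "local")
jobs <- slurm_map(files, ~ nrow(read.csv(.x)))  # default engine: SLURM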
- write_result / .write_result: on-disk results for functions (sketch below)
- Path template where a function's return value is saved (e.g., "artifacts://results/{run}/{stem}.rds").
- Macros like {run}, {stem}, {name}, {index}, {date}, {time}, {user}, {host} are expanded the same way everywhere.
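A minimal sketch combining a template with macros (the artifacts:// form and macro names come from the list above; the directory layout is illustrative):
# Each job's return value lands where the expanded template points
jobs <- slurm_map(files, ~ read.csv(.x),
                  .name_by = stem(),
                  .write_result = "artifacts://results/{run}/{stem}.rds")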
- resources / .resources: how much you ask SLURM for (sketch below)
- Accepts a profile name (e.g., "gpu", "standard"), a chained profile object (profile() %>% res_time("2h") %>% cpus(8)), or a simple list.
- Legacy shortcuts like "cpu8", "mem32G", "gpu2" are still recognized.
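The three accepted forms, sketched side by side (profile names are site-specific placeholders; the function body is illustrative):
# 1) a named profile
jobs <- slurm_map(files, ~ read.csv(.x), .resources = "standard")
# 2) a chained profile object
jobs <- slurm_map(files, ~ read.csv(.x), .resources = profile() %>% res_time("2h") %>% cpus(8))
# 3) a plain list, as in the packed example above
jobs <- slurm_map(files, ~ read.csv(.x), .resources = list(time = "2h", mem = "16G", cpus_per_task = 8))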
- .as_jobset (slurm_call only): opt into the jobset surface for single jobs
- When TRUE, a one-element parade_jobset is returned so you can write jobs |> progress() |> collect() everywhere.
Summary
- “One surface”: scripts via submit_slurm(), functions via slurm_call()
- slurm_map()/slurm_pmap() dispatch correctly for both kinds
- name_by, path macros, resource profiles, and flow control work uniformly
- Use .as_jobset = TRUE to get the same jobset verbs for single calls