Submits multiple SLURM jobs by mapping a function or script over a vector or list, dispatching automatically to slurm_call() for functions and submit_slurm() for scripts.

Usage

slurm_map(
  .x,
  .f,
  ...,
  .args = NULL,
  .name_by = "auto",
  .resources = NULL,
  .packages = character(),
  .write_result = NULL,
  .engine = c("slurm", "local"),
  .progress = FALSE,
  .options = NULL,
  .error_policy = NULL,
  .packed = FALSE,
  .workers_per_node = NULL,
  .chunk_size = NULL,
  .target_jobs = NULL,
  .parallel_backend = c("auto", "callr", "multicore", "multisession")
)

Arguments

.x

Vector or list to map over

.f

Function, formula, or script path to apply to each element

...

Additional arguments passed to the function or script

.args

Named list of additional arguments (alternative to ...)

.name_by

Naming strategy: "auto", "index", "stem", "digest", or a function

.resources

Resource specification (profile name, profile object, list, or NULL)

.packages

Character vector of packages to load (for functions)

.write_result

Path template for saving results (supports macros)

.engine

Execution engine: "slurm" (default) or "local"

.progress

Logical; whether to show a progress bar

.options

Flow control options (e.g., wave_policy() or concurrency_limit())

.error_policy

Error handling policy for job failures

.packed

Logical; if TRUE, pack multiple tasks into single SLURM jobs for efficient node utilization (default: FALSE)

.workers_per_node

Integer; number of parallel workers per node when packed (defaults to resources$cpus_per_task if present, else 1)

.chunk_size

Integer; number of tasks per packed job (defaults to .workers_per_node)

.target_jobs

Optional integer; when .packed = TRUE and .chunk_size is not provided, choose a chunk size that yields approximately this many packed jobs (useful for "treat N nodes like one machine" workflows).

.parallel_backend

Backend for within-node parallelism when .packed = TRUE. One of "auto", "callr", "multicore", or "multisession"; ignored when .packed = FALSE. The default "auto" resolves to "callr" for strong isolation.

Value

A parade_jobset object containing all submitted jobs

Details

When .f is a function or formula (e.g., ~ .x + 1), each element of .x is passed as the first argument. When .f is a character string, it is treated as the path to a script and submitted with appropriate argument conversion.
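
For instance, the two forms below are equivalent (a minimal sketch using the "local" engine so no cluster is needed; the commented-out script path is hypothetical):

jobs <- slurm_map(1:3, function(x) x^2, .engine = "local")
jobs <- slurm_map(1:3, ~ .x^2, .engine = "local")
# A character string dispatches to submit_slurm() as a script instead:
# jobs <- slurm_map(c("a.csv", "b.csv"), "scripts/process.R")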

The .name_by parameter controls job naming (see the sketch after this list):

  • "auto": Automatic naming based on context

  • "index": Use numeric index (job-1, job-2, etc.)

  • "stem": Extract stem from file paths in .x

  • "digest": Use content hash

  • function: Custom naming function receiving element and index
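
A minimal sketch of these strategies, run with the "local" engine (the file paths are hypothetical):

slurm_map(1:2, ~ .x, .name_by = "index", .engine = "local")   # job-1, job-2
slurm_map(c("in/a.csv", "in/b.csv"), ~ .x,
          .name_by = "stem", .engine = "local")                # a, b
slurm_map(1:2, ~ .x,
          .name_by = function(el, i) paste0("task-", i),       # custom: task-1, task-2
          .engine = "local")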

Packed Execution for HPC Efficiency:

Use .packed = TRUE to pack multiple tasks into single SLURM jobs for better node utilization on HPC systems. This is critical when administrators expect full-node allocations; the arithmetic is sketched after this list:

  • Standard mode (.packed = FALSE): 1000 files → 1000 SLURM jobs → likely 1000 nodes

  • Packed mode (.packed = TRUE, .workers_per_node = 20): 1000 files → 50 SLURM jobs → 50 nodes, each using 20 cores
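
The job counts above follow from simple chunking arithmetic (a sketch of the documented defaults, not parade's internal code):

n_tasks <- 1000
workers <- 20                         # .workers_per_node
chunk   <- workers                    # .chunk_size defaults to .workers_per_node
ceiling(n_tasks / chunk)              # 50 packed jobs
# Or fix the number of jobs with .target_jobs and derive the chunk size:
ceiling(n_tasks / 50)                 # about 20 tasks per packed job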

Packed mode automatically:

  • Chunks inputs into batches

  • Requests appropriate cpus_per_task

  • Runs tasks in parallel per node using the selected backend (.parallel_backend): "callr" (default, most isolated), "multicore" (HPC Linux), or "multisession"

  • Works with flow control via .options (e.g., max_in_flight()); see the sketch after this list

  • Preserves element-level naming and result writing with {stem}, {run} macros
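
For example, a packed submission can be throttled with flow control (a sketch; max_in_flight() is referenced above, but its exact signature is an assumption here):

jobs <- slurm_map(files, ~ read.csv(.x),
                  .packed = TRUE,
                  .workers_per_node = 20,
                  .options = max_in_flight(10))  # assumed: at most 10 jobs in flight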

Examples

# Local execution example (no SLURM required)
local_jobs <- slurm_map(1:3, ~ .x^2, .engine = "local")
results <- collect(local_jobs)

# \donttest{
# Note: The following examples require a SLURM cluster environment
if (Sys.which("squeue") != "") {
  # Map a function over files
  files <- c("data1.csv", "data2.csv")
  process_data <- function(x) x  # stub for example; receives the piped data frame
  jobs <- slurm_map(files, ~ read.csv(.x) |> process_data(),
                    .name_by = "stem",
                    .write_result = "results/{stem}.rds")

  # Map a script with CLI arguments
  jobs <- slurm_map(files, "scripts/process.R",
                    .args = args_cli(verbose = TRUE))

  # Use formula notation with SLURM
  numbers <- 1:10
  jobs <- slurm_map(numbers, ~ .x^2 + .x,
                    .name_by = "index")

  # PACKED EXECUTION: Process 1000 files using 20 cores per node
  # This submits ~50 jobs instead of 1000, making HPC admins happy
  files <- Sys.glob("data/*.csv")
  jobs <- slurm_map(
    files,
    ~ read.csv(.x)[1:5, ],
    .name_by = "stem",
    .write_result = path$artifacts("results/{run}/{stem}.rds"),
    .packed = TRUE,
    .workers_per_node = 20,
    .resources = list(cpus_per_task = 20, mem = "64G", time = "4h")
  )
  # Track progress and collect element-level results
  results <- jobs |> progress() |> collect()  # Returns 1000 results

  # Wait for all jobs and collect results
  results <- jobs |> await() |> collect()
}
# }