Note: Code evaluation is disabled in this vignette to keep builds fast and environment-agnostic. Copy code into an interactive R session to run locally or on your cluster.

The Problem: Hardcoded Paths Break Your Code

Imagine you’re developing a neuroimaging analysis on your laptop. Your code might look like this:

# Works on your laptop
model <- readRDS("/Users/alice/projects/brain_study/models/brain_model.rds")
results <- analyze_brain(model)
saveRDS(results, "/Users/alice/projects/brain_study/outputs/results.rds")

But when you or a collaborator runs this code on an HPC cluster, it breaks immediately:

  • /Users/alice/ doesn’t exist on Linux clusters
  • Different clusters have completely different filesystem layouts
  • Writing large files to home directories often violates cluster policies
  • Your collaborator Bob can’t run your code without rewriting all the paths

The Solution: Portable Path Aliases

parade solves this with smart path aliases that automatically adapt to any environment:

library(parade)

# Same code works everywhere
model <- readRDS(resolve_path("artifacts://models/brain_model.rds"))
results <- analyze_brain(model)
saveRDS(results, resolve_path("artifacts://outputs/results.rds"))

The artifacts:// prefix is a portable alias that parade automatically translates:

  • On your laptop: → /var/folders/.../parade-artifacts/
  • On SLURM cluster: → $PARADE_SCRATCH/parade-artifacts/ (preferred) or $SCRATCH/parade-artifacts/ (common). If neither is set, it falls back to $SLURM_TMPDIR (node-local scratch).
  • On your collaborator’s machine: → whatever their appropriate storage location is

No more broken paths. No more rewriting code for different systems.
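Under the hood, an alias is just a prefix that gets swapped for an environment-specific root directory. A minimal base-R sketch of the idea (illustrative only; `toy_resolve` and its lookup table are ours, not parade's implementation):

```r
# Illustrative only: a toy resolver that swaps an "alias://" prefix
# for a configured root directory, roughly what parade does for you.
toy_resolve <- function(path, roots) {
  m <- regmatches(path, regexec("^([a-z]+)://(.*)$", path))[[1]]
  if (length(m) == 0) return(path)           # no alias: pass through unchanged
  root <- roots[[m[2]]]
  if (is.null(root)) stop("unknown alias: ", m[2])
  file.path(root, m[3])
}

roots <- list(artifacts = "/scratch/alice/parade-artifacts")
toy_resolve("artifacts://models/brain_model.rds", roots)
#> "/scratch/alice/parade-artifacts/models/brain_model.rds"
```

The real resolver knows about all seven aliases and picks the roots from your environment; the prefix-substitution idea is the same.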

On clusters, the main “gotcha” is node-local scratch: SLURM_TMPDIR is often local to the compute node running a job. If you put registry:// or artifacts:// there, other jobs (or your login node) may not be able to see the files.
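If you want scripts to fail fast in that situation, a small base-R guard (illustrative; `check_shared_scratch` is our name, not a parade function) can inspect the relevant variables before any paths are configured:

```r
# Illustrative guard: warn when no shared scratch variable is set and a job
# would silently fall back to node-local SLURM_TMPDIR.
check_shared_scratch <- function() {
  shared <- c(Sys.getenv("PARADE_SCRATCH"), Sys.getenv("SCRATCH"))
  has_shared <- any(nzchar(shared))
  if (!has_shared && nzchar(Sys.getenv("SLURM_TMPDIR"))) {
    warning("Only SLURM_TMPDIR is set; artifacts:// and registry:// may end ",
            "up on node-local storage invisible to other jobs.")
  }
  invisible(has_shared)
}

check_shared_scratch()  # TRUE if a shared scratch variable is configured
```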

The recommended setup is:

  1. Point Parade at shared scratch (site-specific).
  2. Initialize paths in HPC mode (especially on login nodes).
  3. Run the built-in doctor once to confirm everything is writable.
# In your shell (recommended for most clusters)
export PARADE_SCRATCH="/scratch/$USER"   # or your site's shared scratch

Then, in R:

library(parade)

# Auto-detects HPC on compute nodes (scheduler vars), and on many login nodes
# when common scratch variables (SCRATCH/WORK/etc.) are set:
paths_init(profile = "auto", create = TRUE)
parade_doctor(create = TRUE)

Or use the one-command helper (creates paths, scaffolds a SLURM template, and can persist to parade.json):

parade_init_hpc(persist = TRUE)

To make the configuration easy to reuse in SLURM scripts, you can also generate shell exports:

cat(paste(paths_export(), collapse = "\n"))

If you prefer not to create directories automatically, use:

paths_init(profile = "hpc")
paths_validate(create = FALSE)

Quick Start: Your First Portable Analysis

Let’s make a simple analysis portable in three steps:

Step 1: Initialize parade’s path system

library(parade)

# Auto-detect your environment and set up paths
paths_init()

# Tip (HPC): on a login node or if your scheduler env isn't set, use:
# paths_init(profile = "hpc"); parade_doctor(create = TRUE)

# See where your aliases point
paths_get()
#> project:   /home/alice/myproject
#> data:      /home/alice/myproject/data
#> artifacts: /tmp/RtmpXYZ/parade-artifacts
#> registry:  /tmp/RtmpXYZ/parade-registry

Step 2: Use aliases instead of hardcoded paths

# Before (breaks on different systems):
data <- read.csv("/home/alice/myproject/data/experiment.csv")
saveRDS(model, "/home/alice/myproject/outputs/model.rds")

# After (works everywhere):
data <- read.csv(resolve_path("data://experiment.csv"))
saveRDS(model, resolve_path("artifacts://model.rds"))

Step 3: Run the same code anywhere

# On your laptop
paths_init()
saveRDS(big_model, resolve_path("artifacts://models/final.rds"))
# Saves to: /var/folders/temp/parade-artifacts/models/final.rds

# On HPC cluster (same code!)
paths_init(profile = "hpc")
saveRDS(big_model, resolve_path("artifacts://models/final.rds"))
# Saves to: $PARADE_SCRATCH/parade-artifacts/models/final.rds (or $SCRATCH/parade-artifacts/...)

That’s it! Your code now works on any system without modification.

Understanding the Seven Path Aliases

parade provides seven aliases, each designed for a specific type of data in your workflow:

Core Data Aliases

data:// - Input datasets (read-only)

# Your raw data, reference files, shared datasets
brain_atlas <- readRDS(resolve_path("data://references/MNI_atlas.rds"))
subjects <- read.csv(resolve_path("data://participants.csv"))

artifacts:// - Analysis outputs (large files)

# Models, results, processed data - goes to fast scratch storage
saveRDS(fitted_model, resolve_path("artifacts://models/model_v2.rds"))
write.csv(results, resolve_path("artifacts://results/final_results.csv"))

project:// - Your code and scripts

# Source files, small configuration files
source(resolve_path("project://R/analysis_functions.R"))
params <- yaml::read_yaml(resolve_path("project://config/params.yaml"))

System Aliases

scratch:// - Temporary files (deleted after jobs)

# Intermediate files, working data
temp_file <- "scratch://temp_processing.rds"

registry:// - Job management files

# SLURM templates, job scripts (managed by parade)
template <- "registry://templates/my_slurm.tmpl"

config:// - parade configuration

# Settings, profiles (usually automatic)
"config://profiles/production.json"

cache:// - Downloaded/cached data

# Reusable downloads, package data
"cache://downloaded/large_dataset.tar.gz"
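A common pattern with cache:// is download-once, reuse-forever. A hedged base-R sketch (the helper name is ours, not parade's):

```r
# Illustrative download-once helper: reuse a cached copy when present, fetch
# only on first use. With parade you would pass
# resolve_path("cache://downloaded") as cache_dir.
fetch_cached <- function(url, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) download.file(url, dest)  # skipped once cached
  dest
}
```

Because the destination is derived from the alias, the same call reuses the laptop cache locally and the cluster cache on HPC.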

Common Scenarios

Scenario 1: Laptop Development → HPC Production

You’re developing on your laptop with small test data, then running full analysis on a cluster:

# During development (laptop)
paths_init()
paths_set(
  data = "~/projects/test_data",      # Small test dataset
  artifacts = "~/projects/outputs"     # Local outputs
)

# Your analysis code (unchanged!)
run_analysis <- function() {
  data <- readRDS(resolve_path("data://brain_scans.rds"))
  model <- fit_model(data)
  saveRDS(model, resolve_path("artifacts://fitted_model.rds"))
}

# In production (HPC cluster) 
paths_init()
paths_set(
  data = "/shared/datasets/full_data",     # Full dataset
  artifacts = "/scratch/$USER/outputs"     # Fast scratch storage
)

# Same analysis code still works!
run_analysis()

Scenario 2: Collaboration with Different Systems

Alice and Bob are collaborating but have different setups:

# Alice's setup (Mac laptop)
paths_init()
#> artifacts: /var/folders/abc/temp/parade-artifacts

# Bob's setup (Linux workstation) 
paths_init()
#> artifacts: /tmp/bob/parade-artifacts

# Shared analysis code (works for both!)
analyze_subjects <- function(subjects) {
  for (subj in subjects) {
    data <- readRDS(resolve_path(sprintf("data://subjects/%s.rds", subj)))
    results <- process_subject(data)
    saveRDS(results, resolve_path(sprintf("artifacts://results/%s.rds", subj)))
  }
}

Scenario 3: Multi-stage Pipeline with Different Storage Needs

Different stages of your pipeline need different storage strategies:

library(parade)

# Configure storage for each data type
paths_init()
paths_set(
  data = "/shared/readonly/inputs",       # Shared input data
  scratch = Sys.getenv("SLURM_TMPDIR"),  # Fast local SSD
  artifacts = "/scratch/$USER/outputs"    # Persistent scratch
)

# Pipeline uses appropriate storage for each stage
flow(subjects) |>
  
  # Stage 1: Load from shared storage
  stage("load", function(subject) {
    readRDS(resolve_path(sprintf("data://raw/%s.rds", subject)))
  }) |>
  
  # Stage 2: Process using fast local storage
  stage("process", function(data) {
    temp_file <- resolve_path(sprintf("scratch://processing_%s.rds", data$id))
    # ... heavy processing using temp_file ...
  }) |>
  
  # Stage 3: Save results to persistent scratch
  stage("save", function(results) {
    saveRDS(results, resolve_path(sprintf("artifacts://final/%s.rds", results$id)))
  })

Configuring for Your HPC System

Automatic Detection

parade automatically detects common HPC environments:

# Auto-detects SLURM
paths_init()
# Automatically uses $SLURM_TMPDIR for scratch

# Auto-detects PBS
paths_init()  
# Automatically uses $PBS_O_WORKDIR

# Auto-detects SGE
paths_init()
# Automatically uses $TMPDIR

Manual Configuration

For custom HPC setups, explicitly set your paths:

# Configure once for your HPC system
paths_set(
  scratch = "/fast/local/$USER",           # Fast local SSD
  artifacts = "/lustre/$USER/outputs",     # Parallel filesystem
  registry = "/lustre/$USER/jobs",         # Shared job storage
  data = "/projects/shared/datasets"       # Readonly shared data
)

# Save configuration for future sessions
paths_set(..., persist = TRUE)

Environment Variables

Set system-wide defaults via environment variables:

# In ~/.bashrc or job scripts
export PARADE_SCRATCH="/fast/scratch/$USER"
export PARADE_ARTIFACTS="/fast/scratch/$USER/outputs"
export PARADE_DATA="/projects/shared/data"
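The fallback chain behind this kind of configuration ("use the first variable that is set") is easy to express in base R. A sketch (the `first_env` helper is ours, purely illustrative):

```r
# Illustrative: return the value of the first environment variable that is
# set and non-empty, else a default -- the usual fallback chain for scratch.
first_env <- function(vars, default) {
  vals <- vapply(vars, Sys.getenv, character(1))
  hit <- vals[nzchar(vals)]
  if (length(hit)) hit[[1]] else default
}

scratch_root <- first_env(c("PARADE_SCRATCH", "SCRATCH", "SLURM_TMPDIR"),
                          default = tempdir())
```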

Advanced Patterns

Pattern 1: Dynamic Environment Switching

# Detect and configure based on environment
setup_paths <- function() {
  if (interactive()) {
    # Development settings
    paths_set(artifacts = "~/temp/dev_outputs")
    message("Using development paths")
    
  } else if (Sys.getenv("SLURM_JOB_ID") != "") {
    # Production SLURM settings
    paths_set(
      scratch = Sys.getenv("SLURM_TMPDIR"),
      artifacts = sprintf("/scratch/%s/prod_outputs", Sys.getenv("USER"))
    )
    message("Using SLURM production paths")
    
  } else {
    # Default settings
    paths_init()
    message("Using default paths")
  }
}

Pattern 2: Project-Specific Organization

# Organize outputs by analysis phase
paths_set(
  artifacts = "/scratch/$USER/project_X"
)

# Create structured output directories
save_results <- function(phase, name, object) {
  path <- resolve_path(sprintf("artifacts://%s/%s.rds", phase, name))
  saveRDS(object, path)
}

# Usage
save_results("preprocessing", "cleaned_data", cleaned)
save_results("modeling", "final_model", model)
save_results("validation", "cv_results", cv)

# Results in:
# /scratch/$USER/project_X/preprocessing/cleaned_data.rds
# /scratch/$USER/project_X/modeling/final_model.rds
# /scratch/$USER/project_X/validation/cv_results.rds

Pattern 3: Integration with Sinks

# Sinks automatically use path aliases
sink_spec(
  fields = c("model", "predictions"),
  dir = "artifacts://models",  # Portable path
  template = "{subject}/{session}_{task}.rds"
)

# Different storage for different output types
model_sink <- sink_spec(
  fields = "model",
  dir = "artifacts://large_models",  # Goes to scratch
  format = "rds"
)

config_sink <- sink_spec(
  fields = "params",
  dir = "project://configs",  # Stays with code
  format = "json"
)

Path Resolution Functions

resolve_path() - Convert aliases to absolute paths

# Resolve any path with an alias
resolve_path("artifacts://model.rds")
#> "/scratch/alice/parade-artifacts/model.rds"

resolve_path("data://raw/scan.nii")
#> "/shared/datasets/raw/scan.nii"

# Works with regular paths too
resolve_path("/absolute/path.txt")   # Already absolute
resolve_path("relative/path.txt")    # Made absolute

path_here() - Build paths from components

# Construct paths programmatically
model_dir <- path_here("artifacts", "models", "v2")
#> "/scratch/alice/parade-artifacts/models/v2"

# Automatically creates directories
output_dir <- path_here("artifacts", "results", create = TRUE)

# Skip auto-creation if needed
temp_path <- path_here("scratch", "temp", create = FALSE)
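Conceptually, this combines alias resolution with file.path() and an optional dir.create(). A base-R approximation (illustrative, not parade's code):

```r
# Illustrative approximation of path_here(): join path parts under a root
# and optionally create the directory before returning the path.
build_path <- function(root, ..., create = TRUE) {
  p <- file.path(root, ...)
  if (create) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  p
}

out <- build_path(tempdir(), "results", "v2")
```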

Troubleshooting

Issue: “Cannot find path alias”

# Check your current configuration
paths_get()

# Re-initialize if needed
paths_init()

Issue: “Permission denied” when writing

# Check that your aliases point to writable locations
paths_get()

# Update to writable directory
paths_set(artifacts = "/tmp/my_outputs")

Issue: Different paths on different nodes

# Use node-local storage for better performance
tmp <- Sys.getenv("SLURM_TMPDIR")
paths_set(
  scratch = if (nzchar(tmp)) tmp else "/tmp"   # node-local on SLURM, else /tmp fallback
)

Best Practices

  1. Initialize paths at the start of every script

  2. Use appropriate aliases for different data types

    • data:// for inputs (read-only)
    • artifacts:// for outputs (large files)
    • scratch:// for temporary files
  3. Never hardcode absolute paths

    # Bad
    saveRDS(model, "/home/alice/outputs/model.rds")
    
    # Good
    saveRDS(model, resolve_path("artifacts://model.rds"))
  4. Document your path configuration

    # Show configuration in logs
    message("Parade paths configured:")
    print(paths_get())

Next Steps

Now that you understand portable paths, explore the rest of the parade documentation for workflows that build on them.

Quick Reference

Function         Purpose                     Example
paths_init()     Auto-configure paths        paths_init()
paths_get()      Show current paths          paths_get()
paths_set()      Set custom paths            paths_set(artifacts = "/scratch")
resolve_path()   Convert alias to absolute   resolve_path("data://file.csv")
path_here()      Build path from parts       path_here("artifacts", "models")

Alias            Use For                     Example
data://          Input data                  read.csv(resolve_path("data://input.csv"))
artifacts://     Outputs                     saveRDS(m, resolve_path("artifacts://model.rds"))
scratch://       Temp files                  "scratch://temp.rds"
project://       Code/config                 source(resolve_path("project://R/utils.R"))