Smart Path Management: Write Once, Run Anywhere
Note: Code evaluation is disabled in this vignette to keep builds fast and environment-agnostic. Copy code into an interactive R session to run locally or on your cluster.
The Problem: Hardcoded Paths Break Your Code
Imagine you’re developing a neuroimaging analysis on your laptop. Your code might look like this:
# Works on your laptop
model <- readRDS("/Users/alice/projects/brain_study/models/brain_model.rds")
results <- analyze_brain(model)
saveRDS(results, "/Users/alice/projects/brain_study/outputs/results.rds")
But when you or a collaborator runs this code on an HPC cluster, it breaks immediately:
- /Users/alice/ doesn’t exist on Linux clusters
- Different clusters have completely different filesystem layouts
- Writing large files to home directories often violates cluster policies
- Your collaborator Bob can’t run your code without rewriting all the paths
The Solution: Portable Path Aliases
parade solves this with smart path aliases that automatically adapt to any environment:
library(parade)
# Same code works everywhere
model <- readRDS(resolve_path("artifacts://models/brain_model.rds"))
results <- analyze_brain(model)
saveRDS(results, resolve_path("artifacts://outputs/results.rds"))
The artifacts:// prefix is a portable alias that parade automatically translates:
- On your laptop: → /var/folders/.../parade-artifacts/
- On SLURM cluster: → $PARADE_SCRATCH/parade-artifacts/ (preferred) or $SCRATCH/parade-artifacts/ (common). If neither is set, it falls back to $SLURM_TMPDIR (node-local scratch).
- On your collaborator’s machine: → whatever their appropriate storage location is
No more broken paths. No more rewriting code for different systems.
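The fallback order described above (PARADE_SCRATCH, then SCRATCH, then SLURM_TMPDIR) can be sketched in base R. This is only an illustration of the precedence idea, not parade's actual implementation: `pick_scratch_root()` is a hypothetical helper, and it takes the environment as a named character vector so the logic can be tried without changing real environment variables.

```r
# Hypothetical helper (not part of parade): choose a scratch root using the
# precedence PARADE_SCRATCH > SCRATCH > SLURM_TMPDIR.
pick_scratch_root <- function(env = Sys.getenv()) {
  candidates <- c("PARADE_SCRATCH", "SCRATCH", "SLURM_TMPDIR")
  for (name in candidates) {
    value <- unname(env[name])
    # Skip variables that are missing or set to an empty string
    if (!is.na(value) && nzchar(value)) {
      return(list(source = name, root = value))
    }
  }
  # None of the variables is set; what happens next is up to the caller
  list(source = "none", root = NA_character_)
}

# SCRATCH wins here because PARADE_SCRATCH is not set
picked <- pick_scratch_root(c(SCRATCH = "/scratch/alice", SLURM_TMPDIR = "/tmp/job42"))
picked$source
#> [1] "SCRATCH"
```

Returning the name of the variable that was used makes it easy to warn when the result came from SLURM_TMPDIR, which is node-local.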
HPC quickstart (recommended)
On clusters, the main “gotcha” is node-local
scratch: SLURM_TMPDIR is often local to the
compute node running a job. If you put registry:// or
artifacts:// there, other jobs (or your login node) may not
be able to see the files.
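One way to catch this early is to test whether a resolved directory sits inside the node-local directory. The sketch below uses only base R; `is_node_local()` is a hypothetical helper rather than a parade function, and it compares path prefixes without checking how the filesystem is actually mounted.

```r
# Hypothetical check (not a parade function): does `path` live inside the
# node-local directory `node_local`? Compares normalized path prefixes.
is_node_local <- function(path, node_local = Sys.getenv("SLURM_TMPDIR")) {
  if (!nzchar(node_local)) {
    return(FALSE)  # no node-local directory known, nothing to compare
  }
  norm <- function(p) {
    # Normalize and drop trailing slashes so prefixes compare cleanly
    sub("/+$", "", normalizePath(p, winslash = "/", mustWork = FALSE))
  }
  p <- norm(path)
  root <- norm(node_local)
  identical(p, root) || startsWith(p, paste0(root, "/"))
}

# Example with made-up paths
is_node_local("/nonexistent_root/job42/registry", "/nonexistent_root/job42")
#> [1] TRUE
```

A check like this could run right after path initialization and warn when the registry or artifacts directory would be invisible to other jobs.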
The recommended setup is:
- Point parade at shared scratch (site-specific).
- Initialize paths in HPC mode (especially on login nodes).
- Run the built-in doctor once to confirm everything is writable.
# In your shell (recommended for most clusters)
export PARADE_SCRATCH="/scratch/$USER" # or your site's shared scratch
library(parade)
# Auto-detects HPC on compute nodes (scheduler vars), and on many login nodes
# when common scratch variables (SCRATCH/WORK/etc.) are set:
paths_init(profile = "auto", create = TRUE)
parade_doctor(create = TRUE)
Or use the one-command helper (creates paths, scaffolds a SLURM template, and can persist to parade.json):
parade_init_hpc(persist = TRUE)
To make the configuration easy to reuse in SLURM scripts, you can also generate shell exports:
cat(paste(paths_export(), collapse = "\n"))
If you prefer not to create directories automatically, use:
paths_init(profile = "hpc")
paths_validate(create = FALSE)
Quick Start: Your First Portable Analysis
Let’s make a simple analysis portable in three steps:
Step 1: Initialize parade’s path system
library(parade)
# Auto-detect your environment and set up paths
paths_init()
# Tip (HPC): on a login node or if your scheduler env isn't set, use:
# paths_init(profile = "hpc"); parade_doctor(create = TRUE)
# See where your aliases point
paths_get()
#> project: /home/alice/myproject
#> data: /home/alice/myproject/data
#> artifacts: /tmp/RtmpXYZ/parade-artifacts
#> registry: /tmp/RtmpXYZ/parade-registry
Step 2: Use aliases instead of hardcoded paths
# Before (breaks on different systems):
data <- read.csv("/home/alice/myproject/data/experiment.csv")
saveRDS(model, "/home/alice/myproject/outputs/model.rds")
# After (works everywhere):
data <- read.csv(resolve_path("data://experiment.csv"))
saveRDS(model, resolve_path("artifacts://model.rds"))
Step 3: Run the same code anywhere
# On your laptop
paths_init()
saveRDS(big_model, resolve_path("artifacts://models/final.rds"))
# Saves to: /var/folders/temp/parade-artifacts/models/final.rds
# On HPC cluster (same code!)
paths_init(profile = "hpc")
saveRDS(big_model, resolve_path("artifacts://models/final.rds"))
# Saves to: $PARADE_SCRATCH/parade-artifacts/models/final.rds (or $SCRATCH/parade-artifacts/...)
That’s it! Your code now works on any system without modification.
Understanding the Seven Path Aliases
parade provides seven aliases, each designed for a specific type of data in your workflow:
Core Data Aliases
data:// - Input datasets (read-only)
# Your raw data, reference files, shared datasets
brain_atlas <- readRDS(resolve_path("data://references/MNI_atlas.rds"))
subjects <- read.csv(resolve_path("data://participants.csv"))
artifacts:// - Analysis outputs (large files)
# Models, results, processed data - goes to fast scratch storage
saveRDS(fitted_model, resolve_path("artifacts://models/model_v2.rds"))
write.csv(results, resolve_path("artifacts://results/final_results.csv"))
project:// - Your code and scripts
System Aliases
scratch:// - Temporary files (deleted after jobs)
# Intermediate files, working data
temp_file <- "scratch://temp_processing.rds"
registry:// - Job management files
# SLURM templates, job scripts (managed by parade)
template <- "registry://templates/my_slurm.tmpl"
config:// - parade configuration
# Settings, profiles (usually automatic)
"config://profiles/production.json"
cache:// - Downloaded/cached data
# Reusable downloads, package data
"cache://downloaded/large_dataset.tar.gz"
Common Scenarios
Scenario 1: Laptop Development → HPC Production
You’re developing on your laptop with small test data, then running full analysis on a cluster:
# During development (laptop)
paths_init()
paths_set(
data = "~/projects/test_data", # Small test dataset
artifacts = "~/projects/outputs" # Local outputs
)
# Your analysis code (unchanged!)
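run_analysis <- function() {
  # Base readRDS()/saveRDS() do not understand aliases, so resolve them first
  data <- readRDS(resolve_path("data://brain_scans.rds"))
  model <- fit_model(data)
  saveRDS(model, resolve_path("artifacts://fitted_model.rds"))
}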
# In production (HPC cluster)
paths_init()
paths_set(
data = "/shared/datasets/full_data", # Full dataset
artifacts = "/scratch/$USER/outputs" # Fast scratch storage
)
# Same analysis code still works!
run_analysis()
Scenario 2: Collaboration with Different Systems
Alice and Bob are collaborating but have different setups:
# Alice's setup (Mac laptop)
paths_init()
#> artifacts: /var/folders/abc/temp/parade-artifacts
# Bob's setup (Linux workstation)
paths_init()
#> artifacts: /tmp/bob/parade-artifacts
# Shared analysis code (works for both!)
analyze_subjects <- function(subjects) {
for (subj in subjects) {
data <- readRDS(resolve_path(sprintf("data://subjects/%s.rds", subj)))
results <- process_subject(data)
saveRDS(results, resolve_path(sprintf("artifacts://results/%s.rds", subj)))
}
}
Scenario 3: Multi-stage Pipeline with Different Storage Needs
Different stages of your pipeline need different storage strategies:
library(parade)
# Configure storage for each data type
paths_init()
paths_set(
data = "/shared/readonly/inputs", # Shared input data
scratch = Sys.getenv("SLURM_TMPDIR"), # Fast local SSD
artifacts = "/scratch/$USER/outputs" # Persistent scratch
)
# Pipeline uses appropriate storage for each stage
flow(subjects) |>
# Stage 1: Load from shared storage
stage("load", function(subject) {
readRDS(resolve_path(sprintf("data://raw/%s.rds", subject)))
}) |>
# Stage 2: Process using fast local storage
stage("process", function(data) {
temp_file <- resolve_path(sprintf("scratch://processing_%s.rds", data$id))
# ... heavy processing using temp_file ...
}) |>
# Stage 3: Save results to scratch
stage("save", function(results) {
saveRDS(results, resolve_path(sprintf("artifacts://final/%s.rds", results$id)))
})
Configuring for Your HPC System
Automatic Detection
parade automatically detects common HPC environments:
# Auto-detects SLURM
paths_init()
# Automatically uses $SLURM_TMPDIR for scratch
# Auto-detects PBS
paths_init()
# Automatically uses $PBS_O_WORKDIR
# Auto-detects SGE
paths_init()
# Automatically uses $TMPDIR
Manual Configuration
For custom HPC setups, explicitly set your paths:
# Configure once for your HPC system
paths_set(
scratch = "/fast/local/$USER", # Fast local SSD
artifacts = "/lustre/$USER/outputs", # Parallel filesystem
registry = "/lustre/$USER/jobs", # Shared job storage
data = "/projects/shared/datasets" # Readonly shared data
)
# Save configuration for future sessions
paths_set(..., persist = TRUE)
Advanced Patterns
Pattern 1: Dynamic Environment Switching
# Detect and configure based on environment
setup_paths <- function() {
if (interactive()) {
# Development settings
paths_set(artifacts = "~/temp/dev_outputs")
message("Using development paths")
} else if (Sys.getenv("SLURM_JOB_ID") != "") {
# Production SLURM settings
paths_set(
scratch = Sys.getenv("SLURM_TMPDIR"),
artifacts = sprintf("/scratch/%s/prod_outputs", Sys.getenv("USER"))
)
message("Using SLURM production paths")
} else {
# Default settings
paths_init()
message("Using default paths")
}
}
Pattern 2: Project-Specific Organization
# Organize outputs by analysis phase
paths_set(
artifacts = "/scratch/$USER/project_X"
)
# Create structured output directories
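save_results <- function(phase, name, object) {
  # Resolve the alias before handing the path to base R
  path <- resolve_path(sprintf("artifacts://%s/%s.rds", phase, name))
  # saveRDS() does not create missing directories, so create them first
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(object, path)
}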
# Usage
save_results("preprocessing", "cleaned_data", cleaned)
save_results("modeling", "final_model", model)
save_results("validation", "cv_results", cv)
# Results in:
# /scratch/$USER/project_X/preprocessing/cleaned_data.rds
# /scratch/$USER/project_X/modeling/final_model.rds
# /scratch/$USER/project_X/validation/cv_results.rds
Pattern 3: Integration with Sinks
# Sinks automatically use path aliases
sink_spec(
fields = c("model", "predictions"),
dir = "artifacts://models", # Portable path
template = "{subject}/{session}_{task}.rds"
)
# Different storage for different output types
model_sink <- sink_spec(
fields = "model",
dir = "artifacts://large_models", # Goes to scratch
format = "rds"
)
config_sink <- sink_spec(
fields = "params",
dir = "project://configs", # Stays with code
format = "json"
)
Path Resolution Functions
resolve_path() - Convert aliases to absolute paths
# Resolve any path with an alias
resolve_path("artifacts://model.rds")
#> "/scratch/alice/parade-artifacts/model.rds"
resolve_path("data://raw/scan.nii")
#> "/shared/datasets/raw/scan.nii"
# Works with regular paths too
resolve_path("/absolute/path.txt") # Already absolute
resolve_path("relative/path.txt") # Made absolute
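To make the idea of alias resolution concrete, here is a toy resolver in base R. It is a sketch for illustration only: `toy_resolve()` and the `roots` vector are hypothetical, and parade's own resolve_path() may behave differently (for example in how it treats relative paths or unknown prefixes).

```r
# Toy resolver for illustration only; not parade's resolve_path().
# `roots` maps alias names to root directories.
toy_resolve <- function(path, roots) {
  # Split "alias://rest" into its two parts; no match means no alias prefix
  m <- regmatches(path, regexec("^([A-Za-z][A-Za-z0-9]*)://(.*)$", path))[[1]]
  if (length(m) == 0) {
    return(path)  # plain path: returned unchanged in this sketch
  }
  alias <- m[2]
  rest <- m[3]
  if (!alias %in% names(roots)) {
    stop("Unknown path alias: ", alias, call. = FALSE)
  }
  if (nzchar(rest)) file.path(roots[[alias]], rest) else roots[[alias]]
}

# Made-up roots mirroring the example output above
roots <- c(
  data = "/shared/datasets",
  artifacts = "/scratch/alice/parade-artifacts"
)
toy_resolve("data://raw/scan.nii", roots)
#> [1] "/shared/datasets/raw/scan.nii"
```

Because the mapping lives in one lookup table, switching environments only means swapping `roots`; the calling code never changes.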
path_here() - Build paths from components
# Construct paths programmatically
model_dir <- path_here("artifacts", "models", "v2")
#> "/scratch/alice/parade-artifacts/models/v2"
# Automatically creates directories
output_dir <- path_here("artifacts", "results", create = TRUE)
# Skip auto-creation if needed
temp_path <- path_here("scratch", "temp", create = FALSE)
Troubleshooting
Issue: “Cannot find path alias”
# Check your current configuration
paths_get()
# Re-initialize if needed
paths_init()
Issue: Different paths on different nodes
# Use node-local storage for better performance
paths_set(
scratch = ifelse(
Sys.getenv("SLURM_TMPDIR") != "",
Sys.getenv("SLURM_TMPDIR"), # Node-local on SLURM
"/tmp" # Fallback
)
)
Best Practices
- Initialize paths at the start of every script
- Use appropriate aliases for different data types
  - data:// for inputs (read-only)
  - artifacts:// for outputs (large files)
  - scratch:// for temporary files
- Never hardcode absolute paths
  # Bad
  saveRDS(model, "/home/alice/outputs/model.rds")
  # Good
  saveRDS(model, resolve_path("artifacts://model.rds"))
- Document your path configuration
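One lightweight way to document a path configuration is to write it to a small text file next to your results. The sketch below is base R only: `write_path_log()` is a hypothetical helper, and the example values stand in for whatever paths_get() returns (assuming it yields a named list or character vector, which this vignette does not confirm).

```r
# Hypothetical helper (not part of parade): record a named set of paths
# in a plain-text file, one "alias: path" line per entry.
write_path_log <- function(paths, file) {
  lines <- c(
    sprintf("# Path configuration recorded %s",
            format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
    sprintf("%s: %s", names(paths), unlist(paths, use.names = FALSE))
  )
  writeLines(lines, file)
  invisible(lines)
}

# Example values; in a real script these might come from paths_get()
example_paths <- list(
  data = "/shared/datasets",
  artifacts = "/scratch/alice/parade-artifacts"
)
log_file <- tempfile(fileext = ".txt")
write_path_log(example_paths, log_file)
```

Keeping such a file with each batch of outputs records which storage locations were in effect when the results were produced.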
Next Steps
Now that you understand portable paths, learn about:
- Using Artifacts and Sinks for automatic data management
- SLURM Integration for cluster computing
- Core Workflow Concepts for building complete pipelines
Quick Reference
| Function | Purpose | Example |
|---|---|---|
| paths_init() | Auto-configure paths | paths_init() |
| paths_get() | Show current paths | paths_get() |
| paths_set() | Set custom paths | paths_set(artifacts = "/scratch") |
| resolve_path() | Convert alias to absolute | resolve_path("data://file.csv") |
| path_here() | Build path from parts | path_here("artifacts", "models") |

| Alias | Use For | Example |
|---|---|---|
| data:// | Input data | read.csv(resolve_path("data://input.csv")) |
| artifacts:// | Outputs | saveRDS(m, resolve_path("artifacts://model.rds")) |
| scratch:// | Temp files | "scratch://temp.rds" |
| project:// | Code/config | source(resolve_path("project://R/utils.R")) |