Deterministic, zero-copy parallel execution for R.
shard is a parallel runtime for workloads that look like: - “run the same numeric kernel over many slices of big data” - “thousands of independent tasks over a shared dataset” - “parallel GLM / simulation / bootstrap / feature screening”
It focuses on three things that are often painful in R parallelism: 1) Shared immutable inputs (avoid duplicating large objects across workers) 2) Explicit output buffers (avoid huge result-gather lists) 3) Deterministic cleanup (supervise workers and recycle on memory drift)
Installation
From CRAN (once released):
install.packages("shard")Development version:
# install.packages("pak")
pak::pak("bbuchsbaum/shard")Core concepts
1) Share large inputs (read-only by default)
Shared objects are designed for zero-copy parallel reads (where the OS allows) and are treated as immutable by default inside parallel tasks.
2) Allocate explicit output buffers
Instead of returning giant objects from each worker, write to a preallocated buffer:
3) Run shard_map() over shards
blocks <- shard::shards(1e6, block_size = "auto")
run <- shard::shard_map(
blocks,
borrow = list(X = X, Y = Y),
out = list(out = out),
workers = 8,
fun = function(block, X, Y, out) {
# block contains indices
idx <- block$idx
out[idx] <- colMeans(Y[, idx, drop = FALSE])
}
)
shard::report(run)Safety defaults (and why they matter)
Borrowed inputs are immutable
By default, trying to mutate borrowed/shared inputs is treated as a bug: - cow = "deny" (default): mutation triggers an error - cow = "audit": detect and flag (best-effort; platform dependent) - cow = "allow": allow copy-on-write, track it, and enforce budgets
Why default is deny: - Prevents silent memory blowups from accidental wide writes - Prevents subtle correctness bugs (changes are private to a worker) - Keeps behavior predictable across platforms
Deterministic cleanup via worker supervision
R’s GC and allocator behavior can lead to memory drift in long-running workers.
shard monitors per-worker memory usage and can recycle workers when drift exceeds thresholds, keeping end-of-run memory close to baseline.
Diagnostics
After a run, shard can report: - total and per-worker peak RSS - end RSS vs baseline - materialized bytes (hidden copies) - recycling events, retries, timing
rep <- shard::report(run)
print(rep)
shard::mem_report(run)
shard::copy_report(run)Convenience wrappers
If your workload is “apply a function over columns” or “lapply over a list”, shard provides convenience wrappers that handle sharing and buffering automatically while still running through the supervised runtime.
List lapply (guarded gather)
xs <- lapply(1:1000, function(i) rnorm(100))
out <- shard::shard_lapply_shared(
xs,
FUN = function(el) mean(el),
workers = 8
)For large outputs (big vectors/data.frames per element), prefer buffer(), table_sink(), or shard_reduce() instead of gathering everything to the master.