delarr provides a lightweight delayed matrix type for R with a tidy-friendly API. It keeps the surface area small—one S3 class plus a handful of verbs—while offering fused elementwise transforms, reductions, and streaming materialisation in column chunks. Column blocks can also be streamed straight to disk via the bundled HDF5 writer.
## Installation
The package is under active development. Clone the repository and use `pkgload::load_all()` or `devtools::install()` to experiment with the API.
```r
# install.packages("pkgload")
pkgload::load_all(".")
```

## Getting started
```r
library(delarr)

mat <- matrix(rnorm(20), 5, 4)
arr <- delarr(mat)

# Lazy pipeline
out <- arr |>
  d_center(dim = "rows", na.rm = TRUE) |>
  d_map(~ .x * 0.5) |>
  d_reduce(mean, dim = "rows")

collect(out)
```

## Streaming straight to disk
```r
# assume `X` lives inside an HDF5 file
lzy <- delarr_hdf5("input.h5", "X")

# Apply a transformation lazily and stream the result into a new dataset
# (dim(lzy)[2] supplies the total column count for the writer)
lzy |>
  d_zscore(dim = "cols") |>
  collect(into = hdf5_writer(
    path = "output.h5",
    dataset = "X_zscore",
    ncol = dim(lzy)[2],
    chunk = c(128L, 4096L)
  ))
```

## Backends
- `delarr_mem()` wraps any in-memory matrix.
- `delarr_hdf5()` exposes a dataset through `hdf5r`.
- `delarr_backend()` lets you create a seed from any `(rows, cols) -> matrix` pull function.
- `hdf5_writer()` pairs with `collect(into = ...)` to stream results back to disk without materialising the full matrix in memory (supply `ncol` to size the destination dataset up front).
Additional backends (e.g., mmap) can be layered on by supplying a compatible pull function; no extra dependencies ship in the core package.
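The pull-function contract is small: given integer row and column indices, return the corresponding submatrix. Below is a minimal base-R sketch of such a function; the `delarr_backend()` call is left commented out because its exact argument names (e.g. `dim`) are assumptions here, not the package's documented signature.

```r
# A backing store -- an in-memory matrix standing in for mmap'd or remote data.
store <- matrix(seq_len(12), nrow = 3, ncol = 4)

# The pull function: subset the store, always preserving matrix shape.
pull <- function(rows, cols) store[rows, cols, drop = FALSE]

# Hypothetical wiring (argument names are assumptions):
# arr <- delarr_backend(pull, dim = c(3L, 4L))

pull(1:2, c(1L, 4L))  # a 2 x 2 submatrix
```

The `drop = FALSE` matters: without it a single-row or single-column request would degrade to a vector, breaking any consumer that expects a matrix back.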
## Pipelined verbs
- `d_map()` / `d_map2()` for elementwise transformations.
- `d_center()` / `d_scale()` / `d_zscore()` / `d_detrend()` for common preprocessing, each with optional `na.rm` handling.
- `d_reduce()` for row-wise or column-wise reductions, with streaming `na.rm` support for sum/mean/min/max.
- `d_where()` for masked updates, optionally replacing masked entries via the `fill` argument.
- `collect()` to realise the data (streamed in chunks), optionally writing to disk with `hdf5_writer()`, and `block_apply()` for chunk-wise computation.
All verbs return another delarr, so pipelines stay lazy until `collect()` materialises the result.
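For intuition, the preprocessing verbs mirror familiar base-R operations. The sketch below shows what `d_center(dim = "rows")` and `d_zscore(dim = "cols")` are expected to compute on a plain matrix, written in base R only (this illustrates the semantics, not delarr's internals):

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)

# d_center(dim = "rows"): subtract each row's mean from that row
centered <- m - rowMeans(m)

# d_zscore(dim = "cols"): subtract column means, then divide by column sds
z <- sweep(m, 2, colMeans(m), `-`)
z <- sweep(z, 2, apply(m, 2, sd), `/`)
```

The delayed versions perform the same arithmetic, but fused and one column chunk at a time, so the full centred or z-scored matrix never needs to exist in memory at once.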