Creates or extends an HDF5 dataset so that collect(x, into = writer) can
stream column blocks directly to disk without materialising the full matrix
in memory.
Usage
hdf5_writer(path, dataset, ncol, chunk = c(128L, 4096L), compression = 4L)
Arguments
- path
Path to the HDF5 file. The file is created if it does not exist.
- dataset
Name of the dataset to create or update.
- ncol
Total number of columns that will be written. The writer uses this to size the target dataset up-front.
- chunk
Optional integer vector of length two giving the chunk size (rows, cols) for the target dataset.
- compression
Gzip compression level (0-9). Use 0 for no compression; higher values give better compression at the cost of speed. Defaults to 4. Use NULL to disable compression entirely.
Value
A writer object with $write() and $finalize() methods understood
by collect().
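To illustrate the contract between collect() and a writer, here is a minimal sketch of a mock writer satisfying the same $write()/$finalize() interface. This is an assumption-laden illustration, not the package's implementation: the exact arguments collect() passes to $write() (here, a column block plus the column indices it covers) are hypothetical, and this mock accumulates blocks in memory rather than streaming to HDF5.

```r
# Hypothetical mock of the writer protocol; the write(block, cols)
# signature is an assumed convention, not taken from the package.
mock_writer <- function(ncol) {
  blocks <- list()
  list(
    # Append one column block; `cols` marks which columns it covers.
    write = function(block, cols) {
      blocks[[length(blocks) + 1L]] <<- block
    },
    # Assemble the full matrix from the accumulated blocks.
    finalize = function() do.call(cbind, blocks)
  )
}

w <- mock_writer(4L)
w$write(matrix(1:4, nrow = 2), cols = 1:2)
w$write(matrix(5:8, nrow = 2), cols = 3:4)
w$finalize()
```

A real hdf5_writer() would instead write each block to its column range in the on-disk dataset, which is what allows collect() to avoid materialising the full matrix.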
Examples
# Create source data in a temp HDF5 file
tf_in <- tempfile(fileext = ".h5")
data <- matrix(1:20, nrow = 4, ncol = 5)
f <- hdf5r::H5File$new(tf_in, mode = "w")
f$create_dataset("X", robj = data)
f$close_all()
# Load, transform, and stream to output file
darr <- delarr_hdf5(tf_in, "X")
transformed <- darr |> d_center(dim = "cols")
tf_out <- tempfile(fileext = ".h5")
writer <- hdf5_writer(tf_out, "result", ncol = ncol(transformed), compression = 4L)
collect(transformed, into = writer)
# Verify output
g <- hdf5r::H5File$new(tf_out, mode = "r")
result <- g[["result"]]$read()
g$close_all()
result
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.5 -1.5 -1.5 -1.5 -1.5
#> [2,] -0.5 -0.5 -0.5 -0.5 -0.5
#> [3,] 0.5 0.5 0.5 0.5 0.5
#> [4,] 1.5 1.5 1.5 1.5 1.5
# Clean up
unlink(c(tf_in, tf_out))