Skip to contents

title: “02. Data Structures and Sampling Frames” author: “Bradley R. Buchsbaum” date: “2025-09-01” output: rmarkdown::html_vignette vignette: > % % % —

Introduction: Linking Data and Design

Effective fMRI analysis requires associating the measured brain activity (the imaging data) with crucial metadata, including:

  • Temporal Structure: When each scan was acquired (TR) and how scans are grouped into runs.
  • Spatial Structure: Which brain locations (voxels) are included in the analysis (mask).
  • Experimental Design: Timing and properties of experimental events or conditions.

The fmrireg package uses several dataset objects to encapsulate this information, providing a consistent input format for modeling functions like event_model, baseline_model, fmri_lm, and estimate_betas.

This vignette describes the main dataset classes and how to create them.

The sampling_frame

Before diving into datasets, recall the sampling_frame object (introduced in the Overview and detailed in other vignettes). It defines the fundamental temporal structure shared by all dataset types:

  • blocklens: A vector specifying the number of scans (time points) in each run.
  • TR: The repetition time (time between scans) in seconds.
sframe_example <- sampling_frame(blocklens = c(150, 160), TR = 2.0)
print(sframe_example)
#> Sampling Frame
#> ==============
#> 
#> Structure:
#>   2 blocks
#>   Total scans: 310
#> 
#> Timing:
#>   TR: 2 s
#>   Precision: 0.1 s
#> 
#> Duration:
#>   Total time: 620.0 s

Dataset objects internally create or utilize a sampling_frame based on the provided run lengths and TR.

Overview of Dataset Classes

fmrireg offers different dataset classes depending on how your data is stored:

  • fmri_mem_dataset: For volumetric fMRI data already loaded into R memory (as NeuroVec objects).
  • fmri_file_dataset: For volumetric fMRI data stored in image files (e.g., NIfTI) on disk.
  • matrix_dataset: For fMRI data represented as a standard R matrix (time points x voxels/components).
  • latent_dataset: For dimension-reduced data (e.g., PCA/ICA components), typically requiring the fmristore package.

All these inherit from a base fmri_dataset class.

In-Memory Volumetric Data (fmri_mem_dataset)

Use this when your fMRI runs are loaded as neuroim2::NeuroVec objects in your R session.

Key Arguments:

  • scans: A list of NeuroVec objects, one for each run.
  • mask: A neuroim2::NeuroVol or neuroim2::LogicalNeuroVol object representing the brain mask.
  • TR: Repetition time (seconds).
  • run_length (Optional): Vector of run lengths; if omitted, inferred from the dimensions of the NeuroVec objects in scans.
  • event_table (Optional): A data.frame containing experimental design information.
# Create minimal example data (2 runs)
d <- c(5, 5, 5, 20) # Small dimensions for example
mask_vol <- neuroim2::LogicalNeuroVol(array(TRUE, d[1:3]), neuroim2::NeuroSpace(d[1:3]))

scan1 <- neuroim2::NeuroVec(array(rnorm(prod(d)), d), neuroim2::NeuroSpace(d))
scan2 <- neuroim2::NeuroVec(array(rnorm(prod(d)), d), neuroim2::NeuroSpace(d))

# Example event table
events_df <- data.frame(
  onset = c(5, 15, 5, 15), 
  condition = factor(c("A", "B", "A", "B")),
  run = c(1, 1, 2, 2)
)

# Create the dataset object
mem_dset <- fmri_mem_dataset(scans = list(scan1, scan2), 
                             mask = mask_vol, 
                             TR = 2.0, 
                             # run_length automatically inferred as c(20, 20)
                             event_table = events_df)

print(mem_dset)
#> 
#> === fMRI Dataset ===
#> 
#> ** Dimensions:
#>   - Timepoints: 40 
#>   - Runs: 2  
#>   - Objects: 2 pre-loaded NeuroVec object(s)
#>   - Voxels in mask: (lazy)
#> 
#> ** Temporal Structure:
#>   - TR: 2 seconds
#>   - Run lengths: 20, 20 
#> 
#> ** Event Table:
#>   - Rows: 4 
#>   - Variables: onset, condition, run 
#>   - First few events:
#>   onset condition run
#> 1     5         A   1
#> 2    15         B   1
#> 3     5         A   2
# Access components
print(mem_dset$sampling_frame)
#> Sampling Frame
#> ==============
#> 
#> Structure:
#>   2 blocks
#>   Total scans: 40
#> 
#> Timing:
#>   TR: 2 s
#>   Precision: 0.1 s
#> 
#> Duration:
#>   Total time: 80.0 s

File-Based Volumetric Data (fmri_file_dataset)

This is often the most practical option for typical fMRI analyses where data resides in files.

Key Arguments:

  • scans: A character vector of file paths to the 4D fMRI image files (e.g., .nii.gz), one path per run.
  • mask: A character string giving the file path to the 3D mask image file.
  • TR: Repetition time (seconds).
  • run_length: A numeric vector specifying the number of volumes (time points) in each run file listed in scans.
  • event_table (Optional): A data.frame with experimental design info.
  • base_path (Optional): A path to prepend to relative file paths in scans and mask.
  • preload (Optional, Default: FALSE): If TRUE, load the mask and scan data into memory immediately. If FALSE (recommended for large data), data is read only when accessed.
  • mode (Optional): Storage mode for neuroim2 when reading data (e.g., “normal”, “mmap”).
# --- Create Dummy Files (for illustration only) ---
# In a real analysis, these files would already exist.
tmp_dir <- tempdir()
mask_filename <- "mask.nii.gz"
scan1_filename <- "run1.nii.gz"
scan2_filename <- "run2.nii.gz"

mask_file_full_path <- file.path(tmp_dir, mask_filename)
scan1_file_full_path <- file.path(tmp_dir, scan1_filename)
scan2_file_full_path <- file.path(tmp_dir, scan2_filename)

# Create small dummy mask and scans using neuroim2 functionality
d <- c(5, 5, 5) # Mask dimensions
d_run1 <- c(d, 20) # Run 1 dimensions (time=20)
d_run2 <- c(d, 25) # Run 2 dimensions (time=25)

mask_vol_dummy <- neuroim2::NeuroVol(array(1, d), neuroim2::NeuroSpace(d))
scan1_dummy <- neuroim2::NeuroVec(array(rnorm(prod(d_run1)), d_run1), neuroim2::NeuroSpace(d_run1))
scan2_dummy <- neuroim2::NeuroVec(array(rnorm(prod(d_run2)), d_run2), neuroim2::NeuroSpace(d_run2))

# Ensure dummy files are written using their full paths
neuroim2::write_vol(mask_vol_dummy, mask_file_full_path)
neuroim2::write_vec(scan1_dummy, scan1_file_full_path)
neuroim2::write_vec(scan2_dummy, scan2_file_full_path)
# --- End Dummy File Creation ---

# Create the file-based dataset object
# Pass only filenames to 'scans' and 'mask', and specify the directory in 'base_path'
file_dset <- fmri_dataset(scans = c(scan1_filename, scan2_filename), 
                            mask = mask_filename, 
                            TR = 1.5, 
                            run_length = c(20, 25), # Must match time dim of files
                            event_table = events_df, 
                            base_path = tmp_dir,    # Set base_path to the temp directory
                            preload = FALSE) # Keep data on disk

# This print statement should now work
print(file_dset)
#> 
#> === fMRI Dataset ===
#> 
#> ** Dimensions:
#>   - Timepoints: 45 
#>   - Runs: 2  
#>   - Backend: nifti_backend 
#>   - Data dimensions: 45 x ? (timepoints x voxels)
#>   - Voxels in mask: (lazy)
#> 
#> ** Temporal Structure:
#>   - TR: 1.5 seconds
#>   - Run lengths: 20, 25 
#> 
#> ** Event Table:
#>   - Rows: 4 
#>   - Variables: onset, condition, run 
#>   - First few events:
#> # A tibble: 3 × 3
#>   onset condition   run
#>   <dbl> <fct>     <dbl>
#> 1     5 A             1
#> 2    15 B             1
#> 3     5 A             2

# Clean up dummy files (optional, commented out for vignette)
# file.remove(mask_file_full_path, scan1_file_full_path, scan2_file_full_path)

Using preload=FALSE is memory-efficient as only the required data segments are read when needed (e.g., during model fitting).

Matrix Data (matrix_dataset)

Use this if your fMRI data is already represented as a 2D matrix where rows are time points and columns are voxels or components (e.g., after surface projection or ROI averaging).

Key Arguments:

  • datamat: The numeric matrix (time x features).
  • TR: Repetition time (seconds).
  • run_length: Vector specifying the number of rows (time points) belonging to each run.
  • event_table (Optional): A data.frame with design info (must have same total number of rows as datamat).
# Example matrix (100 time points, 50 features/voxels)
# Two runs of 50 time points each
time_points <- 100
features <- 50
run_len <- c(50, 50)
example_matrix <- matrix(rnorm(time_points * features), time_points, features)

# Example event table for matrix data
events_mat_df <- data.frame(
  onset = c(seq(5, 45, by=10), seq(5, 45, by=10)), 
  condition = factor(rep(c("C", "D"), 10)),
  run = rep(1:2, each = 5)
)

mat_dset <- matrix_dataset(datamat = example_matrix, 
                           TR = 2.5, 
                           run_length = run_len,
                           event_table = events_mat_df)

print(mat_dset)
#> 
#> === fMRI Dataset ===
#> 
#> ** Dimensions:
#>   - Timepoints: 100 
#>   - Runs: 2  
#>   - Matrix: 100 x 50 (timepoints x voxels)
#>   - Voxels in mask: (lazy)
#> 
#> ** Temporal Structure:
#>   - TR: 2.5 seconds
#>   - Run lengths: 50, 50 
#> 
#> ** Event Table:
#>   - Rows: 20 
#>   - Variables: onset, condition, run 
#>   - First few events:
#>   onset condition run
#> 1     5         C   1
#> 2    15         D   1
#> 3    25         C   1
print(mat_dset$sampling_frame)
#> Sampling Frame
#> ==============
#> 
#> Structure:
#>   2 blocks
#>   Total scans: 100
#> 
#> Timing:
#>   TR: 2.5 s
#>   Precision: 0.1 s
#> 
#> Duration:
#>   Total time: 250.0 s

For matrix_dataset, the concept of a spatial mask is implicit; all columns provided in datamat are included.

Latent Data (latent_dataset)

This class is designed for data that has undergone dimensionality reduction (e.g., PCA, ICA). It wraps a LatentNeuroVec object, which stores the basis vectors (latent components over time) and loadings (spatial maps of components). Creating and using LatentNeuroVec objects typically requires the fmristore package.

Key Arguments:

  • lvec: A LatentNeuroVec object from the fmristore package.
  • TR: Repetition time (seconds).
  • run_length: Vector specifying run lengths (must sum to the time dimension of lvec).
  • event_table (Optional): Experimental design data.frame.
# Conceptual example (requires fmristore package and a LatentNeuroVec)
# Assuming 'my_latent_neuro_vec' is a LatentNeuroVec object representing
# 20 components over 300 time points (2 runs of 150)

# latent_dset <- latent_dataset(lvec = my_latent_neuro_vec, 
#                              TR = 2.0, 
#                              run_length = c(150, 150),
#                              event_table = some_event_df)
# 
# print(latent_dset)

This dataset type essentially behaves like a matrix_dataset where the matrix columns are the latent component time series.

Using Dataset Objects

Next

  • 03 Simulating fMRI Data

  • 04 fMRI Linear Model (GLM) Once created, these dataset objects serve as the primary data input for fmrireg’s modeling functions:

  • event_model(..., sampling_frame = dset$sampling_frame)

  • baseline_model(..., sframe = dset$sampling_frame)

  • fmri_lm(model, dataset = dset)

  • estimate_betas(..., dataset = dset)

They provide a standardized way to access data (get_data(dset)), masks (get_mask(dset)), and timing information (blocklens(dset), blockids(dset)), regardless of the underlying storage format.

Choosing the appropriate dataset class depends on where your data resides (memory, files) and its format (volumetric, matrix, latent).