Constructing Datasets for MVPA Analysis
Bradley Buchsbaum
2025-09-28
This vignette shows how to construct MVPA datasets from real neuroimaging
data. The examples cover both volumetric (image-based) and surface-based
inputs, using the helpers in dataset.R. The code snippets are
set to eval = FALSE because they reference file paths that you
should replace with paths to your own data.
Creating a Real Volumetric (Image-Based) Dataset
This example assumes you have a 4D fMRI file (“bold.nii.gz”) and a corresponding 3D brain mask file (“mask.nii.gz”).
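The shape assumption above (a 4D series and a 3D mask on the same voxel grid) can be checked before a dataset is built. The following is a minimal sketch in base R, not part of neuroim2 or rMVPA: `check_mask_dims` is a hypothetical helper that only compares dimension vectors, and the arrays are stand-ins so that the sketch runs without any image files.

```r
# Hypothetical helper (not part of neuroim2 or rMVPA): checks that a 4D
# series and a 3D mask share the same spatial grid, given their dim() vectors.
check_mask_dims <- function(bold_dim, mask_dim) {
  if (length(bold_dim) != 4L) {
    stop("Expected a 4D series, got ", length(bold_dim), " dimension(s).")
  }
  if (length(mask_dim) != 3L) {
    stop("Expected a 3D mask, got ", length(mask_dim), " dimension(s).")
  }
  if (!all(bold_dim[1:3] == mask_dim)) {
    stop("Spatial dimensions differ: series is ",
         paste(bold_dim[1:3], collapse = " x "),
         ", mask is ", paste(mask_dim, collapse = " x "), ".")
  }
  invisible(TRUE)
}

# Stand-in arrays so the sketch runs without any image files
bold_arr <- array(0, dim = c(4, 4, 4, 10))
mask_arr <- array(1, dim = c(4, 4, 4))
check_mask_dims(dim(bold_arr), dim(mask_arr))
```

With real data, the same call would take the dim() of the loaded series and of the mask volume. Matching dimensions do not guarantee matching voxel spacing or orientation, so this check is a first filter only.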
library(neuroim2)
library(rMVPA)  # provides mvpa_dataset()
# Read the fMRI data as a NeuroVec object using neuroim2::read_vec
train_neurovec <- neuroim2::read_vec("path/to/bold.nii.gz", mode = "normal")
# Read the brain mask as a NeuroVol; read_vol keeps the voxel spacing and
# orientation stored in the file header instead of hard-coding them
mask_vol <- neuroim2::read_vol("path/to/mask.nii.gz")
# Create the MVPA image dataset
real_dataset <- mvpa_dataset(train_data = train_neurovec, mask = mask_vol)
# Display dataset details
print(real_dataset)
Creating a Real Surface-Based Dataset
This example assumes you have cortical geometry stored in a file (e.g., “subject.lh.smoothwm.asc”) and a signal matrix in CSV format (“surface_data.csv”). The signal matrix should have one row per vertex and one column per observation.
## remotes::install_github("bbuchsbaum/neurosurf")
library(neurosurf)
# Load the cortical geometry
geom <- read_surf_geometry("path/to/subject.lh.smoothwm.asc")
# Read the surface data from a CSV file
# The CSV should have no header row and be laid out as vertices (rows) x observations (columns)
data_matrix <- as.matrix(read.csv("path/to/surface_data.csv", header = FALSE))
# Verify that the number of rows in the data matches the geometry
nvert <- nrow(neurosurf::vertices(geom))
if (nrow(data_matrix) != nvert) {
  stop("The number of vertices in the data does not match the geometry.")
}
# Create a NeuroSurfaceVector using the geometry and the data matrix
real_neurosurf <- NeuroSurfaceVector(geom, seq_len(nvert), data_matrix)
# Create the MVPA surface dataset; if no mask is provided, one is generated automatically
real_surface_dataset <- mvpa_surface_dataset(train_data = real_neurosurf, name = "lh")
# Display dataset details
print(real_surface_dataset)
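
The vertex-count check in the snippet above stops when the rows of the matrix and the vertices of the geometry disagree. One possible cause is a CSV stored as observations x vertices rather than vertices x observations. The following is a minimal sketch in base R of one way to handle that case: `orient_surface_matrix` is a hypothetical helper, not part of neurosurf or rMVPA, and the toy matrix stands in for real surface data.

```r
# Hypothetical helper (not part of neurosurf or rMVPA): returns a
# vertices x observations matrix, transposing only when the column count
# (and not the row count) matches the number of vertices.
orient_surface_matrix <- function(x, nvert) {
  x <- as.matrix(x)
  if (nrow(x) == nvert) {
    return(x)      # already vertices x observations
  }
  if (ncol(x) == nvert) {
    return(t(x))   # stored as observations x vertices
  }
  stop("Neither dimension of the data (", nrow(x), " x ", ncol(x),
       ") matches the number of vertices (", nvert, ").")
}

# Toy example: 3 observations stored in rows, 5 vertices in columns
toy <- matrix(seq_len(15), nrow = 3, ncol = 5)
oriented <- orient_surface_matrix(toy, nvert = 5)
dim(oriented)  # 5 3
```

If the matrix is square, with the number of observations equal to the number of vertices, the helper cannot tell the two layouts apart and returns the input unchanged, so the orientation should be confirmed by other means in that case.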