Constructing Datasets for MVPA Analysis
Bradley Buchsbaum
2025-09-28
Constructing_Datasets.Rmd
This vignette shows how to construct real neuroimaging datasets for
MVPA. The examples cover both volumetric (image‑based) and surface‑based
inputs using helpers in dataset.R
. The code snippets are
set to eval = FALSE
because they reference file paths you
should replace with your own data.
Creating a Real Volumetric (Image-Based) Dataset
This example assumes you have a 4D fMRI file (“bold.nii.gz”) and a corresponding 3D brain mask file (“mask.nii.gz”).
library(neuroim2)
# Read the fMRI data as a NeuroVec object using neuroim2::read_vec
train_neurovec <- neuroim2::read_vec("path/to/bold.nii.gz", mode = "normal")
# Read the brain mask and create a NeuroVol object
mask_vec <- neuroim2::read_vec("path/to/mask.nii.gz", mode = "normal")
mask_vol <- NeuroVol(as.array(mask_vec), NeuroSpace(dim(mask_vec), spacing = c(2, 2, 2)))
# Create the MVPA image dataset
real_dataset <- mvpa_dataset(train_data = train_neurovec, mask = mask_vol)
# Display dataset details
print(real_dataset)
Creating a Real Surface-Based Dataset
This example assumes you have cortical geometry stored in a file (e.g., “subject.lh.smoothwm.asc”) and a signal matrix in CSV format (“surface_data.csv”). The signal matrix should have dimensions corresponding to the number of vertices and the number of observations.
## remotes::install_github("bbuchsbaum/neurosurf")
library(neurosurf)
# Load the cortical geometry
geom <- read_surf_geometry("path/to/subject.lh.smoothwm.asc")
# Read the surface data from a CSV file
# The CSV should not have a header and have dimensions: number of vertices x number of observations
data_matrix <- as.matrix(read.csv("path/to/surface_data.csv", header = FALSE))
# Verify that the number of rows in the data matches the geometry
nvert <- nrow(neurosurf::vertices(geom))
if(nrow(data_matrix) != nvert) {
stop("The number of vertices in the data does not match the geometry.")
}
# Create a NeuroSurfaceVector using the geometry and the data matrix
real_neurosurf <- NeuroSurfaceVector(geom, 1:nvert, data_matrix)
# Create the MVPA surface dataset; if no mask is provided, one is generated automatically
real_surface_dataset <- mvpa_surface_dataset(train_data = real_neurosurf, name = "lh")
# Display dataset details
print(real_surface_dataset)