Constructing Datasets for MVPA Analysis
Bradley Buchsbaum
2026-04-14
Source: vignettes/Constructing_Datasets.Rmd
This vignette shows how to construct neuroimaging datasets for MVPA. We start with a runnable synthetic example, then show the pattern for real volumetric and surface data.
Quick Start with Synthetic Data
The easiest way to get started is with gen_sample_dataset(), which creates a complete dataset and design in one call:
library(rMVPA)
library(neuroim2)
# Create a synthetic 6x6x6 dataset with 80 observations in 4 runs, 2 conditions
ds <- gen_sample_dataset(D = c(6, 6, 6), nobs = 80, blocks = 4, nlevels = 2)
# The result contains a dataset and a design
print(ds$dataset)
#>
#> MVPA Dataset
#>
#> - Training Data
#> - Dimensions: 6 x 6 x 6 x 80 observations
#> - Type: DenseNeuroVec
#> - Test Data
#> - None
#> - Mask Information
#> - Areas: TRUE : 216
#> - Active voxels/vertices: 216
print(ds$design)
#>
#> MVPA Design
#>
#> - Training Data
#> - Observations: 80
#> - Response Type: Factor
#> - Levels: a, b
#> - Class Distribution: a: 40, b: 40
#> - Test Data
#> - None
#> - Structure
#> - Blocking: Present
#> - Number of Blocks: 4
#> - Mean Block Size: 20 (SD: 0 )
#> - Split Groups: None
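The fields in the printed design (observations, factor levels, blocks) correspond to columns of an ordinary data frame. For real data you would usually build that data frame from your own trial labels and run structure. The following is a minimal base-R sketch, assuming a hypothetical layout of 4 runs with 20 trials each and 2 alternating conditions; the names `run_lengths`, `conditions`, and `toy_design` are illustrative and not part of rMVPA.

```r
# Hypothetical experiment layout (an assumption for illustration):
# 4 runs of 20 trials each, 2 conditions.
run_lengths <- c(20, 20, 20, 20)
conditions  <- c("a", "b")

# One block id per observation: run 1 repeated 20 times, then run 2, and so on.
block <- rep(seq_along(run_lengths), times = run_lengths)

# Alternate the conditions within each run so that every run is balanced.
Y <- factor(
  unlist(lapply(run_lengths, function(n) rep(conditions, length.out = n))),
  levels = conditions
)

toy_design <- data.frame(Y = Y, block = block)

# Cross-tabulate labels by run to confirm the balance before modelling.
balance <- table(toy_design$Y, toy_design$block)
print(balance)
```

A data frame shaped like `toy_design` has the same two columns (`Y` and `block`) that the manual construction in this section passes to mvpa_design().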
# You can also construct each piece manually:
dset <- mvpa_dataset(ds$dataset$train_data, mask = ds$dataset$mask)
design_df <- data.frame(
  Y = ds$design$y_train,
  block = ds$design$block_var
)
mvdes <- mvpa_design(design_df, y_train = ~ Y, block_var = ~ block)
print(mvdes)
#>
#> MVPA Design
#>
#> - Training Data
#> - Observations: 80
#> - Response Type: Factor
#> - Levels: a, b
#> - Class Distribution: a: 40, b: 40
#> - Test Data
#> - None
#> - Structure
#> - Blocking: Present
#> - Number of Blocks: 4
#> - Mean Block Size: 20 (SD: 0 )
#> - Split Groups: None
Creating a Real Volumetric (Image-Based) Dataset
The examples below are not evaluated (eval = FALSE) because they reference placeholder file paths; replace these with paths to your own data.
This example assumes you have a 4D fMRI file (“bold.nii.gz”) and a corresponding 3D brain mask file (“mask.nii.gz”).
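Before working with real files, it can help to see the shape relationship between a 4D series and a 3D mask. The sketch below uses plain base-R arrays as stand-ins for the image data (the sizes and the names `bold_arr`, `mask_arr`, and `X` are assumptions for illustration). It shows what a mask does conceptually, selecting the voxels that become features; it is not rMVPA's internal code.

```r
# Toy stand-ins for a 4D series and a 3D mask (hypothetical sizes).
set.seed(1)
bold_arr <- array(rnorm(4 * 4 * 4 * 10), dim = c(4, 4, 4, 10))
mask_arr <- array(FALSE, dim = c(4, 4, 4))
mask_arr[2:3, 2:3, 2:3] <- TRUE  # a 2 x 2 x 2 block of "brain" voxels

# The mask should share the spatial dimensions of the series.
stopifnot(identical(dim(bold_arr)[1:3], dim(mask_arr)))

# Flatten space so that rows are voxels and columns are observations.
n_obs <- dim(bold_arr)[4]
vox_by_obs <- bold_arr
dim(vox_by_obs) <- c(prod(dim(mask_arr)), n_obs)

# Keep only in-mask voxels, then transpose to observations x voxels.
X <- t(vox_by_obs[as.vector(mask_arr), , drop = FALSE])
dim(X)
```

The number of columns of `X` equals the number of TRUE entries in the mask, which is the quantity reported as "Active voxels/vertices" when a dataset is printed.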
library(neuroim2)
# Read the fMRI data as a NeuroVec object using neuroim2::read_vec
train_neurovec <- neuroim2::read_vec("path/to/bold.nii.gz", mode = "normal")
# Read the brain mask directly as a 3D NeuroVol; this keeps the voxel spacing
# and origin stored in the file instead of hard-coding them
mask_vol <- neuroim2::read_vol("path/to/mask.nii.gz")
# Create the MVPA image dataset
real_dataset <- mvpa_dataset(train_data = train_neurovec, mask = mask_vol)
# Display dataset details
print(real_dataset)
Creating a Real Surface-Based Dataset
This example assumes you have cortical geometry stored in a file (e.g., “subject.lh.smoothwm.asc”) and a signal matrix in CSV format (“surface_data.csv”). The signal matrix should have one row per vertex and one column per observation.
## remotes::install_github("bbuchsbaum/neurosurf")
library(neurosurf)
# Load the cortical geometry
geom <- read_surf_geometry("path/to/subject.lh.smoothwm.asc")
# Read the surface data from a CSV file
# The CSV should not have a header and have dimensions: number of vertices x number of observations
data_matrix <- as.matrix(read.csv("path/to/surface_data.csv", header = FALSE))
# Verify that the number of rows in the data matches the geometry
nvert <- nrow(neurosurf::vertices(geom))
if (nrow(data_matrix) != nvert) {
  stop("The number of vertices in the data does not match the geometry.")
}
# Create a NeuroSurfaceVector using the geometry and the data matrix
real_neurosurf <- NeuroSurfaceVector(geom, 1:nvert, data_matrix)
# Create the MVPA surface dataset; if no mask is provided, one is generated automatically
real_surface_dataset <- mvpa_surface_dataset(train_data = real_neurosurf, name = "lh")
# Display dataset details
print(real_surface_dataset)
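The headerless vertices-by-observations CSV layout assumed above can be checked on a toy file before committing to real data. This is a minimal base-R sketch with hypothetical sizes (5 vertices, 3 observations); the names `toy`, `csv_path`, and `data_matrix_toy` are illustrative only.

```r
# Hypothetical toy sizes: 5 vertices (rows) x 3 observations (columns).
set.seed(1)
toy <- matrix(round(rnorm(5 * 3), 3), nrow = 5, ncol = 3)

# Write without a header or row names, matching the assumed layout.
csv_path <- tempfile(fileext = ".csv")
write.table(toy, csv_path, sep = ",", row.names = FALSE, col.names = FALSE)

# Read it back the same way as in the surface example.
data_matrix_toy <- as.matrix(read.csv(csv_path, header = FALSE))
dimnames(data_matrix_toy) <- NULL  # drop the V1, V2, ... names added by read.csv

# A stray header row would make the matrix character rather than numeric.
stopifnot(is.numeric(data_matrix_toy))
dim(data_matrix_toy)
```

If the file turns out to be observations by vertices instead, transposing with t() before building the surface vector restores the expected orientation.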