Skip to contents

Introduction

This tutorial explains how to run regional multivariate pattern analysis (MVPA) using MVPA_Regional.R. The script performs MVPA on specified brain regions, enabling both classification and regression analyses on fMRI data. Regional analysis can be conducted on volumetric (NIfTI) or surface-based neuroimaging data, and allows for separate training and testing subsets.

Key Features

The script handles both volumetric NIfTI and surface-based data formats. You can evaluate specific regions by using separate training and testing subsets. All parameters are configurable through YAML or R files.

The analysis produces comprehensive outputs including performance maps, prediction tables, and configuration records. Cross-validation options include blocked, k-fold, and two-fold approaches.

The script works with built-in MVPA models and integrates with the caret package for additional model options. Data preprocessing includes optional centering/scaling and flexible feature selection methods.


Running the Script

1. Basic Usage

If you have:

  • A 4D fMRI file for training (e.g., train_data.nii)
  • A trial-by-trial design matrix (e.g., train_design.txt)
  • A brain mask file (e.g., mask.nii)

You can run the regional analysis from the command line:

Rscript MVPA_Regional.R --train_design=train_design.txt \
                         --train_data=train_data.nii \
                         --mask=mask.nii \
                         --model=sda_notune \
                         --label_column=condition \
                         --ncores=4 \
                         --output=my_regional_output

2. Understanding Data Modes

The script supports two primary data modes:

Image Mode (Volumetric Data)

  • Default mode (--data_mode=image)
  • Works with NIfTI files and a binary mask
  • Analyzes region-level data based on voxel masks

Surface Mode

  • Activated with --data_mode=surface
  • Processes surface-based neuroimaging data
  • Can handle multiple surface sections

3. Available Models

The script supports various classification and regression models:

Built-in MVPA Models:

  • corclass: Correlation-based classifier with template matching
  • sda_notune: Shrinkage Discriminant Analysis without tuning
  • sda_boot: SDA with bootstrap resampling
  • glmnet_opt: Elastic net with EPSGO parameter optimization
  • sparse_sda: SDA with sparsity constraints
  • sda_ranking: SDA with automatic feature ranking
  • mgsda: Multi-Group Sparse Discriminant Analysis
  • lda_thomaz: Modified LDA for high-dimensional data
  • hdrda: High-Dimensional Regularized Discriminant Analysis

Caret Models:

  • Any model available in the caret package (e.g., rf, svmRadial, glmnet)

4. Cross-Validation Options

Multiple cross-validation strategies are available:

Blocked Cross-Validation

--block_column=session

Uses a blocking variable (e.g., session) for splitting the data.

K-Fold Cross-Validation

Default when no block column is specified; uses random splits.

Two-Fold Cross-Validation

Specify in the configuration file:

cross_validation:
  name: "twofold"
  nreps: 10

Advanced Cross-Validation Methods

In addition to the standard options above, several advanced cross-validation strategies are available:

  • Blocked Cross-Validation: Divides the dataset based on a blocking variable (e.g., session) so that samples from the same block remain together.
  • K-Fold Cross-Validation: Randomly partitions the data into k folds, providing a robust estimate of model performance.
  • Bootstrap Blocked Cross-Validation: Generates bootstrap resamples within blocks to assess model stability in heterogeneous datasets.
  • Sequential Blocked Cross-Validation: Assigns sequential folds within each block, preserving temporal or ordered structures.
  • Custom Cross-Validation: Allows you to define custom training and testing splits if standard methods do not fit your experimental design.

Specify the desired method in your configuration file by setting the name field under cross_validation. For example, to use bootstrap blocked cross-validation:

cross_validation:
  name: "bootstrap"   # Options: "twofold", "bootstrap", "sequential", "custom", "kfold"
  nreps: 10

Choose the method that best aligns with your data structure and experimental design.

5. Feature Selection

Enable feature selection with:

feature_selector:
  method: "anova"  # Options: "correlation", "t-test", etc.
  cutoff_type: "percentile"
  cutoff_value: 0.1

6. Understanding label_column

The label column specifies the target variable:

  • For classification, it should contain categorical labels (e.g., “Face”, “House”).
  • For regression, it should contain continuous values (e.g., reaction times).

Example Design File (train_design.txt):

trial  condition  subject  session
1      Face       S01      1
2      House      S01      1
3      Face       S01      1
4      House      S01      1
5      Face       S01      2

7. Using a Configuration File

Instead of specifying all options on the command line, you can use a configuration file.

Example YAML Config File (regional_config.yaml):

# Data Sources
train_design: "train_design.txt"
test_design: "test_design.txt"
train_data: "train_data.nii"
test_data: "test_data.nii"
mask: "mask.nii"

# Analysis Parameters
model: "rf"  # Random Forest classifier
data_mode: "image"  # or "surface"
ncores: 4
label_column: "condition"
block_column: "session"

# Output Options
output: "regional_results"
normalize_samples: TRUE
class_metrics: TRUE

# Advanced Options
feature_selector:
  method: "anova"
  cutoff_type: "percentile"
  cutoff_value: 0.1

cross_validation:
  name: "twofold"
  nreps: 10

# Optional Subsetting: Define different subsets for training and testing
train_subset: "subject == 'S01'"
test_subset: "subject == 'S02'"

Running with a Config File:

Rscript MVPA_Regional.R --config=regional_config.yaml

8. Expected Outputs

After running the script, the output directory (e.g., regional_results/) contains:

  • Performance Maps: NIfTI files with region-level performance metrics (e.g., accuracy, AUC).
  • Prediction Tables: Text files summarizing predictions for each region.
  • Configuration File: config.yaml with complete analysis parameters for reproducibility.

Example directory structure:

regional_results/
├── performance_table.txt   # Regional performance metrics
├── prediction_table.txt    # Prediction details per region
├── regional_metric1.nii    # Regional performance map (e.g., accuracy or AUC)
├── regional_metric2.nii    # Additional metric maps (if applicable)
└── config.yaml             # Analysis configuration

For regression analyses, different metrics (e.g., r2.nii, rmse.nii, spearcor.nii) will be output.

9. Performance Considerations

  • Use --normalize_samples=TRUE for improved model performance.
  • Increase --ncores to leverage multi-core systems.
  • Adjust parameters based on spatial resolution and hypotheses.
  • Select appropriate cross-validation strategies to prevent overfitting.

Summary

MVPA_Regional.R provides comprehensive regional MVPA analysis capabilities. It handles both volumetric and surface-based data formats with flexible configuration through command line or config files. The tool generates detailed performance maps and prediction tables, while incorporating robust cross-validation and feature selection to ensure reliable results.

Next Steps: - Experiment with various models (--model=rf, --model=sda_notune). - Test different feature selection methods. - Evaluate both classification and regression scenarios. - Optimize processing using parallel computation.

Happy regional analysis!