Introduction

This tutorial explains how to run regional multivariate pattern analysis (MVPA) using MVPA_Regional.R. The script performs MVPA on specified brain regions, enabling both classification and regression analyses on fMRI data. Regional analysis can be conducted on volumetric (NIfTI) or surface-based neuroimaging data, and allows for separate training and testing subsets.

Key Features:

  • Flexible Input Handling: Works with both volumetric (NIfTI) and surface-based data
  • Subset Analysis: Supports separate training and testing subsets for region-specific evaluation
  • Configurable via Files: Parameters can be set using YAML or R configuration files
  • Detailed Outputs: Generates performance maps, prediction tables, and configuration files
  • Robust Cross-Validation: Includes options for blocked, k-fold, and two-fold cross-validation
  • Model Versatility: Compatible with built-in MVPA models and models available in the caret package
  • Data Normalization & Feature Selection: Optional centering/scaling and customizable feature selection methods

Running the Script

1. Basic Usage

If you have:

  • A 4D fMRI file for training (e.g., train_data.nii)
  • A trial-by-trial design matrix (e.g., train_design.txt)
  • A brain mask file (e.g., mask.nii)

You can run the regional analysis from the command line:

Rscript MVPA_Regional.R --train_design=train_design.txt \
                         --train_data=train_data.nii \
                         --mask=mask.nii \
                         --model=sda_notune \
                         --label_column=condition \
                         --ncores=4 \
                         --output=my_regional_output

2. Understanding Data Modes

The script supports two primary data modes:

Image Mode (Volumetric Data)

  • Default mode (--data_mode=image)
  • Works with NIfTI files and a binary mask
  • Analyzes region-level data based on voxel masks

Surface Mode

  • Activated with --data_mode=surface (see the config sketch after this list)
  • Processes surface-based neuroimaging data
  • Can handle multiple surface sections
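
To select surface mode in a configuration file, set data_mode accordingly. The file names below are placeholders; the exact surface file formats your installation accepts (e.g., GIFTI) are an assumption here:

data_mode: "surface"
train_data: "train_data_lh.gii"   # placeholder: surface-based data file
mask: "cortex_mask_lh.gii"        # placeholder: surface mask / region file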

3. Available Models

The script supports various classification and regression models:

Built-in MVPA Models:

  • corclass: Correlation-based classifier with template matching
  • sda_notune: Shrinkage Discriminant Analysis without tuning
  • sda_boot: SDA with bootstrap resampling
  • glmnet_opt: Elastic net with EPSGO parameter optimization
  • sparse_sda: SDA with sparsity constraints
  • sda_ranking: SDA with automatic feature ranking
  • mgsda: Multi-Group Sparse Discriminant Analysis
  • lda_thomaz: Modified LDA for high-dimensional data
  • hdrda: High-Dimensional Regularized Discriminant Analysis

Caret Models:

  • Any model available in the caret package (e.g., rf, svmRadial, glmnet); see the config fragment below
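
For example, switching to a caret support vector machine only requires changing the model name (in the config file or via --model):

model: "svmRadial"   # any caret model name, e.g. rf, glmnet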

4. Cross-Validation Options

Multiple cross-validation strategies are available:

Blocked Cross-Validation

--block_column=session

Uses a blocking variable (e.g., session) for splitting the data.
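
The equivalent setting in a configuration file:

block_column: "session"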

K-Fold Cross-Validation

Default when no block column is specified; uses random splits.
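
To control the number of folds explicitly, a configuration entry along the following lines should work; note that the nfolds field name is an assumption and may differ in your version of the script:

cross_validation:
  name: "kfold"
  nfolds: 5   # assumed field name for the number of folds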

Two-Fold Cross-Validation

Specify in the configuration file:

cross_validation:
  name: "twofold"
  nreps: 10

Advanced Cross-Validation Methods

For reference, here is the full set of supported cross-validation strategies:

  • Blocked Cross-Validation: Divides the dataset based on a blocking variable (e.g., session) so that samples from the same block remain together.
  • K-Fold Cross-Validation: Randomly partitions the data into k folds, providing a robust estimate of model performance.
  • Bootstrap Blocked Cross-Validation: Generates bootstrap resamples within blocks to assess model stability in heterogeneous datasets.
  • Sequential Blocked Cross-Validation: Assigns sequential folds within each block, preserving temporal or ordered structures.
  • Custom Cross-Validation: Allows you to define custom training and testing splits if standard methods do not fit your experimental design.

Specify the desired method in your configuration file by setting the name field under cross_validation. For example, to use bootstrap blocked cross-validation:

cross_validation:
  name: "bootstrap"   # Options: "twofold", "bootstrap", "sequential", "custom", "kfold"
  nreps: 10

Choose the method that best aligns with your data structure and experimental design.

5. Feature Selection

Enable feature selection with:

feature_selector:
  method: "anova"  # Options: "correlation", "t-test", etc.
  cutoff_type: "percentile"
  cutoff_value: 0.1
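
For instance, to rank features with a t-test and keep roughly the top 5% (assuming that, with cutoff_type "percentile", cutoff_value gives the proportion of features retained):

feature_selector:
  method: "t-test"
  cutoff_type: "percentile"
  cutoff_value: 0.05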

6. Understanding label_column

The label column specifies the target variable:

  • For classification, it should contain categorical labels (e.g., “Face”, “House”).
  • For regression, it should contain continuous values (e.g., reaction times).

Example Design File (train_design.txt):

trial  condition  subject  session
1      Face       S01      1
2      House      S01      1
3      Face       S01      1
4      House      S01      1
5      Face       S01      2
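
For a regression analysis, the label column instead holds continuous values. A hypothetical design file with a reaction-time column (here rt, passed as --label_column=rt) might look like:

trial  rt     subject  session
1      0.52   S01      1
2      0.61   S01      1
3      0.48   S01      1
4      0.73   S01      2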

7. Using a Configuration File

Instead of specifying all options on the command line, you can use a configuration file.

Example YAML Config File (regional_config.yaml):

# Data Sources
train_design: "train_design.txt"
test_design: "test_design.txt"
train_data: "train_data.nii"
test_data: "test_data.nii"
mask: "mask.nii"

# Analysis Parameters
model: "rf"  # Random Forest classifier
data_mode: "image"  # or "surface"
ncores: 4
label_column: "condition"
block_column: "session"

# Output Options
output: "regional_results"
normalize_samples: TRUE
class_metrics: TRUE

# Advanced Options
feature_selector:
  method: "anova"
  cutoff_type: "percentile"
  cutoff_value: 0.1

cross_validation:
  name: "twofold"
  nreps: 10

# Optional Subsetting: Define different subsets for training and testing
train_subset: "subject == 'S01'"
test_subset: "subject == 'S02'"

Running with a Config File:

Rscript MVPA_Regional.R --config=regional_config.yaml

8. Expected Outputs

After running the script, the output directory (e.g., regional_results/) contains:

  • Performance Maps: NIfTI files with region-level performance metrics (e.g., accuracy, AUC).
  • Prediction Tables: Text files summarizing predictions for each region.
  • Configuration File: config.yaml with complete analysis parameters for reproducibility.

Example directory structure:

regional_results/
├── performance_table.txt   # Regional performance metrics
├── prediction_table.txt    # Prediction details per region
├── regional_metric1.nii    # Regional performance map (e.g., accuracy or AUC)
├── regional_metric2.nii    # Additional metric maps (if applicable)
└── config.yaml             # Analysis configuration

For regression analyses, different metrics (e.g., r2.nii, rmse.nii, spearcor.nii) will be output.

9. Performance Considerations

  • Use --normalize_samples=TRUE to center and scale each sample; this often improves model performance.
  • Increase --ncores to take advantage of multi-core systems.
  • Adjust parameters (e.g., feature selection cutoffs) to suit your data's spatial resolution and your hypotheses.
  • Choose a cross-validation strategy that respects your data's structure (e.g., blocked by session) to avoid inflated performance estimates.

Summary

  • MVPA_Regional.R is a versatile tool for regional MVPA.
  • Supports both volumetric and surface-based data.
  • Flexible configuration via command line or config files.
  • Outputs detailed performance maps and prediction tables.
  • Robust cross-validation and optional feature selection enhance analysis.

Next Steps:

  • Experiment with various models (--model=rf, --model=sda_notune).
  • Test different feature selection methods.
  • Evaluate both classification and regression scenarios.
  • Optimize processing using parallel computation.

Happy regional analysis!