Skip to contents

Introduction

This tutorial explains how to run a searchlight-based Multivariate Pattern Analysis (MVPA) using MVPA_Searchlight.R. The script performs a local classification or regression analysis on fMRI data by iterating over each voxel (or node for surface data) and extracting information from a surrounding neighborhood.

Key Features:

  • Flexible Input Handling: Works with volumetric (NIfTI) and surface-based neuroimaging data
  • Parallel Processing: Supports multi-core computation for speed
  • Customizable Models: Choose from classifiers and regressors (e.g., rf, sda_notune, corsim)
  • Reproducible Outputs: Generates config files and metric maps
  • Built-in Cross-Validation: Includes options for blocked and stratified validation
  • Data Normalization: Optional centering and scaling of samples
  • Feature Selection: Support for various feature selection methods
  • Multiple Data Modes: Handles both volumetric and surface-based analyses

Running the Script

1. Basic Usage

If you have:

  • A 4D fMRI file for training: train_data.nii
  • A trial-by-trial design matrix: train_design.txt
  • A brain mask file: mask.nii

You can run the script from the command line:

MVPA_Searchlight.R --radius=6 \
                           --train_design=train_design.txt \
                           --train_data=train_data.nii \
                           --mask=mask.nii \
                           --model=sda_notune \
                           --label_column=condition \
                           --ncores=4 \
                           --output=my_searchlight_output

2. Understanding Data Modes

The script supports two primary data modes:

Image Mode (Volumetric Data)

  • Default mode (--data_mode=image)
  • Works with NIfTI format files
  • Requires a binary mask file
  • Processes voxel-wise data in 3D space

Surface Mode

  • Activated with --data_mode=surface
  • Works with surface-based neuroimaging data
  • Can handle multiple surface sections
  • Processes data on cortical surface meshes

3. Available Models

The script supports various classification and regression models:

Built-in MVPA Models:

  • corclass: Correlation-based classifier with template matching
  • sda_notune: Simple Shrinkage Discriminant Analysis without tuning
  • sda_boot: SDA with bootstrap resampling
  • glmnet_opt: Elastic net with EPSGO parameter optimization
  • sparse_sda: SDA with sparsity constraints
  • sda_ranking: SDA with automatic feature ranking
  • mgsda: Multi-Group Sparse Discriminant Analysis
  • lda_thomaz: Modified LDA for high-dimensional data
  • hdrda: High-Dimensional Regularized Discriminant Analysis

Caret Models:

  • Any model available in the caret package (e.g., rf, svmRadial, glmnet)

4. Cross-Validation Options

The script supports multiple cross-validation strategies:

Blocked Cross-Validation

--block_column=session

Uses a blocking variable (e.g., session) for cross-validation splits.

K-Fold Cross-Validation

Default when no block column is specified. Uses random splits.

Two-Fold Cross-Validation

Specify in the configuration file:

cross_validation:
  name: "twofold"
  nreps: 10

Advanced Cross-Validation Methods

In addition to the standard options above, several advanced cross-validation strategies are available:

  • Blocked Cross-Validation: Divides the dataset based on a blocking variable (e.g., session) so that samples from the same block remain together.
  • K-Fold Cross-Validation: Randomly partitions the data into k folds, providing a robust estimate of model performance.
  • Bootstrap Blocked Cross-Validation: Generates bootstrap resamples within blocks to assess model stability in heterogeneous datasets.
  • Sequential Blocked Cross-Validation: Assigns sequential folds within each block, preserving temporal or ordered structures.
  • Custom Cross-Validation: Allows you to define custom training and testing splits if standard methods do not fit your experimental design.

Specify the desired method in your configuration file by setting the name field under cross_validation. For example, to use bootstrap blocked cross-validation:

cross_validation:
  name: "bootstrap"   # Options: "twofold", "bootstrap", "sequential", "custom", "kfold"
  nreps: 10

Choose the method that best aligns with your data structure and experimental design.

5. Feature Selection

Enable feature selection with the --feature_selector parameter:

feature_selector:
  method: "anova"  # or "correlation", "t-test", etc.
  cutoff_type: "percentile"
  cutoff_value: 0.1

6. Understanding label_column

The label column is critical as it specifies the target variable for classification or regression.

  • If performing classification, this column should contain categorical labels (e.g., "Face" vs. "House").
  • If performing regression, this column should contain continuous values (e.g., reaction times, confidence ratings).

Example Design File (train_design.txt):

trial  condition  subject  session
1      Face       S01      1
2      House      S01      1
3      Face       S01      1
4      House      S01      1
5      Face       S01      2

7. Using a Configuration File

Instead of specifying all options on the command line, you can use a YAML or R script configuration file.

Example YAML Config File (config.yaml):

# Data Sources
train_design: "train_design.txt"
test_design: "test_design.txt"
train_data: "train_data.nii"
test_data: "test_data.nii"
mask: "mask.nii"

# Analysis Parameters
model: "rf"  # Random Forest classifier
data_mode: "image"  # or "surface"
ncores: 4
radius: 6
label_column: "condition"
block_column: "session"

# Output Options
output: "searchlight_results"
normalize_samples: TRUE
class_metrics: TRUE

# Advanced Options
feature_selector:
  method: "anova"
  cutoff_type: "percentile"
  cutoff_value: 0.1

cross_validation:
  name: "twofold"
  nreps: 10

# Optional Subsetting
train_subset: "subject == 'S01'"
test_subset: "subject == 'S02'"

Running with a Config File:

Rscript MVPA_Searchlight.R --config=config.yaml

8. Expected Outputs

After running the script, the output directory (searchlight_results/) contains:

  • Performance Maps: NIfTI files for each performance metric
    • accuracy.nii: Overall classification accuracy map
    • auc.nii: Area Under Curve (AUC) performance map
    • For multiclass problems with class_metrics: TRUE:
      • auc_class1.nii, auc_class2.nii, etc.: Per-class AUC maps
  • Probability Maps: When available
    • prob_observed.nii: Probabilities for observed classes
    • prob_predicted.nii: Probabilities for predicted classes
  • Configuration
    • config.yaml: Complete record of analysis parameters for reproducibility

Example directory structure:

searchlight_results/
├── accuracy.nii          # Overall classification accuracy
├── auc.nii              # Mean AUC across classes
├── auc_class1.nii       # AUC for class 1 (if class_metrics: TRUE)
├── auc_class2.nii       # AUC for class 2 (if class_metrics: TRUE)
├── prob_observed.nii    # Probabilities for observed classes
├── prob_predicted.nii   # Probabilities for predicted classes
└── config.yaml          # Analysis configuration

The exact files will depend on: - Whether it’s a binary or multiclass classification - If class_metrics: TRUE is set - The type of analysis (classification vs regression) - The model type used

For regression analyses, you’ll see different metrics: - r2.nii: R-squared values - rmse.nii: Root Mean Square Error - spearcor.nii: Spearman correlation

9. Performance Considerations

  • Use --normalize_samples=TRUE for better model performance
  • Increase --ncores for faster processing on multi-core systems
  • Adjust --radius based on your spatial resolution and hypothesis
  • Consider using --type=randomized for faster approximate searchlights
  • Set appropriate memory limits with options(future.globals.maxSize)

Summary

  • MVPA_Searchlight.R is a flexible tool for searchlight-based MVPA
  • Supports both volumetric and surface-based analyses
  • Offers multiple cross-validation and feature selection strategies
  • Provides extensive configuration options via command line or config files
  • Generates comprehensive performance metrics and reproducible outputs

Next Steps: - Try different models (--model=rf, --model=sda_notune) - Experiment with feature selection methods - Explore surface-based MVPA with --data_mode=surface - Use cross-validation strategies appropriate for your design - Optimize performance with parallel processing

Happy searchlighting!