Introduction

This tutorial explains how to run a searchlight-based Multivariate Pattern Analysis (MVPA) using MVPA_Searchlight.R. The script performs a local classification or regression analysis on fMRI data by iterating over each voxel (or node for surface data) and extracting information from a surrounding neighborhood.

Key features

The script handles both volumetric (NIfTI) and surface data and can parallelize across cores. You can select classifiers and regressors such as rf, sda_notune, and corsim, enable feature selection, and choose cross‑validation schemes that respect run structure. Outputs include performance and probability maps together with a complete configuration file for reproducibility. Optional normalization (centering/scaling) is available in both data modes.


Running the Script

1. Basic Usage

If you have:

  • A 4D fMRI file for training: train_data.nii
  • A trial-by-trial design matrix: train_design.txt
  • A brain mask file: mask.nii

You can run the script from the command line:

Rscript MVPA_Searchlight.R --radius=6 \
    --train_design=train_design.txt \
    --train_data=train_data.nii \
    --mask=mask.nii \
    --model=sda_notune \
    --label_column=condition \
    --ncores=4 \
    --output=my_searchlight_output

2. Data modes

The script supports two primary data modes:

Image mode (volumetric)

The default mode (--data_mode=image) operates on NIfTI files with a binary mask and processes voxels in 3D.

Surface mode

Use --data_mode=surface for cortical surface meshes; the searchlight then iterates over mesh nodes rather than voxels, and multiple surface sections are supported.
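As a minimal sketch, a surface analysis differs from the volumetric default mainly in the data_mode setting. The file names below are illustrative (a GIFTI file is assumed here; check which surface formats your installation accepts):

```yaml
# Illustrative surface-mode settings; file names are hypothetical
data_mode: "surface"
train_design: "train_design.txt"
train_data: "lh_train_data.gii"   # assumed left-hemisphere surface data file
label_column: "condition"
radius: 6                         # neighborhood defined over mesh nodes
output: "surface_searchlight_output"
```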

3. Models

The script supports various classification and regression models:

Built‑in MVPA models

  • corclass: Correlation-based classifier with template matching
  • sda_notune: Simple Shrinkage Discriminant Analysis without tuning
  • sda_boot: SDA with bootstrap resampling
  • glmnet_opt: Elastic net with EPSGO parameter optimization
  • sparse_sda: SDA with sparsity constraints
  • sda_ranking: SDA with automatic feature ranking
  • mgsda: Multi-Group Sparse Discriminant Analysis
  • lda_thomaz: Modified LDA for high-dimensional data
  • hdrda: High-Dimensional Regularized Discriminant Analysis

You can also register custom models via register_mvpa_model().
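The exact interface of register_mvpa_model() is not documented here, so the following is a hypothetical sketch using a caret-style model specification (the field names and signature are assumptions; consult the package documentation for the required fields):

```r
# Hypothetical sketch -- the field names follow caret-style model specs
# and may differ from the actual register_mvpa_model() interface.
my_model <- list(
  type       = "Classification",
  library    = "e1071",              # package supplying the learner (assumed)
  label      = "my_svm",
  parameters = data.frame(parameter = "cost",
                          class     = "numeric",
                          label     = "Cost"),
  grid    = function(x, y, len = NULL) data.frame(cost = 1),
  fit     = function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    e1071::svm(x, y, cost = param$cost, probability = TRUE)
  },
  predict = function(modelFit, newdata, ...) predict(modelFit, newdata),
  prob    = function(modelFit, newdata, ...) {
    attr(predict(modelFit, newdata, probability = TRUE), "probabilities")
  }
)

register_mvpa_model("my_svm", my_model)
```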

4. Cross‑validation options

The script supports multiple cross-validation strategies:

Blocked Cross-Validation

--block_column=session

Uses a blocking variable (e.g., session) for cross-validation splits.

K-Fold Cross-Validation

Default when no block column is specified. Uses random splits.

Two-Fold Cross-Validation

Specify in the configuration file:

cross_validation:
  name: "twofold"
  nreps: 10

Advanced cross‑validation methods

Beyond standard blocked and k‑fold splits, you can use bootstrap blocked CV (resampling within runs), sequential blocked CV (ordered folds), or provide custom train/test indices. Specify the method in the config file under cross_validation.name. For example:

cross_validation:
  name: "bootstrap"   # Options: "twofold", "bootstrap", "sequential", "custom", "kfold"
  nreps: 10

Choose the method that best matches your data structure and experimental design.

5. Feature Selection

Enable feature selection with the feature_selector option (shown here as it appears in a configuration file):

feature_selector:
  method: "anova"  # or "correlation", "t-test", etc.
  cutoff_type: "percentile"
  cutoff_value: 0.1

6. Understanding label_column

The label column is critical as it specifies the target variable for classification or regression.

  • If performing classification, this column should contain categorical labels (e.g., "Face" vs. "House").
  • If performing regression, this column should contain continuous values (e.g., reaction times, confidence ratings).

Example Design File (train_design.txt):

trial  condition  subject  session
1      Face       S01      1
2      House      S01      1
3      Face       S01      1
4      House      S01      1
5      Face       S01      2
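Before running, it can help to sanity-check the design in R. A quick sketch using base R (the file name matches the example above):

```r
# Read the trial-by-trial design and inspect its columns
design <- read.table("train_design.txt", header = TRUE)

# The label column should hold categorical labels for classification
table(design$condition)

# The block column (here, session) defines cross-validation splits
table(design$session)
```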

7. Using a Configuration File

Instead of specifying all options on the command line, you can use a YAML or R script configuration file.

Example YAML Config File (config.yaml):

# Data Sources
train_design: "train_design.txt"
test_design: "test_design.txt"
train_data: "train_data.nii"
test_data: "test_data.nii"
mask: "mask.nii"

# Analysis Parameters
model: "rf"  # Random Forest classifier
data_mode: "image"  # or "surface"
ncores: 4
radius: 6
label_column: "condition"
block_column: "session"

# Output Options
output: "searchlight_results"
normalize_samples: TRUE
class_metrics: TRUE

# Advanced Options
feature_selector:
  method: "anova"
  cutoff_type: "percentile"
  cutoff_value: 0.1

cross_validation:
  name: "twofold"
  nreps: 10

# Optional Subsetting
train_subset: "subject == 'S01'"
test_subset: "subject == 'S02'"

Running with a Config File:

Rscript MVPA_Searchlight.R --config=config.yaml

8. Expected Outputs

After running the script, the output directory (searchlight_results/) contains:

  • Performance Maps: NIfTI files for each performance metric
    • accuracy.nii: Overall classification accuracy map
    • auc.nii: Area Under Curve (AUC) performance map
    • For multiclass problems with class_metrics: TRUE:
      • auc_class1.nii, auc_class2.nii, etc.: Per-class AUC maps
  • Probability Maps: When available
    • prob_observed.nii: Probabilities for observed classes
    • prob_predicted.nii: Probabilities for predicted classes
  • Configuration
    • config.yaml: Complete record of analysis parameters for reproducibility

Example directory structure:

searchlight_results/
├── accuracy.nii          # Overall classification accuracy
├── auc.nii              # Mean AUC across classes
├── auc_class1.nii       # AUC for class 1 (if class_metrics: TRUE)
├── auc_class2.nii       # AUC for class 2 (if class_metrics: TRUE)
├── prob_observed.nii    # Probabilities for observed classes
├── prob_predicted.nii   # Probabilities for predicted classes
└── config.yaml          # Analysis configuration

The exact files will depend on:

  • Whether it's a binary or multiclass classification
  • Whether class_metrics: TRUE is set
  • The type of analysis (classification vs. regression)
  • The model type used

For regression analyses, you'll see different metrics:

  • r2.nii: R-squared values
  • rmse.nii: Root Mean Square Error
  • spearcor.nii: Spearman correlation
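To inspect a performance map after the run finishes, any NIfTI reader will do. For example, assuming the RNifti package is installed (the package choice here is an illustration, not something the script requires):

```r
# Load the accuracy map and summarize in-mask values (sketch; assumes RNifti)
library(RNifti)

acc <- readNifti("searchlight_results/accuracy.nii")

# Out-of-mask voxels are typically zero; look at the non-zero distribution
summary(acc[acc != 0])
hist(acc[acc != 0], main = "Searchlight accuracy", xlab = "Accuracy")
```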

9. Performance Considerations

  • Use --normalize_samples=TRUE to center and scale each sample, which often improves model performance
  • Increase --ncores for faster processing on multi-core systems
  • Adjust --radius based on your spatial resolution and hypothesis
  • Consider using --type=randomized for faster approximate searchlights
  • Set appropriate memory limits with options(future.globals.maxSize)
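The memory limit mentioned above comes from the future package, which caps the size of global objects exported to parallel workers; the value is in bytes, and the limit chosen below is just an example:

```r
# Allow up to 8 GiB of globals to be exported to each parallel worker
options(future.globals.maxSize = 8 * 1024^3)
```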

Summary

MVPA_Searchlight.R provides a flexible searchlight-based MVPA tool that works with both volumetric and surface-based data. It includes cross-validation, feature selection, and extensive configuration through command line or config files. The tool generates comprehensive metrics and reproducible outputs to help you analyze your neuroimaging data.

Next Steps:

  • Try different models (--model=rf, --model=sda_notune)
  • Experiment with feature selection methods
  • Explore surface-based MVPA with --data_mode=surface
  • Use cross-validation strategies appropriate for your design
  • Optimize performance with parallel processing

Happy searchlighting!