MVPA Searchlight Tutorial
Your Name
2025-03-20
CommandLineScripts.Rmd
Introduction
This tutorial explains how to run a searchlight-based Multivariate Pattern Analysis (MVPA) using MVPA_Searchlight.R. The script performs a local classification or regression analysis on fMRI data by iterating over each voxel (or node for surface data) and extracting information from a surrounding neighborhood.
Key Features:
The script handles both volumetric (NIfTI) and surface-based
neuroimaging data. It leverages parallel processing across multiple
cores for faster computation. You can choose from various classifiers
and regressors like rf
, sda_notune
, and
corsim
. The analysis generates reproducible outputs
including config files and metric maps. Cross-validation options include
both blocked and stratified approaches. Data can be optionally
normalized through centering and scaling. The script supports different
feature selection methods and works seamlessly with both volumetric and
surface-based analyses.
Running the Script
1. Basic Usage
If you have:
- A 4D fMRI file for training:
train_data.nii
- A trial-by-trial design matrix:
train_design.txt
- A brain mask file:
mask.nii
You can run the script from the command line:
2. Understanding Data Modes
The script supports two primary data modes:
3. Available Models
The script supports various classification and regression models:
Built-in MVPA Models:
-
corclass
: Correlation-based classifier with template matching -
sda_notune
: Simple Shrinkage Discriminant Analysis without tuning -
sda_boot
: SDA with bootstrap resampling -
glmnet_opt
: Elastic net with EPSGO parameter optimization -
sparse_sda
: SDA with sparsity constraints -
sda_ranking
: SDA with automatic feature ranking -
mgsda
: Multi-Group Sparse Discriminant Analysis -
lda_thomaz
: Modified LDA for high-dimensional data -
hdrda
: High-Dimensional Regularized Discriminant Analysis
4. Cross-Validation Options
The script supports multiple cross-validation strategies:
Advanced Cross-Validation Methods
In addition to the standard options above, several advanced cross-validation strategies are available:
- Blocked Cross-Validation: Divides the dataset based on a blocking variable (e.g., session) so that samples from the same block remain together.
- K-Fold Cross-Validation: Randomly partitions the data into k folds, providing a robust estimate of model performance.
- Bootstrap Blocked Cross-Validation: Generates bootstrap resamples within blocks to assess model stability in heterogeneous datasets.
- Sequential Blocked Cross-Validation: Assigns sequential folds within each block, preserving temporal or ordered structures.
- Custom Cross-Validation: Allows you to define custom training and testing splits if standard methods do not fit your experimental design.
Specify the desired method in your configuration file by setting the
name
field under cross_validation
. For
example, to use bootstrap blocked cross-validation:
cross_validation:
name: "bootstrap" # Options: "twofold", "bootstrap", "sequential", "custom", "kfold"
nreps: 10
Choose the method that best aligns with your data structure and experimental design.
6. Understanding label_column
The label column is critical as it specifies the target variable for classification or regression.
- If performing classification, this column should
contain categorical labels (e.g.,
"Face"
vs."House"
). - If performing regression, this column should contain continuous values (e.g., reaction times, confidence ratings).
Example Design File
(train_design.txt
):
trial condition subject session
1 Face S01 1
2 House S01 1
3 Face S01 1
4 House S01 1
5 Face S01 2
7. Using a Configuration File
Instead of specifying all options on the command line, you can use a YAML or R script configuration file.
Example YAML Config File
(config.yaml
):
# Data Sources
train_design: "train_design.txt"
test_design: "test_design.txt"
train_data: "train_data.nii"
test_data: "test_data.nii"
mask: "mask.nii"
# Analysis Parameters
model: "rf" # Random Forest classifier
data_mode: "image" # or "surface"
ncores: 4
radius: 6
label_column: "condition"
block_column: "session"
# Output Options
output: "searchlight_results"
normalize_samples: TRUE
class_metrics: TRUE
# Advanced Options
feature_selector:
method: "anova"
cutoff_type: "percentile"
cutoff_value: 0.1
cross_validation:
name: "twofold"
nreps: 10
# Optional Subsetting
train_subset: "subject == 'S01'"
test_subset: "subject == 'S02'"
Running with a Config File:
8. Expected Outputs
After running the script, the output directory
(searchlight_results/
) contains:
-
Performance Maps: NIfTI files for each performance
metric
-
accuracy.nii
: Overall classification accuracy map -
auc.nii
: Area Under Curve (AUC) performance map - For multiclass problems with
class_metrics: TRUE
:-
auc_class1.nii
,auc_class2.nii
, etc.: Per-class AUC maps
-
-
-
Probability Maps: When available
-
prob_observed.nii
: Probabilities for observed classes -
prob_predicted.nii
: Probabilities for predicted classes
-
-
Configuration
-
config.yaml
: Complete record of analysis parameters for reproducibility
-
Example directory structure:
searchlight_results/
├── accuracy.nii # Overall classification accuracy
├── auc.nii # Mean AUC across classes
├── auc_class1.nii # AUC for class 1 (if class_metrics: TRUE)
├── auc_class2.nii # AUC for class 2 (if class_metrics: TRUE)
├── prob_observed.nii # Probabilities for observed classes
├── prob_predicted.nii # Probabilities for predicted classes
└── config.yaml # Analysis configuration
The exact files will depend on: - Whether it’s a binary or multiclass
classification - If class_metrics: TRUE
is set - The type
of analysis (classification vs regression) - The model type used
For regression analyses, you’ll see different metrics: -
r2.nii
: R-squared values - rmse.nii
: Root Mean
Square Error - spearcor.nii
: Spearman correlation
9. Performance Considerations
- Use
--normalize_samples=TRUE
for better model performance - Increase
--ncores
for faster processing on multi-core systems - Adjust
--radius
based on your spatial resolution and hypothesis - Consider using
--type=randomized
for faster approximate searchlights - Set appropriate memory limits with
options(future.globals.maxSize)
Summary
MVPA_Searchlight.R provides a flexible searchlight-based MVPA tool that works with both volumetric and surface-based data. It includes cross-validation, feature selection, and extensive configuration through command line or config files. The tool generates comprehensive metrics and reproducible outputs to help you analyze your neuroimaging data.
Next Steps: - Try different models
(--model=rf
, --model=sda_notune
) - Experiment
with feature selection methods - Explore surface-based MVPA with
--data_mode=surface
- Use cross-validation strategies
appropriate for your design - Optimize performance with parallel
processing
Happy searchlighting!