Feature Selection in rMVPA

Introduction

Neuroimaging datasets often contain far more features than carry informative signal. Feature selection reduces dimensionality, improves interpretability, and can boost predictive performance, provided it is performed inside the cross‑validation loop to avoid selection bias. In rMVPA, you configure selection with a feature‑selector object. The package supports two ranking methods, an ANOVA‑based F‑test (“FTest”) and correlation‑adjusted t‑scores (“catscore”, via sda.ranking), together with two simple cutoffs: keep the top k features, or keep the top proportion p (e.g., p = 0.1 keeps the top 10%).
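
To make the point about selection bias concrete, here is a minimal base-R sketch, not rMVPA's own implementation. It assumes pure-noise data, a hand-written F statistic as a stand-in for the FTest ranking, and a toy nearest-centroid classifier; the helper names (f_scores, top_k_mask, nearest_centroid, cv_accuracy) are hypothetical and exist only for this illustration.

```r
set.seed(1)

# Pure-noise data: no feature carries real class information
n <- 80; p <- 500; k <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- factor(rep(c("a", "b"), each = n / 2))

# Stand-in ranking score (assumption): one-way ANOVA F statistic per column
f_scores <- function(X, Y) {
  Y <- droplevels(Y)
  counts <- as.vector(table(Y))            # samples per class
  grand <- colMeans(X)                     # overall mean per feature
  means <- rowsum(X, Y) / counts           # class means, one row per class
  ss_between <- colSums(counts * sweep(means, 2, grand)^2)
  ss_within <- colSums(sweep(X, 2, grand)^2) - ss_between
  (ss_between / (length(counts) - 1)) / (ss_within / (nrow(X) - length(counts)))
}

# Logical mask keeping the k highest-scoring features
top_k_mask <- function(scores, k) rank(-scores, ties.method = "first") <= k

# Toy nearest-centroid classifier, used only to score the folds
nearest_centroid <- function(Xtr, Ytr, Xte) {
  Ytr <- droplevels(Ytr)
  centroids <- rowsum(Xtr, Ytr) / as.vector(table(Ytr))
  # Squared Euclidean distance from every test row to every centroid
  d <- sapply(seq_len(nrow(centroids)), function(j) {
    rowSums(sweep(Xte, 2, centroids[j, ])^2)
  })
  rownames(centroids)[apply(d, 1, which.min)]
}

# Cross-validated accuracy with selection either inside each fold or on all rows
cv_accuracy <- function(X, Y, folds, k, select_inside) {
  global_mask <- top_k_mask(f_scores(X, Y), k)   # sees ALL rows (biased)
  sapply(sort(unique(folds)), function(f) {
    train <- folds != f
    mask <- if (select_inside) {
      top_k_mask(f_scores(X[train, , drop = FALSE], Y[train]), k)
    } else {
      global_mask
    }
    pred <- nearest_centroid(X[train, mask, drop = FALSE], Y[train],
                             X[!train, mask, drop = FALSE])
    mean(pred == as.character(Y[!train]))
  })
}

folds <- sample(rep(1:5, length.out = n))
acc_inside <- cv_accuracy(X, Y, folds, k, select_inside = TRUE)
acc_outside <- cv_accuracy(X, Y, folds, k, select_inside = FALSE)

cat("Mean accuracy, selection inside folds:", mean(acc_inside), "\n")
cat("Mean accuracy, selection on all data: ", mean(acc_outside), "\n")
```

Because the data are noise, the fold-wise estimate should hover around chance (0.5), while selecting on all rows before splitting tends to give an optimistic figure. The exact numbers depend on the seed.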

Creating a Feature Selector Object

To create a feature selector in rMVPA, use the feature_selector() function. For example, to construct a feature selector using the FTest method with a top_k cutoff (selecting the top 10 features):

suppressPackageStartupMessages(library(rMVPA))
# Create a feature selector using FTest with top_k cutoff (select top 10 features)
fsel <- feature_selector(method = "FTest", cutoff_type = "top_k", cutoff_value = 10)
fsel

## Feature Selector Object
## -----------------------
## Method:         FTest
## Cutoff Type:    top_k
## Cutoff Value:   10

Similarly, you can create a feature selector that selects a proportion of features using the top_p option. In the example below, we select the top 10% of features based on the FTest ranking:

# Create a feature selector using FTest with top_p cutoff (select top 10% of features)
fsel <- feature_selector(method = "FTest", cutoff_type = "top_p", cutoff_value = 0.1)
fsel

## Feature Selector Object
## -----------------------
## Method:         FTest
## Cutoff Type:    top_p
## Cutoff Value:   0.1

Applying Feature Selection to Data

The select_features() function applies a feature selector to a feature matrix X (samples in rows, features in columns) and a response variable Y. It returns a logical vector with one element per feature: TRUE for selected features and FALSE otherwise.

Below is an example using simulated data:

# Simulate a response variable (categorical)
Y <- factor(rep(letters[1:4], each = 25))

# Simulate a feature matrix with 100 samples and 100 features
# (seeded so the simulated values are reproducible)
set.seed(123)
X <- matrix(rnorm(100 * 100), nrow = 100, ncol = 100)

# Apply feature selection using the FTest method with top_k cutoff
fsel <- feature_selector(method = "FTest", cutoff_type = "top_k", cutoff_value = 10)
selected_features <- select_features(fsel, X, Y)

# The number of selected features should be equal to the cutoff value (10)
cat("Number of selected features (top_k):", sum(selected_features), "\n")

## Number of selected features (top_k): 10

The top_p option selects a proportion of the features instead of a fixed count. With a cutoff value of 0.1, the top 10% of features are kept:

# Apply feature selection using the FTest method with top_p cutoff (select top 10% of features)
fsel <- feature_selector(method = "FTest", cutoff_type = "top_p", cutoff_value = 0.1)
selected_features <- select_features(fsel, X, Y)

# Calculate the proportion of features selected
the_proportion <- sum(selected_features) / ncol(X)
cat("Proportion of features selected (top_p):", the_proportion, "\n")

## Proportion of features selected (top_p): 0.1
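
The two cutoffs can also be written out by hand, which shows what the logical mask represents and how it is used to subset the data. The following is a base-R sketch under stated assumptions: top_mask is a hypothetical helper, the F statistic comes from stats::oneway.test rather than from rMVPA, and the rounding rule used for top_p is a guess, not the package's documented behavior.

```r
set.seed(42)

# Same shape as the example above: 4 classes, 100 samples, 100 features
Y <- factor(rep(letters[1:4], each = 25))
X <- matrix(rnorm(100 * 100), nrow = 100, ncol = 100)

# Classic one-way ANOVA F statistic for each feature (column)
f_stat <- apply(X, 2, function(x) {
  unname(oneway.test(x ~ Y, var.equal = TRUE)$statistic)
})

# Hypothetical helper: convert scores plus a cutoff into a logical mask
top_mask <- function(scores, cutoff_type = c("top_k", "top_p"), cutoff_value) {
  cutoff_type <- match.arg(cutoff_type)
  n_keep <- if (cutoff_type == "top_k") {
    cutoff_value
  } else {
    round(cutoff_value * length(scores))   # rounding rule is an assumption
  }
  n_keep <- max(1, min(length(scores), n_keep))
  rank(-scores, ties.method = "first") <= n_keep
}

mask_k <- top_mask(f_stat, "top_k", 10)
mask_p <- top_mask(f_stat, "top_p", 0.1)

# A logical mask subsets columns directly; drop = FALSE keeps a matrix
X_reduced <- X[, mask_k, drop = FALSE]
dim(X_reduced)

# Column indices of the selected features
which(mask_k)
```

With 100 features, top_k = 10 and top_p = 0.1 keep the same ten columns. The two cutoffs diverge only when the number of features changes, which is the usual reason to prefer top_p when region or searchlight sizes vary.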

Using the catscore Method

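Alternatively, you can rank features with the catscore method, which computes a correlation-adjusted t-score for each feature. In the example below, the extra ranking.score argument passed to select_features() chooses how the per-class scores are summarized into a single ranking value per feature ("entropy" here):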

# Create a feature selector using catscore with top_k cutoff (select top 10 features)
fsel <- feature_selector(method = "catscore", cutoff_type = "top_k", cutoff_value = 10)

# Simulate a response variable and feature matrix
Y <- factor(rep(letters[1:3], length.out = 90))
X <- matrix(rnorm(90 * 50), nrow = 90, ncol = 50)

# Apply feature selection using catscore
selected_features <- select_features(fsel, X, Y, ranking.score = "entropy")

cat("Number of features selected using catscore (top_k):", sum(selected_features), "\n")

## Number of features selected using catscore (top_k): 10
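
For readers who want to see what "correlation-adjusted" means, the multi-class cat score is usually defined as below. This is our reading of the definition behind the sda package, stated here as an assumption; rMVPA's exact computation (including any shrinkage estimation of the quantities involved) is not shown in this vignette.

```latex
\tau_k^{\mathrm{adj}} = P^{-1/2}\,\tau_k,
\qquad
\tau_k = \left(\frac{1}{n_k} - \frac{1}{n}\right)^{-1/2} V^{-1/2}\,\bigl(\mu_k - \mu_{\mathrm{pool}}\bigr)
```

Here \mu_k is the centroid of class k, \mu_{\mathrm{pool}} is the pooled mean, n_k and n are the class and total sample sizes, V is the diagonal matrix of feature variances, and P is the correlation matrix among features. When features are uncorrelated (P is the identity), the cat score reduces to the ordinary t-score of each class centroid against the pooled mean.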

Summary

Feature selection reduces dimensionality in high-dimensional datasets such as those common in neuroimaging. In rMVPA, running selection inside the cross-validation workflow keeps held-out data out of the selection step, so performance estimates are not inflated by selection bias. Choose a ranking method (FTest or catscore) and a cutoff strategy (top_k or top_p) to suit your analysis.

Bradley Buchsbaum

2025-09-28
