Introduction to the multivarious Package • multivarious

Introduction

The multivarious package provides generic functions and some basic implementations for dimensionality reduction of high-dimensional data. This vignette focuses on two main classes in the package, projector and bi_projector, and demonstrates how to use the project function for projecting new data onto a lower-dimensional subspace.

Projector and Bi-projector Classes

projector and bi_projector are two core classes in the multivarious package. They represent linear transformations from a high-dimensional space to a lower-dimensional space.

Projector

A projector instance maps a matrix from an \(N\)-dimensional space to a \(d\)-dimensional space, where \(d\) may be less than \(N\). The projection matrix, \(V\), is not necessarily orthogonal. This class can be used for various dimensionality reduction techniques like PCA, LDA, etc.

Bi-projector

A bi_projector instance offers a two-way mapping from samples (rows) to scores and from variables (columns) to components. This allows projecting from a \(D\)-dimensional input space to a \(d\)-dimensional subspace, and projecting from an \(n\)-dimensional variable space to the \(d\)-dimensional component space. The singular value decomposition (SVD) is a canonical example of such a two-way mapping.

The Project Function

The project function is a generic function that takes a model fit (typically an object of class bi_projector or any other class that implements a project method) and new observations. It projects these observations onto the subspace defined by the model. This enables the transformation of new data into the same lower-dimensional space as the original data. Mathematically, projection consists of the following:

\[ X \approx USV^T \]

\[ \text{projected_data} = \text{new_data} \cdot V \]

Example

In this example, we will demonstrate how to create a bi_projector object using the results of an SVD and project new data onto the same subspace as the original data.


# Load the multivarious package
library(multivarious)
#> 
#> Attaching package: 'multivarious'
#> The following object is masked from 'package:stats':
#> 
#>     residuals
#> The following object is masked from 'package:base':
#> 
#>     truncate

# Create a synthetic dataset
set.seed(42)
X <- matrix(rnorm(200), 10, 20)

# Perform SVD on the dataset
svdfit <- svd(X)

# Create a bi_projector object
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev = svdfit$d)

# Generate new data to project onto the same subspace as the original data
new_data <- matrix(rnorm(5 * 20), 5, 20)

projected_data <- project(p, new_data)
print(projected_data)
#>            [,1]       [,2]       [,3]       [,4]         [,5]       [,6]
#> [1,] -0.7864046 -0.4800612 -0.8865129 -0.2262336  0.678323845  2.3448302
#> [2,] -0.1325076  1.3903017 -1.5064698  1.2550899  0.673332975  0.5718498
#> [3,]  0.2864814 -0.2701905 -0.8181929 -0.8490030  0.006430116 -1.2933359
#> [4,]  1.1311516  0.2891873 -1.0273319 -1.7338016  2.030170740 -2.8299146
#> [5,] -0.8503584  0.1281519 -0.3367642  1.1956792 -0.315282938  1.2275375
#>              [,7]       [,8]       [,9]      [,10]
#> [1,] -0.985947936 -0.3905921 -0.8155574  1.2573170
#> [2,] -0.006895692  1.5533483 -0.9853844  0.1021079
#> [3,] -1.018713344 -0.8993998  1.1744588  1.1175886
#> [4,] -0.520309205  0.8700092  0.5702875  0.1959549
#> [5,]  1.078096512  0.1897177 -0.9757569 -1.1833003

In the multivarious package, the bi_projector class allows you to project new variables into the subspace defined by the model. The project_vars function is a generic function that operates on an object of a class implementing the project_vars method, such as a bi_projector object. This function projects one or more variables onto a subspace, which can be computed for a biorthogonal decomposition like Singular Value Decomposition (SVD).

Remember, given an original data matrix \(X\), the SVD of \(X\) can be written as:

\[ X \approx USV^T \]

Where \(U\) contains the left singular vectors (scores), \(S\) is a diagonal matrix containing the singular values, and \(V^T\) contains the right singular vectors (components). When we have new variables (columns) that we want to project into the same subspace as the original data, we can use the project_vars function.

Projecting New Variables onto the Subspace

Let’s say we have a new data matrix new_data with the same number of rows as the original data. To project these new variables into the subspace, we can compute:

= U^T

The result is a matrix or vector of the projected variables in the subspace.

Here’s an example of how you can use the svd_wrapper function in the multivarious package with the iris dataset to compute the SVD and project new variables into the subspace.

First, let’s load the iris dataset and compute the SVD using the svd_wrapper function:

# Load iris dataset and select the first four columns
data(iris)
X <- iris[, 1:4]

# Compute SVD using the base method and 3 components
fit <- svd_wrapper(X, ncomp = 3, preproc = center(), method = "base")

Now, let’s assume we have a new data matrix new_data with the same number of rows as the original data. To project these new variables into the subspace, we can use the project_vars function:

# Define new_data
new_data <- rnorm(nrow(iris))

# Project the new variables into the subspace
projected_vars <- project_vars(fit, new_data)

This example demonstrates how to compute the SVD using the svd_wrapper function and project new variables into the subspace defined by the SVD using the project_vars function.