Introduction to the multivarious Package
Introduction
The multivarious package provides generic functions and some basic implementations for dimensionality reduction of high-dimensional data. This vignette focuses on two main classes in the package, projector and bi_projector, and demonstrates how to use the project function for projecting new data onto a lower-dimensional subspace.
Projector and Bi-projector Classes
projector and bi_projector are two core classes in the multivarious package. They represent linear transformations from a high-dimensional space to a lower-dimensional space.
Projector
A projector instance maps the rows of a matrix from a \(D\)-dimensional space to a \(d\)-dimensional space, where \(d\) may be less than \(D\). The projection matrix, \(V\), is not necessarily orthogonal. This class can serve as the basis for various dimensionality reduction techniques, such as PCA and LDA.
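Because the projection matrix need not be orthogonal, mapping scores back to the input space generally calls for a pseudo-inverse rather than a plain transpose. The following base-R sketch illustrates the idea with a hypothetical random matrix `V`. It does not use the package's own classes, and the names `V`, `V_pinv`, and `X_hat` are illustrative assumptions only.

```r
set.seed(1)
X <- matrix(rnorm(5 * 4), nrow = 5, ncol = 4)  # 5 samples, 4 input dimensions
V <- matrix(rnorm(4 * 2), nrow = 4, ncol = 2)  # hypothetical, non-orthogonal projection matrix

# Forward projection: each row of X is mapped to 2 dimensions
scores <- X %*% V                              # 5 x 2

# Pseudo-inverse of V, valid when V has full column rank: (V'V)^{-1} V'
V_pinv <- solve(crossprod(V), t(V))            # 2 x 4

# Map the scores back to the input space; this recovers the part of X
# that lies in the column space of V, not X itself
X_hat <- scores %*% V_pinv                     # 5 x 4

# Projecting the reconstruction again reproduces the same scores
stopifnot(isTRUE(all.equal(X_hat %*% V, scores)))
```

When `V` has orthonormal columns, `V_pinv` reduces to `t(V)`, which is why the transpose suffices in the PCA and SVD cases.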
Bi-projector
A bi_projector instance offers a two-way mapping from samples (rows) to scores and from variables (columns) to components. This allows projecting from a \(D\)-dimensional input space to a \(d\)-dimensional subspace, and projecting from an \(n\)-dimensional variable space to the \(d\)-dimensional component space. The singular value decomposition (SVD) is a canonical example of such a two-way mapping.
The Project Function
The project function is a generic function that takes a model fit (typically an object of class bi_projector or any other class that implements a project method) and new observations. It projects these observations onto the subspace defined by the model. This enables the transformation of new data into the same lower-dimensional space as the original data. Mathematically, projection consists of the following:
\[ X \approx USV^T \]
\[ \text{projected_data} = \text{new_data} \cdot V \]
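A minimal base-R sketch of these two formulas may help. It calls `svd()` directly instead of the package classes, and it assumes we keep only the first `k` right singular vectors (`k = 3` is an arbitrary choice for illustration).

```r
set.seed(42)
X <- matrix(rnorm(200), 10, 20)           # 10 samples, 20 variables
sv <- svd(X)

k <- 3                                    # number of components to keep (arbitrary)
V_k <- sv$v[, 1:k, drop = FALSE]          # 20 x 3 basis of the subspace

# Project new observations onto the subspace: new_data %*% V
new_data <- matrix(rnorm(5 * 20), 5, 20)
projected <- new_data %*% V_k             # 5 x 3

# Projecting the original data reproduces its scores U S
train_scores <- X %*% V_k
stopifnot(isTRUE(all.equal(train_scores, sv$u[, 1:k] %*% diag(sv$d[1:k]))))
```

The final check follows from \(V^T V = I\): multiplying \(X = USV^T\) by \(V\) on the right leaves \(US\).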
Example
In this example, we will demonstrate how to create a bi_projector object using the results of an SVD and project new data onto the same subspace as the original data.
# Load the multivarious package
library(multivarious)
#>
#> Attaching package: 'multivarious'
#> The following object is masked from 'package:stats':
#>
#> residuals
#> The following object is masked from 'package:base':
#>
#> truncate
# Create a synthetic dataset
set.seed(42)
X <- matrix(rnorm(200), 10, 20)
# Perform SVD on the dataset
svdfit <- svd(X)
# Create a bi_projector object
p <- bi_projector(svdfit$v, s = svdfit$u %*% diag(svdfit$d), sdev = svdfit$d)
# Generate new data to project onto the same subspace as the original data
new_data <- matrix(rnorm(5 * 20), 5, 20)
projected_data <- project(p, new_data)
print(projected_data)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.7864046 -0.4800612 -0.8865129 -0.2262336 0.678323845 2.3448302
#> [2,] -0.1325076 1.3903017 -1.5064698 1.2550899 0.673332975 0.5718498
#> [3,] 0.2864814 -0.2701905 -0.8181929 -0.8490030 0.006430116 -1.2933359
#> [4,] 1.1311516 0.2891873 -1.0273319 -1.7338016 2.030170740 -2.8299146
#> [5,] -0.8503584 0.1281519 -0.3367642 1.1956792 -0.315282938 1.2275375
#> [,7] [,8] [,9] [,10]
#> [1,] -0.985947936 -0.3905921 -0.8155574 1.2573170
#> [2,] -0.006895692 1.5533483 -0.9853844 0.1021079
#> [3,] -1.018713344 -0.8993998 1.1744588 1.1175886
#> [4,] -0.520309205 0.8700092 0.5702875 0.1959549
#> [5,] 1.078096512 0.1897177 -0.9757569 -1.1833003
In the multivarious package, the bi_projector class also allows you to project new variables into the subspace defined by the model. The project_vars function is a generic function that operates on an object of a class implementing the project_vars method, such as a bi_projector object. This function projects one or more variables onto a subspace, which can be computed for a biorthogonal decomposition such as the singular value decomposition (SVD).
Remember, given an original data matrix \(X\), the SVD of \(X\) can be written as:
\[ X \approx USV^T \]
where \(U\) contains the left singular vectors (scores), \(S\) is a diagonal matrix containing the singular values, and \(V^T\) contains the right singular vectors (components). When we have new variables (columns) that we want to project into the same subspace as the original data, we can use the project_vars function.
Projecting New Variables onto the Subspace
Let’s say we have a new data matrix new_data with the same number of rows as the original data. To project these new variables into the subspace, we can compute:
\[ \text{projected_vars} = U^T \cdot \text{new_data} \]
The result is a matrix or vector of the projected variables in the subspace.
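In base R, the variable projection described above amounts to a single cross-product with \(U\). The sketch below uses `svd()` directly. A fitted bi_projector may apply additional scaling (for example, by the singular values), so these numbers should not be assumed to match what project_vars returns.

```r
set.seed(42)
X <- matrix(rnorm(200), 10, 20)            # 10 samples, 20 variables
sv <- svd(X)
U <- sv$u                                  # 10 x 10 left singular vectors

# Two new variables, each with one value per row of X
new_vars <- matrix(rnorm(10 * 2), 10, 2)

# t(U) %*% new_vars: one row per component, one column per new variable
projected_vars <- crossprod(U, new_vars)   # 10 x 2

# Sanity check: the original columns project to S V^T
orig_vars <- crossprod(U, X)               # 10 x 20
stopifnot(isTRUE(all.equal(orig_vars, diag(sv$d) %*% t(sv$v))))
```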
Here’s an example of how you can use the svd_wrapper function in the multivarious package with the iris dataset to compute the SVD and project new variables into the subspace.
First, let’s load the iris dataset and compute the SVD using the svd_wrapper function:
# Load iris dataset and select the first four columns
data(iris)
X <- iris[, 1:4]
# Compute SVD using the base method and 3 components
fit <- svd_wrapper(X, ncomp = 3, preproc = center(), method = "base")
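For intuition, a centered, three-component SVD can be sketched in base R as follows. Whether svd_wrapper performs exactly these steps internally is an assumption, and the names `mu`, `comps`, and `new_obs` are hypothetical. The sketch also shows why centering matters for new observations: they must be shifted by the training column means before being multiplied by the components.

```r
X_mat <- as.matrix(iris[, 1:4])            # 150 x 4 numeric matrix
mu <- colMeans(X_mat)                      # training column means
Xc <- sweep(X_mat, 2, mu)                  # center each column

sv <- svd(Xc)
k <- 3
comps <- sv$v[, 1:k]                       # 4 x 3 components
scores <- sv$u[, 1:k] %*% diag(sv$d[1:k])  # 150 x 3 scores

# A hypothetical new flower: center with the training means, then project
new_obs <- c(5.0, 3.5, 1.4, 0.2)
new_scores <- (new_obs - mu) %*% comps     # 1 x 3

# The centered training data projects onto its own scores
stopifnot(isTRUE(all.equal(Xc %*% comps, scores, check.attributes = FALSE)))
```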
Now, let’s assume we have a new data matrix new_data with the same number of rows as the original data. To project these new variables into the subspace, we can use the project_vars function:
# Define new_data: a single new variable with one value per row of iris
new_data <- rnorm(nrow(iris))
# Project the new variables into the subspace
projected_vars <- project_vars(fit, new_data)
This example demonstrates how to compute the SVD using the svd_wrapper function and project new variables into the subspace defined by the SVD using the project_vars function.