GPU-native truncated SVD via Lanczos bidiagonalization

Implements Golub-Kahan Lanczos bidiagonalization directly in ArrayFire C, keeping all matvecs and CGS2 reorthogonalization on the GPU. Only 2*work scalars and the final basis matrices cross PCIe per restart; no per-step host transfers.
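For orientation, the recurrence described above can be sketched on the CPU in base R. This is a minimal reference sketch, not the package's GPU code: gk_bidiag is a hypothetical helper name, restarting is omitted, and the sketch assumes that the per-step scalars are the diagonal (alpha) and superdiagonal (beta) entries of the bidiagonal matrix, which would account for the 2*work scalars mentioned above.

```r
# CPU reference sketch of Golub-Kahan Lanczos bidiagonalization with CGS2.
# gk_bidiag is a hypothetical name used for illustration only.
gk_bidiag <- function(A, k, v0 = NULL) {
  m <- nrow(A); n <- ncol(A)
  U <- matrix(0, m, k)   # left Lanczos vectors
  V <- matrix(0, n, k)   # right Lanczos vectors
  alpha <- numeric(k)    # diagonal of the bidiagonal matrix
  beta <- numeric(k)     # beta[j] couples step j to step j + 1

  v <- if (is.null(v0)) rnorm(n) else v0
  V[, 1] <- v / sqrt(sum(v^2))

  for (j in seq_len(k)) {
    # Left vector: u = A v_j - beta_{j-1} u_{j-1}
    u <- drop(A %*% V[, j])
    if (j > 1) {
      u <- u - beta[j - 1] * U[, j - 1]
      Uj <- U[, seq_len(j - 1), drop = FALSE]
      # CGS2: two passes of blocked classical Gram-Schmidt
      for (pass in 1:2) u <- u - drop(Uj %*% crossprod(Uj, u))
    }
    alpha[j] <- sqrt(sum(u^2))
    U[, j] <- u / alpha[j]

    # Right vector: w = A^T u_j - alpha_j v_j
    w <- drop(crossprod(A, U[, j])) - alpha[j] * V[, j]
    Vj <- V[, seq_len(j), drop = FALSE]
    for (pass in 1:2) w <- w - drop(Vj %*% crossprod(Vj, w))
    beta[j] <- sqrt(sum(w^2))
    if (j < k) V[, j + 1] <- w / beta[j]
  }
  list(U = U, V = V, alpha = alpha, beta = beta)
}
```

With B the upper bidiagonal matrix that has alpha on its diagonal and beta[1:(k - 1)] on its superdiagonal, the sketch satisfies A %*% V = U %*% B up to rounding, and the singular values of B approximate the largest singular values of A.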

Usage

irlba_native(
  A,
  nv = 5L,
  nu = nv,
  tol = sqrt(.Machine$double.eps),
  maxit = 100L,
  work = max(nv + 20L, 3L * nv),
  v0 = NULL,
  mode = "fast",
  backend = NULL
)

Arguments

A: A matrix or adgeMatrix. Inputs that are not already an adgeMatrix are coerced with adgeMatrix().
nv: Number of singular values/vectors to compute.
nu: Number of left singular vectors (default = nv).
tol: Convergence tolerance.
maxit: Maximum number of restarts.
work: Size of the Lanczos subspace per restart. Larger values converge in fewer restarts at the cost of more memory and work per restart. Default is max(nv + 20L, 3L * nv).
v0: Optional starting vector (length ncol(A)).
mode, backend: Passed to adgeMatrix() when coercing.
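To see how the default for work scales with nv, the expression from the signature in Usage can be evaluated on its own. The sketch below is plain base R; default_work is a hypothetical helper name that only copies that expression.

```r
# Default Lanczos subspace size, copied from the signature in Usage.
# default_work is a hypothetical name, not a package function.
default_work <- function(nv) max(nv + 20L, 3L * nv)

nvs <- c(1L, 5L, 10L, 20L, 50L)
data.frame(nv = nvs, work = vapply(nvs, default_work, integer(1)))
```

The nv + 20 term sets the size up to nv = 10, where both terms equal 30; beyond that the 3 * nv term takes over, so nv = 20 gives work = 60 and nv = 50 gives work = 150.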

Value

A list with components d, u, v, iter, mprod, compatible with irlba::irlba().
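The sketch below illustrates how a result of this shape is typically consumed, assuming the components carry the same meaning as in irlba::irlba(): d holds singular values, u and v hold the left and right singular vectors. Since the function itself needs a GPU backend, the list here is a stand-in built with base svd(); the iter and mprod entries are placeholders, not real counts.

```r
set.seed(42)
A <- matrix(rnorm(200 * 50), 200, 50)
nv <- 5L

# Stand-in for the returned list, built on the CPU with base svd().
s <- svd(A, nu = nv, nv = nv)
res <- list(d = s$d[seq_len(nv)], u = s$u, v = s$v,
            iter = NA_integer_, mprod = NA_integer_)

# Rank-nv reconstruction: u %*% diag(d) %*% t(v), without forming diag(d).
A_k <- res$u %*% (res$d * t(res$v))

# Frobenius error of the truncation; it should match the norm of the
# discarded singular values.
err <- sqrt(sum((A - A_k)^2))
```

Code written against the d, u and v components of irlba::irlba() should work unchanged on a list of this shape.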

Details

Compared to am_irlba, which routes each Lanczos matvec through S4 dispatch, this function:

eliminates S4 overhead on the hot path
replaces k sequential GEMVs for reorthogonalization with one GEMM
uploads A once and never re-uploads it across restarts
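The reorthogonalization point can be illustrated in base R on the CPU. The sketch below compares a column-by-column projection loop with the blocked form, where all coefficients are computed in one product with the basis and removed in a second. It only demonstrates that the two agree up to rounding for an orthonormal basis; how the package batches these products on the GPU is not shown in this page, and the sizes n and k are arbitrary.

```r
set.seed(7)
n <- 500; k <- 20
V <- qr.Q(qr(matrix(rnorm(n * k), n, k)))  # orthonormal basis, n x k
w <- rnorm(n)

# Column by column: one dot product and one update per basis vector.
w_loop <- w
for (i in seq_len(k)) {
  w_loop <- w_loop - sum(V[, i] * w_loop) * V[, i]
}

# Blocked: all k coefficients from one product, then one update.
w_block <- w - drop(V %*% crossprod(V, w))
# Second pass, as in CGS2, to clean up rounding error.
w_block <- w_block - drop(V %*% crossprod(V, w_block))

max(abs(w_loop - w_block))        # difference between the two variants
max(abs(crossprod(V, w_block)))   # remaining component along the basis
```

Both printed values should be at the level of rounding error. The blocked form replaces k dependent steps with two large products, which is the shape of work a GPU handles well.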
