Implements Golub-Kahan-Lanczos bidiagonalization directly in ArrayFire C, keeping all matvecs and CGS2 reorthogonalization on the GPU. Per restart, only 2*work scalars and the final basis matrices cross PCIe; there are no per-step host transfers.
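The recurrence can be sketched in base R. This version runs on the CPU, not through ArrayFire, and it assumes the standard upper-bidiagonal Golub-Kahan recurrence with CGS2 (classical Gram-Schmidt applied twice). gkl_bidiag() and cgs2() are hypothetical helpers, not functions from this package, and the sketch does no restarting and no breakdown handling. The alpha and beta vectors it returns hold 2*work scalars, which are presumably the per-restart values referred to above.

```r
# Hypothetical helper: two passes of classical Gram-Schmidt (CGS2). Each pass
# projects w against all columns of Q with one crossprod() and one %*%.
cgs2 <- function(w, Q) {
  for (pass in 1:2) {
    w <- w - Q %*% crossprod(Q, w)
  }
  w
}

# Hypothetical helper: Golub-Kahan-Lanczos bidiagonalization so that
# A %*% V = U %*% B, with B upper bidiagonal (diagonal alpha,
# superdiagonal beta[1:(work - 1)]). No restarts, no breakdown checks.
gkl_bidiag <- function(A, work, v0 = rnorm(ncol(A))) {
  V <- matrix(0, ncol(A), work)
  U <- matrix(0, nrow(A), work)
  alpha <- numeric(work)
  beta <- numeric(work)
  V[, 1] <- v0 / sqrt(sum(v0^2))
  p <- A %*% V[, 1]
  alpha[1] <- sqrt(sum(p^2))
  U[, 1] <- p / alpha[1]
  for (j in seq_len(work)) {
    # Right-side step: r = t(A) u_j - alpha_j v_j, reorthogonalized against V
    r <- crossprod(A, U[, j]) - alpha[j] * V[, j]
    r <- cgs2(r, V[, 1:j, drop = FALSE])
    beta[j] <- sqrt(sum(r^2))  # beta[work] is the residual norm
    if (j == work) break
    V[, j + 1] <- r / beta[j]
    # Left-side step: p = A v_{j+1} - beta_j u_j, reorthogonalized against U
    p <- A %*% V[, j + 1] - beta[j] * U[, j]
    p <- cgs2(p, U[, 1:j, drop = FALSE])
    alpha[j + 1] <- sqrt(sum(p^2))
    U[, j + 1] <- p / alpha[j + 1]
  }
  list(U = U, V = V, alpha = alpha, beta = beta)
}

# Usage: with work = ncol(A), the small bidiagonal B has A's singular values.
set.seed(1)
A <- matrix(rnorm(8 * 5), 8, 5)
res <- gkl_bidiag(A, work = 5)
B <- diag(res$alpha, 5)
B[cbind(1:4, 2:5)] <- res$beta[1:4]
print(rbind(svd(B)$d, svd(A)$d))
```

In a restarted solver, work is much smaller than ncol(A). Only the small matrix B built from alpha and beta then needs an SVD, and its leading singular values approximate those of A.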
Arguments
- A
A matrix or adgeMatrix. Coerced if necessary.
- nv
Number of singular values/vectors to compute.
- nu
Number of left singular vectors (default = nv).
- tol
Convergence tolerance.
- maxit
Maximum number of restarts.
- work
Size of the Lanczos subspace per restart. Larger values converge in fewer restarts at the cost of more memory and computation per restart. Default is max(nv + 20L, 3L * nv).
- v0
Optional starting vector (length ncol(A)).
- mode, backend
Passed to adgeMatrix() when coercing.
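The default for work can be checked by hand. Below is a minimal sketch of a hypothetical helper, default_work(), which the package does not export. It reproduces the documented formula and shows where the two terms cross over.

```r
# Hypothetical helper mirroring the documented default for `work`.
default_work <- function(nv) max(nv + 20L, 3L * nv)

default_work(5L)   # nv + 20 dominates for small nv
default_work(10L)  # crossover: both terms equal 30
default_work(15L)  # 3 * nv dominates once nv > 10
```

For small nv, the subspace always has at least 20 extra Lanczos vectors beyond the requested count. For larger nv, it grows to three times nv.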
Details
Compared with am_irlba, which routes each Lanczos matvec through S4 dispatch, this function:
- eliminates S4 dispatch overhead on the hot path;
- replaces k sequential GEMVs for reorthogonalization with a single GEMM;
- uploads A to the GPU once and never re-uploads it across restarts.
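The reorthogonalization change can be illustrated in base R on the CPU. This is a sketch, not the package's GPU code. It assumes an n x k basis Q with orthonormal columns and a single vector w (all names are hypothetical). It shows that projecting w against the k basis columns one at a time gives the same result, up to rounding, as one blocked projection.

```r
set.seed(42)
n <- 1000
k <- 30
Q <- qr.Q(qr(matrix(rnorm(n * k), n, k)))  # n x k, orthonormal columns
w <- rnorm(n)

# Sequential: k separate dot-product-and-update steps, one per basis column
w_seq <- w
for (i in seq_len(k)) {
  w_seq <- w_seq - Q[, i] * sum(Q[, i] * w_seq)
}

# Blocked: the same projection as one crossprod() plus one %*%
h <- crossprod(Q, w)         # k x 1 coefficients, t(Q) %*% w
w_blk <- w - drop(Q %*% h)   # remove the component of w in span(Q)

print(max(abs(w_seq - w_blk)))  # difference is at rounding level
```

The sequential loop is modified Gram-Schmidt, and the blocked form is classical Gram-Schmidt. They agree in exact arithmetic when Q is orthonormal. A single classical pass can lose orthogonality in floating point, which is why CGS2 repeats the blocked projection a second time. On a GPU, the blocked form turns each pass into one large device call instead of k small ones.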