Computes the row sums of the matrix stored in a resident_handle. When the
backend supports it, a GPU-resident reduction is used, which avoids a
round-trip download. Otherwise, falls back to base::rowSums on the
materialized matrix.
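
The dispatch-with-fallback behaviour described above can be illustrated with a minimal sketch in plain R. The handle layout (a list with `backend` and `data` fields), the `resident_row_sums` hook, and the function name `row_sums_sketch` are all hypothetical, chosen only for illustration; the actual resident_handle structure and backend interface may differ.

```r
# Hypothetical sketch: handle fields and the backend hook name are assumptions,
# not the package's real API.
row_sums_sketch <- function(handle) {
  reduce <- handle$backend$resident_row_sums  # hypothetical backend hook
  if (is.function(reduce)) {
    # The backend offers a resident reduction: only the result vector is returned.
    return(reduce(handle$data))
  }
  # No resident reduction available: materialize the matrix, then sum its rows.
  base::rowSums(as.matrix(handle$data))
}

# Stand-in handle whose backend has no resident reduction, so the fallback runs.
h <- list(backend = list(), data = matrix(1:6, nrow = 2))
row_sums_sketch(h)
```

Checking for the hook at call time keeps the fallback in one place, so a backend without a resident reduction needs no special handling by the caller.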