This vignette provides guidance on selecting between kNN and Sinkhorn mappers based on your analysis needs, demonstrates how to implement custom transport methods, and shows how to leverage warm starts to improve computational performance.
Mapper Factory Recap
library(dkge)
dkge_mapper("knn", k = 8, sigx = 3)
#> $method
#> [1] "knn"
#>
#> $pars
#> $pars$k
#> [1] 8
#>
#> $pars$sigx
#> [1] 3
#>
#>
#> attr(,"class")
#> [1] "dkge_mapper_knn" "dkge_mapper"
dkge_mapper("sinkhorn", epsilon = 0.05, lambda_xyz = 1, lambda_feat = 0)
#> $method
#> [1] "sinkhorn"
#>
#> $pars
#> $pars$epsilon
#> [1] 0.05
#>
#> $pars$lambda_xyz
#> [1] 1
#>
#> $pars$lambda_feat
#> [1] 0
#>
#>
#> attr(,"class")
#> [1] "dkge_mapper_sinkhorn" "dkge_mapper"The dkge_mapper() function creates a mapper
configuration that stores the backend parameters for your chosen
transport method. When you call fit_mapper() on this
configuration, it produces subject-specific transport plans that include
cached computational state, enabling efficient reuse across multiple
mapping operations.
Speed Comparison
S <- 3; q <- 3; P <- 200
centroids <- replicate(S, matrix(rnorm(P * 3), P, 3), simplify = FALSE)
anchors <- matrix(rnorm(1500 * 3), 1500, 3)
values <- lapply(seq_len(S), function(s) rnorm(P))
knn_mapper <- dkge_mapper("knn", k = 12, sigx = 5)
knn_fit <- lapply(seq_len(S), function(s) fit_mapper(knn_mapper, centroids[[s]], anchors))
sink_mapper <- dkge_mapper("sinkhorn", epsilon = 0.02, lambda_xyz = 1)
sink_fit <- lapply(seq_len(S), function(s) fit_mapper(sink_mapper, centroids[[s]], anchors))
microbenchmark::microbenchmark(
knn = lapply(seq_len(S), function(s) apply_mapper(knn_fit[[s]], values[[s]])),
sinkhorn = lapply(seq_len(S), function(s) apply_mapper(sink_fit[[s]], values[[s]])),
times = 10L
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> knn 331 336 569 379 433 2297 10
#> sinkhorn 5173 5245 7928 8314 9366 10799 10The performance characteristics of these two mapping approaches
differ significantly. kNN mapping operates through purely local
neighborhoods and scales linearly with the number of neighbors
k, making it computationally efficient for most
applications. In contrast, Sinkhorn mapping offers greater
expressiveness by supporting feature-based costs and soft matching
between points, but this flexibility comes with increased computational
overhead. However, the optimized C++ solver implementation with warm
start capabilities makes repeated Sinkhorn calls much more tractable for
iterative analyses.
Warm Starts and Dual Caching
The example below calls dkge:::sinkhorn_plan_cpp(), an
internal C++ function. It is shown here for
illustration only — the DKGE pipeline manages warm starts automatically
through its renderer objects. Do not rely on this function’s signature
in production code; use fit_mapper() /
apply_mapper() instead.
C <- as.matrix(dist(anchors[1:200, ]))
mu <- rep(1/nrow(C), nrow(C))
nu <- rep(1/ncol(C), ncol(C))
# Internal C++ entry point — exposed here for illustration only
res1 <- dkge:::sinkhorn_plan_cpp(C, mu, nu, epsilon = 0.05, max_iter = 400, tol = 1e-7)
res2 <- dkge:::sinkhorn_plan_cpp(C, mu, nu, epsilon = 0.05, max_iter = 400, tol = 1e-7,
log_u_init = res1$log_u, log_v_init = res1$log_v)
res1$iterations
res2$iterationsThe second optimization run converges significantly faster because it
leverages the saved dual variables from the previous computation. This
warm start mechanism is automatically handled by the DKGE rendering
pipeline, which stores these dual variables in the renderer object for
subsequent use. When working directly with the lower-level
sinkhorn_plan_cpp() function for experimentation, you can
manually provide these initialization values to achieve similar
performance gains.
Adding Custom Mappers
To integrate a custom mapping approach into the DKGE framework, you
need to implement two key S3 methods that follow the established naming
convention recognized by dkge_mapper().
fit_mapper.dkge_mapper_mytransport <- function(mapper, subj_points, anchor_points, ...) {
# return object with class 'dkge_mapper_fit_mytransport'
}
apply_mapper.dkge_mapper_fit_mytransport <- function(fitted_mapper, values, ...) {
# return length(anchor_points) vector
}As long as you maintain consistent output dimensions that match the expected anchor point structure, DKGE will seamlessly integrate your custom mapper into the existing analysis pipeline without requiring additional configuration.
Practical Guidance
The choice between mapping methods should be guided by your specific analysis requirements and data characteristics. For quick exploratory analyses or situations where subject clusters already show good anatomical alignment, kNN mapping provides an efficient and straightforward solution.
However, when dealing with substantial anatomical shifts between
subjects or when latent features should guide the matching process,
Sinkhorn mapping becomes the preferred choice. In these cases, you can
tune the epsilon parameter upward to achieve faster
convergence and smoother transport plans, though this comes with a
trade-off in matching precision.
To maximize computational efficiency across multiple analyses, make sure to reuse renderer objects when performing bootstraps or computing contrasts. This practice allows you to exploit the cached dual variables and avoid the computational overhead of recomputing transport plans from scratch.