Calculate the similarity matrix for a set of factors in a data frame using various similarity methods.
Usage
factor_sim(des, method = c("Jaccard", "Rogers", "simple matching", "Dice"))Arguments
- des
A data frame containing factors for which the similarity matrix will be computed.
- method
A character vector specifying the method used for computing the similarity. The available methods are:
"Jaccard" - Jaccard similarity coefficient
"Rogers" - Rogers and Tanimoto similarity coefficient
"simple matching" - Simple matching coefficient
"Dice" - Dice similarity coefficient
Details
The factor_sim function computes the similarity matrix for a set of factors in a data frame using the chosen method.
The function first converts the data frame into a model matrix, then calculates the similarity matrix using the proxy::simil
function from the proxy package.
The function supports four similarity methods: Jaccard, Rogers, simple matching, and Dice. The choice of method depends on the specific use case and the desired properties of the similarity measure.
Examples
des <- data.frame(
var1 = factor(c("a", "b", "a", "b", "a")),
var2 = factor(c("c", "c", "d", "d", "d"))
)
sim_jaccard <- factor_sim(des, method = "Jaccard")
sim_dice <- factor_sim(des, method = "Dice")