
Calculate Silhouette Width for Soft Clustering Algorithms
Source: R/softSilhouette.R
Computes silhouette widths for soft clustering results by interpreting cluster membership probabilities (or their transformations) as proximity measures. Although the silhouette was originally designed to evaluate clustering quality within a single method, this adaptation allows heuristic comparison across soft clustering algorithms using average silhouette widths.
Arguments
- prob_matrix
A numeric matrix where rows represent observations and columns represent cluster membership probabilities (or transformed probabilities, depending on prob_type). If clust_fun is provided, prob_matrix should be the name of the matrix component as a string (e.g., "u" for fcm).
- prob_type
Character string specifying which transformation of the membership matrix is supplied in prob_matrix and treated as the proximity matrix. Options are:
"pp": Posterior probabilities \([\gamma_{ik}]_{n \times K}\) (non-negative, typically summing to 1 per row), treated as similarities.
"nlpp": Negative log of posterior probabilities \([-\ln\gamma_{ik}]_{n \times K}\) (non-negative, since \(0 < \gamma_{ik} \le 1\)), treated as dissimilarities.
"pd": Probability distribution \([\gamma_{ik}/\pi_{k}]_{n \times K}\) (normalized posterior probabilities relative to cluster proportions \(\pi_{k}\)), treated as similarities.
Defaults to "pp".
- method
Character string specifying the silhouette calculation method. Options are "pac" (Probability of Alternative Cluster) or "medoid". Defaults to "pac".
- average
Character string specifying the type of average silhouette width calculation. Options are "crisp" (simple average) or "fuzzy" (weighted average based on membership differences). Defaults to "crisp".
- a
Numeric value controlling the fuzzifier or weight scaling in fuzzy silhouette averaging. Higher values increase the emphasis on strong membership differences. Must be positive. Defaults to 2.
- print.summary
Logical; if TRUE, prints a summary table of average silhouette widths and sizes for each cluster. Defaults to FALSE.
- clust_fun
Optional S3 or S4 function object, or function name given as a character string, specifying a clustering function that produces the proximity measure matrix. For example, fcm or "fcm". If provided, prob_matrix must be the name of the matrix component in the clustering output (e.g., "u" for fcm). Defaults to NULL.
- ...
Additional arguments passed to clust_fun, such as x and centers for fcm.
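The prob_type and method options above can be illustrated with a small base R sketch. It does not call softSilhouette and is not the package's implementation. It assumes that \(\pi_{k}\) is estimated by the column means of the posterior matrix, that the "pac" width is the gap between the two best proximities divided by their sum, and that the "medoid" width divides the same gap by the larger of the two. The helper name sil_width and the toy matrix are hypothetical.

```r
# Toy posterior matrix: 4 observations (rows), 3 clusters (columns); rows sum to 1.
pp <- matrix(c(0.80, 0.15, 0.05,
               0.10, 0.70, 0.20,
               0.30, 0.30, 0.40,
               0.05, 0.05, 0.90),
             ncol = 3, byrow = TRUE)

# "nlpp": negative log posteriors, read as dissimilarities (smaller = closer).
nlpp <- -log(pp)

# "pd": posteriors divided by cluster proportions pi_k.
# Assumption: pi_k is estimated by the column means of pp.
pi_k <- colMeans(pp)
pd <- sweep(pp, 2, pi_k, "/")

# Hypothetical helper: per-observation silhouette width from a proximity matrix.
# Assumed forms (not taken from the package source):
#   pac:    |best - second| / (best + second)
#   medoid: |best - second| / max(best, second)
sil_width <- function(m, similarity = TRUE, method = c("pac", "medoid")) {
  method <- match.arg(method)
  # For similarities the "best" cluster has the largest value,
  # for dissimilarities it has the smallest.
  best2 <- t(apply(m, 1, function(r) sort(r, decreasing = similarity)[1:2]))
  gap <- abs(best2[, 1] - best2[, 2])
  denom <- if (method == "pac") rowSums(best2) else pmax(best2[, 1], best2[, 2])
  gap / denom
}

sil_width(pp)                                         # "pp" with "pac"
sil_width(pp, method = "medoid")                      # "pp" with "medoid"
sil_width(nlpp, similarity = FALSE)                   # "nlpp" with "pac"
sil_width(pd)                                         # "pd" with "pac"
```

Because each observation is assigned here to its best cluster, every width in this sketch is non-negative; observations with nearly tied memberships, such as the third row, get widths close to zero.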
Value
A data frame of class "Silhouette" containing cluster assignments, nearest neighbor clusters, silhouette widths for each observation, and weights (for fuzzy clustering). The object includes the following attributes:
- proximity_type
The proximity type used ("similarity" or "dissimilarity").
- method
The silhouette calculation method used ("medoid" or "pac").
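The difference between the "crisp" and "fuzzy" averages can be sketched in base R as follows. This is an illustration under an assumption, not the package's code: the fuzzy weight of an observation is taken to be the difference between its two largest memberships raised to the power a. All values below are hypothetical.

```r
# Hypothetical per-observation silhouette widths and the two largest
# membership probabilities of each observation.
sil_width <- c(0.68, 0.56, 0.14, 0.89)
top1 <- c(0.80, 0.70, 0.40, 0.90)
top2 <- c(0.15, 0.20, 0.30, 0.05)
a <- 2  # weight exponent, mirroring the documented default of the `a` argument

# "crisp": simple unweighted mean of the widths.
crisp_avg <- mean(sil_width)

# "fuzzy" (assumed form): weights grow with the membership difference,
# so ambiguous observations contribute less to the average.
w <- (top1 - top2)^a
fuzzy_avg <- sum(w * sil_width) / sum(w)

c(crisp = crisp_avg, fuzzy = fuzzy_avg)
```

In this toy data the third observation has nearly tied memberships and a low width, so down-weighting it makes the fuzzy average larger than the crisp one.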
Details
Although the silhouette method was originally developed for evaluating clustering structure within a single result, this implementation allows leveraging cluster membership probabilities from soft clustering methods to construct proximity-based silhouettes. These silhouette widths can be compared heuristically across different algorithms to assess clustering quality.
See doi:10.1080/23737484.2024.2408534 for more details.
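As a minimal base R sketch of the heuristic comparison described above, the average width can be computed for two membership matrices and the larger value preferred. The matrices stand in for the outputs of two hypothetical soft clustering algorithms, and the width formula (gap between the two largest memberships divided by their sum) is an assumed form of the "pac" method, not code from the package.

```r
# Hypothetical helper: average assumed-PAC width of a membership matrix.
avg_pac <- function(p) {
  s <- t(apply(p, 1, function(r) sort(r, decreasing = TRUE)[1:2]))
  mean((s[, 1] - s[, 2]) / (s[, 1] + s[, 2]))
}

# Memberships from two hypothetical algorithms on the same three observations.
sharp   <- rbind(c(0.90, 0.10), c(0.20, 0.80), c(0.85, 0.15))
diffuse <- rbind(c(0.60, 0.40), c(0.45, 0.55), c(0.50, 0.50))

widths <- c(sharp = avg_pac(sharp), diffuse = avg_pac(diffuse))
widths

# Heuristic choice: the algorithm with the larger average width.
names(which.max(widths))
```

The comparison is only heuristic: a larger average width indicates more decisive memberships, which does not by itself establish that the partition is more meaningful.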
References
Raymaekers, J., & Rousseeuw, P. J. (2022). Silhouettes and quasi residual plots for neural nets and tree-based classifiers. Journal of Computational and Graphical Statistics, 31(4), 1332–1343. doi:10.1080/10618600.2022.2050249
Bhat Kapu, S., & Kiruthika. (2024). Some density-based silhouette diagnostics for soft clustering algorithms. Communications in Statistics: Case Studies, Data Analysis and Applications, 10(3–4), 221–238. doi:10.1080/23737484.2024.2408534
Examples
# \donttest{
# Compare two soft clustering algorithms using softSilhouette
# Example: FCM vs. FCM2 on iris data, using average silhouette width as a criterion
data(iris)
if (requireNamespace("ppclust", quietly = TRUE)) {
  # Fuzzy c-means with 3 clusters; the membership matrix is stored in $u
  fcm_result <- ppclust::fcm(iris[, 1:4], 3)
  out_fcm <- softSilhouette(prob_matrix = fcm_result$u, print.summary = TRUE)
  plot(out_fcm)
  # Keep the summary object so the average width can be compared later
  sfcm <- summary(out_fcm, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}
#> -----------------------------------------
#> Average similarity pac silhouette: 0.7541
#> -----------------------------------------
#>
#>   cluster size avg.sil.width
#> 1       1   40        0.7005
#> 2       2   50        0.9507
#> 3       3   60        0.6261
#>
#> Available attributes: names, class, row.names, proximity_type, method
if (requireNamespace("ppclust", quietly = TRUE)) {
  # Same workflow with fcm2 for comparison
  fcm2_result <- ppclust::fcm2(iris[, 1:4], 3)
  out_fcm2 <- softSilhouette(prob_matrix = fcm2_result$u, print.summary = TRUE)
  plot(out_fcm2)
  sfcm2 <- summary(out_fcm2, print.summary = FALSE)
} else {
  message("Install 'ppclust' to run this example: install.packages('ppclust')")
}
#> -----------------------------------------
#> Average similarity pac silhouette: 0.4113
#> -----------------------------------------
#>
#>   cluster size avg.sil.width
#> 1       1   25        0.3623
#> 2       2   60        0.2666
#> 3       3   65        0.5636
#>
#> Available attributes: names, class, row.names, proximity_type, method
# Compare average silhouette widths of fcm and fcm2
if (requireNamespace("ppclust", quietly = TRUE)) {
  cat("FCM average silhouette width:", sfcm$avg.width, "\n")
  cat("FCM2 average silhouette width:", sfcm2$avg.width, "\n")
}
#> FCM average silhouette width: 0.7541271
#> FCM2 average silhouette width: 0.411275
# }