This question was initially posted on Cross Validated but was closed as "off-topic." I've since run into the same issue(s) and am wondering how it can be addressed programmatically.
Using the `factoextra` package in R, I am trying to visualize some cluster analyses with the `fviz_cluster()` function; specifically, I run into issues after performing PAM (i.e., `cluster::pam`).
NOTE: the data contain only numeric features with no missing values, and they have been centered and scaled prior to clustering.
My process currently looks as follows:
```r
library(cluster)
library(factoextra)

data -> df  # placeholder for the actual numeric, centered and scaled data set

# Spearman correlation-based dissimilarity between observations
factoextra::get_dist(df, method = "spearman") -> dist_mtx

cluster::pam(
  x = dist_mtx,     # dissimilarity matrix
  k = 4,            # number of clusters
  diss = TRUE,      # x is a dissimilarity matrix, not raw data
  # metric = "euclidean",  # ignored because a dissimilarity matrix is supplied
  pamonce = FALSE   # default: original PAM algorithm
) -> pam_res
```
The PAM method can take a while depending on the size of the data set, but the output is an object of class "pam" representing the clustering (see `?pam.object` for details).
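A quick way to see what such an object contains is to fit PAM on a dissimilarity matrix and inspect the result. This is a minimal sketch under assumptions: the built-in `USArrests` data stands in for `df`, and the Spearman dissimilarity is built by hand as 1 minus the Spearman correlation between observations, which is what we assume `get_dist(method = "spearman")` computes. Because PAM only received distances, the returned object has no `$data` component, which is consistent with the `NULL` mentioned in the error message.

```r
library(cluster)

# Stand-in for the asker's data (assumption): built-in USArrests, centered and scaled
df <- scale(USArrests)

# Assumed equivalent of factoextra::get_dist(df, method = "spearman"):
# 1 - Spearman correlation between rows (observations)
dist_mtx <- as.dist(1 - cor(t(df), method = "spearman"))

# PAM on the dissimilarity matrix only
pam_res <- pam(dist_mtx, k = 4, diss = TRUE)

class(pam_res)            # class of the returned object
names(pam_res)            # components stored in the object
is.null(pam_res$data)     # TRUE: PAM never saw the raw data
table(pam_res$clustering) # cluster sizes
```

Anything that wants to place the points in a plane (rather than just color them) therefore has to get coordinates from somewhere other than the "pam" object itself.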
This is where I run into issues, because of how `fviz_cluster` works internally. If I do the following:
```r
fviz_cluster(
  object = pam_res,
  # data = df,
  geom = "point"
)
```
I get an error stating:

```
Error in array(x, c(length(x), 1L), if (!is.null(names))) list(names(x), : 'data' must be of vector type, was NULL
```
The documentation states that the `data` argument is required only when visualizing k-means or DBSCAN results. The code chunk above still fails even if `data = df` is passed to `fviz_cluster`.
One workaround, given in this SO response, is to append the actual data to the resulting "pam" object (i.e., `df -> pam_res$data`). Although this works, I am wondering whether it affects the resulting visualization. The `fviz_cluster` function doesn't seem able to use both a dissimilarity matrix AND the data set to produce a plot, so is my dissimilarity matrix being ignored when I add the data to the object?
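One way to probe this is to compare two layouts of the same clustering. According to the `fviz_cluster` documentation, observations are drawn on principal components when the data have more than two columns, so with `pam_res$data` filled in, the point positions presumably come from a PCA of `df`, while only the cluster labels come from the Spearman-based PAM. The sketch below swaps in a different technique, classical multidimensional scaling via `stats::cmdscale()`, to place the points using the dissimilarity matrix itself. It reuses the `USArrests` stand-in and the hand-built Spearman dissimilarity (both assumptions, not the original setup) and uses base graphics instead of factoextra.

```r
library(cluster)

# Stand-in data and assumed Spearman dissimilarity (1 - Spearman correlation)
df <- scale(USArrests)
dist_mtx <- as.dist(1 - cor(t(df), method = "spearman"))
pam_res <- pam(dist_mtx, k = 4, diss = TRUE)

# Layout A: 2-D classical MDS of the dissimilarity matrix PAM actually used
mds_xy <- cmdscale(dist_mtx, k = 2)

# Layout B: first two principal components of the data,
# roughly what a PCA-based plot of df would show
pca_xy <- prcomp(df)$x[, 1:2]

# Same cluster colors, two different point layouts side by side
op <- par(mfrow = c(1, 2))
plot(mds_xy, col = pam_res$clustering, pch = 19,
     xlab = "MDS 1", ylab = "MDS 2", main = "MDS of Spearman dissimilarity")
plot(pca_xy, col = pam_res$clustering, pch = 19,
     main = "PCA of the data")
par(op)
```

If the two panels separate the clusters very differently, that would suggest the PCA-based picture says little about how the Spearman dissimilarity itself separates the clusters.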
Any ideas would be much appreciated!
Cheers
question from:
https://stackoverflow.com/questions/65944625/pam-cluster-visualization-from-dissimilarity-measure-using-factoextra-package