Data Science Desktop Survival Guide
by Graham Williams
Chapter: Cluster Analysis
20200902

Cluster analysis (or clustering) is widely used in data mining to identify groups of similar data items. It is well supported in R (R Core Team, 2020), with many packages available for four tasks: preparing data for cluster analysis, identifying a good number of clusters, performing the clustering, and evaluating the result. A variety of cluster analysis algorithms are available. Each one assigns a cluster index to every data item, and that set of indices is the representation of the clustering. Performance is often assessed by measuring the distances between points within a cluster and the distances between clusters. A good clustering has small within-cluster distances and large between-cluster distances.
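The next example makes the cluster-index representation and the distance-based measure concrete. It is a minimal sketch in Python rather than R, used only because it is self-contained. It assumes numeric points and Euclidean distance. The function name `within_between` and the toy data are hypothetical. The function averages the pairwise distances between points that share a cluster index, and separately between points that do not.

```python
import math
from itertools import combinations


def within_between(points, labels):
    """Hypothetical helper: mean within-cluster and mean between-cluster
    pairwise Euclidean distances, given one cluster index per point."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (an assumption)
        # Pairs with the same cluster index count as "within", others as "between".
        (within if lp == lq else between).append(d)
    mean_within = sum(within) / len(within) if within else 0.0
    mean_between = sum(between) / len(between) if between else 0.0
    return mean_within, mean_between


# Toy data: four 2-D points and a cluster index for each one.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels = [1, 1, 2, 2]
w, b = within_between(points, labels)
print(f"mean within = {w:.3f}, mean between = {b:.3f}")
# → mean within = 1.118, mean between = 9.909
```

A clustering that separates the data well should give a mean within-cluster distance that is clearly smaller than the mean between-cluster distance. R packages provide more refined measures built on the same idea, such as the silhouette width. This sketch is only an illustration of the principle.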
We briefly introduced the k-means (KMeans) clustering algorithm, alongside decision trees, as an example algorithm in Section 16.7.
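The next example shows how k-means produces a cluster index for each data item. It is a minimal sketch of Lloyd's algorithm in plain Python. It is not the implementation from Section 16.7, nor R's `kmeans()`, which defaults to the Hartigan-Wong algorithm. Several details are illustrative assumptions: the function name, the deterministic initialisation (the first `k` points serve as starting centroids), and the iteration cap `max_iter`.

```python
import math


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's k-means sketch; returns (labels, centroids)."""
    # Assumption: the first k points are the initial centroids, for reproducibility.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: give each point the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels, centroids = kmeans(points, k=2)
print(labels)     # → [0, 0, 1, 1]
print(centroids)  # → [[1.25, 1.5], [8.5, 8.25]]
```

The returned `labels` list is the cluster index for each data item. It is 0-based here, whereas R numbers clusters from 1. The result depends on the starting centroids. For that reason, practical implementations usually use random or k-means++ starts and keep the best of several restarts.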