Recap from Day 017
In day 017 we looked into unsupervised learning techniques, dwelling on cluster analysis. We learned that clustering can be roughly divided into hard clustering, where each data point belongs to exactly one cluster, and soft clustering, where each data point can belong to more than one cluster.
Common Hard Clustering Algorithms
Today, we’ll start looking into common hard clustering algorithms.
k-Means
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
How k-Means Works
k-means partitions data into k mutually exclusive clusters. How well a point fits into a cluster is determined by the distance from that point to the cluster's center.
Result: Cluster centers
Best Used…
When the number of clusters is known
For fast clustering of large data sets
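The assign-then-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration of Lloyd's algorithm, not a production implementation: the farthest-point initialisation and the toy two-blob dataset are illustrative choices of mine, and edge cases such as empty clusters are not handled.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm k-means sketch."""
    rng = np.random.default_rng(seed)
    # Illustrative farthest-point initialisation: pick one random point,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (assumes no cluster ends up empty, fine for this toy data).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: assignments no longer change
        centers = new_centers
    return centers, labels

# Toy data: two well-separated blobs, so k = 2 is known in advance.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, labels = kmeans(X, k=2)
```

Note that the returned centers are means of cluster members, so in general they do not coincide with any actual data point, which is exactly where k-medoids (below) differs.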
k-Medoids
The k-medoids algorithm is a clustering algorithm related to the k-means algorithm and the medoidshift algorithm. Both the k-means and k-medoids algorithms are partitional (breaking the dataset up into groups) and both attempt to minimize the distance between points labeled to be in a cluster and a point designated as the center of that cluster.
K-medoids seeks a subset of k points out of a given set such that the total cost, or distance, between each point and the closest point in the chosen subset is minimal. These chosen points are called medoids.
Result: Cluster centers that coincide with data points
Best Used…
When the number of clusters is known
For fast clustering of categorical data
To scale to large data sets
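The key difference from k-means can be sketched the same way: the update step picks the cluster member with the smallest total distance to its fellow members, so each center is guaranteed to be an actual data point. This is a minimal Voronoi-iteration sketch rather than the full PAM algorithm, and the initialisation and toy data are again illustrative choices of mine.

```python
import numpy as np

def kmedoids(X, k, n_iter=100, seed=0):
    """Minimal Voronoi-iteration k-medoids sketch (indices as medoids)."""
    rng = np.random.default_rng(seed)
    # Precompute all pairwise distances once; medoids are data indices.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Illustrative farthest-point initialisation over indices.
    medoids = [int(rng.integers(len(X)))]
    for _ in range(k - 1):
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            # Update step: the new medoid is the member minimising the
            # total distance to all other members of its cluster.
            new_medoids[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids no longer move
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

# Toy data: two well-separated blobs, so k = 2 is known in advance.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
medoids, labels = kmedoids(X, k=2)
```

Because only the precomputed distance matrix is consulted, the same loop works with any dissimilarity measure, which is why k-medoids extends naturally to categorical data where a mean is undefined.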
Great Job. You made it to the end of day 018. I hope you found this informative. Thank you for taking time out of your schedule and allowing me to be your guide on this journey.
Reference
MathWorks, Machine Learning ebook, Section 4 (90221_80827v00_machine_learning_section4_ebook_v03.pdf)
https://en.wikipedia.org/wiki/Cluster_analysis
http://clusteringjl.readthedocs.io/en/latest/kmedoids.html