When I wrote down the title of this blog post, I first wondered whether it was worthwhile writing about. So I did a quick Google search on “the difference between segmentation and clustering” to get an impression. It yielded a lot of links about … segmentation and clustering, but very few of them clearly distinguish between the two.
Let us first look at some of the definitions we can find:
“the process of organizing objects into groups whose members are similar in some way” (here)
“a number of different algorithms and methods for grouping objects of similar kind into respective categories” (here)
“a way to form ‘natural groupings’ or clusters of patterns” (here)
“the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense” (http://en.wikipedia.org/wiki/Cluster_analysis)
“Customer segmentation is the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, interests, spending habits, and so on” (here)
“A marketing technique that targets a group of customers with specific characteristics” (here)
“A market segment is a sub-set of a market made up of people or organizations sharing with one or more characteristics that cause them to demand similar product and/or services based on qualities of those products such as price or function. A true market segment meets all of the following criteria: it is distinct from other segments (different segments have different needs), it is homogeneous within the segment (exhibits common needs); it responds similarly to a market stimulus, and it can be reached by a market intervention.” (here)
“Segmentation groups objects into similar groups. The resulting groups contain members that are more similar to each other than they are to other groups” (here).
It looks as if there is a lot of similarity between the two but, as I will explain, there is actually no similarity whatsoever!
Let’s begin with segmentation.
My definition is simple: “dividing something into pieces according to some criteria”. Each piece is called a segment.
When we deal with a group of customers, we talk about customer segmentation or market segmentation. It is simply splitting the whole customer base into groups of customers that have some characteristic in common. That characteristic can be gender: men and women.
But it can also be age category (people under 40 and people over 40) or net income (people who earn less than $10,000 and those who earn more).
That is all there is to segmentation: decide on your criteria (more precisely, on the borders between the segments) and assign each customer, or whatever you want to segment, to its segment.
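To make this concrete, here is a minimal Python sketch of segmentation as just described: the analyst chooses the borders up front, and the code does nothing more than assign each value to its segment. The function name `segment`, the sample ages and the choice to put someone who is exactly 40 into the upper segment are illustrative assumptions, not part of any particular tool.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def segment(values: Sequence[float], borders: Sequence[float],
            labels: Sequence[str]) -> Dict[str, List[float]]:
    """Assign every value to the segment between pre-chosen borders.

    `borders` must be sorted ascending; a value equal to a border
    goes into the segment above it (an assumption for this sketch).
    """
    if len(labels) != len(borders) + 1:
        raise ValueError("need exactly one more label than borders")
    segments: Dict[str, List[float]] = {label: [] for label in labels}
    for value in values:
        # bisect_right counts how many borders are <= value,
        # which is exactly the index of the value's segment.
        segments[labels[bisect_right(borders, value)]].append(value)
    return segments


ages = [23, 35, 40, 41, 67]  # hypothetical customer ages
print(segment(ages, borders=[40], labels=["under 40", "40 and over"]))
# {'under 40': [23, 35], '40 and over': [40, 41, 67]}
```

Nothing in this code looks at how the ages are distributed; the border at 40 is a decision, not a discovery.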
So what about clustering ?
Wikipedia gives you a few dozen definitions for all sorts of clusters. The meaningful word that occurs most often is “group”. So a cluster is a group.
It gets even better when we search Google for pictures of clusters: we find, for example, a beautiful picture of the Pleiades star cluster.
With “cluster analysis” as the search term, the first picture that comes up is a plain diagram.
It is much simpler than the Pleiades, but it gives the same impression: groups of points! Or, put differently: density differences in a two-dimensional space (pictures are essentially two-dimensional).
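One simple way to turn “density differences” into numbers is to lay a grid over the picture and count the points in each cell. The Python sketch below does exactly that; the function name, the cell size and the sample points are made-up assumptions, and real clustering tools use more refined density estimates.

```python
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[float, float]


def grid_density(points: Iterable[Point], cell: float) -> Counter:
    """Count how many points fall in each square grid cell of side `cell`."""
    # Floor division maps a coordinate to the index of its cell.
    return Counter((int(x // cell), int(y // cell)) for x, y in points)


points = [(1.0, 1.2), (1.5, 0.8), (0.7, 1.9), (1.1, 1.4),  # a dense group
          (8.2, 8.9), (8.8, 8.1), (9.3, 8.4),              # another dense group
          (5.1, 2.0)]                                      # a lone point
print(grid_density(points, cell=2.0))
```

Cells with high counts correspond to the “groups of points” the eye picks out; the cell holding a single point lies in a low-density area between them.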
To conclude, my definition of clustering is: finding regions in a (one-, two-, or multi-dimensional) space where the density of items differs from that of the neighboring regions.
And “finding regions” means finding the borders between the regions, because only when you know where the borders are can you say where a region is situated.
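A deliberately crude Python sketch of “finding the borders” in one dimension: sort the values and treat any stretch between neighbouring values wider than `min_gap` as an empty, low-density region whose midpoint becomes a border. The function name, the `min_gap` threshold and the sample ages are assumptions for illustration; in one dimension this amounts to single-linkage clustering with a distance cut-off, not a general-purpose clustering method.

```python
from typing import List, Sequence


def find_borders_1d(values: Sequence[float], min_gap: float) -> List[float]:
    """Return borders between dense regions of a 1-D sample.

    A stretch between two consecutive sorted values that is wider than
    `min_gap` is treated as an empty (low-density) region, and its
    midpoint becomes a border. `min_gap` is a hypothetical tuning knob.
    """
    xs = sorted(values)
    borders = []
    for left, right in zip(xs, xs[1:]):
        if right - left > min_gap:
            borders.append((left + right) / 2)
    return borders


ages = [21, 23, 24, 25, 27, 52, 54, 55, 57, 58]  # hypothetical sample
print(find_borders_1d(ages, min_gap=10))  # [39.5]
```

Here the border is an output computed from the data, which is the reverse of the segmentation case, where the border was an input.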
Why, then, the seeming similarity between segmentation and clustering?
In order to practice segmentation, you have to decide on the borders between the segments.
That is simple if you only have to deal with one or two characteristics. In marketing, however, we sometimes have hundreds of characteristics for each customer. That makes things considerably more complex, and people often use advanced statistical methods (such as K-means clustering or Kohonen maps) to find these borders for them in the multidimensional space of their customer database. They then use these borders to actually segment their customer base.
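As a rough illustration of that workflow, the sketch below runs a bare-bones one-dimensional K-means (Lloyd's algorithm) on a single characteristic, turns the resulting centroids into borders, and then uses those borders to segment. One dimension, the deterministic initialisation, the function names and the income figures are all simplifying assumptions. In practice one would run a library implementation, such as scikit-learn's `KMeans`, on many characteristics at once; the borders then become hyperplanes halfway between neighbouring centroids rather than single numbers.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, max_iter: int = 100) -> List[float]:
    """Bare-bones Lloyd's algorithm in one dimension; returns sorted centroids."""
    xs = sorted(values)
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: k values spread evenly over the sorted data.
    centroids = [float(xs[i * (len(xs) - 1) // max(k - 1, 1)]) for i in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            # Assignment step: each value joins its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            groups[nearest].append(x)
        # Update step: move each centroid to the mean of its group.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def borders_from_centroids(centroids: Sequence[float]) -> List[float]:
    """In 1-D, nearest-centroid assignment switches halfway between neighbours."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


incomes = [8_000, 9_000, 9_500, 10_500, 31_000, 33_000, 35_000]  # hypothetical
borders = borders_from_centroids(kmeans_1d(incomes, k=2))  # clustering: find the border
low = [x for x in incomes if x < borders[0]]                # segmentation: use it
high = [x for x in incomes if x >= borders[0]]
print(borders, low, high)
```

The two steps are kept visibly separate on purpose: the last three lines would work just as well with a border typed in by hand.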
Conclusion: clustering is finding borders between groups; segmenting is using borders to form groups.
And one final remark:
Segmentation is always possible, even in an extremely homogeneous collection of items. You just decide where you will cut between the groups.
Finding clusters in such an extremely homogeneous collection is impossible, since by definition there are no density differences, and hence no clusters to find!
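This final remark can be checked with a small Python sketch. The collection below is perfectly homogeneous (evenly spaced values); a segmentation border can still be imposed, while a simple per-bin count, used here as a crude density estimate, shows the same density everywhere and so offers no border to discover. The values, the cut at 50, the bin width and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Sequence


def bin_counts(values: Sequence[float], width: float) -> Counter:
    """Number of values per bin of the given width (a crude density estimate)."""
    return Counter(int(v // width) for v in values)


homogeneous = list(range(100))  # 0, 1, ..., 99: evenly spread, no density differences

# Segmentation: the analyst simply decides to cut at 50, and it works.
below = [v for v in homogeneous if v < 50]
above = [v for v in homogeneous if v >= 50]

# Clustering: every bin holds the same number of values, so there are no
# denser or sparser regions and therefore no borders to discover.
density = bin_counts(homogeneous, width=10)
print(len(below), len(above), set(density.values()))  # 50 50 {10}
```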