Machine Learning
Identifying New Customer Segments from Personalization Data through Machine Learning
June 07, 2018

Personalization, the ability to engage online visitors and customers one-on-one, is considered one of the key differentiators for online marketing and commerce. It can no longer be treated as a single feature of your e-commerce and marketing platforms; it must be a core competency of your business. Customers are not only allowing personalization but expecting it, and they become very loyal when they feel a brand “gets” them. At Coria we have a solid practice based on Sitecore’s Content Management System (CMS) for Contextual Marketing and Personalization. We understand the steps necessary to build your brands around Contextual Marketing, from both a content organization and a technical perspective.


In a previous post, Personalization - Identifying Anonymous Users through On-Site Behavior, we presented the process of categorizing content using Profiles and Profile Keys, which are Sitecore’s terms for tracking visitor intent and interest. As visitors navigate the site and consume content, they accumulate a score along each of these Profiles and Profile Keys. These scores are then matched against predefined Pattern Cards, Sitecore’s term for a persona or an expected customer segment.


When a visitor consumes content on an e-commerce site, points are accumulated along Profiles and specifically along their Profile Keys. In this example, content is scored on the basis of how Brand Conscious and how Price Sensitive it is. Each score ranges from 0 to 10: a Brand score of 10 implies that the visitor has a strong preference for branded merchandise, and a Price score of 10 implies that the visitor is willing to pay more.


If an Anonymous Visitor were to visit each of these pages, each of which has a score for both Price and Brand, a real-time accumulated score averaged across all the pages would be calculated.
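The averaging step can be sketched in a few lines of Python. The page scores below are hypothetical, chosen only so that the averages come out to the 3.5 / 3.75 used in the example that follows; the function and variable names are illustrative as well and are not part of Sitecore.

```python
# Hypothetical page scores: each visited page carries a Brand and a Price value.
# These numbers are illustrative only, not the scores from the original pages.
pages_visited = [
    {"brand": 2, "price": 3},
    {"brand": 3, "price": 4},
    {"brand": 4, "price": 4},
    {"brand": 5, "price": 4},
]


def average_profile(pages):
    """Return the mean score per Profile Key across all visited pages."""
    keys = pages[0].keys()
    return {key: sum(page[key] for page in pages) / len(pages) for key in keys}


profile = average_profile(pages_visited)
print(profile)
```

Recomputing the average after every page view is what keeps the score "real-time": each new page simply joins the list before the function is called again.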



This Anonymous Visitor’s average score works out to 3.5 for Brand and 3.75 for Price. This real-time score is then compared with the two predefined Pattern Cards, No Frills and Brand Conscious, which have values of Brand: 3 / Price: 3 and Brand: 8 / Price: 8 respectively. Calculating which of these Pattern Cards is the closer fit, using the Euclidean distance formula, results in the Anonymous Visitor being identified as a No Frills customer. This method of personalization allows every visitor to be identified based on their behavior and interests, so that personalized content can be offered once a clear pattern has emerged.
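As a minimal sketch of that comparison, the snippet below measures the Euclidean distance from the visitor's score to each Pattern Card and picks the nearest one. The scores and card values are those given above; the function name is an assumption made for illustration and this is not Sitecore's own implementation.

```python
import math

# Pattern Card values as (brand, price), taken from the example above.
pattern_cards = {
    "No Frills": (3.0, 3.0),
    "Brand Conscious": (8.0, 8.0),
}

# The Anonymous Visitor's averaged score as (brand, price).
visitor = (3.5, 3.75)


def closest_pattern_card(score, cards):
    """Return the nearest card name and the distance to every card."""
    distances = {name: math.dist(score, center) for name, center in cards.items()}
    best = min(distances, key=distances.get)
    return best, distances


best, distances = closest_pattern_card(visitor, pattern_cards)
for name, distance in distances.items():
    print(f"{name}: {distance:.2f}")
print("Closest match:", best)
```

The visitor sits roughly 0.90 from No Frills and 6.19 from Brand Conscious, which is why No Frills is chosen.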

In this article, the same information that is used to match a visitor to content will be used to validate the assumed personas and to identify any emerging customer segments, using Machine Learning clustering algorithms. Using historical visitor data, we can plot visitors against the known personas and check how well those personas hold up. An additional benefit is that a high concentration of visitors observed far enough from the known personas may indicate a new customer segment.

Clustering is a Machine Learning method in which data observations are grouped together. A cluster is a practical way of labeling, organizing and understanding a larger set of data points. It is easier to profile content along a handful of clusters than for millions of unique visitors. There are several methods for clustering, but one of the most popular is K-Means Clustering. It will be used in this article since it is a relatively simple method to understand and implement. The basic premise of K-Means Clustering is that all data points may be represented by a predetermined number of clusters. The “K” in the name is the number of clusters expected to be found in the data.

When the algorithm starts, an arbitrary center point is assigned to each cluster. The distance between each observed data point and each cluster’s center is then measured, and each observation is grouped with its closest cluster. Once all the observations have been assigned, a new centroid is calculated for each cluster as the mean of its members, replacing the previous center point. The process is repeated until the groupings no longer change.
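The loop described above can be sketched in plain Python. This is a simplified illustration rather than the code from the accompanying Notebook: the function name, the `initial_centers` parameter and the six sample points are all assumptions made for the example.

```python
import math
import random


def k_means(points, k, initial_centers=None, max_iterations=100, seed=0):
    """Group points into k clusters; returns (centers, assignments)."""
    rng = random.Random(seed)
    # Start from arbitrary centers: either supplied, or k randomly chosen points.
    centers = list(initial_centers) if initial_centers else rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assign every point to the index of its closest center.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # groupings stayed constant, so the algorithm has converged
        assignments = new_assignments
        # Replace each center with the mean (centroid) of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assignments


# Hypothetical (brand, price) scores forming two obvious groups.
sample_visitors = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, assignments = k_means(sample_visitors, k=2)
print(centers)
print(assignments)
```

Because the starting centers are arbitrary, different starting points can occasionally lead to different groupings on less clearly separated data, which is one reason library implementations typically run the algorithm several times.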

In the example above, a single anonymous visitor ended up with a final score of 3.75 for Price and 3.5 for Brand. If the data for all visitors over a period of time is analyzed, the visitors can be clustered to determine how well the No Frills and Brand Conscious Pattern Cards match them.

Included in this blog post is a CSV file with 350 sample visitor profiles. This file was brought into a Python Jupyter Notebook for analysis and this Notebook is also provided. The Notebook was also exported as a Python file. All charts and data points in this article were produced using this CSV file from the Jupyter Notebook.


Each visitor is shown on this chart, with the two Pattern Cards drawn as larger dark plot points. Visually, many visitors end up reasonably close to one of the two Pattern Cards, labeled “NoFrills” and “BrandAware” on the chart (the No Frills and Brand Conscious personas). Other visitors, however, score relatively far from both. The top left corner of the chart is sparse, but the bottom right shows some accumulation of visitors.




Taking the 350 sample visitors and grouping them into two clusters (K = 2), the visitors are separated as follows:

The plot points have been clustered, but the clusters are clearly loose: each cluster’s members sit relatively far from each other. A technique known as the “Elbow Method” helps here. For each candidate value of K it calculates the distortion, which is the sum of the squared distances between each cluster member and its cluster center. The more closely bunched the points are around their centers, the lower the distortion. Imagine that 350 separate clusters were used instead of 2. Since there happen to be 350 observations, we would have 350 perfect clusters, each with a single visitor as its only member, and the distortion would be zero for K=350. This defeats the purpose of clustering, so a happy medium is sought. When the distortion is plotted across several values of K, a distinct bend (the “elbow”) is often observed, beyond which adding more clusters yields only small improvements. Using this visitor data and testing for 1 through 10 clusters, the Elbow Method gets us this chart:
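To make the distortion measure concrete, here is a small sketch that computes the sum of squared distances for a handful of hypothetical visitors under three different sets of centers. The data points, center positions and function names are all illustrative assumptions; the centers are set by hand rather than found by K-Means so that the arithmetic is easy to follow.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical (brand, price) scores: two visitors low on both, two high on both.
sample_visitors = [(1.0, 1.0), (1.0, 3.0), (9.0, 7.0), (9.0, 9.0)]

one_center = [(5.0, 5.0)]                # K=1: the mean of all four visitors
two_centers = [(1.0, 2.0), (9.0, 8.0)]   # K=2: the mean of each pair
one_per_visitor = list(sample_visitors)  # K=4: every visitor is its own cluster

for label, centers in [("K=1", one_center), ("K=2", two_centers), ("K=4", one_per_visitor)]:
    print(label, distortion(sample_visitors, centers))
```

The distortion drops sharply from K=1 to K=2 and then only slightly further to zero at K=4. That sharp drop followed by a flat tail is the shape the Elbow Method looks for.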

The first conclusion from this chart is that 2 clusters, as was assumed with the No Frills and Brand Conscious personas, do not fit the data well. Looking at the chart, the “elbow” is observed at K=3.

Using K=3 the K-means clustering algorithm produces the following plot.

The center points of these clusters are:

CLUSTER 1: Brand 2.56 / Price 2.76
CLUSTER 2: Brand 7.20 / Price 2.88
CLUSTER 3: Brand 7.06 / Price 7.74

CLUSTER 1, the black dots, corresponds relatively well with the presumed No Frills customer type. The assumption was that this customer cared little about brands and preferred to spend less. The resulting values of Brand: 2.56 / Price: 2.76 are very close to No Frills’ Brand: 3 / Price: 3. Similarly, CLUSTER 3, the cyan dots, corresponds reasonably well to the Brand Conscious customer type (labeled “BrandAware” on the chart). This cluster’s values of Brand: 7.06 / Price: 7.74 are close to the Pattern Card’s Brand: 8 / Price: 8. This cluster and customer type represent a customer who prefers to shop for brands and is willing to spend a little more.

What is interesting in this analysis is the emergence of CLUSTER 2, which is represented by the red dots. No presumed customer type corresponds to it. This cluster has values of Brand: 7.20 / Price: 2.88, which suggests a customer who would prefer to purchase branded merchandise but at a lower price. This type of customer has been addressed by existing brands. Big fashion houses such as Givenchy and Oscar de la Renta might not sell garments at low price points, but they do offer fragrances that are within reach of many customers.
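One way to automate this check is to measure how far each cluster center lies from its nearest Pattern Card and flag any center beyond a chosen cut-off. The sketch below uses the cluster centers and Pattern Card values reported in this article. The cut-off of 2.0 and the names used are assumptions made purely for illustration, and a suitable threshold would depend on your own scoring scale.

```python
import math

# Pattern Cards and cluster centers as (brand, price), taken from the article.
pattern_cards = {"No Frills": (3.0, 3.0), "Brand Conscious": (8.0, 8.0)}
cluster_centers = {
    "CLUSTER 1": (2.56, 2.76),
    "CLUSTER 2": (7.20, 2.88),
    "CLUSTER 3": (7.06, 7.74),
}

# Hypothetical cut-off: a center farther than this from every Pattern Card
# is treated as a candidate new customer segment.
EMERGING_THRESHOLD = 2.0


def nearest_card(center, cards):
    """Return the closest Pattern Card name and its distance to the center."""
    name = min(cards, key=lambda n: math.dist(center, cards[n]))
    return name, math.dist(center, cards[name])


emerging = []
for cluster, center in cluster_centers.items():
    card, distance = nearest_card(center, pattern_cards)
    if distance > EMERGING_THRESHOLD:
        emerging.append(cluster)
        print(f"{cluster}: no close Pattern Card (nearest is {card} at {distance:.2f})")
    else:
        print(f"{cluster}: matches {card} (distance {distance:.2f})")
```

With these values, CLUSTER 1 and CLUSTER 3 each fall within about one point of a Pattern Card, while CLUSTER 2 is more than four points from the nearest one and is the only cluster flagged.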

This type of analysis may prove useful to run periodically, both to validate your assumptions about your customer types and to determine whether you have emerging customer segments. Contact Coria to discuss how we can help you. We also offer workshops to help you make the most of your data. The workshops serve as a sounding board to validate your content and how you make use of the Profile and Pattern information in your CMS, especially Sitecore. If your content offers little variation, it is difficult to differentiate your customers’ intentions.

A final word regarding clustering: its purpose is to group together like objects. In the examples here, visitors with similar interests pertinent to the fictitious e-commerce site were identified and clustered. Determining what is important means not only identifying the relevant metrics and attributes but also discarding those that may incorrectly influence the final outcome. A future article will cover which of the available metrics to cluster and how to cluster them.