Customer Segmentation using K-Means Clustering
Project Documentation

Project Summary
This project implements customer segmentation using K-Means clustering to identify distinct customer groups based on purchasing behavior. The segmentation supports targeted marketing strategies, enhanced personalization, and profit maximization across customer segments.

1. Project Objective
To segment customers based on their purchasing behavior using the K-Means clustering algorithm, enabling the business to:
- Tailor marketing strategies to different customer groups
- Enhance personalization of customer experiences
- Maximize the profit potential of each customer segment
- Improve customer retention and acquisition strategies

2. Dataset Overview

2.1 Data Source
The analysis uses transactional data from a UK-based online retail store, covering the period from December 2010 to December 2011.

2.2 Dataset Structure

Column Name | Data Type | Description
--- | --- | ---
InvoiceNo | String | Unique invoice number for each transaction
StockCode | String | Unique product identification code
Description | String | Product name and description
Quantity | Integer | Number of units purchased per transaction
InvoiceDate | DateTime | Date and time of purchase
UnitPrice | Float | Price per unit in British Pounds (£)
CustomerID | Integer | Unique customer identifier (contains missing values)
Country | String | Customer's country of residence

Table 2.2: Dataset Description

3. Methodology

3.1 Data Preprocessing
Data Cleaning Process
- Missing Data Handling: Removed all records with a missing CustomerID, since customer-level identification is essential for segmentation.
- Quantity Filtering: Excluded transactions with zero or negative quantities, which represent returns or data entry errors.
- Price Validation: Removed records with UnitPrice ≤ 0, eliminating free products and invalid pricing entries.

Feature Engineering
- Total Transaction Value: Calculated as TotalAmount = Quantity × UnitPrice.
- Temporal Features: Extracted YearMonth from InvoiceDate for time-based analysis.
- Customer Aggregation: Grouped transactional data by CustomerID to create customer-level metrics.

3.2 RFM Analysis Implementation
RFM Model Overview
The analysis uses RFM (Recency, Frequency, Monetary) analysis, a widely used customer segmentation technique that evaluates customer behavior along three dimensions, summarized in Table 3.2.

RFM Feature Engineering Process
Figure 3.2: Feature engineering code snippet

RFM Dimensions Explained

RFM Component | Calculation Method | Business Interpretation
--- | --- | ---
Recency (R) | Days since the last purchase, measured from a snapshot date | How recently did the customer make a purchase?
Frequency (F) | Number of unique transactions (invoices) | How often does the customer make purchases?
Monetary (M) | Total amount spent across all transactions (£) | How much money does the customer spend?

Table 3.2: RFM Dimension Explanation

Data Transformation Results
- Dataset Consolidation: Transformed the data from transaction level to customer level
- Customer Count: 4,339 unique customers identified
- Feature Set: 4 columns (CustomerID + 3 RFM metrics)
- Data Quality: Each row represents one customer with aggregated behavioral metrics

3.3 Traditional Feature Development (Alternative Approach)
For comparison, traditional customer-level features were also developed:

Feature Name | Description | Business Relevance
--- | --- | ---
TotalAmount | Total monetary value spent by the customer | Customer lifetime value indicator
TotalQuantity | Total number of items purchased | Purchase volume behavior
TotalTransactions | Total number of separate purchases | Shopping frequency pattern
AvgBasketSize | Average items per transaction | Shopping behavior consistency
AvgSpend | Average monetary value per transaction | Spending pattern analysis

Table 3.3: Traditional Feature Development

3.4 Data Standardization
StandardScaler normalization was applied to the RFM features so that all three dimensions contribute on a comparable scale to the distance calculations in the K-Means algorithm. Without this step, features with larger numeric ranges, such as Monetary values in pounds, would dominate the clustering process.
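The preprocessing, RFM, traditional-feature, and standardization steps in Sections 3.1 to 3.4 can be sketched in pandas and scikit-learn as below. This is a minimal sketch, not the project's original code from Figure 3.2: the function names, the tiny hand-made transactions table, and the snapshot date (assumed here to be one day after the latest invoice, since the document does not state it) are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Section 3.1 cleaning rules and derived columns."""
    df = df.dropna(subset=["CustomerID"])                         # customer ID is required
    df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)].copy()  # drop returns, errors, free items
    df["CustomerID"] = df["CustomerID"].astype(int)
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["TotalAmount"] = df["Quantity"] * df["UnitPrice"]          # transaction value in £
    df["YearMonth"] = df["InvoiceDate"].dt.to_period("M")         # for time-based analysis
    return df


def build_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """One row per customer: Recency (days), Frequency (invoices), Monetary (£)."""
    # Assumption: the snapshot date is one day after the latest invoice.
    snapshot = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    rfm = df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("TotalAmount", "sum"),
    )
    return rfm.reset_index()


def build_traditional_features(df: pd.DataFrame) -> pd.DataFrame:
    """Section 3.3 alternative features, aggregated per customer."""
    feats = df.groupby("CustomerID").agg(
        TotalAmount=("TotalAmount", "sum"),
        TotalQuantity=("Quantity", "sum"),
        TotalTransactions=("InvoiceNo", "nunique"),
    )
    feats["AvgBasketSize"] = feats["TotalQuantity"] / feats["TotalTransactions"]
    feats["AvgSpend"] = feats["TotalAmount"] / feats["TotalTransactions"]
    return feats.reset_index()


# Hypothetical transactions: rows 4-6 each violate one cleaning rule.
transactions = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "536367", "536368", "536369", "C536370"],
    "CustomerID": [17850, 17850, 13047, 13047, None, 12583],
    "Quantity": [6, 2, 12, 4, 3, -5],
    "UnitPrice": [2.55, 3.39, 1.25, 0.0, 2.10, 1.95],
    "InvoiceDate": ["2011-12-01 08:26", "2011-12-05 09:00", "2011-11-20 10:00",
                    "2011-11-21 10:00", "2011-12-06 11:00", "2011-12-07 12:00"],
})

clean = clean_transactions(transactions)
rfm = build_rfm(clean)
traditional = build_traditional_features(clean)

# Section 3.4: standardize each RFM column to mean 0, standard deviation 1.
rfm_scaled = StandardScaler().fit_transform(rfm[["Recency", "Frequency", "Monetary"]])
print(rfm)
```

Computing the snapshot from the cleaned data keeps Recency relative to the last valid purchase; on the full dataset the same functions would yield one row per customer, as described in the Data Transformation Results.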
Standardization Results
- Input: RFM metrics on different scales (days, counts, monetary values)
- Output: Standardized features with mean = 0 and standard deviation = 1

Sample standardized data:

Customer | Recency (scaled) | Frequency (scaled) | Monetary (scaled)
--- | --- | --- | ---
0 | 2.33 | -0.42 | 8.36
1 | -0.91 | 0.35 | 0.25
2 | -0.18 | -0.04 | -0.03
3 | -0.74 | -0.42 | -0.03
4 | 2.17 | -0.42 | -0.19

Table 3.4: Standardization Data

3.5 Clustering Implementation
Optimal Cluster Determination
- Implemented the Elbow Method to identify the optimal number of clusters
- Analyzed inertia (the within-cluster sum of squared distances) for K values from 1 to 10
- Selected K=4 based on the elbow curve

K-Means Execution
- Applied the K-Means algorithm with K=4 clusters
- Used the standardized features for distance calculations
- Assigned a cluster label to each customer record

4. Results and Analysis

4.1 RFM-Based Cluster Characteristics
The RFM analysis revealed four distinct customer segments with the following distribution and characteristics. Because Recency is measured in days since the last purchase, "Low R" denotes a recent purchase.

Cluster ID | Segment Name | Share of Customers | RFM Profile | Business Interpretation
--- | --- | --- | --- | ---
Cluster 3 | Recent but Light Buyers | 70.0% | Low R, Low F, Low M | New or infrequent customers with minimal spending
Cluster 1 | Lost and Inactive | 24.5% | High R, Low F, Low M | Previously active customers who have become disengaged
Cluster 0 | Loyal Spenders | 4.9% | Low R, High F, High M | Consistent, reliable customers with regular purchase patterns
Cluster 2 | VIP Customers | 0.3% | Low R, High F, Very High M | Highest-value customers with premium purchase behavior

Table 4.1: RFM-Based Cluster Characteristics

4.2 RFM Segment Interpretation
Recency Analysis
- Low Recency: Recent purchases indicate active, engaged customers
- High Recency: A long time since the last purchase suggests potential churn risk

Frequency Analysis
- Low Frequency: Infrequent purchases may indicate casual or new customers
- High Frequency: Regular purchases demonstrate strong engagement and loyalty

Monetary Analysis
- Low Monetary: Small spending amounts suggest price-sensitive or limited-need customers
- High Monetary: Large spending indicates high customer lifetime value

4.3 Customer Value Distribution Analysis
The RFM-based segmentation provides clear insights into customer value distribution:

Revenue Concentration:
- VIP customers (0.3%) likely contribute disproportionately to total revenue despite their small numbers
- Loyal Spenders (4.9%) represent the stable revenue foundation
- The two majority segments, Clusters 3 and 1 (94.5% combined), present significant growth opportunities

Engagement Patterns:
- 70% of customers show light engagement, indicating substantial conversion potential
- 24.5% are at-risk customers requiring reactivation strategies
- Only 5.2% demonstrate strong loyalty and high engagement

5. Marketing Team Insights and Recommendations

5.1 Customer Segment Distribution and Strategic Focus

Cluster 3: Recent but Light Buyers (70% of customer base)
Customer Profile: New or infrequent customers with low purchase values
Key Insight: The majority of your customer base consists of customers who are either new to the brand or make infrequent, low-value purchases. This represents the largest opportunity for growth.
Marketing Recommendations:
- Welcome Series: Implement comprehensive onboarding campaigns for new customers
- Educational Content: Share product benefits and usage guides to build confidence
- Incentive Programs: Offer first-purchase discounts or bundle deals to encourage larger orders
- Nurture Campaigns: Develop 3 to 6 month email sequences to gradually build engagement
- Social Proof: Use testimonials and reviews to build trust with hesitant buyers
- Retargeting Ads: Focus digital advertising spend on converting these buyers into regular customers

Cluster 1: Lost and Inactive Customers (24.5% of customer base)
Customer Profile: Previously active customers who have become disengaged
Key Insight: Nearly a quarter of your customers have reduced their engagement.
Understanding why they became inactive is crucial for the retention strategy.
Marketing Recommendations:
- Win-Back Campaigns: Create a "We miss you" email series with special offers
- Feedback Surveys: Conduct research to understand the reasons for decreased engagement
- Reactivation Incentives: Offer exclusive discounts or free shipping to encourage customers to return
- Product Updates: Inform them about new products or improvements since their last purchase
- Personal Outreach: For higher-value inactive customers, consider phone calls or personalized messages
- Preference Centers: Allow customers to adjust communication frequency and content type

Cluster 0: Loyal Spenders (4.9% of customer base)
Customer Profile: Consistent, reliable customers with regular purchase patterns
Key Insight: This small but valuable segment forms the backbone of your customer base. It calls for a retention focus while pursuing expansion opportunities.
Marketing Recommendations:
- Loyalty Programs: Create tiered reward systems to recognize their commitment
- Exclusive Access: Provide early access to new products and sales
- Referral Programs: Reward them for bringing in new customers
- Upselling Opportunities: Recommend premium or complementary products
- Community Building: Create VIP customer groups or exclusive events
- Churn Prevention: Monitor purchase patterns closely and intervene if frequency decreases
- Personal Recognition: Send personalized thank-you messages and birthday offers

Cluster 2: VIP Customers (0.3% of customer base)
Customer Profile: Highest-value customers with premium purchase behavior
Key Insight: Although it represents less than 1% of customers, this segment likely contributes disproportionately to revenue and requires white-glove treatment.
Marketing Recommendations:
- Dedicated Account Management: Assign personal customer success representatives
- Premium Services: Offer concierge-level customer service and priority support
- Exclusive Products: Provide access to limited editions or pre-launch items
- VIP Events: Host exclusive customer appreciation events or experiences
- Customization Options: Offer personalized products or services
- Direct Communication: Use phone calls or personal emails rather than mass communications
- Feedback Partnership: Involve them in product development and strategy discussions

5.2 Resource Allocation Strategy
Budget Distribution Recommendations:
- 40% on Cluster 3: Focus on conversion and nurturing programs
- 30% on Cluster 1: Invest in reactivation and win-back campaigns
- 25% on Cluster 0: Maintain loyalty and encourage expansion
- 5% on Cluster 2: High-touch, personalized service investments

Campaign Priority Framework:
1. Immediate Priority: Prevent VIP customer churn (Cluster 2)
2. High Priority: Nurture recent/light buyers into regular customers (Cluster 3)
3. Medium Priority: Maintain and expand loyal customer relationships (Cluster 0)
4. Ongoing Priority: Systematically reactivate inactive customers (Cluster 1)

6. Key Business Insights

6.1 Customer Value Distribution
- High-value customers (Cluster 2) represent a small percentage of customers but likely contribute disproportionately to total revenue
- The majority of customers fall into lower-spend categories, indicating significant opportunity for upselling and cross-selling

6.2 Purchase Behavior Patterns
- Frequent shoppers (Clusters 0 and 2) demonstrate consistent engagement and may respond positively to loyalty incentives
- Occasional buyers (Cluster 3) present opportunities for nurturing and increased engagement, while inactive customers (Cluster 1) are candidates for reactivation campaigns

6.3 Revenue Optimization Opportunities
- Targeted marketing campaigns can be developed from each cluster's characteristics
- Personalized offers can be tailored to the high-frequency and high-value segments
- Customer retention strategies can be implemented for low-engagement segments

7. Business Recommendations

7.1 Marketing Strategy Implementation
- Cluster 0 (Loyal Spenders): Create loyalty programs with frequency-based rewards and early access benefits
- Cluster 1 (Lost and Inactive): Design reactivation campaigns with targeted promotions to increase purchase frequency
- Cluster 2 (VIP Customers): Develop premium customer programs with exclusive offers and personalized service
- Cluster 3 (Recent but Light Buyers): Implement value-based marketing with competitive pricing and discount strategies

7.2 Customer Experience Enhancement
- Personalize product recommendations based on cluster characteristics
- Tailor communication frequency and channels to match customer preferences
- Implement segment-specific customer service approaches

8. Future Development Opportunities

8.1 Enhanced Segmentation Models
- RFM Analysis Extension: Build on the Recency, Frequency, and Monetary dimensions already used here for a more comprehensive customer understanding
- Advanced Clustering Techniques: Explore Hierarchical Clustering and DBSCAN as alternative segmentation approaches
- Temporal Analysis: Implement time-series clustering to identify seasonal behavior patterns

8.2 Data Enrichment Possibilities
- Integrate demographic data for multi-dimensional customer profiling
- Incorporate channel preference data for omnichannel optimization
- Add product category preferences for enhanced personalization

8.3 Performance Monitoring
- Establish key performance indicators (KPIs) for each customer segment
- Retrain the model on a regular schedule so segments stay current as customer behavior changes
- Develop automated reporting for ongoing cluster performance tracking

9. Technical Implementation Notes

9.1 Tools and Technologies
- Programming Language: Python
- Data Processing: Pandas, NumPy
- Machine Learning: Scikit-learn
- Visualization: Matplotlib, Seaborn
- Scaling: StandardScaler for feature normalization

9.2 Model Parameters
- Algorithm: K-Means Clustering
- Number of Clusters: 4 (determined via the Elbow Method)
- Initialization Method: k-means++
- Maximum Iterations: 300
- Random State: Fixed for reproducibility

10. Conclusion
The customer segmentation analysis using K-Means clustering with RFM features identified four distinct customer segments with clear behavioral patterns and business implications. The RFM approach keeps the segmentation interpretable by focusing on three core dimensions of customer behavior: how recently customers purchased, how frequently they engage, and how much they spend.
Key Achievements:
- Data Consolidation: Aggregated transaction-level data into customer-level profiles for 4,339 customers
- Clear Segmentation: Identified distinct customer groups with actionable characteristics
- Business Value: Provided specific marketing recommendations for each segment
- Scalable Framework: Established a repeatable methodology for ongoing customer analysis

Strategic Impact: The RFM-based segmentation shows that 70% of customers are recent but light buyers, presenting the largest opportunity for growth through targeted nurturing campaigns. Identifying the VIP customers (0.3%) and Loyal Spenders (4.9%) enables focused retention strategies for the most valuable segments.

This analysis establishes a foundation for data-driven marketing strategies, personalized customer experiences, and business growth initiatives. Because the RFM features can be recomputed from new transaction data, the framework can be rerun as customer behavior and business conditions change.
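The elbow search and final clustering described in Section 3.5, with the settings listed in Section 9.2, can be sketched as follows. This is a minimal sketch under stated assumptions: the RFM array is synthetic and hypothetical (not the project's data), `random_state=42` stands in for the unspecified fixed seed, and `n_init=10` is assumed because the document does not list it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the customer-level RFM table (Recency, Frequency, Monetary):
# three synthetic groups, NOT the project's data.
rng = np.random.default_rng(42)
rfm_demo = np.vstack([
    rng.normal([300, 1, 200], [20, 0.5, 50], size=(50, 3)),   # lapsed, low value
    rng.normal([20, 2, 300], [10, 0.5, 50], size=(50, 3)),    # recent, light
    rng.normal([10, 20, 5000], [5, 3, 500], size=(50, 3)),    # recent, frequent, high value
])
X = StandardScaler().fit_transform(rfm_demo)


def make_kmeans(k: int) -> KMeans:
    """K-Means with the Section 9.2 settings; n_init and the seed value are assumptions."""
    return KMeans(n_clusters=k, init="k-means++", max_iter=300, n_init=10, random_state=42)


# Elbow Method: inertia (within-cluster sum of squared distances) for K = 1..10.
inertias = {k: make_kmeans(k).fit(X).inertia_ for k in range(1, 11)}

# Final model with the K chosen from the elbow curve (K=4 in this project).
labels = make_kmeans(4).fit_predict(X)
shares = np.bincount(labels, minlength=4) / len(labels) * 100  # % of customers per cluster

# Mean raw RFM values per cluster, used to name segments (e.g., "Low R, High F").
profile = (
    pd.DataFrame(rfm_demo, columns=["Recency", "Frequency", "Monetary"])
    .assign(Cluster=labels)
    .groupby("Cluster")
    .mean()
    .round(1)
)
print(profile)
```

K-Means cluster indices are arbitrary: with different data or a different seed, the same segment can receive a different ID. Naming segments from the per-cluster profile, as in Table 4.1, rather than from the numeric IDs keeps the marketing recommendations tied to the right customers.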