Alvin Hans
Case Study 02

Customer Intelligence & Segmentation Platform

A full-stack analytics application turning 8.5M+ transactions into automated marketing intelligence.

Data Volume: 8.5M+
User Profiles: 2M+
Catalog Scale: 70K+

The Strategic Problem

The Context

"The business struggled to operationalize segmentation. Static dashboarding wasn't enough; marketers needed the ability to spin up clustering models dynamically without waiting for Data Science."

Marketing teams had to file a data request for every campaign. We needed an in-memory analytics engine that could run live K-Means clustering and Market Basket Analysis without requiring marketers to write code.

  • The Trade-off Map: Operational overhead was the binding constraint, so practical reliability took priority over pure theoretical accuracy.
  • Constraint 01: Non-Technical End Users: The UI had to abstract away algorithm tuning entirely, with Silhouette Scores evaluated strictly behind the scenes.
  • Constraint 02: CPU-Aware Processing: Interactive Streamlit apps choke on raw transaction-level data, so we pushed heavy aggregations down to ClickHouse and kept only the compact feature arrays needed for modeling in memory.

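
A minimal sketch of the push-down idea, assuming a hypothetical ClickHouse table `transactions` with `customer_id`, `order_id`, `order_ts` and `amount` columns (the real schema is not described here). The GROUP BY runs inside ClickHouse, so only one RFM row per customer reaches Python; `run_query` is a hypothetical stand-in for whatever client call actually executes the SQL.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical schema: transactions(customer_id, order_id, order_ts DateTime, amount Float64).
# The whole GROUP BY executes inside ClickHouse; Python only sees one row per customer.
RFM_SQL = """
SELECT
    customer_id,
    dateDiff('day', max(order_ts), now()) AS recency_days,
    uniqExact(order_id)                   AS frequency,
    sum(amount)                           AS monetary
FROM transactions
GROUP BY customer_id
"""


def fetch_rfm_matrix(
    run_query: Callable[[str], Sequence[Sequence]],
) -> tuple[np.ndarray, np.ndarray]:
    """Execute the server-side aggregation and return (customer_ids, rfm_features)."""
    rows = run_query(RFM_SQL)
    ids = np.array([row[0] for row in rows])
    # reshape keeps a (0, 3) shape even when the query returns no rows.
    features = np.array([row[1:4] for row in rows], dtype=np.float64).reshape(-1, 3)
    return ids, features
```

With the clickhouse-connect driver, for instance, `run_query` could be `lambda sql: client.query(sql).result_rows`; injecting the callable also keeps the function testable without a live database.
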
System Blueprints

The Diamond Centerpiece

Architecture Map: Multi-stage pipeline (Data Engine → ML Layer → Interactive UI Surface) optimized for throughput and relevance.

Technical Rationale

Core Approach

Built a stateful Streamlit application wrapping Scikit-learn and Apriori modeling, tapping directly into ClickHouse to aggregate millions of receipts on the fly.

Outcome

Provided an interactive A/B testing simulator and an ABC-XYZ product matrix, shifting the business from reactive reporting to proactive strategy.

Engine

ClickHouse columnar database for fast, on-the-fly aggregation of 8.5M+ retail interactions.

Modeling

Scikit-learn K-Means on RFM features, plus Apriori rule mining for Market Basket Analysis.
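
The RFM clustering step might look roughly like the sketch below. The log transform and `StandardScaler` preprocessing, the `n_clusters=4` default and the `segment_rfm` name are illustrative assumptions, not details confirmed by the project; the input is assumed to be the per-customer (recency, frequency, monetary) array produced by the aggregation layer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_rfm(rfm: np.ndarray, n_clusters: int = 4, seed: int = 42):
    """Cluster customers on non-negative (recency_days, frequency, monetary) columns.

    Returns (labels, centroids), with centroids mapped back to the original units.
    """
    # RFM values are right-skewed; log1p keeps a few very large customers
    # from dominating the Euclidean distances K-Means relies on.
    logged = np.log1p(np.asarray(rfm, dtype=np.float64))
    scaler = StandardScaler()
    z = scaler.fit_transform(logged)  # put all three features on a comparable scale
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(z)
    # Undo the scaling, then the log transform, so centroids read as days / orders / revenue.
    centroids = np.expm1(scaler.inverse_transform(model.cluster_centers_))
    return labels, centroids
```

Returning centroids in original units lets the dashboard label segments in business terms (for example "recent, frequent, high spend") instead of exposing z-scores.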

Evaluation Metrics

Quantitative Validation

Observation 01

Cohort Automation: Replaces multi-day ad-hoc SQL requests with instant, UI-driven segment discovery.

Observation 02

Bundle Optimization: Dynamically identifies cross-selling pairs across 70K+ SKUs, scored by Support and Lift.
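
Support and Lift for item pairs can be computed as in this pure-Python sketch, which covers only the 2-itemset level of Apriori: prune infrequent single items, then count the surviving pairs. Support is the share of baskets containing both items; Lift divides that by the product of the two items' individual supports. `pair_rules` and its `min_support` default are hypothetical, and a run over 70K+ SKUs would more likely rely on an optimized library implementation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.01):
    """Return (item_a, item_b, support, lift) tuples, highest lift first."""
    n = len(baskets)
    baskets = [set(basket) for basket in baskets]
    item_counts = Counter(item for basket in baskets for item in basket)
    # Apriori property: a pair can only be frequent if both of its items are.
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        pair_counts.update(combinations(sorted(basket & frequent), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Lift > 1: the pair co-occurs more often than if purchases were independent.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        rules.append((a, b, support, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)
```
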

Observation 03

Strategic Matrix: Extends customer analysis into product assessment via a real-time ABC-XYZ matrix.
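
One common way to build an ABC-XYZ matrix is sketched below, assuming a long-format frame with hypothetical `sku`, `period`, `revenue` and `units` columns. ABC ranks SKUs by cumulative revenue share and XYZ by the coefficient of variation of per-period demand; the 80%/95% and 0.5/1.0 cut-offs are conventional defaults, not thresholds taken from the project.

```python
import numpy as np
import pandas as pd


def abc_xyz(sales: pd.DataFrame) -> pd.DataFrame:
    """Classify SKUs from a long frame with sku, period, revenue and units columns."""
    revenue = sales.groupby("sku")["revenue"].sum().sort_values(ascending=False)
    # Revenue share held by SKUs ranked above this one, so the top seller is always A.
    prior_share = (revenue.cumsum() - revenue) / revenue.sum()
    abc = prior_share.map(lambda s: "A" if s < 0.80 else ("B" if s < 0.95 else "C"))

    # Units per SKU per period; periods with no rows count as zero demand.
    units = sales.pivot_table(index="sku", columns="period", values="units",
                              aggfunc="sum", fill_value=0)
    # Coefficient of variation of demand; SKUs with zero total demand get inf (-> Z).
    cv = (units.std(axis=1, ddof=0) / units.mean(axis=1)).reindex(revenue.index)
    cv = cv.fillna(np.inf)
    xyz = cv.map(lambda v: "X" if v <= 0.5 else ("Y" if v <= 1.0 else "Z"))

    return pd.DataFrame({"abc": abc, "xyz": xyz, "class": abc + xyz})
```

An "AX" SKU (high revenue, steady demand) suits tight automated replenishment, while a "CZ" SKU (low revenue, erratic demand) is a candidate for review.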

Delivery & Reflections

Automated Model Selection: The system automatically compares K-Means and MiniBatch K-Means configurations by Silhouette Score and serves the winning configuration to the dashboard without user intervention.
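
A sketch of the selection loop, assuming the candidate grid is scikit-learn's `KMeans` and `MiniBatchKMeans` over a range of k (the actual search space is not stated). `silhouette_score` is computed on a random sample because the full score scales quadratically with the number of rows; `select_clustering` and the `sample_size` default are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.metrics import silhouette_score


def select_clustering(X, k_values=range(2, 9), seed=0, sample_size=10_000):
    """Fit every (algorithm, k) candidate and keep the one with the best silhouette."""
    X = np.asarray(X, dtype=np.float64)
    best = None
    for name, algo in (("kmeans", KMeans), ("minibatch", MiniBatchKMeans)):
        for k in k_values:
            model = algo(n_clusters=k, n_init=3, random_state=seed)
            labels = model.fit_predict(X)
            if len(np.unique(labels)) < 2:
                continue  # silhouette is undefined for a single cluster
            # Full silhouette is O(n^2) in rows; a random sample keeps the UI responsive.
            score = silhouette_score(X, labels, sample_size=min(sample_size, len(X)),
                                     random_state=seed)
            if best is None or score > best["silhouette"]:
                best = {"algorithm": name, "k": k, "silhouette": score, "model": model}
    return best
```

Only the returned `model` and its labels need to reach the UI, which is what lets the tuning stay invisible to non-technical users.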

In-Memory State Management: Engineered robust Streamlit caching and session keys to persist complex filtering logic across multiple application pages.

Business-Driven Output: Moved beyond 'clustering novelty' by building a dedicated A/B testing simulator that projects concrete revenue impact.
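
The simulator's core arithmetic could be as simple as the sketch below: projected incremental revenue from an assumed relative uplift, plus the per-arm sample size a two-proportion z-test would need to detect it at the chosen significance and power. The function name, its parameters and the choice of this textbook sample-size formula are assumptions; the original simulator's model is not described.

```python
from math import ceil, sqrt
from statistics import NormalDist


def project_ab(audience, base_rate, uplift, avg_order_value, alpha=0.05, power=0.8):
    """Project campaign revenue impact and the per-arm sample size needed to detect it.

    `uplift` is relative (0.10 means +10% conversion). Sample size uses the
    standard two-proportion z-test formula without continuity correction.
    """
    treated_rate = base_rate * (1 + uplift)
    if treated_rate == base_rate:
        raise ValueError("uplift and base_rate must both be non-zero")
    incremental_revenue = audience * (treated_rate - base_rate) * avg_order_value
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (base_rate + treated_rate) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(base_rate * (1 - base_rate) + treated_rate * (1 - treated_rate))) ** 2
         / (treated_rate - base_rate) ** 2)
    return {"incremental_revenue": incremental_revenue, "n_per_arm": ceil(n)}
```

For example, `project_ab(audience=100_000, base_rate=0.05, uplift=0.10, avg_order_value=40.0)` projects about 20,000 in incremental revenue (in the order-value currency) and needs roughly 31,200 customers per arm, which tells a marketer up front whether a segment is large enough to test.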
