Module 3 · Statistical Visualization with Seaborn

Section 4: Heatmaps and Correlation Analysis

Correlation Matrix · Heatmaps · Cluster Maps · Pivot Heatmaps · Interpretation

🌡️ Correlation Matrix
  • df.corr(): Pearson by default; pass method="spearman" or method="kendall" for rank-based correlation
  • Values range from −1 (perfect negative) to +1 (perfect positive); 0 means no linear relationship
  • Diagonal is always 1.0 (self-correlation)
  • The matrix is symmetric, so mask the upper triangle to avoid redundant information
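
The points above can be sketched on a small synthetic dataset. The column names (ad_spend, revenue, noise) and the generated numbers are made up for illustration; only df.corr(), its method argument, and the NumPy triangle mask come from the material itself.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption, for illustration only):
# revenue depends on ad_spend, noise is unrelated to both.
rng = np.random.default_rng(0)
spend = rng.normal(100, 20, 200)
df = pd.DataFrame({
    "ad_spend": spend,
    "revenue": 3 * spend + rng.normal(0, 30, 200),
    "noise": rng.normal(0, 1, 200),
})

pearson = df.corr()                    # Pearson is the default method
spearman = df.corr(method="spearman")  # rank-based alternative

# Boolean mask that is True on and above the diagonal (the cells to hide).
mask = np.triu(np.ones_like(pearson, dtype=bool))

# DataFrame.mask replaces the True cells with NaN, leaving the lower triangle.
lower = pearson.mask(mask)
print(lower.round(2))
```

Passing k=1 to np.triu would keep the diagonal visible and hide only the strictly upper triangle.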
🔥 Heatmaps
  • sns.heatmap(df.corr(), annot=True, fmt=".2f")
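  • Colormaps: coolwarm and RdBu_r are diverging (suited to correlations); viridis is sequential (suited to one-directional values)
  • center=0 centers a diverging palette at zero; vmin=-1, vmax=1 fix the scale to the full correlation range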
  • mask parameter hides upper or lower triangle
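
A minimal sketch of these heatmap options together, assuming synthetic data with hypothetical column names (price, units, visits, returns) and an arbitrary output filename. The Agg backend line is only there so the script runs without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): build in one negative and one positive relationship.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(150, 4)),
                  columns=["price", "units", "visits", "returns"])
df["units"] -= 0.8 * df["price"]
df["visits"] += 0.6 * df["units"]

corr = df.corr()

# Hide the strictly upper triangle (k=1 keeps the diagonal visible).
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    mask=mask,
    annot=True, fmt=".2f",   # print each coefficient with two decimals
    cmap="coolwarm",         # diverging palette
    center=0, vmin=-1, vmax=1,
    square=True,
    ax=ax,
)
ax.set_title("Correlation heatmap (synthetic data)")
fig.tight_layout()
fig.savefig("corr_heatmap.png")  # arbitrary filename for this sketch
```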
🧩 Cluster Maps
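  • sns.clustermap(df.corr()): hierarchical clustering of rows and columns (requires SciPy)
  • On a correlation matrix it groups similar variables; pass the (scaled) data itself to also group similar observations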
  • Reveals natural structure hidden in flat heatmaps
  • Useful for customer segmentation and feature grouping
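
As a rough illustration of the clustering step, the sketch below builds six synthetic variables from two hidden factors (all names and numbers are assumptions) and checks that the cluster map places variables driven by the same factor next to each other. Because the input is a correlation matrix, this groups variables only, not observations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): columns a* share factor f1, columns b* share factor f2.
rng = np.random.default_rng(2)
n = 300
f1, f2 = rng.normal(size=(2, n))
df = pd.DataFrame({
    "a1": f1 + rng.normal(0, 0.3, n),
    "b1": f2 + rng.normal(0, 0.3, n),
    "a2": f1 + rng.normal(0, 0.3, n),
    "b2": f2 + rng.normal(0, 0.3, n),
    "a3": f1 + rng.normal(0, 0.3, n),
    "b3": f2 + rng.normal(0, 0.3, n),
})

# clustermap reorders rows and columns so that similar ones sit together.
g = sns.clustermap(df.corr(), cmap="RdBu_r", center=0, vmin=-1, vmax=1,
                   figsize=(6, 6))

# Row order chosen by the hierarchical clustering.
order = [df.columns[i] for i in g.dendrogram_row.reordered_ind]
print(order)

g.savefig("corr_clustermap.png")  # arbitrary filename for this sketch
```

In the printed order, the three a-columns and the three b-columns should each appear as a contiguous block, even though they were interleaved in the input.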
📋 Pivot Table Heatmaps
  • df.pivot_table(values, index, columns, aggfunc)
  • Visualize performance across two categorical dimensions
  • E.g., revenue by region × product category
  • Highlight cells above or below a reference value (e.g., the average) with a diverging colormap centered on it
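
A small sketch of the region × product category example. The sales records, column names, and output filename are invented for illustration; centering the palette on the overall mean is one possible choice of reference value, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up transaction records (assumption, for illustration only).
sales = pd.DataFrame({
    "region":   ["North", "North", "North", "South", "South", "South", "West", "West"],
    "category": ["Hardware", "Software", "Hardware", "Hardware",
                 "Software", "Software", "Hardware", "Software"],
    "revenue":  [120.0, 80.0, 30.0, 60.0, 90.0, 40.0, 150.0, 70.0],
})

# Rows = region, columns = category, cells = total revenue.
pivot = sales.pivot_table(values="revenue", index="region",
                          columns="category", aggfunc="sum")

# Center the diverging palette on the average cell value so that
# above-average and below-average cells get opposite colors.
center = pivot.to_numpy().mean()

fig, ax = plt.subplots(figsize=(5, 3))
sns.heatmap(pivot, annot=True, fmt=".0f", cmap="RdBu_r", center=center, ax=ax)
ax.set_title("Revenue by region and category (made-up data)")
fig.tight_layout()
fig.savefig("pivot_heatmap.png")  # arbitrary filename for this sketch
```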
💡 Interpreting Correlations
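  • Rule of thumb: |r| > 0.7 = strong; 0.4–0.7 = moderate; < 0.4 = weak (thresholds vary by field)
  • Multicollinearity: highly correlated predictors make regression coefficients unstable and hard to interpret
  • Confounding variables can create or inflate apparent correlations between otherwise unrelated variables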
  • Always validate with domain knowledge and scatter plots
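
These interpretation rules can be sketched in code. The strength and residuals helpers, the column names, and the data are hypothetical; the thresholds are the rule-of-thumb values listed above. The last step removes the linear effect of a known confounder from two variables and re-checks their correlation, which is one simple way (not the only one) to probe confounding.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def strength(r: float) -> str:
    """Label a correlation using the rule-of-thumb thresholds on |r|."""
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a >= 0.4:
        return "moderate"
    return "weak"

# Synthetic data (assumption): temperature drives both sales and sunburn cases.
rng = np.random.default_rng(3)
n = 500
temperature = rng.normal(25, 5, n)
df = pd.DataFrame({
    "temperature": temperature,
    "ice_cream_sales": 10 * temperature + rng.normal(0, 15, n),
    "sunburn_cases": 2 * temperature + rng.normal(0, 3, n),
    "web_visits": rng.normal(1000, 100, n),
})
corr = df.corr()

# Flag strongly correlated pairs (candidates for multicollinearity).
flagged = [(a, b, corr.loc[a, b])
           for a, b in combinations(corr.columns, 2)
           if strength(corr.loc[a, b]) == "strong"]
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f} ({strength(r)})")

def residuals(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation between sales and sunburn after removing the temperature effect.
t = df["temperature"].to_numpy()
partial = np.corrcoef(
    residuals(df["ice_cream_sales"].to_numpy(), t),
    residuals(df["sunburn_cases"].to_numpy(), t),
)[0, 1]
print(f"sales vs sunburn, controlling for temperature: r = {partial:.2f}")
```

Here ice_cream_sales and sunburn_cases are strongly correlated only because both follow temperature; once that shared driver is removed, the remaining correlation should be close to zero.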
🧪 Lab 4 — Heatmap Analysis
  • Part A: Comprehensive correlation heatmap of business metrics
  • Part B: Revenue driver deep dive with scatter validation
  • Part C: Marketing performance pivot table heatmap
  • Annotate key correlations with business interpretations