This project is an end-to-end customer shopping analysis that turns raw retail data into actionable business insights using Python-based data cleaning, SQL analytics, and an interactive Power BI dashboard.
The goal of this project is to simulate a corporate-grade, end-to-end data analytics workflow, demonstrating the ability to translate raw data into strategic business intelligence by:
✅ Data Preparation, Modeling & Exploratory Data Analysis (Python): Clean and transform the raw dataset for analysis.
✅ Data Analysis (SQL): Run analytical business queries to extract insights on customer segments, loyalty, and purchase drivers.
✅ Visualization & Insights (Power BI): Interactive dashboard highlighting key patterns and trends for data-driven decision-making.
✅ Report & Presentation: Communicate insights and recommendations in a business-focused manner.
git clone https://github.com/InfiniteLoop360/Customer_Behavior_Analysis.git
cd Customer_Behavior_Analysis

This notebook includes:
- 📥 Data Import
- 🔍 Data Exploration (EDA)
- 🧹 Data Cleaning & Feature Engineering
- 🔗 SQL Database Connection (PostgreSQL / MySQL / MS SQL Server)
- ⬆️ Loading Clean Data into SQL Database
- Create database in SQL Server
- Run Python code (from notebook) to populate the database
- Execute queries in customer_behavior_sql_queries.sql to answer business questions
- Connect Power BI to SQL Database
- Open customer_behavior_dashboard.pbix and explore the interactive visual insights
- Prepare a business report summarizing findings
- Create a presentation deck using Power BI screenshots & business impact (optionally using Gamma AI)
The final deliverable is an interactive Power BI dashboard that allows business teams to:
- Filter insights by season, location, and customer type
- Track revenue trends and shopping behavior
- Identify high-value segments for marketing
Designed for data-driven decisions and business storytelling.
| Stage | Technology | Purpose |
|---|---|---|
| Data Cleaning | Python (Pandas) | Transform & enrich raw data |
| Database | PostgreSQL | Scalable data storage & querying |
| Connector | SQLAlchemy | Seamless Python ➝ SQL integration |
| Visualization | Power BI | Interactive dashboards & insights |
This project follows a 3‑phase data pipeline:
Raw dataset: customer_shopping_behavior.csv (3,900 rows)
Key transformations performed:
- 🧹 Missing Value Imputation
  - 37 missing `review_rating` values filled using median rating per product category
- 🏷️ Column Standardization
  - Converted headers to `snake_case` for SQL compatibility
- 🧠 Feature Engineering
  - `age_group` via statistical quartiles → Young Adult / Adult / Middle‑aged / Senior
  - `purchase_frequency_days` converted into numeric values (e.g., Weekly → 7)
- 🚫 Redundant Column Removal
  - Dropped duplicate column `promo_code_used`
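The transformations listed above could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the raw column names (`Customer ID`, `Age`, `Category`, `Review Rating`, `Frequency of Purchases`, `Promo Code Used`), the helper `clean_customers`, the toy rows, and every entry of `FREQUENCY_TO_DAYS` other than `Weekly → 7` are assumptions made for the example.

```python
import pandas as pd

# Only "Weekly -> 7" is stated in this README; the other values are assumed.
FREQUENCY_TO_DAYS = {"Weekly": 7, "Monthly": 30, "Quarterly": 90}

AGE_LABELS = ["Young Adult", "Adult", "Middle-aged", "Senior"]


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a copy of the raw frame."""
    df = df.copy()

    # Column standardization: lower-case headers, non-alphanumerics -> "_".
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )

    # Missing value imputation: median rating within each product category.
    category_median = df.groupby("category")["review_rating"].transform("median")
    df["review_rating"] = df["review_rating"].fillna(category_median)

    # Feature engineering: quartile-based age groups and numeric frequency.
    df["age_group"] = pd.qcut(df["age"], q=4, labels=AGE_LABELS)
    df["purchase_frequency_days"] = df["frequency_of_purchases"].map(FREQUENCY_TO_DAYS)

    # Redundant column removal.
    return df.drop(columns=["promo_code_used"])


# Made-up rows, only to exercise the function.
raw = pd.DataFrame(
    {
        "Customer ID": [1, 2, 3, 4, 5, 6, 7, 8],
        "Age": [19, 25, 31, 38, 44, 52, 60, 68],
        "Category": ["Clothing", "Clothing", "Clothing", "Footwear",
                     "Footwear", "Footwear", "Clothing", "Footwear"],
        "Review Rating": [3.0, None, 4.0, 2.5, 4.5, None, 5.0, 3.5],
        "Frequency of Purchases": ["Weekly", "Monthly", "Weekly", "Quarterly",
                                   "Weekly", "Monthly", "Weekly", "Monthly"],
        "Promo Code Used": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    }
)
clean = clean_customers(raw)
print(clean[["customer_id", "review_rating", "age_group", "purchase_frequency_days"]])
```

Using `transform("median")` keeps the result aligned row by row with the original frame, so `fillna` only touches the missing ratings and leaves observed values unchanged.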
➡️ Final cleaned dataset loaded into PostgreSQL table: customer
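As a rough sketch of the load step, the snippet below builds a SQLAlchemy connection URL and writes a DataFrame to the `customer` table. The helper names `build_postgres_url` and `load_customers`, the default host and port, and the placeholder credentials are assumptions for illustration; the real values live in your own configuration.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host="localhost", port=5432,
                       database="customer_behaviour"):
    """Return a SQLAlchemy URL for the psycopg2 driver.

    User and password are percent-encoded so that characters such as
    "@" or "/" do not break URL parsing.
    """
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def load_customers(df, url, table="customer"):
    """Write the cleaned frame to PostgreSQL, replacing any existing table."""
    # Imported lazily so the URL helper works without SQLAlchemy installed.
    from sqlalchemy import create_engine

    engine = create_engine(url)
    df.to_sql(table, engine, if_exists="replace", index=False)


# Placeholder credentials (hypothetical); substitute your own.
url = build_postgres_url("postgres", "p@ss word")
print(url)
# load_customers(clean_df, url)  # requires a running PostgreSQL instance
```

`if_exists="replace"` makes the load repeatable during development; switching to `"append"` would be the choice if the table should accumulate rows across runs.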
Below are the core SQL queries used to derive key business insights:
SELECT gender, SUM(purchase_amount) AS revenue
FROM customer
GROUP BY gender;
SELECT customer_id, purchase_amount
FROM customer
WHERE discount_applied = 'Yes' AND purchase_amount >= (SELECT AVG(purchase_amount) FROM customer);
SELECT item_purchased, ROUND(AVG(review_rating::numeric),2) AS "Average Product Rating"
FROM customer
GROUP BY item_purchased
ORDER BY AVG(review_rating) DESC
LIMIT 5;
SELECT shipping_type,
ROUND(AVG(purchase_amount),2)
FROM customer
WHERE shipping_type IN ('Standard','Express')
GROUP BY shipping_type;
SELECT subscription_status,
COUNT(customer_id) AS total_customers,
ROUND(AVG(purchase_amount),2) AS avg_spend,
ROUND(SUM(purchase_amount),2) AS total_revenue
FROM customer
GROUP BY subscription_status
ORDER BY total_revenue DESC, avg_spend DESC;
SELECT item_purchased,
ROUND(100.0 * SUM(CASE WHEN discount_applied = 'Yes' THEN 1 ELSE 0 END) / COUNT(*),2) AS discount_rate
FROM customer
GROUP BY item_purchased
ORDER BY discount_rate DESC
LIMIT 5;
WITH customer_type AS (
SELECT customer_id, previous_purchases,
CASE
WHEN previous_purchases = 1 THEN 'New'
WHEN previous_purchases BETWEEN 2 AND 10 THEN 'Returning'
ELSE 'Loyal'
END AS customer_segment
FROM customer
)
SELECT customer_segment, COUNT(*) AS "Number of Customers"
FROM customer_type
GROUP BY customer_segment;
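To sanity-check the segmentation rule outside the database, the `CASE` expression above can be mirrored in plain Python. The function name `customer_segment` and the sample counts are made up for this sketch. Note that, as in the SQL, the fallback branch labels everything outside 1 to 10 as `Loyal`, which would include a customer with 0 previous purchases.

```python
from collections import Counter


def customer_segment(previous_purchases: int) -> str:
    """Mirror the SQL CASE: 1 -> New, 2..10 -> Returning, otherwise Loyal."""
    if previous_purchases == 1:
        return "New"
    if 2 <= previous_purchases <= 10:
        return "Returning"
    return "Loyal"  # same as the SQL ELSE branch


# Hypothetical previous_purchases values, one per customer.
sample = [1, 2, 10, 11, 50, 0]
segment_counts = Counter(customer_segment(n) for n in sample)
print(dict(segment_counts))
```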
WITH item_counts AS (
SELECT category,
item_purchased,
COUNT(customer_id) AS total_orders,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY COUNT(customer_id) DESC) AS item_rank
FROM customer
GROUP BY category, item_purchased
)
SELECT item_rank, category, item_purchased, total_orders
FROM item_counts
WHERE item_rank <= 3;
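The ranking logic of the window query above (count orders per item, rank within each category, keep the top three) can be illustrated in plain Python. This is a sketch on made-up rows, with a hypothetical helper `top_items_per_category`. One assumption differs from the SQL: ties are broken alphabetically here, whereas `ROW_NUMBER()` leaves the order of tied items unspecified.

```python
from collections import Counter, defaultdict


def top_items_per_category(rows, n=3):
    """rows: iterable of (category, item_purchased) pairs, one per order.

    Returns {category: [(item_rank, item, total_orders), ...]} with at
    most n entries per category, highest order count first.
    """
    counts = Counter(rows)  # orders per (category, item) pair

    by_category = defaultdict(list)
    for (category, item), total in counts.items():
        by_category[category].append((item, total))

    result = {}
    for category, items in by_category.items():
        # Sort by order count descending, then item name (assumed tie-break).
        ranked = sorted(items, key=lambda pair: (-pair[1], pair[0]))
        result[category] = [
            (rank, item, total)
            for rank, (item, total) in enumerate(ranked[:n], start=1)
        ]
    return result


# Made-up orders for illustration.
orders = (
    [("Clothing", "Shirt")] * 3
    + [("Clothing", "Jeans")] * 2
    + [("Clothing", "Hat"), ("Clothing", "Scarf")]
    + [("Footwear", "Boots")] * 2
    + [("Footwear", "Sandals")]
)
top3 = top_items_per_category(orders)
for category, ranking in top3.items():
    print(category, ranking)
```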
SELECT subscription_status,
COUNT(customer_id) AS repeat_buyers
FROM customer
WHERE previous_purchases > 5
GROUP BY subscription_status;
SELECT age_group,
SUM(purchase_amount) AS total_revenue
FROM customer
GROUP BY age_group
ORDER BY total_revenue DESC;
Live connection to PostgreSQL database to ensure real‑time, refreshable insights.

| Question | Insight | Value Delivered |
|---|---|---|
| Revenue by Gender | Men spent far more | Revenue focus on male audience |
| High‑Spending Discount Users | 839 customers identified | Target for profitable promo strategy |
| Top Rated Products | Gloves, Sandals, Boots... | Inventory optimization |
| Shipping Type Impact | Spend nearly identical | Express shipping not revenue driver |
| Subscriber Value | Non‑subscribers = $170k+ revenue | Big conversion opportunity |
| Discount‑Driven Products | Hat & Sneakers most influenced | Price sensitivity segmentation |
| Customer Segmentation | Loyal: 3,116 | Retention program potential |
| Top Products by Category | "Hero" products identified | Perfect for marketing campaigns |
| Repeat Buyers & Subscription | 2,518 not subscribed | Upsell campaign focus |
| Revenue by Age Group | Young Adult leads | Demographic targeting |
Strategic actions based on findings:
✅ Convert 2,518 loyal non‑subscribers → membership program, special onboarding campaigns
✅ Implement loyalty rewards for 3,100+ regular buyers to boost retention
✅ Optimize discount strategy
- Promote price‑sensitive items (Hats, Sneakers)
- Protect margins on high‑demand products
✅ Marketing campaign focus
- "Young Adult" & "Middle-aged" segments → highest spending
- Feature top performing products in ads
- Python 3.x
- PostgreSQL installed and running
- Power BI Desktop
- Required Python libraries:
pip install pandas sqlalchemy psycopg2-binary
1️⃣ Create a PostgreSQL database → customer_behaviour
2️⃣ Update credentials in main.py
3️⃣ Place customer_shopping_behavior.csv in the same folder
4️⃣ Run the pipeline:
python main.py
✅ Automatically loads cleaned data into PostgreSQL
Run queries from analysis.sql in pgAdmin or DBeaver
Open: customer_behaviour.pbix in Power BI Desktop
- Update connection path when prompted
├── main.py # Python data cleaning + PostgreSQL loader
├── analysis.sql # SQL business insights
├── customer_behaviour.pbix # Power BI dashboard file
├── customer_shopping_behavior.csv # Raw dataset
└── README.md # Project documentation
✅ A complete retail analytics system
✅ SQL‑powered decision intelligence
✅ Dashboard‑based story for stakeholders
This project showcases data engineering + analytics + visualization expertise for real‑world business value.
If you'd like to enhance or explore this project further, feel free to contribute or reach out!