- 🧩 Overview
- 🚀 Features
- 📂 Repository Structure
- 📝 Requirements
- 🧠 Setup and Usage
- 🧮 Model Details
- 🧭 Limitations and Future Work
- 📬 Contact
This project implements a hybrid recommendation system designed to enhance learning experiences by suggesting personalized educational content.
It combines content-based filtering (leveraging metadata like titles, domains, and difficulty levels) with collaborative filtering (using user interaction patterns) to recommend relevant materials such as videos, articles, quizzes, and case studies.
The system aims to improve retention, engagement, and content relevance by factoring in user preferences, seniority, and interaction history.
Datasets are synthetically generated to simulate real-world conditions — including user profiles, content items, and engagement logs.
The core implementation is done in a Jupyter notebook, supported by Python scripts for data generation.
- **Personalized Recommendations**: Tailors suggestions based on user roles (e.g., students, juniors, seniors), learning styles (visual, auditory, etc.), and past interactions.
- **Hybrid Model**:
  - Content-based: Uses TF-IDF vectorization on content titles and categorical matching for domains/subtopics.
  - Collaborative: Employs SVD (Singular Value Decomposition) to uncover latent user-content interaction patterns.
- **Data Simulation**: Generates realistic datasets with skewed distributions to mimic real-world learning behaviors (e.g., more beginner-level content, casual vs. power users).
- **Evaluation Metrics**: Implements Precision@K, Recall@K, and NDCG@K to evaluate recommendation quality.
- **Data Insights**: Extracts correlations, engagement trends, and content imbalances to refine the model.
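The content-based side described above can be sketched in a few lines; the toy titles below are illustrative, not the project's actual data:

```python
# Minimal sketch of the content-based recommender: TF-IDF over titles
# plus cosine similarity. Titles and column names here are examples only.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

content = pd.DataFrame({
    "content_id": [1, 2, 3],
    "title": [
        "Intro to Data Science",
        "Advanced Data Science Pipelines",
        "Public Speaking Basics",
    ],
})

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(content["title"])
sim = cosine_similarity(matrix)  # pairwise title similarity

# Items most similar to the first item (excluding itself)
ranked = sim[0].argsort()[::-1][1:]
print(content.iloc[ranked]["title"].tolist())
```

Items sharing title terms (here, "data science") score high; unrelated items score near zero.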
```
├── createusers.py              # Generates users.csv with profiles & seniority
├── create_datasets.py          # Generates content.csv & engagements.csv
├── recommedation_engine.ipynb  # Core notebook for model building & evaluation
├── Presentation.pdf            # Project overview slides (for reference)
└── README.md                   # Documentation
```
- Python: 3.8+
- Libraries:
```
pip install pandas numpy matplotlib seaborn scikit-learn scikit-surprise faker
```
Note: The notebook pins `numpy<2` for compatibility with `surprise`.
Run the following commands in order:
```
python createusers.py
python create_datasets.py
```
Description of Scripts:
- createusers.py → Generates users.csv containing 100,000 users.
- create_datasets.py → Generates:
- content.csv with 10,000 items
- engagements.csv with 3,000,000 interactions
These scripts leverage:
- Faker → For realistic names and domains
- NumPy → For skewed statistical distributions (e.g., durations, user selection probabilities)
Open recommedation_engine.ipynb in Jupyter Notebook or Google Colab.
Steps inside the notebook:
- Load and explore datasets
- Build a content-based recommender (TF-IDF + cosine similarity)
- Train SVD model on engagement scores (0–10 scale)
- Combine into a hybrid model using a tunable α (e.g., α = 0.7 for collaborative emphasis)
- Generate recommendations for a sample user
- Evaluate model using offline metrics
Sample top-5 recommendations:
| content_id | title | predicted_score |
|---|---|---|
| 9621 | Vision-oriented regional toolset | 7.042 |
| 5206 | Open-architected contextually-based approach | 7.029 |
| 7310 | Adaptive modular knowledge flow | 6.982 |
| 1982 | Cross-domain learner interface | 6.951 |
| 8427 | Progressive knowledge ecosystem | 6.923 |
The notebook includes helper functions to compute the following metrics:
- Precision@5
- Recall@5
- NDCG@5
Note: Run evaluations on a subset of users for efficiency. Full evaluation may take longer due to the large dataset size.
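One possible implementation of these metrics; the notebook's helper functions may differ in detail:

```python
# Standard top-K ranking metrics; `recommended` is an ordered list of
# item IDs, `relevant` the user's ground-truth items.
import numpy as np

def precision_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    relevant = set(relevant)
    dcg = sum(1.0 / np.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

recs = [9621, 5206, 7310, 1982, 8427]   # ranked recommendations
truth = [5206, 1982, 1111]              # hypothetical relevant items
print(precision_at_k(recs, truth, 5))   # 2 hits out of 5 -> 0.4
print(recall_at_k(recs, truth, 5))      # 2 of 3 relevant found
```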
| Column | Description |
|---|---|
| user_id | Unique user identifier |
| title | Job title (e.g., Junior Data Analyst) |
| department | Department name |
| seniority_level | Role level (Student → Lead) |
| learning_style | Visual, Auditory, or Kinesthetic |
| Column | Description |
|---|---|
| content_id | Unique content ID |
| title | Content title |
| domain | High-level domain (e.g., Data Science) |
| subtopic | Specific sub-area |
| difficulty_level | Beginner, Intermediate, Advanced |
| content_type | Video, Article, Quiz, etc. |
| Column | Description |
|---|---|
| user_id | Linked to Users |
| content_id | Linked to Content |
| timestamp | Interaction timestamp |
| duration_seconds | Time spent |
| liked | Boolean or null |
| engagement_type | Viewed, Completed, etc. |
- Engagements are simulated over a 1-year span.
- Scores blend implicit feedback (e.g., duration) and explicit feedback (e.g., likes).
- Addresses data sparsity by focusing on key interaction signals, such as recency and depth.
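A hedged sketch of how a 0-10 engagement score could blend implicit and explicit feedback; the weights and duration cap are assumptions, not the notebook's exact formula:

```python
# Illustrative engagement scoring: watch depth (implicit) weighted
# more heavily than likes (explicit). Weights are assumed, not canonical.
def engagement_score(duration_seconds, liked, max_duration=1800):
    implicit = min(duration_seconds / max_duration, 1.0)  # 0..1 watch depth
    explicit = 1.0 if liked else 0.0                      # None/False -> 0
    return round(7.0 * implicit + 3.0 * explicit, 2)      # 0-10 scale

print(engagement_score(1800, True))   # fully watched + liked -> 10.0
print(engagement_score(900, None))    # half watched, no rating -> 3.5
```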
The final recommendation score is computed as:
Final Score = α × Collaborative + (1 - α) × Content-Based
Tuning α based on use case:
- Higher α → Favors collaborative filtering (better for returning users with rich histories)
- Lower α → Favors content similarity (better for new or cold-start users)
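The blend above maps directly to code; the example scores are made up and both inputs are assumed to already be on the same scale:

```python
# Final Score = alpha * collaborative + (1 - alpha) * content-based
def hybrid_score(collab, content_based, alpha=0.7):
    return alpha * collab + (1 - alpha) * content_based

# Returning user: lean on the collaborative signal
print(hybrid_score(7.0, 5.0, alpha=0.7))  # 6.4
# New user: lean on content similarity
print(hybrid_score(7.0, 5.0, alpha=0.2))  # 5.4
```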
- Feature Importance: Titles dominate TF-IDF similarity; domains/subtopics enhance diversity.
- User Behavior: Casual users (1–5 interactions) dominate, but power users drive content trends.
- Content Distribution: Skewed toward beginner videos; advanced materials less frequent.
- Challenges Addressed: Mitigates cold-start via metadata, balances personalization vs. diversity.
- Synthetic data: Replace with real engagement logs for production.
- Scalability: Optimize SVD or migrate to distributed engines (e.g., Spark ALS).
- Enhancements:
- Integrate deeper learning-style modeling.
- Add real-time recommendation updates.
- Explore deep learning (e.g., Neural Collaborative Filtering).
- Evaluation: Current metrics are offline — A/B testing recommended for live platforms.
For questions or collaboration, reach out via
This project demonstrates a scalable and interpretable approach to educational recommender systems, adaptable for e-learning platforms like Coursera, Udemy, or internal corporate learning ecosystems.