# AI-Powered Teammate Recommendation & Compatibility Prediction System

B.Tech AI & Data Science | 2nd Year Academic Project
- Project Overview
- AI Workflow
- Feature Engineering
- Tech Stack
- Project Structure
- Setup & Run
- API Endpoints
- Database Schema
- Viva Questions & Answers
- Future Enhancements
## Project Overview

SkillForge is an intelligent system that helps hackathon organizers and participants find the most compatible teammates. It uses:
- NLP (TF-IDF Vectorization) to convert skill descriptions into numerical vectors
- Cosine Similarity to find users with similar skill profiles
- Logistic Regression to predict team compatibility
- Feature Engineering to create structured inputs for the ML model
- Team Balance Analysis to identify skill gaps
All AI is implemented locally with scikit-learn; no external AI APIs are used.
## AI Workflow

```text
User Input (name, skills, experience, domain, interest)
                 |
                 v
+-----------------------------------+
| MODULE 1: Skill Vectorization     |
| TfidfVectorizer from sklearn      |
| Input:  "python, react, flask"    |
| Output: Sparse TF-IDF vector      |
+-----------------------------------+
                 |
                 v
+-----------------------------------+
| MODULE 2: Similarity Engine       |
| cosine_similarity from sklearn    |
| Compare target vs all users       |
| Output: Top 5 similar users       |
+-----------------------------------+
                 |
                 v
+-----------------------------------+
| MODULE 3: Feature Engineering     |
| Create structured features:       |
|  - skill_similarity (float)       |
|  - experience_difference (int)    |
|  - domain_match (binary)          |
|  - interest_match (binary)        |
+-----------------------------------+
                 |
                 v
+-----------------------------------+
| MODULE 4: Compatibility Model     |
| Logistic Regression from sklearn  |
| Input:  Feature vector [4 dims]   |
| Output: Probability (0-100%)      |
+-----------------------------------+
                 |
                 v
+-----------------------------------+
| MODULE 5: Team Balance Analyzer   |
| Maps skills -> roles              |
| Identifies covered/missing roles  |
| Calculates team strength score    |
| Suggests improvements             |
+-----------------------------------+
```
- **TF-IDF Vectorization**: Converts skill text into weighted numerical vectors. Common skills (like "python") get lower weights, while specialized skills (like "computer vision") get higher weights, so unique expertise is emphasized in matching.
- **Cosine Similarity**: Measures the angle between two TF-IDF vectors. A value of 1.0 means identical skill profiles; 0.0 means completely different. It is independent of vector magnitude, so users with different numbers of skills can still be compared fairly.
- **Feature Engineering**: Transforms raw user data into ML-ready features. We create 4 features: skill similarity score, experience level difference, domain match flag, and interest match flag.
- **Logistic Regression**: A supervised classification model that takes the 4 engineered features and predicts the probability of compatibility. It uses the sigmoid function to map linear combinations of features to probabilities.
- **Team Balance Analysis**: Categorizes skills into 5 roles (AI/ML, Frontend, Backend, Database, UI/UX) and identifies which roles are covered and which are missing.
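In code, Modules 1 and 2 reduce to a few scikit-learn calls. The sketch below is illustrative, not the project's actual implementation: the skill strings and variable names are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Skill strings for registered users; the target user is index 0
skill_docs = [
    "python, react, machine learning",
    "python, flask, sql",
    "figma, ui design, css",
]

# Module 1: convert skill text into weighted TF-IDF vectors
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(skill_docs)

# Module 2: compare the target user against all users at once
scores = cosine_similarity(tfidf_matrix[0], tfidf_matrix).flatten()

# Rank candidates by similarity, skipping index 0 (the user themselves)
ranked = scores.argsort()[::-1][1:]
print(ranked, scores)
```

The backend developer (shared "python") ranks above the designer (no shared skills), exactly the behaviour the matching engine relies on.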
## Feature Engineering

| Feature | Type | Range | Description |
|---|---|---|---|
| `skill_similarity` | Float | 0.0 - 1.0 | Cosine similarity between TF-IDF vectors |
| `experience_difference` | Integer | 0 - 2 | Absolute difference in experience levels |
| `domain_match` | Binary | 0 or 1 | Whether users share the same domain |
| `interest_match` | Binary | 0 or 1 | Whether users share the same hackathon interest |
- **Skill Similarity** (weight: 0.35): The most important factor; a higher score means the candidates' skill profiles align closely.
- **Experience Difference** (weight: 0.20): Teams with mixed experience levels (mentor + mentee) often perform well.
- **Domain Match** (weight: 0.25): People in the same domain understand each other's tools and terminology.
- **Interest Match** (weight: 0.20): Shared hackathon interests ensure higher motivation and alignment.
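Modules 3 and 4 can be sketched as follows. This is a toy stand-in for the project's actual training code: the samples are random, and the labeling rule is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Module 3: four engineered features per candidate pair
# [skill_similarity, experience_difference, domain_match, interest_match]
X = np.column_stack([
    rng.random(500),           # skill_similarity in [0, 1]
    rng.integers(0, 3, 500),   # experience_difference in {0, 1, 2}
    rng.integers(0, 2, 500),   # domain_match
    rng.integers(0, 2, 500),   # interest_match
])
# Toy label rule: "compatible" when skills are similar and domains match
y = ((X[:, 0] > 0.5) & (X[:, 2] == 1)).astype(int)

# Module 4: scale features to a common range, then fit the classifier
scaler = StandardScaler()
model = LogisticRegression().fit(scaler.fit_transform(X), y)

# Score one candidate pair: probability of the "compatible" class
pair = scaler.transform([[0.725, 1, 1, 1]])
prob = model.predict_proba(pair)[0, 1]
print(f"Compatibility: {prob * 100:.1f}%")
```

`predict_proba` is what turns the classifier into a 0-100% compatibility score rather than a hard yes/no.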
## Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 19 + TypeScript | UI Components |
| Build Tool | Vite 7 | Fast development server |
| Styling | Vanilla CSS | Dark theme, glassmorphism |
| HTTP Client | Axios | API communication |
| Backend | Flask 3.0 | REST API |
| ML Library | scikit-learn 1.3 | TF-IDF, Cosine Sim, Logistic Regression |
| Data | NumPy, Pandas | Numerical operations |
| Database | SQLite | Persistent storage |
| CORS | Flask-CORS | Cross-origin requests |
## Project Structure

```text
SkillForge/
├── backend/
│   ├── app.py                         # Flask REST API (Application Layer)
│   ├── model.py                       # AI Modules (5 ML components)
│   ├── database.py                    # SQLite operations (Data Layer)
│   ├── requirements.txt               # Python dependencies
│   └── skillforge.db                  # SQLite database (auto-created)
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── RegisterForm.tsx       # User registration form
│   │   │   ├── RecommendationCard.tsx # Teammate card with scores
│   │   │   ├── TeamBalancePanel.tsx   # Team analysis view
│   │   │   └── Dashboard.tsx          # Analytics dashboard
│   │   ├── api.ts                     # API client with TypeScript types
│   │   ├── App.tsx                    # Main application component
│   │   ├── App.css                    # Component styles
│   │   ├── main.tsx                   # React entry point
│   │   └── index.css                  # Global design system
│   ├── index.html                     # HTML template
│   ├── package.json                   # Node.js dependencies
│   ├── vite.config.ts                 # Vite configuration
│   └── tsconfig.json                  # TypeScript configuration
│
└── README.md                          # This file
```
## Setup & Run

**Prerequisites:**

- Python 3.9+ installed
- Node.js 18+ and npm installed
- VS Code (recommended)
**Backend:**

```bash
# Navigate to backend directory
cd backend

# Install Python dependencies
pip install -r requirements.txt

# Start the Flask API server
python app.py
```

The backend will start on http://localhost:5000. You should see the AI model training logs in the terminal.
**Frontend:**

```bash
# Navigate to frontend directory (in a new terminal)
cd frontend

# Install Node.js dependencies
npm install

# Start the Vite dev server
npm run dev
```

The frontend will start on http://localhost:5173.
If the default ports are in use, you can specify different ones.

Frontend (React):

```bash
cd frontend
npm run dev -- --port 3000
```

Backend (Flask): if you run the backend on a different port (e.g., 5002), update the frontend configuration by creating a `.env` file in the frontend folder:

```
VITE_API_URL=http://localhost:5002/api
```

Visit http://localhost:5173 (or your custom port) in your browser.
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | Health check |
| POST | `/api/register` | Register a new user |
| POST | `/api/analyze` | Run AI analysis for a user |
| GET | `/api/recommendations?user_id=1` | Get stored recommendations |
| GET | `/api/team-balance?user_ids=1,2,3` | Analyze team balance |
| GET | `/api/users` | List all users |
| GET | `/api/stats` | Get dashboard statistics |
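As a rough illustration of how one of these endpoints could be wired up in Flask (the real handlers live in `backend/app.py`; this sketch is not the project's actual code):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/health")
def health():
    # Simple liveness probe the frontend can call on startup
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```

Each real endpoint follows the same pattern: parse the request, call into the AI modules in `model.py` or the data layer in `database.py`, and return JSON.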
Example request:

```bash
curl -X POST http://localhost:5000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Test User",
    "skills": "python, react, machine learning",
    "experience": "intermediate",
    "domain": "ai_ml",
    "interest": "healthcare"
  }'
```

Example response:

```json
{
  "success": true,
  "recommended_teammates": [
    {
      "name": "Arjun Sharma",
      "similarity_score": 72.5,
      "compatibility_score": 85.3,
      "feature_details": {
        "skill_similarity": 0.725,
        "experience_difference": 1,
        "domain_match": true,
        "interest_match": true
      }
    }
  ],
  "missing_skills": ["UI/UX"],
  "team_strength_level": "Strong"
}
```

## Database Schema

```sql
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    skills TEXT NOT NULL,                -- Comma-separated skill string
    experience TEXT DEFAULT 'beginner',  -- beginner/intermediate/advanced
    domain TEXT DEFAULT 'general',       -- ai_ml/web_dev/mobile/etc
    interest TEXT DEFAULT 'general',     -- healthcare/fintech/edtech/etc
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE skills (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    skill_name TEXT NOT NULL,
    category TEXT DEFAULT 'general',     -- ai_ml/frontend/backend/database/ui_ux
    FOREIGN KEY (user_id) REFERENCES users(id)
);

CREATE TABLE recommendations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    recommended_user_id INTEGER NOT NULL,
    similarity_score REAL NOT NULL,
    compatibility_score REAL NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (recommended_user_id) REFERENCES users(id)
);
```

## Viva Questions & Answers

**Q: What is TF-IDF and why did you use it?**

A: TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic that measures how important a word is to a document in a collection. We use it to convert skill text into vectors. TF measures how often a skill appears, while IDF penalizes skills that are too common across all users. This gives higher weight to specialized skills like "computer vision" than to common ones like "python".
**Q: What is cosine similarity and why did you choose it over Euclidean distance?**

A: Cosine similarity measures the cosine of the angle between two vectors. Formula: cos(θ) = (A · B) / (||A|| × ||B||). For non-negative TF-IDF vectors it ranges from 0 (completely different) to 1 (identical). We chose it over Euclidean distance because it is independent of vector magnitude: a user with 3 skills can be fairly compared to one with 10 skills.
**Q: Why is Logistic Regression a good fit for this problem?**

A: Logistic Regression is ideal because: (1) It outputs probabilities (0-100%), perfect for compatibility scoring. (2) It is interpretable; we can see which features matter most via the coefficients. (3) It works well with small, well-engineered feature sets. (4) It uses the sigmoid function P = 1/(1+e^(-z)) to map linear outputs to probabilities.
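The sigmoid mapping from that answer can be verified with plain Python (no sklearn required):

```python
import math

def sigmoid(z: float) -> float:
    """Map a linear score z to a probability in (0, 1): P = 1 / (1 + e^-z)."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))    # 0.5 -- exactly on the decision boundary
print(sigmoid(4))    # ~0.982 -- strongly "compatible"
print(sigmoid(-4))   # ~0.018 -- strongly "incompatible"
```

Multiplying the output by 100 gives the percentage shown in the UI.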
**Q: What is Feature Engineering and which features did you create?**

A: Feature Engineering is the process of creating new input variables from raw data that better represent the underlying patterns. We created 4 features: skill_similarity (cosine score), experience_difference (ordinal encoding), domain_match (binary), and interest_match (binary). These structured features help the ML model make better predictions than raw data alone.
**Q: Explain the system architecture.**

A: We use a Three-Tier Architecture: (1) Presentation Layer: React frontend with dark theme UI. (2) Application Layer: Flask backend with 5 AI modules. (3) Data Layer: SQLite database with 3 tables. The frontend communicates with the backend via REST API, and the backend handles all AI processing locally using scikit-learn.
**Q: How is the model trained when there is no real user data yet?**

A: We seed the database with 15 sample users with diverse skills, domains, and experience levels. The Logistic Regression model is trained on 500 synthetic samples generated using domain knowledge about hackathon team dynamics. As real users register, the system improves naturally.
**Q: What does StandardScaler do and why is it needed?**

A: StandardScaler transforms features to have zero mean and unit variance (z-score normalization). This is essential because our features have different scales: skill_similarity ranges 0-1, while experience_difference ranges 0-2. Without scaling, features with larger values would dominate the model's learning.
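A quick numerical check of that claim (the sample values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on different scales:
# skill_similarity (0-1) and experience_difference (0-2)
X = np.array([[0.2, 0], [0.5, 1], [0.8, 2]], dtype=float)

scaled = StandardScaler().fit_transform(X)
print(scaled.mean(axis=0))  # ~[0, 0]: zero mean per feature
print(scaled.std(axis=0))   # ~[1, 1]: unit variance per feature
```

After scaling, both columns contribute on equal footing to the logistic regression.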
**Q: Which evaluation metrics would you use for the classifier?**

A: For our binary compatibility classifier, we would use: Accuracy (overall correctness), Precision (of those predicted compatible, how many actually are), Recall (of actually compatible pairs, how many did we find), and F1-Score (harmonic mean of precision and recall). The training accuracy logged on startup gives initial confidence.
**Q: How does the Team Balance Analyzer work?**

A: It maps each team member's skills to 5 predefined roles (AI/ML, Frontend, Backend, Database, UI/UX) using keyword matching. It then calculates: covered roles (at least one member), missing roles (no members), coverage percentage per role, overall team strength (Strong/Moderate/Developing/Weak), and generates specific improvement suggestions.
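The role-mapping step can be sketched with simple keyword matching. The keyword lists and thresholds below are examples, not the project's exact configuration:

```python
# Example keyword lists per role (illustrative, not the project's exact lists)
ROLE_KEYWORDS = {
    "AI/ML": {"machine learning", "tensorflow", "pytorch", "nlp"},
    "Frontend": {"react", "css", "javascript", "html"},
    "Backend": {"flask", "django", "node", "api"},
    "Database": {"sql", "sqlite", "mongodb", "postgresql"},
    "UI/UX": {"figma", "ui design", "wireframing"},
}

def analyze_team(member_skills: list) -> dict:
    """Return which roles a team covers and which are missing."""
    covered = set()
    for skills in member_skills:
        tokens = {s.strip().lower() for s in skills.split(",")}
        for role, keywords in ROLE_KEYWORDS.items():
            if tokens & keywords:  # any overlap covers the role
                covered.add(role)
    missing = sorted(set(ROLE_KEYWORDS) - covered)
    return {
        "covered": sorted(covered),
        "missing": missing,
        "strength": "Strong" if len(covered) >= 4 else "Developing",
    }

team = ["python, machine learning, flask", "react, css", "sql, sqlite"]
print(analyze_team(team))
```

For this sample team, AI/ML, Backend, Frontend, and Database are covered, and UI/UX is flagged as the missing role, matching the `missing_skills` field in the API response above.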
**Q: How is this better than simple keyword matching?**

A: Keyword matching only checks for exact matches. Our system uses TF-IDF, which assigns importance weights; cosine similarity, which measures closeness of whole skill profiles; and an ML model that considers multiple factors simultaneously (skills, experience, domain, interests). This produces much more meaningful recommendations.
## Future Enhancements

- **Deep Learning**: Replace Logistic Regression with a neural network to capture more complex compatibility patterns.
- **Collaborative Filtering**: Use user interaction data (who worked well together) to improve recommendations.
- **Real-time Chat**: Add WebSocket-based messaging for matched teammates.
- **Skill Embeddings**: Use Word2Vec or BERT embeddings instead of TF-IDF for richer skill representations.
- **User Feedback Loop**: Allow users to rate recommendations, creating labeled training data for model improvement.
- **Graph-based Recommendations**: Build a skill graph to capture relationships between skills (e.g., "Python" → "Flask" → "REST API").
- **Resume Parsing**: Auto-extract skills from uploaded resumes using NLP.
- **Deployment**: Deploy on AWS/Heroku with a production database (PostgreSQL).
- **Authentication**: Add JWT-based user authentication and session management.
- **Analytics Dashboard**: Advanced visualizations with D3.js showing skill networks and team dynamics.
This project was developed for academic purposes as part of the B.Tech AI & Data Science curriculum.

Built with ❤️ using Python, React, and scikit-learn.