This project integrates a YouTube data scraper with an AI-powered processing agent. The system fetches video data through the YouTube Data API and passes it to a Google Gemini-based agent that extracts insights from it. The backend is built with Node.js (Express), and Python handles the scraping logic.

Features:
- Fetch YouTube video data based on keywords.
- Extract details like video URL, title, description, channel, keywords, views, likes, and comments.
- Process scraped data using an AI agent for further insights.
- Expose an easy-to-use API for clients.
Project structure:

```
youtube-llm-agent/
├── backend/
│   ├── app.js              # Main Express app setup
│   ├── youtubeAgent.js     # Google Gemini-based AI agent logic
│   ├── geminiConfig.js     # Configuration for Google Gemini API
│   ├── routes.js           # API route for handling requests
│   ├── package.json        # Node.js project metadata
│   ├── .env                # Environment variables (e.g., API keys)
│   └── README.md           # Documentation for backend
├── scraper/
│   ├── youtube_scraper.py  # YouTube Data API scraper logic
│   ├── requirements.txt    # Python dependencies
│   ├── .env                # Environment variables for scraper
│   └── README.md           # Documentation for scraper
└── README.md               # Root-level documentation
```
Prerequisites:

- Node.js (v18 or above)
- npm (Node Package Manager)
- Python (v3.8 or above)
- pip (Python Package Manager)
Installation:

- Clone the repository:
  ```bash
  git clone <repository-url>
  cd project
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Create a `.env` file in the root directory and add the following:
  ```
  PORT=3000
  GEMINI_API_KEY=your_google_gemini_api_key
  ```
- Navigate to the `scraper` directory:
  ```bash
  cd scraper
  ```
- Install Python dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file inside the `scraper` folder and add:
  ```
  YOUTUBE_API_KEY=your_youtube_api_key
  ```
Running the project:

- Start the backend server:
  ```bash
  node app.js
  ```
  This starts the backend server at http://localhost:3000 (matching `PORT=3000` in `.env`).
- Use Postman, curl, or any HTTP client to send POST requests to the backend.
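As a scripted alternative to Postman or curl, here is a minimal Python client sketch that uses only the standard library. It assumes that the server runs locally on port 3000, that the route is `POST /youtube` with a JSON body containing `keyword` and `maxResults`, and that successful responses have the `{"success": true, "data": [...]}` shape. The helper names `build_request` and `search_videos` are hypothetical and are not part of the project.

```python
import json
import urllib.request

# Assumed base URL: the backend started locally with PORT=3000
BASE_URL = "http://localhost:3000"


def build_request(keyword: str, max_results: int = 5, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /youtube request with a JSON body."""
    body = json.dumps({"keyword": keyword, "maxResults": max_results}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/youtube",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_videos(keyword: str, max_results: int = 5) -> list:
    """Send the request and return the "data" list from a successful response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx status codes; a 200 response
    whose body lacks "success": true is turned into a RuntimeError here.
    """
    # Generous timeout (an arbitrary choice): scraping plus AI processing can be slow
    with urllib.request.urlopen(build_request(keyword, max_results), timeout=60) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if not payload.get("success"):
        raise RuntimeError(payload.get("error", "Request failed"))
    return payload["data"]
```

With the server running, `search_videos("machine learning", 5)` returns a list of video dictionaries, and each one can be read with keys such as `"Title"` and `"Video URL"`.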
Example Request:
```http
POST /youtube
Content-Type: application/json

{
  "keyword": "machine learning",
  "maxResults": 5
}
```

Example Response:
```json
{
  "success": true,
  "data": [
    {
      "Video URL": "https://www.youtube.com/watch?v=12345",
      "Title": "Introduction to Machine Learning",
      "Description": "Learn about ML basics...",
      "Channel Title": "Tech Channel",
      "Keyword Tags": ["machine learning", "AI"],
      "Category ID": "27",
      "Published At": "2024-11-25T12:00:00Z",
      "Duration": "PT10M5S",
      "View Count": "50000",
      "Like Count": "1000",
      "Comment Count": "150"
    }
  ]
}
```

`youtubeAgent.js` integrates with Google Gemini AI to process the scraped YouTube data.
Main Functionality:
`fetchAndProcess(keyword, maxResults)`:

- Calls the Python scraper to fetch raw video data based on the given keyword.
- Sends the scraped data to Google Gemini AI for processing.
- Returns processed data to the backend.
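The excerpt below does not show how the Node.js agent calls the Python scraper. One common approach is to run the script as a child process and parse JSON from its stdout. The following sketch shows a possible command-line entry point for the Python side under that assumption. `run_cli` is a hypothetical helper, and the argument order `<keyword> [max_results]` is an assumption.

```python
import json
from typing import Callable, List, Tuple


def run_cli(argv: List[str], scrape: Callable[[str, int], list]) -> Tuple[int, str]:
    """Run the scraper for `<keyword> [max_results]` and return (exit_code, json_text).

    The JSON text is meant to be printed to stdout so the Node.js agent can parse it.
    """
    if not argv:
        return 1, json.dumps({"error": "usage: youtube_scraper.py <keyword> [max_results]"})
    keyword = argv[0]
    try:
        # Same default as youtube_api_scrape(keyword, max_results=10)
        max_results = int(argv[1]) if len(argv) > 1 else 10
        videos = scrape(keyword, max_results)
    except Exception as exc:  # surface any failure as JSON rather than a traceback
        return 1, json.dumps({"error": str(exc)})
    return 0, json.dumps(videos)
```

In `youtube_scraper.py`, a `__main__` block could call `run_cli(sys.argv[1:], youtube_api_scrape)`, print the text, and exit with the returned code. That printed line should be the only output the script writes to stdout. On the Node side, `child_process.execFile("python", ["youtube_scraper.py", keyword, String(maxResults)], callback)` could run the script and `JSON.parse` its stdout. The interpreter name and the script path are assumptions that depend on the deployment.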
Key Sections:
```javascript
const axios = require("axios");

class YoutubeAgent {
  constructor() {
    this.apiKey = process.env.GEMINI_API_KEY;
    this.endpoint = "https://gemini-api-endpoint.example.com"; // Replace with the actual endpoint
  }

  async fetchAndProcess(keyword, maxResults) {
    // Call the Python scraper (callPythonScraper is not shown in this excerpt)
    const scrapedData = await callPythonScraper(keyword, maxResults);

    // Send the scraped data to Gemini AI for processing
    const response = await axios.post(`${this.endpoint}/process`, {
      data: scrapedData,
      apiKey: this.apiKey
    });

    return response.data;
  }
}

module.exports = YoutubeAgent;
```

`youtube_scraper.py` interacts with the YouTube Data API to scrape video data.
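The scraper excerpt below elides how `video_data` is built. As an illustration only, the following sketch shows one way a single item from a `videos().list` response (requested with `part="snippet,contentDetails,statistics"`) could be flattened into the field names used in the example response. `to_video_record` and `records_from_response` are hypothetical helpers, not the project's code.

```python
from typing import Any, Dict, List


def to_video_record(video: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one videos().list item into the field names used in the example response."""
    snippet = video.get("snippet", {})
    details = video.get("contentDetails", {})
    stats = video.get("statistics", {})
    return {
        # In videos().list results, "id" is the plain video ID string
        "Video URL": f"https://www.youtube.com/watch?v={video['id']}",
        "Title": snippet.get("title"),
        "Description": snippet.get("description"),
        "Channel Title": snippet.get("channelTitle"),
        # The API omits "tags" when a video has none
        "Keyword Tags": snippet.get("tags", []),
        "Category ID": snippet.get("categoryId"),
        "Published At": snippet.get("publishedAt"),
        "Duration": details.get("duration"),  # ISO 8601 duration, e.g. "PT10M5S"
        # Counts arrive as strings; likeCount/commentCount may be absent (hidden or disabled)
        "View Count": stats.get("viewCount"),
        "Like Count": stats.get("likeCount"),
        "Comment Count": stats.get("commentCount"),
    }


def records_from_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map every item in a videos().list response to a flat record."""
    return [to_video_record(item) for item in response.get("items", [])]
```

The `id` parameter of `videos().list` accepts a comma-separated list of video IDs. An alternative to one details request per search result is therefore to collect the IDs first and pass the joined string to `records_from_response(youtube.videos().list(..., id=",".join(ids)).execute())`.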
Key Sections:
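```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed via requirements.txt
from googleapiclient.discovery import build

# Load YOUTUBE_API_KEY from the scraper's .env file
load_dotenv()
YOUTUBE_API_KEY = os.getenv("YOUTUBE_API_KEY")


def youtube_api_scrape(keyword, max_results=10):
    youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
    all_videos = []

    # API call to search for videos matching the keyword
    search_request = youtube.search().list(
        part="id,snippet",
        q=keyword,
        type="video",
        maxResults=max_results,
    )
    search_response = search_request.execute()

    for item in search_response.get("items", []):
        video_id = item["id"]["videoId"]

        # Fetch snippet, duration, and statistics for each video
        video_details_request = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=video_id,
        )
        video_details_response = video_details_request.execute()

        # Append video details to the result list
        video_data = { ... }
        all_videos.append(video_data)

    return all_videos
```

Request: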
```json
{
  "keyword": "machine learning",
  "maxResults": 5
}
```

Response:
```json
{
  "success": true,
  "data": [...]
}
```

Error handling:

- If the YouTube Data API key is missing from the `.env` file, you will see:
  ```json
  { "error": "YOUTUBE_API_KEY is not set in the .env file." }
  ```
- If the Gemini AI endpoint is unavailable, you will see:
  ```json
  { "error": "Error processing data with Gemini." }
  ```
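The first error above corresponds to a key that was never set. The following minimal sketch shows a check that could produce that exact JSON before any API call is made, assuming the `.env` file has already been loaded into the environment. `missing_key_error` is a hypothetical name, and the project's actual check is not shown in this README.

```python
import json
import os
from typing import Mapping, Optional


def missing_key_error(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the error JSON if YOUTUBE_API_KEY is unset or blank, otherwise None."""
    env = os.environ if env is None else env
    if not env.get("YOUTUBE_API_KEY", "").strip():
        return json.dumps({"error": "YOUTUBE_API_KEY is not set in the .env file."})
    return None
```

A key that is present but invalid passes this check. It fails only when a request is executed, and with google-api-python-client that failure surfaces as a `googleapiclient.errors.HttpError`.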
Future improvements:

- Add caching for frequently searched keywords.
- Integrate additional AI models for more complex video analysis.
- Develop a frontend interface for end-users.
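Caching fits this project because each `search.list` call costs 100 units of YouTube Data API quota. Caching could start as simply as an in-memory cache with a time-to-live (TTL), keyed by the normalized keyword and the result count. The sketch below is written in Python to match the scraper, and the same idea carries over to the Node backend. It assumes a single process, so entries are not shared across processes or kept across restarts. `TTLCache` and `cached_search` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire `ttl_seconds` after being set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def cached_search(cache: TTLCache, scrape: Callable[[str, int], Any],
                  keyword: str, max_results: int) -> Any:
    """Return cached results for (keyword, max_results), calling `scrape` only on a miss."""
    # Normalize so "Machine Learning" and "machine learning " share one entry
    key = (keyword.strip().lower(), max_results)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = scrape(keyword, max_results)
    cache.set(key, result)
    return result
```

The injectable `clock` makes expiry easy to test. The design assumes the scraper returns a list (possibly empty), never `None`, since a `None` result would always be treated as a miss.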
Developed by Ashish. Feel free to contribute or report issues via GitHub!