YouTube Data Scraper and AI Integration

This project integrates a YouTube data scraper with an AI-powered processing agent. The system fetches video data from YouTube using the YouTube Data API and processes it using an AI agent to extract meaningful insights. The backend is built using Node.js, and Python handles the scraping logic.

Features

Fetch YouTube video data based on keywords.
Extract details like video URL, title, description, channel, keywords, views, likes, and comments.
Process scraped data using an AI agent for further insights.
Expose an easy-to-use API for clients.

Project Structure

youtube-llm-agent/
├── backend/                        
│   ├── app.js                   # Main Express app setup
│   ├── youtubeAgent.js          # Google Gemini-based AI agent logic
│   ├── geminiConfig.js          # Configuration for Google Gemini API
│   ├── routes.js                # API route for handling requests
│   ├── package.json             # Node.js project metadata
│   ├── .env                     # Environment variables (e.g., API keys)
│   └── README.md                # Documentation for backend
├── scraper/                         
│   ├── youtube_scraper.py       # YouTube Data API scraper logic
│   ├── requirements.txt         # Python dependencies
│   ├── .env                     # Environment variables for scraper
│   └── README.md                # Documentation for scraper
├── README.md                    # Root-level documentation

Prerequisites

Backend

Node.js (v18 or above)
npm (Node Package Manager)

Scraper

Python (v3.8 or above)
pip (Python Package Manager)

Installation

Clone the Repository

git clone <repository-url>
cd project

Backend Setup

Install dependencies:
```
npm install
```
Create a .env file in the root directory and add the following:
```
PORT=3000
GEMINI_API_KEY=your_google_gemini_api_key
```

Scraper Setup

Navigate to the scraper directory:
```
cd scraper
```
Install Python dependencies:
```
pip install -r requirements.txt
```
Create a .env file inside the scraper folder and add:
```
YOUTUBE_API_KEY=your_youtube_api_key
```

How to Run the Project

Step 1: Start the Backend

node app.js

This will start the backend server at http://localhost:3000.

Step 2: Make API Calls

Use Postman, curl, or any HTTP client to send POST requests to the backend.

Example Request:

POST /youtube
Content-Type: application/json

{
  "keyword": "machine learning",
  "maxResults": 5
}

Example Response:

{
  "success": true,
  "data": [
    {
      "Video URL": "https://www.youtube.com/watch?v=12345",
      "Title": "Introduction to Machine Learning",
      "Description": "Learn about ML basics...",
      "Channel Title": "Tech Channel",
      "Keyword Tags": ["machine learning", "AI"],
      "Category ID": "27",
      "Published At": "2024-11-25T12:00:00Z",
      "Duration": "PT10M5S",
      "View Count": "50000",
      "Like Count": "1000",
      "Comment Count": "150"
    }
  ]
}

How It Works

1. youtubeAgent.js

This file integrates with Google Gemini AI to process the scraped YouTube data.

Main Functionality:

fetchAndProcess(keyword, maxResults):
- Calls the Python scraper to fetch raw video data based on the given keyword.
- Sends the scraped data to Google Gemini AI for processing.
- Returns processed data to the backend.

Key Sections:

const axios = require("axios");

class YoutubeAgent {
  constructor() {
    this.apiKey = process.env.GEMINI_API_KEY;
    this.endpoint = "https://gemini-api-endpoint.example.com"; // Replace with the actual endpoint
  }

  async fetchAndProcess(keyword, maxResults) {
    // Call the Python scraper
    const scrapedData = await callPythonScraper(keyword, maxResults);

    // Process data using Gemini AI
    const response = await axios.post(`${this.endpoint}/process`, {
      data: scrapedData,
      apiKey: this.apiKey
    });

    return response.data;
  }
}

module.exports = YoutubeAgent;

2. youtube_scraper.py

This Python script interacts with the YouTube Data API to scrape video data.

Key Sections:

def youtube_api_scrape(keyword, max_results=10):
    youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
    all_videos = []

    # API Call to search videos
    search_request = youtube.search().list(
        part="id,snippet",
        q=keyword,
        type="video",
        maxResults=max_results,
    )
    search_response = search_request.execute()

    for item in search_response.get("items", []):
        video_id = item["id"]["videoId"]
        video_details_request = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            id=video_id,
        )
        video_details_response = video_details_request.execute()

        # Append video details to the result list
        video_data = { ... }
        all_videos.append(video_data)

    return all_videos

API Endpoints

POST `/youtube`

Request:

{
  "keyword": "machine learning",
  "maxResults": 5
}

Response:

{
  "success": true,
  "data": [...]
}

Error Handling

If the YouTube Data API key is invalid or missing, you will see:

{
  "error": "YOUTUBE_API_KEY is not set in the .env file."
}

If the Gemini AI endpoint is unavailable, you will see:

{
  "error": "Error processing data with Gemini."
}

Future Enhancements

Add caching for frequently searched keywords.
Integrate additional AI models for more complex video analysis.
Develop a frontend interface for end-users.

Author

Developed by Ashish. Feel free to contribute or report issues via GitHub!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

YouTube Data Scraper and AI Integration

Features

Project Structure

Prerequisites

Backend

Scraper

Installation

Clone the Repository

Backend Setup

Scraper Setup

How to Run the Project

Step 1: Start the Backend

Step 2: Make API Calls

How It Works

1. youtubeAgent.js

2. youtube_scraper.py

API Endpoints

POST `/youtube`

Error Handling

Future Enhancements

Author

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
backend		backend
scraper		scraper
README.md		README.md

agusain2001/YoutubeAgent

Folders and files

Latest commit

History

Repository files navigation

YouTube Data Scraper and AI Integration

Features

Project Structure

Prerequisites

Backend

Scraper

Installation

Clone the Repository

Backend Setup

Scraper Setup

How to Run the Project

Step 1: Start the Backend

Step 2: Make API Calls

How It Works

1. youtubeAgent.js

2. youtube_scraper.py

API Endpoints

POST /youtube

Error Handling

Future Enhancements

Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

POST `/youtube`

Packages