diabahmed/sykell-crawler

Sykell Web Crawler Platform

Go Version Next.js React TypeScript License Docker MySQL

A full-stack web crawling platform that analyzes websites through a modern web interface. Built with a Go backend and a Next.js frontend, it offers real-time crawl status updates and detailed page analytics.

⚡ Rapid Development Achievement: This entire full-stack application was built after learning Go in just one day! It showcases the power of modern development tools, clean architecture patterns, and the effectiveness of well-structured frameworks for building robust applications quickly.

🌟 Platform Overview

Sykell is a multi-tenant web crawling platform that combines:

  • Powerful Backend: High-performance Go API with clean architecture
  • Modern Frontend: React-based dashboard with real-time updates
  • Scalable Infrastructure: Docker-containerized deployment ready for production
  • Real-time Features: WebSocket integration for live status updates
  • Comprehensive Analytics: Detailed website analysis and reporting

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Frontend (Next.js)                       │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │  Dashboard  │  │  Real-time   │  │   Authentication    │ │
│  │     UI      │  │   Updates    │  │        UI           │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                              │
                         HTTP/WebSocket
                              │
┌─────────────────────────────────────────────────────────────┐
│                    Backend API (Go)                         │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │    RESTful  │  │   WebSocket  │  │   Authentication    │ │
│  │     API     │  │     Hub      │  │   & Authorization   │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │   Crawler   │  │   Business   │  │    Data Access      │ │
│  │   Engine    │  │    Logic     │  │      Layer          │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────────────────────────────────────┐
│                    Database (MySQL)                         │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │    Users    │  │    Crawls    │  │    Audit Logs       │ │
│  │   Tables    │  │   Results    │  │   & Sessions        │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

🚀 Features

🕷️ Web Crawling

  • Comprehensive Analysis: HTML version detection, title extraction, heading structure analysis
  • Link Analysis: Internal vs. external link classification with broken link detection
  • Form Detection: Login form presence identification
  • Performance Metrics: Processing time tracking and optimization insights
  • Real-time Processing: Background job processing with live status updates
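The internal vs. external link classification listed above can be illustrated with a short Go sketch. This is not the project's crawler code: `ClassifyLink`, `LinkKind`, and the "same hostname means internal" rule are assumptions made for illustration, using only the standard `net/url` package.

```go
package main

import (
	"net/url"
	"strings"
)

// LinkKind labels a link relative to the page it was found on.
// The type and its values are hypothetical names for this sketch.
type LinkKind string

const (
	Internal LinkKind = "internal"
	External LinkKind = "external"
	Skipped  LinkKind = "skipped" // mailto:, javascript:, unparsable hrefs
)

// ClassifyLink resolves href against pageURL and compares hostnames.
// It returns the kind and the absolute URL (empty when skipped).
func ClassifyLink(pageURL, href string) (LinkKind, string) {
	base, err := url.Parse(pageURL)
	if err != nil || base.Host == "" {
		return Skipped, ""
	}
	ref, err := url.Parse(strings.TrimSpace(href))
	if err != nil {
		return Skipped, ""
	}
	abs := base.ResolveReference(ref) // turns "/about" into an absolute URL
	if abs.Scheme != "http" && abs.Scheme != "https" {
		return Skipped, ""
	}
	// Assumption: a link is internal only when the hostname matches exactly,
	// so "www.example.com" and "example.com" count as different sites.
	if strings.EqualFold(abs.Hostname(), base.Hostname()) {
		return Internal, abs.String()
	}
	return External, abs.String()
}
```

A broken-link check could then request each absolute URL and treat 4xx/5xx responses as broken. How the platform actually does this is not described here.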

👨‍💻 User Experience

  • Multi-tenant System: Complete user registration and authentication
  • Modern Dashboard: Responsive design with dark mode support
  • Real-time Updates: WebSocket integration for live crawl notifications
  • Data Visualization: Interactive tables with advanced filtering and sorting
  • Bulk Operations: Manage multiple crawls efficiently

🛠️ Technical Excellence

  • Clean Architecture: Domain-driven design with clear separation of concerns
  • Type Safety: Full TypeScript coverage across the frontend
  • Security: JWT authentication with secure session management
  • Performance: Optimized concurrent processing and caching systems
  • Scalability: Docker containerization ready for production deployment
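The README does not say how concurrent processing is implemented. One common Go pattern for this kind of work is a bounded fan-out, sketched below. `CheckAll` and its parameters are hypothetical names, and the `check` callback stands in for whatever per-URL work (for example a link status check) the crawler performs.

```go
package main

import "sync"

// CheckAll runs check on every URL with at most `limit` goroutines in
// flight and returns the results in input order.
func CheckAll(urls []string, limit int, check func(string) bool) []bool {
	if limit < 1 {
		limit = 1
	}
	results := make([]bool, len(urls))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` checks are already running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = check(u)    // each goroutine writes its own index
		}(i, u)
	}
	wg.Wait()
	return results
}
```

Capping the number of goroutines keeps a page with thousands of links from opening thousands of simultaneous connections.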

📦 Repository Structure

sykell-crawler/
├── 📁 client/                       # Next.js Frontend Application
│   ├── 📁 app/                     # Next.js App Router
│   ├── 📁 components/              # React Components
│   ├── 📁 store/                   # State Management (Zustand)
│   ├── 📁 hooks/                   # Custom React Hooks
│   ├── 📁 lib/                     # Utility Libraries
│   ├── 📁 types/                   # TypeScript Definitions
│   ├── 📁 tests/                   # E2E Tests (Playwright)
│   ├── 📄 Dockerfile               # Frontend Container Config
│   └── 📄 README.md                # Frontend Documentation
├── 📁 server/                       # Go Backend API
│   ├── 📁 cmd/api/                 # Application Entry Point
│   ├── 📁 internal/                # Private Application Code
│   │   ├── 📁 application/         # Business Logic Services
│   │   ├── 📁 domain/              # Domain Entities & Interfaces
│   │   ├── 📁 infrastructure/      # External Integrations
│   │   └── 📁 presentation/        # HTTP/WebSocket Handlers
│   ├── 📁 tests/                   # API Tests & Test Utilities
│   ├── 📄 Dockerfile               # Backend Container Config
│   └── 📄 README.md                # Backend Documentation
├── 📄 docker-compose.yml           # Multi-service Orchestration
├── 📄 .env.example                 # Environment Configuration Template
├── 📄 LICENSE                      # MIT License
└── 📄 README.md                    # This File

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose (Recommended)
  • Go 1.24.2+ (for local development)
  • Node.js 20+ (for local development)
  • MySQL 8.0 (if running locally)

🐳 Docker Deployment (Recommended)

  1. Clone the repository

    git clone https://github.com/diabahmed/sykell-crawler.git
    cd sykell-crawler
  2. Configure environment

    cp .env.example .env

    Edit .env with your configuration:

    # Database Configuration
    DB_PASSWORD=your_secure_password
    DB_NAME=web_crawler_db
    DB_SOURCE="root:your_secure_password@tcp(db:3306)/web_crawler_db?charset=utf8mb4&parseTime=True&loc=Local"
    
    # Frontend Configuration
    NEXT_PUBLIC_API_BASE_URL=http://localhost:8088/api/v1
    NEXT_PUBLIC_WS_BASE_URL=ws://localhost:8088/api/v1/ws
    
    # JWT Configuration
    TOKEN_SYMMETRIC_KEY="your_32_character_secret_key_here"
    ACCESS_TOKEN_DURATION="24h"
  3. Launch the platform

    docker-compose up --build -d
  4. Access the application
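The JWT settings from step 2 can be sanity-checked when the backend starts. The sketch below is one way to do that in Go, not the project's actual configuration loader. `TokenConfig` and `LoadTokenConfig` are hypothetical names, and the minimum key length of 32 characters is an assumption drawn only from the placeholder text in `.env`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// TokenConfig holds the JWT-related settings read from the environment.
type TokenConfig struct {
	SymmetricKey string
	AccessTTL    time.Duration
}

// LoadTokenConfig reads the settings through getenv (pass os.Getenv in a
// real program) and rejects values that would break token handling.
func LoadTokenConfig(getenv func(string) string) (TokenConfig, error) {
	key := getenv("TOKEN_SYMMETRIC_KEY")
	// Assumption: the key must be at least 32 characters long.
	if len(key) < 32 {
		return TokenConfig{}, fmt.Errorf("TOKEN_SYMMETRIC_KEY: want at least 32 characters, got %d", len(key))
	}
	// time.ParseDuration understands values such as "24h" or "15m".
	ttl, err := time.ParseDuration(getenv("ACCESS_TOKEN_DURATION"))
	if err != nil {
		return TokenConfig{}, fmt.Errorf("ACCESS_TOKEN_DURATION: %w", err)
	}
	if ttl <= 0 {
		return TokenConfig{}, errors.New("ACCESS_TOKEN_DURATION must be positive")
	}
	return TokenConfig{SymmetricKey: key, AccessTTL: ttl}, nil
}
```

Taking `getenv` as a parameter keeps the function testable without touching the real process environment.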

🔧 Local Development

For detailed local development instructions, refer to the component-specific READMEs: client/README.md (frontend) and server/README.md (backend).

📖 Documentation

Component Documentation

API Documentation

🔐 Security

Authentication & Authorization

  • JWT-based authentication with HTTP-only cookies
  • Multi-tenant user isolation
  • Secure password hashing with bcrypt
  • Session management and automatic logout
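To illustrate the HTTP-only cookie approach, here is a minimal Go sketch built on the standard `net/http` package. The cookie name `access_token`, the helper names, and the `SameSite` setting are assumptions for illustration and are not taken from the project's code.

```go
package main

import (
	"net/http"
	"time"
)

// authCookieName is a hypothetical name; the real backend may differ.
const authCookieName = "access_token"

// NewAuthCookie wraps a signed token in a cookie that page scripts cannot
// read (HttpOnly). Set secure to true when serving over HTTPS.
func NewAuthCookie(token string, ttl time.Duration, secure bool) *http.Cookie {
	return &http.Cookie{
		Name:     authCookieName,
		Value:    token,
		Path:     "/",
		MaxAge:   int(ttl.Seconds()), // lifetime in seconds
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	}
}

// ExpiredAuthCookie returns a cookie that tells the browser to delete the
// token, which is one way to implement logout.
func ExpiredAuthCookie(secure bool) *http.Cookie {
	return &http.Cookie{
		Name:     authCookieName,
		Value:    "",
		Path:     "/",
		MaxAge:   -1, // negative MaxAge deletes the cookie immediately
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	}
}
```

In a handler, the cookie would be attached with `http.SetCookie(w, NewAuthCookie(token, 24*time.Hour, true))`, where the lifetime mirrors `ACCESS_TOKEN_DURATION`.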

API Security

  • Input validation and sanitization
  • CORS configuration
  • Rate limiting capabilities
  • SQL injection prevention via ORM
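The README mentions rate limiting without describing the mechanism. As one possible shape, the sketch below implements a fixed-window limiter per client using only the Go standard library. `RateLimiter`, `NewRateLimiter`, and `Allow` are hypothetical names, and the fixed-window strategy is an assumption.

```go
package main

import (
	"sync"
	"time"
)

// window tracks how many requests a client made since start.
type window struct {
	start time.Time
	count int
}

// RateLimiter allows at most `limit` requests per client in each `every`
// interval. It is safe for concurrent use.
type RateLimiter struct {
	mu      sync.Mutex
	limit   int
	every   time.Duration
	clients map[string]*window
}

// NewRateLimiter builds a limiter with an empty client table.
func NewRateLimiter(limit int, every time.Duration) *RateLimiter {
	return &RateLimiter{limit: limit, every: every, clients: make(map[string]*window)}
}

// Allow records one request for client and reports whether it is within
// the limit for the current window.
func (r *RateLimiter) Allow(client string) bool {
	if r.limit < 1 {
		return false
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	now := time.Now()
	w, ok := r.clients[client]
	if !ok || now.Sub(w.start) >= r.every {
		// First request, or the previous window has expired: start a new one.
		r.clients[client] = &window{start: now, count: 1}
		return true
	}
	if w.count >= r.limit {
		return false
	}
	w.count++
	return true
}
```

A middleware could call `Allow` with the client IP or user ID and answer with status 429 when it returns false. A long-running service would also need to evict idle clients, which this sketch leaves out.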

Infrastructure Security

  • Container security best practices
  • Secure environment variable handling
  • Network isolation with Docker

Environment Configurations

  • Development: Local development with hot reload
  • Staging: Production-like environment for testing
  • Production: Optimized for performance and security

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

For detailed component documentation, please refer to client/README.md and server/README.md.
