An autonomous infrastructure health-monitoring and diagnostic system powered by AI Agents and n8n.
- Situation: Modern Data Centers generate millions of telemetry points, leading to extreme alert fatigue. Critical infrastructure failures were often buried under thousands of low-priority logs, delaying response times and increasing operational risks.
- Task: To architect an autonomous system capable of filtering telemetry noise, performing real-time Root Cause Analysis (RCA), and dispatching actionable intelligence to on-site engineers—all while ensuring the environment is portable and scalable.
- Action: I orchestrated a multi-layered AIOps solution containerized with Docker for seamless deployment. I used n8n as the central engine, integrated PostgreSQL for time-series data, and leveraged OpenAI's GPT-4o to serve as an "AI Diagnostic Engineer" that contextually analyzes alerts and generates remediation plans.
- Result: Developed a production-ready framework that reduces initial diagnostic time from minutes to seconds, providing high-fidelity incident reports with 100% automated RCA coverage for critical failures.
The system operates in three distinct layers to ensure reliability and intelligence:
- Ingestion Layer: Real-time metrics (CPU, Temp, Disk, UPS) are streamed into a PostgreSQL database.
- Intelligence Layer (n8n + AI):
- Monitor Workflow: Polls for critical thresholds and triggers the AI Agent.
- AI Diagnostic: GPT-4o receives the full context of the alert and performs a Root Cause Analysis (RCA).
- Action Layer: Results are pushed to an Executive Dashboard and dispatched via high-priority Gmail alerts with actionable remediation steps.
graph TD
A[Telemetry Sources] -->|Streaming| B[(PostgreSQL DB)]
B -->|Check Thresholds| C{n8n Orchestrator}
C -->|Query Context| B
C -->|RCA Request| D[OpenAI GPT-4o Agent]
D -->|Analysis & Steps| C
C -->|Store Incident| B
C -->|Alert| E[Email/Slack Notification]
B -->|Live Feed| F[Streamlit Dashboard]
This is the heart of the system. Every 5 minutes, n8n queries the database for any device reporting a critical status.
- The Prompt: The AI isn't just chatting; it's primed with a specific persona: Senior Data Center Engineer.
- The Output: It generates a structured 3-point report: Root Cause, Business Impact, and a 3-step Remediation Plan.
Once an incident is analyzed, this workflow ensures the right people know immediately.
- Smart Filtering: Only alerts with a severity of 4/5 or higher trigger the emergency email.
- Data Sanitization: Uses custom JavaScript nodes to clean the AI output, formatting it into a beautiful, readable HTML email for mobile and desktop.
-
Configura tu API Key de OpenAI
# Edita el archivo .env OPENAI_API_KEY=sk-tu-clave-aqui -
Inicia el proyecto
./start.sh
-
Accede a n8n: http://localhost:5678 (admin / admin123)
| Servicio | URL | Credenciales |
|---|---|---|
| n8n | http://localhost:5678 | admin / admin123 |
| Dashboard | http://localhost:8501 | - |
| PostgreSQL | localhost:5432 | datacenter_user / datacenter_pass_2024 |
📖 Documentación completa: DEPLOYMENT.md | QUICKSTART.md
- Orchestration: n8n (Low-code workflow automation)
- Infrastructure: Docker (Containerization for portable deployment)
- Artificial Intelligence: OpenAI GPT-4o (LLMs for RCA)
- Database: PostgreSQL (Structured incident logging)
- Frontend: Streamlit (Real-time monitoring UI)
- Scripting: Python 3.11 & JavaScript (Node.js)
- LinkedIn: daniel-garcía-belman-99a298aa
- Portfolio: danieljcvv-portfolio.vercel.app
- Email: danielgb331@outlook.com
Developed by Daniel-jcVv | Powered by n8n, OpenAI & PostgreSQL
Soli Deo Gloria.

