| Content | Quick Link |
|---|---|
| Introduction to AI Agents | Explore |
| Building LLMs for Production | Explore |
| Building High-Performance, Private AI Infrastructure for the Enterprise | Explore |
| Mastering the Model Context Protocol (MCP) | Explore |
| Agent Memory Part I (A Survey of Memory) | Explore |
| Agent Memory Part II (Building Memory Modules for Agentic AI Systems) | Explore |
| Agent Evaluation (Eval) Engineering | Explore |
📥 Download High-Resolution Mind Map (.jpg)
Click here to unfold the full Mind Map (agents-architecture-operations-and-evolution-mindmap.jpg)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View the "Introduction to AI Agents" Slides (PDF)
📥 Download PDF (Direct Link)
View the AI Agent Project in the LLMs-Lab repository on the Eric-LLMs GitHub profile.
To bridge theory with practice, I developed a modular AI Agent project that implements autonomous reasoning and task execution:
- Architecture: Utilizes a decoupled structure with dedicated directories for `Agent` logic, `Tools`, `Utils`, and `Prompts`.
- Reasoning Loop: Features an `AutoGPT.py` implementation using ReAct (Reasoning and Acting) logic to handle complex, multi-step goal decomposition (see the sketch after this list).
- Functional Tools: Includes custom tools for deep data analysis (Excel processing via Pandas), automated communication via email, PDF-based question answering (`FileQATool`), requirements-driven document generation (`WriterTool`), and dynamic script-based auditing of structured files using custom heuristics and thresholds (`PythonTool`).
- End-to-End Workflow: Supports real-world scenarios, such as identifying underperforming suppliers from sales records and autonomously drafting/sending notifications.
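For orientation, here is a minimal, framework-free sketch of a ReAct-style reasoning loop. It is illustrative only: the function names and prompt format below are hypothetical and do not reproduce the project's actual `AutoGPT.py` code.

```python
# Minimal ReAct-style loop (illustrative sketch; not the project's AutoGPT.py implementation).
# `llm` is any callable that maps a prompt string to a completion string.
import re
from typing import Callable, Dict

def react_loop(goal: str, llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]], max_steps: int = 10) -> str:
    """Alternate between reasoning (Thought) and acting (Action) until a Final Answer appears."""
    transcript = f"Goal: {goal}\nAvailable tools: {', '.join(tools)}\n"
    instructions = ("\nRespond with either:\n"
                    "Thought: <reasoning>\nAction: <tool name>\nAction Input: <input>\n"
                    "or\nFinal Answer: <answer>\n")
    for _ in range(max_steps):
        completion = llm(transcript + instructions)
        transcript += completion + "\n"
        if "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        action = re.search(r"Action:\s*(.+)", completion)
        action_input = re.search(r"Action Input:\s*(.+)", completion)
        if action and action.group(1).strip() in tools:
            # Execute the chosen tool and feed the observation back into the context.
            arg = action_input.group(1).strip() if action_input else ""
            transcript += f"Observation: {tools[action.group(1).strip()](arg)}\n"
        else:
            transcript += "Observation: unknown tool; choose one of the available tools.\n"
    return "Stopped: step budget exhausted."
```

In the project itself, entries such as `FileQATool`, `WriterTool`, and `PythonTool` would populate the `tools` dictionary.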
⬆️ Back to Top: Table of Contents
This guide covers LLM production, from Transformer architectures to advanced techniques like RAG and Fine-Tuning. It explores frameworks like LangChain, methods to mitigate hallucinations, and optimization via quantization. Learn to build autonomous agents for real-world use.
📥 Download High-Resolution Mind Map (.jpg)
Click here to unfold the full Mind Map (building-llms-for-production-mindmap.jpg)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View the "Building LLMs for Production" Slides (PDF)
📥 Download PDF (Direct Link)
Explore Practical LLM Implementations in the LLMs-Lab repository on the Eric-LLMs GitHub profile.
The production-grade principles discussed in this book, including Fine-Tuning, RAG optimization, LangChain, Prompt Engineering, Function Calling, and Agents, have each been researched as a standalone module, and each module features multiple project implementations.
⬆️ Back to Top: Table of Contents
📥 Download High-Resolution Mind Map (.jpg)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View the "Building High-Performance, Private AI Infrastructure for the Enterprise" Slides (PDF)
📥 Download PDF (Direct Link)
Work in progress ...
⬆️ Back to Top: Table of Contents
📥 Download High-Resolution Mind Map (.jpg)
Click here to unfold the full Mind Map (mastering-the-model-context-protocol-mindmap.jpg)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View the "Mastering the Model Context Protocol (MCP)" Slides (PDF)
📥 Download PDF (Direct Link)
Explore Model Context Protocol (MCP) Projects on GitHub: a curated collection of industry-standard MCP server implementations (a minimal server sketch follows below).
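For context, the official MCP Python SDK exposes a high-level `FastMCP` server class; the sketch below follows the shape of its quickstart, with a purely illustrative tool. Treat it as an assumption about the SDK's surface and check the current documentation before relying on it.

```python
# Minimal MCP server sketch using the FastMCP helper from the official Python SDK.
# The tool below is illustrative only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the result."""
    return a + b

if __name__ == "__main__":
    # Runs the server over stdio so an MCP client (e.g. an agent host) can connect to it.
    mcp.run()
```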
⬆️ Back to Top: Table of Contents
📥 Download High-Resolution Mind Map (.jpg)
Click here to unfold the full Mind Map (unforgettable_agents_architecting_ai_memory-mindmap.jpg)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View the "A Blueprint for Memory in Agentic Intelligence" Slides (PDF)
📥 Download PDF (Direct Link)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View the "Unforgettable Agents Architecting AI Memory" Slides (PDF)
📥 Download PDF (Direct Link)
For a comprehensive list of papers related to Agent Memory, we highly recommend checking out:
- Agent-Memory-Paper-List by Shichun-Liu.
⬆️ Back to Top: Table of Contents
A comprehensive guide on designing memory systems for AI Agents. This document synthesizes academic surveys with practical implementation strategies, covering:
- Theory: Taxonomy of agent memory (Forms, Functions, Dynamics).
- Frameworks: Deep dive into Mem0, Letta (MemGPT), and LangMem.
- Practice: Enterprise-grade solutions using Amazon Bedrock AgentCore.
📥 Download High-Resolution Mind Map (mindmap.png)
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View Slides (PDF)
📥 Download PDF (Direct Link)
The following frameworks and repositories are discussed in this guide, representing the current state-of-the-art in Agentic Memory:
- Mem0: A dual-layer memory framework supporting working, factual, and semantic memory types for agent state persistence.
- Letta (MemGPT): Manages infinite context by treating agents like an OS with virtual memory and recursive summarization.
- LangMem: A LangChain library that implements Semantic, Episodic, and Procedural memory integration for LangGraph agents.
- Amazon Bedrock Samples: A comprehensive collection of examples for using Amazon Bedrock, including various implementations of Agentic workflows and memory patterns.
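To make the memory taxonomy concrete, here is a small framework-agnostic sketch. All class and method names are hypothetical: they illustrate the working/episodic/semantic split rather than the actual APIs of Mem0, Letta, or LangMem.

```python
# Framework-agnostic memory sketch (hypothetical names, not the Mem0/Letta/LangMem APIs).
from collections import deque
from dataclasses import dataclass, field
from typing import List

@dataclass
class AgentMemory:
    working: deque = field(default_factory=lambda: deque(maxlen=20))  # short-term rolling context
    episodic: List[str] = field(default_factory=list)                 # summaries of past interactions
    semantic: List[str] = field(default_factory=list)                 # distilled facts about the user/world

    def observe(self, message: str) -> None:
        """Every turn lands in working memory; overflow is summarized into episodic memory."""
        if len(self.working) == self.working.maxlen:
            self.episodic.append("summary: " + self.working.popleft())
        self.working.append(message)

    def remember_fact(self, fact: str) -> None:
        """Facts extracted by the LLM (e.g. 'user prefers Python') go into semantic memory."""
        self.semantic.append(fact)

    def build_context(self, query: str) -> str:
        """Assemble prompt context; a real system would use embedding search instead of substring match."""
        words = query.lower().split()
        recalled = [m for m in self.semantic + self.episodic if any(w in m.lower() for w in words)]
        return "\n".join(list(self.working) + recalled)
```

Production frameworks add persistence, embedding-based retrieval, and memory-write policies on top of this basic split.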
⬆️ Back to Top: Table of Contents
"In the age of Agents, your product is only as good as your ability to measure it."
Evaluating AI Agents requires a fundamental shift from simple output checks ("vibe checks") to analyzing multi-step trajectories, environment changes, and tool usage. This repository consolidates frameworks and engineering practices for moving from intuition to instrumentation.
It synthesizes industry standards from Anthropic, LangChain, and real-world engineering practices to build a robust Evaluation Harness.
- The Intuition Trap: Why manual "vibe checks" fail as complexity scales.
- The Harness: Building a standardized environment for agent execution composed of Inputs, Tasks, and Graders.
- Trajectory vs. Outcome: Evaluating the journey (reasoning logs, tool calls) rather than just the destination (final answer).
- Reliability Metrics (see the estimator sketch after this list):
- Pass@k (Creativity): Can the agent succeed at least once in k tries? (Good for brainstorming).
- Pass^k (Reliability): Can the agent succeed every single time in k tries? (Critical for autonomous agents).
- Swiss Cheese Model: Layering defenses (Automated Evals โ Human Review โ Production Monitoring) to ensure reliability.
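Both reliability metrics can be estimated from repeated trials. The sketch below uses the standard combinatorial estimator for Pass@k and a simple (success rate)^k estimate for Pass^k; it illustrates the definitions rather than any particular framework's implementation.

```python
# Estimate Pass@k ("succeeds at least once in k tries") and Pass^k ("succeeds in all k tries")
# from n independent trials of which c succeeded.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures recorded, so any k draws must contain a success
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate that all k independent attempts succeed: (c / n) ** k."""
    return (c / n) ** k

# Example: 10 trials with 7 successes. Pass@3 looks excellent (~0.99),
# but Pass^3 (~0.34) shows why it is the stricter bar for autonomous agents.
print(pass_at_k(10, 7, 3))
print(pass_hat_k(10, 7, 3))
```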
📥 Download High-Resolution Mind Map (mindmap.png)
A Comprehensive Guide to Evaluating AI Agents. It focuses on the engineering framework for testing, including the "Clean Room" methodology, reliability metrics (Pass@k), and the "Harness" architecture, and treats evaluation as a core development practice.
💡 Tip: Press `Ctrl+Click` (or `Command+Click`) to open in a new tab.
View Slides (PDF)
📥 Download PDF (Direct Link)
Implementing a robust evaluation pipeline requires specific infrastructure. The following tools are referenced and utilized in this framework (a generic grader sketch follows the table):
| Tool | Category | Key Features |
|---|---|---|
| LangSmith | Tracing & Debugging | Full trajectory tracing, runnableConfig tagging for A/B testing, and dataset management. |
| LangFuse | Observability | Open-source alternative for observability, prompt management, and lightweight evaluation. |
| DeepEval | Unit Testing | "Pytest for LLMs". Specific metrics for RAG (Hallucination, Answer Relevancy) and Agents. |
| OpenEvals | Graders | A library of pre-built "LLM-as-a-judge" prompts (Conciseness, Correctness, Coherence) compatible with LangSmith. |
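The grading layer these tools provide can be illustrated without committing to any of them. The following is a generic "LLM-as-a-judge" grader with hypothetical names; it is not the OpenEvals or DeepEval API, only the pattern those libraries package up.

```python
# Generic "LLM-as-a-judge" grader sketch (hypothetical names, not a specific library's API).
import json
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Agent answer: {answer}
Reference answer: {reference}
Return JSON: {{"score": <number between 0 and 1>, "reason": "<one sentence>"}}"""

@dataclass
class GradeResult:
    score: float
    reason: str

def llm_judge(question: str, answer: str, reference: str,
              judge_model: Callable[[str], str]) -> GradeResult:
    """Ask a (usually stronger) model to grade correctness and return a structured result."""
    raw = judge_model(JUDGE_PROMPT.format(question=question, answer=answer, reference=reference))
    parsed = json.loads(raw)  # production graders add retries and schema validation here
    return GradeResult(score=float(parsed["score"]), reason=str(parsed["reason"]))
```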
To balance cost and performance, we implement a Hybrid Agent Architecture (see the routing sketch after this list):
- Reactive Layer (System 1): Handles simple, direct queries (e.g., "What is the stock price?") with low latency.
- Deliberative Layer (System 2): Activated for complex planning or multi-step reasoning tasks.
- Coordination Layer: A router that classifies intent and dispatches tasks.
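Below is a minimal routing sketch of the three layers, with hypothetical function names and a trivial heuristic classifier standing in for a real intent model.

```python
# Hybrid agent routing sketch: a coordination layer dispatches each query either to a fast
# reactive path or to a slower deliberative planner. All names are illustrative only.
from typing import Callable

def reactive_answer(query: str) -> str:
    """System 1: a single lookup or tool call, optimized for latency."""
    return f"[fast path] lookup result for: {query}"

def deliberative_answer(query: str) -> str:
    """System 2: plan, call tools over multiple steps, then synthesize (stubbed here)."""
    return f"[planning path] multi-step plan executed for: {query}"

def route(query: str, classify_intent: Callable[[str], str]) -> str:
    """Coordination layer: classify intent, then dispatch to the appropriate layer."""
    if classify_intent(query) == "simple_lookup":
        return reactive_answer(query)
    return deliberative_answer(query)

# A trivial heuristic classifier; a production router would use an LLM or a small fine-tuned model.
print(route("What is the stock price of ACME?",
            lambda q: "simple_lookup" if "price" in q.lower() else "complex"))
```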
To prevent "cheating" through shared state, every evaluation trial runs in a fresh container/sandbox (see the pytest-style sketch after this list).
- Isolation: Fresh container for every trial.
- Mocking: Simulate external APIs to control latency and deterministic outputs.
- Cleanup: Aggressive state teardown (no shared history).
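A pytest-style sketch of this clean-room discipline, using hypothetical helper names (no specific sandbox or container product is implied):

```python
# Clean-room evaluation sketch: fresh sandbox per trial, mocked external APIs, aggressive teardown.
# The Sandbox class and test below are hypothetical stand-ins for a real harness.
import pytest
from unittest.mock import MagicMock

class Sandbox:
    """Stand-in for a container handle; a real harness would start an isolated container here."""
    def __init__(self):
        self.state = {}          # no history shared across trials

    def teardown(self):
        self.state.clear()       # aggressive cleanup after every trial

@pytest.fixture
def sandbox():
    box = Sandbox()              # Isolation: a fresh environment per test, never reused
    yield box
    box.teardown()               # Cleanup: drop all state even if the trial failed

@pytest.fixture
def mocked_weather_api():
    # Mocking: deterministic output and zero network latency for the external dependency
    api = MagicMock()
    api.get_forecast.return_value = {"city": "Berlin", "temp_c": 21}
    return api

def test_agent_reports_temperature(sandbox, mocked_weather_api):
    # In a real harness the agent would run inside the sandbox and call the mocked API as a tool.
    forecast = mocked_weather_api.get_forecast("Berlin")
    sandbox.state["last_tool_call"] = forecast
    assert forecast["temp_c"] == 21
```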






