The Full-Stack LLM Engineering Playbook. Architectural patterns for Agents (MCP) & RAG, coupled with advanced Post-Training recipes (SFT, DPO, QLoRA) for domain adaptation. Covers Data Pipelines, Evaluation Frameworks, and System Design.

Eric-LLMs/Awesome-AI-Engineering

📘 Awesome AI Engineering

The Full-Stack LLM Engineering Playbook.

📑 Table of Contents

| 📚 Content | 🔗 Quick Link |
| --- | --- |
| Introduction to AI Agents | 🔍 Explore |
| Building LLMs for Production | 🔍 Explore |
| Building High-Performance, Private AI Infrastructure for the Enterprise | 🔍 Explore |
| Mastering the Model Context Protocol (MCP) | 🔍 Explore |
| Agent Memory Part I (A Survey of Memory) | 🔍 Explore |
| Agent Memory Part II (Building Memory Modules for Agentic AI Systems) | 🔍 Explore |
| Agent Evaluation (Eval) Engineering | 🔍 Explore |


📚 Introduction to AI Agents

🔑 Key Concepts

🧠 Mind Map (Key Concepts)

📥 Download High-Resolution Mind Map (.jpg)

🔍 Click here to unfold the full Mind Map (agents-architecture-operations-and-evolution-mindmap.jpg)

Introduction to AI Agents Mindmap

📑 Presentation Slides

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View the "Introduction to AI Agents" Slides (PDF)
📥 Download PDF (Direct Link)

🚀 Practical Implementation: Task-Oriented AI Agent

👉 View the AI Agent Project in the LLMs-Lab repository on the Eric-LLMs GitHub profile.

To bridge theory with practice, I developed a modular AI Agent project that implements autonomous reasoning and task execution:

  • Architecture: Utilizes a decoupled structure with dedicated directories for Agent logic, Tools, Utils, and Prompts.
  • Reasoning Loop: Features an AutoGPT.py implementation using ReAct (Reasoning and Acting) logic to handle complex, multi-step goal decomposition.
  • Functional Tools: Includes custom tools for deep data analysis (Excel processing via Pandas), automated communication via email, PDF-based QA interrogation (FileQATool), requirements-driven document generation (WriterTool), and dynamic script-based auditing of structured files using custom heuristics and thresholds (PythonTool).
  • End-to-End Workflow: Supports real-world scenarios, such as identifying underperforming suppliers from sales records and autonomously drafting/sending notifications.
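
The reasoning loop described above can be sketched in a few lines. This is a minimal illustration of the ReAct pattern, not the project's AutoGPT.py: the `react_loop` function, the `lookup_sales` tool, the `Action: tool[arg]` text format, and the scripted stand-in model are all assumptions made for the example.

```python
import re
from typing import Callable, Dict

# Hypothetical tool with hard-coded data; a real tool would read sales records.
def lookup_sales(supplier: str) -> str:
    data = {"Acme": 120, "Globex": 35}
    return str(data.get(supplier, 0))

TOOLS: Dict[str, Callable[[str], str]] = {"lookup_sales": lookup_sales}

# Assumed text protocol: the model writes "Action: tool[arg]" or "Final Answer: ...".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)

def react_loop(llm: Callable[[str], str], goal: str, max_steps: int = 5) -> str:
    """Alternate Thought/Action/Observation until the model gives a final answer."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = action.group(1), action.group(2)
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # Feed the tool result back so the next reasoning step can use it.
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"

def scripted_llm(transcript: str) -> str:
    """Stand-in for a real model call; replies depend on what was observed so far."""
    if "Observation: 35" in transcript:
        return ("Thought: 35 is below the threshold of 50.\n"
                "Final Answer: Globex is underperforming")
    return "Thought: I need Globex's sales.\nAction: lookup_sales[Globex]"

result = react_loop(scripted_llm, "Is Globex underperforming?")
print(result)
```

The step limit matters in practice: without it, a model that never emits a final answer would loop indefinitely.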

⬆️ Back to Top: Table of Contents



📚 Building LLMs for Production

This guide covers LLM production, from Transformer architectures to advanced techniques like RAG and Fine-Tuning. It explores frameworks like LangChain, methods to mitigate hallucinations, and optimization via quantization. Learn to build autonomous agents for real-world use.

🔑 Key Concepts

🧠 Mind Map (Key Concepts)

📥 Download High-Resolution Mind Map (.jpg)

🔍 Click here to unfold the full Mind Map (building-llms-for-production-mindmap.jpg)

Building LLMs for Production Mind Map

📑 Presentation Slides

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View the "Building LLMs for Production" Slides (PDF)
📥 Download PDF (Direct Link)

🛠️ Hands-on Lab & Examples

👉 Explore Practical LLM Implementations in the LLMs-Lab repository on the Eric-LLMs GitHub profile.

The production-grade principles discussed in this book (including Fine-Tuning, RAG optimization, LangChain, Prompt Engineering, Function Calling, and Agents) have each been researched as a standalone module, and each module features multiple project implementations.

⬆️ Back to Top: Table of Contents



📚 Building High-Performance, Private AI Infrastructure for the Enterprise

🔑 Key Concepts

🧠 Mind Map (Key Concepts)

📥 Download High-Resolution Mind Map (.jpg)

🔍 Click here to unfold the full Mind Map (mindmap.jpg)

Building High-Performance, Private AI Infrastructure for the Enterprise

📑 Presentation Slides

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View the "Building High-Performance, Private AI Infrastructure for the Enterprise" Slides (PDF)
📥 Download PDF (Direct Link)

🛠️ Hands-on Projects and Examples

👉 In progress.

⬆️ Back to Top: Table of Contents



📚 Mastering the Model Context Protocol (MCP)

🔑 Key Concepts

🧠 Mind Map (Key Concepts)

📥 Download High-Resolution Mind Map (.jpg)

🔍 Click here to unfold the full Mind Map (mastering-the-model-context-protocol-mindmap.jpg)

Mastering the Model Context Protocol (MCP)

📑 Presentation Slides

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View the "Mastering the Model Context Protocol (MCP)" Slides (PDF)
📥 Download PDF (Direct Link)

🛠️ Hands-on Projects and Examples

👉 Explore Model Context Protocol (MCP) Projects on GitHub: a curated collection of industry-standard Model Context Protocol (MCP) server implementations.

⬆️ Back to Top: Table of Contents



📚 Agent Memory Part I

🔑 Key Concepts

🧠 Mind Map (Key Concepts)

📥 Download High-Resolution Mind Map (.jpg)

🔍 Click here to unfold the full Mind Map (unforgettable_agents_architecting_ai_memory-mindmap.jpg)

Unforgettable Agents Architecting AI Memory

📑 Presentation Slides

A Blueprint for Memory in Agentic Intelligence

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View the "A Blueprint for Memory in Agentic Intelligence" Slides (PDF)
📥 Download PDF (Direct Link)

Unforgettable Agents Architecting AI Memory

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View the "Unforgettable Agents Architecting AI Memory" Slides (PDF)
📥 Download PDF (Direct Link)

📑 Further Reading / Resources

For a comprehensive list of papers related to Agent Memory, we highly recommend:
👉 Agent-Memory-Paper-List by Shichun-Liu.

⬆️ Back to Top: Table of Contents



📚 Building Memory Modules for Agentic AI Systems

A comprehensive guide on designing memory systems for AI Agents. This document synthesizes academic surveys with practical implementation strategies, covering:

  • Theory: Taxonomy of agent memory (Forms, Functions, Dynamics).
  • Frameworks: Deep dive into Mem0, Letta (MemGPT), and LangMem.
  • Practice: Enterprise-grade solutions using Amazon Bedrock AgentCore.

🔑 Key Concepts

🧠 Mind Map (Key Concepts)

📥 Download High-Resolution Mind Map (mindmap.png)

🔍 Click here to unfold the full Mind Map

memory solution in production

📑 Presentation Slides

Building Memory for Agentic AI: Theory, Frameworks, and Practice

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View Slides (PDF)
📥 Download PDF (Direct Link)

📑 Key Frameworks & Code Samples

The following frameworks and repositories are discussed in this guide, representing the current state of the art in Agentic Memory:

  • Mem0: A dual-layer memory framework supporting working, factual, and semantic memory types for agent state persistence.
  • Letta (MemGPT): Extends the usable context beyond the model's window by treating the agent like an OS with virtual memory and recursive summarization.
  • LangMem: A LangChain library that implements Semantic, Episodic, and Procedural memory integration for LangGraph agents.
  • Amazon Bedrock Samples: A comprehensive collection of examples for using Amazon Bedrock, including various implementations of Agentic workflows and memory patterns.
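
The "virtual memory plus recursive summarization" idea can be illustrated with a small, framework-free sketch. It does not use the Letta, Mem0, or LangMem APIs; the `PagedMemory` class, its fields, and the `naive_summarize` placeholder (standing in for an LLM summarization call) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PagedMemory:
    """Bounded in-context buffer; evicted messages are archived and summarized."""
    summarize: Callable[[str, List[str]], str]  # (old summary, evicted) -> new summary
    max_messages: int = 4
    summary: str = ""
    context: List[str] = field(default_factory=list)
    archive: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.context.append(message)
        if len(self.context) > self.max_messages:
            # Page out the oldest half: keep it searchable in the archive and
            # fold it into the running summary (the "recursive" step).
            cut = len(self.context) // 2
            evicted, self.context = self.context[:cut], self.context[cut:]
            self.archive.extend(evicted)
            self.summary = self.summarize(self.summary, evicted)

    def prompt(self) -> str:
        """What would be sent to the model: summary first, then recent messages."""
        header = f"[summary] {self.summary}\n" if self.summary else ""
        return header + "\n".join(self.context)

    def search(self, keyword: str) -> List[str]:
        """Page archived messages back in on demand (keyword match for simplicity)."""
        return [m for m in self.archive if keyword.lower() in m.lower()]

def naive_summarize(previous: str, evicted: List[str]) -> str:
    # Placeholder for an LLM call: keep a short snippet of each evicted message.
    snippets = [m.split(":", 1)[-1].strip()[:20] for m in evicted]
    return " | ".join(([previous] if previous else []) + snippets)

memory = PagedMemory(summarize=naive_summarize, max_messages=4)
for i in range(1, 7):
    memory.add(f"user: message {i}")
print(memory.prompt())
```

The prompt stays bounded no matter how long the conversation runs, while older details remain retrievable through `search`.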

⬆️ Back to Top: Table of Contents



📚 Agent Evaluation (Eval) Engineering

"In the age of Agents, your product is only as good as your ability to measure it."

Evaluating AI Agents requires a fundamental shift from simple output checks ("vibe checks") to analyzing multi-step trajectories, environment changes, and tool usage. This repository consolidates frameworks and engineering practices for moving from intuition to instrumentation.

It synthesizes industry standards from Anthropic, LangChain, and real-world engineering practices to build a robust Evaluation Harness.


🔑 Key Concepts

  • The Intuition Trap: Why manual "vibe checks" fail as complexity scales.
  • The Harness: Building a standardized environment for agent execution composed of Inputs, Tasks, and Graders.
  • Trajectory vs. Outcome: Evaluating the journey (reasoning logs, tool calls) rather than just the destination (final answer).
  • Reliability Metrics:
    • Pass@k (Creativity): Can the agent succeed at least once in k tries? (Good for brainstorming).
    • Pass^k (Reliability): Can the agent succeed every single time in k tries? (Critical for autonomous agents).
  • Swiss Cheese Model: Layering defenses (Automated Evals → Human Review → Production Monitoring) to ensure reliability.
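
The gap between the two reliability metrics is easiest to see with numbers. The sketch below assumes independent trials with a fixed per-trial success rate p, so that Pass@k = 1 - (1 - p)^k and Pass^k = p^k; the function names are illustrative, and real agents may violate the independence assumption.

```python
from typing import List

def success_rate(trials: List[bool]) -> float:
    """Fraction of recorded trials that passed."""
    return sum(trials) / len(trials)

def pass_at_k(p: float, k: int) -> float:
    """Probability of at least one success in k independent tries."""
    return 1.0 - (1.0 - p) ** k

def pass_pow_k(p: float, k: int) -> float:
    """Probability that all k independent tries succeed."""
    return p ** k

# Example: an agent that passed 8 of 10 recorded trials.
trials = [True] * 8 + [False] * 2
p = success_rate(trials)
print(f"p={p:.2f}  pass@5={pass_at_k(p, 5):.4f}  pass^5={pass_pow_k(p, 5):.4f}")
```

With p = 0.8 and k = 5, Pass@k is about 0.9997 while Pass^k is about 0.3277: the same agent looks nearly perfect on one metric and unreliable on the other.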

🧠 Mind Map (Framework Overview)

📥 Download High-Resolution Mind Map (mindmap.png)

🔍 Click here to unfold the full Mind Map

Agent Evaluation Framework


📑 Presentation Slides

A Comprehensive Guide to Evaluating AI Agents: Focuses on the engineering framework for testing, including the "Clean Room" methodology, reliability metrics (Pass@k), and the "Harness" architecture. It treats evaluation as a core development practice.

💡 Tip: Press Ctrl + Click (or Command + Click) to open in a new tab.
📥 View Slides (PDF)
📥 Download PDF (Direct Link)


🛠️ Key Frameworks & Code Samples

1. The Tooling Stack (Ecosystem)

Implementing a robust evaluation pipeline requires specific infrastructure. The following tools are referenced and utilized in this framework:

| Tool | Category | Key Features |
| --- | --- | --- |
| LangSmith | Tracing & Debugging | Full trajectory tracing, runnableConfig tagging for A/B testing, and dataset management. |
| LangFuse | Observability | Open-source alternative for observability, prompt management, and lightweight evaluation. |
| DeepEval | Unit Testing | "Pytest for LLMs". Specific metrics for RAG (Hallucination, Answer Relevancy) and Agents. |
| OpenEvals | Graders | A library of pre-built "LLM-as-a-judge" prompts (Conciseness, Correctness, Coherence) compatible with LangSmith. |

2. Architecture: Hybrid Agent (Fast vs. Slow)

To balance cost and performance, we implement a Hybrid Agent Architecture:

  • Reactive Layer (System 1): Handles simple, direct queries (e.g., "What is the stock price?") with low latency.
  • Deliberative Layer (System 2): Activated for complex planning or multi-step reasoning tasks.
  • Coordination Layer: A router that classifies intent and dispatches tasks.
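
The three layers can be sketched as a router in front of two handlers. This is an illustration only: the keyword heuristic in `classify_intent` stands in for a real intent classifier, and the handler names, the word list, and the 12-word length cutoff are assumptions made for the example.

```python
import re
from typing import Callable, Dict

def reactive_agent(query: str) -> str:
    """System 1: a single low-latency call, no planning."""
    return f"[fast] direct answer to: {query}"

def deliberative_agent(query: str) -> str:
    """System 2: a multi-step pipeline, shown here as fixed stages."""
    stages = ["plan", "execute", "verify"]
    return f"[slow] {' -> '.join(stages)} for: {query}"

# Hypothetical cue words suggesting multi-step work.
PLANNING_WORDS = {"plan", "compare", "analyze", "then", "steps"}

def classify_intent(query: str) -> str:
    """Coordination layer: decide which layer should handle the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if len(words) > 12 or PLANNING_WORDS.intersection(words):
        return "deliberative"
    return "reactive"

HANDLERS: Dict[str, Callable[[str], str]] = {
    "reactive": reactive_agent,
    "deliberative": deliberative_agent,
}

def route(query: str) -> str:
    return HANDLERS[classify_intent(query)](query)

print(route("What is the stock price?"))
print(route("Compare three suppliers, then plan a migration."))
```

Keeping the classifier separate from the handlers means the router itself can be evaluated in isolation, for example by measuring how often simple queries are misrouted to the expensive path.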

3. Evaluation Strategy: The "Clean Room"

To prevent "cheating" through shared state, every evaluation trial runs in a fresh container/sandbox.

  • Isolation: Fresh container for every trial.
  • Mocking: Simulate external APIs to control latency and keep outputs deterministic.
  • Cleanup: Aggressive state teardown (no shared history).
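
A small sketch shows why isolation matters. A temporary directory stands in for a fresh container or sandbox here, which is a simplification; `agent_under_test`, `mock_price_api`, and `run_trial` are hypothetical names, and the toy agent deliberately reuses a cache file if one is left over from an earlier run.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

def mock_price_api(symbol: str) -> float:
    """Mocking: deterministic stand-in for an external API."""
    return {"ACME": 12.5}.get(symbol, 0.0)

def agent_under_test(workdir: Path, price_api: Callable[[str], float]) -> str:
    """Toy agent that 'cheats' by reusing state left over from a previous run."""
    cache = workdir / "cache.txt"
    if cache.exists():
        return "cheated:" + cache.read_text()
    answer = f"ACME={price_api('ACME')}"
    cache.write_text(answer)
    return answer

def run_trial(agent: Callable[[Path, Callable[[str], float]], str],
              price_api: Callable[[str], float]) -> str:
    # Isolation: a fresh directory per trial.
    # Cleanup: the directory is deleted when the context manager exits.
    with tempfile.TemporaryDirectory() as tmp:
        return agent(Path(tmp), price_api)

# Clean-room trials: every run starts from an empty workspace.
results: List[str] = [run_trial(agent_under_test, mock_price_api) for _ in range(3)]

# Counter-example: a shared workspace lets the second run read the first run's cache.
with tempfile.TemporaryDirectory() as shared:
    shared_results = [agent_under_test(Path(shared), mock_price_api) for _ in range(2)]

print(results)
print(shared_results)
```

In the shared workspace the second run returns the cached answer without calling the API, which is exactly the kind of leakage the clean-room setup is meant to rule out.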

⬆️ Back to Top: Table of Contents
