2 changes: 2 additions & 0 deletions docs/README.md
@@ -4,6 +4,8 @@

- **Design**. Here you'll find the details of DeepFlow's design and get a first look at what goes on behind it.

- **[eBPF-cBPF-Overview.md](./eBPF-cBPF-Overview.md)**. Comprehensive guide explaining how DeepFlow uses eBPF and cBPF technologies for data collection, including architecture diagrams, protocol support, and kernel requirements.

- **FAQ**. This directory lists common problems that most users run into; please check here before asking a question.

- **Guides**. If this repository brings you unexpected surprises or help, and you want to leave your mark on it, start with these guides.
7 changes: 7 additions & 0 deletions docs/design/data-flow.md
@@ -1,5 +1,12 @@
# 1. Data Collection

DeepFlow uses two complementary data collection technologies:

- **cBPF/AF_PACKET (Dispatcher)**: Classic BPF for network packet capture at L3/L4
- **eBPF (EbpfCollector)**: Extended BPF for syscall tracing and L7 protocol parsing

For a detailed explanation of these technologies, see [eBPF and cBPF Overview](../eBPF-cBPF-Overview.md).

## 1.1. Overview

```mermaid
224 changes: 224 additions & 0 deletions docs/eBPF-cBPF-Overview.md
@@ -0,0 +1,224 @@
# eBPF and cBPF in DeepFlow

This document provides an overview of how DeepFlow leverages both eBPF (extended Berkeley Packet Filter) and cBPF (classic Berkeley Packet Filter) for comprehensive observability data collection.

## Overview

DeepFlow uses two complementary BPF-based data collection technologies:

| Technology | Purpose | Data Source | Use Case |
|------------|---------|-------------|----------|
| **cBPF** | Network packet capture | AF_PACKET socket | Layer 3/4 flow metrics, network-level observability |
| **eBPF** | System call and application tracing | Kernel probes, tracepoints | Layer 7 protocol parsing, distributed tracing, profiling |

## What is cBPF (Classic BPF)?

cBPF is the original Berkeley Packet Filter, a technology for capturing and filtering network packets at the kernel level.

### How DeepFlow Uses cBPF

DeepFlow's **Dispatcher** component uses cBPF with AF_PACKET sockets to:

1. **Capture Network Traffic**: Intercepts packets from network interfaces
2. **Extract Flow Metadata**: Parses packet headers for IP addresses, ports, protocols
3. **Generate Flow Metrics**: Calculates throughput, latency, connection statistics
4. **Feed the Pipeline**: Sends captured packets to FlowGenerator for processing

```
┌─────────────────────────────────────────────────────────┐
│                      Linux Kernel                       │
│  ┌───────────────────────────────────────────────────┐  │
│  │                   Network Stack                   │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │      cBPF Filter (AF_PACKET socket)         │  │  │
│  │  │      - Packet capture                       │  │  │
│  │  │      - Header parsing                       │  │  │
│  │  │      - Traffic filtering                    │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                ┌─────────────────────────┐
                │       Dispatcher        │
                │    (deepflow-agent)     │
                └─────────────────────────┘
```
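
To make the capture path concrete, here is a minimal sketch (not DeepFlow's actual Dispatcher code) of attaching a cBPF filter to an AF_PACKET socket. The four filter instructions were generated with `tcpdump -dd 'ip'` and accept only IPv4 frames; a real agent would install a larger filter and read from a ring buffer instead of `recv()`.

```c
// Minimal cBPF + AF_PACKET sketch (illustrative only, requires CAP_NET_RAW).
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/filter.h>

int main(void) {
    // Raw packet socket: sees every frame on every interface.
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    // cBPF program from `tcpdump -dd 'ip'`: accept a frame only if its
    // EtherType field (offset 12) is 0x0800 (IPv4), otherwise drop it.
    struct sock_filter code[] = {
        { 0x28, 0, 0, 0x0000000c },  // ldh [12]
        { 0x15, 0, 1, 0x00000800 },  // jeq #0x800, accept, drop
        { 0x06, 0, 0, 0x00040000 },  // ret #262144 (accept)
        { 0x06, 0, 0, 0x00000000 },  // ret #0      (drop)
    };
    struct sock_fprog prog = { .len = 4, .filter = code };
    if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
        perror("setsockopt");
        return 1;
    }

    unsigned char buf[2048];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);  // only IPv4 frames reach here
    printf("captured %zd bytes\n", n);
    close(fd);
    return 0;
}
```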

### cBPF Capabilities

- **Zero Application Changes**: Works without modifying applications
- **Low Overhead**: Efficient kernel-level filtering
- **Protocol Agnostic**: Captures any network protocol
- **Full Packet Access**: Can inspect entire packet contents

## What is eBPF (Extended BPF)?

eBPF is a modern, programmable extension to BPF that allows running sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules.

### How DeepFlow Uses eBPF

DeepFlow's **EbpfCollector** component uses eBPF to:

1. **Trace System Calls**: Hooks into read/write syscalls to capture application data
2. **Parse L7 Protocols**: Identifies and parses HTTP, gRPC, MySQL, Redis, Kafka, etc.
3. **Distributed Tracing**: Correlates requests across services without code instrumentation
4. **Continuous Profiling**: Captures CPU, memory, and off-CPU profiling data

```
┌─────────────────────────────────────────────────────────┐
│                      Linux Kernel                       │
│  ┌───────────────────────────────────────────────────┐  │
│  │                  eBPF Subsystem                   │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │      Kprobes/Tracepoints                    │  │  │
│  │  │      - syscall entry/exit hooks             │  │  │
│  │  │      - process lifecycle events             │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │      Uprobes                                │  │  │
│  │  │      - TLS/SSL interception                 │  │  │
│  │  │      - Go runtime hooks                     │  │  │
│  │  │      - HTTP2/gRPC parsing                   │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  │  ┌─────────────────────────────────────────────┐  │  │
│  │  │      Perf Events                            │  │  │
│  │  │      - CPU profiling                        │  │  │
│  │  │      - Stack trace collection               │  │  │
│  │  └─────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                ┌─────────────────────────┐
                │      EbpfCollector      │
                │    (deepflow-agent)     │
                └─────────────────────────┘
```
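
For a flavor of the kernel side, the sketch below (in libbpf style, not one of DeepFlow's actual probes) hooks the `sys_enter_read` tracepoint and counts `read(2)` calls per process in a BPF hash map; this probe-plus-map pattern is the same one EbpfCollector builds on to move syscall data to user space.

```c
// read_count.bpf.c — minimal kernel-side eBPF sketch (illustrative only).
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// Hash map shared with user space: PID -> number of read() calls observed.
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, __u32);
    __type(value, __u64);
} read_counts SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_read")
int count_reads(void *ctx)
{
    __u32 pid = bpf_get_current_pid_tgid() >> 32;  // upper half is the TGID
    __u64 one = 1, *cnt;

    cnt = bpf_map_lookup_elem(&read_counts, &pid);
    if (cnt)
        __sync_fetch_and_add(cnt, 1);
    else
        bpf_map_update_elem(&read_counts, &pid, &one, BPF_ANY);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```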

### eBPF Probe Types Used by DeepFlow

| Probe Type | Approx. Overhead per Invocation (ns) | Use Case |
|------------|--------------------------------------|----------|
| Kprobe | ~76 | Kernel function entry |
| Kretprobe | ~212 | Kernel function return |
| Tracepoint (entry) | ~96 | Stable kernel event hooks |
| Tracepoint (exit) | ~93 | Stable kernel event hooks |
| Uprobe | ~1287 | User-space function entry |
| Uretprobe | ~1931 | User-space function return |
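
The order-of-magnitude gap between kernel-side and user-space probes shapes where hooks are placed: kprobes and tracepoints cover hot kernel paths cheaply, while uprobes are reserved for data that never reaches the kernel in plaintext, such as TLS payloads. The loader sketch below uses libbpf (the object, program, and library paths are hypothetical assumptions, not DeepFlow's):

```c
// loader.c — attach one kprobe and one uprobe from a compiled BPF object.
// File, program, and library names below are illustrative assumptions.
#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
    struct bpf_object *obj = bpf_object__open_file("probes.bpf.o", NULL);
    if (!obj || bpf_object__load(obj)) {
        fprintf(stderr, "failed to open/load BPF object\n");
        return 1;
    }

    struct bpf_program *kp =
        bpf_object__find_program_by_name(obj, "on_tcp_sendmsg");
    struct bpf_program *up =
        bpf_object__find_program_by_name(obj, "on_ssl_write");
    if (!kp || !up) { fprintf(stderr, "program not found\n"); return 1; }

    // Kernel-side entry hook (~76 ns per hit, per the table above).
    struct bpf_link *klink =
        bpf_program__attach_kprobe(kp, /*retprobe=*/false, "tcp_sendmsg");

    // User-space hook (~1287 ns per hit): intercept SSL_write in libssl
    // to see plaintext before it is encrypted.
    LIBBPF_OPTS(bpf_uprobe_opts, uopts, .func_name = "SSL_write");
    struct bpf_link *ulink = bpf_program__attach_uprobe_opts(
        up, /*pid=*/-1, "/usr/lib/x86_64-linux-gnu/libssl.so.3", 0, &uopts);

    if (!klink || !ulink) {
        fprintf(stderr, "failed to attach probes\n");
        return 1;
    }
    // ... poll maps, then bpf_link__destroy()/bpf_object__close() on exit.
    return 0;
}
```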

### Supported Protocols via eBPF

DeepFlow's eBPF probes automatically detect and parse:

- **HTTP/HTTPS**: HTTP/1.x, HTTP/2
- **RPC**: gRPC, Dubbo, SOFARPC
- **Databases**: MySQL, PostgreSQL, Redis, MongoDB, Oracle
- **Messaging**: Kafka, MQTT, RocketMQ
- **Infrastructure**: DNS, FastCGI
- **Encrypted Traffic**: TLS handshake analysis
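
Detection relies on inspecting the first payload bytes carried by each syscall rather than on port numbers. Below is a toy illustration of the idea (not DeepFlow's actual classifier, which covers many more protocols and handles fragmented reads):

```c
// Toy payload-based L7 protocol inference (illustrative only).
#include <stddef.h>
#include <string.h>

enum l7_protocol { L7_UNKNOWN, L7_HTTP1, L7_HTTP2 };

// Guess the protocol from the first bytes seen in a read()/write() buffer.
static enum l7_protocol infer_protocol(const unsigned char *buf, size_t len)
{
    // HTTP/1.x traffic starts with a method token or a status line.
    if (len >= 4 && (memcmp(buf, "GET ", 4) == 0 ||
                     memcmp(buf, "POST", 4) == 0 ||
                     memcmp(buf, "HTTP", 4) == 0))
        return L7_HTTP1;

    // An HTTP/2 connection opens with a fixed 24-byte client preface.
    if (len >= 24 && memcmp(buf, "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n", 24) == 0)
        return L7_HTTP2;

    return L7_UNKNOWN;
}
```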

## Data Flow Architecture

The following diagram shows how cBPF and eBPF data flows through DeepFlow:

```
┌─────────────────────────────────────────────────────────────────────┐
│                            Linux Kernel                             │
├─────────────────────────────┬───────────────────────────────────────┤
│       cBPF/AF_PACKET        │                 eBPF                  │
│                             │                                       │
│  • Packet capture           │  • Syscall tracing (read/write)       │
│  • L3/L4 header parsing     │  • L7 protocol inference              │
│  • Flow identification      │  • TLS/SSL decryption hooks           │
│                             │  • Process lifecycle tracking         │
│                             │  • CPU/Memory profiling               │
└──────────────┬──────────────┴──────────────────┬────────────────────┘
               │                                 │
               ▼                                 ▼
┌──────────────────────────┐    ┌──────────────────────────────────┐
│        Dispatcher        │    │          EbpfCollector           │
│                          │    │                                  │
│  Generates:              │    │  Generates:                      │
│  • MetaPacket            │    │  • MetaPacket                    │
│  • L4 Flow data          │    │  • L7 Flow data                  │
│                          │    │  • Process events                │
│                          │    │  • Profiling data                │
└──────────────┬───────────┘    └────────────────┬─────────────────┘
               │                                 │
               └────────────────┬────────────────┘
                                │
                                ▼
             ┌─────────────────────────────────────┐
             │            FlowGenerator            │
             │                                     │
             │  Aggregates and correlates:         │
             │  • L4 flows from Dispatcher         │
             │  • L7 flows from EbpfCollector      │
             │  • Creates unified flow view        │
             └──────────────────┬──────────────────┘
                                │
                                ▼
             ┌─────────────────────────────────────┐
             │           deepflow-server           │
             │                                     │
             │  Stores in ClickHouse:              │
             │  • flow_metrics                     │
             │  • flow_log (L4FlowLog, L7FlowLog)  │
             │  • profile                          │
             └─────────────────────────────────────┘
```

## Comparison: When Each Technology is Used

| Aspect | cBPF (Dispatcher) | eBPF (EbpfCollector) |
|--------|-------------------|----------------------|
| **Data Source** | Network packets | System calls, process events |
| **Protocol Layer** | L3/L4 (IP, TCP, UDP) | L7 (HTTP, gRPC, SQL, etc.) |
| **Visibility** | Network flows between hosts | Application request/response |
| **Encrypted Traffic** | Sees only ciphertext on the wire | Captures plaintext via TLS library hooks |
| **Kernel Version** | Works on all kernels | Requires Linux 4.14+ |
| **Performance Impact** | Very low | Low (< 1% CPU typically) |
| **Code Changes** | None required | None required |

## Key Differences

### cBPF Strengths
- **Universal compatibility**: Works on any Linux kernel
- **Network-centric view**: Ideal for flow-level metrics
- **Simple and efficient**: Low overhead packet filtering

### eBPF Strengths
- **Application awareness**: Understands L7 protocols
- **Distributed tracing**: Correlates requests across services
- **Encrypted traffic**: Can access data before/after encryption
- **Profiling**: CPU and memory profiling without agents

## Kernel Requirements

### For cBPF (Dispatcher)
- Any Linux kernel with AF_PACKET socket support
- No special kernel configuration required

### For eBPF (EbpfCollector)
- **Minimum**: Linux 4.14+
- **Recommended**: Linux 5.x+ for full feature support
- **Required kernel options**:
```
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_KPROBES=y
CONFIG_UPROBES=y
CONFIG_UPROBE_EVENTS=y
```
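
On most distributions these options can be checked with `grep BPF /boot/config-$(uname -r)`, or with `zgrep BPF /proc/config.gz` when the kernel is built with `CONFIG_IKCONFIG_PROC`.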

For detailed kernel version compatibility, see [kernel-versions.md](../agent/src/ebpf/docs/kernel-versions.md).

## Further Reading

- [eBPF Implementation Details](../agent/src/ebpf/README.md)
- [Probes and Maps Reference](../agent/src/ebpf/docs/probes-and-maps.md)
- [Data Flow Architecture](./design/data-flow.md)
- [Official eBPF Documentation](https://ebpf.io/)