This document provides comprehensive configuration guidance for HugeGraph PD, including parameter descriptions, deployment scenarios, and production tuning recommendations.
- Configuration File Overview
- Core Configuration Parameters
- Deployment Scenarios
- Production Tuning
- Logging Configuration
- Monitoring and Metrics
PD uses the following configuration files (located in conf/ directory):
| File | Purpose |
|---|---|
| `application.yml` | Main PD configuration (gRPC, Raft, storage, etc.) |
| `log4j2.xml` | Logging configuration (log levels, appenders, rotation) |
| `verify-license.json` | License verification configuration (optional) |
```
application.yml
├── spring       # Spring Boot framework settings
├── management   # Actuator endpoints and metrics
├── logging      # Log configuration file location
├── license      # License verification (optional)
├── grpc         # gRPC server settings
├── server       # REST API server settings
├── pd           # PD-specific settings
├── raft         # Raft consensus settings
├── store        # Store node management settings
└── partition    # Partition management settings
```
Controls the gRPC server for inter-service communication.
```yaml
grpc:
  host: 127.0.0.1  # gRPC bind address
  port: 8686       # gRPC server port
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `grpc.host` | String | `127.0.0.1` | **IMPORTANT**: Must be set to the actual IP address (not `127.0.0.1`) for distributed deployments. Store and Server nodes connect to this address. |
| `grpc.port` | Integer | `8686` | gRPC server port. Ensure this port is accessible from Store and Server nodes. |
Production Notes:

- Set `grpc.host` to the node's actual IP address (e.g., `192.168.1.10`)
- Avoid using `0.0.0.0` as it may cause service discovery issues
- Ensure the firewall allows incoming connections on `grpc.port`
Controls the REST API server for management and monitoring.
```yaml
server:
  port: 8620  # REST API port
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `server.port` | Integer | `8620` | REST API port for health checks, metrics, and management operations. |
Endpoints:

- Health check: `http://<host>:8620/actuator/health`
- Metrics: `http://<host>:8620/actuator/metrics`
- Prometheus: `http://<host>:8620/actuator/prometheus`
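These endpoints can also be probed programmatically. A minimal liveness check sketched in Python (stdlib only; the host argument and default port are assumptions matching the defaults above):

```python
import json
import urllib.request

def pd_is_healthy(host: str, port: int = 8620, timeout: float = 2.0) -> bool:
    """Return True if the PD actuator health endpoint reports status UP."""
    url = f"http://{host}:{port}/actuator/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status") == "UP"
    except OSError:
        return False

# Example: pd_is_healthy("192.168.1.10") is True when the node is up
```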
Controls Raft consensus for PD cluster coordination.
```yaml
raft:
  address: 127.0.0.1:8610     # This node's Raft address
  peers-list: 127.0.0.1:8610  # All PD nodes in the cluster
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `raft.address` | String | `127.0.0.1:8610` | Raft service address for this PD node. Format: `<ip>:<port>`. Must be unique across all PD nodes. |
| `raft.peers-list` | String | `127.0.0.1:8610` | Comma-separated list of all PD nodes' Raft addresses. Used for cluster formation and leader election. |
Critical Rules:

- `raft.address` must be unique for each PD node
- `raft.peers-list` must be identical on all PD nodes
- `raft.peers-list` must contain all PD nodes (including this node)
- Use actual IP addresses, not `127.0.0.1`, for multi-node clusters
- Cluster size should be odd (3, 5, 7) for optimal Raft quorum
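The odd-cluster-size rule follows from Raft's majority quorum: a cluster of n nodes tolerates ⌊(n−1)/2⌋ failures, so a fourth node adds cost without adding fault tolerance. A quick illustration:

```python
def raft_fault_tolerance(n: int) -> int:
    """Failures a Raft cluster of n nodes survives while keeping a majority."""
    quorum = n // 2 + 1  # votes needed for leader election
    return n - quorum    # equivalently (n - 1) // 2

for n in (3, 4, 5):
    print(f"{n} PD nodes -> tolerates {raft_fault_tolerance(n)} failure(s)")
# 3 and 4 nodes both tolerate only 1 failure; 5 nodes tolerate 2
```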
Example (3-node cluster):

```yaml
# Node 1
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610

# Node 2
raft:
  address: 192.168.1.11:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610

# Node 3
raft:
  address: 192.168.1.12:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
```

Controls PD-specific behavior.
```yaml
pd:
  data-path: ./pd_data                # Metadata storage path
  patrol-interval: 1800               # Partition rebalancing interval (seconds)
  initial-store-count: 1              # Minimum stores for cluster availability
  initial-store-list: 127.0.0.1:8500  # Auto-activated stores
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `pd.data-path` | String | `./pd_data` | Directory for RocksDB metadata storage and Raft logs. Ensure sufficient disk space and fast I/O (SSD recommended). |
| `pd.patrol-interval` | Integer | `1800` | Interval (in seconds) for partition health patrol and automatic rebalancing. Lower values mean more frequent checks. |
| `pd.initial-store-count` | Integer | `1` | Minimum number of Store nodes required for the cluster to be operational. Set to the expected initial store count. |
| `pd.initial-store-list` | String | `127.0.0.1:8500` | Comma-separated list of Store gRPC addresses to auto-activate on startup. Useful for bootstrapping. |
Production Recommendations:

- `pd.data-path`: Use a dedicated SSD with at least 50GB free space
- `pd.patrol-interval`:
  - Development: `300` (5 minutes) for fast testing
  - Production: `1800` (30 minutes) to reduce overhead
  - Large clusters: `3600` (1 hour)
- `pd.initial-store-count`: Set to the expected initial store count (e.g., `3` for 3 stores)
Controls how PD monitors and manages Store nodes.
```yaml
store:
  max-down-time: 172800            # Store permanent failure threshold (seconds)
  monitor_data_enabled: true       # Enable metrics collection
  monitor_data_interval: 1 minute  # Metrics collection interval
  monitor_data_retention: 1 day    # Metrics retention period
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `store.max-down-time` | Integer | `172800` | Time (in seconds) after which a Store is considered permanently offline and its partitions are reallocated. Default: 48 hours. |
| `store.monitor_data_enabled` | Boolean | `true` | Enable collection of Store metrics (CPU, memory, disk, partition count). |
| `store.monitor_data_interval` | Duration | `1 minute` | Interval for collecting Store metrics. Format: `<value> <unit>` (second, minute, hour). |
| `store.monitor_data_retention` | Duration | `1 day` | Retention period for historical metrics. Format: `<value> <unit>` (day, month, year). |
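For comparing or validating these Duration values, the `<value> <unit>` format can be normalized to seconds. A small helper sketch (my own hypothetical helper, not part of PD; the `UNIT_SECONDS` table is an assumption using 30-day months and 365-day years):

```python
# Hypothetical helper: normalize PD-style "<value> <unit>" durations to seconds.
UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600,
    "day": 86400, "month": 30 * 86400, "year": 365 * 86400,
}

def duration_to_seconds(text: str) -> int:
    value, unit = text.split()
    return int(value) * UNIT_SECONDS[unit.rstrip("s")]  # accepts "days" etc.

print(duration_to_seconds("1 minute"))  # 60
print(duration_to_seconds("7 days"))    # 604800
```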
Production Recommendations:

- `store.max-down-time`:
  - Development: `300` (5 minutes) for fast failover testing
  - Production: `86400` (24 hours) to avoid false positives during maintenance
  - Conservative: `172800` (48 hours) for unstable networks
- `store.monitor_data_interval`:
  - High-frequency monitoring: `10 seconds`
  - Standard: `1 minute`
  - Low overhead: `5 minutes`
- `store.monitor_data_retention`:
  - Short-term: `1 day`
  - Standard: `7 days`
  - Long-term: `30 days` (requires more disk space)
Controls partition allocation and replication.
```yaml
partition:
  default-shard-count: 1     # Replicas per partition
  store-max-shard-count: 12  # Max shards per store
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `partition.default-shard-count` | Integer | `1` | Number of replicas per partition. Typically 3 in production for high availability. |
| `partition.store-max-shard-count` | Integer | `12` | Maximum number of partition replicas a single Store can hold. Used for initial partition allocation. |
Initial Partition Count Calculation:

```
initial_partitions = (store_count * store_max_shard_count) / default_shard_count
```

Example:

- 3 stores, `store-max-shard-count=12`, `default-shard-count=3`
- Initial partitions: `(3 * 12) / 3 = 12` partitions
- Each store hosts: `12 * 3 / 3 = 12` shards (4 partitions as leader + 8 as follower)
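The arithmetic above, expressed as a short sketch (just the formula from this section, not PD's actual allocator code):

```python
def initial_partitions(store_count: int, store_max_shard_count: int,
                       default_shard_count: int) -> int:
    """Initial partition count per the formula above."""
    return (store_count * store_max_shard_count) // default_shard_count

stores, max_shards, replicas = 3, 12, 3
partitions = initial_partitions(stores, max_shards, replicas)
shards_per_store = partitions * replicas // stores  # total shards / stores
print(partitions, shards_per_store)  # 12 partitions, 12 shards per store
```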
Production Recommendations:

- `partition.default-shard-count`:
  - Development/Testing: `1` (no replication)
  - Production: `3` (standard HA configuration)
  - Critical systems: `5` (maximum fault tolerance)
- `partition.store-max-shard-count`:
  - Small deployment: `10-20`
  - Medium deployment: `50-100`
  - Large deployment: `200-500`
  - Limit based on Store disk capacity and expected data volume
Controls Spring Boot Actuator endpoints for monitoring.
```yaml
management:
  metrics:
    export:
      prometheus:
        enabled: true  # Enable Prometheus metrics export
  endpoints:
    web:
      exposure:
        include: "*"  # Expose all actuator endpoints
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `management.metrics.export.prometheus.enabled` | Boolean | `true` | Enable Prometheus-compatible metrics at `/actuator/prometheus`. |
| `management.endpoints.web.exposure.include` | String | `"*"` | Actuator endpoints to expose. `"*"` = all, or specify a comma-separated list (e.g., `"health,metrics"`). |
Minimal configuration for local development.
```yaml
grpc:
  host: 127.0.0.1
  port: 8686
server:
  port: 8620
raft:
  address: 127.0.0.1:8610
  peers-list: 127.0.0.1:8610
pd:
  data-path: ./pd_data
  patrol-interval: 300  # Fast rebalancing for testing
  initial-store-count: 1
  initial-store-list: 127.0.0.1:8500
store:
  max-down-time: 300  # Fast failover for testing
  monitor_data_enabled: true
  monitor_data_interval: 10 seconds
  monitor_data_retention: 1 day
partition:
  default-shard-count: 1  # No replication
  store-max-shard-count: 10
```

Characteristics:
- Single PD node (no HA)
- No replication (`default-shard-count=1`)
- Fast rebalancing for quick testing
- Suitable for development, not for production
Recommended configuration for production deployments.
```yaml
# Node 1
grpc:
  host: 192.168.1.10
  port: 8686
server:
  port: 8620
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500
store:
  max-down-time: 86400  # 24 hours
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
partition:
  default-shard-count: 3  # Triple replication
  store-max-shard-count: 50
```

```yaml
# Node 2
grpc:
  host: 192.168.1.11
  port: 8686
server:
  port: 8620
raft:
  address: 192.168.1.11:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500
store:
  max-down-time: 86400
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
partition:
  default-shard-count: 3
  store-max-shard-count: 50
```

```yaml
# Node 3
grpc:
  host: 192.168.1.12
  port: 8686
server:
  port: 8620
raft:
  address: 192.168.1.12:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 1800
  initial-store-count: 3
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500
store:
  max-down-time: 86400
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 7 days
partition:
  default-shard-count: 3
  store-max-shard-count: 50
```

Characteristics:
- 3 PD nodes for high availability
- Tolerates 1 PD node failure
- Triple replication (`default-shard-count=3`)
- 3 Store nodes specified in `initial-store-list`
- Standard monitoring and metrics collection
Network Requirements:
- Low latency (<5ms) between PD nodes for Raft
- Open ports: 8620 (REST), 8686 (gRPC), 8610 (Raft)
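Port reachability between nodes can be verified before startup with a generic TCP probe (a plain socket check, not a PD tool; the addresses in the comment are this section's example cluster):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: verify REST (8620), gRPC (8686), and Raft (8610) on each PD node
# for host in ("192.168.1.10", "192.168.1.11", "192.168.1.12"):
#     for port in (8620, 8686, 8610):
#         print(host, port, port_open(host, port))
```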
Configuration for mission-critical deployments requiring maximum fault tolerance.
```yaml
# Node 1: 192.168.1.10
grpc:
  host: 192.168.1.10
  port: 8686
raft:
  address: 192.168.1.10:8610
  peers-list: 192.168.1.10:8610,192.168.1.11:8610,192.168.1.12:8610,192.168.1.13:8610,192.168.1.14:8610
pd:
  data-path: /data/pd/metadata
  patrol-interval: 3600  # Lower frequency for large clusters
  initial-store-count: 5
  initial-store-list: 192.168.1.20:8500,192.168.1.21:8500,192.168.1.22:8500,192.168.1.23:8500,192.168.1.24:8500
store:
  max-down-time: 172800  # 48 hours (conservative)
  monitor_data_enabled: true
  monitor_data_interval: 1 minute
  monitor_data_retention: 30 days  # Long-term retention
partition:
  default-shard-count: 3  # Or 5 for extreme HA
  store-max-shard-count: 100
```

Characteristics:
- 5 PD nodes for maximum HA
- Tolerates 2 PD node failures
- 5 Store nodes for data distribution
- Lower patrol frequency to reduce overhead
- Long-term metrics retention (30 days)
JVM options are specified via the startup script (`bin/start-hugegraph-pd.sh`).

```bash
# Option 1: Via startup script flag
bin/start-hugegraph-pd.sh -j "-Xmx8g -Xms8g"

# Option 2: Edit start-hugegraph-pd.sh directly
JAVA_OPTIONS="-Xmx8g -Xms8g -XX:+UseG1GC"
```

Recommendations by Cluster Size:
| Cluster Size | Partitions | Heap Size | Notes |
|---|---|---|---|
| Small (1-3 stores) | <100 | `-Xmx2g -Xms2g` | Development/testing |
| Medium (3-10 stores) | 100-1000 | `-Xmx4g -Xms4g` | Standard production |
| Large (10-50 stores) | 1000-10000 | `-Xmx8g -Xms8g` | Large production |
| X-Large (50+ stores) | 10000+ | `-Xmx16g -Xms16g` | Enterprise scale |
Key Principles:

- Set `-Xms` equal to `-Xmx` to avoid heap resizing
- Reserve at least 2GB for the OS and off-heap memory
- Monitor GC pause times and adjust accordingly
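The sizing table above can be condensed into a rule of thumb. This helper is my own approximation of that table, not an official formula:

```python
def recommended_heap_gb(partitions: int) -> int:
    """Approximate PD heap size in GB, following the sizing table above."""
    if partitions < 100:
        return 2
    if partitions < 1000:
        return 4
    if partitions < 10000:
        return 8
    return 16

gb = recommended_heap_gb(500)
print(f"-Xmx{gb}g -Xms{gb}g")  # -Xmx4g -Xms4g (Xms == Xmx to avoid resizing)
```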
G1GC (Default, Recommended):

```bash
bin/start-hugegraph-pd.sh -g g1 -j "-Xmx8g -Xms8g \
  -XX:MaxGCPauseMillis=200 \
  -XX:G1HeapRegionSize=16m \
  -XX:InitiatingHeapOccupancyPercent=45"
```

- `MaxGCPauseMillis`: Target GC pause time (200ms recommended)
- `G1HeapRegionSize`: Region size (16m for an 8GB heap)
- `InitiatingHeapOccupancyPercent`: Heap occupancy that triggers concurrent GC (45% recommended)
ZGC (Low-Latency, Java 11+):

```bash
bin/start-hugegraph-pd.sh -g ZGC -j "-Xmx8g -Xms8g \
  -XX:ZCollectionInterval=30"
```

- Ultra-low pause times (<10ms)
- Recommended for latency-sensitive deployments
- Requires Java 11+ (Java 15+ for production)
GC logging can be enabled with:

```bash
-Xlog:gc*:file=logs/gc.log:time,uptime,level,tags:filecount=10,filesize=100M
```

Raft parameters are typically sufficient at their defaults, but can be tuned for specific scenarios.
Increase election timeout for high-latency networks.
Default: 1000ms (1 second)
Tuning (requires code changes in `RaftEngine.java`):

```java
// In hg-pd-core/.../raft/RaftEngine.java
nodeOptions.setElectionTimeoutMs(3000); // 3 seconds
```

When to Increase:
- Network latency >10ms between PD nodes
- Frequent false leader elections
- Cross-datacenter deployments
Control how often Raft snapshots are created.
Default: 3600 seconds (1 hour)
Tuning (in `RaftEngine.java`):

```java
nodeOptions.setSnapshotIntervalSecs(7200); // 2 hours
```

Recommendations:
- Frequent snapshots (1800s): Faster recovery, more I/O overhead
- Infrequent snapshots (7200s): Less I/O, slower recovery
PD uses RocksDB for metadata storage. Optimize for your workload.
SSD Optimization (default, recommended):
- RocksDB uses default settings optimized for SSD
- No configuration changes needed
HDD Optimization (not recommended for production):

```java
// In MetadataRocksDBStore.java, customize RocksDB options
Options options = new Options()
    .setCompactionStyle(CompactionStyle.LEVEL)
    .setWriteBufferSize(64 * 1024 * 1024) // 64MB
    .setMaxWriteBufferNumber(3)
    .setLevelCompactionDynamicLevelBytes(true);
```

Key Metrics to Monitor:
- Disk I/O utilization
- RocksDB write stalls
- Compaction backlog
For high-throughput scenarios, tune gRPC connection pool size.
Client-Side (in `PDClient`):

```java
PDConfig config = PDConfig.builder()
    .pdServers("192.168.1.10:8686,192.168.1.11:8686,192.168.1.12:8686")
    .maxChannels(5) // Number of gRPC channels per PD node
    .build();
```

Recommendations:
- Low traffic: `maxChannels=1`
- Medium traffic: `maxChannels=3-5`
- High traffic: `maxChannels=10+`
Optimize OS-level TCP settings for low latency.
```bash
# Increase TCP buffer sizes
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Reduce TIME_WAIT connections
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_fin_timeout=30
```

| Metric | Threshold | Action |
|---|---|---|
| PD Leader Changes | >2 per hour | Investigate network stability, increase election timeout |
| Raft Log Lag | >1000 entries | Check follower disk I/O, network latency |
| Store Heartbeat Failures | >5% | Check Store node health, network connectivity |
| Partition Imbalance | >20% deviation | Reduce `patrol-interval`, check rebalancing logic |
| GC Pause Time | >500ms | Tune GC settings, increase heap size |
| Disk Usage (`pd.data-path`) | >80% | Clean up old snapshots, expand disk, reduce `monitor_data_retention` |
```yaml
scrape_configs:
  - job_name: 'hugegraph-pd'
    static_configs:
      - targets:
          - '192.168.1.10:8620'
          - '192.168.1.11:8620'
          - '192.168.1.12:8620'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 15s
```

Key panels to create:
- PD Cluster Status: Leader, follower count, Raft state
- Store Health: Online/offline stores, heartbeat success rate
- Partition Distribution: Partitions per store, leader distribution
- Performance: QPS, latency (p50, p95, p99)
- System Resources: CPU, memory, disk I/O, network
Located at `conf/log4j2.xml`.

```xml
<Loggers>
    <!-- PD application logs -->
    <Logger name="org.apache.hugegraph.pd" level="INFO"/>
    <!-- Raft consensus logs (verbose, set to WARN in production) -->
    <Logger name="com.alipay.sofa.jraft" level="WARN"/>
    <!-- RocksDB logs -->
    <Logger name="org.rocksdb" level="WARN"/>
    <!-- gRPC logs -->
    <Logger name="io.grpc" level="WARN"/>
    <!-- Root logger -->
    <Root level="INFO">
        <AppenderRef ref="RollingFile"/>
        <AppenderRef ref="Console"/>
    </Root>
</Loggers>
```

Recommendations:
- Development: Set the PD logger to `DEBUG` for detailed tracing
- Production: Use `INFO` (default) or `WARN` for lower overhead
- Troubleshooting: Temporarily set a specific package to `DEBUG`
```xml
<RollingFile name="RollingFile" fileName="logs/hugegraph-pd.log"
             filePattern="logs/hugegraph-pd-%d{yyyy-MM-dd}-%i.log.gz">
    <PatternLayout>
        <Pattern>%d{ISO8601} [%t] %-5level %logger{36} - %msg%n</Pattern>
    </PatternLayout>
    <Policies>
        <TimeBasedTriggeringPolicy interval="1" modulate="true"/>
        <SizeBasedTriggeringPolicy size="100 MB"/>
    </Policies>
    <DefaultRolloverStrategy max="30"/>
</RollingFile>
```

Configuration:

- Size: Rotate when a log file reaches 100MB
- Time: Rotate daily
- Retention: Keep the last 30 log files
```bash
curl http://localhost:8620/actuator/health
```

Response (healthy):

```json
{
  "status": "UP"
}
```

```bash
curl http://localhost:8620/actuator/metrics
```

Available Metrics:

- `pd.raft.state`: Raft state (0=Follower, 1=Candidate, 2=Leader)
- `pd.store.count`: Number of stores by state
- `pd.partition.count`: Total partitions
- `jvm.memory.used`: JVM memory usage
- `jvm.gc.pause`: GC pause times
```bash
curl http://localhost:8620/actuator/prometheus
```

Sample Output:

```
# HELP pd_raft_state Raft state
# TYPE pd_raft_state gauge
pd_raft_state 2.0

# HELP pd_store_count Store count by state
# TYPE pd_store_count gauge
pd_store_count{state="Up"} 3.0
pd_store_count{state="Offline"} 0.0

# HELP pd_partition_count Total partitions
# TYPE pd_partition_count gauge
pd_partition_count 36.0
```
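This exposition format is plain text and easy to post-process for quick scripting. A minimal parser sketch (handles simple gauge lines only, not the full Prometheus format):

```python
def parse_prometheus_text(text: str) -> dict:
    """Parse 'name{labels} value' gauge lines into a dict, skipping comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics

sample = """\
# TYPE pd_raft_state gauge
pd_raft_state 2.0
pd_store_count{state="Up"} 3.0
"""
print(parse_prometheus_text(sample))
# {'pd_raft_state': 2.0, 'pd_store_count{state="Up"}': 3.0}
```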
- `grpc.host` set to actual IP address (not `127.0.0.1`)
- `raft.address` unique for each PD node
- `raft.peers-list` identical on all PD nodes
- `raft.peers-list` contains all PD node addresses
- `pd.data-path` has sufficient disk space (>50GB)
- `pd.initial-store-count` matches the expected store count
- `partition.default-shard-count` = 3 (for production HA)
- Ports accessible from Store/Server nodes (8620, 8686, 8610)
- NTP synchronized across all nodes
```bash
# Check Raft configuration
grep -A2 "^raft:" conf/application.yml

# Verify peers list on all nodes
for node in 192.168.1.{10,11,12}; do
    echo "Node $node:"
    ssh $node "grep peers-list /path/to/conf/application.yml"
done

# Check port accessibility
nc -zv 192.168.1.10 8620 8686 8610
```

Key configuration guidelines:
- Single-node: Use defaults with `127.0.0.1` addresses
- 3-node cluster: Standard production setup with triple replication
- 5-node cluster: Maximum HA with increased fault tolerance
- JVM tuning: Allocate 4-8GB heap for typical production deployments
- Monitoring: Enable Prometheus metrics and create Grafana dashboards
For architecture details, see Architecture Documentation.
For API usage, see API Reference.
For development, see Development Guide.