This file provides guidance to AI coding assistants when working with code in this repository.
This is the Apache HugeGraph-Computer repository containing two distinct graph computing systems:
- computer (Java/Maven): A distributed BSP/Pregel-style graph processing framework that runs on Kubernetes or YARN
- vermeer (Go): A high-performance in-memory graph computing platform with master-worker architecture
Both integrate with HugeGraph for graph data input/output.
Prerequisites:
- JDK 11 for building/running
- JDK 8 for HDFS dependencies
- Maven 3.5+
- For K8s module: run mvn clean install first to generate CRD classes under computer-k8s
Build:
cd computer
mvn clean compile -Dmaven.javadoc.skip=true
Tests:
# Unit tests
mvn test -P unit-test
# Integration tests
mvn test -P integrate-test
Run single test:
# Run specific test class
mvn test -P unit-test -Dtest=ClassName
# Run specific test method
mvn test -P unit-test -Dtest=ClassName#methodName
License check:
mvn apache-rat:check
Package:
mvn clean package -DskipTests
Prerequisites:
- Go 1.23+
- curl and unzip (for downloading binary dependencies)
First-time setup:
cd vermeer
make init # Downloads supervisord and protoc binaries, installs Go deps
Build:
make # Build for current platform
make build-linux-amd64
make build-linux-arm64
Development build with hot-reload UI:
go build -tags=dev
Clean:
make clean # Remove built binaries and generated assets
make clean-all # Also remove downloaded tools
Run:
# Using binary directly
./vermeer --env=master
./vermeer --env=worker
# Using script (configure in vermeer.sh)
./vermeer.sh start master
./vermeer.sh start worker
Regenerate protobuf (if proto files changed):
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.28.0
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.2.0
# Generate (adjust protoc path for your platform)
# Note: remove the license header first, if any
vermeer/tools/protoc/linux64/protoc vermeer/apps/protos/*.proto --go-grpc_out=vermeer/apps/protos/. --go_out=vermeer/apps/protos/.
Module Structure:
- computer-api: Public interfaces for graph processing (Computation, Vertex, Edge, Aggregator, Combiner, GraphFactory)
- computer-core: Runtime implementation (WorkerService, MasterService, messaging, BSP coordination, managers)
- computer-algorithm: Built-in algorithms (PageRank, LPA, WCC, SSSP, TriangleCount, etc.)
- computer-driver: Job submission and driver-side coordination
- computer-k8s: Kubernetes deployment integration
- computer-yarn: YARN deployment integration
- computer-k8s-operator: Kubernetes operator for job management
- computer-dist: Distribution packaging
- computer-test: Integration and unit tests
Key Design Patterns:
- API/Implementation Separation: Algorithms depend only on computer-api interfaces; computer-core provides the runtime implementation. Algorithms are dynamically loaded via config.
- Manager Pattern: WorkerService composes multiple managers (MessageSendManager, MessageRecvManager, WorkerAggrManager, DataServerManager, SortManagers, SnapshotManager, etc.) with lifecycle hooks: initAll(), beforeSuperstep(), afterSuperstep(), closeAll().
- BSP Coordination: Explicit barrier synchronization via etcd (EtcdBspClient). Each superstep follows this order:
  - workerStepPrepareDone → waitMasterStepPrepareDone
  - Local compute (vertices process messages)
  - workerStepComputeDone → waitMasterStepComputeDone
  - Aggregators/snapshots
  - workerStepDone → waitMasterStepDone (master returns SuperstepStat)
- Computation Contract: Algorithms implement Computation<M extends Value>:
  - compute0(context, vertex): Initialize at superstep 0
  - compute(context, vertex, messages): Process messages in subsequent supersteps
  - Access to aggregators, combiners, and message sending via ComputationContext
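The manager lifecycle and the per-superstep barrier order described above can be sketched as a toy event log. This is only an illustration: LoggingManager, run(), and barrier() are hypothetical stand-ins, nothing here talks to etcd, and the position of the beforeSuperstep()/afterSuperstep() hooks relative to the barriers is an assumption rather than something taken from WorkerService.

```java
import java.util.ArrayList;
import java.util.List;

public class SuperstepSketch {

    /** Hypothetical manager exposing the four lifecycle hooks named in the text. */
    static class LoggingManager {
        private final String name;
        private final List<String> log;

        LoggingManager(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        void init() { log.add(name + ".init"); }
        void beforeSuperstep(int step) { log.add(name + ".beforeSuperstep:" + step); }
        void afterSuperstep(int step) { log.add(name + ".afterSuperstep:" + step); }
        void close() { log.add(name + ".close"); }
    }

    /** Runs the given number of supersteps and returns the ordered event log. */
    public static List<String> run(int supersteps, String... managerNames) {
        List<String> log = new ArrayList<>();
        List<LoggingManager> managers = new ArrayList<>();
        for (String managerName : managerNames) {
            managers.add(new LoggingManager(managerName, log));
        }

        managers.forEach(LoggingManager::init);            // plays the role of initAll()
        for (int step = 0; step < supersteps; step++) {
            final int s = step;
            managers.forEach(m -> m.beforeSuperstep(s));   // hook placement is assumed
            barrier(log, "StepPrepareDone", s);
            log.add("compute:" + s);                       // vertices process messages
            barrier(log, "StepComputeDone", s);
            managers.forEach(m -> m.afterSuperstep(s));    // aggregators/snapshots phase
            barrier(log, "StepDone", s);
        }
        managers.forEach(LoggingManager::close);           // plays the role of closeAll()
        return log;
    }

    /** Records a worker signal followed by the matching wait on the master. */
    private static void barrier(List<String> log, String phase, int step) {
        log.add("worker" + phase + ":" + step);
        log.add("waitMaster" + phase + ":" + step);
    }

    public static void main(String[] args) {
        run(2, "MessageSendManager", "WorkerAggrManager").forEach(System.out::println);
    }
}
```

Reading the printed log top to bottom shows why each worker signal is paired with a wait: no worker moves past a phase until the master has acknowledged that phase for the whole job.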
Important Files:
- Algorithm contract: computer/computer-api/src/main/java/org/apache/hugegraph/computer/core/worker/Computation.java
- Runtime orchestration: computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/worker/WorkerService.java
- BSP coordination: computer/computer-core/src/main/java/org/apache/hugegraph/computer/core/bsp/Bsp4Worker.java
- Example algorithm: computer/computer-algorithm/src/main/java/org/apache/hugegraph/computer/algorithm/centrality/pagerank/PageRank.java
Directory Structure:
- algorithms/: Go algorithm implementations (pagerank.go, sssp.go, louvain.go, etc.)
- apps/:
  - bsp/: BSP coordination helpers
  - graphio/: HugeGraph I/O adapters (reads via gRPC to store/pd, writes via HTTP REST)
  - master/: Master scheduling, HTTP endpoints, worker management
  - compute/: Worker-side compute logic
  - protos/: Generated protobuf/gRPC definitions
  - common/: Utilities, logging, metrics
- client/: Client libraries
- tools/: Binary dependencies (supervisord, protoc)
- ui/: Web UI assets
Key Patterns:
- Maker/Registry Pattern: Graph loaders/writers register themselves via init() (e.g., LoadMakers[LoadTypeHugegraph] = &HugegraphMaker{}). Master selects loader by type.
- HugeGraph Integration: hugegraph.go implements HugegraphMaker, HugegraphLoader, HugegraphWriter
  - Queries PD via gRPC for partition metadata
  - Streams vertex/edge data via gRPC from store (ScanPartition)
  - Writes results back via HugeGraph HTTP REST API
- Master-Worker: Master schedules LoadPartition tasks to workers, manages worker lifecycle via WorkerManager/WorkerClient, exposes HTTP admin endpoints.
Important Files:
- HugeGraph integration: vermeer/apps/graphio/hugegraph.go
- Master scheduling: vermeer/apps/master/tasks/tasks.go
- Worker management: vermeer/apps/master/workers/workers.go
- HTTP endpoints: vermeer/apps/master/services/http_master.go
- Scheduler: vermeer/apps/master/bl/scheduler_bl.go
Computer (Java):
- WorkerInputManager reads vertices/edges from HugeGraph via the GraphFactory abstraction
- Graph data is partitioned and distributed to workers via input splits
Vermeer (Go):
- Directly queries HugeGraph PD (metadata service) for partition information
- Uses gRPC to stream graph data from HugeGraph store
- Writes computed results back via HugeGraph HTTP REST API (adds properties to vertices)
Adding a New Algorithm (Computer):
- Create class in computer-algorithm implementing Computation<MessageType>
- Implement compute0() for initialization and compute() for message processing
- Use context.sendMessage() or context.sendMessageToAllEdges() for message passing
- Register aggregators in beforeSuperstep(), read/write in compute()
- Configure algorithm class name in job config
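To make the compute0()/compute() split concrete, here is a minimal, self-contained sketch of a max-value propagation algorithm. It does not use the real computer-api types: Vertex, Context, Computation, and the single-threaded run() driver below are simplified, hypothetical stand-ins (the real contract is generic over a message type and is driven by the distributed runtime), and aggregators are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaxValueSketch {

    /** Simplified stand-in for a vertex: id, mutable value, outgoing edge targets. */
    static class Vertex {
        final int id;
        long value;
        final List<Integer> targets;

        Vertex(int id, long value, List<Integer> targets) {
            this.id = id;
            this.value = value;
            this.targets = targets;
        }
    }

    /** Simplified stand-in for ComputationContext (message sending only). */
    interface Context {
        void sendMessageToAllEdges(Vertex vertex, long message);
    }

    /** Simplified stand-in for the Computation contract. */
    interface Computation {
        void compute0(Context context, Vertex vertex);
        void compute(Context context, Vertex vertex, List<Long> messages);
    }

    /** Toy algorithm: each vertex adopts the largest value that reaches it. */
    static class MaxValue implements Computation {
        @Override
        public void compute0(Context context, Vertex vertex) {
            // Superstep 0: announce the initial value to all neighbors.
            context.sendMessageToAllEdges(vertex, vertex.value);
        }

        @Override
        public void compute(Context context, Vertex vertex, List<Long> messages) {
            long max = vertex.value;
            for (long message : messages) {
                max = Math.max(max, message);
            }
            if (max > vertex.value) {   // only propagate when the value improved
                vertex.value = max;
                context.sendMessageToAllEdges(vertex, max);
            }
        }
    }

    /** Toy driver that mimics superstep message delivery; not the real runtime. */
    public static Map<Integer, Long> run(Computation algo, List<Vertex> vertices) {
        Map<Integer, List<Long>> inbox = new HashMap<>();
        for (int step = 0; step == 0 || !inbox.isEmpty(); step++) {
            Map<Integer, List<Long>> outbox = new HashMap<>();
            Context ctx = (sender, msg) -> {
                for (int target : sender.targets) {
                    outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(msg);
                }
            };
            for (Vertex v : vertices) {
                if (step == 0) {
                    algo.compute0(ctx, v);
                } else if (inbox.containsKey(v.id)) {
                    algo.compute(ctx, v, inbox.get(v.id));
                }
            }
            inbox = outbox;             // messages become visible in the next superstep
        }
        Map<Integer, Long> result = new HashMap<>();
        for (Vertex v : vertices) {
            result.put(v.id, v.value);
        }
        return result;
    }

    /** Builds a three-vertex cycle 1 -> 2 -> 3 -> 1 and runs the toy algorithm. */
    public static Map<Integer, Long> demo() {
        List<Vertex> vertices = List.of(
                new Vertex(1, 3L, List.of(2)),
                new Vertex(2, 7L, List.of(3)),
                new Vertex(3, 1L, List.of(1)));
        return run(new MaxValue(), vertices);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On the three-vertex cycle in demo(), every vertex ends with the value 7, and the loop stops once a superstep produces no messages. A real algorithm would implement the interface from computer-api instead and be selected through the job config.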
K8s-Operator Development:
- CRD classes are auto-generated; run mvn clean install in computer-k8s-operator first
- Generated classes appear in computer-k8s/target/generated-sources/
- CRD generation script: computer-k8s-operator/crd-generate/Makefile
Vermeer Asset Updates:
- Web UI assets must be regenerated after changes: cd asset && go generate
- Or use make generate-assets from vermeer root
- For dev mode with hot-reload: go build -tags=dev
Computer:
- Integration tests require etcd, HDFS, HugeGraph, and Kubernetes (see .github/workflows/computer-ci.yml)
- Test environment setup scripts in computer-dist/src/assembly/travis/
- Unit tests run in isolation without external dependencies
Vermeer:
- Test scripts in vermeer/test/, with vermeer_test.go and vermeer_test.sh
- Configuration files in vermeer/config/ (master.ini, worker.ini templates)
CI pipeline (.github/workflows/computer-ci.yml) runs:
- License check (Apache RAT)
- Setup HDFS (Hadoop 3.3.2)
- Setup Minikube/Kubernetes
- Load test data into HugeGraph
- Compile with Java 11
- Run integration tests (-P integrate-test)
- Run unit tests (-P unit-test)
- Upload coverage to Codecov
Notes:
- Computer K8s module: Must run mvn clean install before editing to generate CRD classes
- Java version: Build requires JDK 11; HDFS dependencies require JDK 8
- Vermeer binary deps: First-time builds need make init to download supervisord/protoc
- BSP coordination: Computer uses etcd for barrier synchronization (configure via BSP_ETCD_URL)
- Memory management: Both systems auto-manage memory by spilling to disk when needed