10 changes: 5 additions & 5 deletions blog/27.md
@@ -3,7 +3,7 @@ title: "Stream4Graph: Incremental Computation on Dynamic Graphs"
date: "2025-3-11"
---

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png)
![](/graph/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png)

> Author: Zhang Qi

@@ -23,15 +23,15 @@ Stream graph computing engine [GeaFlow](https://github.com/TuGraph-family/tugrap
- Web pages: Nodes represent web pages, and edges represent hyperlinks.
- Transportation networks: Nodes represent cities, and edges represent roads or air routes.

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png)
![](/graph/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png)

Graphs inherently represent the connections between nodes, and based on these relationships, we can use nodes and edges to process, analyze, and mine information, helping us understand relationships and patterns in complex systems. The computational activities conducted on graphs are referred to as graph computing. Graph computing has many applications, such as identifying user connections and discovering community structures through social network analysis, calculating web page rankings by analyzing hyperlink relationships, and recommending relevant content and products by building relationship graphs based on user behavior and preferences.

Take a simple social network analysis algorithm, Weakly Connected Components (WCC), as an example. WCC helps us identify "friend circles" or "communities" among users. For instance, on a social platform, a group of users who interact through likes, comments, or follows forms a large weakly connected component, while users who are not linked to this group form smaller weakly connected components of their own.

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png)
![](/graph/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png)

If we only had to run WCC on the small graph above, the task would be simple: we could build a basic node-edge structure on a personal computer and traverse the graph. However, once the graph grows to hundreds of billions or even trillions of elements, a large-scale distributed graph computing engine becomes necessary.
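
To make the single-machine case concrete, here is a minimal sketch that computes weakly connected components with a union-find (disjoint-set) structure. Union-find is a swapped-in technique chosen for brevity; it is not how GeaFlow or any distributed engine computes WCC, and the names `UnionFind` and `weakly_connected_components` are hypothetical. Edge direction is ignored, which is what makes the components "weak".

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen vertices lazily as singleton components.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
            return x
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def weakly_connected_components(edges):
    """Group vertices into WCCs, treating every directed edge as undirected."""
    uf = UnionFind()
    for src, dst in edges:
        uf.union(src, dst)
    groups = defaultdict(set)
    for v in list(uf.parent):
        groups[uf.find(v)].add(v)
    # Sort members and components so the result is deterministic.
    return sorted(sorted(g) for g in groups.values())


print(weakly_connected_components([("alice", "bob"), ("bob", "carol"), ("dave", "erin")]))
```

Union-find absorbs new edges one at a time, which hints at why incremental computation is attractive, but it cannot handle edge deletions and does not distribute across machines.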

@@ -80,7 +80,7 @@ How does Spark GraphX handle graph algorithms? GraphX extends Spark RDD by intro

In summary, users first convert raw tabular data from storage into GraphX node and edge types and then let Spark handle the processing. This approach targets offline processing of static graphs. In the real world, however, both the scale of graph data and the relationships between nodes change constantly, and in the big data era these changes happen quickly. Processing dynamic graph data efficiently and in real time is a significant challenge.

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png)
![](/graph/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png)

## 3. Dynamic Graph Computing: Spark Streaming

@@ -112,7 +112,7 @@ Using the WCC algorithm as an example, for the connected components algorithm, i

The GeaFlow engine consists of three main parts: DSL, Framework, and State. It also provides users with Stream API, Static Graph API, and Dynamic Graph API. The DSL layer is responsible for parsing and optimizing graph query languages like SQL+ISO/GQL, as well as schema inference. It also supports various Connectors such as Hive, Hudi, Kafka, and ODPS. The Framework layer handles runtime scheduling, fault tolerance, shuffle, and coordination of components. The State layer is responsible for storing underlying graph data and persistence, as well as performance optimizations like indexing and predicate pushdown.

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png)
![](/graph/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png)

## 6. GeaFlow Performance Testing

2 changes: 1 addition & 1 deletion blog/28.md
@@ -3,7 +3,7 @@ title: Principles and Applications of Incremental Match in Streaming Graph Compu
date: 2025-6-3
---

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/23857192/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png)
![](/graph/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png)

## Problem Background
In streaming computing, data rarely arrives all at once; instead, it is continuously ingested and processed. Similarly, in graph computing and graph query scenarios, vertices and edges are continuously read from data sources to build the graph incrementally. In incremental graph queries, the graph keeps evolving, so query results differ across graph versions. When new vertices or edges produce an updated graph version, recomputing over the entire graph is expensive and repeats historical computation. Since historical data has already been processed, ideally only the portions affected by the delta should be computed or queried, without re-executing over the full graph.
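
The delta-only principle can be illustrated with a much simpler pattern than GeaFlow's incremental Match: counting triangles in an insert-only undirected graph. The sketch below is an illustrative substitute, not the algorithm this post describes, and `IncrementalTriangleCounter` is a hypothetical name. When edge u-v arrives, only common neighbours of u and v can close new triangles, so the work depends on the local neighbourhood rather than on the whole historical graph.

```python
from collections import defaultdict


class IncrementalTriangleCounter:
    """Maintains an undirected triangle count as edges stream in (insert-only)."""

    def __init__(self):
        self.adj = defaultdict(set)  # vertex -> set of neighbours
        self.triangles = 0

    def add_edge(self, u, v):
        """Add edge u-v and return how many new triangles it closes."""
        if u == v or v in self.adj[u]:
            return 0  # ignore self-loops and duplicate edges
        # Only the common neighbours of u and v form new triangles with u-v.
        new = len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles += new
        return new


counter = IncrementalTriangleCounter()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (1, 4)]:
    counter.add_edge(*edge)
print(counter.triangles)
```

Deletions would need the symmetric update (subtracting the triangles the removed edge belonged to), which this sketch deliberately omits.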
6 changes: 3 additions & 3 deletions blog/30.md
@@ -3,7 +3,7 @@ title: "Join Performance Revolution: Graph Data Warehouse Makes SQL Analysis Fas
date: 2025-5-15
---

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png)
![](/graph/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png)

> Author: Lin Litao

@@ -28,7 +28,7 @@ date: 2025-5-15

**Innovation Constraint**: Business analysts often abandon graph technology stacks due to the need to learn GQL (Graph Query Language). The fragmented toolchain keeps graph analytics confined to technical departments, failing to empower front-line business teams.

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png?x-oss-process=image/format,png)
![](/graph/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png?x-oss-process=image/format,png)

**Figure 2: JOIN vs GQL Expression Examples**

@@ -64,7 +64,7 @@ The Graph Data Warehouse Schema Converter automatically transforms the ER model

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683049495-d75ae87b-9510-40ee-b22f-c6140570b1f1.png)

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png)
![](/graph/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png)

**Figure 3: ER to Graph Schema Conversion Example Series**

10 changes: 5 additions & 5 deletions blog/31.md
@@ -3,7 +3,7 @@ title: "Graph4Stream: Accelerating Stream Computing with Graph-Based Approaches"
date: 2025-3-25
---

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png)
![](/graph/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png)

> Author: Kunyu; Reviewer: Dongshuo.

@@ -66,15 +66,15 @@ ON `e`.`dst` = `v`.`vid`;

The execution plan is shown below. It consists of operators such as Aggregate, Calc, and Join. Data flows through each operator to yield incremental results. The core operator, Join, is responsible for relationship lookups. Let's examine how the Join operator works.

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png)
![](/graph/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png)

Flink Execution Plan

As shown below, the Join operator has two input streams: LeftInput and RightInput, corresponding to the left and right tables of the join. When data arrives from upstream, the operator begins computation. Taking the left input stream as an example, the data is first stored in LeftStateView. Then, the operator queries RightStateView for data that satisfies the join condition. This querying process requires scanning through RightStateView, and the resulting joined data is passed to the next operator.

The main performance bottleneck lies in scanning RightStateView. LeftStateView and RightStateView store the left and right tables of the join, respectively. As data continuously flows in, the size of StateViews grows, causing scan times to increase dramatically and severely degrading system performance.
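
The toy model below reproduces the scan-based behavior described above so the growth in cost is visible. It is not Flink's implementation (Flink has its own state backends and optimizations); the class name, the `rows_scanned` counter, and the key-extractor arguments are hypothetical. Each arriving row is stored in its own side's state and then compared against every row in the opposite side's state.

```python
class ScanningStreamJoin:
    """Toy symmetric stream join whose lookups scan the opposite side's state."""

    def __init__(self, left_key, right_key):
        self.left_key = left_key    # extracts the join key from a left row
        self.right_key = right_key  # extracts the join key from a right row
        self.left_state = []        # plays the role of LeftStateView
        self.right_state = []       # plays the role of RightStateView
        self.rows_scanned = 0       # total rows examined across all lookups

    def on_left(self, row):
        self.left_state.append(row)
        # Every arriving left row scans the whole right state.
        self.rows_scanned += len(self.right_state)
        k = self.left_key(row)
        return [(row, r) for r in self.right_state if self.right_key(r) == k]

    def on_right(self, row):
        self.right_state.append(row)
        # Every arriving right row scans the whole left state.
        self.rows_scanned += len(self.left_state)
        k = self.right_key(row)
        return [(l, row) for l in self.left_state if self.left_key(l) == k]


# Edges (src, dst) on the left joined to vertices (vid, name) on the right,
# mirroring a condition like `e.dst = v.vid`.
join = ScanningStreamJoin(left_key=lambda e: e[1], right_key=lambda v: v[0])
join.on_right((2, "B"))
join.on_right((3, "C"))
print(join.on_left((1, 2)), join.rows_scanned)
```

Because `rows_scanned` grows with the total size of the stored state, the cost per arriving row keeps rising as the stream runs, which is the bottleneck the paragraph above points to.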

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png)
![](/graph/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png)

Flink Join Operator Implementation

@@ -84,7 +84,7 @@ Flink Join Operator Implementation

Graph computing is a computational paradigm based on graph data structures. A graph G(V,E) consists of a set of vertices V and edges E, where edges represent relationships between data. Using the public dataset web-Google as an example, each line contains two numbers representing a hyperlink between two web pages. As shown below, the left side shows raw data, which is traditionally modeled as a two-column table. In contrast, graph modeling treats web pages as vertices and hyperlinks as edges, forming a web link graph. In the tabular model, relationship computation is done via joins, which require scanning tables. In graph computing, relationships are directly stored in edges, eliminating the need for scans.
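
The contrast between the two models can be sketched on a small edge list in the two-column format described above. This is an illustration under assumptions: lines starting with `#` are treated as comments, the sample values are made up rather than taken from web-Google, and the in-memory dictionary merely stands in for graph storage; it is not GeaFlow's storage layer. All function names are hypothetical.

```python
from collections import defaultdict


def load_edge_list(lines):
    """Parse 'src dst' lines (comments start with '#') into a table and an adjacency map."""
    table = []                     # tabular model: one (src, dst) row per hyperlink
    out_edges = defaultdict(list)  # graph model: vertex -> outgoing neighbours
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = map(int, line.split())
        table.append((src, dst))
        out_edges[src].append(dst)
    return table, out_edges


def neighbours_by_scan(table, vid):
    # Tabular lookup: examine every row to find links leaving `vid`.
    return [dst for src, dst in table if src == vid]


def neighbours_by_edge(out_edges, vid):
    # Graph lookup: the relationship is stored with the vertex, so no scan is needed.
    return list(out_edges.get(vid, []))


table, out_edges = load_edge_list(["# FromNodeId\tToNodeId", "1\t2", "1\t3", "2\t3"])
print(neighbours_by_scan(table, 1), neighbours_by_edge(out_edges, 1))
```

Both lookups return the same neighbours, but the tabular one touches every row while the graph one touches only the vertex's own edges.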

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png)
![](/graph/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png)

Table Modeling vs. Graph Modeling

@@ -104,7 +104,7 @@ Taking k-Hop as an example, the incremental algorithm works as follows: In the f

The diagram below illustrates the two-hop case. In the first iteration, the edge B->C creates incoming and outgoing paths, sent to B and C, respectively. In the second iteration, B receives an incoming path, adds its own incoming edges, and forms a 2-hop incoming path, which it sends to itself. Similarly, C forms a 2-hop outgoing path and sends it to B. In the final iteration, B combines the incoming and outgoing paths to produce the new paths. Unlike Flink, which must scan all historical relationships, GeaFlow's computation is proportional to the incremental paths, not the historical data.
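
A simplified, centralized sketch of the two-hop case follows. It is not GeaFlow's iterative, message-passing implementation; it only shows the core idea that a new edge src->dst creates exactly the 2-hop paths that extend it backwards through src's incoming edges or forwards through dst's outgoing edges. `TwoHopPathIndex` and its method are hypothetical names.

```python
from collections import defaultdict


class TwoHopPathIndex:
    """Insert-only directed graph that reports the 2-hop paths each new edge creates."""

    def __init__(self):
        self.in_adj = defaultdict(set)   # vertex -> sources of its incoming edges
        self.out_adj = defaultdict(set)  # vertex -> targets of its outgoing edges

    def add_edge(self, src, dst):
        """Add src->dst and return only the new 2-hop paths that use this edge."""
        if dst in self.out_adj[src]:
            return []  # duplicate edge: nothing new
        # Paths x -> src -> dst: extend the new edge backwards via src's incoming edges.
        incoming = [(x, src, dst) for x in self.in_adj[src]]
        # Paths src -> dst -> y: extend the new edge forwards via dst's outgoing edges.
        outgoing = [(src, dst, y) for y in self.out_adj[dst]]
        self.out_adj[src].add(dst)
        self.in_adj[dst].add(src)
        return sorted(incoming + outgoing)


g = TwoHopPathIndex()
g.add_edge("A", "B")
g.add_edge("C", "D")
print(g.add_edge("B", "C"))
```

The work for each new edge is proportional to the in-degree of `src` plus the out-degree of `dst`, not to the total number of historical edges.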

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png)
![](/graph/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png)

Two-Hop Incremental Path Computation

4 changes: 2 additions & 2 deletions blog/32.md
@@ -29,7 +29,7 @@ In this update, GeaFlow adds support for Paimon storage (currently **experimenta
- Configure the storage path via the parameter `geaflow.store.paimon.options.warehouse` (default: `"file:///tmp/paimon/"`).

The current GeaFlow storage architecture is shown below:
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp)
![](/graph/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp)

### 🍀 Graph Data Warehouse Capability Expansion: Supports Relational Access to Graph Entities

@@ -126,4 +126,4 @@ Key features introduced in v0.6.3 (building on v0.5.2) include:
## ✨ Acknowledgments

Thank you to all contributors for making this release possible!
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp)
![](/graph/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp)
23 changes: 10 additions & 13 deletions community/en/community.md
@@ -1,29 +1,26 @@
### Support

A brief introduction to the project and contact information.
Apache GeaFlow (Incubating): A Streaming Graph Computing Engine.

### Team

Team introduction. You can refer to [https://answer.apache.org/community/team](https://answer.apache.org/community/team).

Anyone who has contributed to the project can be included.
Contact us through the following mailing list.
[dev@geaflow.apache.org](mailto:dev@geaflow.apache.org)

### Security
If you are interested in GeaFlow, please give our project a [⭐](https://github.com/apache/geaflow)

Rules for handling security issues and publicly disclosed CVEs.
### Team

### How to Contribute
Team introduction. You can refer to [https://github.com/apache/geaflow/graphs/contributors](https://github.com/apache/geaflow/graphs/contributors).

Introduction on how to contribute.
Anyone who has contributed to the project can be included.

### Feature Request

Link to the GitHub issues page, making it easy for users to submit requests via GitHub issues (you can also create a page explaining how to submit via the mailing list).
Link to the GitHub issues page. [https://github.com/apache/geaflow/issues](https://github.com/apache/geaflow/issues)

### Roadmap

Link to the Roadmap file on GitHub, mainly explaining the future plans of the project.
Link to the Roadmap file on GitHub. [https://github.com/apache/geaflow/issues/532](https://github.com/apache/geaflow/issues/532)

### Logos

Provide downloadable project logos and other resources
![](/img/logo.png)
32 changes: 17 additions & 15 deletions community/zh/community.md
@@ -1,23 +1,25 @@
### Support
A brief introduction to the project, with contact information
### Support
Apache GeaFlow (Incubating): A Streaming Graph Computing Engine.

### Team
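Team introduction; you can refer to [https://answer.apache.org/community/team](https://answer.apache.org/community/team)
Contact us through the following mailing list:
[dev@geaflow.apache.org](mailto:dev@geaflow.apache.org)

Anyone who has contributed to the project can be included
If you are interested in GeaFlow, please give our project a [⭐](https://github.com/apache/geaflow)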

### Security
Rules for handling security issues, and the publicly disclosed CVEs that have been handled
### Team

### How to Contribute
Introduction on how to contribute
Team introduction. You can refer to [https://github.com/apache/geaflow/graphs/contributors](https://github.com/apache/geaflow/graphs/contributors).

### Feature Request
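Link to the GitHub issues page so users can submit requests as GitHub issues (a separate page could also explain how to submit them by joining the mailing list)
Anyone who has contributed to the project can be included.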

### Roadmap
Link to the Roadmap file on GitHub, mainly describing the project's future plans
### Feature Request

### Logos
Provide downloadable project logos and other resources for community collaboration
Link to the GitHub issues page: [https://github.com/apache/geaflow/issues](https://github.com/apache/geaflow/issues)

### Roadmap

Link to the Roadmap file on GitHub: [https://github.com/apache/geaflow/issues/532](https://github.com/apache/geaflow/issues/532)

### Logos

![](/img/logo.png)
@@ -9,17 +9,17 @@ date: 2023-06-11

<!-- truncate -->

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/96961/1755592247311-ef91bb78-19ae-43f2-8e7a-701001b7759d.png)
![](/graph/1755592247311-ef91bb78-19ae-43f2-8e7a-701001b7759d.png)

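Last September, Ant Group open-sourced TuGraph DB, the graph database of the TuGraph graph computing platform. This release was another open-source upgrade of the TuGraph platform, further expanding Ant's openness in foundational graph computing software and putting into practice its commitment to advancing technological innovation through open collaboration.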

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/96961/1755592259385-b4377a0d-7c78-4ce4-a19c-d21e6aea6290.png)
![](/graph/1755592259385-b4377a0d-7c78-4ce4-a19c-d21e6aea6290.png)

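A graph is an abstract data structure made up of vertices and edges. Graph computing is an algorithmic model built on graph structures; it can mine relationships and perform complex computation over large-scale data, enabling knowledge reasoning and event tracing. Graph computing is already widely used in finance, government, healthcare, and other fields, and has drawn attention from research institutions and leading technology companies worldwide. Streaming graph computing is a cross-disciplinary innovation that combines stream computing and graph computing, merging the timeliness of stream computing with the flexibility of graph computing, and it is extremely difficult to achieve.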

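Ant Group reportedly began exploring graph computing in 2015, investing in graph databases, a streaming graph computing engine, graph learning, and related technologies. It has built one of the world's largest graph computing clusters, pioneered the industry's first industrial-grade streaming graph computing engine, and repeatedly won first place in LDBC, the authoritative graph database benchmark, where it holds the world record. The industrial-grade streaming graph computing engine open-sourced this time has been under development at Ant since 2017. After more than five years of industrial use, it computes with second-level latency on graphs holding hundreds of billions of data items. It is a core foundational technology of Ant's risk control and has solved long-standing industry problems in financial risk analysis, such as analytical difficulty, low identification rates, and poor timeliness.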

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/96961/1755592294749-1c359609-0c5c-464f-8181-b60e6d740773.png)
![](/graph/1755592294749-1c359609-0c5c-464f-8181-b60e6d740773.png)

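Graph computing is a key core technology for the next generation of artificial intelligence. Zheng Weimin, an academician of the Chinese Academy of Engineering, has said that "high-performance graph computing is a strategic high ground in today's global AI competition; we must accelerate technical breakthroughs and overcome industrial bottlenecks to avoid once again being held back in the key field of high-performance graph computing." Open source is the fastest path to sharing scientific and technological achievements and accelerating the adoption of advanced technology.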

@@ -37,6 +37,6 @@ date: 2023-06-11

## WeChat Group

![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/96961/1755592229183-6e880de0-ebb8-476b-9531-bbe277bbe705.png)
![](/graph/1755592229183-6e880de0-ebb8-476b-9531-bbe277bbe705.png)

To join the WeChat user group, scan the WeChat QR code below the project link: [https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics)