diff --git a/.asf.yaml b/.asf.yaml index 77ffef2db..69057380f 100644 --- a/.asf.yaml +++ b/.asf.yaml @@ -21,7 +21,7 @@ publish: whoami: asf-site github: - description: "Apache GeaFlow: A Streaming Graph Computing Engine." + description: "Apache GeaFlow (Incubating): A Streaming Graph Computing Engine." homepage: https://geaflow.apache.org/ features: issues: true diff --git a/blog/27.md b/blog/27.md index 3990f4846..0cba78575 100644 --- a/blog/27.md +++ b/blog/27.md @@ -1,47 +1,47 @@ --- -title: "Stream4Graph:动态图上的增量计算" +title: "Stream4Graph: Incremental Computation on Dynamic Graphs" date: "2025-3-11" --- ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png) -> 作者:张奇 +> Author: Zhang Qi -众所周知,当我们需要对数据做关联性分析的时候,一般会采用表连接(SQL join)的方式完成。但是 SQL join 时的笛卡尔积计算需要维护大量的中间结果,从而对整体的数据分析性能带来巨大影响。相比而言,基于图的方式维护数据的关联性,原本的关联性分析可以转换为图上的遍历操作,从而大幅降低数据分析的成本。 +It's well known that when we need to perform correlation analysis on data, we typically use SQL join operations. However, Cartesian product calculations during SQL joins require maintaining a large number of intermediate results, which significantly impacts overall data analysis performance. In contrast, graph-based approaches maintain data correlations, transforming correlation analysis into graph traversal operations and greatly reducing the cost of data analysis. -然而,随着数据规模的不断增长,以及对数据处理更强的实时性需求,如何高效地解决大规模图数据上的实时计算问题,就变得越来越紧迫。传统的计算引擎,如 Spark、Flink 对于图数据的处理已经逐渐不能满足业务日益增长的诉求,因此设计一套面向大规模图数据的实时处理引擎,将会对大数据处理技术革新带来巨大的帮助。 +However, with the continuous growth in data scale and increasing demand for real-time processing, efficiently solving real-time computation problems on large-scale graph data has become increasingly urgent. Traditional computing engines such as Spark and Flink are gradually falling short of meeting the growing business demands for graph data processing. 
Therefore, designing a real-time processing engine tailored for large-scale graph data will bring significant advancements to big data processing technologies. -蚂蚁图计算团队开源的流图计算引擎[GeaFlow](https://github.com/TuGraph-family/tugraph-analytics),结合了图处理和流处理的技术优势,实现了动态图上的增量计算能力,在高性能关联性分析的基础上,进一步提升了图计算的实时性。接下来向大家介绍图计算技术的特点,业内如何解决大规模实时图计算问题,以及 GeaFlow 在动态图上的计算性能表现。 +[GeaFlow](https://github.com/TuGraph-family/tugraph-analytics), the streaming graph computing engine open-sourced by Ant Group's graph computing team, combines the technical strengths of graph processing and stream processing. It supports incremental computation on dynamic graphs, which further improves the real-time performance of graph computing on top of high-performance correlation analysis. In the following sections, we introduce the characteristics of graph computing, how the industry addresses large-scale real-time graph computing, and GeaFlow's performance on dynamic graphs. -## 1. 图计算 +## 1. Graph Computing -图是一种数学结构,由节点和边组成。节点代表各种实体,比如人、地点、事物或概念,而边则表示这些节点之间的关系。例如: +A graph is a mathematical structure composed of nodes and edges. Nodes represent entities such as people, locations, objects, or concepts, while edges represent the relationships between these nodes. For example: -- 社交媒体:节点可以代表用户,边可以表示朋友关系。 -- 网页:节点代表网页,边代表超链接。 -- 交通网络:节点代表城市,边代表道路或航线。 +- Social media: Nodes can represent users, and edges can represent friendships. +- Web pages: Nodes represent web pages, and edges represent hyperlinks. +- Transportation networks: Nodes represent cities, and edges represent roads or air routes.
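To make the contrast with SQL joins concrete, here is a minimal sketch, not taken from GeaFlow, that stores a small social graph as adjacency lists and answers a friends-of-friends question by walking two hops from one user instead of joining an edge table with itself. The class `SocialGraph` and its methods are hypothetical names chosen for illustration.

```java
import java.util.*;

/** Illustrative in-memory graph: user -> set of friends (undirected edges). */
class SocialGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    /** Adds an undirected friendship edge between two users. */
    void addFriendship(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    /** Users exactly two hops from {@code user}, excluding the user and direct friends. */
    Set<String> friendsOfFriends(String user) {
        Set<String> direct = adjacency.getOrDefault(user, Set.of());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {                                           // hop 1
            for (String candidate : adjacency.getOrDefault(friend, Set.of())) {  // hop 2
                if (!candidate.equals(user) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SocialGraph g = new SocialGraph();
        g.addFriendship("alice", "bob");
        g.addFriendship("bob", "carol");
        g.addFriendship("bob", "dave");
        g.addFriendship("alice", "dave");
        System.out.println(g.friendsOfFriends("alice")); // prints [carol]
    }
}
```

Each hop is a lookup of the current vertex's neighbors, so the work depends on the local neighborhood rather than on the size of an intermediate join result.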
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png) -图本身代表了节点与节点之间的链接关系,而针对这些关系,我们可以利用图中的节点和边来进行信息处理、分析和挖掘,帮助我们理解复杂系统中的关系和模式。在图上开展的计算活动就是图计算。图计算有很多应用场景,比如通过社交网络分析可以识别用户之间的联系,发现社群结构;通过分析网页间的链接关系来计算网页排名;通过用户的行为和偏好构建关系图,推荐相关内容和产品。 +Graphs inherently represent the connections between nodes, and based on these relationships, we can use nodes and edges to process, analyze, and mine information, helping us understand relationships and patterns in complex systems. The computational activities conducted on graphs are referred to as graph computing. Graph computing has many applications, such as identifying user connections and discovering community structures through social network analysis, calculating web page rankings by analyzing hyperlink relationships, and recommending relevant content and products by building relationship graphs based on user behavior and preferences. -我们就以简单的社交网络分析算法,弱联通分量(Weakly Connected Components, WCC)为例。弱联通分量可以帮助我们识别用户之间的“朋友圈”或“社区”,比如某个社交平台上,一群用户通过点赞、评论或关注形成一个大的弱联通分量,而某些用户可能没有连接到这个大分量,形成更小的弱联通分量。 +Let's take a simple social network analysis algorithm—Weakly Connected Components (WCC)—as an example. WCC helps us identify "friend circles" or "communities" among users. For instance, on a social platform, a group of users who interact through likes, comments, or follows forms a large weakly connected component, while some users may not be connected to this large component, forming smaller weakly connected components. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png) -如果仅仅针对上面这张小图来构建弱联通分量算法,那么非常简单,我们只需要在个人 PC 上构建简单的点边结构然后走图遍历即可。但如果图的规模扩展的千亿甚至万亿,这时就需要用到大规模分布式图计算引擎来处理了。 +If we were to build a WCC algorithm based solely on the small graph above, it would be very simple—we could just construct a basic node-edge structure on a personal PC and perform graph traversal. 
However, if the graph scale expands to hundreds of billions or even trillions, we would need to use large-scale distributed graph computing engines to handle it. -## 2. 分布式图计算:Spark GraphX +## 2. Distributed Graph Computing: Spark GraphX -针对图的处理一般有图计算引擎和图数据库两大类,图数据库有Neo4j‌、TigerGraph‌ 等,图计算引擎有 Spark GraphX、Pregel 等。在本文我们主要讨论图计算引擎,以 Spark GraphX 为例,Spark GraphX 是 Apache Spark 的一个组件,专门用于图计算和图分析。GraphX 结合了 Spark 的强大数据处理能力与图计算的灵活性,扩展了 Spark 的核心功能,为用户提供了一个统一的 API,便于处理图数据。 +Graph processing generally falls into two categories: graph computing engines and graph databases. Graph databases include Neo4j, TigerGraph, etc., while graph computing engines include Spark GraphX, Pregel, etc. In this article, we mainly discuss graph computing engines, using Spark GraphX as an example. Spark GraphX is a component of Apache Spark specifically designed for graph computing and analysis. GraphX combines Spark’s powerful data processing capabilities with the flexibility of graph computing, extending Spark’s core functionality and providing users with a unified API for processing graph data. -那么在 Spark GraphX 上是如何处理图算法的呢?GraphX 通过引入一种点和边都附带属性的有向多图扩展了 Spark RDD 这种抽象数据结构,为用户提供了一个类似于 Pregel 计算模型的以点为中心的并行抽象。用户需要为 GraphX 提供原始图 graph、初始消息 initialMsg、核心计算逻辑 vprog、发送消息控制组件 sendMsg、合并消息组件 mergeMsg,计算开始时,GraphX 初始阶段会激活所有点进行初始化,然后按照用户提供的发送消息组件确定接下来向那些点发送消息。在之后的迭代里,只有收到消息的点才会被激活,进行接下来的计算,如此循环往复直到链路中没有被新激活的点或者到达最大迭代次数,最后输出计算结果。 +How does Spark GraphX handle graph algorithms? GraphX extends Spark RDD by introducing a directed multigraph where both nodes and edges carry attributes, providing users with a vertex-centric parallel abstraction similar to the Pregel computing model. Users need to provide GraphX with the original graph, initial messages, core computation logic (vprog), message-sending component (sendMsg), and message-merging component (mergeMsg). At the start of the computation, GraphX activates all vertices for initialization. 
Then, based on the user-provided sendMsg component, it determines which vertices to send messages to. In subsequent iterations, only vertices that receive messages are activated for further computation; this repeats until no new vertices are activated or the maximum number of iterations is reached, and the results are then output. @@ -78,47 +78,47 @@ date: "2025-3-11" } ``` -总的来说,用户首先需要将存储介质中原始的表结构数据转换为 GraphX 中的点边数据类型,然后交给 Spark 进行处理,这是针对静态图进行离线处理。但是我们知道,现实世界中,图数据的规模和数据内节点之间的关系都不是一成不变的,并且在大数据时代其变化非常快。如何实时高效的处理不断变化的图数据(动态图),是一个值得深思的问题。 +In summary, users first need to convert raw tabular data from storage into GraphX vertex and edge types and then hand them to Spark for processing. This is offline processing of static graphs. In the real world, however, both the scale of graph data and the relationships between nodes keep changing, and in the big data era they change very quickly. How to process such continuously changing graph data (dynamic graphs) efficiently and in real time is a question worth careful thought. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png) -## 3. 动态图计算:Spark Streaming +## 3. Dynamic Graph Computing: Spark Streaming -针对动态图的处理,常见的解决方案是 Spark Streaming 框架,它可以从很多数据源消费数据并对数据进行处理。它是是 Spark 核心 API 的一个扩展,可以实现高吞吐量的、具备容错机制的实时流数据的处理。 +For processing dynamic graphs, a common solution is the Spark Streaming framework, which can consume and process data from many sources. It extends Spark’s core API to provide high-throughput, fault-tolerant processing of real-time data streams. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740470405961-05389aa3-1b67-4cdf-9c65-ea28641ef89c.png) -如上图所示是 Spark Streaming 对实时数据进行处理的流程。首先 Spark 中的每个 Receiver 接收到实时消息流后,对实时消息进行解析和切分,之后将生成的图数据存储在每个 Executor 中。每当数据累积到一定的批次,就会触发一次全量计算,最后将计算出的结果输出给用户,这也称之为基于快照的图计算方案。 +The figure above shows how Spark Streaming processes real-time data.
First, each Receiver in Spark receives a real-time message stream, parses and segments the messages, and then stores the generated graph data in each Executor. When data accumulates to a certain batch, a full-scale computation is triggered, and the final results are output to the user. This is known as the snapshot-based graph computing approach. -但这种方案有一个比较大的缺点,就是它存在着重复计算的问题,假如我们需要以 1 小时一个窗口做一次计算,那么在使用 Spark 进行计算时,不仅要将当前窗口的数据计算进去,历史所有数据也需要进行回溯,存在大量重复计算,这样做效率不高,因此我们需要一套能够进行增量计算的图计算方案。 +However, this approach has a significant drawback: it involves redundant computation. For example, if we need to compute once per hour, using Spark would require not only computing the current window’s data but also backtracking all historical data, leading to a large amount of redundant computation. Therefore, we need a graph computing solution that supports incremental computation. -## 4. 动态图增量计算:GeaFlow +## 4. Incremental Dynamic Graph Computing: GeaFlow -我们知道在传统的流计算引擎中,如 Flink,其处理模型允许系统能够处理不断流入的数据事件。处理每个事件时,Flink 可以评估变化并仅针对变化的部分执行计算。这意味着在增量计算过程中,Flink 会关注最新到达的数据,而不是整个数据集。于是受到 Flink 增量计算的启发,我们自研了增量图计算系统 GeaFlow(也叫流图计算引擎),能够很好的支持增量图迭代计算。 +We know that in traditional stream computing engines like Flink, the processing model allows the system to handle continuously incoming data events. When processing each event, Flink can evaluate changes and execute computations only on the changed parts. This means that in incremental computing, Flink focuses on the latest incoming data rather than the entire dataset. Inspired by Flink’s incremental computing, we developed the incremental graph computing system GeaFlow (also known as the stream graph computing engine), which effectively supports incremental graph iterative computation. -那么 GeaFlow 是如何实现增量图计算的呢?首先,实时数据通过 connector 消息源输入的 GeaFlow 中,GeaFlow 依据实时数据,生成内部的点边结构数据,并且将点边数据插入进底图中。当前窗口的实时数据涉及到的点会被激活,触发图迭代计算。 +How does GeaFlow implement incremental graph computing? First, real-time data is input into GeaFlow through connectors. 
GeaFlow converts the real-time data into its internal vertex and edge structures and inserts them into the base graph. The vertices touched by the current window's data are activated and trigger iterative graph computation. -这里以 WCC 算法为例,对联通分量算法而言,在一个时间窗口内每条边对应的 src id 和 tar id 对应的顶点会被激活,第一次迭代需要将其 id 信息通知其邻居节点。如果邻居节点收到消息后,发现需要更新自己的信息,那么它需要继续将更新消息通知给它的邻居节点;如果说邻居节点不需要更新自己的信息,那么它就不需要通知其邻居节点,它对应的迭代终止。 +Take WCC as an example. Within a time window, the vertices corresponding to each edge's src id and tar id are activated, and in the first iteration they send their ids to their neighbors. If a neighbor receives a message and finds that it needs to update its own information, it continues to notify its neighbors; if it does not need to update, it sends nothing further and its iteration terminates. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740471552771-36ee8f06-d58e-4cb7-914d-c44e151575a0.png) -## 5. GeaFlow 架构简析 +## 5. GeaFlow Architecture Overview -GeaFlow 引擎主要由三大主要部分组成,DSL、Framework 和 State,同时向上为用户提供了 Stream API、静态图 API 和动态图 API。DSL 主要负责图查询语言 SQL+ISO/GQL 的解析和执行计划的优化,同时负责 schema 的推导,也向外部承接了多种 Connector,比如 hive、hudi、kafka、odps 等。Framework 层负责运行时的调度和容灾,shuffle 以及框架内各个组件的管理协调。State 层负责存储底层图数据和数据的持久化,同时也负责索引、下推等众多性能优化工作。 +The GeaFlow engine consists of three main parts, DSL, Framework, and State, and exposes a Stream API, a Static Graph API, and a Dynamic Graph API to users. The DSL layer parses the SQL+ISO/GQL graph query language, optimizes execution plans, and infers schemas; it also provides connectors to external systems such as Hive, Hudi, Kafka, and ODPS. The Framework layer handles runtime scheduling, fault tolerance, shuffle, and the management and coordination of the framework's components. The State layer stores the underlying graph data and handles persistence, as well as performance optimizations such as indexing and predicate pushdown.
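The difference between the snapshot recomputation of Section 3 and the incremental WCC propagation described in Section 4 can be sketched in a single process. This is a simplified illustration, not GeaFlow or GraphX code: it assumes undirected edges, numeric vertex ids, insert-only windows, and min-label propagation as the WCC rule, and the class `IncrementalWcc` is hypothetical. `recompute` resets every label and activates every vertex, as a full snapshot computation would, while `applyWindow` activates only the endpoints of the window's new edges and keeps iterating only from vertices whose component id actually changed.

```java
import java.util.*;

/** Hypothetical sketch: min-label WCC with full recomputation vs. incremental propagation. */
class IncrementalWcc {
    private final Map<Long, Set<Long>> adj = new HashMap<>(); // undirected adjacency
    private final Map<Long, Long> label = new HashMap<>();    // vertex -> smallest id seen in its component
    int lastActivations;                                     // vertices processed by the latest call

    private void addEdge(long u, long v) {
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
        label.putIfAbsent(u, u);
        label.putIfAbsent(v, v);
    }

    /** Active vertices push their label; a neighbor whose label drops becomes active in turn. */
    private void propagate(Deque<Long> active) {
        lastActivations = 0;
        while (!active.isEmpty()) {
            long v = active.poll();
            lastActivations++;
            for (long n : adj.getOrDefault(v, Set.of())) {
                if (label.get(v) < label.get(n)) {
                    label.put(n, label.get(v)); // neighbor must update its component id
                    active.add(n);              // only updated vertices keep iterating
                }
            }
        }
    }

    /** Incremental step: add one window of edges and activate only their endpoints. */
    void applyWindow(List<long[]> edges) {
        Deque<Long> active = new ArrayDeque<>();
        for (long[] e : edges) {
            addEdge(e[0], e[1]);
            active.add(e[0]);
            active.add(e[1]);
        }
        propagate(active);
    }

    /** Snapshot-style step: reset all labels and activate every vertex, like a full recomputation. */
    void recompute() {
        label.replaceAll((vertex, old) -> vertex);
        propagate(new ArrayDeque<>(label.keySet()));
    }

    Map<Long, Long> labels() {
        return Collections.unmodifiableMap(label);
    }

    public static void main(String[] args) {
        IncrementalWcc wcc = new IncrementalWcc();
        wcc.applyWindow(List.of(new long[] {1, 2}, new long[] {3, 4}));
        wcc.applyWindow(List.of(new long[] {2, 3}, new long[] {5, 6}));
        System.out.println("incremental: " + wcc.labels() + ", activations=" + wcc.lastActivations);
        wcc.recompute();
        System.out.println("full:        " + wcc.labels() + ", activations=" + wcc.lastActivations);
    }
}
```

Both paths converge to the same component ids; comparing `lastActivations` after `applyWindow` and after `recompute` on a larger graph gives a rough feel for the redundant work incremental computation avoids. Edge deletions would need extra handling that this sketch leaves out.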
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png) -## 6. GeaFlow 性能测试 +## 6. GeaFlow Performance Testing -为了验证 GeaFlow 的增量图计算性能,我们设计了这样的实验。一批数据按照固定时间窗口实时输入到计算引擎中,我们分别用 Spark 和 GeaFlow 对全图做联通分量算法计算,比较两者计算耗时。实验在 3 台 24 核内存 128G 的机器上开展,使用的数据集是公开数据集[soc-Livejournal](https://snap.stanford.edu/data/soc-LiveJournal1.html),测试的图算法是弱联通分量算法。我们以 50w 条数据作为一个计算窗口,每输入到引擎中 50w 条数据,就触发一次图计算。 +To verify GeaFlow’s incremental graph computing performance, we designed the following experiment. Data is fed into the computing engine in real time in fixed windows, and we run the connected components algorithm on the full graph with both Spark and GeaFlow and compare their computation times. The experiment was conducted on 3 machines, each with 24 cores and 128 GB of memory. The dataset is the public [soc-Livejournal](https://snap.stanford.edu/data/soc-LiveJournal1.html) dataset, and the graph algorithm tested is Weakly Connected Components (WCC). We use 500,000 records as one computation window: every time 500,000 records enter the engine, a graph computation is triggered. -Spark 作为批处理引擎,对于每一批窗口来的数据,不管窗口规模是大是小,都需要对增量图数据连同历史图数据进行全量计算。在 Spark 上,可以直接调用 Spark GraphX 内部内置的 WCC 算法进行计算。 +As a batch processing engine, Spark must recompute the incremental graph data together with all historical graph data for every window, regardless of the window size. On Spark, we can directly call the WCC algorithm built into Spark GraphX. ```scala object SparkTest { @@ -147,7 +147,7 @@ object SparkTest { } ``` -GeaFlow 上支持 SQL+ISO/GQL 的图查询语言,我们使用图查询语言调用 GeaFlow 内置的增量联通分量图算法进行测试,图查询语言代码如下: +GeaFlow supports the SQL+ISO/GQL graph query language, which we used to invoke GeaFlow’s built-in incremental WCC algorithm for testing.
The graph query language code is as follows: ```sql CREATE TABLE IF NOT EXISTS tables ( @@ -202,29 +202,22 @@ RETURN vid, component ; ``` -下图是对两者进行联通分量算法实验时得到的实验结果。以 50w 条数据为一个窗口进行迭代计算,Spark 中存在大量的重复计算,因为其还要回溯全量的历史数据进行计算。而 GeaFlow 只会激活当前窗口中涉及到的点边进行增量计算,计算可在秒级别完成,每个窗口的计算时间基本稳定。随着数据量的不断增大,Spark 进行计算时所需要回溯的历史数据就越多,在其机器容量没有达到上限的情况下,其计算时延和数据量呈正相关分布。相同情况下 GeaFlow 的计算时间也会略微增大,但基本可以在秒级别完成。 +The figure below shows the results of the WCC experiment on both engines. With 500,000 records per window, Spark performs a large amount of redundant computation because it must also reprocess all historical data. GeaFlow, in contrast, activates only the vertices and edges involved in the current window and computes incrementally, so each window completes within seconds and the per-window computation time stays roughly stable. As the data volume grows, Spark must reprocess more and more historical data, so as long as the machines have not reached their capacity limit, its computation latency rises with the data volume. Under the same conditions, GeaFlow's computation time also increases slightly but still stays within seconds. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740537488877-eb89b886-7c4c-4c5a-8e27-06356b15afa0.png) -## 7. 总结 +## 7. Summary -传统的图计算方案(如 Spark GraphX)在近实时场景中存在重复计算问题,受 Flink 流处理模型和传统图计算的启发,我们给出了一套能够支持增量图计算的方案。总的来说 GeaFlow 主要有以下几个方面的优势: +Traditional graph computing solutions (e.g., Spark GraphX) suffer from redundant computation in near-real-time scenarios. Inspired by Flink's stream processing model and by traditional graph computing, we propose a solution that supports incremental graph computing. Overall, GeaFlow offers the following advantages: -1. GeaFlow 在处理增量实时计算时,性能优于 Spark Streaming + GraphX 方案,尤其是在大规模数据集上。 -2. GeaFlow 通过增量计算避免了全量数据的重复处理,计算效率更高,计算时间更短性能不明显下降。 -3. GeaFlow 支持 SQL+GQL 混合处理语言,更适合开发复杂的图数据处理任务。 +1. GeaFlow outperforms the Spark Streaming + GraphX solution for incremental real-time computation, especially on large-scale datasets. +2.
By computing incrementally, GeaFlow avoids reprocessing the full dataset, which makes it more efficient: computation time is shorter and performance does not degrade noticeably. +3. GeaFlow supports a hybrid SQL+GQL language, which makes it better suited to developing complex graph data processing tasks. -GeaFlow 项目代码已全部开源,我们完成了部分流图引擎基础能力的构建,未来希望基于 GeaFlow 构建面向图数据的统一湖仓处理引擎,以解决多样化的大数据关联性分析诉求。同时我们也在积极筹备加入 Apache 基金会,丰富大数据开源生态,因此非常欢迎对图技术有浓厚兴趣同学加入社区共建。 +The GeaFlow project is fully open source. We have built part of the foundational capabilities of the stream graph engine, and in the future we hope to build on GeaFlow a unified lakehouse processing engine for graph data that meets diverse big data correlation analysis needs. We are also actively preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem, and we warmly welcome anyone interested in graph technology to join and contribute to the community. -社区中有诸多有趣的工作尚待完成,你可以从如下简单的「Good First Issue」开始,期待你加入同行。 +## References -- 支持 Paimon Connector 插件,连接数据湖生态。([Issue 361](https://github.com/TuGraph-family/tugraph-analytics/issues/361)) -- 优化 GQL match 语句性能。([Issue 363](https://github.com/TuGraph-family/tugraph-analytics/issues/363)) -- 新增 ISO/GQL 语法,支持 same 谓词。([Issue 368](https://github.com/TuGraph-family/tugraph-analytics/issues/368)) -- ... - -## 参考链接 - -1. GeaFlow 项目地址:[https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics) -2. soc-Livejournal 数据集地址:[https://snap.stanford.edu/data/soc-LiveJournal1.html](https://snap.stanford.edu/data/soc-LiveJournal1.html) +1. GeaFlow Project: [https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics) +2. soc-Livejournal Dataset: [https://snap.stanford.edu/data/soc-LiveJournal1.html](https://snap.stanford.edu/data/soc-LiveJournal1.html) 3.
GeaFlow Issues:[https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues) diff --git a/blog/28.md b/blog/28.md index 47d1d328a..90d534682 100644 --- a/blog/28.md +++ b/blog/28.md @@ -1,49 +1,49 @@ --- -title: 流图计算之增量match原理与应用 +title: Principles and Applications of Incremental Match in Streaming Graph Computing date: 2025-6-3 --- ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/23857192/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png) -## 问题背景 - -在流式计算中,数据往往不是全部一批到来,而会源源不断地进行输入和计算,在图计算/图查询领域,也存在类似的场景,图的点边不断地从数据源读取,进行构图,从而形成增量图。在增量图查询中,图随时发生着变化,在不同的图版本中,进行图查询的结果也会有所不同。对于某一次新增的点边,构成了一个新的版本的图,如果重新对全图(即当前所有点边)进行图遍历,开销较大,并且也会和历史数据有重复。由于历史的数据已经计算过一遍,理想情况下,只需要对增量所影响的部分进行计算/查询,而不需要对全图重新进行查询。 +## Problem Background +In streaming computing, data rarely arrives all at once but is continuously input and processed. Similarly, in graph computing/graph querying scenarios, vertices and edges are constantly read from data sources to construct graphs incrementally. In incremental graph queries, the graph evolves continuously, leading to different query results across graph versions. When new vertices/edges form an updated graph version, recomputing through the entire graph incurs high overhead and duplicates historical computations. Since historical data has already been processed, ideally only the delta-affected portions should be computed/queried without full-graph re-execution. -GQL(Graph Query Language)是国际标准化组织(ISO)为标准化图查询语言所制定的一个标准,用于在图上执行查询的语言。Geaflow 是蚂蚁图计算团队开源的流图计算引擎,专注于处理动态变化的图数据,支持大规模、高并发的实时图计算场景。本文将介绍在 Geaflow 引擎中,对增量图使用 GQL 进行增量 Match 的方法,目的尽可能地只对增量的数据进行查询,避免冗余的全量计算。 +GQL (Graph Query Language) is an international standard developed by ISO for graph query languages, used to execute queries on graphs. 
Geaflow is an open-source streaming graph engine by Ant Group’s graph computing team, specializing in dynamically changing graph data and supporting large-scale, high-concurrency real-time graph computing scenarios. This article introduces Geaflow’s approach to incremental GQL-based Match queries on dynamic graphs, aiming to execute queries solely on delta data while avoiding redundant full computations. ![画板](https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/23857192/1741574572676-ff7e2c56-14d0-470c-b21d-604f928c6ec9.jpeg) -## 当前问题 - -Geaflow 引擎基于点中心框架(vertex center),通过迭代的方式,每一轮迭代中,每个点向其他点发送消息,并在下一轮收到消息时进行处理、分析。在 Geaflow 的框架中,GQL 的查询需要从前往后进行 Traversal 遍历走图,即从起始节点开始出发,进行扩散,依次进行点边匹配,直到匹配到所需要的查询 pattern。在动态图里场景,如果只使用当前批次新增的点边触发计算,增量的结果会有缺失,例如下面例子所示。 - -
-画板
- -如上问题关键在于如果只考虑增量的部分,则点 A1 无法触发计算,但是点 A1 实际包含于增量结果中。所以需要设法让点 A1 参与计算,我们考虑一种从新增点扩充子图的方法,将 a 触发。将整个查询分为 2 个阶段,Evolve 扩展阶段和 Traversal 阶段。在 Evolve 阶段中,从起始点开始,向邻居发送 EvolveMessage,后续的 iteration 中,收到 EvolveMessage 的点加入到 EvolveVertices 集合中。而后的 Traversal 阶段则会使用 EvolveVertices 里的点触发遍历,即表示当前窗口的触发点。 +## Current Challenges +The Geaflow engine adopts a vertex-centric framework, where each vertex sends messages iteratively. Vertices process received messages in subsequent iterations. For GQL queries, traversal starts from initial vertices for pattern matching (e.g., from node `A` to `B` to `C`). In dynamic graphs, if only newly added vertices/edges trigger computation, results may be incomplete, as illustrated below: -## 方案步骤 +
+画板 +
-整体流程示例图如下: +The key issue is **Vertex A1 cannot trigger computation if only the delta is considered**, even though it belongs to the incremental results. To let A1 participate, we expand a subgraph outward from the newly added vertices so that A1 is triggered. The query is divided into two phases: +1. **Evolve Phase**: Starting from the new vertices, send `EvolveMessage` to neighbors; in later iterations, vertices that receive an `EvolveMessage` are added to the `EvolveVertices` set. +2. **Traversal Phase**: Start traversal from the vertices in `EvolveVertices`, which serve as the trigger vertices for the current window. +## Solution Workflow +Overall process: ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/23857192/1741599519420-37fd1d9f-6623-44b3-87e4-5ac5275b876f.png) -1. 首先得到 query 的计划的迭代次数 N,需向外扩充 N-1 度(maxEvolveIteration=N-1),即可覆盖当前 query。框架的最大迭代数将设置为 N + maxEvolveIteration(N>2) +**Steps:** +1. Determine the query plan's iteration count `N`. Expanding `N-1` hops outward (`maxEvolveIteration = N-1`) is enough to cover the query. The framework's maximum iteration count is then set to `N + maxEvolveIteration` (for `N > 2`). ```sql -例如 -match(a)迭代数为1,此时不需要Evolve逻辑 -match(a)-[e]->(b)迭代数为2,此时不需要Evolve逻辑 -match(a)-[e]->(b)->[e2]->(c)迭代数为3 最大迭代数5 +For example +match(a): iteration count 1, no Evolve logic needed +match(a)-[e]->(b): iteration count 2, no Evolve logic needed +match(a)-[e]->(b)-[e2]->(c): iteration count 3, maximum iteration count 5 ``` -2. 由于当迭代数较大时,扩充子图可能可能扩充到全图,设置一个阈值 T, 当 N<=T 才执行这个增量逻辑。 -3. 在每个 window 数据加入图中后,对于新增的点边,每个点会向邻居发送 EvolveVertexMessage,执行 N-1 次迭代,将 N-1 度子图扩充进来。即当前迭代小于 maxEvolveIteration(N-1)时,发送 EvolveVertexMessage。 -4. 每个点在向邻居点发送 EvolveMessage 时,需要将自己的 id 放在消息中,收到消息的点记录其发送点的 id, 添加到 targetIdList,在后续 traversal 阶段中使用。此步骤作用是下游节点将增量信息反向传递给上游,上游点在进行遍历时可以得知下游的增量影响部分,从而只遍历这些含有动态信息的下游点,而不需要再遍历所有邻居点。 +2. Because the expanded subgraph could grow to the whole graph when the iteration count is large, set a threshold `T` and run the incremental logic only when `N <= T`. +3. After each window's data is added to the graph, every vertex of the new vertices and edges sends `EvolveVertexMessage` to its neighbors for `N-1` iterations (that is, while the current iteration is below `maxEvolveIteration`), pulling the `N-1`-hop subgraph into the computation. +4. When sending `EvolveMessage`, each vertex includes its own ID.
Receiving vertices add the sender's ID to their `targetIdList` for use in the later traversal phase. This passes delta information from downstream vertices back upstream, so that during traversal an upstream vertex knows which downstream neighbors are affected by the delta and visits only those instead of all of its neighbors. -反向扩展的主要逻辑在 GeaFlowDynamicVCTraversalFunction 中,GeaFlowDynamicVCTraversalFunction 继承自 IncVertexCentricFunction,在 Geaflow 中 IncVertexCentricFunction 是一个表示增量 VC 方法(点中心)的接口,在每次迭代中,都会对当前收到消息的点进行触发,执行 compute 方法中的逻辑。 +The reverse-expansion logic lives in `GeaFlowDynamicVCTraversalFunction`, which extends `IncVertexCentricFunction`, Geaflow's interface for incremental vertex-centric (VC) functions; in every iteration, `compute` is invoked on each vertex that has received messages: ```java @Override @@ -72,20 +72,17 @@ public void compute(Object vertexId, Iterator messageIterator) { } ``` -具体示例如下: - +**Example:** ![画板](https://intranetproxy.alipay.com/skylark/lark/0/2024/jpeg/23857192/1734590557540-5f3f4528-fa07-4208-8425-bc514ea5e06b.jpeg) -总结进行 Evolve 扩展的条件: - -1. query 的迭代次数>2:当 match 小于两跳时不需要 Evolve。 -2. query 的迭代次数<=Threshold:如果迭代数太多可能扩展到全图。 -3. windowId>1:第一次构图不需要进行 Evolve 阶段。 -4. GQL 语句中没有起始点:如果有起始点,则只需使用起始点计算,不需要扩展子图,例如查询语句 Match(a:person where a.id = 1))return a.name。 - -## Demo 示例 +**Conditions for the Evolve phase:** +- Query iterations `> 2` (patterns shorter than two hops need no Evolve). +- Query iterations `≤ Threshold` (too many iterations could expand to the whole graph). +- `windowId > 1` (the first window builds the initial graph and needs no Evolve). +- The GQL statement has no fixed start vertex (with one, e.g. `Match(a:person where a.id=1) return a.name`, computation starts only from that vertex and no subgraph expansion is needed). -在 Geaflow 中,通过设置点表或边表的 windowSize 来默认实现增量逻辑,即每一批读入 windowSize 大小的点边数据,来构建增量图。 +## Demo +In Geaflow, incremental logic is enabled by default by setting `windowSize` on the vertex or edge tables: each batch reads `windowSize` vertices or edges to build the incremental graph: ```sql CREATE GRAPH modern ( @@ -159,8 +156,8 @@ INSERT INTO tbl_result ; ``` -在 Demo 中,设置点 windowSize 为 20,边 windowSize 为 3,即构图时每个 window 导入 20 个点,3 条边。并执行 3 跳的查询语句。**示例 Demo 在 IncrMatchTest.java 中, 可直接运行执行 Demo。** +In this demo, the vertex `windowSize` is 20 and the edge `windowSize` is 3, so each window loads 20 vertices and 3 edges, and a 3-hop query is executed. **The demo is in `IncrMatchTest.java` and can be run directly.**
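The Evolve and Traversal phases can be illustrated outside the engine. The sketch below is not `GeaFlowDynamicVCTraversalFunction`; it collapses distributed message passing into a reverse breadth-first search and assumes a directed graph, a fixed path pattern of `pathLength` edges (so `N = pathLength + 1` and `maxEvolveIteration = N - 1`), and insert-only windows. The class and method names are hypothetical. Evolve walks backward from the endpoints of the window's new edges, and each receiving vertex records the sender in `targets`, playing the role of `targetIdList`. Traversal then starts only from the evolve vertices, follows only recorded targets until a new edge has been used, and keeps only matches that contain at least one new edge. For simplicity it also runs Evolve on the first window, which the conditions above would skip.

```java
import java.util.*;

/** Hypothetical sketch of the Evolve + Traversal idea for a fixed directed path pattern. */
class IncrementalPathMatch {
    record Edge(long src, long dst) {}

    private final Map<Long, Set<Long>> out = new HashMap<>(); // forward adjacency
    private final Map<Long, Set<Long>> in = new HashMap<>();  // reverse adjacency, used by Evolve

    void addEdge(Edge e) {
        out.computeIfAbsent(e.src(), k -> new HashSet<>()).add(e.dst());
        in.computeIfAbsent(e.dst(), k -> new HashSet<>()).add(e.src());
    }

    /** Adds one window and returns matches of a pathLength-edge path that use a new edge. */
    List<List<Long>> applyWindow(List<Edge> window, int pathLength) {
        window.forEach(this::addEdge);
        Set<Edge> delta = new HashSet<>(window);
        int maxEvolveIteration = pathLength; // N - 1

        // Evolve phase: walk backward from new-edge endpoints; receivers record the sender.
        Map<Long, Set<Long>> targets = new HashMap<>();
        Set<Long> evolveVertices = new HashSet<>();
        List<Long> frontier = new ArrayList<>();
        for (Edge e : window) {
            for (long v : new long[] {e.src(), e.dst()}) {
                if (evolveVertices.add(v)) {
                    frontier.add(v);
                }
            }
        }
        for (int iter = 0; iter < maxEvolveIteration && !frontier.isEmpty(); iter++) {
            List<Long> next = new ArrayList<>();
            for (long v : frontier) {
                for (long pred : in.getOrDefault(v, Set.of())) {
                    targets.computeIfAbsent(pred, k -> new HashSet<>()).add(v);
                    if (evolveVertices.add(pred)) {
                        next.add(pred);
                    }
                }
            }
            frontier = next;
        }

        // Traversal phase: only evolve vertices act as start vertices for this window.
        List<List<Long>> matches = new ArrayList<>();
        for (long start : evolveVertices) {
            extend(new ArrayList<>(List.of(start)), false, pathLength, delta, targets, matches);
        }
        return matches;
    }

    private void extend(List<Long> path, boolean usedDelta, int pathLength, Set<Edge> delta,
                        Map<Long, Set<Long>> targets, List<List<Long>> matches) {
        if (path.size() == pathLength + 1) {
            if (usedDelta) {
                matches.add(List.copyOf(path)); // keep only results touched by the new window
            }
            return;
        }
        long last = path.get(path.size() - 1);
        // Before a new edge is on the path, follow only neighbors Evolve marked as leading
        // toward the delta; afterwards any outgoing edge may complete the match.
        Set<Long> candidates = usedDelta
                ? out.getOrDefault(last, Set.of())
                : targets.getOrDefault(last, Set.of());
        for (long next : candidates) {
            path.add(next);
            extend(path, usedDelta || delta.contains(new Edge(last, next)),
                    pathLength, delta, targets, matches);
            path.remove(path.size() - 1);
        }
    }

    public static void main(String[] args) {
        IncrementalPathMatch m = new IncrementalPathMatch();
        m.applyWindow(List.of(new Edge(1, 2), new Edge(3, 4)), 2);
        System.out.println(m.applyWindow(List.of(new Edge(2, 3)), 2));
    }
}
```

With 1→2 and 3→4 already in the graph, adding 2→3 as a new window makes vertex 1 an evolve vertex even though it is not an endpoint of the new edge, which is the A1 situation described above.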
-## 总结和展望 +## Conclusion and Outlook -在动态图/流图的场景中,图的点边是在实时变化的,在进行图查询时,对于不同窗口数据的图,我们往往可以根据一些历史信息,只对增量的部分触发计算,来进行增量地计算,避免触发全图的遍历。Geaflow 使用了一种基于子图扩展的增量 match 方法,应用于点中心分布式图计算框架,在动态图场景下进行增量的查询,未来期望实现更多更复杂场景下的增量匹配逻辑。 +In dynamic/streaming graph scenarios, graph nodes and edges change in real time. When querying such graphs, we can often trigger computation only on the incremental part using historical information, avoiding full graph traversal. Geaflow uses a subgraph expansion-based incremental match method, applied within a vertex-centric distributed graph computing framework, to support incremental querying in dynamic graph scenarios. In the future, we aim to implement more complex incremental matching logic for advanced use cases. \ No newline at end of file diff --git a/blog/29.md b/blog/29.md index c311f5a62..b537fceaa 100644 --- a/blog/29.md +++ b/blog/29.md @@ -1,488 +1,211 @@ --- -title: GeaFlow 时序能力探秘——让时间数据焕发新生! +title: "Exploring GeaFlow's Temporal Capabilities — Breathing New Life into Time-Series Data!" date: 2025-6-25 --- -## 为什么时序能力如此重要? +## Why Are Temporal Capabilities So Crucial? -** -**    在当今数字化时代,数据已经成为驱动决策和创新的核心资源。然而,数据不仅仅是静态的数字或关系,它会随着时间不断变化。无论是股票市场的实时波动、社交网络中的动态互动,还是物联网设备的状态更新,时间维度都是理解这些数据的关键,例如: +In today's digital era, data has become a core resource driving decisions and innovation. However, data is not just static numbers or relationships—it constantly evolves over time. Whether tracking real-time fluctuations in stock markets, dynamic interactions in social networks, or status updates from IoT devices, the temporal dimension is key to understanding this data. For example: -- 在金融领域,交易的时间顺序决定了资金流动的方向。 -- 在社交网络中,用户的互动行为(如点赞、评论)随时间演变。 -- 在物联网中,传感器采集的数据带有时间戳,反映了设备状态的变化。 +- In finance, the sequence of transactions determines the direction of capital flow. +- In social networks, user interactions (likes, comments) evolve over time. +- In IoT, timestamped sensor data reflects changes in device status. 
- + -    尽管数据的重要性毋庸置疑,但传统的图数据分析工具往往难以应对动态数据的挑战 +    Despite data's undeniable importance, traditional graph analytics tools often struggle with dynamic data challenges: -- **静态分析的局限性** +- **Limitations of Static Analysis** +     Static analysis captures only a snapshot of data at a single moment, failing to reflect trends. For instance, in device monitoring, it may overlook gradual transitions from normal to faulty states. -    静态分析只能捕捉某一时刻的数据快照,无法反映数据的变化趋势。例如,在监控设备状态时,静态分析可能忽略设备从正常到故障的渐变过程。**** +- **Inefficient Processing** +     Traditional tools are inefficient when handling large-scale temporal data and may not meet real-time requirements. In financial risk control, delays can mean missing critical signals. -- **处理效率低下** +- **Lack of Flexibility** +     Many tools support only one type of analysis and cannot concurrently process real-time streams and historical data. -    传统工具在处理大规模时序数据时效率低下,甚至无法满足实时需求。例如,在金融风控场景中,延迟可能导致错过关键的风险信号。 +    To address these issues, GeaFlow innovatively introduces temporal graph computing. As a distributed stream-graph engine designed for dynamic data, GeaFlow efficiently tackles challenges posed by evolving datasets. For dynamically changing graph structures, users can seamlessly perform operations like graph traversal, pattern matching, and computations—meeting complex analytical needs. By integrating temporal dimensions with dynamic graph processing, GeaFlow offers a groundbreaking solution for real-time analytics, empowering users to extract deeper value from dynamic data. -- **缺乏灵活性** +## What Is GeaFlow? -    很多工具只支持单一类型的数据分析,无法同时处理实时流数据和历史数据。 +GeaFlow is a powerful distributed computing platform that combines graph computing and stream processing to handle dynamic graphs and temporal data efficiently. It supports complex graph algorithms and real-time analytics, making it ideal for dynamic scenarios. 
Key features include: -    为了解决上述问题,GeaFlow 创新性地提出了时序图计算的概念。作为一款专为动态图数据处理设计的分布式流图计算引擎,GeaFlow 能够高效应对动态数据带来的挑战。针对实时变化的图结构,用户可以轻松进行图遍历、图匹配和图计算等操作,从而满足复杂场景下的分析需求。通过结合时间维度与动态图处理能力,GeaFlow 为实时数据分析提供了全新的解决方案,帮助用户更精准地挖掘动态数据中的价值。 +- **Distributed Architecture** + GeaFlow's framework processes ultra-large dynamic graphs (e.g., billions of nodes/edges) with high availability and scalability via partitioning and replication. -## 什么是 GeaFlow? +- **Seamless Integration of Stream Graphs and Temporal Graphs** + Stream graphs enable real-time updates for dynamic data, while temporal graphs add precise timestamping. Their synergy supports simultaneous real-time analysis and historical tracing. -GeaFlow 是一个强大的分布式计算平台,结合了图计算和流处理的优势,能够高效处理动态图和时序数据。它不仅支持复杂的图算法,还具备实时分析能力,适用于各种动态场景。其主要特点包括: +- **Flexible Time-Window Mechanism** + Users can configure sliding or tumbling windows to analyze data trends over specific time ranges. -- 分布式架构 +## How Do Stream Graphs and Temporal Graphs Relate? -GeaFlow 基于分布式计算框架,能够高效处理超大规模的动态图数据(例如数十亿节点和边)。通过分区和副本机制,GeaFlow 确保了系统的高可用性和可扩展性。 +### **1. Stream Graph** +A specialized graph structure representing evolving data dynamics. Core features: +- **Dynamic Update Mechanism** + Supports real-time CRUD operations on nodes/edges. E.g., new edges form for fund flows in financial networks, while obsolete edges disappear. +- **Event-Driven Model** + Treats each data unit (node/edge) as an event to efficiently capture changes. +- **Incremental Computation** + Computes only new/modified parts instead of reprocessing entire graphs. E.g., updating friend relationships in social networks without recalculating the entire graph. -- 流图与时序图的无缝集成 +### **2. Temporal Graph** +A timestamp-augmented graph where each edge/node records event timing. Core features: +- **Timestamp Management** + Assigns timestamps to all data. E.g., friendship formation times in social networks. +- **Time-Window Analysis** + Supports sliding windows (e.g., last 5 minutes) to track trends. 
+- **Historical Traceability** + Retains historical timestamps for retrospective analysis. E.g., auditing past anomalous transactions in risk control. -流图提供了动态数据的实时更新能力,而时序图则引入了时间维度的精确记录能力。两者的结合使得 GeaFlow 能够同时支持实时分析和历史追溯。 +### **3. Synergy Between Stream and Temporal Graphs** +They complement each other: +- **Stream Graphs as the Foundation** + Stream graphs handle real-time updates; temporal graphs add time-based recording. +- **Temporal Graphs Enhance Stream Analysis** + Timestamps enable complex operations like trend prediction and window-based analytics. -- 灵活的时间窗口机制 +### **4. GeaFlow’s Implementation** +GeaFlow unifies stream and temporal graphs through: +- **Timestamp Assignment** + Assigns *processing time* or *event time* to all data. +- **Dynamic Updates & Historical Retention** + Updates graphs in real-time while preserving historical timestamps in distributed storage. +- **Time-Window Optimization** + Uses indexing and caching (e.g., sliding-window indexes) to accelerate time-range queries. -GeaFlow 支持基于时间窗口的动态分析,用户可以根据需求设置滑动窗口或固定窗口,分析特定时间段内的数据变化趋势。 +## Use Case: Tracking Indirect Social Relationships -## 流图与时序图的关系? +As social platforms grow, analyzing dynamic user interactions in real-time becomes critical for recommendations and risk detection (e.g., fake accounts). -### **1. 流图(Stream Graph)** - -流图是一种特殊的图结构,用于表示动态数据的演化过程。其核心特性包括: - -- **动态更新机制** - -流图支持节点和边的动态增删改操作,能够实时反映数据的变化。例如,在金融交易网络中,资金流动会生成新的边,而交易完成后某些边可能会消失。**** - -- **事件驱动模型** - -流图采用事件驱动模型,每条数据(节点或边)都被视为一个事件。通过事件驱动的方式,流图能够高效捕捉数据的变化。 - -- **增量计算** - -为了提高计算效率,流图采用了增量计算策略。即每次只计算新增或修改的部分,而不是重新计算整个图结构。例如,在社交网络中,当用户建立新的好友关系时,GeaFlow 只需更新相关部分,而无需重新计算整个网络。 - -### **2. 
时序图(Temporal Graph)** - -时序图是一种带时间属性的图结构,每条边或节点都带有时间戳,用于记录事件发生的时间。其核心特性包括: - -- **时间戳管理** - -每条数据(节点或边)都分配一个时间戳,确保所有操作都能精确记录时间信息。例如,在社交网络中,好友关系的建立时间可以用一条带时间戳的边表示。**** - -- **时间窗口分析** - -时序图支持基于时间窗口的分析功能。例如,用户可以设置一个滑动窗口(如最近 5 分钟),并分析窗口内的数据变化趋势。**** - -- **历史追溯能力** - -时序图保留了历史数据的时间戳信息,支持回溯历史数据。例如,在金融风控场景中,用户可以通过时序图分析过去一段时间内的异常交易行为。 - -### **3. 流图与时序图的关系** - -流图和时序图并不是相互独立的概念,而是相辅相成的: - -- **流图是时序图的基础** - -流图提供了动态数据的实时更新能力,而时序图则在此基础上增加了时间维度的记录能力。换句话说,流图关注的是数据的实时变化,而时序图关注的是这些变化的时间属性。 - -- **时序图增强了流图的分析能力** - -通过引入时间戳,时序图使得流图能够进行更复杂的分析,例如时间窗口分析、趋势预测等。 - -### **4. GeaFlow 的实现细节** - -GeaFlow 通过以下技术手段实现了流图与时序图的无缝结合: - -- **时间戳分配机制** - -GeaFlow 为每条数据(节点或边)分配具体时间戳, 具体分为两种:处理时间和事件时间,确保所有数据都能精确记录时间信息。**** - -- **动态更新与历史保留** - -GeaFlow 支持实时更新流图结构,同时保留历史数据的时间戳信息,方便后续分析。例如,在金融交易网络中,GeaFlow 会记录每笔交易的时间戳,并将其存储在分布式存储系统中。**** - -- **时间窗口优化** - -GeaFlow 采用高效的索引机制和缓存策略,优化时间窗口分析的性能。例如,通过滑动窗口索引,GeaFlow 能够快速定位特定时间段内的数据。 - -## 示例 - -随着社交媒体平台的快速发展,用户之间的互动和关系链变得越来越复杂。为了更好地理解用户行为、优化推荐系统以及识别潜在的风险(如虚假账号或恶意传播),我们需要对用户之间的动态关系进行实时分析。 - -假设某社交平台希望实现一个功能:实时追踪用户的“间接好友关系”,即分析用户 A 是否通过某个共同好友 B 认识了另一个用户 C,并确保这种认识关系的时间顺序是合理的(A 先认识 B,B 再认识 C)。这一功能可以帮助平台发现潜在的社交圈层,优化好友推荐算法,同时为风险控制提供数据支持。 +**Scenario**: A platform tracks "indirect friendships"—e.g., whether user A met user C via user B, with strict time validation (A→B before B→C). This optimizes recommendations and identifies risks. 
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/220029/1749448299226-d23a5d01-5a5c-4cbb-bd99-f1e476f808be.png) - -具体需求 - -**1、实时性要求** - -用户的行为(如添加好友)是动态变化的,需要实时捕获并更新用户关系图。 - -**2、时间敏感性** - -好友关系的建立是有时间顺序的,例如用户 A 在 10:00 添加了用户 B 为好友,而用户 B 在 10:05 添加了用户 C 为好友。只有在这种情况下,我们才能认为 A 通过 B 间接认识了 C。 - -**3、高效查询** - -平台需要快速查询出所有符合条件的三元关系(A -> B -> C),并将结果存储到文件系统中,供后续分析或可视化使用。 - -**4、扩展性** - -系统需要能够处理大规模用户数据,并支持未来的扩展需求,例如引入更多维度的关系权重(如亲密度、互动频率等)。 - -下面是完整的 DSL 示例: - -```plain -CREATETABLE vertex_source ( - id long, - name varchar, - age int -) WITH ( - type='kafka', - geaflow.dsl.kafka.servers ='localhost:9092', - geaflow.dsl.kafka.topic ='vertex_source', - geaflow.dsl.kafka.data.operation.timeout.seconds =5, - geaflow.dsl.time.window.size=10, - geaflow.dsl.start.time='${startTime}' -); - -CREATETABLE edge_source ( - src_id long, - tar_id long, - weight double, - ts long --knowing time -) WITH ( - type='kafka', - geaflow.dsl.kafka.servers ='localhost:9092', - geaflow.dsl.kafka.topic ='edge_source', - geaflow.dsl.kafka.data.operation.timeout.seconds =5, - geaflow.dsl.time.window.size=10, - geaflow.dsl.start.time='${startTime}' -); - -CREATE GRAPH community ( - Vertex person ( - id bigint ID, - name varchar, - age int - ), - Edge knows ( - src_id bigint SOURCE ID, - tar_id bigint DESTINATION ID, - weight double, - ts long TIMESTAMP--定义时间戳字段 - ) -) WITH ( - storeType='rocksdb' -); - -INSERTINTO community.person -SELECT id, name, age -FROM vertex_source; - -INSERTINTO community.knows -SELECT src_id, tar_id, weight, ts -FROM edge_source; - -CREATETABLE tbl_result ( - a_id long, - e1_ts long, - b_id long, - e2_ts long, - c_id long -) WITH ( - type='file', - geaflow.dsl.file.path='${target}' -); - -USE GRAPH community; - -INSERTINTO tbl_result -SELECT - a_id, - e1_ts, - b_id, - e2_ts, - c_id -FROM ( -MATCH (a:person)-[e1:knows]->(b:person)-[e2:knows]-> (c:person) -where e2.ts > e1.ts -RETURN a.id as a_id, e1.ts as e1_ts, b.id as b_id, e2.ts as e2_ts, c.id as c_id -); -``` 
- - - -上述 DSL(Domain-Specific Language)代码定义了一个基于图计算的流处理任务,主要目的是通过 Kafka 实时接收用户节点和关系边的数据流,构建一个动态社区图(`community`),并分析其中的时间敏感关系(如“谁先认识谁”)。最终结果将输出到文件系统中,用于进一步分析或可视化。 - -以下是对每个部分的详细解释: +**Requirements**: +1. **Real-Time Processing**: Capture friend-add events instantly. +2. **Time Sensitivity**: Validate sequence (e.g., A adds B at 10:00; B adds C at 10:05). +3. **Efficient Queries**: Rapidly identify valid triads (A→B→C) and export results. +4. **Scalability**: Handle massive user data and future expansions (e.g., adding relationship weights). - - -### **1. 点源表定义** - -```plain -CREATETABLE vertex_source ( +**DSL Implementation**: +```sql +CREATE TABLE vertex_source ( id long, name varchar, age int ) WITH ( type='kafka', - geaflow.dsl.kafka.servers ='localhost:9092', - geaflow.dsl.kafka.topic ='vertex_source', - geaflow.dsl.kafka.data.operation.timeout.seconds =5, - geaflow.dsl.time.window.size=10, - geaflow.dsl.start.time='${startTime}' + geaflow.dsl.kafka.servers='localhost:9092', + geaflow.dsl.kafka.topic='vertex_source', + geaflow.dsl.time.window.size=10 ); -``` - -- **功能** - - - 定义了一个名为vertex_source的表,表示点数据的来源。 - \_ 数据通过 Kafka 消费,主题为 vertex_source \* 每条记录包含三个字段:id(节点唯一标识符)、name(节点名称)、age(节点年龄)。**** - -- **时间窗口:** - - 使用了滑动窗口机制,窗口大小为 10 秒(geaflow.dsl.time.window.size=10)。 - - 数据流按时间窗口分批处理,窗口内的数据会被用于后续的图构建和计算。**** -- **启动时间:** \* ${startTime}是一个占位符,表示流处理任务的起始时间。 - -### **2. 
边源表定义** - -```plain CREATE TABLE edge_source ( src_id long, tar_id long, weight double, - ts long + ts long -- Timestamp of relationship ) WITH ( type='kafka', - geaflow.dsl.kafka.servers = 'localhost:9092', - geaflow.dsl.kafka.topic = 'edge_source', - geaflow.dsl.kafka.data.operation.timeout.seconds = 5, - geaflow.dsl.time.window.size=10, -- 滑动窗口大小 - geaflow.dsl.start.time='${startTime}' + geaflow.dsl.kafka.servers='localhost:9092', + geaflow.dsl.kafka.topic='edge_source', + geaflow.dsl.time.window.size=10 ); -``` - -- **功能:** - - 定义了一个名为 edge_source的表,表示边数据的来源。 - - 数据通过 Kafka 消费,主题为 edge_source - - 每条记录包含四个字段: - src_idtar_id:分别表示边的起点和终点;weight:边的权重;ts:边的时间戳,表示关系建立的时间。 -- **时间窗口:** - - 同样使用 10 秒的滑动窗口机制。 - -### **3. 图 Schema 定义** -```plain CREATE GRAPH community ( - Vertex person ( - id bigint ID, - name varchar, - age int - ), + Vertex person (id bigint ID, name varchar, age int), Edge knows ( - src_id bigint SOURCE ID, - tar_id bigint DESTINATION ID, - weight double, - ts long TIMESTAMP-- 定义时间戳字段 + src_id bigint SOURCE ID, + tar_id bigint DESTINATION ID, + weight double, + ts long TIMESTAMP -- Timestamp field ) -) WITH ( - storeType='rocksdb' -); -``` - -- **功能:** - - 定义了一个名为community的图结构。 - - 图包含两种元素: - 1. **点类型 **person - - 每个点有三个属性:id(唯一标识符)、name(名称)、age(年龄)。 - 2. **边类型 **knows - - 每条边有四个属性:src_idtar_id:分别表示边的起点和终点;weight:边的权重;ts:边的时间戳,标记关系建立的时间。 -- **存储方式:** - - 图数据存储在 RocksDB 中(storeType='rocksdb')。 - -### **4. 插入点数据到图** - -```plain - -INSERTINTO community.person -SELECT id, name, age -FROM vertex_source; -``` - -- **功能:** - - vertex_source表中的点数据插入到图 communityperson点集合中。 - - 每条记录对应一个person节点。 +) WITH (storeType='rocksdb'); -### **5. 
插入边数据到图** +INSERT INTO community.person +SELECT id, name, age FROM vertex_source; -```plain - -INSERTINTO community.knows -SELECT src_id, tar_id, weight, ts -FROM edge_source; -``` +INSERT INTO community.knows +SELECT src_id, tar_id, weight, ts FROM edge_source; -- **功能:** - - edge_source表中的边数据插入到图 communityknows边集合中。 - - 每条记录对应一条 knows边。 - -### **6. 结果表定义** - -```plain CREATE TABLE tbl_result ( - a_id long, - e1_ts long, - b_id long, - e2_ts long, - c_id long -) WITH ( - type='file', - geaflow.dsl.file.path='${target}' -); -``` - -- **功能:** - - 定义了一个名为 tbl_result 的结果表,用于存储最终的查询结果。 - - 结果表包含五个字段:a_id:路径起点节点的 ID;e1_ts:第一条边的时间戳;b_id:路径中间节点的 ID;e2_ts:第二条边的时间戳;c_id:路径终点节点的 ID. - - **存储方式:** - - 结果会写入文件系统,路径由 ${target} 指定。 + a_id long, + e1_ts long, + b_id long, + e2_ts long, + c_id long +) WITH (type='file', geaflow.dsl.file.path='${target}'); -### **7. 图查询与结果插入** - -```plain USE GRAPH community; INSERT INTO tbl_result -SELECT - a_id, - e1_ts, - b_id, - e2_ts, - c_id +SELECT a_id, e1_ts, b_id, e2_ts, c_id FROM ( - MATCH (a:person) -[e1:knows]->(b:person) -[e2:knows]-> (c:person) - WHERE e2.ts > e1.ts - RETURN a.id as a_id, e1.ts as e1_ts, b.id as b_id, e2.ts as e2_ts, c.id as c_id + MATCH (a:person)-[e1:knows]->(b:person)-[e2:knows]->(c:person) + WHERE e2.ts > e1.ts + RETURN a.id AS a_id, e1.ts AS e1_ts, + b.id AS b_id, e2.ts AS e2_ts, c.id AS c_id ); ``` - - -- **功能:** - - 在图 community 上执行一个图查询。 - - 查询的目标是找到所有满足以下条件的三元组 (a, b, c) - 1. 存在一条路径 a -> b -> c,其中每条边的类型都是 knows - 2. 第二条边 e2 的时间戳晚于第一条边 e1 的时间戳(e2.ts > e1.ts)。 - - 返回的结果包括: - - 起点节点 a 的 ID。 - - 第一条边 e1 的时间戳。 + 中间节点 b 的 ID。 - - 第二条边 e2 的时间戳。 - - 终点节点 c 的 ID。 -- **结果存储:** - - 查询结果被插入到 tbl_result 表中,并最终写入文件系统。 - -### **8. 运行示例** - -假设社交平台中有以下用户和好友关系: - -- **用户信息:** - -```plain -{id: 1, name: "Alice", age: 25} -{id: 2, name: "Bob", age: 30} -{id: 3, name: "Charlie", age: 28} -``` - -- **好友关系:** +**Workflow Explanation**: +1. **Vertex Source**: Kafka-consumed user data (ID, name, age) with 10s sliding windows. +2. 
**Edge Source**: Kafka-consumed relationships (source/target IDs, weight, timestamp) in 10s windows. +3. **Graph Schema**: Defines `person` vertices and `knows` edges with timestamps. +4. **Data Insertion**: Loads vertices/edges into the `community` graph. +5. **Query**: Finds triads `A→B→C` where `B→C` occurs after `A→B`. +6. **Result Export**: Writes valid triads (A_ID, B_ID, C_ID, timestamps) to files. +**Output Example**: ```plain -{src_id: 1, tar_id: 2, weight: 0.8, ts: 1672531200} -- Alice 在 10:00 添加 Bob 为好友 -{src_id: 2, tar_id: 3, weight: 0.9, ts: 1672531210} -- Bob 在 10:05 添加 Charlie 为好友 +a_id | e1_ts | b_id | e2_ts | c_id +1 | 1672531200 | 2 | 1672531210 | 3 -- Alice (1) met Charlie (3) via Bob (2) ``` -运行上述作业后,系统会输出以下结果: - -```plain -a_id | e1_ts | b_id | e2_ts | c_id -1 | 1672531200 | 2 | 1672531210 | 3 -``` - -这表明 Alice 先通过 Bob 认识了 Charlie。 - -### **9. 业务价值** +**Business Value**: +1. **Enhanced Recommendations**: Suggest potential friends (e.g., recommend Charlie to Alice). +2. **Community Detection**: Identify tight-knit groups for targeted ads/events. +3. **Risk Control**: Flag suspicious triads (e.g., rapid fake-account connections). +4. **User Experience**: Personalize services via real-time relationship analysis. -1. **优化好友推荐** - 通过分析间接好友关系,平台可以向用户推荐更有可能成为好友的潜在对象。例如,Alice 可能会对 Charlie 感兴趣,因为他们有一个共同好友 Bob。 +**Technical Edge**: +- **Real-Time**: Millisecond processing for up-to-date graphs. +- **Time-Aware**: Timestamps enforce chronological validity. +- **Flexible**: SQL-like syntax lowers development barriers. +- **Scalable**: Handles massive dynamic graphs via incremental computation. -2. **识别社交圈层** - 通过挖掘三元关系,平台可以识别出紧密联系的社交圈层,从而为广告投放、活动推广等提供精准的目标群体。**** +## Core Highlights of GeaFlow’s Temporal Capabilities -3. **风险控制** - 如果某些用户频繁出现在异常的三元关系中(例如短时间内大量新增好友),可能暗示存在虚假账号或恶意传播行为,平台可以及时采取措施。 -4. **用户体验提升** - 实时分析用户关系链,帮助平台更好地理解用户行为,从而提供更加个性化的服务。**** +### 1. Time-Aware Data Processing +Timestamps enable precision. 
GeaFlow supports: +- **5-Minute Trend Analysis**: Track real-time interaction frequency shifts. +- **24-Hour Dynamic Patterns**: Identify long-term trends (e.g., user purchase behavior). -### **10. 技术优势** +### 2. Dynamic Graph + Temporal Fusion +Captures relationship evolution: +- **Social Dynamics**: Map changing friend networks over time. +- **Financial Flows**: Trace real-time capital movements and risks. -- **实时性**GeaFlow 支持毫秒级的数据流处理,确保用户关系图始终是最新的。 -- **时间敏感性:**通过时间戳字段,精确管理好友关系的时间顺序。 -- **灵活性:**SQL 驱动的开发模式,降低了开发门槛,提升了开发效率。 -- **可拓展性:**支持大规模动态图的增量计算,能够轻松应对社交平台的海量用户数据。 +### 3. Real-Time + Historical Data Fusion +Unifies streaming and stored data: +- **IoT Monitoring**: Predict device failures by correlating live feeds with history. +- **Risk Control**: Detect anomalies via real-time/historical transaction cross-analysis. -## GeaFlow 时序能力的核心亮点 +### 4. Rich Built-in Algorithms +Optimized temporal algorithms: +- Shortest Path +- Weakly Connected Components +- k-Hop Neighborhood -### **1. 时间感知的数据处理** +## Conclusion: Start Your Temporal Data Journey -每条数据都带有时间戳,能够精确记录事件发生的时间。GeaFlow 支持基于时间窗口的分析,例如: +Dynamic data holds immense value, and GeaFlow’s temporal capabilities unlock it. Whether you’re a novice or an expert, GeaFlow empowers you to harness time-series data. -- **最近 5 分钟的趋势变化** - 用户可以通过设置时间窗口,分析最近 5 分钟内的数据变化趋势。例如,在社交网络中,分析用户互动的频率变化。**** +**Download GeaFlow today and explore the power of temporal analytics!** -- **过去一天的动态模式** - GeaFlow 支持长时间跨度的分析,帮助用户发现长期趋势。例如,在电商推荐系统中,分析用户在过去一天内的购买行为。 - -### **2. 动态图与时序结合** - -GeaFlow 将图结构与时间维度结合,能够捕捉图中关系的演变。例如: - -- **社交网络中好友关系的变化** - -在社交网络中,用户的好友关系可能会随着时间发生变化。GeaFlow 可以动态更新图结构,捕捉这些变化。**** - **金融交易网络中的资金流动** -在金融交易网络中,资金流动是一个动态过程。GeaFlow 可以实时追踪资金流动路径,并识别潜在的风险点。 - -### **3. 实时与历史数据的无缝融合** - -GeaFlow 不仅支持实时流数据的处理,还能结合历史数据进行对比分析。这种能力特别适合需要长期趋势分析和短期实时监控的场景。例如: - -- **物联网设备监控** - -在物联网场景中,GeaFlow 可以实时监控设备状态,同时结合历史数据,预测设备可能出现的故障。**** - **金融风控** -在金融风控场景中,GeaFlow 可以实时监控交易网络,同时结合历史数据,识别异常行为或潜在风险。 - -### **4. 
丰富的内置算法** - -GeaFlow 提供针对时序数据优化的算法,例如: - -- 最短路径 -- 弱联通分量 -- k-hop 算法 - -用户无需从零开发,直接调用即可完成复杂分析。 - -## 结语:开启你的时序数据分析之旅 - -数据的动态变化蕴藏着无限价值,而 GeaFlow 的时序能力正是解锁这一价值的钥匙。无论您是数据分析新手,还是希望提升动态数据处理能力的专业人士,GeaFlow 都将为您提供强大的支持。 - -立即下载 GeaFlow,亲身体验其时序能力的强大之处吧!让我们一起探索时间数据的无限可能! - -## 术语**** +--- -**DSL: **Domain-Specific Language。融合 DSL 是 GeaFlow 提供的图表一体的数据分析语言,支持标准 SQL+ISO/GQL 进行图表分析.通过融合 DSL 可以对表数据做关系运算处理,也可以对图数据做图匹配和图算法计算,同时也支持同时图表数据的联合处理。 +## Terminology +**DSL**: Domain-Specific Language. GeaFlow’s unified DSL integrates SQL and ISO/GQL for relational and graph analysis (pattern matching, algorithms), supporting hybrid table/graph operations. \ No newline at end of file diff --git a/blog/30.md b/blog/30.md index 8c74cbba1..38df28be8 100644 --- a/blog/30.md +++ b/blog/30.md @@ -1,119 +1,123 @@ --- -title: Join性能变革:图数仓让SQL分析快人一步 +title: "Join Performance Revolution: Graph Data Warehouse Makes SQL Analysis Faster Than Ever" date: 2025-5-15 --- ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png) -> 作者:林力韬 +> Author: Lin Litao -## 一、引言:传统数仓分析的困境与破局之道 +## 1. Introduction: The Dilemma and Breakthrough in Traditional Data Warehouses -### 1. 场景化问题:当数据关联成为业务之痛 +### 1. Contextual Problem: When Data Association Becomes a Business Pain Point -- **金融反欺诈场景**:在反欺诈分析中,复杂的多层资金链条挖掘往往依赖多表 JOIN 操作,进行复杂多跳的追踪。分析师团队耗费数天编写 SQL 脚本,最终查询耗时可达小时级别——而此时资金已完成洗白转移。这揭示出传统数仓的深层矛盾:**关系型范式与真实世界网状业务逻辑的错位**,常面临查询耗时高、查询逻辑复杂等挑战。 -- **营销分析场景**:在分析营销业务关系时,试图通过用户社交关系链挖掘潜在 VIP 客户,往往要用到专业的数分技能。尽管当下借助诸如 DeepInsight AI Copilot 等工具,可以通过大模型快速生成至少能打 80 分的维度和度量,集成到自助分析面板。但通常这些分析都涉及深层次的用户关联,**在 SQL 中直观表达性能较差**。 +- **Financial Anti-Fraud Scenario**: In anti-fraud analysis, complex multi-layered fund chain mining often relies on multi-table JOIN operations for intricate multi-hop tracking. Analyst teams spend days writing SQL scripts, and the final query can take hours — by which time the funds have already been laundered. 
This reveals a deep contradiction in traditional data warehouses: **the misalignment between the relational paradigm and real-world networked business logic**, often leading to high query latency and complex logic. +- **Marketing Analysis Scenario**: In analyzing marketing business relationships, identifying potential VIP customers through social connection chains requires advanced data analysis skills. Although tools like DeepInsight AI Copilot now use large models to quickly generate dimensions and metrics of reasonable quality (scoring at least 80 out of 100) and integrate them into self-service dashboards, these analyses often involve deep user associations, **which are hard to express intuitively in SQL and perform poorly**. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/67556465/1741674750798-f519cba9-d8ae-47d4-aec0-97c2ef31a759.jpeg) -**图 1 SQL Join 与 GQL 图 hop 查询性能差异示例** +**Figure 1: Performance Difference Between SQL Join and GQL Graph Hop Queries** -### 2. 数据枷锁 +### 2. Data Constraints -**效率枷锁**:当关联层级超过 3 跳,传统 JOIN 操作的时间复杂度呈指数级增长,以多表 JOIN 为核心的分析模式逐渐失去优势,成为效率的"枷锁"。 +**Efficiency Constraint**: When association levels exceed 3 hops, the time complexity of traditional JOIN operations grows exponentially. Analytical models centered on multi-table JOINs gradually lose their advantage and become a "shackle" on efficiency. -**表达力枷锁**:传统 SQL 不仅需要编写复杂的表达式,更面临关系模型难以直观表达的图拓扑结构。 +**Expressiveness Constraint**: Traditional SQL not only requires complex expressions but also struggles with graph topologies, which the relational model cannot express intuitively. -**创新枷锁**:业务分析师因需要学习 GQL(图查询语言)而放弃采用图技术栈。工具链的割裂导致图分析能力始终停留在技术部门,难以赋能业务前线。 +**Innovation Constraint**: Business analysts often abandon graph technology stacks because they would have to learn GQL (Graph Query Language). The fragmented toolchain keeps graph analytics confined to technical departments, failing to empower front-line business teams.
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png?x-oss-process=image/format,png) -**图 2 Join 与 GQL 表达示例** +**Figure 2: JOIN vs GQL Expression Examples** -### 3. 破局之道:图数据仓库的核心价值 +### 3. Breakthrough Strategy: Core Value of the Graph Data Warehouse -#### (1) 降低认知成本 +#### (1) Lower Cognitive Costs -用户无需感知图数据库的专业知识,通过 SQL 操作就能实现复杂的图关联分析,底层嫁接到图引擎底座。 +Users need no graph database expertise: complex graph association analysis can be performed with ordinary SQL, while the underlying system is backed by a graph engine. -#### (2) 加速数据价值升维释放 +#### (2) Unlock Deeper Data Value Faster -在支持传统 SQL 分析基础上,图数据仓库通过内置的算法仓库,将 PageRank、Louvain 等图算法封装为可解释的业务指标,支持分析隐藏的复杂模式(例如资金流的闭环路径识别)。同时,关联关系能够即时以图结构可视化呈现,摆脱传统数仓中基于表关联的抽象性,扩大了系统分析能力边界。 +Building on traditional SQL analysis, the graph data warehouse encapsulates graph algorithms such as PageRank and Louvain as interpretable business metrics via a built-in algorithm repository. This enables the analysis of hidden complex patterns (e.g., identifying closed-loop fund-flow paths). Additionally, relationships can be instantly visualized as graph structures, moving away from the abstract nature of table-based associations in traditional warehouses and expanding the boundaries of what the system can analyze. -#### (3) 突破性能瓶颈 +#### (3) Break Through Performance Bottlenecks -多表 JOIN 查询转为图路径检索,利用图引擎关联性分析优势,性能可从分钟级跃升至秒级,单点分析进入毫秒级。 支持动态图数据的实时更新,与传统批量处理模式(T+1)的滞后性形成鲜明对比。 +Multi-table JOIN queries are transformed into graph path retrievals, leveraging the graph engine’s strength in relationship analysis. Query times can drop from minutes to seconds, and single-point analysis reaches the millisecond level. Real-time updates of dynamic graph data stand in stark contrast to the lag of traditional batch processing (T+1). -## 二、技术解析:图数仓的核心技术革命 +## 2. Technical Deep Dive: Core Technological Revolution of the Graph Data Warehouse -### 1. Schema 转换器(ER → Graph) +### 1.
Schema Converter (ER → Graph) -对于大多数非专业用户而言,由于图领域知识缺乏、不熟悉图建模的思维方式等原因,导致利用图计算系统解决业务问题、分析需求存在较大挑战。在业务推广中,我们发现利用将表的 ER 模型描述自动转化为图模型建模,提供给用户一个初始的图,有助于用户快速上手。 +For most non-expert users, a lack of graph domain knowledge and unfamiliarity with graph modeling make it hard to use graph computing systems for business problems and analysis needs. While rolling the product out to business teams, we found that automatically converting a table's ER model description into a graph model, and giving users an initial graph, helps them get started quickly. -图数仓 Schema 转换器自动将传统数据仓库中的 ER 模型(实体-关系模型)转换为图数据库的节点与边结构,支持对物理表、视图表、维度表进行统一建模。在原理上,图的实体可以理解为关系表选定一组列序列作为 ID 生成的 KV 表。在 ER 图解析时,具有等值关系的列可以视为同一个等价列,并将等值关系传递到不同表的等价列上。 +The Graph Data Warehouse Schema Converter automatically transforms the ER (Entity-Relationship) model of a traditional data warehouse into the vertex and edge structures of a graph database, supporting unified modeling of physical tables, views, and dimension tables. Conceptually, a graph entity can be viewed as a key-value table derived from a relational table by choosing a sequence of its columns as the ID. When parsing the ER diagram, columns linked by equality relationships are treated as a single equivalent column, and that equality is propagated to the equivalent columns in other tables. -从而,可以将模型转换算法总结为三阶段: +Thus, the model conversion algorithm can be summarized in three stages: -**第一阶段,语义分析。**重点在于选取实体多列序列作为 ID 组成,识别表的实体/关系语义,发现跨表等价列(具有等值关系的列),融合支持表达式列处理。需要在所有可能的解法中,综合考虑存储性能、计算性能、可解释性评分最好的解法,作为构图的基础。 +**Stage 1: Semantic Analysis.** This stage selects the multi-column sequences that form entity IDs, identifies the entity/relationship semantics of each table, discovers cross-table equivalent columns (columns with equality relationships), and supports expression-based columns. 
Among all candidate solutions, the one with the best combined score for storage performance, computational performance, and interpretability becomes the basis for building the graph.
-**第二阶段,结构化转换。**重点在于生成点/边实体,合并点实体,必要时通过冗余边生成平衡数据冗余与查询性能。自动创建虚拟点完成关系绑定,配置边的起始端点。 +**Stage 2: Structured Transformation.** This stage generates vertex/edge entities, merges vertex entities, and, when necessary, generates redundant edges to balance data redundancy against query performance. Virtual vertices are created automatically to bind relationships, and each edge's source and destination endpoints are configured. -**第三阶段,组装成图。**即将所有点合并在一起,绑定在起始点上的边自然合并,对端点可选地进行绑定。对两个有差异的转图方案方案,可以计算差异向量,即所有表映射到实体的变化情况。 +**Stage 3: Graph Assembly.** All vertices are merged, and edges bound to their source vertices are merged along with them; binding the destination endpoints is optional. For two differing conversion schemes, a difference vector can be computed that captures how the mapping from each table to entities changes. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683049495-d75ae87b-9510-40ee-b22f-c6140570b1f1.png) ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png) -**图 3 ER 图转图 Schema 示例组图** +**Figure 3: ER to Graph Schema Conversion Example Series** -通过算法自动分析多表之间的关联关系并自动构建图的点边,可以为数据从原始存储位置迁移至图数仓提供依据,同时显著消除人工数据建模、人工编写数据导入 DSL 的工作量,无人工介入即可使传统数仓数据快速迁移到图数据仓库中,立即开始分析。 +Automatically analyzing the relationships among tables and constructing the graph's vertices and edges provides a basis for migrating data from its original storage to the graph data warehouse. It also largely eliminates manual data modeling and hand-written data-import DSL, so traditional warehouse data can be migrated to the graph data warehouse quickly, with no manual intervention, and analyzed immediately. -### 2. 数据通道:物化数据交互能力 +### 2.
Data Pipeline: Materialized Data Interaction Capabilities -类似于传统数据仓库,图数仓基于 GeaFlow 引擎能力与 TuMaker 成熟的业务平台提供数据任务编排能力,即将多个数据处理任务(如数据抽取、转换、加载等)按照一定的逻辑顺序组织起来,自动执行的过程。提供可视化界面、任务调度机制、监听事件触发、错误处理、监控与日志、版本控制与回滚、智能调度集群资源等关键能力。 +Similar to traditional data warehouses, the graph data warehouse leverages GeaFlow engine capabilities and TuMaker’s mature business platform to provide data task orchestration capabilities — organizing multiple data processing tasks (like data extraction, transformation, and loading) in a logical sequence and executing them automatically. Key features include visual interfaces, task scheduling, event triggers, error handling, monitoring and logging, version control and rollback, and intelligent cluster resource scheduling. -在 Schema 转换器的加持下,可以得到从表存储到图存储的物化方案,它构建了连接传统数仓与图数仓的数据通道。基于表转图的物化方案,可以根据业务实际配置的加速表、加速关系、字段、权限等信息,全自动生成数据同步的任务编排,再通过图数仓平台调度,实现数据迁移全程无感,后续实时更新与增量同步,同步效率可达延迟十分钟级别。 +With the help of the Schema Converter, a materialization plan from table storage to graph storage can be generated, building a data pipeline between traditional and graph data warehouses. Based on the table-to-graph materialization plan, the system can automatically generate data sync task orchestrations according to actual business configurations like acceleration tables, relationships, fields, and permissions. These are then scheduled via the graph warehouse platform to achieve seamless data migration. Subsequent real-time updates and incremental syncs can be completed within ten minutes. -数据通道能力面向主流大数据生态系统,可深度集成 ODPS/Hive/Paimon 等基础设施,通过三层架构实现全生命周期数据管理:在数据接入层,自动捕获表的变化,产出物化方案,同步表-图实体映射的增量部分,当前可管理 10TB 级别图数据;在转换引擎层,全自动化生成导数的 DSL 任务编排,调度到集群执行;在存储优化层,支持 CStore/GraphDB/RocksDB 等自研或开源图存储解决方案,实践中已经过万亿级超大业务图的检验。此外,查询热数据预加载可根据图的实际使用情况,在 TB 级数据规模下仍能维持秒级查询相应,真正实现从表数仓到图数仓的全栈切换,SQL 之下全为图。 +The data pipeline integrates deeply with mainstream big data ecosystems like ODPS/Hive/Paimon. 
It achieves full lifecycle data management through a three-tier architecture: at the data access layer, it automatically captures table changes, generates materialization plans, and syncs incremental mappings from tables to graph entities, currently managing graph data at the 10TB scale; at the conversion engine layer, it fully automates DSL task orchestration and schedules them to clusters; at the storage optimization layer, it supports proprietary and open-source graph storage solutions like CStore/GraphDB/RocksDB, validated in trillion-edge-scale super-large business graphs. Additionally, hot data preloading maintains second-level query response times even at the TB scale, truly enabling a full-stack transition from relational to graph warehouses, with SQL running on top of graphs. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741684625347-a229239e-fd58-4d42-adc9-f272e3f13fdf.png) -**图 4 开源技术架构一张大图** +**Figure 4: Open-Source Technical Architecture Overview** -### 3. SQL-GQL 翻译引擎 +### 3. SQL-GQL Translation Engine -在传统关系型数据库中,多层表关联查询往往需要编写复杂的 JOIN 语句,不仅开发效率低下,性能也难以满足海量关联数据的即席分析需求。针对这一痛点,我们通过创新的 SQL-GQL 翻译引擎,让用户无需学习图查询语言(GQL)即可将 SQL 中复杂的 JOIN 语句自动转换为图路径查询,消除用户对图领域复杂性感知,同时利用图引擎优化执行性能。 +In traditional relational databases, multi-layer table join queries often require complex JOIN statements, which are not only inefficient to develop but also struggle to meet the demands of ad-hoc analysis of massive associated data. To address this pain point, we introduced an innovative SQL-GQL translation engine that automatically converts complex SQL JOINs into graph path queries without requiring users to learn graph query languages (GQL). This eliminates user perception of graph complexity while leveraging the graph engine for performance optimization. 
-与 SQL 基于关系模型的二维表操作不同,GQL 的查询结构和语义贴合图数据的特性,尤其在查询逻辑的线性化和嵌套处理上存在显著差异。将 SQL 查询转换为 GQL(图查询语言)是一项涉及语法结构映射数据模型映射执行逻辑重构的复杂任务。其核心挑战在于如何将基于关系模型的集合操作转化为基于图模型的线性路径遍历,同时规避嵌套查询、不合理图计算顺序的代价。 +Unlike SQL, which operates on two-dimensional tables based on the relational model, GQL's query structure and semantics are tailored to graph data, differing notably in how query logic is linearized and how nesting is handled. Converting SQL queries to GQL involves syntax structure mapping, data model mapping, and execution logic reconstruction. The core challenge is transforming set operations based on the relational model into linear path traversals based on the graph model, while avoiding the cost of nested queries and suboptimal graph computation order. -对比传统 SQL 查询,可能需通过 3 层表关联分析用户关联关系,响应时间在分钟级别。而图路径查询直接通过图的遍历语句实现,响应时间缩短至秒级。目前该引擎已在短视频分析、会员用增、客权服务等典型业务场景得到验证,未来将持续扩展对复杂子查询、复杂表达式运算的支持,让更多开发者无需跨越技术鸿沟即可解锁图计算的强大能力。 +Whereas a traditional SQL query may need three levels of table joins to analyze user relationships, with response times measured in minutes, a graph path query performs the same analysis directly through graph traversal, cutting response times to seconds. This engine has been validated in typical business scenarios such as short video analysis, membership growth, and customer rights services. In the future, it will expand support for complex subqueries and complex expressions, allowing more developers to unlock graph computing power without crossing technical barriers. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683921355-149d0fea-7a3f-4fb8-ad36-f4b4c8541113.png) -**图 5 SQL 抽象语法树 AST 翻译为 GQL 结构的差异示例** +**Figure 5: SQL AST to GQL Structure Translation Difference Example** -## 三、技术优势与应用场景 +## 3. 
Technical Advantages and Application Scenarios -### 3.1 效率提升的底层逻辑 +### 3.1 Underlying Logic of Efficiency Gains -在关联分析场景中,图数据仓库的突破性性能源于两大核心技术革新。 +In association analysis scenarios, the graph data warehouse’s breakthrough performance stems from two core technological innovations.
-首先,图存储模型通过物理结构的优化彻底改变了数据组织方式。传统关系型数据库将关联信息分散存储在外键表中,执行多表 JOIN 时需频繁进行基于索引的寻址和数据重组。而图模型采用连接键原生聚合存储机制,将实体属性与其关联关系作为"节点-边"结构进行物理邻接存储,配合缓存预加载技术,使得关联关系的遍历检索复杂度从 O(n²)降低至 O(n),特定键的处理复杂度从 O(n)降低至 O(1)。 +First, the graph storage model optimizes physical structures, fundamentally changing how data is organized. Traditional relational databases scatter association information across foreign-key tables, requiring frequent index-based lookups and data reorganization during multi-table JOINs. In contrast, the graph model natively aggregates data by join key, storing entity attributes and their relationships as physically adjacent "node-edge" structures. Combined with cache preloading, this reduces the complexity of traversing relationships from O(n²) to O(n), and the cost of processing a specific key from O(n) to O(1). -其次,图遍历算法构建了全新的查询范式。相较于关系型数据库基于集合的批处理模式,图引擎采用深度优先、广度优先等路径遍历算法,结合查询条件动态剪枝规避无效分支遍历。这种机制使得多层以上的链路追踪响应时间稳定在秒级,而传统 SQL 方案在大表的 3 层关联时往往已出现分钟级延迟。更关键的是,图遍历支持实时增量计算,当表新增记录时,展现出卓越的扩展能力。 +Second, graph traversal algorithms establish a new query paradigm. Unlike the set-based batch processing of relational databases, graph engines use path traversal algorithms such as depth-first and breadth-first search, dynamically pruning invalid branches based on query conditions. This keeps response times for multi-hop link tracing stable at the second level, whereas traditional SQL approaches often incur minute-level delays with just three levels of joins on large tables. Crucially, graph traversal supports real-time incremental computation, demonstrating strong scalability as new records are added to tables. -### 3.2 用户价值主张 +### 3.2 User Value Proposition -作为新一代数据基础设施,图数据仓库开创了"一图多用"的全新范式。用户既可通过熟悉的 SQL 接口进行常规分析,通过底层引擎嫁接的形式融入现有的基础设施。也可在需要深度挖掘时切换至 GQL、Gremlin 等专业图查询语言。这种双模兼容特性在同一套数据资产支撑不同类型的分析需求时尤为突出。 +As a next-generation data infrastructure, the graph data warehouse pioneers a "one graph, multiple uses" paradigm. 
Users can perform routine analysis through familiar SQL interfaces, with the underlying engine plugging into existing infrastructure.
When deeper analysis is needed, they can switch to professional graph query languages such as GQL or Gremlin. This dual-mode compatibility is especially valuable when the same data assets must support different kinds of analysis. -在算法支持层面,系统预置的图计算引擎突破传统数仓的局限,同时面向开源生态开放自定义图算法开发接口。例如传统 PageRank 算法可识别社交网络影响力节点,应用于精准营销场景;弱连接分析(WCC)帮助在亿级交易数据中发现异常社群;通过标准化 API 开放,用户既无需关注分布式计算细节,也无需关注数据构图流程,即可完成万亿边规模的数据挖掘。 +At the algorithm level, the system’s built-in graph computing engine goes beyond the limits of traditional warehouses and opens interfaces for custom graph algorithm development to the open-source ecosystem. For example, the classic PageRank algorithm can identify influential nodes in social networks for targeted marketing, and Weakly Connected Components (WCC) can detect anomalous communities in billion-scale transaction data. Through standardized APIs, users can mine trillion-edge-scale data without dealing with distributed computing details or the graph construction process. -相较于传统数仓,图数仓在三个维度实现代际跨越:性能层面,关联查询效率提升 1-2 个数量级;易用性层面,通过 SQL-GQL 自动转换消除图领域学习成本;分析深度层面,支持算法分析和隐性关系挖掘。 +Compared to traditional data warehouses, the graph data warehouse achieves generational leaps in three dimensions: performance (association queries 1–2 orders of magnitude faster), usability (no graph learning curve, thanks to automatic SQL-GQL conversion), and analytical depth (support for algorithmic analysis and hidden relationship discovery). -## 四、未来展望 +## 4. Future Outlook -作为下一代数据基础设施的核心载体,我们计划逐步将图存储引擎、图计算框架引擎、SQL-GQL 翻译模块等核心能力开源,构建开发者共创的技术生态。2023 年已率先开源流图计算引擎 GeaFlow,2025 年 Q3 将继续开放图模型数据分析标准化平台,高性能的图计算引擎,支持社区开发者开发异构数据源连接器。这种开放协作模式不仅加速技术迭代,更推动产品成为 ISO/IEC 39075 GQL 国际标准的最佳实践平台,助力 SQL-GQL 混合查询渐成行业规范。 +As the graph data warehouse becomes a core carrier of next-generation data infrastructure, we plan to gradually open-source its core capabilities, such as the graph storage engine, the graph computing framework engine, and the SQL-GQL translation module, to build a technical ecosystem co-created with developers.
In 2023, we took the lead by open-sourcing the streaming graph computing engine GeaFlow. In Q3 2025, we will further open up a standardized graph data analysis platform and a high-performance graph computing engine, and will support community developers in building connectors for heterogeneous data sources. This open collaboration not only accelerates technical iteration but also positions the product as a best-practice platform for the ISO/IEC 39075 GQL international standard, driving SQL-GQL hybrid queries toward becoming an industry norm. -技术演进层面,下一代引擎将突破动态流图计算瓶颈,实现万亿边规模数据的增量更新。通过融合向量化计算引擎,可同时处理属性图与向量图的联合查询,满足 AIGC 时代的多模态分析需求,并支持自然语言直接生成图查询语句的颠覆性体验。行业应用前景正呈现爆发态势,未来图数据仓库将承载多数企业关联数据分析负载,成为智能决策的核心引擎。 +On the technical evolution front, the next-generation engine will break through the bottlenecks of dynamic streaming graph computation to support incremental updates on trillion-edge-scale data. By integrating a vectorized computing engine, it will be able to jointly query property graphs and vector graphs, meeting the multimodal analysis needs of the AIGC era and enabling experiences such as generating graph queries directly from natural language. Industry adoption is growing rapidly, and in the future the graph data warehouse is expected to carry most enterprise association-analysis workloads and become the core engine for intelligent decision-making. \ No newline at end of file diff --git a/blog/31.md b/blog/31.md index 2d097d46b..00dac2454 100644 --- a/blog/31.md +++ b/blog/31.md @@ -1,39 +1,31 @@ --- -title: Graph4Stream:基于图的流计算加速 +title: "Graph4Stream: Accelerating Stream Computing with Graph-Based Approaches" date: 2025-3-25 --- ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png) -> 作者:坤羽;审校:东朔。 +> Author: Kunyu; Reviewer: Dongshuo.
-之前在「姊妹篇」[《Stream4Graph:动态图上的增量计算》](https://zhuanlan.zhihu.com/p/27618053733)中,向大家介绍了在图计算技术中引入增量计算能力「图+流」,GeaFlow 流图计算相比 Spark GraphX 取得了显著的性能提升。那么在流计算技术中引入图计算能力「流+图」,GeaFlow 流图计算相比 Flink 关联计算性能如何呢? +In our companion article ["Stream4Graph: Incremental Computation on Dynamic Graphs"](https://zhuanlan.zhihu.com/p/27618053733), we showed that bringing incremental computation into graph computing ("graph + stream") let GeaFlow significantly outperform Spark GraphX. That raises the reverse question: when graph computing is brought into stream computing ("stream + graph"), how does GeaFlow compare with Flink on relationship (join) computation? -当今时代,数据正以前所未有的速度和规模产生,对海量数据进行实时处理在异常检测、搜索推荐、金融交易等各个领域都有着广泛的应用。流计算作为最主要的实时数据处理技术也变得越来越重要。 +Today, data is generated at an unprecedented speed and scale, and real-time processing of massive datasets is widely used in fields such as anomaly detection, search and recommendation, and financial transactions. As the primary technology for real-time data processing, **stream computing** has become increasingly important. - +Unlike batch processing, which waits for all data to arrive before computing, stream computing divides a continuously generated data stream into micro-batches and computes each micro-batch incrementally. This gives stream computing high throughput and low latency. Common stream computing engines, such as Flink and Spark Streaming, process streaming data as tables. As stream computing is applied more deeply, more and more scenarios involve computing relationships across large datasets, and the performance of table-based stream engines then degrades sharply.
-与批处理需要等待数据全部到齐才进行计算不同,流计算将持续生成的数据流划分成微批,对每个微批的数据进行增量计算。这样的计算特性使得流计算具有高吞吐、低延迟的特性。常见的流计算引擎包括 Flink、Spark Streaming 等,他们都采用表的方式处理流中的数据。随着流计算应用的深入,越来越多的计算场景涉及到大数据之间关联关系的计算,此时基于表的流计算引擎性能会大幅下降。 +GeaFlow, an open-source stream graph computing engine developed by Ant Group's graph computing team, combines graph and stream computing to provide an efficient framework for stream graph processing, significantly improving computational performance. Below, we will introduce the limitations of traditional stream computing engines in relational computation, explain the principles behind GeaFlow's efficiency, and present performance comparisons. - +## Stream Computing Engine: Flink -蚂蚁图计算团队开源的流图计算引擎GeaFlow,将图计算与流计算相结合,提供了高效的流图处理框架,大幅提升了计算性能。下面为大家介绍传统流计算引擎在关联关系计算的局限性,GeaFlow 流图计算高效的原理以及他们的性能对比。 - - - -## 流计算引擎:Flink - -Flink 是经典的基于表的流处理引擎,他将输入的数据流切分成微批,每次计算当前批次的数据。在计算过程中,Flink 将计算任务翻译成由 map、filter、join 等基础算子组成的有向图,每个算子都有他的上游输入和下游输出。增量数据经过所有算子的计算后输出当前批次的结果。 +Flink is a classic table-based stream processing engine. It slices incoming data streams into micro-batches and processes the data in each batch incrementally. During execution, Flink translates computation tasks into directed graphs composed of basic operators like map, filter, and join. Each operator receives input from upstream and sends output downstream. Incremental data passes through all operators to produce results for the current batch. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741541936849-f9e0ae71-d25d-4789-b9c6-ed0380140f2a.png) -Flink 增量计算 +Flink Incremental Computation - - -我们以 k-Hop 算法为例,描述 Flink 的计算过程。k-Hop 是指 K 跳关系,例如在社交网络中 k-Hop 指的是可以通过 K 个中间人相互认识的关系链,在交易分析中指资金的 K 次连续转移的路径。假定以 2 跳关系为例,输入的数据格式 src dst 代表了两两关系。Flink 的计算 SQL 如下文所示 +We take the k-Hop algorithm as an example to illustrate Flink’s computation process. A k-Hop relationship refers to a path that spans k steps—such as a chain of acquaintances in social networks or a sequence of fund transfers in transaction analysis. 
Assuming a 2-hop relationship, with input data in the format `src dst` representing pairwise relationships, Flink executes the following SQL: ```sql -- create source table @@ -72,59 +64,51 @@ ON `e`.`dst` = `v`.`vid`; ``` - - -他的执行计划如下图所示,他由 Aggregate、Calc、Join 等算子组成,数据流经每个算子最终得到增量结果。核心算子 join 实现了关联关系的查找,我们来详细分析 Join 算子的实现方式。 +The execution plan is shown below. It consists of operators such as Aggregate, Calc, and Join. Data flows through each operator to yield incremental results. The core operator, Join, is responsible for relationship lookups. Let's examine how the Join operator works. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png) -Flink 执行计划 +Flink Execution Plan - +As shown below, the Join operator has two input streams: LeftInput and RightInput, corresponding to the left and right tables of the join. When data arrives from upstream, the operator begins computation. Taking the left input stream as an example, the data is first stored in LeftStateView. Then, the operator queries RightStateView for data that satisfies the join condition. This querying process requires scanning through RightStateView, and the resulting joined data is passed to the next operator. -如下图所示,Join 算子有两个输入流 LeftInput 和 RightInput,分别代表了 join 的左表和右表,Join 算子在接收到上游的数据后执行计算。以左输入流为例,输入的数据首先被加入到 LeftStateView 中保存起来,然后去 RightStateView 中查询是否有数据符合 join 条件,这个查询过程需要遍历 RightStateView,最后将 join 结果输入到下一个算子中。 - - - -join 计算主要的性能瓶颈就在遍历 RightStateView。LeftStateView 和 RightStateView 实际上存储 join 的左表和右表。随着数据不断输入,StateView 中的数据量持续膨胀,最终导致遍历的耗时急剧上升,严重影响系统性能。 +The main performance bottleneck lies in scanning RightStateView. LeftStateView and RightStateView store the left and right tables of the join, respectively. As data continuously flows in, the size of StateViews grows, causing scan times to increase dramatically and severely degrading system performance. 
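To make the scan cost concrete, here is a minimal Java sketch of a two-input join operator that keeps every record seen on each side and, for every arriving record, scans the whole opposite state. It is a simplified illustration, not Flink's actual join operator (which, among other things, organizes its state by join key); the class `NaiveStreamJoin` and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified two-input join used only to illustrate state scanning.
 * Not Flink's implementation: each side keeps every record in a plain list.
 */
public class NaiveStreamJoin {
    /** A pairwise relationship record in "src dst" form. */
    record Edge(long src, long dst) {}

    private final List<Edge> leftState = new ArrayList<>();  // role of LeftStateView
    private final List<Edge> rightState = new ArrayList<>(); // role of RightStateView
    private long comparisons = 0;                             // records scanned so far

    /** Left record e joins right record r when e.dst == r.src (one more hop). */
    public List<long[]> onLeft(Edge e) {
        leftState.add(e);                   // 1. store the record in its own state
        List<long[]> out = new ArrayList<>();
        for (Edge r : rightState) {         // 2. scan the entire opposite state
            comparisons++;
            if (e.dst() == r.src()) {
                out.add(new long[] {e.src(), e.dst(), r.dst()});
            }
        }
        return out;                         // 3. joined rows go to the next operator
    }

    /** Symmetric handling for a record arriving on the right input. */
    public List<long[]> onRight(Edge e) {
        rightState.add(e);
        List<long[]> out = new ArrayList<>();
        for (Edge l : leftState) {
            comparisons++;
            if (l.dst() == e.src()) {
                out.add(new long[] {l.src(), l.dst(), e.dst()});
            }
        }
        return out;
    }

    public long comparisons() {
        return comparisons;
    }

    public static void main(String[] args) {
        NaiveStreamJoin join = new NaiveStreamJoin();
        long joined = 0;
        // Self-join of one edge stream, as in a 2-hop query: every edge feeds both inputs.
        for (long i = 0; i < 1_000; i++) {
            Edge e = new Edge(i, i + 1);
            joined += join.onLeft(e).size();
            joined += join.onRight(e).size();
        }
        // prints: joined rows = 999, comparisons = 1000000
        System.out.println("joined rows = " + joined + ", comparisons = " + join.comparisons());
    }
}
```

Feeding n edges to both sides performs n² state comparisons in total (1,000,000 for n = 1,000), even though the whole run produces only 999 joined rows. This quadratic growth of the scan cost is the bottleneck described above.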
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png) -Flink Join 算子实现 +Flink Join Operator Implementation -## 流图计算引擎:GeaFlow +## Stream Graph Computing Engine: GeaFlow -### 图计算&流图 +### Graph Computing & Stream Graphs -图计算是一种基于图数据格式的计算范式,其中图 G(V,E)由点集合 V 和边集合 E 构成,边代表了数据之间的关联关系。以公开数据集 web-Google 为例,其中每一行数据由两个数字组成,代表了两个页面之间的跳转关系。如下图所示,左侧是原始数据,常规的数据建模方式是建立一张包含两列数据的表,而图的建模方式是将网页作为点,将页面的跳转关系作为边,构成一张跳转网络图。在表的建模方式中,关联关系的计算是通过表的 join 实现的,join 需要遍历左表或者右表。而在图计算中,关联关系被直接存储在边中,省去了遍历的过程。 +Graph computing is a computational paradigm based on graph data structures. A graph G(V,E) consists of a set of vertices V and edges E, where edges represent relationships between data. Using the public dataset web-Google as an example, each line contains two numbers representing a hyperlink between two web pages. As shown below, the left side shows raw data, which is traditionally modeled as a two-column table. In contrast, graph modeling treats web pages as vertices and hyperlinks as edges, forming a web link graph. In the tabular model, relationship computation is done via joins, which require scanning tables. In graph computing, relationships are directly stored in edges, eliminating the need for scans. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png) -表建模 vs. 图建模 +Table Modeling vs. Graph Modeling -流图是图在流场景中的应用,他依据数据流对图的更新将图分成历史图和增量图两个部分。例如在上图中,假设第一行和第二行数据已经输入并完成相应计算,当前处理第三行数据。此时历史图就是由前两行数据建模得到,而增量图是由第三行数据组成的图,两者合并起来就得到完整的图。在流图上应用增量图算法,可以高效完成计算任务,实现实时计算。 +A **stream graph** is the application of graph computing to streaming scenarios. It divides the graph into historical and incremental components based on data stream updates. For example, if the first two rows have been processed and we are now handling the third row, the historical graph is built from the first two rows, and the incremental graph is formed by the third row. 
Together, they constitute the full graph. Applying incremental graph algorithms on stream graphs enables efficient, real-time computation. - +### GeaFlow Architecture -### GeaFlow 架构 - -GeaFlow 引擎的计算流程分为流数据输入、分布式增量图计算、增量结果输出几个部分。和传统的流计算引擎一样,输入的实时数据按照窗口被切分成微批。对于当前批次的数据,先按照建模策略解析成点边构成增量图。增量图和之前数据构成的历史图一道组成完整的流图。计算框架在流图上应用增量图算法得到增量结果输出,最后把增量图添加到历史图中。 +The GeaFlow engine's computation flow consists of three parts: stream data input, distributed incremental graph computation, and incremental result output. As in traditional stream engines, real-time input data is sliced into micro-batches by window. The data in the current batch is first parsed into vertices and edges according to the modeling strategy, forming an incremental graph. The incremental graph and the historical graph (built from earlier data) together form the complete stream graph. The computation framework applies an incremental graph algorithm on the stream graph and outputs the incremental results; finally, the incremental graph is merged into the historical graph. ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1740932875376-6633b307-c309-4ae7-be5e-d0b96a66409a.png) -GeaFlow 增量计算 +GeaFlow Incremental Computation -GeaFlow 计算框架是以点为中心的迭代计算模型。他以增量图中的点作为第一轮迭代的起点。在每一轮迭代中,每个点都独立维护自身的状态,根据与每个点关联的历史图和增量图完成当前迭代轮次的计算,最后将计算结果通过消息传递给邻居点,开启下一轮迭代。 +The GeaFlow computation framework uses a vertex-centric iterative model. The vertices of the incremental graph are the starting points of the first iteration. In each iteration, every vertex independently maintains its own state and performs that round's computation using its associated historical and incremental graph data; it then sends the results to its neighbors as messages, which starts the next iteration.
-以前文中提到的 k-Hop 为例,增量算法如下:在第一轮迭代中,我们找到增量图中的所有边,将这些边作为初始的入向路径和出向路径,分别发送到他们的起点和终点。在后续的迭代中不断扩展入向路径和出向路径。当达到求取跳数时,将出向路径和入向路径发送给起点,在起点组合成最终结果。详细代码实现在开源仓库的[IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)文件中。 +Taking k-Hop as an example, the incremental algorithm works as follows: In the first iteration, all edges in the incremental graph are identified and treated as initial incoming and outgoing paths, which are sent to their start and end vertices. In subsequent iterations, these paths are extended. Once the desired hop count is reached, the paths are sent back to the starting vertex, where they are combined into final results. Detailed implementation can be found in the open-source repository file [IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java). -下图是两跳场景的描述。在第一轮迭代,增量边 B->C 分别构建入向路径和出向路径,将他们分别发送给点 B 和点 C。在第二轮迭代,B 收到入向路径,并加上当前点的入边形成 2 跳入向路径,发送给点 B。同样点 C 也收到出向路径,加上当前的出边形成 2 跳出向路径,发送给点 B。最后一轮迭代在 B 点将收到的出向和入向路径整合成新增的路径。可以看到,和 Flink 中需要查找所有的历史关系不同,GeaFlow 采用基于流图的增量图算法,计算量和图中的增量路径成正比。 +The diagram below illustrates the two-hop case. In the first iteration, the edge B->C creates incoming and outgoing paths, sent to B and C, respectively. In the second iteration, B receives an incoming path, adds its own incoming edges, and forms a 2-hop incoming path, which it sends to itself. Similarly, C forms a 2-hop outgoing path and sends it to B. In the final iteration, B combines the incoming and outgoing paths to produce the new paths. Unlike Flink, which must scan all historical relationships, GeaFlow's computation is proportional to the incremental paths, not the historical data. 
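The incremental idea for the two-hop case can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions (a simple directed graph held in memory, edges arriving one at a time), not GeaFlow's distributed `IncKHopAlgorithm`; the class `IncrementalTwoHop` is a hypothetical name. A new edge u->v can only create two-hop paths of the form x->u->v (its incoming side) or u->v->y (its outgoing side), so the work depends on the degrees of u and v rather than on the size of the historical graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical single-machine sketch of incremental 2-hop path discovery.
 * Not GeaFlow's IncKHopAlgorithm; assumes a simple directed graph in memory.
 */
public class IncrementalTwoHop {
    private final Map<Long, Set<Long>> out = new HashMap<>(); // historical out-neighbours
    private final Map<Long, Set<Long>> in = new HashMap<>();  // historical in-neighbours

    /** Adds edge u->v and returns only the 2-hop paths [a, b, c] that this edge creates. */
    public List<List<Long>> addEdge(long u, long v) {
        if (!out.computeIfAbsent(u, k -> new HashSet<>()).add(v)) {
            return List.of(); // duplicate edge: no new paths
        }
        in.computeIfAbsent(v, k -> new HashSet<>()).add(u);

        // LinkedHashSet removes the duplicate [u, u, u] produced by a self-loop.
        Set<List<Long>> paths = new LinkedHashSet<>();
        // Incoming side: extend the new edge backwards, x -> u -> v.
        for (long x : in.getOrDefault(u, Set.of())) {
            paths.add(List.of(x, u, v));
        }
        // Outgoing side: extend the new edge forwards, u -> v -> y.
        for (long y : out.getOrDefault(v, Set.of())) {
            paths.add(List.of(u, v, y));
        }
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        IncrementalTwoHop g = new IncrementalTwoHop();
        g.addEdge(1, 2); // historical edge A -> B
        g.addEdge(3, 4); // historical edge C -> D
        // New edge B -> C: only A->B->C and B->C->D are produced.
        System.out.println(g.addEdge(2, 3)); // prints [[1, 2, 3], [2, 3, 4]]
    }
}
```

Each two-hop path is reported exactly once, when the later of its two edges arrives, so later batches never re-emit historical results.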
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png) -两跳增量路径计算 +Two-Hop Incremental Path Computation -上述图算法已经集成到 GeaFlow 的 IncKHop 算子中,用户可以直接通过 DSL 调用。 +The graph algorithm above is integrated into GeaFlow's `IncKHop` operator and can be called directly from the DSL: ```sql set geaflow.dsl.max.traversal=4; @@ -185,33 +169,31 @@ RETURN ret ; ``` - - -## GeaFlow 性能测试 +## GeaFlow Performance Test -为了验证 GeaFlow 的流图计算性能,我们以k-Hop算法为例设计了和 Flink 的对比实验。我们将指定数据作为输入源输入到计算引擎中,执行k-Hop算法,并统计所有数据完成计算的时间来比较系统的性能。我们采用公开数据集[web-Google.txt](https://snap.stanford.edu/data/web-Google.html)作为输入,实验环境为 16 台 8 核 16G 的服务器,分别比较了一跳、两跳、三跳、四跳关系计算的场景。 +To evaluate GeaFlow's stream graph computing performance, we designed a comparison experiment with Flink using the k-Hop algorithm. We fed the same input data into each engine, ran the k-Hop algorithm, and compared the time taken to finish computing all of the data. The input was the public dataset [web-Google.txt](https://snap.stanford.edu/data/web-Google.html), and the experiment ran on 16 servers, each with 8 cores and 16 GB of memory, covering one-hop, two-hop, three-hop, and four-hop relationship computations. -实验结果如图所示,横坐标是分别是一跳关系、两跳关系、三跳关系、四跳关系,纵坐标是处理完所有数据的耗时,采用对数指标。可以看到在一跳、两跳场景中,Flink 的性能要好于 GeaFlow,这是因为在一跳、两跳场景中参与 join 计算的数据量比较小,join 需要遍历的左表和右表都很小,遍历本身耗时短,而且 Flink 的计算框架可以缓存 join 的历史计算结果。但是到了三跳、四跳场景时候,由于计算复杂度的上升,join 算子需要遍历的表迅速膨胀,带来计算性能的急剧下降,甚至四跳场景超过一天也无法完成计算。而 GeaFlow采用基于流图增量图算法,计算耗时只和增量路径相关,和历史的关联关系计算结果无关,所以性能明显优于 Flink。 +As shown in the results below, the x-axis represents one-hop to four-hop relationships, and the y-axis shows the time to process all data on a logarithmic scale. In the one-hop and two-hop cases, Flink outperforms GeaFlow: little data takes part in the join, so the left and right tables to scan are small and scanning is fast, and Flink's framework can also cache historical join results. In the three-hop and four-hop cases, however, rising computational complexity makes the tables the join operator must scan grow rapidly, and performance drops sharply; the four-hop case could not finish even after more than a day. GeaFlow instead runs an incremental graph algorithm on the stream graph, so its computation time depends only on the incremental paths and not on the results of historical relationship computations, which gives it a clear advantage over Flink.
![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1743568484086-10eb9a1a-3dd0-42ee-b885-875ac7d81221.png) -k-Hop 计算性能对比 +k-Hop Computation Performance Comparison -## 总结和展望 +## Conclusion and Future Work -传统的 Flink 等流计算引擎在计算关联关系时需要用到 join 算子,join 算子需要遍历全量的历史数据,这使得他们在大数据关联计算场景中性能不佳。GeaFlow 引擎通过支持流图计算框架,将图计算引入到流计算中,采用增量图计算的方法大大提升了实时数据的处理系性能。 +Traditional stream engines such as Flink rely on join operators to compute relationships, and a join must scan all historical data, which leads to poor performance in large-scale relationship computation. GeaFlow brings graph computing into stream computing through its stream graph framework, and its incremental graph computation greatly improves real-time data processing performance. -目前 GeaFlow 项目代码已经开源,我们希望基于 GeaFlow 构建面向图数据的统一湖仓处理引擎,以解决多样化的大数据关联性分析诉求。同时我们也在积极筹备加入 Apache 基金会,丰富大数据开源生态,因此非常欢迎对图技术有浓厚兴趣同学加入社区共建。 +GeaFlow is now open source. We aim to build on it a unified lakehouse processing engine for graph data that addresses diverse big-data relationship analysis needs. We are also actively preparing to join the Apache Software Foundation to enrich the open-source big data ecosystem, and we warmly welcome anyone interested in graph technology to join the community. -社区中有诸多有趣的工作尚待完成,你可以从如下简单的「Good First Issue」开始,期待你加入同行。 +There is plenty of interesting work left to do in the community. You can start with the following "Good First Issue" tasks: -- 支持增量 k-Core 算法。([Issue 466](https://github.com/TuGraph-family/tugraph-analytics/issues/466)) -- 支持增量最小生成树算法。([Issue 465](https://github.com/TuGraph-family/tugraph-analytics/issues/465)) +- Support the incremental k-Core algorithm ([Issue 466](https://github.com/TuGraph-family/tugraph-analytics/issues/466)) +- Support the incremental Minimum Spanning Tree algorithm ([Issue 465](https://github.com/TuGraph-family/tugraph-analytics/issues/465)) - ... -## 参考链接 +## References -1. GeaFlow 项目地址:[https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics) -2.
web-Google 数据集地址:[https://snap.stanford.edu/data/web-Google.html](https://snap.stanford.edu/data/web-Google.html) -3. GeaFlow Issues:[https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues) -4. 增量 k-Hop 算法实现源码:[https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java) +1. GeaFlow Project: [https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics) +2. web-Google Dataset: [https://snap.stanford.edu/data/web-Google.html](https://snap.stanford.edu/data/web-Google.html) +3. GeaFlow Issues: [https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues) +4. Incremental k-Hop Source Code: [https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java) diff --git a/blog/32.md b/blog/32.md index f11841e23..7c9b3df08 100644 --- a/blog/32.md +++ b/blog/32.md @@ -1,165 +1,129 @@ --- -title: "流式图计算引擎 GeaFlow v0.6.4 发布,支持关系型访问图数据,增量匹配优化实时处理" +title: "Streaming Graph Computing Engine GeaFlow v0.6.4 Released: Supports Relational Access to Graph Data, Incremental Matching Optimizes Real-Time Processing" date: 2025-4-3 --- -TuGraph 在 2025 年 3 月发布了流式图计算引擎 GeaFlow v0.6.4,新版本实现了多个重要特性更新,包括: +In March 2025, TuGraph released GeaFlow v0.6.4, a streaming graph computing engine.
This version implements multiple significant feature updates, including: -- 🍀GeaFlow 图存储扩展支持 paimon 数据湖(实验性功能) -- 🍀图数仓能力扩展:支持对图中的实体进行关系型访问 -- 🍀统一的内存管理器支持 -- 🍀RBO 规则扩展:新增 MatchEdgeLabelFilterRemoveRule 和 MatchIdFilterSimplifyRule -- 🍀支持增量匹配算子 +- 🍀 Experimental support for storing GeaFlow graph data in Paimon data lake +- 🍀 Enhanced graph data warehouse capabilities: Supports relational access to graph entities +- 🍀 Unified memory manager support +- 🍀 RBO rule extensions: New MatchEdgeLabelFilterRemoveRule and MatchIdFilterSimplifyRule +- 🍀 Support for incremental matching operators -## ✨ 新增功能 +## ✨ New Features -### 🍀GeaFlow 图存储扩展支持 paimon 数据湖(实验性功能) +### 🍀 GeaFlow Graph Storage Extended to Support Paimon Data Lake (Experimental) -为提升 GeaFlow 数据存储系统的扩展性、实时数据处理能力及成本效率,本次更新加入了对 Apache Paimon 的支持。Paimon 作为新一代流式数据湖存储格式,在设计理念、功能特性上,与 GeaFlow 之前使用的 RocksDB 存在许多差异: +To enhance GeaFlow's data storage system scalability, real-time processing capabilities, and cost efficiency, this update adds support for **Apache Paimon**. As a next-generation streaming data lake storage format, Paimon differs significantly in design philosophy and features from RocksDB, previously used by GeaFlow: -- 支持对象存储/HDFS 分布式存储,天然适配云原生环境。因此可实现存储与计算分离,降低硬件依赖,支持弹性扩展。 -- 支持主键表 LSM 合并、增量更新,满足实时数据更新需求。 -- 列式存储+统计索引(Z-Order、Min-Max 等),支持高效数据裁剪与 OLAP 查询加速。 +- **Supports object storage/HDFS distributed storage**, natively adapting to cloud-native environments. This enables storage-compute separation, reduces hardware dependencies, and supports elastic scaling. +- **Supports primary key table LSM compaction and incremental updates**, meeting real-time data update demands. +- **Columnar storage + statistical indexing (Z-Order, Min-Max, etc.)**, enabling efficient data pruning and OLAP query acceleration. 
-在本次更新中,GeaFlow 加入了对 Paimon 存储的支持,但目前仅为实验性质。 - -- 支持在 GeaFlow 中将用户图数据存储到 paimon 数据湖。 -- 当前为实验性功能,仅支持使用本地文件系统作为 paimon 的存储后端,且暂不支持 recover 能力,暂不支持动态图数据存储。 -- 通过配置`geaflow.store.paimon.options.warehouse`参数来指定存储路径,默认路径为"file:///tmp/paimon/"。 - -当前 GeaFlow 的存储架构图如下。 +In this update, GeaFlow adds support for Paimon storage (currently **experimental**): +- Allows storing user graph data in Paimon data lake via GeaFlow. +- **Current limitations**: Only supports local filesystem as Paimon backend; recoverability not yet supported; dynamic graph data storage not yet supported. +- Configure the storage path via the parameter `geaflow.store.paimon.options.warehouse` (default: `"file:///tmp/paimon/"`). +The current GeaFlow storage architecture is shown below: ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp) - - -### 🍀图数仓能力扩展:支持对图中的实体进行关系型访问 - -在传统关系型数据库中,多层表关联查询往往需要编写复杂的 JOIN 语句,不仅开发效率低下,性能也难以满足海量关联数据的即席分析需求。针对这一痛点,我们通过创新的 SQL 支持,让用户无需学习图查询语言(GQL)即可将 SQL 中复杂的 JOIN 语句自动转换为图路径查询。当前版本提供以下两种 SQL 语法支持: - -- 支持图中的点/边作为 SQL 查询的来源表,进行查询。 - - 我们通过 TableScanToGraphRule 规则,让生成和优化 RelNode 时识别 SQL 语句中来源于图中的点/边实体,这使得用户可以像 SQL 中扫表操作一样读取图中的点边 - - 示例: student 是图 g_student 中的点实体 - -```plain -USE GRAPH g_student; - -INSERT INTO table_scan_001_result -select avg(age) as avg_age from student; -``` - -- 支持图中的点与边关联作为 SQL 查询的等值条件 Join,进行查询。 - - 我们通过 TableJoinTableToGraphRule 规则,让生成和优化 RelNode 时识别 SQL 语句中的 Join 算子,这使得用户可以像 SQL 中连接表操作一样在图中进行查询 - - 示例: student 是图 g_student 中的点实体,selectCource 是关联在 student 点上的出边 - -```plain -USE GRAPH g_student; - -INSERT INTO vertex_join_edge_001_result -SELECT s.id, sc.targetId, sc.ts -FROM student s JOIN selectCourse sc on s.id = sc.srcId -WHERE s.id < 1004 -ORDER BY s.id, targetId -; -``` - -### 🍀内存管理器支持 - -当前 GeaFlow 没有内存管理,除了外部依赖 rocksdb 会用堆外内存,其他的全都是堆内内存。当内存使用多时,GC 压力明显,另外 shuffle 阶段网络发送也存在多次数据拷贝,导致效率不高。 - -内存管理负责各模块(shuffle、state、framework)的内存管控,包括申请、释放、监控。 
内存管理有两部分:堆内和堆外。不同模块使用可能不同的内存区域,合理使用这些资源可以更高效跑完作业。内存管理器主要有以下核心能力: - -- 支持堆内和堆外内存统一管理:通过统一抽象 MemoryView,提供读写接口,屏蔽用户对堆外和堆外的感知。当前 Memoryview 堆外内存是采用预分配模式,初始大小是通过 off.heap.memory.chunkSize.MB 参数来控制,如果不设置,默认是 -Xmx 参数的 30%作为初始值。运行过程中也支持动态扩所容。 -- 支持计算和存储统一内存管理 - -为了避免堆外内存浪费或者过度使用,GeaFlow 对各模块的堆外内存使用统一管理。内存主要分 3 个部分:shuffle、state 和 default。 Default 是预留空间,可动态被 shuffle 或者 state 模块占用。 如下图所示: - - -![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583816-33d13188-a315-4079-bb90-89c58d5a4e82.webp) - -state 和 shuffle 默认独占 10%的堆外内存, default 则占用 80%。 - -### - -🍀RBO 规则扩展:新增 EdgeLabel 和 IdFilter 优化规则 - -- Edge Label 简化:针对 Match 匹配语句后接 Where 子句对边进行过滤的查询进行执行计划简化。 -- ID Filter 简化:针对 Match 匹配语句中对点的 id 进行过滤的查询进行执行计划简化。 -- 规则在默认情况下生效,使用示例如下: - -```plain - -// GQL示例1(MatchIdFilterSimplifyRule优化) -MATCH (a:user where id = 1)-[e:knows]-(b:user) -RETURN a.id as a_id, e.weight as weight, b.id as b_id - -// 原执行计划 -LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) - LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user)]) - LogicalGraphScan(table=[default.g0]) - -// MatchIdFilterSimplifyRule优化后执行计划,vertex id转移到MatchVertex中进行过滤 -LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) - LogicalGraphMatch(path=[(a:user)-[e:knows]-(b:user)]) - LogicalGraphScan(table=[default.g0]) - -// GQL示例2(MatchEdgeLabelFilterRemoveRule优化) -MATCH (a:user where id = 1)-[e:knows]-(b:user) WHERE e.~label = 'knows' -or e.~label = 'created' -RETURN a.id as a_id, e.weight as weight, b.id as b_id - -// 原执行计划 -LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) - LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user) where OR(=($1.~label, _UTF-16LE'knows'), =($1.~label, _UTF-16LE'created')) ]) - LogicalGraphScan(table=[default.g0]) - -// MatchEdgeLabelFilterRemoveRule优化后执行计划,针对edge label的过滤转移到MatchEdge中进行 -LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) - LogicalGraphMatch(path=[(a:user) where =(a.id, 1) 
-[e:knows]-(b:user)]) - LogicalGraphScan(table=[default.g0]) -``` - -### 🍀支持增量匹配算子 - -在动态图场景中,数据往往不是全部一批到来,而会源源不断地进行输入和计算,图的点边不断地从数据源读取,进行构图,从而形成增量图。对于某一批新增的点边,构成了一个新的版本的图,如果重新对全图(即当前所有点边)进行图遍历,开销较大。当前版本中使用了一种基于子图扩展的增量图匹配方法,通过子图扩展,来扩展每次增量的触发起点,尽可能地只对增量的数据进行查询: - -- 支持增量匹配逻辑,通过反向传播来扩展每次 window 新增数据的触发起点。 - -![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583809-8c862fd5-1d7f-4b8b-9b34-e63d1692deb6.webp) - -- 通过在 dsl 或高阶代码中设置`geaflow.dsl.graph.enable.incr.traversal`参数为 true 开启增量计算逻辑。 - -开启示例如下: - -```plain -QueryTester.build() - .withConfig(FrameworkConfigKeys.BATCH_NUMBER_PER_CHECKPOINT.getKey(), "1") - .withQueryPath(queryPath) - .withConfig(DSLConfigKeys.ENABLE_INCR_TRAVERSAL.getKey(), "true") - .withConfig(DSLConfigKeys.TABLE_SINK_SPLIT_LINE.getKey(), lineSplit) - .execute(); -``` - -**** - -## ✨ 历史版本回顾 - -我们回顾上一版,v0.6.3 版本在 v0.5.2 版本基础之上实现了一些重要功能特性,其中包括: - -- 实现了 OSS/DFS/S3 标准化接口,接入主流云存储:支持开源 OSS/DFS/S3 等 remote 分布式存储,同时标准化了接口,便于按需快速扩展其它外部分布式存储系统。 -- 支持标准 Match 算子:支持标准 ISO-GQL Match 语法及算子。 -- Aliyun ODPS 表的读写能力:支持 Aliyun ODPS 插件,提供 ODPS 表的读写能力。 -- 兼容开源 Ray 生态:引擎支持开源 Ray 版本,同时 console 平台支持将任务提交到 Ray 集群。 -- DSL 支持时序能力:DSL 侧支持时间感知的数据处理、提供动态图与时序结合的能力。 -- Shuffle 支持反压优化:通过滑动窗口的方式进行数据传输和实现反压能力。 -- GeaFlow 流图性能测试:新增了 GeaFlow Vs Spark/Flink 的 demo 和性能测试报告。 - -## ✨ 致谢 - -感谢所有贡献者使这次发布成为可能! - - - -![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp) +### 🍀 Graph Data Warehouse Capability Expansion: Supports Relational Access to Graph Entities + +In traditional relational databases, multi-table JOIN queries often require complex SQL statements, hindering development efficiency and struggling with performance for ad-hoc analysis of massive interconnected data. Addressing this pain point, GeaFlow introduces innovative SQL support that automatically translates complex SQL JOIN statements into graph path queries—**no Graph Query Language (GQL) needed**. 
This version offers two SQL syntax features: + +1. **Querying Vertices/Edges as Source Tables:** + - During RelNode generation and optimization, the `TableScanToGraphRule` rule recognizes SQL source tables that are actually vertex or edge entities of the graph, so users can read graph vertices and edges just as they would scan a table in SQL. + - Example (`student` is a vertex entity in graph `g_student`): + ```sql + USE GRAPH g_student; + INSERT INTO table_scan_001_result + SELECT AVG(age) AS avg_age FROM student; + ``` + +2. **Equi-Joins on Vertices and Associated Edges:** + - During RelNode generation and optimization, the `TableJoinTableToGraphRule` rule recognizes Join operators in the SQL statement, so users can query the graph just as they would join tables in SQL. + - Example (`student` is a vertex entity in graph `g_student`, and `selectCourse` is an outgoing edge of `student`): + ```sql + USE GRAPH g_student; + INSERT INTO vertex_join_edge_001_result + SELECT s.id, sc.targetId, sc.ts + FROM student s JOIN selectCourse sc ON s.id = sc.srcId + WHERE s.id < 1004 + ORDER BY s.id, targetId; + ``` + +### 🍀 Unified Memory Manager Support + +Previously, GeaFlow had no memory management: apart from the off-heap memory used by its RocksDB dependency, all memory was on-heap. High memory usage therefore caused noticeable GC pressure, and network sends in the shuffle phase involved multiple data copies, which reduced efficiency. + +The new **Unified Memory Manager** controls memory allocation, release, and monitoring for each module (shuffle, state, framework), covering both on-heap and off-heap memory. Key capabilities include: +- **Unified On-heap/Off-heap Management:** A unified `MemoryView` abstraction provides read/write interfaces, so callers do not need to know whether memory is on-heap or off-heap. Off-heap memory is pre-allocated: its initial size is set by `off.heap.memory.chunkSize.MB` and defaults to 30% of `-Xmx` when unset, and it can be expanded or shrunk dynamically at runtime. +- **Unified Compute & Storage Memory Management:** To avoid wasting or overusing off-heap memory, GeaFlow manages the off-heap usage of all modules together, splitting it into three pools: + - **Shuffle:** 10% of off-heap memory, reserved exclusively. + - **State:** 10% of off-heap memory, reserved exclusively. + - **Default:** The remaining 80%, a reserve that Shuffle or State can borrow dynamically.
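The pool split described above can be sketched as simple budget arithmetic. The Java class below is hypothetical and is not GeaFlow's memory manager API: it assumes the off-heap total defaults to 30% of `-Xmx` when no size is configured, gives Shuffle and State 10% each, and lets either module borrow from the 80% Default pool once its own share is used up. The release notes do not specify the borrowing order, so "own share first, then Default" is an assumption of this sketch.

```java
/**
 * Hypothetical sketch of the off-heap pool split; not GeaFlow's memory manager API.
 * Assumes: total = configured size, or 30% of -Xmx if unset; SHUFFLE and STATE each
 * own 10%; the remaining 80% DEFAULT pool is borrowed once a module's own share is used.
 */
public class OffHeapBudget {
    public enum Pool { SHUFFLE, STATE }

    private final long total;
    private final long reservedPerPool;  // exclusive 10% share for SHUFFLE and for STATE
    private final long defaultCapacity;  // shared 80% DEFAULT pool
    private final long[] used = new long[Pool.values().length];

    /** A non-positive configuredSize means "not set", so the 30%-of-Xmx default applies. */
    public OffHeapBudget(long xmx, long configuredSize) {
        this.total = configuredSize > 0 ? configuredSize : xmx * 30 / 100;
        this.reservedPerPool = total * 10 / 100;
        this.defaultCapacity = total - 2 * reservedPerPool;
    }

    /** Portion of a module's usage that exceeds its own share and sits in DEFAULT. */
    private long borrowed(long usage) {
        return Math.max(0, usage - reservedPerPool);
    }

    private long totalBorrowed() {
        long sum = 0;
        for (long u : used) {
            sum += borrowed(u);
        }
        return sum;
    }

    /** Uses the module's own share first, then borrows from DEFAULT; false if over budget. */
    public boolean allocate(Pool pool, long size) {
        long before = used[pool.ordinal()];
        long extraBorrow = borrowed(before + size) - borrowed(before);
        if (totalBorrowed() + extraBorrow > defaultCapacity) {
            return false;
        }
        used[pool.ordinal()] = before + size;
        return true;
    }

    /** Returns memory; anything borrowed from DEFAULT is given back first. */
    public void release(Pool pool, long size) {
        used[pool.ordinal()] -= Math.min(size, used[pool.ordinal()]);
    }

    public long total() { return total; }
    public long reservedPerPool() { return reservedPerPool; }
    public long defaultCapacity() { return defaultCapacity; }

    public static void main(String[] args) {
        OffHeapBudget b = new OffHeapBudget(10_000, 0); // arbitrary units, e.g. MB
        // prints 3000 300 2400
        System.out.println(b.total() + " " + b.reservedPerPool() + " " + b.defaultCapacity());
        System.out.println(b.allocate(Pool.SHUFFLE, 2_700)); // own 300 + borrows 2400: true
        System.out.println(b.allocate(Pool.STATE, 300));     // fits in STATE's own share: true
        System.out.println(b.allocate(Pool.STATE, 1));       // DEFAULT exhausted: false
    }
}
```

Keeping the exclusive shares small and the shared Default pool large means a busy module (for example, shuffle during a heavy exchange) can grow, while the other module is still guaranteed its own 10%.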
+ ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583816-33d13188-a315-4079-bb90-89c58d5a4e82.webp) + +### 🍀 RBO Rule Extensions: New EdgeLabel and IdFilter Optimization Rules + +Two new rule-based optimization (RBO) rules simplify execution plans for common GQL `MATCH` patterns: +1. **`MatchEdgeLabelFilterRemoveRule`:** Simplifies plans for queries where a `WHERE` clause after the `MATCH` statement filters edges by label (`~label`); the label filter is moved into the edge matching step. +2. **`MatchIdFilterSimplifyRule`:** Simplifies plans for queries where the `MATCH` statement filters vertices by `id`; the `id` filter is moved into the vertex matching step. +- Both rules are enabled by default. +- **Example 1 (IdFilter Simplification):** + ```plain + // Original GQL + MATCH (a:user WHERE id = 1)-[e:knows]-(b:user) + RETURN a.id AS a_id, e.weight AS weight, b.id AS b_id + + // Original Plan + LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user)]) + LogicalGraphScan(table=[default.g0]) + + // Optimized Plan: id filter moved into MatchVertex + LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user)-[e:knows]-(b:user)]) + LogicalGraphScan(table=[default.g0]) + ``` +- **Example 2 (EdgeLabel Filter Removal):** + ```plain + // Original GQL + MATCH (a:user WHERE id = 1)-[e:knows]-(b:user) WHERE e.~label = 'knows' OR e.~label = 'created' + RETURN a.id AS a_id, e.weight AS weight, b.id AS b_id + + // Original Plan + LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user) where OR(=($1.~label, _UTF-16LE'knows'), =($1.~label, _UTF-16LE'created')) ]) + LogicalGraphScan(table=[default.g0]) + + // Optimized Plan: edge label filter moved into MatchEdge + LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user)]) + LogicalGraphScan(table=[default.g0]) + ``` + +### 🍀 Support for Incremental Matching Operator + +In dynamic graph scenarios, data rarely arrives in a single batch; it is continuously ingested and computed. Vertices and edges are read from the data source without pause to build the graph, forming an incremental graph. Each batch of new vertices and edges yields a new version of the graph, and re-traversing the entire graph (all current vertices and edges) for every batch is expensive. v0.6.4 introduces an **Incremental Matching Operator** based on **subgraph expansion**, which expands the trigger starting points of each increment so that, as far as possible, only incremental data is queried: +- Implements incremental matching logic that uses **reverse propagation** to expand the trigger starting points for the data added in each window.
+ ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583809-8c862fd5-1d7f-4b8b-9b34-e63d1692deb6.webp) +- Enable via DSL or high-level code by setting `geaflow.dsl.graph.enable.incr.traversal` to `true`. + ```java + QueryTester.build() + .withConfig(FrameworkConfigKeys.BATCH_NUMBER_PER_CHECKPOINT.getKey(), "1") + .withQueryPath(queryPath) + .withConfig(DSLConfigKeys.ENABLE_INCR_TRAVERSAL.getKey(), "true") // Enable Incremental + .withConfig(DSLConfigKeys.TABLE_SINK_SPLIT_LINE.getKey(), lineSplit) + .execute(); + ``` + +## ✨ Previous Version Recap (v0.6.3) + +Key features introduced in v0.6.3 (building on v0.5.2) include: +- ✨ **OSS/DFS/S3 Standardized Interface:** Access for mainstream cloud storage; extensible architecture. +- ✨ **Standard `MATCH` Operator:** Full support for ISO-GQL `MATCH` syntax. +- ✨ **Aliyun ODPS Read/Write:** Plugin support for Alibaba Cloud ODPS tables. +- ✨ **Open-Source Ray Compatibility:** Engine supports open-source Ray; console submits tasks to Ray clusters. +- ✨ **DSL Temporal Capabilities:** Time-aware data processing & dynamic graph + temporal features. +- ✨ **Shuffle Backpressure Optimization:** Sliding window-based data transfer and backpressure implementation. +- ✨ **GeaFlow Stream-Graph Benchmarks:** Added performance demo/reports vs. Spark/Flink. + +## ✨ Acknowledgments + +Thank you to all contributors for making this release possible! +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp) \ No newline at end of file diff --git a/community/en/community.md b/community/en/community.md index 156b1b832..a41329f7c 100644 --- a/community/en/community.md +++ b/community/en/community.md @@ -12,7 +12,7 @@ Anyone who has contributed to the project can be included. Rules for handling security issues and publicly disclosed CVEs. -### Howto Contribute +### How to Contribute Introduction on how to contribute. 
diff --git a/community/en/how_to_release.md b/community/en/how_to_release.md index 144b687eb..47315b269 100644 --- a/community/en/how_to_release.md +++ b/community/en/how_to_release.md @@ -2,7 +2,7 @@ title: How to Release --- -This document outlines the process for a release manager to publish a new version of Apache Geaflow. +This document outlines the process for a release manager to publish a new version of Apache GeaFlow (Incubating). ## Introduction diff --git a/community/zh/community.md b/community/zh/community.md index 40c5ec1cf..6dcd2104c 100644 --- a/community/zh/community.md +++ b/community/zh/community.md @@ -9,7 +9,7 @@ ### Security 针对安全方面的处理规则以及已经处理的公开的 CVE -### Howto Contribute +### How to Contribute 介绍如何贡献 ### Feature Request diff --git a/community/zh/how_to_release.md b/community/zh/how_to_release.md index b3529e5ec..11f8af0a0 100644 --- a/community/zh/how_to_release.md +++ b/community/zh/how_to_release.md @@ -2,7 +2,7 @@ title: How to Release --- -This document outlines the process for a release manager to publish a new version of Apache Geaflow. +This document outlines the process for a release manager to publish a new version of Apache GeaFlow (Incubating).
## Introduction diff --git a/blog/1.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/1.md similarity index 100% rename from blog/1.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/1.md diff --git a/blog/10.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/10.md similarity index 100% rename from blog/10.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/10.md diff --git a/blog/11.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/11.md similarity index 100% rename from blog/11.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/11.md diff --git a/blog/12.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/12.md similarity index 100% rename from blog/12.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/12.md diff --git a/blog/13.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/13.md similarity index 100% rename from blog/13.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/13.md diff --git a/blog/14.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/14.md similarity index 100% rename from blog/14.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/14.md diff --git a/blog/15.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/15.md similarity index 100% rename from blog/15.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/15.md diff --git a/blog/16.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/16.md similarity index 100% rename from blog/16.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/16.md diff --git a/blog/17.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/17.md similarity index 100% rename from blog/17.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/17.md diff --git a/blog/18.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/18.md similarity index 100% rename from blog/18.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/18.md diff --git a/blog/19.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/19.md similarity index 100% rename from 
blog/19.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/19.md diff --git a/blog/2.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/2.md similarity index 100% rename from blog/2.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/2.md diff --git a/blog/20.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/20.md similarity index 100% rename from blog/20.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/20.md diff --git a/blog/21.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/21.md similarity index 100% rename from blog/21.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/21.md diff --git a/blog/22.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/22.md similarity index 100% rename from blog/22.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/22.md diff --git a/blog/23.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/23.md similarity index 100% rename from blog/23.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/23.md diff --git a/blog/24.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/24.md similarity index 100% rename from blog/24.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/24.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/blog/27.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/27.md new file mode 100644 index 000000000..3990f4846 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/27.md @@ -0,0 +1,230 @@ +--- +title: "Stream4Graph:动态图上的增量计算" +date: "2025-3-11" +--- + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png) + +> 作者:张奇 + +众所周知,当我们需要对数据做关联性分析的时候,一般会采用表连接(SQL join)的方式完成。但是 SQL join 时的笛卡尔积计算需要维护大量的中间结果,从而对整体的数据分析性能带来巨大影响。相比而言,基于图的方式维护数据的关联性,原本的关联性分析可以转换为图上的遍历操作,从而大幅降低数据分析的成本。 + +然而,随着数据规模的不断增长,以及对数据处理更强的实时性需求,如何高效地解决大规模图数据上的实时计算问题,就变得越来越紧迫。传统的计算引擎,如 Spark、Flink 对于图数据的处理已经逐渐不能满足业务日益增长的诉求,因此设计一套面向大规模图数据的实时处理引擎,将会对大数据处理技术革新带来巨大的帮助。 + 
+蚂蚁图计算团队开源的流图计算引擎[GeaFlow](https://github.com/TuGraph-family/tugraph-analytics),结合了图处理和流处理的技术优势,实现了动态图上的增量计算能力,在高性能关联性分析的基础上,进一步提升了图计算的实时性。接下来向大家介绍图计算技术的特点,业内如何解决大规模实时图计算问题,以及 GeaFlow 在动态图上的计算性能表现。
+
+
+
+## 1. 图计算
+
+图是一种数学结构,由节点和边组成。节点代表各种实体,比如人、地点、事物或概念,而边则表示这些节点之间的关系。例如:
+
+- 社交媒体:节点可以代表用户,边可以表示朋友关系。
+- 网页:节点代表网页,边代表超链接。
+- 交通网络:节点代表城市,边代表道路或航线。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png)
+
+图本身代表了节点与节点之间的链接关系,而针对这些关系,我们可以利用图中的节点和边来进行信息处理、分析和挖掘,帮助我们理解复杂系统中的关系和模式。在图上开展的计算活动就是图计算。图计算有很多应用场景,比如通过社交网络分析可以识别用户之间的联系,发现社群结构;通过分析网页间的链接关系来计算网页排名;通过用户的行为和偏好构建关系图,推荐相关内容和产品。
+
+
+
+我们就以简单的社交网络分析算法,弱联通分量(Weakly Connected Components, WCC)为例。弱联通分量可以帮助我们识别用户之间的“朋友圈”或“社区”,比如某个社交平台上,一群用户通过点赞、评论或关注形成一个大的弱联通分量,而某些用户可能没有连接到这个大分量,形成更小的弱联通分量。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png)
+
+如果仅仅针对上面这张小图来构建弱联通分量算法,那么非常简单,我们只需要在个人 PC 上构建简单的点边结构然后走图遍历即可。但如果图的规模扩展到千亿甚至万亿,这时就需要用到大规模分布式图计算引擎来处理了。
+
+## 2. 
分布式图计算:Spark GraphX + +针对图的处理一般有图计算引擎和图数据库两大类,图数据库有Neo4j‌、TigerGraph‌ 等,图计算引擎有 Spark GraphX、Pregel 等。在本文我们主要讨论图计算引擎,以 Spark GraphX 为例,Spark GraphX 是 Apache Spark 的一个组件,专门用于图计算和图分析。GraphX 结合了 Spark 的强大数据处理能力与图计算的灵活性,扩展了 Spark 的核心功能,为用户提供了一个统一的 API,便于处理图数据。 + + + +那么在 Spark GraphX 上是如何处理图算法的呢?GraphX 通过引入一种点和边都附带属性的有向多图扩展了 Spark RDD 这种抽象数据结构,为用户提供了一个类似于 Pregel 计算模型的以点为中心的并行抽象。用户需要为 GraphX 提供原始图 graph、初始消息 initialMsg、核心计算逻辑 vprog、发送消息控制组件 sendMsg、合并消息组件 mergeMsg,计算开始时,GraphX 初始阶段会激活所有点进行初始化,然后按照用户提供的发送消息组件确定接下来向那些点发送消息。在之后的迭代里,只有收到消息的点才会被激活,进行接下来的计算,如此循环往复直到链路中没有被新激活的点或者到达最大迭代次数,最后输出计算结果。 + + + +```scala + def apply[VD: ClassTag, ED: ClassTag, A: ClassTag] + (graph: Graph[VD, ED], + initialMsg: A, + maxIterations: Int = Int.MaxValue, + activeDirection: EdgeDirection = EdgeDirection.Either) + (vprog: (VertexId, VD, A) => VD, + sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)], + mergeMsg: (A, A) => A) + : Graph[VD, ED] +{ + var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)) + + // compute the messages + var messages = GraphXUtils.mapReduceTriplets(g, sendMsg, mergeMsg) + + // Loop + var prevG: Graph[VD, ED] = null + var i = 0 + while (isActiveMessagesNonEmpty && i < maxIterations) { + // Receive the messages and update the vertices. + prevG = g + g = g.joinVertices(messages)(vprog) + graphCheckpointer.update(g) + + // Send new messages, skipping edges where neither side received + // a message. + messages = GraphXUtils.mapReduceTriplets( + g, sendMsg, mergeMsg, Some((oldMessages, activeDirection))) + } +} +``` + +总的来说,用户首先需要将存储介质中原始的表结构数据转换为 GraphX 中的点边数据类型,然后交给 Spark 进行处理,这是针对静态图进行离线处理。但是我们知道,现实世界中,图数据的规模和数据内节点之间的关系都不是一成不变的,并且在大数据时代其变化非常快。如何实时高效的处理不断变化的图数据(动态图),是一个值得深思的问题。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png) + +## 3. 
动态图计算:Spark Streaming
+
+针对动态图的处理,常见的解决方案是 Spark Streaming 框架,它可以从很多数据源消费数据并对数据进行处理。它是 Spark 核心 API 的一个扩展,可以实现高吞吐量的、具备容错机制的实时流数据的处理。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740470405961-05389aa3-1b67-4cdf-9c65-ea28641ef89c.png)
+
+如上图所示是 Spark Streaming 对实时数据进行处理的流程。首先 Spark 中的每个 Receiver 接收到实时消息流后,对实时消息进行解析和切分,之后将生成的图数据存储在每个 Executor 中。每当数据累积到一定的批次,就会触发一次全量计算,最后将计算出的结果输出给用户,这也称之为基于快照的图计算方案。
+
+
+
+但这种方案有一个比较大的缺点,就是它存在着重复计算的问题,假如我们需要以 1 小时一个窗口做一次计算,那么在使用 Spark 进行计算时,不仅要将当前窗口的数据计算进去,历史所有数据也需要进行回溯,存在大量重复计算,这样做效率不高,因此我们需要一套能够进行增量计算的图计算方案。
+
+
+
+## 4. 动态图增量计算:GeaFlow
+
+我们知道在传统的流计算引擎中,如 Flink,其处理模型允许系统能够处理不断流入的数据事件。处理每个事件时,Flink 可以评估变化并仅针对变化的部分执行计算。这意味着在增量计算过程中,Flink 会关注最新到达的数据,而不是整个数据集。于是受到 Flink 增量计算的启发,我们自研了增量图计算系统 GeaFlow(也叫流图计算引擎),能够很好地支持增量图迭代计算。
+
+
+
+那么 GeaFlow 是如何实现增量图计算的呢?首先,实时数据通过 connector 消息源输入到 GeaFlow 中,GeaFlow 依据实时数据,生成内部的点边结构数据,并且将点边数据插入进底图中。当前窗口的实时数据涉及到的点会被激活,触发图迭代计算。
+
+这里以 WCC 算法为例,对联通分量算法而言,在一个时间窗口内每条边对应的 src id 和 tar id 对应的顶点会被激活,第一次迭代需要将其 id 信息通知其邻居节点。如果邻居节点收到消息后,发现需要更新自己的信息,那么它需要继续将更新消息通知给它的邻居节点;如果邻居节点不需要更新自己的信息,那么它就不需要通知其邻居节点,它对应的迭代终止。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740471552771-36ee8f06-d58e-4cb7-914d-c44e151575a0.png)
+
+## 5. GeaFlow 架构简析
+
+GeaFlow 引擎主要由三大部分组成:DSL、Framework 和 State,同时向上为用户提供了 Stream API、静态图 API 和动态图 API。DSL 主要负责图查询语言 SQL+ISO/GQL 的解析和执行计划的优化,同时负责 schema 的推导,也向外部承接了多种 Connector,比如 hive、hudi、kafka、odps 等。Framework 层负责运行时的调度和容灾,shuffle 以及框架内各个组件的管理协调。State 层负责存储底层图数据和数据的持久化,同时也负责索引、下推等众多性能优化工作。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png)
+
+## 6. 
GeaFlow 性能测试 + +为了验证 GeaFlow 的增量图计算性能,我们设计了这样的实验。一批数据按照固定时间窗口实时输入到计算引擎中,我们分别用 Spark 和 GeaFlow 对全图做联通分量算法计算,比较两者计算耗时。实验在 3 台 24 核内存 128G 的机器上开展,使用的数据集是公开数据集[soc-Livejournal](https://snap.stanford.edu/data/soc-LiveJournal1.html),测试的图算法是弱联通分量算法。我们以 50w 条数据作为一个计算窗口,每输入到引擎中 50w 条数据,就触发一次图计算。 + +Spark 作为批处理引擎,对于每一批窗口来的数据,不管窗口规模是大是小,都需要对增量图数据连同历史图数据进行全量计算。在 Spark 上,可以直接调用 Spark GraphX 内部内置的 WCC 算法进行计算。 + +```scala +object SparkTest { + + def main(args: Array[String]): Unit = { + + val iter_num: Int = args(0).toInt + val parallel: Int = args(1).toInt + + val spark = SparkSession.builder.appName("HDFS Data Load").config("spark.default.parallelism", args(1)).getOrCreate + + val sc = new JavaSparkContext(spark.sparkContext) + val graph = GraphLoader.edgeListFile(sc, "hdfs://rayagsecurity-42-033147014062:9000/" + args(2), numEdgePartitions = parallel) + + val result = graph.connectedComponents(10) + handleResult(result) + print("finish") + + } + + def handleResult[VD, ED](graph: Graph[VD, ED]): Unit = { + graph.vertices.foreachPartition(_.foreach(tuple => { + + })) + } +} +``` + +GeaFlow 上支持 SQL+ISO/GQL 的图查询语言,我们使用图查询语言调用 GeaFlow 内置的增量联通分量图算法进行测试,图查询语言代码如下: + +```sql +CREATE TABLE IF NOT EXISTS tables ( + f1 bigint, + f2 bigint +) WITH ( + type='file', + geaflow.dsl.window.size='16000', + geaflow.dsl.column.separator='\t', + test.source.parallel = '32', + geaflow.dsl.file.path = 'hdfs://xxxx:9000/com-friendster.ungraph.txt' +); + +CREATE GRAPH modern ( + Vertex v1 ( + id int ID + ), + Edge e1 ( + srcId int SOURCE ID, + targetId int DESTINATION ID + ) +) WITH ( + storeType='memory', + shardCount = 256 +); + +INSERT INTO modern(v1.id, e1.srcId, e1.targetId) +( + SELECT f1, f1, f2 + FROM tables +); + +INSERT INTO modern(v1.id) +( + SELECT f2 + FROM tables +); + +CREATE TABLE IF NOT EXISTS tbl_result ( + vid bigint, + component bigint +) WITH ( + ignore='true', + type ='file' +); + +use GRAPH modern; + +INSERT INTO tbl_result +CALL inc_wcc(10) YIELD (vid, component) +RETURN vid, 
component
+;
+```
+
+下图是对两者进行联通分量算法实验时得到的实验结果。以 50w 条数据为一个窗口进行迭代计算,Spark 中存在大量的重复计算,因为其还要回溯全量的历史数据进行计算。而 GeaFlow 只会激活当前窗口中涉及到的点边进行增量计算,计算可在秒级别完成,每个窗口的计算时间基本稳定。随着数据量的不断增大,Spark 进行计算时所需要回溯的历史数据就越多,在其机器容量没有达到上限的情况下,其计算时延和数据量呈正相关分布。相同情况下 GeaFlow 的计算时间也会略微增大,但基本可以在秒级别完成。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/314644/1740537488877-eb89b886-7c4c-4c5a-8e27-06356b15afa0.png)
+
+## 7. 总结
+
+传统的图计算方案(如 Spark GraphX)在近实时场景中存在重复计算问题,受 Flink 流处理模型和传统图计算的启发,我们给出了一套能够支持增量图计算的方案。总的来说 GeaFlow 主要有以下几个方面的优势:
+
+1. GeaFlow 在处理增量实时计算时,性能优于 Spark Streaming + GraphX 方案,尤其是在大规模数据集上。
+2. GeaFlow 通过增量计算避免了全量数据的重复处理,计算效率更高、计算时间更短,且随着数据量增长,性能不会明显下降。
+3. GeaFlow 支持 SQL+GQL 混合处理语言,更适合开发复杂的图数据处理任务。
+
+GeaFlow 项目代码已全部开源,我们完成了部分流图引擎基础能力的构建,未来希望基于 GeaFlow 构建面向图数据的统一湖仓处理引擎,以解决多样化的大数据关联性分析诉求。同时我们也在积极筹备加入 Apache 基金会,丰富大数据开源生态,因此非常欢迎对图技术有浓厚兴趣的同学加入社区共建。
+
+社区中有诸多有趣的工作尚待完成,你可以从如下简单的「Good First Issue」开始,期待你加入同行。
+
+- 支持 Paimon Connector 插件,连接数据湖生态。([Issue 361](https://github.com/TuGraph-family/tugraph-analytics/issues/361))
+- 优化 GQL match 语句性能。([Issue 363](https://github.com/TuGraph-family/tugraph-analytics/issues/363))
+- 新增 ISO/GQL 语法,支持 same 谓词。([Issue 368](https://github.com/TuGraph-family/tugraph-analytics/issues/368))
+- ...
+
+## 参考链接
+
+1. GeaFlow 项目地址:[https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics)
+2. soc-Livejournal 数据集地址:[https://snap.stanford.edu/data/soc-LiveJournal1.html](https://snap.stanford.edu/data/soc-LiveJournal1.html)
+3. 
GeaFlow Issues:[https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues) diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/blog/28.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/28.md new file mode 100644 index 000000000..47d1d328a --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/28.md @@ -0,0 +1,166 @@ +--- +title: 流图计算之增量match原理与应用 +date: 2025-6-3 +--- + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/23857192/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png) + +## 问题背景 + +在流式计算中,数据往往不是全部一批到来,而会源源不断地进行输入和计算,在图计算/图查询领域,也存在类似的场景,图的点边不断地从数据源读取,进行构图,从而形成增量图。在增量图查询中,图随时发生着变化,在不同的图版本中,进行图查询的结果也会有所不同。对于某一次新增的点边,构成了一个新的版本的图,如果重新对全图(即当前所有点边)进行图遍历,开销较大,并且也会和历史数据有重复。由于历史的数据已经计算过一遍,理想情况下,只需要对增量所影响的部分进行计算/查询,而不需要对全图重新进行查询。 + + + +GQL(Graph Query Language)是国际标准化组织(ISO)为标准化图查询语言所制定的一个标准,用于在图上执行查询的语言。Geaflow 是蚂蚁图计算团队开源的流图计算引擎,专注于处理动态变化的图数据,支持大规模、高并发的实时图计算场景。本文将介绍在 Geaflow 引擎中,对增量图使用 GQL 进行增量 Match 的方法,目的尽可能地只对增量的数据进行查询,避免冗余的全量计算。 + +![画板](https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/23857192/1741574572676-ff7e2c56-14d0-470c-b21d-604f928c6ec9.jpeg) + +## 当前问题 + +Geaflow 引擎基于点中心框架(vertex center),通过迭代的方式,每一轮迭代中,每个点向其他点发送消息,并在下一轮收到消息时进行处理、分析。在 Geaflow 的框架中,GQL 的查询需要从前往后进行 Traversal 遍历走图,即从起始节点开始出发,进行扩散,依次进行点边匹配,直到匹配到所需要的查询 pattern。在动态图里场景,如果只使用当前批次新增的点边触发计算,增量的结果会有缺失,例如下面例子所示。 + +
+画板
+ +如上问题关键在于如果只考虑增量的部分,则点 A1 无法触发计算,但是点 A1 实际包含于增量结果中。所以需要设法让点 A1 参与计算,我们考虑一种从新增点扩充子图的方法,将 a 触发。将整个查询分为 2 个阶段,Evolve 扩展阶段和 Traversal 阶段。在 Evolve 阶段中,从起始点开始,向邻居发送 EvolveMessage,后续的 iteration 中,收到 EvolveMessage 的点加入到 EvolveVertices 集合中。而后的 Traversal 阶段则会使用 EvolveVertices 里的点触发遍历,即表示当前窗口的触发点。 + +## 方案步骤 + +整体流程示例图如下: + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/23857192/1741599519420-37fd1d9f-6623-44b3-87e4-5ac5275b876f.png) + +1. 首先得到 query 的计划的迭代次数 N,需向外扩充 N-1 度(maxEvolveIteration=N-1),即可覆盖当前 query。框架的最大迭代数将设置为 N + maxEvolveIteration(N>2) + +```sql +例如 +match(a)迭代数为1,此时不需要Evolve逻辑 +match(a)-[e]->(b)迭代数为2,此时不需要Evolve逻辑 +match(a)-[e]->(b)->[e2]->(c)迭代数为3 最大迭代数5 +``` + +2. 由于当迭代数较大时,扩充子图可能可能扩充到全图,设置一个阈值 T, 当 N<=T 才执行这个增量逻辑。 +3. 在每个 window 数据加入图中后,对于新增的点边,每个点会向邻居发送 EvolveVertexMessage,执行 N-1 次迭代,将 N-1 度子图扩充进来。即当前迭代小于 maxEvolveIteration(N-1)时,发送 EvolveVertexMessage。 +4. 每个点在向邻居点发送 EvolveMessage 时,需要将自己的 id 放在消息中,收到消息的点记录其发送点的 id, 添加到 targetIdList,在后续 traversal 阶段中使用。此步骤作用是下游节点将增量信息反向传递给上游,上游点在进行遍历时可以得知下游的增量影响部分,从而只遍历这些含有动态信息的下游点,而不需要再遍历所有邻居点。 + +反向扩展的主要逻辑在 GeaFlowDynamicVCTraversalFunction 中,GeaFlowDynamicVCTraversalFunction 继承自 IncVertexCentricFunction,在 Geaflow 中 IncVertexCentricFunction 是一个表示增量 VC 方法(点中心)的接口,在每次迭代中,都会对当前收到消息的点进行触发,执行 compute 方法中的逻辑。 + +```java +@Override +public void compute(Object vertexId, Iterator messageIterator) { + TraversalRuntimeContext context = commonFunction.getContext(); + if (needIncrTraversal()) { + long iterationId = context.getIterationId(); + // sendEvolveMessage to evolve subGraphs when iterationId is less than the plan iteration + if (iterationId < queryMaxIteration - 1) { + evolveIds.add(vertexId); + sendEvolveMessage(vertexId, context); + return; + } + + if (iterationId == queryMaxIteration - 1) { + // the current iteration is the end of evolve phase. 
+ evolveIds.add(vertexId); + return; + } + // traversal + commonFunction.compute(vertexId, messageIterator); + + } else { + commonFunction.compute(vertexId, messageIterator); + } +} +``` + +具体示例如下: + +![画板](https://intranetproxy.alipay.com/skylark/lark/0/2024/jpeg/23857192/1734590557540-5f3f4528-fa07-4208-8425-bc514ea5e06b.jpeg) + +总结进行 Evolve 扩展的条件: + +1. query 的迭代次数>2:当 match 小于两跳时不需要 Evolve。 +2. query 的迭代次数<=Threshold:如果迭代数太多可能扩展到全图。 +3. windowId>1:第一次构图不需要进行 Evolve 阶段。 +4. GQL 语句中没有起始点:如果有起始点,则只需使用起始点计算,不需要扩展子图,例如查询语句 Match(a:person where a.id = 1))return a.name。 + +## Demo 示例 + +在 Geaflow 中,通过设置点表或边表的 windowSize 来默认实现增量逻辑,即每一批读入 windowSize 大小的点边数据,来构建增量图。 + +```sql +CREATE GRAPH modern ( + Vertex person ( + id bigint ID, + name varchar, + age int +), +Edge knows ( + srcId bigint SOURCE ID, + targetId bigint DESTINATION ID, + weight double +), +) WITH ( + storeType='rocksdb', + shardCount = 1 +); + +CREATE TABLE modern_vertex ( + id varchar, + type varchar, + name varchar, + other varchar +) WITH ( + type='file', + geaflow.dsl.file.path = 'resource:///data/incr_modern_vertex.txt', + geaflow.dsl.window.size = 20 +); + +CREATE TABLE modern_edge ( + srcId bigint, + targetId bigint, + type varchar, + weight double +) WITH ( + type='file', +geaflow.dsl.file.path = 'resource:///data/incr_modern_edge.txt', +geaflow.dsl.window.size = 3 +); + +INSERT INTO modern.person + SELECT cast(id as bigint), name, cast(other as int) as age + FROM modern_vertex WHERE type = 'person' +; + + +INSERT INTO modern.knows + SELECT srcId, targetId, weight + FROM modern_edge WHERE type = 'knows' +; + +CREATE TABLE tbl_result ( + a_id BIGINT, + b_id BIGINT, + c_id BIGINT, + d_id BIGINT +) WITH ( + type='file', +geaflow.dsl.file.path='${target}' +); + +USE GRAPH modern; + +INSERT INTO tbl_result + SELECT + a_id, b_id, c_id,d_id + FROM ( + MATCH (a:person) -[e:knows]->(b:person)<-[e2:knows]-(c:person)<-[e3:knows]-(d:person) where a.id!=c.id + RETURN a.id as a_id,b.id as b_id,c.id as c_id , 
d.id as d_id + ) +; +``` + +在 Demo 中,设置点 windowSize 为 20,边 windowSize 为 3,即构图时每个 window 导入 20 个点,3 条边。并执行 3 跳的查询语句。**示例 Demo 在 IncrMatchTest.java 中, 可直接运行执行 Demo。** + +## 总结和展望 + +在动态图/流图的场景中,图的点边是在实时变化的,在进行图查询时,对于不同窗口数据的图,我们往往可以根据一些历史信息,只对增量的部分触发计算,来进行增量地计算,避免触发全图的遍历。Geaflow 使用了一种基于子图扩展的增量 match 方法,应用于点中心分布式图计算框架,在动态图场景下进行增量的查询,未来期望实现更多更复杂场景下的增量匹配逻辑。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/blog/29.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/29.md new file mode 100644 index 000000000..c311f5a62 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/29.md @@ -0,0 +1,488 @@ +--- +title: GeaFlow 时序能力探秘——让时间数据焕发新生! +date: 2025-6-25 +--- + +## 为什么时序能力如此重要? + +** +**    在当今数字化时代,数据已经成为驱动决策和创新的核心资源。然而,数据不仅仅是静态的数字或关系,它会随着时间不断变化。无论是股票市场的实时波动、社交网络中的动态互动,还是物联网设备的状态更新,时间维度都是理解这些数据的关键,例如: + +- 在金融领域,交易的时间顺序决定了资金流动的方向。 +- 在社交网络中,用户的互动行为(如点赞、评论)随时间演变。 +- 在物联网中,传感器采集的数据带有时间戳,反映了设备状态的变化。 + + + +    尽管数据的重要性毋庸置疑,但传统的图数据分析工具往往难以应对动态数据的挑战 + +- **静态分析的局限性** + +    静态分析只能捕捉某一时刻的数据快照,无法反映数据的变化趋势。例如,在监控设备状态时,静态分析可能忽略设备从正常到故障的渐变过程。**** + +- **处理效率低下** + +    传统工具在处理大规模时序数据时效率低下,甚至无法满足实时需求。例如,在金融风控场景中,延迟可能导致错过关键的风险信号。 + +- **缺乏灵活性** + +    很多工具只支持单一类型的数据分析,无法同时处理实时流数据和历史数据。 + +    为了解决上述问题,GeaFlow 创新性地提出了时序图计算的概念。作为一款专为动态图数据处理设计的分布式流图计算引擎,GeaFlow 能够高效应对动态数据带来的挑战。针对实时变化的图结构,用户可以轻松进行图遍历、图匹配和图计算等操作,从而满足复杂场景下的分析需求。通过结合时间维度与动态图处理能力,GeaFlow 为实时数据分析提供了全新的解决方案,帮助用户更精准地挖掘动态数据中的价值。 + +## 什么是 GeaFlow? + +GeaFlow 是一个强大的分布式计算平台,结合了图计算和流处理的优势,能够高效处理动态图和时序数据。它不仅支持复杂的图算法,还具备实时分析能力,适用于各种动态场景。其主要特点包括: + +- 分布式架构 + +GeaFlow 基于分布式计算框架,能够高效处理超大规模的动态图数据(例如数十亿节点和边)。通过分区和副本机制,GeaFlow 确保了系统的高可用性和可扩展性。 + +- 流图与时序图的无缝集成 + +流图提供了动态数据的实时更新能力,而时序图则引入了时间维度的精确记录能力。两者的结合使得 GeaFlow 能够同时支持实时分析和历史追溯。 + +- 灵活的时间窗口机制 + +GeaFlow 支持基于时间窗口的动态分析,用户可以根据需求设置滑动窗口或固定窗口,分析特定时间段内的数据变化趋势。 + +## 流图与时序图的关系? + +### **1. 
流图(Stream Graph)** + +流图是一种特殊的图结构,用于表示动态数据的演化过程。其核心特性包括: + +- **动态更新机制** + +流图支持节点和边的动态增删改操作,能够实时反映数据的变化。例如,在金融交易网络中,资金流动会生成新的边,而交易完成后某些边可能会消失。**** + +- **事件驱动模型** + +流图采用事件驱动模型,每条数据(节点或边)都被视为一个事件。通过事件驱动的方式,流图能够高效捕捉数据的变化。 + +- **增量计算** + +为了提高计算效率,流图采用了增量计算策略。即每次只计算新增或修改的部分,而不是重新计算整个图结构。例如,在社交网络中,当用户建立新的好友关系时,GeaFlow 只需更新相关部分,而无需重新计算整个网络。 + +### **2. 时序图(Temporal Graph)** + +时序图是一种带时间属性的图结构,每条边或节点都带有时间戳,用于记录事件发生的时间。其核心特性包括: + +- **时间戳管理** + +每条数据(节点或边)都分配一个时间戳,确保所有操作都能精确记录时间信息。例如,在社交网络中,好友关系的建立时间可以用一条带时间戳的边表示。**** + +- **时间窗口分析** + +时序图支持基于时间窗口的分析功能。例如,用户可以设置一个滑动窗口(如最近 5 分钟),并分析窗口内的数据变化趋势。**** + +- **历史追溯能力** + +时序图保留了历史数据的时间戳信息,支持回溯历史数据。例如,在金融风控场景中,用户可以通过时序图分析过去一段时间内的异常交易行为。 + +### **3. 流图与时序图的关系** + +流图和时序图并不是相互独立的概念,而是相辅相成的: + +- **流图是时序图的基础** + +流图提供了动态数据的实时更新能力,而时序图则在此基础上增加了时间维度的记录能力。换句话说,流图关注的是数据的实时变化,而时序图关注的是这些变化的时间属性。 + +- **时序图增强了流图的分析能力** + +通过引入时间戳,时序图使得流图能够进行更复杂的分析,例如时间窗口分析、趋势预测等。 + +### **4. GeaFlow 的实现细节** + +GeaFlow 通过以下技术手段实现了流图与时序图的无缝结合: + +- **时间戳分配机制** + +GeaFlow 为每条数据(节点或边)分配具体时间戳, 具体分为两种:处理时间和事件时间,确保所有数据都能精确记录时间信息。**** + +- **动态更新与历史保留** + +GeaFlow 支持实时更新流图结构,同时保留历史数据的时间戳信息,方便后续分析。例如,在金融交易网络中,GeaFlow 会记录每笔交易的时间戳,并将其存储在分布式存储系统中。**** + +- **时间窗口优化** + +GeaFlow 采用高效的索引机制和缓存策略,优化时间窗口分析的性能。例如,通过滑动窗口索引,GeaFlow 能够快速定位特定时间段内的数据。 + +## 示例 + +随着社交媒体平台的快速发展,用户之间的互动和关系链变得越来越复杂。为了更好地理解用户行为、优化推荐系统以及识别潜在的风险(如虚假账号或恶意传播),我们需要对用户之间的动态关系进行实时分析。 + +假设某社交平台希望实现一个功能:实时追踪用户的“间接好友关系”,即分析用户 A 是否通过某个共同好友 B 认识了另一个用户 C,并确保这种认识关系的时间顺序是合理的(A 先认识 B,B 再认识 C)。这一功能可以帮助平台发现潜在的社交圈层,优化好友推荐算法,同时为风险控制提供数据支持。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/220029/1749448299226-d23a5d01-5a5c-4cbb-bd99-f1e476f808be.png) + + +具体需求 + +**1、实时性要求** + +用户的行为(如添加好友)是动态变化的,需要实时捕获并更新用户关系图。 + +**2、时间敏感性** + +好友关系的建立是有时间顺序的,例如用户 A 在 10:00 添加了用户 B 为好友,而用户 B 在 10:05 添加了用户 C 为好友。只有在这种情况下,我们才能认为 A 通过 B 间接认识了 C。 + +**3、高效查询** + +平台需要快速查询出所有符合条件的三元关系(A -> B -> C),并将结果存储到文件系统中,供后续分析或可视化使用。 + +**4、扩展性** + +系统需要能够处理大规模用户数据,并支持未来的扩展需求,例如引入更多维度的关系权重(如亲密度、互动频率等)。 + +下面是完整的 DSL 示例: + 
+```plain
+CREATE TABLE vertex_source (
+  id long,
+  name varchar,
+  age int
+) WITH (
+  type='kafka',
+  geaflow.dsl.kafka.servers ='localhost:9092',
+  geaflow.dsl.kafka.topic ='vertex_source',
+  geaflow.dsl.kafka.data.operation.timeout.seconds =5,
+  geaflow.dsl.time.window.size=10,
+  geaflow.dsl.start.time='${startTime}'
+);
+
+CREATE TABLE edge_source (
+  src_id long,
+  tar_id long,
+  weight double,
+  ts long -- knowing time
+) WITH (
+  type='kafka',
+  geaflow.dsl.kafka.servers ='localhost:9092',
+  geaflow.dsl.kafka.topic ='edge_source',
+  geaflow.dsl.kafka.data.operation.timeout.seconds =5,
+  geaflow.dsl.time.window.size=10,
+  geaflow.dsl.start.time='${startTime}'
+);
+
+CREATE GRAPH community (
+  Vertex person (
+    id bigint ID,
+    name varchar,
+    age int
+  ),
+  Edge knows (
+    src_id bigint SOURCE ID,
+    tar_id bigint DESTINATION ID,
+    weight double,
+    ts long TIMESTAMP -- 定义时间戳字段
+  )
+) WITH (
+  storeType='rocksdb'
+);
+
+INSERT INTO community.person
+SELECT id, name, age
+FROM vertex_source;
+
+INSERT INTO community.knows
+SELECT src_id, tar_id, weight, ts
+FROM edge_source;
+
+CREATE TABLE tbl_result (
+  a_id long,
+  e1_ts long,
+  b_id long,
+  e2_ts long,
+  c_id long
+) WITH (
+  type='file',
+  geaflow.dsl.file.path='${target}'
+);
+
+USE GRAPH community;
+
+INSERT INTO tbl_result
+SELECT
+  a_id,
+  e1_ts,
+  b_id,
+  e2_ts,
+  c_id
+FROM (
+MATCH (a:person)-[e1:knows]->(b:person)-[e2:knows]->(c:person)
+where e2.ts > e1.ts
+RETURN a.id as a_id, e1.ts as e1_ts, b.id as b_id, e2.ts as e2_ts, c.id as c_id
+);
+```
+
+
+
+上述 DSL(Domain-Specific Language)代码定义了一个基于图计算的流处理任务,主要目的是通过 Kafka 实时接收用户节点和关系边的数据流,构建一个动态社区图(`community`),并分析其中的时间敏感关系(如“谁先认识谁”)。最终结果将输出到文件系统中,用于进一步分析或可视化。
+
+以下是对每个部分的详细解释:
+
+
+
+### **1. 
点源表定义**
+
+```plain
+CREATE TABLE vertex_source (
+  id long,
+  name varchar,
+  age int
+) WITH (
+  type='kafka',
+  geaflow.dsl.kafka.servers ='localhost:9092',
+  geaflow.dsl.kafka.topic ='vertex_source',
+  geaflow.dsl.kafka.data.operation.timeout.seconds =5,
+  geaflow.dsl.time.window.size=10,
+  geaflow.dsl.start.time='${startTime}'
+);
+
+```
+
+- **功能:**
+
+  - 定义了一个名为 vertex_source 的表,表示点数据的来源。
+  - 数据通过 Kafka 消费,主题为 vertex_source;每条记录包含三个字段:id(节点唯一标识符)、name(节点名称)、age(节点年龄)。
+
+- **时间窗口:**
+  - 使用了滑动窗口机制,窗口大小为 10 秒(geaflow.dsl.time.window.size=10)。
+  - 数据流按时间窗口分批处理,窗口内的数据会被用于后续的图构建和计算。
+- **启动时间:** `${startTime}` 是一个占位符,表示流处理任务的起始时间。
+
+### **2. 边源表定义**
+
+```plain
+CREATE TABLE edge_source (
+  src_id long,
+  tar_id long,
+  weight double,
+  ts long
+) WITH (
+  type='kafka',
+  geaflow.dsl.kafka.servers = 'localhost:9092',
+  geaflow.dsl.kafka.topic = 'edge_source',
+  geaflow.dsl.kafka.data.operation.timeout.seconds = 5,
+  geaflow.dsl.time.window.size=10, -- 滑动窗口大小
+  geaflow.dsl.start.time='${startTime}'
+);
+```
+
+- **功能:**
+  - 定义了一个名为 edge_source 的表,表示边数据的来源。
+  - 数据通过 Kafka 消费,主题为 edge_source。
+  - 每条记录包含四个字段:
+    src_id、tar_id:分别表示边的起点和终点;weight:边的权重;ts:边的时间戳,表示关系建立的时间。
+- **时间窗口:**
+  - 同样使用 10 秒的滑动窗口机制。
+
+### **3. 图 Schema 定义**
+
+```plain
+CREATE GRAPH community (
+  Vertex person (
+    id bigint ID,
+    name varchar,
+    age int
+  ),
+  Edge knows (
+    src_id bigint SOURCE ID,
+    tar_id bigint DESTINATION ID,
+    weight double,
+    ts long TIMESTAMP -- 定义时间戳字段
+  )
+) WITH (
+  storeType='rocksdb'
+);
+```
+
+- **功能:**
+  - 定义了一个名为 community 的图结构。
+  - 图包含两种元素:
+    1. **点类型**:person
+       - 每个点有三个属性:id(唯一标识符)、name(名称)、age(年龄)。
+    2. **边类型**:knows
+       - 每条边有四个属性:src_id、tar_id:分别表示边的起点和终点;weight:边的权重;ts:边的时间戳,标记关系建立的时间。
+- **存储方式:**
+  - 图数据存储在 RocksDB 中(storeType='rocksdb')。
+
+### **4. 插入点数据到图**
+
+```plain
+
+INSERT INTO community.person
+SELECT id, name, age
+FROM vertex_source;
+```
+
+- **功能:**
+  - 将 vertex_source 表中的点数据插入到图 community 的 person 点集合中。
+  - 每条记录对应一个 person 节点。
+
+### **5. 
插入边数据到图**
+
+```plain
+
+INSERT INTO community.knows
+SELECT src_id, tar_id, weight, ts
+FROM edge_source;
+```
+
+- **功能:**
+  - 将 edge_source 表中的边数据插入到图 community 的 knows 边集合中。
+  - 每条记录对应一条 knows 边。
+
+### **6. 结果表定义**
+
+```plain
+CREATE TABLE tbl_result (
+  a_id long,
+  e1_ts long,
+  b_id long,
+  e2_ts long,
+  c_id long
+) WITH (
+  type='file',
+  geaflow.dsl.file.path='${target}'
+);
+```
+
+- **功能:**
+  - 定义了一个名为 tbl_result 的结果表,用于存储最终的查询结果。
+  - 结果表包含五个字段:a_id:路径起点节点的 ID;e1_ts:第一条边的时间戳;b_id:路径中间节点的 ID;e2_ts:第二条边的时间戳;c_id:路径终点节点的 ID。
+  - **存储方式:**
+    - 结果会写入文件系统,路径由 ${target} 指定。
+
+### **7. 图查询与结果插入**
+
+```plain
+USE GRAPH community;
+
+INSERT INTO tbl_result
+SELECT
+  a_id,
+  e1_ts,
+  b_id,
+  e2_ts,
+  c_id
+FROM (
+  MATCH (a:person) -[e1:knows]->(b:person) -[e2:knows]->(c:person)
+  WHERE e2.ts > e1.ts
+  RETURN a.id as a_id, e1.ts as e1_ts, b.id as b_id, e2.ts as e2_ts, c.id as c_id
+);
+```
+
+
+
+- **功能:**
+  - 在图 community 上执行一个图查询。
+  - 查询的目标是找到所有满足以下条件的三元组 (a, b, c):
+    1. 存在一条路径 a -> b -> c,其中每条边的类型都是 knows。
+    2. 第二条边 e2 的时间戳晚于第一条边 e1 的时间戳(e2.ts > e1.ts)。
+  - 返回的结果包括:
+    - 起点节点 a 的 ID。
+    - 第一条边 e1 的时间戳。
+    - 中间节点 b 的 ID。
+    - 第二条边 e2 的时间戳。
+    - 终点节点 c 的 ID。
+- **结果存储:**
+  - 查询结果被插入到 tbl_result 表中,并最终写入文件系统。
+
+### **8. 运行示例**
+
+假设社交平台中有以下用户和好友关系:
+
+- **用户信息:**
+
+```plain
+{id: 1, name: "Alice", age: 25}
+{id: 2, name: "Bob", age: 30}
+{id: 3, name: "Charlie", age: 28}
+```
+
+- **好友关系:**
+
+```plain
+{src_id: 1, tar_id: 2, weight: 0.8, ts: 1672531200} -- Alice 在 10:00 添加 Bob 为好友
+{src_id: 2, tar_id: 3, weight: 0.9, ts: 1672531500} -- Bob 在 10:05(300 秒后)添加 Charlie 为好友
+```
+
+运行上述作业后,系统会输出以下结果:
+
+```plain
+a_id | e1_ts | b_id | e2_ts | c_id
+1 | 1672531200 | 2 | 1672531500 | 3
+```
+
+这表明 Alice 通过 Bob 间接认识了 Charlie,且两段好友关系的建立顺序符合时间先后要求。
+
+### **9. 业务价值**
+
+1. **优化好友推荐**
+   通过分析间接好友关系,平台可以向用户推荐更有可能成为好友的潜在对象。例如,Alice 可能会对 Charlie 感兴趣,因为他们有一个共同好友 Bob。
+
+2. **识别社交圈层**
+   通过挖掘三元关系,平台可以识别出紧密联系的社交圈层,从而为广告投放、活动推广等提供精准的目标群体。
+
+3. 
**风险控制** + 如果某些用户频繁出现在异常的三元关系中(例如短时间内大量新增好友),可能暗示存在虚假账号或恶意传播行为,平台可以及时采取措施。 +4. **用户体验提升** + 实时分析用户关系链,帮助平台更好地理解用户行为,从而提供更加个性化的服务。**** + +### **10. 技术优势** + +- **实时性**GeaFlow 支持毫秒级的数据流处理,确保用户关系图始终是最新的。 +- **时间敏感性:**通过时间戳字段,精确管理好友关系的时间顺序。 +- **灵活性:**SQL 驱动的开发模式,降低了开发门槛,提升了开发效率。 +- **可拓展性:**支持大规模动态图的增量计算,能够轻松应对社交平台的海量用户数据。 + +## GeaFlow 时序能力的核心亮点 + +### **1. 时间感知的数据处理** + +每条数据都带有时间戳,能够精确记录事件发生的时间。GeaFlow 支持基于时间窗口的分析,例如: + +- **最近 5 分钟的趋势变化** + 用户可以通过设置时间窗口,分析最近 5 分钟内的数据变化趋势。例如,在社交网络中,分析用户互动的频率变化。**** + +- **过去一天的动态模式** + GeaFlow 支持长时间跨度的分析,帮助用户发现长期趋势。例如,在电商推荐系统中,分析用户在过去一天内的购买行为。 + +### **2. 动态图与时序结合** + +GeaFlow 将图结构与时间维度结合,能够捕捉图中关系的演变。例如: + +- **社交网络中好友关系的变化** + +在社交网络中,用户的好友关系可能会随着时间发生变化。GeaFlow 可以动态更新图结构,捕捉这些变化。**** - **金融交易网络中的资金流动** +在金融交易网络中,资金流动是一个动态过程。GeaFlow 可以实时追踪资金流动路径,并识别潜在的风险点。 + +### **3. 实时与历史数据的无缝融合** + +GeaFlow 不仅支持实时流数据的处理,还能结合历史数据进行对比分析。这种能力特别适合需要长期趋势分析和短期实时监控的场景。例如: + +- **物联网设备监控** + +在物联网场景中,GeaFlow 可以实时监控设备状态,同时结合历史数据,预测设备可能出现的故障。**** - **金融风控** +在金融风控场景中,GeaFlow 可以实时监控交易网络,同时结合历史数据,识别异常行为或潜在风险。 + +### **4. 丰富的内置算法** + +GeaFlow 提供针对时序数据优化的算法,例如: + +- 最短路径 +- 弱联通分量 +- k-hop 算法 + +用户无需从零开发,直接调用即可完成复杂分析。 + +## 结语:开启你的时序数据分析之旅 + +数据的动态变化蕴藏着无限价值,而 GeaFlow 的时序能力正是解锁这一价值的钥匙。无论您是数据分析新手,还是希望提升动态数据处理能力的专业人士,GeaFlow 都将为您提供强大的支持。 + +立即下载 GeaFlow,亲身体验其时序能力的强大之处吧!让我们一起探索时间数据的无限可能! 
+ +## 术语**** + +**DSL: **Domain-Specific Language。融合 DSL 是 GeaFlow 提供的图表一体的数据分析语言,支持标准 SQL+ISO/GQL 进行图表分析.通过融合 DSL 可以对表数据做关系运算处理,也可以对图数据做图匹配和图算法计算,同时也支持同时图表数据的联合处理。 diff --git a/blog/3.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/3.md similarity index 100% rename from blog/3.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/3.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/blog/30.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/30.md new file mode 100644 index 000000000..8c74cbba1 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/30.md @@ -0,0 +1,119 @@ +--- +title: Join性能变革:图数仓让SQL分析快人一步 +date: 2025-5-15 +--- + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png) + +> 作者:林力韬 + +## 一、引言:传统数仓分析的困境与破局之道 + +### 1. 场景化问题:当数据关联成为业务之痛 + +- **金融反欺诈场景**:在反欺诈分析中,复杂的多层资金链条挖掘往往依赖多表 JOIN 操作,进行复杂多跳的追踪。分析师团队耗费数天编写 SQL 脚本,最终查询耗时可达小时级别——而此时资金已完成洗白转移。这揭示出传统数仓的深层矛盾:**关系型范式与真实世界网状业务逻辑的错位**,常面临查询耗时高、查询逻辑复杂等挑战。 +- **营销分析场景**:在分析营销业务关系时,试图通过用户社交关系链挖掘潜在 VIP 客户,往往要用到专业的数分技能。尽管当下借助诸如 DeepInsight AI Copilot 等工具,可以通过大模型快速生成至少能打 80 分的维度和度量,集成到自助分析面板。但通常这些分析都涉及深层次的用户关联,**在 SQL 中直观表达性能较差**。 + + + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/jpeg/67556465/1741674750798-f519cba9-d8ae-47d4-aec0-97c2ef31a759.jpeg) + +**图 1 SQL Join 与 GQL 图 hop 查询性能差异示例** + +### 2. 数据枷锁 + +**效率枷锁**:当关联层级超过 3 跳,传统 JOIN 操作的时间复杂度呈指数级增长,以多表 JOIN 为核心的分析模式逐渐失去优势,成为效率的"枷锁"。 + +**表达力枷锁**:传统 SQL 不仅需要编写复杂的表达式,更面临关系模型难以直观表达的图拓扑结构。 + +**创新枷锁**:业务分析师因需要学习 GQL(图查询语言)而放弃采用图技术栈。工具链的割裂导致图分析能力始终停留在技术部门,难以赋能业务前线。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png?x-oss-process=image/format,png) + +**图 2 Join 与 GQL 表达示例** + +### 3. 
破局之道:图数据仓库的核心价值
+
+#### (1) 降低认知成本
+
+用户无需感知图数据库的专业知识,通过 SQL 操作就能实现复杂的图关联分析,底层嫁接到图引擎底座。
+
+#### (2) 加速数据价值升维释放
+
+在支持传统 SQL 分析基础上,图数据仓库通过内置的算法仓库,将 PageRank、Louvain 等图算法封装为可解释的业务指标,支持分析隐藏的复杂模式(例如资金流的闭环路径识别)。同时,关联关系能够即时以图结构可视化呈现,摆脱传统数仓中基于表关联的抽象性,扩大了系统分析能力边界。
+
+#### (3) 突破性能瓶颈
+
+多表 JOIN 查询转为图路径检索,利用图引擎关联性分析优势,性能可从分钟级跃升至秒级,单点分析进入毫秒级。支持动态图数据的实时更新,与传统批量处理模式(T+1)的滞后性形成鲜明对比。
+
+## 二、技术解析:图数仓的核心技术革命
+
+### 1. Schema 转换器(ER → Graph)
+
+对于大多数非专业用户而言,由于图领域知识缺乏、不熟悉图建模的思维方式等原因,导致利用图计算系统解决业务问题、分析需求存在较大挑战。在业务推广中,我们发现将表的 ER 模型描述自动转化为图模型,提供给用户一个初始的图,有助于用户快速上手。
+
+图数仓 Schema 转换器自动将传统数据仓库中的 ER 模型(实体-关系模型)转换为图数据库的节点与边结构,支持对物理表、视图表、维度表进行统一建模。在原理上,图的实体可以理解为关系表选定一组列序列作为 ID 生成的 KV 表。在 ER 图解析时,具有等值关系的列可以视为同一个等价列,并将等值关系传递到不同表的等价列上。
+
+从而,可以将模型转换算法总结为三阶段:
+
+**第一阶段,语义分析**。重点在于选取实体多列序列作为 ID 组成,识别表的实体/关系语义,发现跨表等价列(具有等值关系的列),融合支持表达式列处理。需要在所有可能的解法中,选取存储性能、计算性能与可解释性综合评分最优的解法,作为构图的基础。
+
+**第二阶段,结构化转换**。重点在于生成点/边实体,合并点实体,必要时通过冗余边生成平衡数据冗余与查询性能。自动创建虚拟点完成关系绑定,配置边的起始端点。
+
+**第三阶段,组装成图**。即将所有点合并在一起,绑定在起始点上的边自然合并,对端点可选地进行绑定。对两个有差异的转图方案,可以计算差异向量,即所有表映射到实体的变化情况。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683049495-d75ae87b-9510-40ee-b22f-c6140570b1f1.png)
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png)
+
+**图 3 ER 图转图 Schema 示例组图**
+
+通过算法自动分析多表之间的关联关系并自动构建图的点边,可以为数据从原始存储位置迁移至图数仓提供依据,同时显著减少人工数据建模、人工编写数据导入 DSL 的工作量,无需人工介入即可使传统数仓数据快速迁移到图数据仓库中,立即开始分析。
+
+### 2. 
数据通道:物化数据交互能力
+
+类似于传统数据仓库,图数仓基于 GeaFlow 引擎能力与 TuMaker 成熟的业务平台提供数据任务编排能力,即将多个数据处理任务(如数据抽取、转换、加载等)按照一定的逻辑顺序组织起来,自动执行的过程。提供可视化界面、任务调度机制、监听事件触发、错误处理、监控与日志、版本控制与回滚、智能调度集群资源等关键能力。
+
+在 Schema 转换器的加持下,可以得到从表存储到图存储的物化方案,它构建了连接传统数仓与图数仓的数据通道。基于表转图的物化方案,可以根据业务实际配置的加速表、加速关系、字段、权限等信息,全自动生成数据同步的任务编排,再通过图数仓平台调度,实现数据迁移全程无感,后续实时更新与增量同步,同步延迟可达十分钟级别。
+
+数据通道能力面向主流大数据生态系统,可深度集成 ODPS/Hive/Paimon 等基础设施,通过三层架构实现全生命周期数据管理:在数据接入层,自动捕获表的变化,产出物化方案,同步表-图实体映射的增量部分,当前可管理 10TB 级别图数据;在转换引擎层,全自动化生成导数的 DSL 任务编排,调度到集群执行;在存储优化层,支持 CStore/GraphDB/RocksDB 等自研或开源图存储解决方案,实践中已经过万亿级超大业务图的检验。此外,查询热数据预加载可根据图的实际使用情况,在 TB 级数据规模下仍能维持秒级查询响应,真正实现从表数仓到图数仓的全栈切换,SQL 之下全为图。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741684625347-a229239e-fd58-4d42-adc9-f272e3f13fdf.png)
+
+**图 4 开源技术架构一张大图**
+
+### 3. SQL-GQL 翻译引擎
+
+在传统关系型数据库中,多层表关联查询往往需要编写复杂的 JOIN 语句,不仅开发效率低下,性能也难以满足海量关联数据的即席分析需求。针对这一痛点,我们通过创新的 SQL-GQL 翻译引擎,让用户无需学习图查询语言(GQL)即可将 SQL 中复杂的 JOIN 语句自动转换为图路径查询,消除用户对图领域复杂性的感知,同时利用图引擎优化执行性能。
+
+与 SQL 基于关系模型的二维表操作不同,GQL 的查询结构和语义贴合图数据的特性,尤其在查询逻辑的线性化和嵌套处理上存在显著差异。将 SQL 查询转换为 GQL(图查询语言)是一项涉及语法结构映射、数据模型映射和执行逻辑重构的复杂任务。其核心挑战在于如何将基于关系模型的集合操作转化为基于图模型的线性路径遍历,同时规避嵌套查询、不合理图计算顺序的代价。
+
+对比传统 SQL 查询,可能需通过 3 层表关联分析用户关联关系,响应时间在分钟级别。而图路径查询直接通过图的遍历语句实现,响应时间缩短至秒级。目前该引擎已在短视频分析、会员用增、客权服务等典型业务场景得到验证,未来将持续扩展对复杂子查询、复杂表达式运算的支持,让更多开发者无需跨越技术鸿沟即可解锁图计算的强大能力。
+
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/67556465/1741683921355-149d0fea-7a3f-4fb8-ad36-f4b4c8541113.png)
+
+**图 5 SQL 抽象语法树 AST 翻译为 GQL 结构的差异示例**
+
+## 三、技术优势与应用场景
+
+### 3.1 效率提升的底层逻辑
+
+在关联分析场景中,图数据仓库的突破性性能源于两大核心技术革新。
+
+首先,图存储模型通过物理结构的优化彻底改变了数据组织方式。传统关系型数据库将关联信息分散存储在外键表中,执行多表 JOIN 时需频繁进行基于索引的寻址和数据重组。而图模型采用连接键原生聚合存储机制,将实体属性与其关联关系作为"节点-边"结构进行物理邻接存储,配合缓存预加载技术,使得关联关系的遍历检索复杂度从 O(n²)降低至 O(n),特定键的处理复杂度从 O(n)降低至 O(1)。
+
+其次,图遍历算法构建了全新的查询范式。相较于关系型数据库基于集合的批处理模式,图引擎采用深度优先、广度优先等路径遍历算法,结合查询条件动态剪枝规避无效分支遍历。这种机制使得多层以上的链路追踪响应时间稳定在秒级,而传统 SQL 方案在大表的 3 层关联时往往已出现分钟级延迟。更关键的是,图遍历支持实时增量计算,当表新增记录时,展现出卓越的扩展能力。
+
+### 3.2 用户价值主张
+
+作为新一代数据基础设施,图数据仓库开创了"一图多用"的全新范式。用户既可通过熟悉的 SQL 接口进行常规分析,通过底层引擎嫁接的形式融入现有的基础设施。也可在需要深度挖掘时切换至 GQL、Gremlin 等专业图查询语言。这种双模兼容特性在同一套数据资产支撑不同类型的分析需求时尤为突出。 + +在算法支持层面,系统预置的图计算引擎突破传统数仓的局限,同时面向开源生态开放自定义图算法开发接口。例如传统 PageRank 算法可识别社交网络影响力节点,应用于精准营销场景;弱连接分析(WCC)帮助在亿级交易数据中发现异常社群;通过标准化 API 开放,用户既无需关注分布式计算细节,也无需关注数据构图流程,即可完成万亿边规模的数据挖掘。 + +相较于传统数仓,图数仓在三个维度实现代际跨越:性能层面,关联查询效率提升 1-2 个数量级;易用性层面,通过 SQL-GQL 自动转换消除图领域学习成本;分析深度层面,支持算法分析和隐性关系挖掘。 + +## 四、未来展望 + +作为下一代数据基础设施的核心载体,我们计划逐步将图存储引擎、图计算框架引擎、SQL-GQL 翻译模块等核心能力开源,构建开发者共创的技术生态。2023 年已率先开源流图计算引擎 GeaFlow,2025 年 Q3 将继续开放图模型数据分析标准化平台,高性能的图计算引擎,支持社区开发者开发异构数据源连接器。这种开放协作模式不仅加速技术迭代,更推动产品成为 ISO/IEC 39075 GQL 国际标准的最佳实践平台,助力 SQL-GQL 混合查询渐成行业规范。 + +技术演进层面,下一代引擎将突破动态流图计算瓶颈,实现万亿边规模数据的增量更新。通过融合向量化计算引擎,可同时处理属性图与向量图的联合查询,满足 AIGC 时代的多模态分析需求,并支持自然语言直接生成图查询语句的颠覆性体验。行业应用前景正呈现爆发态势,未来图数据仓库将承载多数企业关联数据分析负载,成为智能决策的核心引擎。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/blog/31.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/31.md new file mode 100644 index 000000000..2d097d46b --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/31.md @@ -0,0 +1,217 @@ +--- +title: Graph4Stream:基于图的流计算加速 +date: 2025-3-25 +--- + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png) + +> 作者:坤羽;审校:东朔。 + +之前在「姊妹篇」[《Stream4Graph:动态图上的增量计算》](https://zhuanlan.zhihu.com/p/27618053733)中,向大家介绍了在图计算技术中引入增量计算能力「图+流」,GeaFlow 流图计算相比 Spark GraphX 取得了显著的性能提升。那么在流计算技术中引入图计算能力「流+图」,GeaFlow 流图计算相比 Flink 关联计算性能如何呢? 
+ +当今时代,数据正以前所未有的速度和规模产生,对海量数据进行实时处理在异常检测、搜索推荐、金融交易等各个领域都有着广泛的应用。流计算作为最主要的实时数据处理技术也变得越来越重要。 + + + + + +与批处理需要等待数据全部到齐才进行计算不同,流计算将持续生成的数据流划分成微批,对每个微批的数据进行增量计算。这种计算方式使得流计算具有高吞吐、低延迟的特性。常见的流计算引擎包括 Flink、Spark Streaming 等,它们都采用表的方式处理流中的数据。随着流计算应用的深入,越来越多的计算场景涉及大数据之间关联关系的计算,此时基于表的流计算引擎性能会大幅下降。 + + + +蚂蚁图计算团队开源的流图计算引擎 GeaFlow,将图计算与流计算相结合,提供了高效的流图处理框架,大幅提升了计算性能。下面为大家介绍传统流计算引擎在关联关系计算上的局限性、GeaFlow 流图计算高效的原理,以及两者的性能对比。 + + + +## 流计算引擎:Flink + +Flink 是经典的基于表的流处理引擎,它将输入的数据流切分成微批,每次计算当前批次的数据。在计算过程中,Flink 将计算任务翻译成由 map、filter、join 等基础算子组成的有向图,每个算子都有各自的上游输入和下游输出。增量数据经过所有算子的计算后输出当前批次的结果。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741541936849-f9e0ae71-d25d-4789-b9c6-ed0380140f2a.png) + +Flink 增量计算 + + + +我们以 k-Hop 算法为例,描述 Flink 的计算过程。k-Hop 是指 k 跳关系,例如在社交网络中 k-Hop 指的是可以通过 k-1 个中间人相互认识的关系链,在交易分析中指资金的 k 次连续转移的路径。以 2 跳关系为例,输入数据的格式为 src dst,代表两两之间的关系。Flink 的计算 SQL 如下所示: + +```sql +-- create source table +CREATE TABLE edge ( + src int, + dst int +) WITH ( +); + +CREATE VIEW `v_view` (`vid`) AS +SELECT distinct * from +( +SELECT `src` FROM `edge` +UNION ALL +SELECT `dst` FROM `edge` +); + +CREATE VIEW `e_view` (`src`, `dst`) AS +SELECT `src`, `dst` FROM `edge`; + +CREATE VIEW `join1_edge`(`id1`, `dst`) AS SELECT `v`.`vid`, `e`.`dst` +FROM `v_view` AS `v` INNER JOIN `e_view` AS `e` +ON `v`.`vid` = `e`.`src`; + +CREATE VIEW `join1`(`id1`, `id2`) AS SELECT `e`.`id1`, `v`.`vid` +FROM `join1_edge` AS `e` INNER JOIN `v_view` AS `v` +ON `e`.`dst` = `v`.`vid`; + +CREATE VIEW `join2_edge`(`id1`, `id2`, `dst`) AS SELECT `v`.`id1`, `v`.`id2`, `e`.`dst` +FROM `join1` AS `v` INNER JOIN `e_view` AS `e` +ON `v`.`id2` = `e`.`src`; + +CREATE VIEW `join2`(`id1`, `id2`, `id3`) AS SELECT `e`.`id1`, `e`.`id2`, `v`.`vid` +FROM `join2_edge` AS `e` INNER JOIN `v_view` AS `v` +ON `e`.`dst` = `v`.`vid`; + +``` + + + +其执行计划如下图所示,由 Aggregate、Calc、Join 等算子组成,数据流经每个算子最终得到增量结果。核心算子 Join 实现了关联关系的查找,下面我们详细分析 Join 算子的实现方式。 + 
+![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png) + +Flink 执行计划 + + + +如下图所示,Join 算子有两个输入流 LeftInput 和 RightInput,分别代表了 join 的左表和右表,Join 算子在接收到上游的数据后执行计算。以左输入流为例,输入的数据首先被加入到 LeftStateView 中保存起来,然后去 RightStateView 中查询是否有数据符合 join 条件,这个查询过程需要遍历 RightStateView,最后将 join 结果输入到下一个算子中。 + + + +join 计算主要的性能瓶颈就在遍历 RightStateView。LeftStateView 和 RightStateView 实际上存储 join 的左表和右表。随着数据不断输入,StateView 中的数据量持续膨胀,最终导致遍历的耗时急剧上升,严重影响系统性能。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png) + +Flink Join 算子实现 + +## 流图计算引擎:GeaFlow + +### 图计算&流图 + +图计算是一种基于图数据格式的计算范式,其中图 G(V,E)由点集合 V 和边集合 E 构成,边代表了数据之间的关联关系。以公开数据集 web-Google 为例,其中每一行数据由两个数字组成,代表了两个页面之间的跳转关系。如下图所示,左侧是原始数据,常规的数据建模方式是建立一张包含两列数据的表,而图的建模方式是将网页作为点,将页面的跳转关系作为边,构成一张跳转网络图。在表的建模方式中,关联关系的计算是通过表的 join 实现的,join 需要遍历左表或者右表。而在图计算中,关联关系被直接存储在边中,省去了遍历的过程。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png) + +表建模 vs. 
图建模 + +流图是图在流场景中的应用,它依据数据流对图的更新将图分成历史图和增量图两个部分。例如在上图中,假设第一行和第二行数据已经输入并完成相应计算,当前处理第三行数据。此时历史图就是由前两行数据建模得到,而增量图是由第三行数据组成的图,两者合并起来就得到完整的图。在流图上应用增量图算法,可以高效完成计算任务,实现实时计算。 + + + +### GeaFlow 架构 + +GeaFlow 引擎的计算流程分为流数据输入、分布式增量图计算、增量结果输出几个部分。和传统的流计算引擎一样,输入的实时数据按照窗口被切分成微批。对于当前批次的数据,先按照建模策略解析成点边构成增量图。增量图和之前数据构成的历史图一道组成完整的流图。计算框架在流图上应用增量图算法得到增量结果输出,最后把增量图添加到历史图中。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1740932875376-6633b307-c309-4ae7-be5e-d0b96a66409a.png) + +GeaFlow 增量计算 + +GeaFlow 计算框架是以点为中心的迭代计算模型。它以增量图中的点作为第一轮迭代的起点。在每一轮迭代中,每个点都独立维护自身的状态,根据与每个点关联的历史图和增量图完成当前迭代轮次的计算,最后将计算结果通过消息传递给邻居点,开启下一轮迭代。 + +以前文中提到的 k-Hop 为例,增量算法如下:在第一轮迭代中,我们找到增量图中的所有边,将这些边作为初始的入向路径和出向路径,分别发送到它们的起点和终点。在后续的迭代中不断扩展入向路径和出向路径。当达到目标跳数时,将出向路径和入向路径发送给起点,在起点组合成最终结果。详细代码实现在开源仓库的[IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java)文件中。 + +下图是两跳场景的描述。在第一轮迭代,增量边 B->C 分别构建入向路径和出向路径,将它们分别发送给点 B 和点 C。在第二轮迭代,B 收到入向路径,并加上当前点的入边形成 2 跳入向路径,发送给点 B。同样点 C 也收到出向路径,加上当前的出边形成 2 跳出向路径,发送给点 B。最后一轮迭代在 B 点将收到的出向和入向路径整合成新增的路径。可以看到,和 Flink 中需要查找所有的历史关系不同,GeaFlow 采用基于流图的增量图算法,计算量和图中的增量路径成正比。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/35234/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png) + +两跳增量路径计算 + +上述图算法已经集成到 GeaFlow 的 IncKHop 算子中,用户可以直接通过 DSL 调用。 + +```sql +set geaflow.dsl.max.traversal=4; +set geaflow.dsl.table.parallelism=4; + +CREATE GRAPH modern ( + Vertex node ( + id int ID + ), + Edge relation ( + srcId int SOURCE ID, + targetId int DESTINATION ID + ) +) WITH ( + storeType='rocksdb', + shardCount = 4 +); + +CREATE TABLE web_google_20 ( + src varchar, + dst varchar +) WITH ( + type='file', + geaflow.dsl.table.parallelism='4', + geaflow.dsl.column.separator='\t', + `geaflow.dsl.source.file.parallel.mod`='true', + geaflow.dsl.file.path = 'resource:///data/web-google-20', + geaflow.dsl.window.size = 8 +); + +INSERT INTO 
modern.node +SELECT cast(src as int) +FROM web_google_20 +; + +INSERT INTO modern.node +SELECT cast(dst as int) +FROM web_google_20 +; + +INSERT INTO modern.relation +SELECT cast(src as int), cast(dst as int) +FROM web_google_20 +; + +CREATE TABLE tbl_result ( + ret varchar +) WITH ( + type='file', + geaflow.dsl.file.path='${target}' +); + +USE GRAPH modern; + +INSERT INTO tbl_result +CALL inc_khop(2) YIELD (ret) +RETURN ret +; +``` + + + +## GeaFlow 性能测试 + +为了验证 GeaFlow 的流图计算性能,我们以 k-Hop 算法为例设计了和 Flink 的对比实验。我们将指定数据作为输入源输入到计算引擎中,执行 k-Hop 算法,并统计所有数据完成计算的时间来比较系统的性能。我们采用公开数据集[web-Google.txt](https://snap.stanford.edu/data/web-Google.html)作为输入,实验环境为 16 台 8 核 16G 的服务器,分别比较了一跳、两跳、三跳、四跳关系计算的场景。 + +实验结果如图所示,横坐标分别是一跳关系、两跳关系、三跳关系、四跳关系,纵坐标是处理完所有数据的耗时,采用对数坐标。可以看到在一跳、两跳场景中,Flink 的性能要好于 GeaFlow,这是因为在一跳、两跳场景中参与 join 计算的数据量比较小,join 需要遍历的左表和右表都很小,遍历本身耗时短,而且 Flink 的计算框架可以缓存 join 的历史计算结果。但是到了三跳、四跳场景时,由于计算复杂度的上升,join 算子需要遍历的表迅速膨胀,带来计算性能的急剧下降,甚至四跳场景超过一天也无法完成计算。而 GeaFlow 采用基于流图的增量图算法,计算耗时只和增量路径相关,和历史的关联关系计算结果无关,所以性能明显优于 Flink。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/png/8237/1743568484086-10eb9a1a-3dd0-42ee-b885-875ac7d81221.png) + +k-Hop 计算性能对比 + +## 总结和展望 + +传统的 Flink 等流计算引擎在计算关联关系时需要用到 join 算子,join 算子需要遍历全量的历史数据,这使得它们在大数据关联计算场景中性能不佳。GeaFlow 引擎通过支持流图计算框架,将图计算引入到流计算中,采用增量图计算的方法大大提升了实时数据的处理性能。 + +目前 GeaFlow 项目代码已经开源,我们希望基于 GeaFlow 构建面向图数据的统一湖仓处理引擎,以解决多样化的大数据关联性分析诉求。同时我们也在积极筹备加入 Apache 基金会,丰富大数据开源生态,因此非常欢迎对图技术有浓厚兴趣的同学加入社区共建。 + +社区中有诸多有趣的工作尚待完成,你可以从如下简单的「Good First Issue」开始,期待你加入同行。 + +- 支持增量 k-Core 算法。([Issue 466](https://github.com/TuGraph-family/tugraph-analytics/issues/466)) +- 支持增量最小生成树算法。([Issue 465](https://github.com/TuGraph-family/tugraph-analytics/issues/465)) +- ... + +## 参考链接 + +1. GeaFlow 项目地址:[https://github.com/TuGraph-family/tugraph-analytics](https://github.com/TuGraph-family/tugraph-analytics) +2. web-Google 数据集地址:[https://snap.stanford.edu/data/web-Google.html](https://snap.stanford.edu/data/web-Google.html) +3. 
GeaFlow Issues:[https://github.com/TuGraph-family/tugraph-analytics/issues](https://github.com/TuGraph-family/tugraph-analytics/issues) +4. 增量 k-Hop 算法实现源码:[https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java](https://github.com/TuGraph-family/tugraph-analytics/blob/master/geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/com/antgroup/geaflow/dsl/udf/graph/IncKHopAlgorithm.java) diff --git a/i18n/zh-CN/docusaurus-plugin-content-blog/blog/32.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/32.md new file mode 100644 index 000000000..fc5575b53 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/32.md @@ -0,0 +1,165 @@ +--- +title: "流式图计算引擎 GeaFlow v0.6.4 发布,支持关系型访问图数据,增量匹配优化实时处理" +date: 2025-4-3 +--- + +2025 年 3 月发布了流式图计算引擎 GeaFlow v0.6.4,新版本实现了多个重要特性更新,包括: + +- 🍀GeaFlow 图存储扩展支持 Paimon 数据湖(实验性功能) +- 🍀图数仓能力扩展:支持对图中的实体进行关系型访问 +- 🍀统一的内存管理器支持 +- 🍀RBO 规则扩展:新增 MatchEdgeLabelFilterRemoveRule 和 MatchIdFilterSimplifyRule +- 🍀支持增量匹配算子 + + + +## ✨ 新增功能 + +### 🍀GeaFlow 图存储扩展支持 Paimon 数据湖(实验性功能) + +为提升 GeaFlow 数据存储系统的扩展性、实时数据处理能力及成本效率,本次更新加入了对 Apache Paimon 的支持。Paimon 作为新一代流式数据湖存储格式,在设计理念、功能特性上,与 GeaFlow 之前使用的 RocksDB 存在许多差异: + +- 支持对象存储/HDFS 分布式存储,天然适配云原生环境。因此可实现存储与计算分离,降低硬件依赖,支持弹性扩展。 +- 支持主键表 LSM 合并、增量更新,满足实时数据更新需求。 +- 列式存储+统计索引(Z-Order、Min-Max 等),支持高效数据裁剪与 OLAP 查询加速。 + +在本次更新中,GeaFlow 加入了对 Paimon 存储的支持,但目前仅为实验性质。 + +- 支持在 GeaFlow 中将用户图数据存储到 Paimon 数据湖。 +- 当前为实验性功能,仅支持使用本地文件系统作为 Paimon 的存储后端,且暂不支持 recover 能力,暂不支持动态图数据存储。 +- 通过配置 `geaflow.store.paimon.options.warehouse` 参数来指定存储路径,默认路径为"file:///tmp/paimon/"。 + +当前 GeaFlow 的存储架构图如下。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp) + + + +### 🍀图数仓能力扩展:支持对图中的实体进行关系型访问 + +在传统关系型数据库中,多层表关联查询往往需要编写复杂的 JOIN 语句,不仅开发效率低下,性能也难以满足海量关联数据的即席分析需求。针对这一痛点,我们通过创新的 SQL 支持,让用户无需学习图查询语言(GQL)即可将 SQL 中复杂的 JOIN 语句自动转换为图路径查询。当前版本提供以下两种 SQL 
语法支持: + +- 支持图中的点/边作为 SQL 查询的来源表,进行查询。 + - 我们通过 TableScanToGraphRule 规则,在生成和优化 RelNode 时识别 SQL 语句中来源于图中的点/边实体,这使得用户可以像 SQL 中扫表操作一样读取图中的点边 + - 示例: student 是图 g_student 中的点实体 + +```plain +USE GRAPH g_student; + +INSERT INTO table_scan_001_result +select avg(age) as avg_age from student; +``` + +- 支持图中的点与边关联作为 SQL 查询的等值条件 Join,进行查询。 + - 我们通过 TableJoinTableToGraphRule 规则,在生成和优化 RelNode 时识别 SQL 语句中的 Join 算子,这使得用户可以像 SQL 中连接表操作一样在图中进行查询 + - 示例: student 是图 g_student 中的点实体,selectCourse 是关联在 student 点上的出边 + +```plain +USE GRAPH g_student; + +INSERT INTO vertex_join_edge_001_result +SELECT s.id, sc.targetId, sc.ts +FROM student s JOIN selectCourse sc on s.id = sc.srcId +WHERE s.id < 1004 +ORDER BY s.id, targetId +; +``` + +### 🍀内存管理器支持 + +此前 GeaFlow 没有统一的内存管理,除了外部依赖 RocksDB 会使用堆外内存,其他的全都是堆内内存。当内存使用较多时,GC 压力明显,另外 shuffle 阶段网络发送也存在多次数据拷贝,导致效率不高。 + +内存管理负责各模块(shuffle、state、framework)的内存管控,包括申请、释放、监控。内存管理有两部分:堆内和堆外。不同模块可能使用不同的内存区域,合理使用这些资源可以更高效地跑完作业。内存管理器主要有以下核心能力: + +- 支持堆内和堆外内存统一管理:通过统一抽象 MemoryView,提供读写接口,屏蔽用户对堆内和堆外的感知。当前 MemoryView 堆外内存是采用预分配模式,初始大小是通过 off.heap.memory.chunkSize.MB 参数来控制,如果不设置,默认是 -Xmx 参数的 30% 作为初始值。运行过程中也支持动态扩缩容。 +- 支持计算和存储统一内存管理 + +为了避免堆外内存浪费或者过度使用,GeaFlow 对各模块的堆外内存使用统一管理。内存主要分 3 个部分:shuffle、state 和 default。Default 是预留空间,可被 shuffle 或 state 模块动态占用。如下图所示: + + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583816-33d13188-a315-4079-bb90-89c58d5a4e82.webp) + +state 和 shuffle 默认独占 10% 的堆外内存,default 则占用 80%。 + + + +### 🍀RBO 规则扩展:新增 EdgeLabel 和 IdFilter 优化规则 + +- Edge Label 简化:针对 Match 匹配语句后接 Where 子句对边进行过滤的查询进行执行计划简化。 +- ID Filter 简化:针对 Match 匹配语句中对点的 id 进行过滤的查询进行执行计划简化。 +- 规则在默认情况下生效,使用示例如下: + +```plain + +// GQL 示例 1(MatchIdFilterSimplifyRule 优化) +MATCH (a:user where id = 1)-[e:knows]-(b:user) +RETURN a.id as a_id, e.weight as weight, b.id as b_id + +// 原执行计划 +LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user)]) + LogicalGraphScan(table=[default.g0]) + +// 
MatchIdFilterSimplifyRule 优化后执行计划,vertex id 转移到 MatchVertex 中进行过滤 +LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user)-[e:knows]-(b:user)]) + LogicalGraphScan(table=[default.g0]) + +// GQL 示例 2(MatchEdgeLabelFilterRemoveRule 优化) +MATCH (a:user where id = 1)-[e:knows]-(b:user) WHERE e.~label = 'knows' +or e.~label = 'created' +RETURN a.id as a_id, e.weight as weight, b.id as b_id + +// 原执行计划 +LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user) where OR(=($1.~label, _UTF-16LE'knows'), =($1.~label, _UTF-16LE'created')) ]) + LogicalGraphScan(table=[default.g0]) + +// MatchEdgeLabelFilterRemoveRule 优化后执行计划,针对 edge label 的过滤转移到 MatchEdge 中进行 +LogicalProject(a_id=[$0.id], weight=[$1.weight], b_id=[$2.id]) + LogicalGraphMatch(path=[(a:user) where =(a.id, 1) -[e:knows]-(b:user)]) + LogicalGraphScan(table=[default.g0]) +``` + +### 🍀支持增量匹配算子 + +在动态图场景中,数据往往不是全部一批到来,而会源源不断地进行输入和计算,图的点边不断地从数据源读取,进行构图,从而形成增量图。每一批新增的点边都构成一个新版本的图,如果重新对全图(即当前所有点边)进行图遍历,开销较大。当前版本中使用了一种基于子图扩展的增量图匹配方法,通过子图扩展来扩展每次增量的触发起点,尽可能地只对增量的数据进行查询: + +- 支持增量匹配逻辑,通过反向传播来扩展每次 window 新增数据的触发起点。 + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583809-8c862fd5-1d7f-4b8b-9b34-e63d1692deb6.webp) + +- 通过在 DSL 或高阶代码中将 `geaflow.dsl.graph.enable.incr.traversal` 参数设置为 true 开启增量计算逻辑。 + +开启示例如下: + +```java +QueryTester.build() + .withConfig(FrameworkConfigKeys.BATCH_NUMBER_PER_CHECKPOINT.getKey(), "1") + .withQueryPath(queryPath) + .withConfig(DSLConfigKeys.ENABLE_INCR_TRAVERSAL.getKey(), "true") + .withConfig(DSLConfigKeys.TABLE_SINK_SPLIT_LINE.getKey(), lineSplit) + .execute(); +``` + + + +## ✨ 历史版本回顾 + +我们回顾上一版,v0.6.3 版本在 v0.5.2 版本基础之上实现了一些重要功能特性,其中包括: + +- 实现了 OSS/DFS/S3 标准化接口,接入主流云存储:支持开源 OSS/DFS/S3 等 remote 分布式存储,同时标准化了接口,便于按需快速扩展其它外部分布式存储系统。 +- 支持标准 Match 算子:支持标准 ISO-GQL Match 语法及算子。 +- Aliyun ODPS 表的读写能力:支持 Aliyun ODPS 插件,提供 ODPS 表的读写能力。 +- 兼容开源 Ray 生态:引擎支持开源 Ray 版本,同时 
console 平台支持将任务提交到 Ray 集群。 +- DSL 支持时序能力:DSL 侧支持时间感知的数据处理、提供动态图与时序结合的能力。 +- Shuffle 支持反压优化:通过滑动窗口的方式进行数据传输和实现反压能力。 +- GeaFlow 流图性能测试:新增了 GeaFlow vs. Spark/Flink 的 demo 和性能测试报告。 + +## ✨ 致谢 + +感谢所有贡献者使这次发布成为可能! + + + +![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp) diff --git a/blog/33.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/33.md similarity index 92% rename from blog/33.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/33.md index 2f6ee0c7c..a668d4a5d 100644 --- a/blog/33.md +++ b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/33.md @@ -1,9 +1,9 @@ --- -title: 深入解读TuGraph计算引擎模型推理系统 +title: 深入解读 GeaFlow 计算引擎模型推理系统 date: 2024-5-24 --- -TuGraph 计算引擎模型推理系统将基于迭代计算的图计算框架与模型推理系统相结合,推理系统可自定义推理依赖环境,图迭代计算与推理链路实现隔离。基于共享内存的跨进程通信方式,提高了推理数据交换效率,满足流图近线推理的时效性。在蚂蚁集团内部的实际应用场景中,大幅缩短了模型推理上线的链路与开发时间,用户迭代模型版本更方便快捷。 +GeaFlow 计算引擎模型推理系统将基于迭代计算的图计算框架与模型推理系统相结合,推理系统可自定义推理依赖环境,图迭代计算与推理链路实现隔离。基于共享内存的跨进程通信方式,提高了推理数据交换效率,满足流图近线推理的时效性。在蚂蚁集团内部的实际应用场景中,大幅缩短了模型推理上线的链路与开发时间,用户迭代模型版本更方便快捷。 @@ -26,7 +26,7 @@ date: 2024-5-24 ## 2. 
流图推理简介 -TuGraph 计算引擎(TuGraph Analytics[1])是蚂蚁集团开源的大规模分布式实时图计算引擎(流图引擎),实现了流批一体的图计算模型,支持了丰富的图计算算法。GeaFlow 的流图计算能力,能处理连续输入的数据流,并支持增量的计算模式,极大得提高了数据的计算效率和实时性。GeaFlow 解决了业界大规模数据关联分析的实时计算问题,已广泛应用于数仓加速、金融风控、知识图谱以及社交推荐等场景。 +GeaFlow[1] 是一个分布式实时图计算引擎(流图引擎),实现了流批一体的图计算模型,支持了丰富的图计算算法。GeaFlow 的流图计算能力,能处理连续输入的数据流,并支持增量的计算模式,极大地提高了数据的计算效率和实时性。GeaFlow 解决了业界大规模数据关联分析的实时计算问题,已广泛应用于数仓加速、金融风控、知识图谱以及社交推荐等场景。 随着业务场景中问题复杂度的提升,基于传统的迭代图算法已无法满足业务的实际需求。例如在反洗钱场景中,利用图神经网络算法处理复杂的交易关系,能够捕获到节点的局部图结构信息。通过聚合邻接节点的特征信息,每个交易节点都可以感知到周边图网络结构的信息。类似的图神经网络等 AI 模型的推理逻辑,是无法基于传统的图迭代计算模式直接高效地表达的。 @@ -99,7 +99,7 @@ date: 2024-5-24 GeaFlow 模型推理系统工作流中,Driver 端(即控制节点)发挥着至关重要的角色。该节点运行在 Java 虚拟机进程,是整个推理流程的控制中心。Driver 端初始化了一个非常关键的组件——InferenceContext 对象,InferenceContext 对象被设计为模型推理流程的核心,在 JVM 环境中创建并负责加载和预处理用户提供的模型文件和环境依赖信息。在模型推理任务之前,InferenceContext 会详细检查并准备好模型文件,确保能够正确加载到预期的执行环境中。InferenceContext 也负责初始化和配置与模型推理相关的虚拟环境,确保正确的 Python 环境或其他必要的运行时库得以安装和配置。 -如图所示,由流式数据源源不断的触发图迭代计算与模型推理工作。TuGraph 计算引擎提供了 DeltaGraphCompute 计算接口,用户可自主定义增量图数据的处理逻辑,并更新历史的图存储(Graph Store)。通过 TuGraph 计算引擎模型推理系统,增量图迭代的中间计算结果,经过推理前置数据处理接口,并基于共享内存的跨进程通信方式,将处理后的数据流输入到推理进程,完成推理工作后的结果参与后续图迭代计算逻辑。下文将详细介绍各个数据接口的使用。 +如图所示,流式数据源源不断地触发图迭代计算与模型推理工作。GeaFlow 提供了 DeltaGraphCompute 计算接口,用户可自主定义增量图数据的处理逻辑,并更新历史的图存储(Graph Store)。通过 GeaFlow 模型推理系统,增量图迭代的中间计算结果,经过推理前置数据处理接口,并基于共享内存的跨进程通信方式,将处理后的数据流输入到推理进程,完成推理工作后的结果参与后续图迭代计算逻辑。下文将详细介绍各个数据接口的使用。 ![](https://intranetproxy.alipay.com/skylark/lark/0/2025/webp/96961/1756792986743-dc6cf33f-23a2-4a70-9f2c-aa9c9b8fa2da.webp) @@ -333,10 +333,10 @@ public static class PRVertexCentricComputeFunction implements ## 6. 总结 -通过将 AI 模型推理引入 GeaFlow 流图计算系统,让我们能够对图数据进行深度地分析和预测。利用最新的机器学习和深度学习技术,TuGraph Analytics 图计算引擎不仅可以对图数据进行分类和回归分析,还可以预测未来趋势,从而在多个维度上提供决策支持。 +通过将 AI 模型推理引入 GeaFlow 流图计算系统,我们能够对图数据进行深度的分析和预测。利用最新的机器学习和深度学习技术,GeaFlow 不仅可以对图数据进行分类和回归分析,还可以预测未来趋势,从而在多个维度上提供决策支持。 希望通过以上的介绍,可以让大家对 GeaFlow 模型推理系统有个比较清晰的了解,非常欢迎大家加入我们社区(https://github.com/TuGraph-family/tugraph-analytics),一起构建图数据上的智能化分析能力! 
##### 引用链接 -[1] TuGraph Analytics: _https://github.com/TuGraph-family/tugraph-analytics_ +[1] GeaFlow: _https://github.com/TuGraph-family/tugraph-analytics_ diff --git a/blog/34.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/34.md similarity index 100% rename from blog/34.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/34.md diff --git a/blog/35.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/35.md similarity index 100% rename from blog/35.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/35.md diff --git a/blog/36.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/36.md similarity index 100% rename from blog/36.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/36.md diff --git a/blog/4.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/4.md similarity index 100% rename from blog/4.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/4.md diff --git a/blog/5.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/5.md similarity index 100% rename from blog/5.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/5.md diff --git a/blog/6.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/6.md similarity index 100% rename from blog/6.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/6.md diff --git a/blog/7.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/7.md similarity index 100% rename from blog/7.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/7.md diff --git a/blog/8.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/8.md similarity index 100% rename from blog/8.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/8.md diff --git a/blog/9.md b/i18n/zh-CN/docusaurus-plugin-content-blog/blog/9.md similarity index 100% rename from blog/9.md rename to i18n/zh-CN/docusaurus-plugin-content-blog/blog/9.md diff --git a/i18n/zh-CN/docusaurus-plugin-content-pages/download/index.md b/i18n/zh-CN/docusaurus-plugin-content-pages/download/index.md index 57ee76c6e..2a1c7813f 100644 --- 
a/i18n/zh-CN/docusaurus-plugin-content-pages/download/index.md +++ b/i18n/zh-CN/docusaurus-plugin-content-pages/download/index.md @@ -6,17 +6,17 @@ title: 下载 ## Apache GeaFlow -[Apache GeaFlow 0.6.7 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.7) +[GeaFlow 0.6.7 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.7) -[Apache GeaFlow 0.6.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.6) +[GeaFlow 0.6.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.6) -[Apache GeaFlow 0.6.5 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.5) +[GeaFlow 0.6.5 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.5) -[Apache GeaFlow 0.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6) +[GeaFlow 0.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6) -[Apache GeaFlow 0.5.2 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.2) +[GeaFlow 0.5.2 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.2) -[Apache GeaFlow 0.5.1 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.1) +[GeaFlow 0.5.1 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.1) ## 通过 Docker 安装 @@ -45,7 +45,7 @@ title: 下载 > 获取 GeaFlow 控制台镜像:geaflow/geaflow-console-arm:0.6 -## Apache GeaFlow GeaFlow K8S 控制器 +## GeaFlow K8S 控制器 ### geaflow-kubernetes-operator @@ -55,22 +55,22 @@ title: 下载 ## Maven 依赖 -您可以在 **pom.xml** 中添加以下依赖,以在项目中引入 **Apache GeaFlow** +您可以在 **pom.xml** 中添加以下依赖,以在项目中引入 **Apache GeaFlow (Incubating)** **GeaFlow** 的构件可从 **sonatype.com** 获取 [Official Repository](https://search.maven.org/search?q=GeaFlow)。 
## 稳定版本发布说明 -[Apache GeaFlow 0.6 Release Note](https://github.com/apache/geaflow/releases/tag/v0.6) +[GeaFlow 0.6 Release Note](https://github.com/apache/geaflow/releases/tag/v0.6) -[Apache GeaFlow 0.5 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.5) +[GeaFlow 0.5 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.5) -[Apache GeaFlow 0.4 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.4) +[GeaFlow 0.4 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.4) -[Apache GeaFlow 0.3 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.3) +[GeaFlow 0.3 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.3) -[Apache GeaFlow 0.2 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.2) +[GeaFlow 0.2 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.2) -[Apache GeaFlow 0.1 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.1) +[GeaFlow 0.1 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.1) ## diff --git a/src/pages/Home/components/Banner/index.tsx b/src/pages/Home/components/Banner/index.tsx index 9b72c113b..6591662e4 100644 --- a/src/pages/Home/components/Banner/index.tsx +++ b/src/pages/Home/components/Banner/index.tsx @@ -11,7 +11,7 @@ const Banner = () => { 'url(/img/BG.png)'; const bannerDetail = { - title: 'Apache GeaFlow', + title: 'Apache GeaFlow (Incubating)', desc: translate({ message: 'product_analytics.description' }), btn: ( diff --git a/src/pages/download/index.md b/src/pages/download/index.md index 970256648..edaa1535d 100644 --- a/src/pages/download/index.md +++ b/src/pages/download/index.md @@ -4,19 +4,19 @@ title: "Downloads" # Apache GeaFlow (Incubating) Downloads -## Apache GeaFlow +## Apache GeaFlow (Incubating) -[Apache GeaFlow 0.6.7 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.7) +[GeaFlow 0.6.7 Source 
Release](https://github.com/apache/geaflow/releases/tag/v0.6.7) -[Apache GeaFlow 0.6.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.6) +[GeaFlow 0.6.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.6) -[Apache GeaFlow 0.6.5 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.5) +[GeaFlow 0.6.5 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6.5) -[Apache GeaFlow 0.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6) +[GeaFlow 0.6 Source Release](https://github.com/apache/geaflow/releases/tag/v0.6) -[Apache GeaFlow 0.5.2 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.2) +[GeaFlow 0.5.2 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.2) -[Apache GeaFlow 0.5.1 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.1) +[GeaFlow 0.5.1 Source Release](https://github.com/apache/geaflow/releases/tag/v0.5.1) ## Installing with Docker @@ -42,7 +42,7 @@ title: "Downloads" > docker pull geaflow/geaflow-console-arm:0.6 -## Apache GeaFlow Kubernetes Operator +## GeaFlow Kubernetes Operator ### geaflow-kubernetes-operator @@ -52,22 +52,22 @@ title: "Downloads" ## Maven Dependencies -You can add the following dependencies to your `pom.xml` to include Apache GeaFlow in your project. +You can add the following dependencies to your `pom.xml` to include Apache GeaFlow (Incubating) in your project. GeaFlow artifacts are available from sonatype.com [Official Repository](https://search.maven.org/search?q=GeaFlow). 
## Release notes for stable releases -[Apache GeaFlow 0.6 Release Note](https://github.com/apache/geaflow/releases/tag/v0.6) +[GeaFlow 0.6 Release Note](https://github.com/apache/geaflow/releases/tag/v0.6) -[Apache GeaFlow 0.5 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.5) +[GeaFlow 0.5 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.5) -[Apache GeaFlow 0.4 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.4) +[GeaFlow 0.4 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.4) -[Apache GeaFlow 0.3 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.3) +[GeaFlow 0.3 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.3) -[Apache GeaFlow 0.2 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.2) +[GeaFlow 0.2 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.2) -[Apache GeaFlow 0.1 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.1) +[GeaFlow 0.1 Release Note](https://github.com/apache/geaflow/releases/tag/release-0.1) ## diff --git a/static/graph/1734590557540-5f3f4528-fa07-4208-8425-bc514ea5e06b.jpeg b/static/graph/1734590557540-5f3f4528-fa07-4208-8425-bc514ea5e06b.jpeg new file mode 100644 index 000000000..3b3385fd4 Binary files /dev/null and b/static/graph/1734590557540-5f3f4528-fa07-4208-8425-bc514ea5e06b.jpeg differ diff --git a/static/graph/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png b/static/graph/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png new file mode 100644 index 000000000..321442752 Binary files /dev/null and b/static/graph/1739276186744-96d40e95-4e29-48ef-8892-1b7dfa60c726.png differ diff --git a/static/graph/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png b/static/graph/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png new file mode 100644 index 000000000..658654188 Binary files /dev/null and 
b/static/graph/1740386529457-b43e2d49-6000-4acf-862c-314ae4f23dbc.png differ diff --git a/static/graph/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png b/static/graph/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png new file mode 100644 index 000000000..dc4c38b86 Binary files /dev/null and b/static/graph/1740386998582-16f67c8e-ee45-48d2-bb5f-f45ec3956273.png differ diff --git a/static/graph/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png b/static/graph/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png new file mode 100644 index 000000000..afa73ff42 Binary files /dev/null and b/static/graph/1740454568183-6d42716b-fc84-41a8-945c-c97b81d61135.png differ diff --git a/static/graph/1740470405961-05389aa3-1b67-4cdf-9c65-ea28641ef89c.png b/static/graph/1740470405961-05389aa3-1b67-4cdf-9c65-ea28641ef89c.png new file mode 100644 index 000000000..d40c3a5f0 Binary files /dev/null and b/static/graph/1740470405961-05389aa3-1b67-4cdf-9c65-ea28641ef89c.png differ diff --git a/static/graph/1740471552771-36ee8f06-d58e-4cb7-914d-c44e151575a0.png b/static/graph/1740471552771-36ee8f06-d58e-4cb7-914d-c44e151575a0.png new file mode 100644 index 000000000..edc6dfc85 Binary files /dev/null and b/static/graph/1740471552771-36ee8f06-d58e-4cb7-914d-c44e151575a0.png differ diff --git a/static/graph/1740537488877-eb89b886-7c4c-4c5a-8e27-06356b15afa0.png b/static/graph/1740537488877-eb89b886-7c4c-4c5a-8e27-06356b15afa0.png new file mode 100644 index 000000000..c1de13424 Binary files /dev/null and b/static/graph/1740537488877-eb89b886-7c4c-4c5a-8e27-06356b15afa0.png differ diff --git a/static/graph/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png b/static/graph/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png new file mode 100644 index 000000000..c4438c400 Binary files /dev/null and b/static/graph/1740550683257-198617cb-b66b-4997-86f7-d41df94f0fb1.png differ diff --git a/static/graph/1740932875376-6633b307-c309-4ae7-be5e-d0b96a66409a.png 
b/static/graph/1740932875376-6633b307-c309-4ae7-be5e-d0b96a66409a.png new file mode 100644 index 000000000..0e402ffc6 Binary files /dev/null and b/static/graph/1740932875376-6633b307-c309-4ae7-be5e-d0b96a66409a.png differ diff --git a/static/graph/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png b/static/graph/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png new file mode 100644 index 000000000..7c2212659 Binary files /dev/null and b/static/graph/1740982328260-3a0ff09e-920b-4f55-af14-326b5d0a358c.png differ diff --git a/static/graph/1741541936849-f9e0ae71-d25d-4789-b9c6-ed0380140f2a.png b/static/graph/1741541936849-f9e0ae71-d25d-4789-b9c6-ed0380140f2a.png new file mode 100644 index 000000000..edec57ee3 Binary files /dev/null and b/static/graph/1741541936849-f9e0ae71-d25d-4789-b9c6-ed0380140f2a.png differ diff --git a/static/graph/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png b/static/graph/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png new file mode 100644 index 000000000..852c7e7d0 Binary files /dev/null and b/static/graph/1741544451333-05f87f7e-8c8b-41fb-a27b-051b6df8e5da.png differ diff --git a/static/graph/1741574572676-ff7e2c56-14d0-470c-b21d-604f928c6ec9.jpeg b/static/graph/1741574572676-ff7e2c56-14d0-470c-b21d-604f928c6ec9.jpeg new file mode 100644 index 000000000..52073038d Binary files /dev/null and b/static/graph/1741574572676-ff7e2c56-14d0-470c-b21d-604f928c6ec9.jpeg differ diff --git a/static/graph/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg b/static/graph/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg new file mode 100644 index 000000000..76f43fa5c Binary files /dev/null and b/static/graph/1741576149930-b169b7da-0600-4fca-b6ad-5eadcfdbff5b.jpeg differ diff --git a/static/graph/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png b/static/graph/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png new file mode 100644 index 000000000..92126384a Binary files /dev/null and 
b/static/graph/1741589034132-1969e973-94dd-42ca-b506-ebe4594d87a8.png differ diff --git a/static/graph/1741599519420-37fd1d9f-6623-44b3-87e4-5ac5275b876f.png b/static/graph/1741599519420-37fd1d9f-6623-44b3-87e4-5ac5275b876f.png new file mode 100644 index 000000000..0d194d362 Binary files /dev/null and b/static/graph/1741599519420-37fd1d9f-6623-44b3-87e4-5ac5275b876f.png differ diff --git a/static/graph/1741674750798-f519cba9-d8ae-47d4-aec0-97c2ef31a759.jpeg b/static/graph/1741674750798-f519cba9-d8ae-47d4-aec0-97c2ef31a759.jpeg new file mode 100644 index 000000000..ccfda7f21 Binary files /dev/null and b/static/graph/1741674750798-f519cba9-d8ae-47d4-aec0-97c2ef31a759.jpeg differ diff --git a/static/graph/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png b/static/graph/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png new file mode 100644 index 000000000..dbe748b3f Binary files /dev/null and b/static/graph/1741674805947-d91bf10a-02eb-427c-acea-3cb96094f164.png differ diff --git a/static/graph/1741683049495-d75ae87b-9510-40ee-b22f-c6140570b1f1.png b/static/graph/1741683049495-d75ae87b-9510-40ee-b22f-c6140570b1f1.png new file mode 100644 index 000000000..8b4dbf7da Binary files /dev/null and b/static/graph/1741683049495-d75ae87b-9510-40ee-b22f-c6140570b1f1.png differ diff --git a/static/graph/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png b/static/graph/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png new file mode 100644 index 000000000..9c4dc1244 Binary files /dev/null and b/static/graph/1741683063811-75a42c49-8b17-482b-9dd2-98be89ec63b0.png differ diff --git a/static/graph/1741683921355-149d0fea-7a3f-4fb8-ad36-f4b4c8541113.png b/static/graph/1741683921355-149d0fea-7a3f-4fb8-ad36-f4b4c8541113.png new file mode 100644 index 000000000..90b86710d Binary files /dev/null and b/static/graph/1741683921355-149d0fea-7a3f-4fb8-ad36-f4b4c8541113.png differ diff --git a/static/graph/1741684625347-a229239e-fd58-4d42-adc9-f272e3f13fdf.png 
b/static/graph/1741684625347-a229239e-fd58-4d42-adc9-f272e3f13fdf.png new file mode 100644 index 000000000..586a67f35 Binary files /dev/null and b/static/graph/1741684625347-a229239e-fd58-4d42-adc9-f272e3f13fdf.png differ diff --git a/static/graph/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png b/static/graph/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png new file mode 100644 index 000000000..e964c6da0 Binary files /dev/null and b/static/graph/1741710927215-b6be1398-7485-432b-b8f7-c4cde5366302.png differ diff --git a/static/graph/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png b/static/graph/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png new file mode 100644 index 000000000..57a4db89b Binary files /dev/null and b/static/graph/1741852109942-9310f385-a0c2-4c32-987f-77b5c9df911a.png differ diff --git a/static/graph/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png b/static/graph/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png new file mode 100644 index 000000000..2a5582bcf Binary files /dev/null and b/static/graph/1741854036431-9d80b215-73fc-4838-bdda-905d59ebf08e.png differ diff --git a/static/graph/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png b/static/graph/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png new file mode 100644 index 000000000..9932c9c2c Binary files /dev/null and b/static/graph/1743162676746-973d8e75-11b5-43d7-8832-724e7332b964.png differ diff --git a/static/graph/1743568484086-10eb9a1a-3dd0-42ee-b885-875ac7d81221.png b/static/graph/1743568484086-10eb9a1a-3dd0-42ee-b885-875ac7d81221.png new file mode 100644 index 000000000..b687ea1d3 Binary files /dev/null and b/static/graph/1743568484086-10eb9a1a-3dd0-42ee-b885-875ac7d81221.png differ diff --git a/static/graph/1749448299226-d23a5d01-5a5c-4cbb-bd99-f1e476f808be.png b/static/graph/1749448299226-d23a5d01-5a5c-4cbb-bd99-f1e476f808be.png new file mode 100644 index 000000000..d33ca3084 Binary files /dev/null and 
b/static/graph/1749448299226-d23a5d01-5a5c-4cbb-bd99-f1e476f808be.png differ diff --git a/static/graph/1755590462430-ff982d97-68dc-46ec-afac-aafefdd188f2.png b/static/graph/1755590462430-ff982d97-68dc-46ec-afac-aafefdd188f2.png new file mode 100644 index 000000000..c2bde3bb1 Binary files /dev/null and b/static/graph/1755590462430-ff982d97-68dc-46ec-afac-aafefdd188f2.png differ diff --git a/static/graph/1755590462522-40c72425-44f1-4da4-983f-3a1ac57c3777.png b/static/graph/1755590462522-40c72425-44f1-4da4-983f-3a1ac57c3777.png new file mode 100644 index 000000000..d048b959a Binary files /dev/null and b/static/graph/1755590462522-40c72425-44f1-4da4-983f-3a1ac57c3777.png differ diff --git a/static/graph/1755590462559-8d88a0bf-c3f1-4d0c-ae20-9f304bc40d5e.png b/static/graph/1755590462559-8d88a0bf-c3f1-4d0c-ae20-9f304bc40d5e.png new file mode 100644 index 000000000..4094b235f Binary files /dev/null and b/static/graph/1755590462559-8d88a0bf-c3f1-4d0c-ae20-9f304bc40d5e.png differ diff --git a/static/graph/1755590462583-8c4300c0-fd3d-48fc-891d-2e6995c4d2ac.png b/static/graph/1755590462583-8c4300c0-fd3d-48fc-891d-2e6995c4d2ac.png new file mode 100644 index 000000000..73690df21 Binary files /dev/null and b/static/graph/1755590462583-8c4300c0-fd3d-48fc-891d-2e6995c4d2ac.png differ diff --git a/static/graph/1755590462620-e414a7fb-cbce-4a81-aa96-f0e0819e7bb9.png b/static/graph/1755590462620-e414a7fb-cbce-4a81-aa96-f0e0819e7bb9.png new file mode 100644 index 000000000..c4b2b8f58 Binary files /dev/null and b/static/graph/1755590462620-e414a7fb-cbce-4a81-aa96-f0e0819e7bb9.png differ diff --git a/static/graph/1755590463631-cf49d325-848b-41fa-8dcf-cf8e657ba89a.png b/static/graph/1755590463631-cf49d325-848b-41fa-8dcf-cf8e657ba89a.png new file mode 100644 index 000000000..d123c1810 Binary files /dev/null and b/static/graph/1755590463631-cf49d325-848b-41fa-8dcf-cf8e657ba89a.png differ diff --git a/static/graph/1755590463694-59d535f5-2adb-4619-8afb-5a8a85ce2057.png 
b/static/graph/1755590463694-59d535f5-2adb-4619-8afb-5a8a85ce2057.png new file mode 100644 index 000000000..93ead6804 Binary files /dev/null and b/static/graph/1755590463694-59d535f5-2adb-4619-8afb-5a8a85ce2057.png differ diff --git a/static/graph/1755590463877-fc897345-ebcf-4775-8387-42119271af34.png b/static/graph/1755590463877-fc897345-ebcf-4775-8387-42119271af34.png new file mode 100644 index 000000000..064db671c Binary files /dev/null and b/static/graph/1755590463877-fc897345-ebcf-4775-8387-42119271af34.png differ diff --git a/static/graph/1755590463939-e98e78fd-d8b7-49f2-b24b-090fdde11f56.png b/static/graph/1755590463939-e98e78fd-d8b7-49f2-b24b-090fdde11f56.png new file mode 100644 index 000000000..1787446cb Binary files /dev/null and b/static/graph/1755590463939-e98e78fd-d8b7-49f2-b24b-090fdde11f56.png differ diff --git a/static/graph/1755590464758-a048d93b-ac51-47b6-b0f2-17fd924ff708.png b/static/graph/1755590464758-a048d93b-ac51-47b6-b0f2-17fd924ff708.png new file mode 100644 index 000000000..37344faa0 Binary files /dev/null and b/static/graph/1755590464758-a048d93b-ac51-47b6-b0f2-17fd924ff708.png differ diff --git a/static/graph/1755590464897-e875229c-6975-4202-a17c-911139e17175.png b/static/graph/1755590464897-e875229c-6975-4202-a17c-911139e17175.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590464897-e875229c-6975-4202-a17c-911139e17175.png differ diff --git a/static/graph/1755590524023-48a59804-e15a-41cc-9fd4-fcefb2bc2ac0.png b/static/graph/1755590524023-48a59804-e15a-41cc-9fd4-fcefb2bc2ac0.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590524023-48a59804-e15a-41cc-9fd4-fcefb2bc2ac0.png differ diff --git a/static/graph/1755590524194-4235a7d1-16e8-4d3b-8d4b-97d2410faaff.png b/static/graph/1755590524194-4235a7d1-16e8-4d3b-8d4b-97d2410faaff.png new file mode 100644 index 000000000..05b8fe3bf Binary files /dev/null and 
b/static/graph/1755590524194-4235a7d1-16e8-4d3b-8d4b-97d2410faaff.png differ diff --git a/static/graph/1755590583238-367e66b9-a41d-4535-ac05-08be6fcbb1f0.png b/static/graph/1755590583238-367e66b9-a41d-4535-ac05-08be6fcbb1f0.png new file mode 100644 index 000000000..761e54e2b Binary files /dev/null and b/static/graph/1755590583238-367e66b9-a41d-4535-ac05-08be6fcbb1f0.png differ diff --git a/static/graph/1755590583286-58dbe4bc-c84e-4ced-a1e0-f30dff7baade.png b/static/graph/1755590583286-58dbe4bc-c84e-4ced-a1e0-f30dff7baade.png new file mode 100644 index 000000000..58eb1f323 Binary files /dev/null and b/static/graph/1755590583286-58dbe4bc-c84e-4ced-a1e0-f30dff7baade.png differ diff --git a/static/graph/1755590583430-d8d3e5a4-332b-4ed2-9183-adc92ba394d4.png b/static/graph/1755590583430-d8d3e5a4-332b-4ed2-9183-adc92ba394d4.png new file mode 100644 index 000000000..f83aa5b5e Binary files /dev/null and b/static/graph/1755590583430-d8d3e5a4-332b-4ed2-9183-adc92ba394d4.png differ diff --git a/static/graph/1755590583442-473045cc-500a-45d2-86fb-01b32f5e40ec.png b/static/graph/1755590583442-473045cc-500a-45d2-86fb-01b32f5e40ec.png new file mode 100644 index 000000000..fd73a8140 Binary files /dev/null and b/static/graph/1755590583442-473045cc-500a-45d2-86fb-01b32f5e40ec.png differ diff --git a/static/graph/1755590583491-40041e22-3a2f-44b3-b484-66d504ec3721.png b/static/graph/1755590583491-40041e22-3a2f-44b3-b484-66d504ec3721.png new file mode 100644 index 000000000..c69161378 Binary files /dev/null and b/static/graph/1755590583491-40041e22-3a2f-44b3-b484-66d504ec3721.png differ diff --git a/static/graph/1755590585021-69acf3f9-1bf6-42ff-b94b-62ce8d3b6f9c.png b/static/graph/1755590585021-69acf3f9-1bf6-42ff-b94b-62ce8d3b6f9c.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590585021-69acf3f9-1bf6-42ff-b94b-62ce8d3b6f9c.png differ diff --git a/static/graph/1755590634103-cc5f09f4-39f6-44be-b66d-79a5668ac307.png 
b/static/graph/1755590634103-cc5f09f4-39f6-44be-b66d-79a5668ac307.png new file mode 100644 index 000000000..1b9a85d84 Binary files /dev/null and b/static/graph/1755590634103-cc5f09f4-39f6-44be-b66d-79a5668ac307.png differ diff --git a/static/graph/1755590634183-5df5b632-5a9f-405a-8f2d-7a9648cf5c88.png b/static/graph/1755590634183-5df5b632-5a9f-405a-8f2d-7a9648cf5c88.png new file mode 100644 index 000000000..386f90d1c Binary files /dev/null and b/static/graph/1755590634183-5df5b632-5a9f-405a-8f2d-7a9648cf5c88.png differ diff --git a/static/graph/1755590634216-6dc1bef0-d173-4f47-918a-327034b6d445.png b/static/graph/1755590634216-6dc1bef0-d173-4f47-918a-327034b6d445.png new file mode 100644 index 000000000..5be07f00a Binary files /dev/null and b/static/graph/1755590634216-6dc1bef0-d173-4f47-918a-327034b6d445.png differ diff --git a/static/graph/1755590634343-2086da61-d056-4260-bbe8-7782dc151b71.png b/static/graph/1755590634343-2086da61-d056-4260-bbe8-7782dc151b71.png new file mode 100644 index 000000000..46655132d Binary files /dev/null and b/static/graph/1755590634343-2086da61-d056-4260-bbe8-7782dc151b71.png differ diff --git a/static/graph/1755590634449-3c2b9b60-b17b-403c-9435-92ca753f04dd.png b/static/graph/1755590634449-3c2b9b60-b17b-403c-9435-92ca753f04dd.png new file mode 100644 index 000000000..ced7b17fd Binary files /dev/null and b/static/graph/1755590634449-3c2b9b60-b17b-403c-9435-92ca753f04dd.png differ diff --git a/static/graph/1755590635405-eede39f8-4a05-4b21-af13-a309b6eb3f7c.png b/static/graph/1755590635405-eede39f8-4a05-4b21-af13-a309b6eb3f7c.png new file mode 100644 index 000000000..8a82a7c6f Binary files /dev/null and b/static/graph/1755590635405-eede39f8-4a05-4b21-af13-a309b6eb3f7c.png differ diff --git a/static/graph/1755590635456-8581b620-9778-4e3b-8968-27203d2cca59.png b/static/graph/1755590635456-8581b620-9778-4e3b-8968-27203d2cca59.png new file mode 100644 index 000000000..3737d3864 Binary files /dev/null and 
b/static/graph/1755590635456-8581b620-9778-4e3b-8968-27203d2cca59.png differ diff --git a/static/graph/1755590635988-bdefc357-4a30-4505-9c5a-8594e9d88346.png b/static/graph/1755590635988-bdefc357-4a30-4505-9c5a-8594e9d88346.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590635988-bdefc357-4a30-4505-9c5a-8594e9d88346.png differ diff --git a/static/graph/1755590689096-911b5143-f4db-4af2-adb2-63f0c86549ba.png b/static/graph/1755590689096-911b5143-f4db-4af2-adb2-63f0c86549ba.png new file mode 100644 index 000000000..2992d9532 Binary files /dev/null and b/static/graph/1755590689096-911b5143-f4db-4af2-adb2-63f0c86549ba.png differ diff --git a/static/graph/1755590689300-d17f09d3-59ff-4381-ada4-062012d22360.png b/static/graph/1755590689300-d17f09d3-59ff-4381-ada4-062012d22360.png new file mode 100644 index 000000000..0658b92ff Binary files /dev/null and b/static/graph/1755590689300-d17f09d3-59ff-4381-ada4-062012d22360.png differ diff --git a/static/graph/1755590689987-54d89c91-6cd4-4bf3-8846-523737a73014.png b/static/graph/1755590689987-54d89c91-6cd4-4bf3-8846-523737a73014.png new file mode 100644 index 000000000..d62d16435 Binary files /dev/null and b/static/graph/1755590689987-54d89c91-6cd4-4bf3-8846-523737a73014.png differ diff --git a/static/graph/1755590690653-deec4e97-416c-46fc-a3ab-48bf493871fc.png b/static/graph/1755590690653-deec4e97-416c-46fc-a3ab-48bf493871fc.png new file mode 100644 index 000000000..107a4b483 Binary files /dev/null and b/static/graph/1755590690653-deec4e97-416c-46fc-a3ab-48bf493871fc.png differ diff --git a/static/graph/1755590693535-4cbf63d5-142f-4765-9013-90463619dd84.png b/static/graph/1755590693535-4cbf63d5-142f-4765-9013-90463619dd84.png new file mode 100644 index 000000000..f5ffa0f47 Binary files /dev/null and b/static/graph/1755590693535-4cbf63d5-142f-4765-9013-90463619dd84.png differ diff --git a/static/graph/1755590694918-e791f116-2dd7-46f5-9c9f-af24d84fd6e4.png 
b/static/graph/1755590694918-e791f116-2dd7-46f5-9c9f-af24d84fd6e4.png new file mode 100644 index 000000000..378279fdb Binary files /dev/null and b/static/graph/1755590694918-e791f116-2dd7-46f5-9c9f-af24d84fd6e4.png differ diff --git a/static/graph/1755590699528-f0b62043-38e7-4ff1-9402-a5e813b33715.png b/static/graph/1755590699528-f0b62043-38e7-4ff1-9402-a5e813b33715.png new file mode 100644 index 000000000..39727ce83 Binary files /dev/null and b/static/graph/1755590699528-f0b62043-38e7-4ff1-9402-a5e813b33715.png differ diff --git a/static/graph/1755590699812-900597f3-6842-4178-af43-cbf010126803.png b/static/graph/1755590699812-900597f3-6842-4178-af43-cbf010126803.png new file mode 100644 index 000000000..8b145213e Binary files /dev/null and b/static/graph/1755590699812-900597f3-6842-4178-af43-cbf010126803.png differ diff --git a/static/graph/1755590700970-7d65afb7-8e6b-4f65-a3b6-d4e3d6a6a86d.png b/static/graph/1755590700970-7d65afb7-8e6b-4f65-a3b6-d4e3d6a6a86d.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590700970-7d65afb7-8e6b-4f65-a3b6-d4e3d6a6a86d.png differ diff --git a/static/graph/1755590701971-086e1566-2487-4247-8846-e9847358ccc1.png b/static/graph/1755590701971-086e1566-2487-4247-8846-e9847358ccc1.png new file mode 100644 index 000000000..66839f7ab Binary files /dev/null and b/static/graph/1755590701971-086e1566-2487-4247-8846-e9847358ccc1.png differ diff --git a/static/graph/1755590705180-a40b9136-faa4-4fce-bf14-f6b65f8586c2.png b/static/graph/1755590705180-a40b9136-faa4-4fce-bf14-f6b65f8586c2.png new file mode 100644 index 000000000..b6ba1ed5d Binary files /dev/null and b/static/graph/1755590705180-a40b9136-faa4-4fce-bf14-f6b65f8586c2.png differ diff --git a/static/graph/1755590705203-20764e69-1b54-4228-9ffc-2fab6e5953f5.png b/static/graph/1755590705203-20764e69-1b54-4228-9ffc-2fab6e5953f5.png new file mode 100644 index 000000000..3216961d4 Binary files /dev/null and 
b/static/graph/1755590705203-20764e69-1b54-4228-9ffc-2fab6e5953f5.png differ diff --git a/static/graph/1755590820839-32a0da91-82f2-43d8-af0e-1046f171431e.png b/static/graph/1755590820839-32a0da91-82f2-43d8-af0e-1046f171431e.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590820839-32a0da91-82f2-43d8-af0e-1046f171431e.png differ diff --git a/static/graph/1755590865396-ffe17ed9-cdca-4b4b-8abc-15b9c1fc0772.png b/static/graph/1755590865396-ffe17ed9-cdca-4b4b-8abc-15b9c1fc0772.png new file mode 100644 index 000000000..3922c6f6e Binary files /dev/null and b/static/graph/1755590865396-ffe17ed9-cdca-4b4b-8abc-15b9c1fc0772.png differ diff --git a/static/graph/1755590930068-dd98bfe8-1ac0-4c0c-ab21-742389a69b83.png b/static/graph/1755590930068-dd98bfe8-1ac0-4c0c-ab21-742389a69b83.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590930068-dd98bfe8-1ac0-4c0c-ab21-742389a69b83.png differ diff --git a/static/graph/1755590931230-a8c6d558-cae0-4a8d-8fe8-079e4826468a.png b/static/graph/1755590931230-a8c6d558-cae0-4a8d-8fe8-079e4826468a.png new file mode 100644 index 000000000..67cbe0ead Binary files /dev/null and b/static/graph/1755590931230-a8c6d558-cae0-4a8d-8fe8-079e4826468a.png differ diff --git a/static/graph/1755590976984-5fee7369-5c67-4557-bf20-04344aab49b0.png b/static/graph/1755590976984-5fee7369-5c67-4557-bf20-04344aab49b0.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755590976984-5fee7369-5c67-4557-bf20-04344aab49b0.png differ diff --git a/static/graph/1755591074262-80ca798e-7f4f-444b-b5a2-d8628416e55d.png b/static/graph/1755591074262-80ca798e-7f4f-444b-b5a2-d8628416e55d.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591074262-80ca798e-7f4f-444b-b5a2-d8628416e55d.png differ diff --git a/static/graph/1755591135224-66acf812-e1e0-4433-9d74-6ce3b2b03b29.png 
b/static/graph/1755591135224-66acf812-e1e0-4433-9d74-6ce3b2b03b29.png new file mode 100644 index 000000000..40aac94a5 Binary files /dev/null and b/static/graph/1755591135224-66acf812-e1e0-4433-9d74-6ce3b2b03b29.png differ diff --git a/static/graph/1755591238942-9738b84e-e9cb-42a2-910e-e565d3f71a42.png b/static/graph/1755591238942-9738b84e-e9cb-42a2-910e-e565d3f71a42.png new file mode 100644 index 000000000..c0acb147f Binary files /dev/null and b/static/graph/1755591238942-9738b84e-e9cb-42a2-910e-e565d3f71a42.png differ diff --git a/static/graph/1755591238998-b2397697-d50d-4696-9b68-ec6074f1dc77.png b/static/graph/1755591238998-b2397697-d50d-4696-9b68-ec6074f1dc77.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591238998-b2397697-d50d-4696-9b68-ec6074f1dc77.png differ diff --git a/static/graph/1755591239008-8bb90521-3739-468a-9810-620c3aa45406.png b/static/graph/1755591239008-8bb90521-3739-468a-9810-620c3aa45406.png new file mode 100644 index 000000000..c6df27339 Binary files /dev/null and b/static/graph/1755591239008-8bb90521-3739-468a-9810-620c3aa45406.png differ diff --git a/static/graph/1755591243591-d5d1060e-5412-45b9-9fed-852bea2d9583.png b/static/graph/1755591243591-d5d1060e-5412-45b9-9fed-852bea2d9583.png new file mode 100644 index 000000000..74ea8cecd Binary files /dev/null and b/static/graph/1755591243591-d5d1060e-5412-45b9-9fed-852bea2d9583.png differ diff --git a/static/graph/1755591249760-e960773a-5d16-45b0-9fc3-ca04104f6a91.png b/static/graph/1755591249760-e960773a-5d16-45b0-9fc3-ca04104f6a91.png new file mode 100644 index 000000000..eebe21d00 Binary files /dev/null and b/static/graph/1755591249760-e960773a-5d16-45b0-9fc3-ca04104f6a91.png differ diff --git a/static/graph/1755591298928-e206740c-c925-448f-9887-267f3959903f.png b/static/graph/1755591298928-e206740c-c925-448f-9887-267f3959903f.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and 
b/static/graph/1755591298928-e206740c-c925-448f-9887-267f3959903f.png differ diff --git a/static/graph/1755591299462-b49ebf18-1266-4830-89d2-ed984b84ff84.png b/static/graph/1755591299462-b49ebf18-1266-4830-89d2-ed984b84ff84.png new file mode 100644 index 000000000..5db69e43b Binary files /dev/null and b/static/graph/1755591299462-b49ebf18-1266-4830-89d2-ed984b84ff84.png differ diff --git a/static/graph/1755591348634-6e742d50-830e-44ac-9dcd-0afdce02eceb.png b/static/graph/1755591348634-6e742d50-830e-44ac-9dcd-0afdce02eceb.png new file mode 100644 index 000000000..0dec02b8c Binary files /dev/null and b/static/graph/1755591348634-6e742d50-830e-44ac-9dcd-0afdce02eceb.png differ diff --git a/static/graph/1755591361883-744efdac-273b-4d79-8238-a4db875cf0c7.png b/static/graph/1755591361883-744efdac-273b-4d79-8238-a4db875cf0c7.png new file mode 100644 index 000000000..334df4e3f Binary files /dev/null and b/static/graph/1755591361883-744efdac-273b-4d79-8238-a4db875cf0c7.png differ diff --git a/static/graph/1755591363107-b610d139-419a-4cea-836a-be55ca928360.png b/static/graph/1755591363107-b610d139-419a-4cea-836a-be55ca928360.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591363107-b610d139-419a-4cea-836a-be55ca928360.png differ diff --git a/static/graph/1755591371971-1363f96a-0f25-472b-9929-374f2799f01e.png b/static/graph/1755591371971-1363f96a-0f25-472b-9929-374f2799f01e.png new file mode 100644 index 000000000..af937a031 Binary files /dev/null and b/static/graph/1755591371971-1363f96a-0f25-472b-9929-374f2799f01e.png differ diff --git a/static/graph/1755591421306-cce77bdd-6f01-4e3f-b7ea-70d2e985bc0d.png b/static/graph/1755591421306-cce77bdd-6f01-4e3f-b7ea-70d2e985bc0d.png new file mode 100644 index 000000000..fa6a38d82 Binary files /dev/null and b/static/graph/1755591421306-cce77bdd-6f01-4e3f-b7ea-70d2e985bc0d.png differ diff --git a/static/graph/1755591451238-c7783506-ae3c-49b9-be7f-a71610257ace.png 
b/static/graph/1755591451238-c7783506-ae3c-49b9-be7f-a71610257ace.png new file mode 100644 index 000000000..b0f744df4 Binary files /dev/null and b/static/graph/1755591451238-c7783506-ae3c-49b9-be7f-a71610257ace.png differ diff --git a/static/graph/1755591453805-84debc07-c1c0-44f2-8bd2-411ba2a4ee5b.png b/static/graph/1755591453805-84debc07-c1c0-44f2-8bd2-411ba2a4ee5b.png new file mode 100644 index 000000000..d946877b4 Binary files /dev/null and b/static/graph/1755591453805-84debc07-c1c0-44f2-8bd2-411ba2a4ee5b.png differ diff --git a/static/graph/1755591536807-6ca175a4-103a-4780-8115-024850aeb919.png b/static/graph/1755591536807-6ca175a4-103a-4780-8115-024850aeb919.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591536807-6ca175a4-103a-4780-8115-024850aeb919.png differ diff --git a/static/graph/1755591537107-6601a6a6-d746-4373-abbb-77896b7a68a2.png b/static/graph/1755591537107-6601a6a6-d746-4373-abbb-77896b7a68a2.png new file mode 100644 index 000000000..a5f44fd1b Binary files /dev/null and b/static/graph/1755591537107-6601a6a6-d746-4373-abbb-77896b7a68a2.png differ diff --git a/static/graph/1755591537249-9eeb87f6-1cb3-458a-879c-58fe04b25579.png b/static/graph/1755591537249-9eeb87f6-1cb3-458a-879c-58fe04b25579.png new file mode 100644 index 000000000..2660a52c0 Binary files /dev/null and b/static/graph/1755591537249-9eeb87f6-1cb3-458a-879c-58fe04b25579.png differ diff --git a/static/graph/1755591587472-328b525e-7b38-4a8f-ba15-2a59d73cd5c5.png b/static/graph/1755591587472-328b525e-7b38-4a8f-ba15-2a59d73cd5c5.png new file mode 100644 index 000000000..654398047 Binary files /dev/null and b/static/graph/1755591587472-328b525e-7b38-4a8f-ba15-2a59d73cd5c5.png differ diff --git a/static/graph/1755591588572-1344f1ed-fe8f-413f-93f7-82917c2bb412.png b/static/graph/1755591588572-1344f1ed-fe8f-413f-93f7-82917c2bb412.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and 
b/static/graph/1755591588572-1344f1ed-fe8f-413f-93f7-82917c2bb412.png differ diff --git a/static/graph/1755591590394-11aadd16-414b-4a0f-8392-8f1e90b705f4.png b/static/graph/1755591590394-11aadd16-414b-4a0f-8392-8f1e90b705f4.png new file mode 100644 index 000000000..987942b27 Binary files /dev/null and b/static/graph/1755591590394-11aadd16-414b-4a0f-8392-8f1e90b705f4.png differ diff --git a/static/graph/1755591590404-b0d9dba0-8242-4828-83c5-5d287f52d17f.png b/static/graph/1755591590404-b0d9dba0-8242-4828-83c5-5d287f52d17f.png new file mode 100644 index 000000000..a257e18de Binary files /dev/null and b/static/graph/1755591590404-b0d9dba0-8242-4828-83c5-5d287f52d17f.png differ diff --git a/static/graph/1755591590486-76f2ce41-6b9f-4491-813d-841089beb31a.png b/static/graph/1755591590486-76f2ce41-6b9f-4491-813d-841089beb31a.png new file mode 100644 index 000000000..6995f4945 Binary files /dev/null and b/static/graph/1755591590486-76f2ce41-6b9f-4491-813d-841089beb31a.png differ diff --git a/static/graph/1755591591636-292847a9-f57b-47ac-b775-b00ede04cc6f.png b/static/graph/1755591591636-292847a9-f57b-47ac-b775-b00ede04cc6f.png new file mode 100644 index 000000000..a6439a090 Binary files /dev/null and b/static/graph/1755591591636-292847a9-f57b-47ac-b775-b00ede04cc6f.png differ diff --git a/static/graph/1755591645822-001ef6ae-691f-4d1b-a773-5c6b1a166c7d.png b/static/graph/1755591645822-001ef6ae-691f-4d1b-a773-5c6b1a166c7d.png new file mode 100644 index 000000000..be8597fcb Binary files /dev/null and b/static/graph/1755591645822-001ef6ae-691f-4d1b-a773-5c6b1a166c7d.png differ diff --git a/static/graph/1755591646835-4d4e09b7-4f97-4904-9fe4-425101351793.png b/static/graph/1755591646835-4d4e09b7-4f97-4904-9fe4-425101351793.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591646835-4d4e09b7-4f97-4904-9fe4-425101351793.png differ diff --git a/static/graph/1755591653148-f982c9de-15ef-4cf2-b2e8-a0ebd46ed291.png 
b/static/graph/1755591653148-f982c9de-15ef-4cf2-b2e8-a0ebd46ed291.png new file mode 100644 index 000000000..9b3ece3e2 Binary files /dev/null and b/static/graph/1755591653148-f982c9de-15ef-4cf2-b2e8-a0ebd46ed291.png differ diff --git a/static/graph/1755591664595-ae5bdde7-5a0d-45b2-9135-c12df316caf2.png b/static/graph/1755591664595-ae5bdde7-5a0d-45b2-9135-c12df316caf2.png new file mode 100644 index 000000000..3a6c61eab Binary files /dev/null and b/static/graph/1755591664595-ae5bdde7-5a0d-45b2-9135-c12df316caf2.png differ diff --git a/static/graph/1755591665635-8b174637-4f16-426f-83ed-2e23045da935.png b/static/graph/1755591665635-8b174637-4f16-426f-83ed-2e23045da935.png new file mode 100644 index 000000000..a791853b4 Binary files /dev/null and b/static/graph/1755591665635-8b174637-4f16-426f-83ed-2e23045da935.png differ diff --git a/static/graph/1755591665805-690da66d-8f30-43d0-ab67-c1e4607737b4.png b/static/graph/1755591665805-690da66d-8f30-43d0-ab67-c1e4607737b4.png new file mode 100644 index 000000000..152e98d58 Binary files /dev/null and b/static/graph/1755591665805-690da66d-8f30-43d0-ab67-c1e4607737b4.png differ diff --git a/static/graph/1755591744911-4383871d-f60b-4fb4-8ee1-435a8ed702d0.png b/static/graph/1755591744911-4383871d-f60b-4fb4-8ee1-435a8ed702d0.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591744911-4383871d-f60b-4fb4-8ee1-435a8ed702d0.png differ diff --git a/static/graph/1755591759812-c2b7fa2a-01d3-4562-8b09-c77808435cc2.png b/static/graph/1755591759812-c2b7fa2a-01d3-4562-8b09-c77808435cc2.png new file mode 100644 index 000000000..cc3d6c2ae Binary files /dev/null and b/static/graph/1755591759812-c2b7fa2a-01d3-4562-8b09-c77808435cc2.png differ diff --git a/static/graph/1755591761309-851b2211-a9bf-4ca0-bfc0-b781e12ad5ad.png b/static/graph/1755591761309-851b2211-a9bf-4ca0-bfc0-b781e12ad5ad.png new file mode 100644 index 000000000..531ac5695 Binary files /dev/null and 
b/static/graph/1755591761309-851b2211-a9bf-4ca0-bfc0-b781e12ad5ad.png differ diff --git a/static/graph/1755591806655-d5e22d15-81ca-4104-bd60-aa7f786760f6.png b/static/graph/1755591806655-d5e22d15-81ca-4104-bd60-aa7f786760f6.png new file mode 100644 index 000000000..01dddc38e Binary files /dev/null and b/static/graph/1755591806655-d5e22d15-81ca-4104-bd60-aa7f786760f6.png differ diff --git a/static/graph/1755591845997-b32224ad-1dce-4c1a-a270-92c875647299.png b/static/graph/1755591845997-b32224ad-1dce-4c1a-a270-92c875647299.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591845997-b32224ad-1dce-4c1a-a270-92c875647299.png differ diff --git a/static/graph/1755591849163-db0d05cb-ab81-4371-864b-234443c913be.png b/static/graph/1755591849163-db0d05cb-ab81-4371-864b-234443c913be.png new file mode 100644 index 000000000..1908fc245 Binary files /dev/null and b/static/graph/1755591849163-db0d05cb-ab81-4371-864b-234443c913be.png differ diff --git a/static/graph/1755591849768-88bd0c09-65f3-460d-a79d-c4731dde9785.png b/static/graph/1755591849768-88bd0c09-65f3-460d-a79d-c4731dde9785.png new file mode 100644 index 000000000..1799321d4 Binary files /dev/null and b/static/graph/1755591849768-88bd0c09-65f3-460d-a79d-c4731dde9785.png differ diff --git a/static/graph/1755591850659-a35545c4-15a1-411b-a1ab-5d03d560b3f8.png b/static/graph/1755591850659-a35545c4-15a1-411b-a1ab-5d03d560b3f8.png new file mode 100644 index 000000000..f9043d0a9 Binary files /dev/null and b/static/graph/1755591850659-a35545c4-15a1-411b-a1ab-5d03d560b3f8.png differ diff --git a/static/graph/1755591851397-16609bd5-fe54-4c1b-a710-1ac7a94b06bb.png b/static/graph/1755591851397-16609bd5-fe54-4c1b-a710-1ac7a94b06bb.png new file mode 100644 index 000000000..4e1f7fb7f Binary files /dev/null and b/static/graph/1755591851397-16609bd5-fe54-4c1b-a710-1ac7a94b06bb.png differ diff --git a/static/graph/1755591894892-4868f3f2-0626-489f-a48b-1245c941d3e6.png 
b/static/graph/1755591894892-4868f3f2-0626-489f-a48b-1245c941d3e6.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591894892-4868f3f2-0626-489f-a48b-1245c941d3e6.png differ diff --git a/static/graph/1755591895974-f5d6a894-ca0c-4353-9322-fc6a0cd318a0.png b/static/graph/1755591895974-f5d6a894-ca0c-4353-9322-fc6a0cd318a0.png new file mode 100644 index 000000000..61979e3db Binary files /dev/null and b/static/graph/1755591895974-f5d6a894-ca0c-4353-9322-fc6a0cd318a0.png differ diff --git a/static/graph/1755591897491-7f2fd43a-9e78-458a-bb8c-95f1a9266158.png b/static/graph/1755591897491-7f2fd43a-9e78-458a-bb8c-95f1a9266158.png new file mode 100644 index 000000000..43d1907a4 Binary files /dev/null and b/static/graph/1755591897491-7f2fd43a-9e78-458a-bb8c-95f1a9266158.png differ diff --git a/static/graph/1755591972867-e737a9cc-93c6-4fb4-ae9d-8be2e98cfe04.png b/static/graph/1755591972867-e737a9cc-93c6-4fb4-ae9d-8be2e98cfe04.png new file mode 100644 index 000000000..c062f8d54 Binary files /dev/null and b/static/graph/1755591972867-e737a9cc-93c6-4fb4-ae9d-8be2e98cfe04.png differ diff --git a/static/graph/1755591976335-765c0fd8-d56c-475f-8eaa-a4858f5cccff.png b/static/graph/1755591976335-765c0fd8-d56c-475f-8eaa-a4858f5cccff.png new file mode 100644 index 000000000..c66bfd34b Binary files /dev/null and b/static/graph/1755591976335-765c0fd8-d56c-475f-8eaa-a4858f5cccff.png differ diff --git a/static/graph/1755591977995-dcc34d45-e380-428e-8f21-e7bceebf110c.png b/static/graph/1755591977995-dcc34d45-e380-428e-8f21-e7bceebf110c.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755591977995-dcc34d45-e380-428e-8f21-e7bceebf110c.png differ diff --git a/static/graph/1755591984320-49890880-ded1-4cfa-970c-77e2e0414afe.png b/static/graph/1755591984320-49890880-ded1-4cfa-970c-77e2e0414afe.png new file mode 100644 index 000000000..41e35a806 Binary files /dev/null and 
b/static/graph/1755591984320-49890880-ded1-4cfa-970c-77e2e0414afe.png differ diff --git a/static/graph/1755591985947-34d537cb-f520-46ea-8ad4-7191c48896b4.png b/static/graph/1755591985947-34d537cb-f520-46ea-8ad4-7191c48896b4.png new file mode 100644 index 000000000..b2b5578f0 Binary files /dev/null and b/static/graph/1755591985947-34d537cb-f520-46ea-8ad4-7191c48896b4.png differ diff --git a/static/graph/1755591989721-46f20cc2-2795-4fd5-b6f4-264b015de975.png b/static/graph/1755591989721-46f20cc2-2795-4fd5-b6f4-264b015de975.png new file mode 100644 index 000000000..f6b231a4a Binary files /dev/null and b/static/graph/1755591989721-46f20cc2-2795-4fd5-b6f4-264b015de975.png differ diff --git a/static/graph/1755591989810-7378c101-556d-4b7b-a471-f79359ef0166.png b/static/graph/1755591989810-7378c101-556d-4b7b-a471-f79359ef0166.png new file mode 100644 index 000000000..531ac5695 Binary files /dev/null and b/static/graph/1755591989810-7378c101-556d-4b7b-a471-f79359ef0166.png differ diff --git a/static/graph/1755592032287-17d104b9-1706-4d35-b96d-27b84e2a2288.png b/static/graph/1755592032287-17d104b9-1706-4d35-b96d-27b84e2a2288.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755592032287-17d104b9-1706-4d35-b96d-27b84e2a2288.png differ diff --git a/static/graph/1755592034234-84eecf48-6a33-4b91-8846-c83dd5b1abac.png b/static/graph/1755592034234-84eecf48-6a33-4b91-8846-c83dd5b1abac.png new file mode 100644 index 000000000..603df3a93 Binary files /dev/null and b/static/graph/1755592034234-84eecf48-6a33-4b91-8846-c83dd5b1abac.png differ diff --git a/static/graph/1755592038487-d14f2500-280e-45b1-80f6-487e67e37000.png b/static/graph/1755592038487-d14f2500-280e-45b1-80f6-487e67e37000.png new file mode 100644 index 000000000..43d1907a4 Binary files /dev/null and b/static/graph/1755592038487-d14f2500-280e-45b1-80f6-487e67e37000.png differ diff --git a/static/graph/1755592075976-4ee641c3-3507-4a9d-b760-a3acf0e3615f.png 
b/static/graph/1755592075976-4ee641c3-3507-4a9d-b760-a3acf0e3615f.png new file mode 100644 index 000000000..b8a303d04 Binary files /dev/null and b/static/graph/1755592075976-4ee641c3-3507-4a9d-b760-a3acf0e3615f.png differ diff --git a/static/graph/1755592076402-0de1627f-be18-41c3-8a0f-aaafc64f79f6.png b/static/graph/1755592076402-0de1627f-be18-41c3-8a0f-aaafc64f79f6.png new file mode 100644 index 000000000..00d89eba0 Binary files /dev/null and b/static/graph/1755592076402-0de1627f-be18-41c3-8a0f-aaafc64f79f6.png differ diff --git a/static/graph/1755592076997-b3e2a455-8f5d-450e-9f72-8e1e4883382c.png b/static/graph/1755592076997-b3e2a455-8f5d-450e-9f72-8e1e4883382c.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755592076997-b3e2a455-8f5d-450e-9f72-8e1e4883382c.png differ diff --git a/static/graph/1755592079489-0f954ae4-cf1c-4f45-a38c-a4150d9b2f1f.png b/static/graph/1755592079489-0f954ae4-cf1c-4f45-a38c-a4150d9b2f1f.png new file mode 100644 index 000000000..0cd8729c1 Binary files /dev/null and b/static/graph/1755592079489-0f954ae4-cf1c-4f45-a38c-a4150d9b2f1f.png differ diff --git a/static/graph/1755592094562-f301cd7f-7afb-4b98-9924-77a9dea305e7.png b/static/graph/1755592094562-f301cd7f-7afb-4b98-9924-77a9dea305e7.png new file mode 100644 index 000000000..bab507daa Binary files /dev/null and b/static/graph/1755592094562-f301cd7f-7afb-4b98-9924-77a9dea305e7.png differ diff --git a/static/graph/1755592096707-4bbfd81b-d16f-4f88-94c5-42c8152097a6.png b/static/graph/1755592096707-4bbfd81b-d16f-4f88-94c5-42c8152097a6.png new file mode 100644 index 000000000..bf5bdc773 Binary files /dev/null and b/static/graph/1755592096707-4bbfd81b-d16f-4f88-94c5-42c8152097a6.png differ diff --git a/static/graph/1755592139066-ccf4ce91-869b-4a53-ad74-0534deac4538.png b/static/graph/1755592139066-ccf4ce91-869b-4a53-ad74-0534deac4538.png new file mode 100644 index 000000000..e1998f7eb Binary files /dev/null and 
b/static/graph/1755592139066-ccf4ce91-869b-4a53-ad74-0534deac4538.png differ diff --git a/static/graph/1755592142550-59d5f6a2-6578-4b84-9504-42c5baef7214.png b/static/graph/1755592142550-59d5f6a2-6578-4b84-9504-42c5baef7214.png new file mode 100644 index 000000000..dc45fc466 Binary files /dev/null and b/static/graph/1755592142550-59d5f6a2-6578-4b84-9504-42c5baef7214.png differ diff --git a/static/graph/1755592142660-4d2945fa-5b73-46d6-94cf-2eabdb039d9a.png b/static/graph/1755592142660-4d2945fa-5b73-46d6-94cf-2eabdb039d9a.png new file mode 100644 index 000000000..6a80b84fa Binary files /dev/null and b/static/graph/1755592142660-4d2945fa-5b73-46d6-94cf-2eabdb039d9a.png differ diff --git a/static/graph/1755592144365-7efa021c-6874-4a34-8b22-dbe44c8abd88.png b/static/graph/1755592144365-7efa021c-6874-4a34-8b22-dbe44c8abd88.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755592144365-7efa021c-6874-4a34-8b22-dbe44c8abd88.png differ diff --git a/static/graph/1755592144442-386c2594-6867-4b9c-a886-4992014b83b7.png b/static/graph/1755592144442-386c2594-6867-4b9c-a886-4992014b83b7.png new file mode 100644 index 000000000..f2f766c6b Binary files /dev/null and b/static/graph/1755592144442-386c2594-6867-4b9c-a886-4992014b83b7.png differ diff --git a/static/graph/1755592159377-baf2a035-12bb-4e95-825e-ffd02bc02c46.png b/static/graph/1755592159377-baf2a035-12bb-4e95-825e-ffd02bc02c46.png new file mode 100644 index 000000000..3bf65ab6a Binary files /dev/null and b/static/graph/1755592159377-baf2a035-12bb-4e95-825e-ffd02bc02c46.png differ diff --git a/static/graph/1755592164223-ed815750-dcc9-4b52-9169-1464e5285e79.png b/static/graph/1755592164223-ed815750-dcc9-4b52-9169-1464e5285e79.png new file mode 100644 index 000000000..3c0852193 Binary files /dev/null and b/static/graph/1755592164223-ed815750-dcc9-4b52-9169-1464e5285e79.png differ diff --git a/static/graph/1755592164242-70177a12-1b68-4306-a6ed-b416301b3d6e.png 
b/static/graph/1755592164242-70177a12-1b68-4306-a6ed-b416301b3d6e.png new file mode 100644 index 000000000..8c5ada10b Binary files /dev/null and b/static/graph/1755592164242-70177a12-1b68-4306-a6ed-b416301b3d6e.png differ diff --git a/static/graph/1755592229183-6e880de0-ebb8-476b-9531-bbe277bbe705.png b/static/graph/1755592229183-6e880de0-ebb8-476b-9531-bbe277bbe705.png new file mode 100644 index 000000000..f2bd4a573 Binary files /dev/null and b/static/graph/1755592229183-6e880de0-ebb8-476b-9531-bbe277bbe705.png differ diff --git a/static/graph/1755592247311-ef91bb78-19ae-43f2-8e7a-701001b7759d.png b/static/graph/1755592247311-ef91bb78-19ae-43f2-8e7a-701001b7759d.png new file mode 100644 index 000000000..a56dd50c3 Binary files /dev/null and b/static/graph/1755592247311-ef91bb78-19ae-43f2-8e7a-701001b7759d.png differ diff --git a/static/graph/1755592259385-b4377a0d-7c78-4ce4-a19c-d21e6aea6290.png b/static/graph/1755592259385-b4377a0d-7c78-4ce4-a19c-d21e6aea6290.png new file mode 100644 index 000000000..35f72884c Binary files /dev/null and b/static/graph/1755592259385-b4377a0d-7c78-4ce4-a19c-d21e6aea6290.png differ diff --git a/static/graph/1755592294749-1c359609-0c5c-464f-8181-b60e6d740773.png b/static/graph/1755592294749-1c359609-0c5c-464f-8181-b60e6d740773.png new file mode 100644 index 000000000..f10d5e252 Binary files /dev/null and b/static/graph/1755592294749-1c359609-0c5c-464f-8181-b60e6d740773.png differ diff --git a/static/graph/1755608876791-a0f8c2a1-2a1f-470f-8360-20e11256b7da.png b/static/graph/1755608876791-a0f8c2a1-2a1f-470f-8360-20e11256b7da.png new file mode 100644 index 000000000..bfd354b08 Binary files /dev/null and b/static/graph/1755608876791-a0f8c2a1-2a1f-470f-8360-20e11256b7da.png differ diff --git a/static/graph/1755608897897-c9f26965-97b0-4c05-8bc1-3b8441cce90a.png b/static/graph/1755608897897-c9f26965-97b0-4c05-8bc1-3b8441cce90a.png new file mode 100644 index 000000000..10a19956c Binary files /dev/null and 
b/static/graph/1755608897897-c9f26965-97b0-4c05-8bc1-3b8441cce90a.png differ diff --git a/static/graph/1755608950376-1033c9a5-76d4-443f-a2b6-fbffb6191d56.png b/static/graph/1755608950376-1033c9a5-76d4-443f-a2b6-fbffb6191d56.png new file mode 100644 index 000000000..fdf0c72dd Binary files /dev/null and b/static/graph/1755608950376-1033c9a5-76d4-443f-a2b6-fbffb6191d56.png differ diff --git a/static/graph/1755608970990-ee92b2a7-cce4-4468-b691-d302937205ec.png b/static/graph/1755608970990-ee92b2a7-cce4-4468-b691-d302937205ec.png new file mode 100644 index 000000000..f0718bc04 Binary files /dev/null and b/static/graph/1755608970990-ee92b2a7-cce4-4468-b691-d302937205ec.png differ diff --git a/static/graph/1755608983849-35ac9d0d-55cb-48b8-a01e-1e99aceaf11d.png b/static/graph/1755608983849-35ac9d0d-55cb-48b8-a01e-1e99aceaf11d.png new file mode 100644 index 000000000..0c79593ad Binary files /dev/null and b/static/graph/1755608983849-35ac9d0d-55cb-48b8-a01e-1e99aceaf11d.png differ diff --git a/static/graph/1755609001322-7f019699-9e69-4b85-a557-5d4ec75b1d41.png b/static/graph/1755609001322-7f019699-9e69-4b85-a557-5d4ec75b1d41.png new file mode 100644 index 000000000..8e2749df3 Binary files /dev/null and b/static/graph/1755609001322-7f019699-9e69-4b85-a557-5d4ec75b1d41.png differ diff --git a/static/graph/1755609016219-5bd192fb-f3ea-4357-bc27-0ab3f8fd55f8.png b/static/graph/1755609016219-5bd192fb-f3ea-4357-bc27-0ab3f8fd55f8.png new file mode 100644 index 000000000..79ef79045 Binary files /dev/null and b/static/graph/1755609016219-5bd192fb-f3ea-4357-bc27-0ab3f8fd55f8.png differ diff --git a/static/graph/1755609028453-4a77fefb-1e77-4b11-9b70-1bdbb077826c.png b/static/graph/1755609028453-4a77fefb-1e77-4b11-9b70-1bdbb077826c.png new file mode 100644 index 000000000..ece0b38d5 Binary files /dev/null and b/static/graph/1755609028453-4a77fefb-1e77-4b11-9b70-1bdbb077826c.png differ diff --git a/static/graph/1755609039536-d72dae2f-54f7-479c-9721-f44dc664016e.png 
b/static/graph/1755609039536-d72dae2f-54f7-479c-9721-f44dc664016e.png new file mode 100644 index 000000000..07a10191a Binary files /dev/null and b/static/graph/1755609039536-d72dae2f-54f7-479c-9721-f44dc664016e.png differ diff --git a/static/graph/1755609052563-e9b6e50c-852d-48fb-b334-7e7586287a03.png b/static/graph/1755609052563-e9b6e50c-852d-48fb-b334-7e7586287a03.png new file mode 100644 index 000000000..868f35b58 Binary files /dev/null and b/static/graph/1755609052563-e9b6e50c-852d-48fb-b334-7e7586287a03.png differ diff --git a/static/graph/1755609067341-d777e122-207f-4c18-8d7b-a0ef8151b861.png b/static/graph/1755609067341-d777e122-207f-4c18-8d7b-a0ef8151b861.png new file mode 100644 index 000000000..c15cdde31 Binary files /dev/null and b/static/graph/1755609067341-d777e122-207f-4c18-8d7b-a0ef8151b861.png differ diff --git a/static/graph/1755609080630-9132bd40-3ffb-4f98-ab93-fb083c1fbfe0.png b/static/graph/1755609080630-9132bd40-3ffb-4f98-ab93-fb083c1fbfe0.png new file mode 100644 index 000000000..a0313dfd9 Binary files /dev/null and b/static/graph/1755609080630-9132bd40-3ffb-4f98-ab93-fb083c1fbfe0.png differ diff --git a/static/graph/1755609098935-d268d785-ff3c-4a2d-85c6-2db615589c9c.png b/static/graph/1755609098935-d268d785-ff3c-4a2d-85c6-2db615589c9c.png new file mode 100644 index 000000000..580712e7f Binary files /dev/null and b/static/graph/1755609098935-d268d785-ff3c-4a2d-85c6-2db615589c9c.png differ diff --git a/static/graph/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp b/static/graph/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp new file mode 100644 index 000000000..162a61502 Binary files /dev/null and b/static/graph/1756792583755-e264437e-59a4-4483-81b9-6b1b26a49279.webp differ diff --git a/static/graph/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp b/static/graph/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp new file mode 100644 index 000000000..38ada004f Binary files /dev/null and 
b/static/graph/1756792583799-ea7feea1-1279-4089-bbb9-61fc0b6331b2.webp differ diff --git a/static/graph/1756792583809-8c862fd5-1d7f-4b8b-9b34-e63d1692deb6.webp b/static/graph/1756792583809-8c862fd5-1d7f-4b8b-9b34-e63d1692deb6.webp new file mode 100644 index 000000000..bbbe0e4b0 Binary files /dev/null and b/static/graph/1756792583809-8c862fd5-1d7f-4b8b-9b34-e63d1692deb6.webp differ diff --git a/static/graph/1756792583816-33d13188-a315-4079-bb90-89c58d5a4e82.webp b/static/graph/1756792583816-33d13188-a315-4079-bb90-89c58d5a4e82.webp new file mode 100644 index 000000000..b30785ca2 Binary files /dev/null and b/static/graph/1756792583816-33d13188-a315-4079-bb90-89c58d5a4e82.webp differ diff --git a/static/graph/1756792986716-5c2105c7-cbff-458b-84c6-6d511fc360e7.webp b/static/graph/1756792986716-5c2105c7-cbff-458b-84c6-6d511fc360e7.webp new file mode 100644 index 000000000..54af8c93e Binary files /dev/null and b/static/graph/1756792986716-5c2105c7-cbff-458b-84c6-6d511fc360e7.webp differ diff --git a/static/graph/1756792986725-25a2cf58-a7a2-411e-8845-6cecb9c29acd.webp b/static/graph/1756792986725-25a2cf58-a7a2-411e-8845-6cecb9c29acd.webp new file mode 100644 index 000000000..c75c61ce4 Binary files /dev/null and b/static/graph/1756792986725-25a2cf58-a7a2-411e-8845-6cecb9c29acd.webp differ diff --git a/static/graph/1756792986743-dc6cf33f-23a2-4a70-9f2c-aa9c9b8fa2da.webp b/static/graph/1756792986743-dc6cf33f-23a2-4a70-9f2c-aa9c9b8fa2da.webp new file mode 100644 index 000000000..60e77c1c2 Binary files /dev/null and b/static/graph/1756792986743-dc6cf33f-23a2-4a70-9f2c-aa9c9b8fa2da.webp differ diff --git a/static/graph/1756792986790-9c0d8b8f-8c66-4589-b81e-15f8536956bf.webp b/static/graph/1756792986790-9c0d8b8f-8c66-4589-b81e-15f8536956bf.webp new file mode 100644 index 000000000..c0f76fe05 Binary files /dev/null and b/static/graph/1756792986790-9c0d8b8f-8c66-4589-b81e-15f8536956bf.webp differ diff --git a/static/graph/1756792986839-8eee4445-bb1b-4104-85f4-3bc15a5fc021.webp 
b/static/graph/1756792986839-8eee4445-bb1b-4104-85f4-3bc15a5fc021.webp new file mode 100644 index 000000000..4dbf97482 Binary files /dev/null and b/static/graph/1756792986839-8eee4445-bb1b-4104-85f4-3bc15a5fc021.webp differ diff --git a/static/graph/1756792987364-c77f0314-09ab-4ff3-8047-9d6d3d12b6b4.webp b/static/graph/1756792987364-c77f0314-09ab-4ff3-8047-9d6d3d12b6b4.webp new file mode 100644 index 000000000..a49e04295 Binary files /dev/null and b/static/graph/1756792987364-c77f0314-09ab-4ff3-8047-9d6d3d12b6b4.webp differ diff --git a/static/graph/1756792987443-316faf5a-6578-4faf-ba84-939d0afb0e50.webp b/static/graph/1756792987443-316faf5a-6578-4faf-ba84-939d0afb0e50.webp new file mode 100644 index 000000000..b5fe24c16 Binary files /dev/null and b/static/graph/1756792987443-316faf5a-6578-4faf-ba84-939d0afb0e50.webp differ diff --git a/static/graph/1756792987475-891e2894-bbf3-4f19-a139-243b73f59a9a.webp b/static/graph/1756792987475-891e2894-bbf3-4f19-a139-243b73f59a9a.webp new file mode 100644 index 000000000..e0bc6a12a Binary files /dev/null and b/static/graph/1756792987475-891e2894-bbf3-4f19-a139-243b73f59a9a.webp differ diff --git a/static/graph/1756792987773-faa65852-6287-4046-ab2b-b4e40fa4f88c.webp b/static/graph/1756792987773-faa65852-6287-4046-ab2b-b4e40fa4f88c.webp new file mode 100644 index 000000000..8d06deff7 Binary files /dev/null and b/static/graph/1756792987773-faa65852-6287-4046-ab2b-b4e40fa4f88c.webp differ diff --git a/static/graph/1756792988026-796ff1bd-7af5-4b78-965c-b88e3e1a27d8.webp b/static/graph/1756792988026-796ff1bd-7af5-4b78-965c-b88e3e1a27d8.webp new file mode 100644 index 000000000..3f225acc0 Binary files /dev/null and b/static/graph/1756792988026-796ff1bd-7af5-4b78-965c-b88e3e1a27d8.webp differ diff --git a/static/graph/1756793045504-5e25f725-8a31-4b38-8b68-298ddf4e49bb.webp b/static/graph/1756793045504-5e25f725-8a31-4b38-8b68-298ddf4e49bb.webp new file mode 100644 index 000000000..0da34bf51 Binary files /dev/null and 
b/static/graph/1756793045504-5e25f725-8a31-4b38-8b68-298ddf4e49bb.webp differ diff --git a/static/graph/1756793045510-afca1d07-33a7-4f6a-b1eb-1d72b042c54c.webp b/static/graph/1756793045510-afca1d07-33a7-4f6a-b1eb-1d72b042c54c.webp new file mode 100644 index 000000000..06aa3b717 Binary files /dev/null and b/static/graph/1756793045510-afca1d07-33a7-4f6a-b1eb-1d72b042c54c.webp differ diff --git a/static/graph/1756793045524-3f0e2323-fe02-40f8-97c5-592a2376b244.webp b/static/graph/1756793045524-3f0e2323-fe02-40f8-97c5-592a2376b244.webp new file mode 100644 index 000000000..fcce0b11b Binary files /dev/null and b/static/graph/1756793045524-3f0e2323-fe02-40f8-97c5-592a2376b244.webp differ diff --git a/static/graph/1756793045570-6c9d2a3e-3866-4f58-9b27-e4bb7090a40e.webp b/static/graph/1756793045570-6c9d2a3e-3866-4f58-9b27-e4bb7090a40e.webp new file mode 100644 index 000000000..620f62447 Binary files /dev/null and b/static/graph/1756793045570-6c9d2a3e-3866-4f58-9b27-e4bb7090a40e.webp differ diff --git a/static/graph/1756793045592-98897b82-65ac-4a66-a492-ed011c2ae392.webp b/static/graph/1756793045592-98897b82-65ac-4a66-a492-ed011c2ae392.webp new file mode 100644 index 000000000..cbfa5cea6 Binary files /dev/null and b/static/graph/1756793045592-98897b82-65ac-4a66-a492-ed011c2ae392.webp differ diff --git a/static/graph/1756793046206-ff65e1e4-84e7-4dcf-8342-55668d0df766.webp b/static/graph/1756793046206-ff65e1e4-84e7-4dcf-8342-55668d0df766.webp new file mode 100644 index 000000000..2586c9d09 Binary files /dev/null and b/static/graph/1756793046206-ff65e1e4-84e7-4dcf-8342-55668d0df766.webp differ diff --git a/static/graph/1756793046241-58451068-967f-43d7-b14c-9abc2fa1cfa9.webp b/static/graph/1756793046241-58451068-967f-43d7-b14c-9abc2fa1cfa9.webp new file mode 100644 index 000000000..f96c84abd Binary files /dev/null and b/static/graph/1756793046241-58451068-967f-43d7-b14c-9abc2fa1cfa9.webp differ diff --git a/static/graph/1756793046409-6698f99b-bc6f-4579-8800-3f3738e9ca98.png 
b/static/graph/1756793046409-6698f99b-bc6f-4579-8800-3f3738e9ca98.png new file mode 100644 index 000000000..0d32a95c6 Binary files /dev/null and b/static/graph/1756793046409-6698f99b-bc6f-4579-8800-3f3738e9ca98.png differ diff --git a/static/graph/1756793046696-8d2dbcab-f176-4abb-9835-8fe0332968c6.webp b/static/graph/1756793046696-8d2dbcab-f176-4abb-9835-8fe0332968c6.webp new file mode 100644 index 000000000..b2b7d4931 Binary files /dev/null and b/static/graph/1756793046696-8d2dbcab-f176-4abb-9835-8fe0332968c6.webp differ diff --git a/static/graph/1756793046697-3ad5dde4-dcce-49cc-83fc-65ee08d7f014.png b/static/graph/1756793046697-3ad5dde4-dcce-49cc-83fc-65ee08d7f014.png new file mode 100644 index 000000000..1e753596a Binary files /dev/null and b/static/graph/1756793046697-3ad5dde4-dcce-49cc-83fc-65ee08d7f014.png differ diff --git a/static/graph/1756793046841-2a3f3f47-2880-40f5-bbc0-5abf1ebe133c.webp b/static/graph/1756793046841-2a3f3f47-2880-40f5-bbc0-5abf1ebe133c.webp new file mode 100644 index 000000000..ad1876482 Binary files /dev/null and b/static/graph/1756793046841-2a3f3f47-2880-40f5-bbc0-5abf1ebe133c.webp differ diff --git a/static/graph/1756793046915-da162050-e3d7-4ce6-aaa8-ba40c06f95e1.webp b/static/graph/1756793046915-da162050-e3d7-4ce6-aaa8-ba40c06f95e1.webp new file mode 100644 index 000000000..15668fda9 Binary files /dev/null and b/static/graph/1756793046915-da162050-e3d7-4ce6-aaa8-ba40c06f95e1.webp differ diff --git a/static/graph/1756793047023-6131a211-1165-4df9-83be-0b662b27e74b.png b/static/graph/1756793047023-6131a211-1165-4df9-83be-0b662b27e74b.png new file mode 100644 index 000000000..a04dcf30e Binary files /dev/null and b/static/graph/1756793047023-6131a211-1165-4df9-83be-0b662b27e74b.png differ diff --git a/static/graph/1756793047415-7db75b95-95db-48ff-a05f-361840def4d9.png b/static/graph/1756793047415-7db75b95-95db-48ff-a05f-361840def4d9.png new file mode 100644 index 000000000..5b2027edb Binary files /dev/null and 
b/static/graph/1756793047415-7db75b95-95db-48ff-a05f-361840def4d9.png differ diff --git a/static/graph/1756793047447-72d09aea-67f0-4e9c-abdf-c8fa84cb9ced.webp b/static/graph/1756793047447-72d09aea-67f0-4e9c-abdf-c8fa84cb9ced.webp new file mode 100644 index 000000000..4f403caad Binary files /dev/null and b/static/graph/1756793047447-72d09aea-67f0-4e9c-abdf-c8fa84cb9ced.webp differ diff --git a/static/graph/1756793098173-58d6a3ba-3daf-4f05-b8e5-696bc3279d79.png b/static/graph/1756793098173-58d6a3ba-3daf-4f05-b8e5-696bc3279d79.png new file mode 100644 index 000000000..05906f92b Binary files /dev/null and b/static/graph/1756793098173-58d6a3ba-3daf-4f05-b8e5-696bc3279d79.png differ diff --git a/static/graph/1756793098209-7021c573-42ab-433d-b50f-1bc3cc0408e5.webp b/static/graph/1756793098209-7021c573-42ab-433d-b50f-1bc3cc0408e5.webp new file mode 100644 index 000000000..778b5e86c Binary files /dev/null and b/static/graph/1756793098209-7021c573-42ab-433d-b50f-1bc3cc0408e5.webp differ diff --git a/static/graph/1756793098220-80f3e4fc-2d81-412d-a716-b3ebcbc0e0c5.webp b/static/graph/1756793098220-80f3e4fc-2d81-412d-a716-b3ebcbc0e0c5.webp new file mode 100644 index 000000000..019504529 Binary files /dev/null and b/static/graph/1756793098220-80f3e4fc-2d81-412d-a716-b3ebcbc0e0c5.webp differ diff --git a/static/graph/1756793098221-c1c1d7c1-7ccf-41db-a22d-2a3636b5c92d.webp b/static/graph/1756793098221-c1c1d7c1-7ccf-41db-a22d-2a3636b5c92d.webp new file mode 100644 index 000000000..f41d6ebd8 Binary files /dev/null and b/static/graph/1756793098221-c1c1d7c1-7ccf-41db-a22d-2a3636b5c92d.webp differ diff --git a/static/graph/1756793098507-d8e6e358-b42f-4fe9-a57b-83ceb1c09294.webp b/static/graph/1756793098507-d8e6e358-b42f-4fe9-a57b-83ceb1c09294.webp new file mode 100644 index 000000000..6ee0d56d1 Binary files /dev/null and b/static/graph/1756793098507-d8e6e358-b42f-4fe9-a57b-83ceb1c09294.webp differ diff --git a/static/graph/1756793098879-738e9387-65e2-4890-b7f1-c0b5c11c0785.webp 
b/static/graph/1756793098879-738e9387-65e2-4890-b7f1-c0b5c11c0785.webp new file mode 100644 index 000000000..39d0ff200 Binary files /dev/null and b/static/graph/1756793098879-738e9387-65e2-4890-b7f1-c0b5c11c0785.webp differ diff --git a/static/graph/1756793098881-c8d63f64-a55a-4630-860e-a9ac393c8d5f.png b/static/graph/1756793098881-c8d63f64-a55a-4630-860e-a9ac393c8d5f.png new file mode 100644 index 000000000..6aaf3e97e Binary files /dev/null and b/static/graph/1756793098881-c8d63f64-a55a-4630-860e-a9ac393c8d5f.png differ diff --git a/static/graph/1756793098892-0b69647c-ebfa-414a-b035-ee3f968eb739.webp b/static/graph/1756793098892-0b69647c-ebfa-414a-b035-ee3f968eb739.webp new file mode 100644 index 000000000..f403dfabc Binary files /dev/null and b/static/graph/1756793098892-0b69647c-ebfa-414a-b035-ee3f968eb739.webp differ diff --git a/static/graph/1756793098971-c35b69f1-c697-4c03-b0ab-63b07b3056a7.png b/static/graph/1756793098971-c35b69f1-c697-4c03-b0ab-63b07b3056a7.png new file mode 100644 index 000000000..06f4d01fb Binary files /dev/null and b/static/graph/1756793098971-c35b69f1-c697-4c03-b0ab-63b07b3056a7.png differ diff --git a/static/graph/1756793099252-b0c3eda1-6c4a-436c-8fdb-d5dc4d512b07.webp b/static/graph/1756793099252-b0c3eda1-6c4a-436c-8fdb-d5dc4d512b07.webp new file mode 100644 index 000000000..87c1f5629 Binary files /dev/null and b/static/graph/1756793099252-b0c3eda1-6c4a-436c-8fdb-d5dc4d512b07.webp differ diff --git a/static/graph/1756793099460-84ce2b9f-b039-4f4f-92a3-f190fffe0783.webp b/static/graph/1756793099460-84ce2b9f-b039-4f4f-92a3-f190fffe0783.webp new file mode 100644 index 000000000..35327a239 Binary files /dev/null and b/static/graph/1756793099460-84ce2b9f-b039-4f4f-92a3-f190fffe0783.webp differ diff --git a/static/graph/1756793099506-8abd75fa-9af7-45ca-aa38-daa2dbf283c1.webp b/static/graph/1756793099506-8abd75fa-9af7-45ca-aa38-daa2dbf283c1.webp new file mode 100644 index 000000000..02357faf7 Binary files /dev/null and 
b/static/graph/1756793099506-8abd75fa-9af7-45ca-aa38-daa2dbf283c1.webp differ diff --git a/static/graph/1756793099539-13301d00-9356-4a21-8bb2-93a2b2bcba65.webp b/static/graph/1756793099539-13301d00-9356-4a21-8bb2-93a2b2bcba65.webp new file mode 100644 index 000000000..e41f59d1d Binary files /dev/null and b/static/graph/1756793099539-13301d00-9356-4a21-8bb2-93a2b2bcba65.webp differ diff --git a/static/graph/1756793099797-57cd44ca-c0c2-4ac7-bee7-ee15c954a0ec.webp b/static/graph/1756793099797-57cd44ca-c0c2-4ac7-bee7-ee15c954a0ec.webp new file mode 100644 index 000000000..b3aa0c9b9 Binary files /dev/null and b/static/graph/1756793099797-57cd44ca-c0c2-4ac7-bee7-ee15c954a0ec.webp differ diff --git a/static/graph/1756793099804-10bdba95-7acf-4c2e-b761-35fc69e9d433.webp b/static/graph/1756793099804-10bdba95-7acf-4c2e-b761-35fc69e9d433.webp new file mode 100644 index 000000000..d64196c42 Binary files /dev/null and b/static/graph/1756793099804-10bdba95-7acf-4c2e-b761-35fc69e9d433.webp differ diff --git a/static/graph/1756793100051-eedd3155-4a9a-4da9-914e-dfb51052a80b.webp b/static/graph/1756793100051-eedd3155-4a9a-4da9-914e-dfb51052a80b.webp new file mode 100644 index 000000000..e9d7d4e8c Binary files /dev/null and b/static/graph/1756793100051-eedd3155-4a9a-4da9-914e-dfb51052a80b.webp differ diff --git a/static/graph/1756793100138-fcedef5b-fba1-4134-93da-a097aa788b51.webp b/static/graph/1756793100138-fcedef5b-fba1-4134-93da-a097aa788b51.webp new file mode 100644 index 000000000..1f4bb125f Binary files /dev/null and b/static/graph/1756793100138-fcedef5b-fba1-4134-93da-a097aa788b51.webp differ diff --git a/static/graph/1756793183950-2d44a88b-b840-49f1-998f-c883fc7ceb09.webp b/static/graph/1756793183950-2d44a88b-b840-49f1-998f-c883fc7ceb09.webp new file mode 100644 index 000000000..9beffe1cb Binary files /dev/null and b/static/graph/1756793183950-2d44a88b-b840-49f1-998f-c883fc7ceb09.webp differ diff --git a/static/graph/1756793183977-5952392b-e986-470e-9636-15b7737547a8.webp 
b/static/graph/1756793183977-5952392b-e986-470e-9636-15b7737547a8.webp new file mode 100644 index 000000000..b6ef9ca50 Binary files /dev/null and b/static/graph/1756793183977-5952392b-e986-470e-9636-15b7737547a8.webp differ diff --git a/static/graph/1756793183977-6437fe99-797d-470b-812f-bd48a39cd685.webp b/static/graph/1756793183977-6437fe99-797d-470b-812f-bd48a39cd685.webp new file mode 100644 index 000000000..df4f9a6c7 Binary files /dev/null and b/static/graph/1756793183977-6437fe99-797d-470b-812f-bd48a39cd685.webp differ diff --git a/static/graph/1756793184025-3b077813-e7af-45d2-8ead-e5764bddf94f.webp b/static/graph/1756793184025-3b077813-e7af-45d2-8ead-e5764bddf94f.webp new file mode 100644 index 000000000..8ecb1d638 Binary files /dev/null and b/static/graph/1756793184025-3b077813-e7af-45d2-8ead-e5764bddf94f.webp differ diff --git a/static/graph/1756793184030-ee4463dc-9aeb-4fd1-beb6-0cdfd088d8ce.webp b/static/graph/1756793184030-ee4463dc-9aeb-4fd1-beb6-0cdfd088d8ce.webp new file mode 100644 index 000000000..d57063ef7 Binary files /dev/null and b/static/graph/1756793184030-ee4463dc-9aeb-4fd1-beb6-0cdfd088d8ce.webp differ diff --git a/static/graph/1756793184729-9360feab-ba39-4c29-b1a7-2fe1c863a0b2.png b/static/graph/1756793184729-9360feab-ba39-4c29-b1a7-2fe1c863a0b2.png new file mode 100644 index 000000000..7fe72dd9b Binary files /dev/null and b/static/graph/1756793184729-9360feab-ba39-4c29-b1a7-2fe1c863a0b2.png differ diff --git a/static/graph/1756793184816-f3a57b58-9324-4dc3-9c52-f7e1b6bee163.png b/static/graph/1756793184816-f3a57b58-9324-4dc3-9c52-f7e1b6bee163.png new file mode 100644 index 000000000..01d50fcb2 Binary files /dev/null and b/static/graph/1756793184816-f3a57b58-9324-4dc3-9c52-f7e1b6bee163.png differ diff --git a/static/graph/1756793184833-dd7ab588-8ff3-480f-9b96-d0aaadc23d30.png b/static/graph/1756793184833-dd7ab588-8ff3-480f-9b96-d0aaadc23d30.png new file mode 100644 index 000000000..3d54f5b59 Binary files /dev/null and 
b/static/graph/1756793184833-dd7ab588-8ff3-480f-9b96-d0aaadc23d30.png differ diff --git a/static/graph/1756793184848-b374e007-b237-48fe-b346-5b5b9c8c09cd.webp b/static/graph/1756793184848-b374e007-b237-48fe-b346-5b5b9c8c09cd.webp new file mode 100644 index 000000000..fe97adbf1 Binary files /dev/null and b/static/graph/1756793184848-b374e007-b237-48fe-b346-5b5b9c8c09cd.webp differ diff --git a/versions/version-current/docs-cn/source/1.guide.md b/versions/version-current/docs-cn/source/1.guide.md index 64ece56b7..7d7f14b4b 100644 --- a/versions/version-current/docs-cn/source/1.guide.md +++ b/versions/version-current/docs-cn/source/1.guide.md @@ -11,7 +11,7 @@ ## 介绍 -GeaFlow 是蚂蚁集团开源的流图计算引擎,支持万亿级图存储、图表混合处理、实时图计算、交互式图分析等核心能力,目前广泛应用于数仓加速、金融风控、知识图谱以及社交网络等场景。 +Apache GeaFlow(孵化中)是一个分布式流图计算引擎,最初由蚂蚁集团发起,支持万亿级图存储、图表混合处理、实时图计算、交互式图分析等核心能力,目前广泛应用于数仓加速、金融风控、知识图谱以及社交网络等场景。 关于 GeaFlow 更多介绍请参考:[GeaFlow 介绍文档](2.introduction.md)