Very much needed.
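Topic: Should the full change history of resource files be recorded, or at least the most recent few changes? That would give the AI some historical information about how the data was modified.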
Would incremental updates consider applying a Unified Diff file directly as a patch?
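Thanks @myysy for this high-quality RFC; the design is very thorough. As a heavy user running 400+ sessions daily (and the contributor of PR #297 cold/hot lifecycle and #322 is_healthy), here are a few thoughts from hands-on experience.
1. On open question #4: session-resource version binding
This is the point I care about most. In my setup, scheduled jobs trigger resource updates while multiple agent sessions are retrieving. If an update lands while a session is mid-flight, the content that session references may become inconsistent. I suggest introducing a lightweight version-binding mechanism:
This can reuse the hotness concept from PR #297: an old version still referenced by a session is treated as hot, and one with no references is demoted to cold and becomes a GC candidate. The implementation could start with a TTL fallback (for example, keep old versions for 30 min), with reference counting as a later optimization.
2. Adding to @fyp711's discussion of Unified Diff
I agree with myysy's assessment: the cascading L0/L1 dependencies at the directory level make a pure patch mode insufficient. From another angle, though, Unified Diff and Bubble-Up can complement each other: Diff handles "quickly locating changes", and Bubble-Up handles "cascading summary recomputation". Concretely, the diff stage could introduce Merkle-tree-style directory-level hashes: walk top-down, quickly skip unchanged subtrees, and recurse to the file level only for the changed parts. This is exactly the opposite direction from Bubble-Up's bottom-up recomputation, and combining the two could bring diff plus summary updates for large repositories down from O(n) to O(changed).
3. Suggestions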
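RFC: Incremental Resource Update
Summary
Today, when OpenViking runs add_resource (adding/indexing a resource), it often has to re-parse, re-summarize, and re-embed the entire resource. For large repositories or document collections, this incurs significant cost and waiting time.
This RFC proposes that when a user runs add_resource again for the same target resource URI, the system automatically switches to an "incremental update" mode: identify changes via content hashes, reuse the existing summaries and vectors of unchanged files, regenerate summaries and vectors only for new or changed content, and publish the new version atomically so that no "partially updated" intermediate state is ever visible.
This RFC is intended for community discussion and focuses on behavioral constraints, key mechanisms, failure semantics, and open questions.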
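Motivation and value
Goals
add_resource triggers an incremental update rather than a full rebuild.
Non-goals (explicitly out of scope for this proposal)
.meta) to store hashes (can be a follow-up optimization).
Terminology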
viking://resources/my-repo.
viking://temp/my-repo_update_<timestamp>.
.abstract.md: the directory's L0 summary (more abstract, shorter).
.overview.md: the directory's L1 overview (more structured, more expanded).
User experience (CLI / API)
CLI behavior (example)
--wait support:
This RFC recommends the following semantics for --wait:
HTTP API (contract-level)
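POST /resources/add
Request body (JSON):
path: local path or URL
target: target resource URI; if it already exists, an update is triggered
reason: reason for the update (optional; useful for auditing/tracing)
instruction: processing instruction (optional)
wait: whether to wait for enqueue confirmation (default false)
timeout: wait timeout in seconds (optional)
Suggested responses:
200: { status: "success", result: { root_uri, queue_status? } }
409: resource lock conflict (an update for the same URI is already in progress)
400: invalid parameters
500: internal error
Core design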
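1) Overall flow (high level)
For a single request that updates the same target URI, the recommended lifecycle is:
2) Change detection (diff)
Detection dimensions:
Classification rules:
Hashing should read files in chunks so that large files are handled without loading them fully into memory.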
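A minimal sketch of chunked hashing, assuming SHA-256 and a 1 MiB chunk size. The RFC names neither a hash algorithm nor a chunk size, and `file_sha256` is a hypothetical helper, not an OpenViking API.

```python
import hashlib
from pathlib import Path

# Assumed chunk size; the RFC only says "chunked", not how large.
CHUNK_SIZE = 1024 * 1024


def file_sha256(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays bounded by the chunk size regardless of file size, which is the property the sentence above asks for.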
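3) Summary and vector reuse strategy
Note: the directory-level .abstract.md / .overview.md are files persisted on disk; for unchanged directories they can be reused by copying both files from the previous version into the staging area, avoiding extra LLM calls.
4) Directory summary updates: changes bubble up
Rules:
.abstract.md and .overview.md must be regenerated.
.abstract.md and .overview.md.
An approach that is easier to implement: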
5) Concurrency and locking
Lock granularity: per resource URI (for example, viking://resources/my-repo).
Suggested lock mechanism:
.resource.lock)
Lock coverage:
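6) Atomic publish and consistency
Publishing must, as far as possible, guarantee that the externally visible version is either the old version or the new version, never a mix.
Recommended strategy:
Trade-offs of this ordering: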
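7) Failure semantics and recovery
Fail-fast principle:
Special failure: the filesystem swap succeeds, then the index update fails
.corrupted marker file
8) Observability (recommended)
To help the community locate problems and measure the benefit, at minimum the following should be provided: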
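Data and storage layout (conceptual)
Resource directory (example):
Context/vector records (conceptual):
The abstract field can store file summaries (to enable reuse)
parent_uri is used to reconstruct the hierarchy
content_hash can serve as a future optimization field (this iteration can compute and compare hashes on the fly)
Compatibility and migration
add_resource entry point supports both creation and incremental update.
Security and permissions (discussion points)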
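Open questions (for community discussion)
Do the final semantics of --wait need to be extended?
--wait=complete)
Discussion prompts (community feedback welcome)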
English Version
RFC: Incremental Resource Update
Summary
Today, running add_resource in OpenViking often implies re-processing an entire resource: parsing, summarization, and embedding. For large repositories or document trees, this is expensive and slow.
This RFC proposes that when a user calls add_resource again for an already-existing target resource URI, the system switches to an incremental update mode: detect changes via content hashing, reuse existing summaries and vectors for unchanged files, re-summarize and re-embed only new/changed content, and publish the new version atomically to avoid partially-visible updates.
This document is written for community discussion and includes the key behaviors, constraints, failure semantics, and open questions directly in the RFC.
Motivation
Goals
add_resource performs an incremental update instead of a full rebuild.
Non-goals
.meta) in this iteration (can be a follow-up optimization).
Terminology
viking://resources/my-repo.
viking://temp/my-repo_update_<timestamp>.
.abstract.md: L0 directory summary (shorter, more abstract)
.overview.md: L1 directory overview (more structured/expanded)
UX (CLI / API)
CLI behavior (example)
--wait support:
This RFC recommends --wait semantics as:
HTTP API (contract-level)
POST /resources/add
Request body (JSON):
path: local path or URL
target: target resource URI; if it exists, triggers update
reason: optional reason for add/update (audit/traceability)
instruction: optional processing instructions
wait: boolean, default false
timeout: optional seconds for the wait operation
Recommended responses:
200: { status: "success", result: { root_uri, queue_status? } }
409: resource lock conflict (an update is already in progress)
400: invalid request
500: internal error
Proposed Design
1) End-to-end lifecycle (high level)
For an update request targeting an existing resource URI:
2) Diff calculation
Dimensions:
Classification:
Hashing should be streaming/chunked for large files.
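Once per-file hashes exist for the old and new trees, classification reduces to a comparison of two maps. The sketch below is an illustration under assumptions: the four category names (`added`, `modified`, `deleted`, `unchanged`) and the `classify_changes` helper are hypothetical, since the RFC's own classification list is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiffResult:
    """Relative paths grouped by how they changed (category names are assumed)."""
    added: list[str] = field(default_factory=list)
    modified: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


def classify_changes(old_hashes: dict[str, str], new_hashes: dict[str, str]) -> DiffResult:
    """Compare {relative_path: content_hash} maps of the old and new trees."""
    result = DiffResult()
    for rel_path, new_hash in sorted(new_hashes.items()):
        old_hash = old_hashes.get(rel_path)
        if old_hash is None:
            result.added.append(rel_path)        # only in the new tree
        elif old_hash == new_hash:
            result.unchanged.append(rel_path)    # summary and vector can be reused
        else:
            result.modified.append(rel_path)     # needs re-summarize and re-embed
    # Paths present before but missing now.
    result.deleted = sorted(set(old_hashes) - set(new_hashes))
    return result
```

Only `added` and `modified` paths would need new summaries and embeddings; `unchanged` paths are the reuse candidates described in the next section.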
3) Summary and vector reuse
Directory summaries are persisted as files (.abstract.md, .overview.md). For unchanged directories, copying these files from the previous version into staging enables reuse without LLM calls.
4) Directory summary regeneration: recursive bubble-up
Rules:
.abstract.md and .overview.md.
.abstract.md and .overview.md.
Practical approach:
5) Concurrency and locking
Lock scope: per resource URI (e.g., viking://resources/my-repo).
Recommended lock mechanism:
.resource.lock) at the resource root
Lock duration:
6) Atomic publish and consistency
Publish should make the externally visible state either “old version” or “new version”, not a mix.
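As a rough illustration of the "old or new, never a mix" goal, here is a sketch that uses local directories as a stand-in for the resource store. Everything in it is an assumption except the filesystem-then-index ordering and the .corrupted marker name, which come from the failure-semantics section of this RFC. `publish`, the `.old` backup suffix, and the `activate_index` callback are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Callable

CORRUPTED_MARKER = ".corrupted"  # marker file name used by this RFC


def publish(staging: Path, target: Path, activate_index: Callable[[], None]) -> None:
    """Swap staging into place, then activate the index (hypothetical callback)."""
    backup = target.with_name(target.name + ".old")  # assumed backup location
    had_old = target.exists()
    if had_old:
        target.rename(backup)          # move the old version aside
    try:
        staging.rename(target)         # new version becomes visible
    except OSError:
        if had_old:
            backup.rename(target)      # roll back to the old version
        raise
    try:
        activate_index()
    except Exception:
        # Files are new but the index is stale: flag it and keep the backup
        # so an operator or a recovery job can decide what to do.
        (target / CORRUPTED_MARKER).write_text("index activation failed after swap\n")
        raise
    if had_old:
        shutil.rmtree(backup)          # success: drop the old version
```

Two renames are not one atomic operation: there is a brief window in which `target` does not exist, so a real implementation would need the resource lock, or a single pointer/symlink flip, to hide that window from readers.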
Recommended strategy:
Trade-offs:
7) Failure semantics and recovery
Fail-fast principle:
Special failure: filesystem swap succeeds, then vector index activation fails
.corrupted marker file at the resource root
8) Observability (recommended)
Minimum recommended signals:
Data and storage layout (conceptual)
Resource directory:
Context/vector records (conceptual):
abstract can store file summaries for reuse
parent_uri reconstructs hierarchy
content_hash is a potential future optimization field (this iteration can hash on the fly)
Compatibility
add_resource handles both create and update.
Security and permissions (discussion points)
Open questions for the community
--wait support additional modes beyond “enqueued” (e.g., --wait=complete)?
Feedback prompts