Implement blkreader crate for direct block device reading via FIEMAP #1

Merged
SF-Zhou merged 6 commits into main from
copilot/implement-blkreader-crate
Jan 17, 2026

Copilot AI commented Jan 17, 2026

  • Create Cargo.toml with library and CLI binary targets
  • Implement core library structure:
    • Define Options struct with configuration options
    • Define State struct with read results and extent info
    • Define BlkReader trait with blk_read_at and blk_read_at_opt methods
    • Use blkmap crate for extent query via FIEMAP ioctl
    • Use blkpath crate for block device discovery
    • Implement global block device cache (resolve_device called inside cache)
    • Implement direct I/O reading from block device
    • Implement trait for Path, PathBuf and File types
  • Implement CLI tool:
    • Parse command line arguments
    • Support --offset, --length, -v, -O options
    • Request sudo permissions (deferred until needed)
    • Output to console or file
    • Handle Direct I/O alignment (buffer, offset, length)
    • Support chunked reading for large files
  • Add comprehensive documentation:
    • Module and crate-level docs
    • README.md with usage examples
    • Document Direct I/O alignment requirements
  • Add unit tests
  • Simplify GitHub CI workflow
  • Run code review and security checks
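
A rough sketch of how the pieces in this checklist might fit together. Only the names `Options`, `State`, `BlkReader`, `blk_read_at`, `blk_read_at_opt`, and `zero_unwritten` come from this thread; every field name, the `Extent` type, and the in-memory `MemFile` implementor are assumptions made for illustration and do not reflect the merged code.

```rust
use std::io;
use std::path::PathBuf;

/// Hypothetical option set. Apart from `zero_unwritten`, field names are guesses.
#[derive(Debug, Clone, Default)]
pub struct Options {
    pub enable_cache: bool,
    pub fill_holes: bool,
    pub zero_unwritten: bool,
    pub allow_fallback: bool,
}

/// Hypothetical extent record: logical range mapped to a physical range.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Extent {
    pub logical: u64,
    pub physical: u64,
    pub length: u64,
    pub unwritten: bool,
}

/// Hypothetical result: device path, extents consulted, bytes read.
#[derive(Debug, Clone, Default)]
pub struct State {
    pub block_device: PathBuf,
    pub extents: Vec<Extent>,
    pub bytes_read: usize,
}

pub trait BlkReader {
    /// The detailed entry point.
    fn blk_read_at_opt(&self, buf: &mut [u8], offset: u64, opt: &Options) -> io::Result<State>;

    /// The simple entry point delegates to the detailed one with default options.
    fn blk_read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize> {
        Ok(self.blk_read_at_opt(buf, offset, &Options::default())?.bytes_read)
    }
}

/// Toy in-memory implementor, used only to exercise the trait shape.
/// The real crate implements the trait for Path, PathBuf and File.
pub struct MemFile(pub Vec<u8>);

impl BlkReader for MemFile {
    fn blk_read_at_opt(&self, buf: &mut [u8], offset: u64, _opt: &Options) -> io::Result<State> {
        let start = (offset as usize).min(self.0.len());
        let n = buf.len().min(self.0.len() - start);
        buf[..n].copy_from_slice(&self.0[start..start + n]);
        Ok(State { bytes_read: n, ..State::default() })
    }
}
```

Keeping the simple method as a default method on the trait means each implementor only has to provide the detailed one.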

Changes from code review:

  1. Cache refactoring: get_or_create_cached_device now takes &File and calls resolve_device() internally, so the expensive path resolution is only done when the device is not in cache.

  2. Fallback returns extents: State::fallback() now takes extents parameter, so extent information is available even when using regular file I/O.

  3. Renamed fill_unwritten to zero_unwritten: Better reflects the behavior - when disabled (default), unwritten extents are read from block device returning raw data. When enabled, they're filled with zeros.
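
The cache refactoring in item 1 can be pictured roughly as follows. This is a minimal sketch under assumptions: `DeviceCache`, `CachedDevice`, and the closure-based resolver are hypothetical stand-ins for `get_or_create_cached_device` and `resolve_device()`, and the opened device file that a real cache entry would hold is omitted.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Hypothetical cache entry; a real one would also keep the opened device.
#[derive(Debug)]
pub struct CachedDevice {
    pub path: PathBuf,
}

/// Hypothetical cache keyed by the filesystem device id (`st_dev`).
#[derive(Default)]
pub struct DeviceCache {
    inner: Mutex<HashMap<u64, Arc<CachedDevice>>>,
}

impl DeviceCache {
    /// Runs `resolve` only on a cache miss, so the expensive path
    /// resolution is skipped whenever the device id is already known.
    pub fn get_or_create<F>(&self, dev: u64, resolve: F) -> io::Result<Arc<CachedDevice>>
    where
        F: FnOnce() -> io::Result<PathBuf>,
    {
        let mut map = self.inner.lock().unwrap();
        if let Some(hit) = map.get(&dev) {
            return Ok(Arc::clone(hit));
        }
        let entry = Arc::new(CachedDevice { path: resolve()? });
        map.insert(dev, Arc::clone(&entry));
        Ok(entry)
    }

    /// Variant that takes `&File`, derives the key from its metadata,
    /// and defers resolution to the miss path.
    pub fn get_or_create_for_file<F>(&self, file: &File, resolve: F) -> io::Result<Arc<CachedDevice>>
    where
        F: FnOnce(&File) -> io::Result<PathBuf>,
    {
        let dev = file.metadata()?.dev();
        self.get_or_create(dev, || resolve(file))
    }
}
```

Holding the lock across the resolver keeps the sketch short but serializes concurrent misses; whether the merged code does the same is not stated in this thread.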

Original prompt

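I need a complete, reliable, industrial-grade, publicly releasable Rust crate named blkreader. Given a path or an fd, it locates the corresponding block device, queries the file's extent information for the given offset and length, and then uses that extent information to read directly from the block device with direct I/O.

  1. Scenario: consider an I/O pattern in which, before every write, fallocate + fdatasync has already been used to reserve a complete region of storage, and the reserved extent information is confirmed to be persisted on disk. All subsequent direct I/O is then guaranteed to land inside those extents. However, the "written" state of an extent may be lost if it was not persisted in time. We still know that the block device holds the real, reliable data, so if I track the written length in some other reliable place, I can always read the original data back from the block device;
  2. blkreader is the crate that serves this purpose. It implements a trait that provides two methods: a simple one, fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize>, and a detailed one, fn read_at_opt(&self, buf: &mut [u8], offset: u64, opt: &Options) -> io::Result<State>. The returned State must include the block device path, the detailed extent information, and the number of bytes successfully read. The input Options include whether to enable the global block device cache, whether to zero-fill holes in the extents, and so on. The simple interface is implemented by calling the detailed one. The trait must be implemented for the Path-related types and for File. You may reconsider the exact function and type names;
  3. About the global block device cache: since every file on the same filesystem maps to the same block device, a global hash table can be keyed by file.metadata()?.dev() and cache the corresponding block device path together with the opened file object, which gives better performance for frequent reads;
  4. About zero-filling extent holes: split this into two options, one for zero-filling real holes and one for zero-filling UNWRITTEN extents. If either case is encountered while the corresponding option is disabled, treat it as EOF and return early;
  5. Also add a read fallback option: if the queried extents fully cover the requested range and none of them is unwritten, then with this option enabled the regular file read_at may be called directly, which avoids requiring root privileges;
  6. Also provide an installable CLI tool that reads a file directly with blkreader PATH --offset OFFSET --length LENGTH, where the latter two are optional. On startup the tool should obtain sudo privileges using a suitable library. Provide a -v verbose option that prints richer information, including the block device path and the physical address ranges corresponding to the logical address ranges. Provide a -o OUTPUT option; when it is omitted, output goes to the console;
  7. Provide thorough doc comments, a README.md, complete unit tests, and a GitHub CI pipeline. In short, it must reach an industrial-grade, publicly releasable standard.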
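
Items 4 and 5 of the prompt describe a decision procedure over the extent list. The sketch below is one way to express it, not the crate's code: `Extent`, `Segment`, `plan_read`, and `can_fallback` are hypothetical names. For unwritten extents it follows the behavior described for `zero_unwritten` in the code-review notes (raw device read when disabled, zeros when enabled) rather than the early-EOF rule in item 4, and it assumes extents are sorted by logical offset. Space after the last extent is not filled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Extent {
    pub logical: u64,
    pub physical: u64,
    pub length: u64,
    pub unwritten: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Segment {
    /// Read `len` bytes from the block device at `physical`.
    Device { physical: u64, len: u64 },
    /// Produce `len` zero bytes without touching the device.
    Zero { len: u64 },
}

/// True when [offset, offset + len) is fully covered by written extents,
/// i.e. a regular file read would return the same data.
pub fn can_fallback(extents: &[Extent], offset: u64, len: u64) -> bool {
    let end = offset + len;
    let mut pos = offset;
    for e in extents {
        if pos >= end {
            break;
        }
        let e_end = e.logical + e.length;
        if e_end <= pos {
            continue; // extent lies entirely before the cursor
        }
        if e.logical > pos || e.unwritten {
            return false; // hole or unwritten extent inside the range
        }
        pos = e_end;
    }
    pos >= end
}

/// Turns a logical range into device reads and zero fills.
/// A hole with `fill_holes == false` ends the plan early (treated as EOF).
pub fn plan_read(
    extents: &[Extent],
    offset: u64,
    len: u64,
    fill_holes: bool,
    zero_unwritten: bool,
) -> Vec<Segment> {
    let end = offset + len;
    let mut pos = offset;
    let mut plan = Vec::new();
    for e in extents {
        if pos >= end {
            break;
        }
        let e_end = e.logical + e.length;
        if e_end <= pos {
            continue;
        }
        if e.logical > pos {
            if !fill_holes {
                return plan;
            }
            let gap = e.logical.min(end) - pos;
            plan.push(Segment::Zero { len: gap });
            pos += gap;
            if pos >= end {
                break;
            }
        }
        let n = e_end.min(end) - pos;
        if e.unwritten && zero_unwritten {
            plan.push(Segment::Zero { len: n });
        } else {
            plan.push(Segment::Device { physical: e.physical + (pos - e.logical), len: n });
        }
        pos += n;
    }
    plan
}
```

Separating planning from I/O keeps the hole and unwritten handling testable without root access or a real block device.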


Copilot AI and others added 2 commits January 17, 2026 11:42
Co-authored-by: SF-Zhou <7477599+SF-Zhou@users.noreply.github.com>
Co-authored-by: SF-Zhou <7477599+SF-Zhou@users.noreply.github.com>
Copilot AI changed the title [WIP] Add blkreader crate for direct IO access Implement blkreader crate for direct block device reading via FIEMAP Jan 17, 2026
Copilot AI requested a review from SF-Zhou January 17, 2026 11:50

SF-Zhou commented Jan 17, 2026

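@copilot Note that the CLI tool must handle direct I/O alignment for memory, offset, and length. Files may also be too large to read in one pass, so you need to read them incrementally in batches. Add the alignment requirements to the documentation of the crate's interface. In addition, the current CI pipeline is split too finely and can be simplified. Here is another CI pipeline for reference: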

name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

env:
  CARGO_TERM_COLOR: always

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Build & Run
      run: |
        cargo fmt --all -- --check
        cargo clippy -- -D warnings
        cargo test
        cargo run --release -- README.md

Co-authored-by: SF-Zhou <7477599+SF-Zhou@users.noreply.github.com>

Copilot AI commented Jan 17, 2026

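@copilot Note that the CLI tool must handle direct I/O alignment for memory, offset, and length. Files may also be too large to read in one pass, so you need to read them incrementally in batches. c...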

Fixed in 87b6d91:

  1. Direct I/O alignment: CLI now handles buffer alignment (512 bytes), and adjusts offsets/lengths to aligned boundaries automatically using align_down/align_up helpers.

  2. Chunked reading: Large files are read in 1MB chunks to avoid memory issues. Each chunk uses an aligned buffer allocated via alloc_aligned_buffer.

  3. Documentation: Added "Direct I/O Alignment Requirements" section to lib.rs, reader.rs trait docs, and README.md explaining the 512-byte alignment requirements.

  4. Simplified CI: Consolidated to single job matching your template.
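
For readers unfamiliar with the alignment handling in items 1 and 2, here is a minimal sketch of what such helpers could look like. The names `align_down` and `align_up` appear in the comment above; their signatures, the `AlignedBuf` type (a safe stand-in for whatever `alloc_aligned_buffer` does), and `chunk_windows` are assumptions. The 512-byte alignment and 1 MB chunk size are taken from the comment.

```rust
/// Alignment and chunk size quoted in the comment above.
pub const ALIGN: u64 = 512;
pub const CHUNK: u64 = 1 << 20;

/// Rounds `v` down to a multiple of `align` (must be a power of two).
pub fn align_down(v: u64, align: u64) -> u64 {
    v & !(align - 1)
}

/// Rounds `v` up to a multiple of `align` (must be a power of two).
pub fn align_up(v: u64, align: u64) -> u64 {
    (v + align - 1) & !(align - 1)
}

/// Hypothetical aligned buffer: over-allocates a Vec and exposes the
/// sub-slice whose start address is a multiple of `align`.
pub struct AlignedBuf {
    raw: Vec<u8>,
    start: usize,
    len: usize,
}

impl AlignedBuf {
    pub fn new(len: usize, align: usize) -> Self {
        assert!(align.is_power_of_two());
        let raw = vec![0u8; len + align];
        let addr = raw.as_ptr() as usize;
        let start = (align - addr % align) % align;
        AlignedBuf { raw, start, len }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.raw[self.start..self.start + self.len]
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.raw[self.start..self.start + self.len]
    }
}

/// Splits an arbitrary (offset, length) request into aligned
/// (offset, length) windows of at most CHUNK bytes each. The caller
/// trims the first and last window back to the requested range.
pub fn chunk_windows(offset: u64, length: u64) -> Vec<(u64, u64)> {
    let start = align_down(offset, ALIGN);
    let end = align_up(offset + length, ALIGN);
    let mut out = Vec::new();
    let mut pos = start;
    while pos < end {
        let n = CHUNK.min(end - pos);
        out.push((pos, n));
        pos += n;
    }
    out
}
```

Because `CHUNK` is itself a multiple of `ALIGN`, every window starts and ends on an aligned boundary, so one `AlignedBuf` of `CHUNK` bytes can be reused for all chunks.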


@SF-Zhou SF-Zhou left a comment

@copilot Please continue adjusting the code according to the comments.

…rename zero_unwritten

Co-authored-by: SF-Zhou <7477599+SF-Zhou@users.noreply.github.com>

Copilot AI commented Jan 17, 2026

@copilot Please continue adjusting the code according to the comments.

All three review comments addressed in 042e3da.

Copilot AI requested a review from SF-Zhou January 17, 2026 14:10
@SF-Zhou SF-Zhou marked this pull request as ready for review January 17, 2026 14:48
@SF-Zhou SF-Zhou merged commit 2a170bd into main Jan 17, 2026
1 check passed
@SF-Zhou SF-Zhou deleted the copilot/implement-blkreader-crate branch January 17, 2026 14:55