Filesystem shrink leaves extents beyond device boundary #772

@ticpu

Description

Related to #439, but unlike that issue, no LV operations were performed here: only a WinBtrfs shrink, a (failed) mount, and btrfsck.

A WinBtrfs filesystem shrink leaves extent references pointing beyond the new device boundary, which makes the filesystem unmountable on Linux and causes btrfs check to fail. The filesystem cannot be repaired without first growing it back beyond the orphaned extents.

The shrink did trigger a balance operation that relocated roughly 70-80 chunks on each device, but it appears to either miscalculate the remaining chunks or not verify completion before finalizing the shrink.

The shrink operation completed without error from WinBtrfs's perspective.

Environment

  • WinBtrfs version: 1.9
  • Windows version: Windows 10
  • Filesystem: two same-size devices in RAID1

Error After Shrink

After shrinking a filesystem from 400GB to 300GB using WinBtrfs:

btrfsck /dev/mapper/vgP4-win10btrfs4p1
Opening filesystem to check...
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
ERROR: dev extent devid 1 physical offset 324300439552 len 1073741824 is beyond device boundary 324552097792
ERROR: errors found in extent allocation tree or chunk allocation

The error shows:

  • Dev extent at offset 324300439552 (about 302.03 GiB) with length 1 GiB
  • Device boundary at 324552097792 (about 302.26 GiB)
  • The extent starts 240 MiB before the boundary, so about 784 MiB of it lies beyond the device boundary
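
The figures above can be checked directly from the two numbers in the btrfsck error. The following is only a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of WinBtrfs or btrfs-progs.

```c
#include <stdint.h>

/* Hypothetical helper: bytes by which the extent [offset, offset + len)
 * runs past `boundary`, or 0 if the extent fits entirely inside it.
 * Assumes offset + len does not overflow a uint64_t. */
uint64_t extent_overshoot(uint64_t offset, uint64_t len, uint64_t boundary)
{
    uint64_t end = offset + len;
    return end > boundary ? end - boundary : 0;
}

/* Hypothetical helper: bytes of the extent that still lie inside the device. */
uint64_t extent_inside(uint64_t offset, uint64_t len, uint64_t boundary)
{
    if (offset >= boundary)
        return 0;
    uint64_t avail = boundary - offset;
    return avail < len ? avail : len;
}
```

With offset 324300439552, length 1073741824 and boundary 324552097792, `extent_overshoot` gives 822083584 bytes (784 MiB) and `extent_inside` gives 251658240 bytes (240 MiB).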

The free space tree is also corrupted with hundreds of missing entries:

there is no free space entry for 1798516736-1799233536
cache appears valid but isn't 1104150528
wanted bytes 1073741824, found 40128512 for off 643201761280

Workaround

I was able to recover Linux access by:

  1. Growing the device beyond the orphaned extents (340GB was sufficient)
  2. Running btrfs check --repair on Linux, which deletes the orphaned extent references
  3. Clearing the space caches:
    btrfs rescue clear-space-cache v2 /dev/...
  4. Mounting on Linux with space_cache=v2 to recreate the free space tree
  5. Resizing properly using the Linux btrfs tools:
    btrfs fi resize 1:300G /mnt/...

Without this workaround, the filesystem remains unmountable on Linux, although it still works fine on Windows.

Possible causes

I explored the code a bit, and there may be an issue in balance.c where it does not calculate chunks_left properly.

Device removal correctly checks if all chunks were relocated:

balance.c:3409

if (Vcb->balance.removing) {
    ...
    if (Vcb->balance.chunks_left == 0) {
        Status = finish_removing_device(Vcb, dev);  // Only if ALL chunks done
    } else
        dev->reloc = false;  // Abort if chunks remain

But shrink does not check chunks_left:

balance.c:3450

} else if (Vcb->balance.shrinking) {
    ...
    // No check for chunks_left == 0
    dev->devitem.num_bytes = Vcb->balance.opts[0].drange_start;  // Shrink anyway!
    Status = update_dev_item(Vcb, dev, NULL);
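
To illustrate the kind of guard I have in mind, here is a minimal standalone model of a shrink that refuses to commit while chunks remain. The structs and the function name are hypothetical, simplified stand-ins, not the real WinBtrfs types, and whether chunks_left is actually reliable at this point is exactly what I am unsure about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the balance and device state. */
typedef struct {
    uint64_t chunks_left; /* chunks the balance still had to relocate */
    uint64_t new_size;    /* requested size (drange_start in the excerpt) */
} shrink_state_model;

typedef struct {
    uint64_t num_bytes;   /* current device size */
} device_model;

/* Apply the new size only when every chunk was relocated.
 * Returns true if the shrink was committed, false if it was refused. */
bool finish_shrink_checked(device_model *dev, const shrink_state_model *st)
{
    if (st->chunks_left != 0)
        return false;     /* leave the device size untouched */

    dev->num_bytes = st->new_size;
    return true;
}
```

This mirrors the device-removal branch: the size change is skipped, rather than applied anyway, when the balance did not finish.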

But this probably hides another issue, since extents still remained beyond the new boundary after the shrink.

  • Does balance prevent new allocations in the region being shrunk away while there is activity?
  • What happens if chunks are allocated in that region after the balance finishes? Would they stay in use and remain allocated?
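
Either way, a final scan of the device's extents just before the size is committed would catch both cases. Below is a rough sketch of such a check over a plain array; the struct and function are hypothetical and say nothing about how WinBtrfs actually walks the dev extent tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of one dev extent: physical offset and length. */
typedef struct {
    uint64_t offset;
    uint64_t length;
} dev_extent_model;

/* Returns the index of the first extent that does not fit entirely
 * below new_size, or -1 if every extent fits.
 * The comparison is written so that it cannot overflow. */
ptrdiff_t first_extent_beyond(const dev_extent_model *ext, size_t count,
                              uint64_t new_size)
{
    for (size_t i = 0; i < count; i++) {
        if (ext[i].length > new_size ||
            ext[i].offset > new_size - ext[i].length)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Fed the extent from the btrfsck error and the boundary 324552097792, this reports the extent as out of range. With a 340 GiB boundary it reports nothing, which matches the size that was sufficient in the workaround.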

Just offering ideas, since having extents outside the filesystem is a pretty bad case of corruption.
