Filesystem shrink leaves extents beyond device boundary #772

Related to #439, but unlike that issue, no LV operations were performed here: only a WinBtrfs shrink, a (failed) mount, and btrfsck.

The WinBtrfs filesystem shrink operation leaves extent references pointing beyond the new device boundary, making the filesystem unmountable and causing `btrfs check` to fail. The filesystem cannot be repaired without first growing it back beyond the orphaned extents.

The shrink did trigger a balance operation that relocated ~70-80 chunks on each device, but it appears to either miscalculate the number of remaining chunks or not verify completion before finalizing the shrink.

The shrink operation completed without error from WinBtrfs's perspective.

## Environment

- WinBtrfs version: 1.9
- Windows version: Windows 10
- Filesystem: two same-size devices in RAID1

## Error After Shrink

After shrinking a filesystem from 400GB to 300GB using WinBtrfs:

```
btrfsck /dev/mapper/vgP4-win10btrfs4p1
Opening filesystem to check...
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
ERROR: dev extent devid 1 physical offset 324300439552 len 1073741824 is beyond device boundary 324552097792
ERROR: errors found in extent allocation tree or chunk allocation
```

The error shows:
- Dev extent at offset 324300439552 (~302.0 GiB) with a length of 1 GiB
- Device boundary at 324552097792 (~302.3 GiB)
- The extent starts ~240 MiB before the device boundary, so ~784 MiB of it lies beyond the boundary
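The arithmetic behind these numbers can be reproduced with a few lines of C. This is only a sketch to make the figures checkable; `extent_overrun` is a hypothetical helper name and is not part of WinBtrfs or btrfs-progs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which a dev extent runs past
 * the device boundary, or 0 if the extent fits entirely on the device. */
uint64_t extent_overrun(uint64_t offset, uint64_t len, uint64_t dev_size)
{
    assert(offset <= UINT64_MAX - len);  /* guard against overflow */
    uint64_t end = offset + len;         /* first byte after the extent */
    return end > dev_size ? end - dev_size : 0;
}
```

With the values from the `btrfsck` output, `extent_overrun(324300439552ULL, 1073741824ULL, 324552097792ULL)` returns 822083584 bytes, i.e. 784 MiB of the 1 GiB extent lie past the boundary.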

The free space tree is also corrupted with hundreds of missing entries:
```
there is no free space entry for 1798516736-1799233536
cache appears valid but isn't 1104150528
wanted bytes 1073741824, found 40128512 for off 643201761280
```

## Workaround

I was able to recover Linux access with the following steps:

1. Grow the device beyond the orphaned extents (340GB was sufficient)
2. Run `btrfs check --repair` on Linux, which deletes the orphaned extent references
3. Clear space caches:
   ```bash
   btrfs rescue clear-space-cache v2 /dev/...
   ```
4. Mount on Linux with `space_cache=v2` to recreate the free space tree
5. Resize properly using Linux btrfs tools:
   ```bash
   btrfs fi resize 1:300G /mnt/...
   ```

Without these steps, the filesystem remains unmountable on Linux, although it still works fine on Windows.

## Possible causes

I looked through the code a bit, and there may be an issue in `balance.c` where it does not calculate `chunks_left` properly.

Device removal correctly checks if all chunks were relocated:

[`balance.c:3409`](https://github.com/maharmstone/btrfs/blob/b415418815f68f7abdc50fa55e92f7db94732842/src/balance.c#L3409)
```c
if (Vcb->balance.removing) {
    ...
    if (Vcb->balance.chunks_left == 0) {
        Status = finish_removing_device(Vcb, dev);  // Only if ALL chunks done
    } else
        dev->reloc = false;  // Abort if chunks remain
```

But shrink does not check `chunks_left`:

[`balance.c:3450`](https://github.com/maharmstone/btrfs/blob/b415418815f68f7abdc50fa55e92f7db94732842/src/balance.c#L3450)
```c
} else if (Vcb->balance.shrinking) {
    ...
    // No check for chunks_left == 0
    dev->devitem.num_bytes = Vcb->balance.opts[0].drange_start;  // Shrink anyway!
    Status = update_dev_item(Vcb, dev, NULL);
```

But this probably hides another issue, since extents still remained after the shrink.
- Is balance preventing new allocations in the region being shrunk away if there's activity?
- What happens if a chunk is allocated within an extent after balance finishes: would it stay in use and remain allocated?

These are just ideas, since having extents outside the filesystem is a pretty bad case of corruption.
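Whatever the answers are, a last-moment check that no dev extent ends past the new size would catch both cases before the size is committed. Below is a rough standalone sketch of such a check; `dev_extent_model` and `shrink_is_safe` are hypothetical names of my own, and the real code would have to walk the dev tree rather than a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a dev extent; NOT the WinBtrfs type. */
typedef struct {
    uint64_t offset;  /* physical offset on the device */
    uint64_t len;     /* length in bytes */
} dev_extent_model;

/* Returns true only if every extent ends at or before new_size. */
bool shrink_is_safe(const dev_extent_model *ext, size_t count,
                    uint64_t new_size)
{
    for (size_t i = 0; i < count; i++) {
        /* Written to avoid overflow in offset + len. */
        if (ext[i].len > new_size || ext[i].offset > new_size - ext[i].len)
            return false;
    }
    return true;
}
```

Fed the extent from the `btrfsck` output (offset 324300439552, length 1073741824) and the boundary 324552097792, this returns `false`, so the shrink would have been refused instead of leaving the extent orphaned.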



