Description
Related to #439, but unlike that issue, no LV operations were performed here: only a WinBtrfs shrink, a (failed) mount, and btrfsck.
The WinBtrfs shrink operation leaves dev extent references pointing beyond the new device boundary, which makes the filesystem unmountable on Linux and makes btrfs check fail. The filesystem cannot be repaired without first growing it back past the orphaned extents.
The shrink did trigger a balance operation that relocated ~70-80 chunks on each device, but it appears to miscalculate, or not verify, completion before finalizing the shrink.
The shrink operation completed without error from WinBtrfs's perspective.
Environment
- WinBtrfs version: 1.9
- Windows version: Windows 10
- Filesystem: 2 same size devices in RAID1
Error After Shrink
After shrinking a filesystem from 400GB to 300GB using WinBtrfs:
btrfsck /dev/mapper/vgP4-win10btrfs4p1
Opening filesystem to check...
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
ERROR: dev extent devid 1 physical offset 324300439552 len 1073741824 is beyond device boundary 324552097792
ERROR: errors found in extent allocation tree or chunk allocation
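The error shows:
- Dev extent at physical offset 324300439552 (~302.03 GiB) with a length of 1 GiB, so it ends at 325374181376 (~303.03 GiB)
- Device boundary at 324552097792 (~302.26 GiB)
- The extent starts 240 MiB before the boundary, so 784 MiB of it lies beyond the device boundary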
The free space tree is also corrupted with hundreds of missing entries:
there is no free space entry for 1798516736-1799233536
cache appears valid but isn't 1104150528
wanted bytes 1073741824, found 40128512 for off 643201761280
Workaround
I could recover Linux access by:
- Growing the device larger than the orphaned extents (340GB was sufficient)
- Running btrfs check --repair on Linux, which deletes the orphaned extent references
- Clearing the space caches:
btrfs rescue clear-space-cache v2 /dev/...
- Mounting on Linux with space_cache=v2 to recreate the free space tree
- Resizing properly using the Linux btrfs tools:
btrfs fi resize 1:300G /mnt/...
Without this workaround, the filesystem is unmountable on Linux, although it still works fine on Windows.
Possible causes
I explored the code a bit, and there might be some issues in balance.c, where chunks_left does not seem to be handled properly.
Device removal correctly checks if all chunks were relocated:
if (Vcb->balance.removing) {
    ...
    if (Vcb->balance.chunks_left == 0) {
        Status = finish_removing_device(Vcb, dev); // Only if ALL chunks done
    } else
        dev->reloc = false; // Abort if chunks remain

But shrink does not check chunks_left:
} else if (Vcb->balance.shrinking) {
    ...
    // No check for chunks_left == 0
    dev->devitem.num_bytes = Vcb->balance.opts[0].drange_start; // Shrink anyway!
    Status = update_dev_item(Vcb, dev, NULL);

That said, the missing check probably hides another issue: it would explain why the shrink was committed, but not why extents still remained beyond the new boundary in the first place.
- Does the balance prevent new allocations in the region being shrunk while the filesystem is in use?
- What happens if a chunk is allocated in that region after the balance finishes? Would it stay in use and remain allocated?
I'm just offering ideas, since having extents outside the filesystem is a pretty bad case of corruption.