-
Notifications
You must be signed in to change notification settings - Fork 67
Account for local storage during region allocation #9670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It was a mistake to not consider the rendezvous_local_storage_dataset size_used when computing `old_zpool_usage` during region allocation! Fix that here. The `LEFT JOIN` and `coalesce` are required for two reasons: - blueprint execution is currently disabled for our tests, so the rendezvous tables won't appear there - deployed systems that update to a Nexus that contains this query may not have had blueprint execution run yet, and it's not a good idea to block crucible disk creation until this occurs.
| builder.sql( | ||
| // Find all old regions associated with a particular volume | ||
| "WITH | ||
| builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should have at least one regression test for this!
| sum(coalesce(rendezvous_local_storage_dataset.size_used, 0)) | ||
| ) AS size_used | ||
| FROM | ||
| crucible_dataset LEFT JOIN rendezvous_local_storage_dataset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using a "LEFT JOIN" here, we're fully ignoring the local storage dataset in the case where "no crucible datasets have been allocated". That seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, seems unlikely I suppose? I guess all our current configs have "crucible, and may or may not have the rendezvous_local_storage_dataset -- but this still seems like defensive programming worth doing)
| crucible_dataset.size_used IS NOT NULL AND | ||
| crucible_dataset.time_deleted IS NULL | ||
| GROUP BY crucible_dataset.pool_id),", | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this query is incorrectly calculating size, overcounting usage, which will inhibit our ability to actually allocate storage on the rack.
For example, if pool A has:
- 3 crucible_datasets (c1, c2, c3) with sizes 100, 200, 300
- 2 rendezvous_local_storage_datasets (r1, r2) with sizes 50, 60
The JOIN condition is "match on pool ID" -- every crucible dataset will match every local storage dataset, returning 6 rows:
- c1, r1 (100 + 50)
- c1, r2 (100 + 60)
- c2, r1 (200 + 50)
- c2, r2 (200 + 60)
- c3, r1 (300 + 50)
- c3, r2 (300 + 60)
I think the correct answer for space used should actually be: 100 + 200 + 300 + 50 + 60 = 710.
However, your query as-is more than double counts: I think the sum of all the rows in this example is 1530 instead.
Presumably we want to account for "each crucible dataset, once" and "each local storage dataset, once" within each pool, right?
Maybe try separating the two?
old_zpool_usage AS (
SELECT
COALESCE(crucible_usage.pool_id, local_usage.pool_id) AS pool_id,
COALESCE(crucible_usage.size_used, 0) + COALESCE(local_usage.size_used, 0) AS size_used
FROM (
SELECT pool_id, sum(size_used) AS size_used
FROM crucible_dataset
WHERE size_used IS NOT NULL AND time_deleted IS NULL
GROUP BY pool_id
) crucible_usage
FULL OUTER JOIN (
SELECT pool_id, sum(size_used) AS size_used
FROM rendezvous_local_storage_dataset
WHERE time_tombstoned IS NULL
GROUP BY pool_id
) local_usage
ON crucible_usage.pool_id = local_usage.pool_id
) This also should be testable. We should have coverage for cases where we're making an allocation, we need a precise amount of space to fit, and we have...
- A pool with only crucible datasets
- A pool with only local storage
- A pool with both
- A pool with neither
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
welp, I'm not sure how I managed to overlook this - maybe got too in the weeds on this query - but as you pointed out in chat, we really should have "zero or one" of these datasets on a zpool, not multiple.
So although my complaint is accurate in an academic point-of-view, based on the schema, practically, we shouldn't have "a bunch of crucible datasets" all residing on one zpool
It was a mistake to not consider the rendezvous_local_storage_dataset size_used when computing
old_zpool_usageduring region allocation! Fix that here. TheLEFT JOINandcoalesceare required for two reasons:blueprint execution is currently disabled for our tests, so the rendezvous tables won't appear there
deployed systems that update to a Nexus that contains this query may not have had blueprint execution run yet, and it's not a good idea to block crucible disk creation until this occurs.