-
Notifications
You must be signed in to change notification settings - Fork 463
[FLUSS-2262][lake] Improved Stability for Iceberg Log Table Compaction Test #2265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| checkFileStatusInIcebergTable(t1, 3, false); | ||
|
|
||
| // Ensure tiering job has fully processed the previous writes | ||
| assertReplicaStatus(t1Bucket, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about it. If the files count already is 3 in iceberg, shouldn't it also mean that the tiering has already tiered all records since we only write 3 records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it’s a bit odd. The additional replica assertion seems to alleviate any race conditions within this specific test case that we were seeing arise within the failing CI instances (at least based on my local testing to reproduce the issue).
It may simply be a testing artifact as opposed to a legitimate issue, but if it doesn’t seem like a fix, we can explore a few other avenues. It seemed like one of those common race conditions, so leveraging an existing function to help seemed like a decent approach. Happy to explore some additional avenues though, if we feel that we need a bit more exhaustive checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rionmonster I tried to reporduce it in my local. But fails. I think it'll be better to explore the root cause for I'm afraid of it is caused by another critical issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know! It seemed to be pretty consistent regarding pass/fail on my end, but clearly there’s something else at play.
I’ll do some more exploration and see what I find. Thanks for the feedback!
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rionmonster Thanks for the pr. Only one question
Purpose
Linked issue: close #2262
Per Issue #2262, this pull request addresses a race condition that could sometimes occur and result in the
IcebergRewriteITCase.testLogTableCompactiontest case failing (particularly during CI builds).Brief change log
This change updates the
testLogTableCompactiontest with an additional call to the existingassertReplicaStatus()function to ensure that the associated tiering job had fully processed the previous rewrites to avoid the race condition prior to the subsequent assertions.Tests
The
IcebergRewriteITCase.testLogTableCompactionwas initially updated to use an iterative approach (e.g., repeat 50 times) as mentioned in the original issue to reproduce the issue. After this was repeatably reproducible, the proposed fix was introduced to verify the test would repeatedly pass through all of the iterations successfully and repeatably.API and Format
N/A
Documentation
N/A
Reviewer(s) Requested
@swuferhong (as original reporter), @beryllw (as original author)