feat(uptime): Add ability to use queues to manage parallelism #2
Open
adamsaimi wants to merge 1 commit into kafka-consumer-parallel-before
One potential problem with batch processing is that any one slow item will clog up the whole batch. This PR implements a queueing method instead, where we keep N queues that each have their own workers. There's still a chance of an individual item backlogging a queue, but we can try increasing concurrency here to reduce the chances of that happening.
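A minimal sketch of the queue-per-worker idea described above, not the PR's actual implementation. The names `KeyedQueuePool`, `submit`, and `close` are hypothetical, and routing by `hash(key)` is an assumption about how items could be kept in order per key.

```python
import queue
import threading
from typing import Callable, Hashable

_STOP = object()  # sentinel telling a worker to exit


class KeyedQueuePool:
    """Hypothetical pool: N queues, one worker thread per queue."""

    def __init__(self, num_queues: int, handler: Callable[[object], None]) -> None:
        self.handler = handler
        self.queues: list[queue.Queue] = [queue.Queue() for _ in range(num_queues)]
        self.threads = [
            threading.Thread(target=self._run, args=(q,), daemon=True)
            for q in self.queues
        ]
        for t in self.threads:
            t.start()

    def submit(self, key: Hashable, item: object) -> None:
        # The same key always maps to the same queue, so per-key order is kept.
        self.queues[hash(key) % len(self.queues)].put(item)

    def _run(self, q: queue.Queue) -> None:
        while True:
            item = q.get()
            if item is _STOP:
                break
            self.handler(item)

    def close(self) -> None:
        # Each queue drains its pending items before its worker sees the sentinel.
        for q in self.queues:
            q.put(_STOP)
        for t in self.threads:
            t.join()
```

A slow item only delays the items that share its queue; the other queues keep draining. In this sketch an exception raised by `handler` would end that worker thread, so a real version would need error handling.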
adamsaimi commented Oct 25, 2025
    """

    def __init__(self) -> None:
        self.all_offsets: dict[Partition, set[int]] = defaultdict(set)
[Performance] Unbounded Offset Tracking Sets
- Problem: The `OffsetTracker` stores all observed and outstanding offsets in sets (`all_offsets`, `outstanding`) without size limits, which can lead to excessive memory consumption and potential OOM errors.
- Fix: Implement a bounded size or an eviction policy for these sets to prevent unbounded memory growth.
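One way to bound the tracking state is to cap how many offsets may be outstanding at once. The sketch below uses a semaphore for that; `BoundedInFlight`, its `limit` parameter, and its methods are hypothetical names, and it covers a single partition rather than the PR's `OffsetTracker`.

```python
import threading
from typing import Optional


class BoundedInFlight:
    """Hypothetical cap on the number of offsets tracked at once."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.outstanding: set[int] = set()

    def add(self, offset: int, timeout: Optional[float] = None) -> bool:
        # Blocks (or times out) once `limit` offsets are already outstanding.
        if not self._slots.acquire(timeout=timeout):
            return False
        with self._lock:
            if offset in self.outstanding:
                self._slots.release()  # duplicate: give the slot back
                return True
            self.outstanding.add(offset)
        return True

    def complete(self, offset: int) -> None:
        with self._lock:
            if offset not in self.outstanding:
                return
            self.outstanding.remove(offset)
        self._slots.release()
```

With this shape the consumer stops accepting new work when the cap is reached, which trades unbounded memory growth for backpressure on the caller.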
adamsaimi commented Oct 25, 2025
    while not self.shutdown:
        try:
            work_item = self.work_queue.get()
        except queue.ShutDown:
[Bug] Incorrect Queue Shutdown Mechanism
- Problem: `OrderedQueueWorker` catches `queue.ShutDown` and `FixedQueuePool` calls `q.shutdown()`. Both were added to the standard library in Python 3.13; on earlier versions neither exists, so workers cannot terminate gracefully and resources leak.
- Fix: If older interpreters must be supported, replace `queue.ShutDown` with a sentinel value, or use `queue.Empty` with a timeout and a `self.shutdown` flag; remove the `q.shutdown` call and ensure workers are joined.
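The timeout-and-flag variant of the fix could look roughly like the sketch below, which also runs on Python versions that predate `queue.Queue.shutdown()`. `PollingWorker`, `poll_interval`, and the `processed` list are hypothetical and stand in for the PR's `OrderedQueueWorker` and its real work.

```python
import queue
import threading


class PollingWorker(threading.Thread):
    """Hypothetical worker that polls with a timeout and checks a stop flag."""

    def __init__(self, work_queue: queue.Queue, poll_interval: float = 0.05) -> None:
        super().__init__(daemon=True)
        self.work_queue = work_queue
        self.poll_interval = poll_interval
        self.stop_event = threading.Event()
        self.processed: list[object] = []  # stand-in for real processing

    def run(self) -> None:
        while not self.stop_event.is_set():
            try:
                item = self.work_queue.get(timeout=self.poll_interval)
            except queue.Empty:
                continue  # nothing to do; re-check the stop flag
            self.processed.append(item)
            self.work_queue.task_done()

    def stop(self) -> None:
        # Signal the loop to exit, then wait for the thread to finish.
        self.stop_event.set()
        self.join()
```

Calling `stop()` can leave unprocessed items in the queue, so a caller that needs everything drained would call `work_queue.join()` first. The polling interval trades shutdown latency against idle wakeups.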
adamsaimi commented Oct 25, 2025
    start = max(last_committed + 1, min_offset)

    highest_committable = last_committed
    for offset in range(start, max_offset + 1):
[Performance] Inefficient Linear Scan for Committable Offsets
- Problem: `get_committable_offsets` scans linearly from `last_committed` to `max_offset`, which can become a performance bottleneck when the gap between them is large.
- Fix: Track contiguous blocks of completed offsets, or use a data structure that identifies the highest committable offset faster.
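As a sketch of the "track contiguous blocks" suggestion, the watermark can be advanced incrementally as offsets complete, using a min-heap for offsets that finish out of order. `ContiguousOffsetTracker` is a hypothetical single-partition class, and it assumes every offset after `last_committed` is eventually completed with no gaps in the offset sequence, which may not hold for the PR's tracker.

```python
import heapq


class ContiguousOffsetTracker:
    """Hypothetical single-partition tracker; not the PR's OffsetTracker."""

    def __init__(self, last_committed: int) -> None:
        self.highest_committable = last_committed
        # Min-heap of completed offsets that are still above the watermark.
        self._completed: list[int] = []

    def complete(self, offset: int) -> None:
        if offset <= self.highest_committable:
            return  # already covered by the watermark
        heapq.heappush(self._completed, offset)
        # Advance while the smallest completed offset is the next expected one;
        # duplicates at or below the watermark are popped and discarded.
        while self._completed and self._completed[0] <= self.highest_committable + 1:
            nxt = heapq.heappop(self._completed)
            if nxt == self.highest_committable + 1:
                self.highest_committable = nxt
```

Each completion costs amortized O(log n) heap work, and reading `highest_committable` is O(1), so the commit loop no longer rescans the whole range.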
adamsaimi commented Oct 25, 2025
    with self._get_partition_lock(partition):
        self.last_committed[partition] = offset
        # Remove all offsets <= committed offset
        self.all_offsets[partition] = {o for o in self.all_offsets[partition] if o > offset}
[Performance] Inefficient Set Reconstruction for Offset Cleanup
- Problem: Rebuilding `self.all_offsets[partition]` by iterating and filtering is computationally expensive for large sets and slows the commit loop.
- Fix: If `all_offsets` is expected to be large, explore data structures or methods that support more efficient range-based deletion or filtering.
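One candidate structure for range-based deletion is an ascending list searched with `bisect`. This is only a sketch: `SortedOffsets` and its methods are hypothetical, and whether it beats the set comprehension depends on how large the sets get and how often offsets arrive out of order, so it would need measuring.

```python
import bisect


class SortedOffsets:
    """Hypothetical replacement for one partition's offset set."""

    def __init__(self) -> None:
        self._offsets: list[int] = []  # ascending order, no duplicates

    def add(self, offset: int) -> None:
        i = bisect.bisect_left(self._offsets, offset)
        if i == len(self._offsets) or self._offsets[i] != offset:
            self._offsets.insert(i, offset)

    def drop_through(self, committed: int) -> int:
        """Remove every offset <= committed; return how many were removed."""
        cut = bisect.bisect_right(self._offsets, committed)
        del self._offsets[:cut]
        return cut

    def __len__(self) -> int:
        return len(self._offsets)

    def __iter__(self):
        return iter(self._offsets)
```

The cut point is found in O(log n). Deleting the prefix still shifts the remaining elements, but it avoids re-hashing and re-testing every offset on each commit.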
adamsaimi commented Dec 23, 2025
    self.workers: list[OrderedQueueWorker[T]] = []

    for i in range(num_queues):
        work_queue: queue.Queue[WorkItem[T]] = queue.Queue()
[Performance] Unbounded Queues Cause Resource Exhaustion
- Problem: `queue.Queue()` instances created without a `maxsize` can grow indefinitely, which can lead to excessive memory consumption and out-of-memory errors.
- Fix: Define a reasonable `maxsize` for the queues to prevent memory exhaustion and provide natural backpressure.

```python
work_queue: queue.Queue[WorkItem[T]] = queue.Queue(maxsize=some_configurable_limit)
```
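To illustrate the backpressure a bounded queue gives, here is a small self-contained sketch. `MAX_PENDING` and `try_submit` are hypothetical names; the PR would choose its own limit and decide whether producers should block, time out, or drop.

```python
import queue

MAX_PENDING = 2  # hypothetical limit; a real value would be configurable

work_queue: queue.Queue[int] = queue.Queue(maxsize=MAX_PENDING)


def try_submit(item: int, timeout: float = 0.01) -> bool:
    """Return False instead of blocking forever when the queue is full."""
    try:
        work_queue.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False
```

Calling `put` without a timeout would simply block the producer until a worker frees a slot, which is often the desired behavior for a consumer loop; the timeout variant makes the full-queue case visible to the caller.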
Test 9