Implement global concurrency per job/partition #50
base: main
Conversation
chancy/app.py
Outdated
job = job.job
if job.concurrency_key:
    cursor.execute(
        self._push_concurrency_config_sql(),
I've got some concerns about doing this per-job - if users are doing this the "right" way they may be batching tens of thousands of jobs at a time, which is going to issue an extra query per job with a concurrency_key. IMO we should be gathering up all the unique concurrency keys and issuing 1 bulk upsert.
Alternatively, and which might be better, we should require the user to call a function to push a concurrency rule before they take effect - a concurrency_key with no rule set would just be ignored when fetching jobs. This way we can remove this upsert here entirely and never have to process individual jobs as they get inserted.
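As a sketch of the first suggestion, the batch could be reduced to its distinct concurrency keys and written with one upsert. The table and column names below are assumptions for illustration, not Chancy's actual schema:

```python
# Sketch of the bulk-upsert idea: deduplicate the concurrency keys in a
# batch of jobs and issue a single upsert instead of one query per job.
# Table and column names are assumptions, not Chancy's actual schema.

def unique_concurrency_configs(jobs):
    """Collect one (key -> limit) entry per distinct concurrency key."""
    configs = {}
    for job in jobs:
        key = job.get("concurrency_key")
        if key is not None:
            configs[key] = job.get("concurrency_limit", 1)
    return configs


def bulk_upsert(configs):
    """Build one parameterized upsert statement for all keys (PostgreSQL)."""
    placeholders = ", ".join(["(%s, %s)"] * len(configs))
    sql = (
        "INSERT INTO concurrency_configs (key, max_concurrency) "
        f"VALUES {placeholders} "
        "ON CONFLICT (key) DO UPDATE "
        "SET max_concurrency = EXCLUDED.max_concurrency"
    )
    params = [value for pair in configs.items() for value in pair]
    return sql, params
```

With this shape, a batch of tens of thousands of jobs issues one statement per batch rather than one per job.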
You're right about the bulk insert; that would be easy to do. Should we insert the jobs as one bulk insert as well?
I went with this implementation because I found the API simpler for the user, but maybe you're right that it would be clearer to push a concurrency rule ahead of time.
The thing is, having a table with concurrency keys as rows that can be locked while workers fetch jobs looks like the easiest way to implement concurrency and avoid race conditions. Job push is the first moment we know the full concurrency key when using a partition id.
We could use a trigger too, if we do not support methods as concurrency keys. If we use a bulk insert for jobs, the trigger would be more performant as it avoids a round trip.
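The trigger alternative might look roughly like this. It is a sketch only; the jobs and concurrency_configs schema, column names, and trigger names are all assumptions:

```python
# Hedged sketch of the trigger alternative: upsert the concurrency key as
# each job row is inserted, so a bulk job insert needs no extra round trip.
# All table, column, and trigger names are assumptions, not Chancy's schema.
TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION push_concurrency_config() RETURNS trigger AS $$
BEGIN
    IF NEW.concurrency_key IS NOT NULL THEN
        INSERT INTO concurrency_configs (key, max_concurrency)
        VALUES (NEW.concurrency_key, NEW.max_concurrency)
        ON CONFLICT (key) DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER jobs_push_concurrency
AFTER INSERT ON jobs
FOR EACH ROW EXECUTE FUNCTION push_concurrency_config();
"""
```

The caveat mentioned above applies: a database trigger can only see values stored on the row, so concurrency keys computed client-side (e.g. from methods) could not be supported this way.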
I have implemented deduplication of concurrency keys and a bulk insert. Performance should be fine even when inserting a large number of jobs.
docs/howto/jobs.rst
Outdated
Concurrency
-----------------------

Control the number of jobs that can run simultaneously using concurrency:
We should probably reword this a bit to make it clear immediately that this impacts global concurrency. I see you've got the note down below but it's the core reason to use this. Something like:
"Control the number of jobs with the same concurrency key that can run simultaneously across all workers and queues using with_concurrency():"?
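A docs usage sketch for the suggested with_concurrency() helper could look like this. The Job class and signature here are hypothetical, based only on the name in the comment above, not Chancy's confirmed API:

```python
# Hypothetical illustration only: a with_concurrency(key, limit) helper that
# returns a job capped at `limit` simultaneous runs of the same key across
# all workers and queues. Not Chancy's actual Job API.
class Job:
    def __init__(self, func, concurrency_key=None, concurrency_limit=None):
        self.func = func
        self.concurrency_key = concurrency_key
        self.concurrency_limit = concurrency_limit

    def with_concurrency(self, key, limit):
        """Return a copy of this job limited globally per concurrency key."""
        return Job(self.func, concurrency_key=key, concurrency_limit=limit)


# At most 5 send_email jobs run at once, across every worker and queue.
job = Job("send_email").with_concurrency("send_email", limit=5)
```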
chancy/migrations/v6.py
Outdated
await cursor.execute(
    sql.SQL(
        """
        CREATE TABLE {concurrency_configs} (
I'm worried about high cardinality in the concurrency_key (like <func_name>_<user ID>), it might quickly result in a lot of rows. But, that gotcha might just need to be documented to be A-OK.
To keep the table from growing indefinitely, we could have the pruner plugin clean it on a regular basis.
I have extended the pruner plugin to prune the new concurrency_configs table.
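One possible shape for that pruning pass, addressing the high-cardinality concern above: drop concurrency_configs rows whose keys no longer appear on any live job. The schema, state names, and retention rule here are assumptions, not the actual pruner implementation:

```python
# Illustrative pruning logic: a concurrency_configs row is stale once no
# pending or running job references its key. Table, column, and state names
# are assumptions, not the actual pruner plugin implementation.

def stale_concurrency_keys(config_keys, live_job_keys):
    """Return config keys no longer referenced by any live job."""
    return sorted(set(config_keys) - set(live_job_keys))


# The same rule as a single SQL statement the pruner could run periodically.
PRUNE_SQL = """
DELETE FROM concurrency_configs c
WHERE NOT EXISTS (
    SELECT 1
    FROM jobs j
    WHERE j.concurrency_key = c.key
      AND j.state IN ('pending', 'running')
)
"""
```

With a rule like this, even high-cardinality keys (e.g. <func_name>_<user ID>) only occupy rows while their jobs are live.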
Update: work is still ongoing. Need to find some time.
Force-pushed from 9e04bf0 to 92fa930
Hey @TkTech, I have found the time to continue the work after thinking extensively about it. Tradeoffs I made:
Neither of these two alternatives looked easy or fit into Chancy's current architecture, so I decided to resolve concurrency limits when fetching jobs. It has downsides, for sure:
Please tell me what you think about this work, and whether you could see it in Chancy in the future? Thanks again for all the work on Chancy! EDIT:
Hey @TkTech,
I worked on an implementation to enforce global concurrency per job and partition before tackling rate limiting, as it is more useful for my team and looks easier.
I tried designing a SQL query in fetch_jobs that respects global concurrency and would still be performant. It is not perfect though, as there is an edge case where a lot of jobs subject to a concurrency constraint could block a queue, but it gives us ground for discussion.
I liked your idea of faking it using temporary queues too. Maybe we could use this for rate limiting? I have the intuition we can solve the concurrency problem in SQL and stay performant.
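To make the fetch-time behaviour concrete, here is a toy in-memory model of the idea: skip candidates whose concurrency key is already at its global limit. This is not the actual fetch_jobs SQL; all names and limits are illustrative:

```python
# Toy model of fetch-time enforcement: count jobs already running per
# concurrency key and skip candidates whose key is saturated. The real
# version would do this inside the fetch_jobs SQL (e.g. a per-key count
# combined with row locking on the key rows); names here are illustrative.

def fetch_jobs(candidates, running_counts, limits, batch_size):
    """Pick up to batch_size (job_id, key) candidates within global limits."""
    counts = dict(running_counts)
    picked = []
    for job_id, key in candidates:
        if len(picked) == batch_size:
            break
        if key is not None:
            limit = limits.get(key)
            if limit is not None and counts.get(key, 0) >= limit:
                continue  # key saturated globally: skip, like the SQL would
            counts[key] = counts.get(key, 0) + 1
        picked.append(job_id)
    return picked
```

Note how saturated keys are skipped rather than blocking the whole batch; the edge case mentioned above shows up when most candidates share one saturated key, so the fetch has to scan past many jobs it cannot take.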