[PULP-1118] Add better error handling for repover duplicate content#7280
pulpcore/exceptions/base.py
```python
    """

    def __init__(self, duplicate_count: int, correlation_id: str):
        self.dup_count = duplicate_count
```
This seems to assume that a RepositoryVersion creation failure is always caused by duplicates. Do we want to assume that? Should the error be more specific?
Yeah, makes sense. I'll make a more specific one.
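A minimal sketch of what a more specific error could look like, so the message names the actual cause (duplicates) instead of a generic repository-version failure. `PulpExceptionStub`, `RepositoryVersionDuplicateContentError` and the `"PLP-XXXX"` code are hypothetical placeholders; the real base class in `pulpcore/exceptions/base.py` may have a different constructor.

```python
class PulpExceptionStub(Exception):
    """Hypothetical stand-in for pulpcore's base exception; the real class may differ."""

    def __init__(self, error_code: str):
        super().__init__()
        self.error_code = error_code


class RepositoryVersionDuplicateContentError(PulpExceptionStub):
    """Raised only when a new repository version would contain duplicate content."""

    def __init__(self, duplicate_count: int, correlation_id: str):
        # "PLP-XXXX" is a placeholder, not a real pulpcore error code.
        super().__init__("PLP-XXXX")
        self.duplicate_count = duplicate_count
        self.correlation_id = correlation_id

    def __str__(self) -> str:
        return (
            f"[{self.error_code}] The new repository version would contain "
            f"{self.duplicate_count} duplicate content unit(s). Search the logs for "
            f"correlation id {self.correlation_id} to see which units conflict."
        )
```

Because the class is distinct, a caller can write `except RepositoryVersionDuplicateContentError:` without also catching unrelated repository-version failures.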
```python
def log_duplicate(pulp_type: str, duplicate: DuplicateEntry):
    keyset_value = duplicate.keyset_value
    duplicate_pks = duplicate.duplicate_pks
    _logger.info(f"Duplicates found: {pulp_type=}; {keyset_value=}; {duplicate_pks=}")
```
Is there a particular reason to separate this into its own function if it's only used in one place?
Just for the abstraction. At a glance, it's trivial to read the main validate function and understand what it does, and you can dig into the helper functions if you are interested in the implementation details.
If the call overhead is not relevant (I believe it isn't here), then it is just a matter of style. I can undo that if you prefer.
It's just a matter of style, but if only one line is doing work here, I'd personally rather just inline it.
I do not consider it a blocker if you have strong feelings about it or if there are plans to re-use it in the future, though.
OK, I'll inline the log one. I'll keep the count and collect-duplicates functions, since having them separate was convenient for testing them in isolation.
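A minimal in-memory sketch of a separately testable collect step, with the logging inlined as agreed. Assumptions: `rows` (a list of dicts with a `pk` key) stands in for a queryset, `DuplicateEntry` is assumed to be a simple record holding the two attributes the logging code reads, and `report_duplicates` plus the example field names are hypothetical. On PostgreSQL a similar grouping could probably be pushed into the database with `values()`, `annotate()` and Django's `ArrayAgg`, which this sketch does not attempt.

```python
import logging
from collections import defaultdict
from typing import Any, NamedTuple

_logger = logging.getLogger(__name__)


class DuplicateEntry(NamedTuple):
    # Assumed shape, based on the attributes used in the PR's logging code.
    keyset_value: tuple
    duplicate_pks: list


def collect_duplicates(
    rows: list[dict[str, Any]], unique_keys: tuple[str, ...]
) -> list[DuplicateEntry]:
    """Group rows by their unique-key values and keep groups with more than one pk."""
    groups: dict[tuple, list] = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in unique_keys)].append(row["pk"])
    return [
        DuplicateEntry(keyset_value, pks)
        for keyset_value, pks in groups.items()
        if len(pks) > 1
    ]


def report_duplicates(
    pulp_type: str, rows: list[dict[str, Any]], unique_keys: tuple[str, ...]
) -> list[DuplicateEntry]:
    """Collect duplicates for one content type and log each one inline."""
    duplicates = collect_duplicates(rows, unique_keys)
    for duplicate in duplicates:
        # Logging is inlined here instead of calling a one-line helper.
        keyset_value = duplicate.keyset_value
        duplicate_pks = duplicate.duplicate_pks
        _logger.info(f"Duplicates found: {pulp_type=}; {keyset_value=}; {duplicate_pks=}")
    return duplicates
```

Keeping `collect_duplicates` pure (no logging, no database) is what makes it easy to unit-test on hand-built rows.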
```python
def count_duplicates(content_qs, unique_keys: tuple[str, ...]) -> int:
    new_content_total = content_qs.count()
    unique_new_content_total = content_qs.distinct(*unique_keys).count()
    return new_content_total - unique_new_content_total
```
This case is at least more sensible than the other one, though, since more than one line is actually doing something.
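The arithmetic in `count_duplicates` can be checked without a database. In this sketch a list of dicts stands in for `content_qs` and a set of key tuples plays the role of `distinct(*unique_keys)`; the field names are hypothetical. Note that the result counts surplus rows, not duplicate groups: three rows sharing one key contribute 2. (In Django, `distinct()` with field names maps to `DISTINCT ON`, which is only supported on PostgreSQL.)

```python
from typing import Any


def count_duplicates(rows: list[dict[str, Any]], unique_keys: tuple[str, ...]) -> int:
    """In-memory analogue of count() minus distinct(*unique_keys).count()."""
    total = len(rows)
    # Each distinct combination of unique-key values is counted once.
    unique_total = len({tuple(row[key] for key in unique_keys) for row in rows})
    return total - unique_total
```

For example, four rows whose `name` values are `a`, `a`, `a`, `b` give `4 - 2 = 2`.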
Added a proper error class for duplicate content handling and more logging that shows exactly which content is conflicting. Closes: pulp#7184
```python
keyset_value = duplicate.keyset_value
duplicate_pks = duplicate.duplicate_pks
_logger.info(f"Duplicates found: {pulp_type=}; {keyset_value=}; {duplicate_pks=}")
if dup_count > 0:
```
I think the logic is a bit off.
If there are two types in the loop and the first one has duplicates but the second is fine, this will not raise.
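A minimal sketch of the problem and one possible fix, assuming the per-type duplicate count is reassigned on each loop iteration and checked only after the loop. `dup_counts` maps a content type to its duplicate count; `DuplicateContentError`, `validate_before` and `validate_after` are hypothetical names, not the PR's code.

```python
class DuplicateContentError(Exception):
    """Hypothetical stand-in for the PR's new duplicate-content exception."""

    def __init__(self, duplicate_count: int):
        super().__init__(f"{duplicate_count} duplicate(s) found")
        self.duplicate_count = duplicate_count


def validate_before(dup_counts: dict[str, int]) -> None:
    """Reproduces the reported bug: only the last type's count is checked."""
    dup_count = 0
    for count in dup_counts.values():
        dup_count = count  # overwritten on every iteration
    if dup_count > 0:
        raise DuplicateContentError(dup_count)


def validate_after(dup_counts: dict[str, int]) -> None:
    """Accumulates counts over all types, then raises once if any were found."""
    total = 0
    for count in dup_counts.values():
        # Per-type logging or collection could happen here.
        total += count
    if total > 0:
        raise DuplicateContentError(total)
```

With `{"type_a": 2, "type_b": 0}`, `validate_before` returns silently while `validate_after` raises with a count of 2. Accumulating the total, instead of raising on the first offending type, lets every type be checked and logged before the single error is raised.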
Added a proper error class for duplicate content handling and more logging that shows exactly which content is conflicting.
When duplicates are detected, we do some extra work to collect the duplicate content.
A simple performance test shows it's not too bad:
Closes: #7184
📜 Checklist
See: Pull Request Walkthrough