@skumargupta83
Contributor

@skumargupta83 skumargupta83 commented Dec 1, 2025

https://2u-internal.atlassian.net/browse/PROD-4437
This PR introduces a backoff exception decorator to handle transient API failures gracefully.

Title: [course-discovery] [edx-ingest-getsmarter-data] fetch_getsmarter_products() should retry

Key details Description

The Problem
The course-discovery k8s job edx-ingest-getsmarter-data regularly fails on networking errors and server errors when calling GetSmarter APIs. The k8s job is configured to restart when this occurs, but a restart begins the job from scratch, pollutes our job logs and job history with failed pods, and makes the job take hours longer than necessary.
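
For readers who cannot see the diff, the retry idea described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: the names `retry_with_backoff`, `TransientAPIError`, and `fetch_products_stub`, and all parameter defaults, are assumptions made for the example.

```python
import functools
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for a networking error or 5xx server error."""


def retry_with_backoff(exceptions, max_tries=5, base_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Retry the wrapped call on `exceptions`, waiting longer after each failure.

    `sleep` is injectable so the waiting behaviour can be checked without
    actually pausing (an illustrative choice, not something the PR is known to do).
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries:
                        # Out of attempts: re-raise so the job still fails visibly.
                        raise
                    sleep(delay)
                    delay *= factor  # exponential growth: 1s, 2s, 4s, ...
        return wrapper
    return decorator


@retry_with_backoff(TransientAPIError, max_tries=4, base_delay=1.0)
def fetch_products_stub():
    """Hypothetical placeholder for the real API call; always succeeds here."""
    return []
```

With a wrapper like this, a transient failure costs a few seconds of waiting inside the same pod instead of a full job restart, and a persistent failure still surfaces once the attempts are exhausted.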

@skumargupta83 skumargupta83 changed the title from "Prod 4437 edx ingest getsmarter data cron job" to "chore: Added backoff exception" on Dec 1, 2025
@skumargupta83 skumargupta83 force-pushed the PROD-4437-edx-ingest-getsmarter-data-cron-job branch from 3fca698 to 6465add on December 1, 2025 10:07
@skumargupta83 skumargupta83 force-pushed the PROD-4437-edx-ingest-getsmarter-data-cron-job branch from 6465add to bf7aaab on December 1, 2025 10:32
@skumargupta83 skumargupta83 force-pushed the PROD-4437-edx-ingest-getsmarter-data-cron-job branch from bf7aaab to de3d12c on December 1, 2025 10:42
@skumargupta83
Contributor Author

@UsamaSadiq @openedx/committers-course-discovery Could anyone please review this PR and help merge it?

@UsamaSadiq
Member

@skumargupta83 I can't access the linked ticket.
The best practice for any upstream feature request is to copy the ticket details on the Pull Request instead of linking the 2U tickets since those won't be visible to community reviewers.
Once you add the details on why this change is needed in upstream, I'll review and merge it or escalate it to any concerned party if needed. Thanks.

@skumargupta83
Contributor Author

Thank you for flagging this. I’ve added the description — please review and let me know if you need anything further.
[Attached screenshot: Capture01]

@skumargupta83
Contributor Author

Hi @UsamaSadiq,
As per your comment, I’ve added the description. Kindly review it.

@skumargupta83
Contributor Author

@UsamaSadiq @openedx/committers-course-discovery
Could anyone please review this PR and help merge it?

@skumargupta83
Contributor Author

@UsamaSadiq @openedx/committers-course-discovery
Could you please review this PR and help merge it?

@julrusak

julrusak commented Dec 8, 2025

Hi @UsamaSadiq - thanks for the original feedback on this PR. The ticket is now posted - anything else you need from the team?

@UsamaSadiq
Member

@skumargupta83 My concern is whether this change is really needed by the Open edX community. The code seems to relate only to a fix for an internal edX job.
Could you share some further context on why you think this should be added to the upstream code? Or is there an issue posted by the community where this problem was encountered?

@skumargupta83
Contributor Author

Hi @UsamaSadiq,
This is not a community-reported issue; adding a backoff exception was suggested by @pwnage101. Please see the previous PR (#4681) for more details. This change enhances error handling and helps prevent excessive retries.

@UsamaSadiq
Member

Hi @UsamaSadiq, This is not a community-reported issue; adding a backoff exception was suggested by @pwnage101. Please see the previous PR (#4681) for more details. This change enhances error handling and helps prevent excessive retries.

Thanks for the context. @pwnage101 would be a better reviewer for this change. I'll help out in merging this PR if needed once it is approved by Troy.

@skumargupta83
Contributor Author

Hi @pwnage101,
Could you please review this PR and help merge it?
