Skip to content

Replace cron with in-script scheduler#25

Merged
justinkumpe merged 11 commits intomasterfrom
PreProd
Jun 28, 2025
Merged

Replace cron with in-script scheduler#25
justinkumpe merged 11 commits intomasterfrom
PreProd

Conversation

@justinkumpe
Copy link
Member

@justinkumpe justinkumpe commented Jun 28, 2025

Summary by Sourcery

Replace cron-based scheduling with an in-script scheduler, add graceful shutdown handling, and harden the Docker container by running as a non-root user.

New Features:

  • Schedule fetch_emails to run every hour using the schedule library.
  • Introduce SIGINT and SIGTERM handlers to enable graceful shutdown.

Enhancements:

  • Eliminate reliance on external cron jobs in favor of internal scheduling.

Build:

  • Update Dockerfile to run the application as a non-root user and adjust directory ownership.

Chores:

  • Add schedule==1.2.2 to project dependencies.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Jun 28, 2025

Reviewer's Guide

Embed hourly email fetching and graceful shutdown directly in the script, add the schedule library dependency, and update the Dockerfile to run the container as a non-root user.

Sequence diagram for in-script email fetching and graceful shutdown

sequenceDiagram
    participant Script as get_emails.py
    participant Scheduler as schedule library
    participant Signal as OS Signal (SIGINT/SIGTERM)
    participant Email as fetch_emails()

    Script->>Scheduler: schedule hourly fetch_emails
    loop While not shutdown
        Scheduler->>Email: run fetch_emails if scheduled
    end
    Signal-->>Script: SIGINT/SIGTERM received
    Script->>Script: Set shutdown=True
    Script->>Scheduler: Stop scheduling
    Script->>Script: Log "Shipping Bot stopped."
Loading

File-Level Changes

Change Details Files
Embed hourly in-script scheduler for email fetching
  • Import and use the 'schedule' library
  • Schedule 'fetch_emails' to run every hour
  • Implement loop to run pending jobs and sleep based on idle_seconds()
get_emails.py
Implement graceful shutdown on SIGINT/SIGTERM
  • Import 'signal' and define 'handle_shutdown' to set a shutdown flag
  • Register handlers for SIGINT and SIGTERM
  • Use shutdown flag to exit the scheduling loop and log stop message
get_emails.py
Configure container to run as non-root user
  • Add 'USER non-root' directive to Dockerfile
  • Change ownership of '/app' to non-root before startup
Dockerfile
Add 'schedule' library to project dependencies
  • Include 'schedule==1.2.2' in requirements.txt
requirements.txt

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai bot changed the title @sourcery-ai title Replace cron with in-script scheduler Jun 28, 2025
sourcery-ai[bot]
sourcery-ai bot previously requested changes Jun 28, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @justinkumpe - I've reviewed your changes and they look great!

Blocking issues:

  • time.sleep() call; did you mean to leave this in? (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `Dockerfile:9` </location>
<code_context>
 # Note: If additional files required for dependency installation (e.g., setup.py, pyproject.toml) are needed,
 # copy them here to maintain efficient caching.
 RUN pip install --no-cache-dir -r requirements.txt
 RUN service cron start
 COPY . /app
-RUN chmod 0644 /etc/cron.d/my_cron
</code_context>

<issue_to_address>
Starting the cron service is now redundant.

This line can be removed to simplify the Docker image.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
RUN pip install --no-cache-dir -r requirements.txt
RUN service cron start
COPY . /app
=======
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `get_emails.py:342` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@justinkumpe
Copy link
Member Author

@sourcery-ai review

sourcery-ai[bot]
sourcery-ai bot previously requested changes Jun 28, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @justinkumpe - I've reviewed your changes and they look great!

Blocking issues:

  • By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'. (link)
  • time.sleep() call; did you mean to leave this in? (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `get_emails.py:339` </location>
<code_context>
 if __name__ == "__main__":
     logger.info("Initializing Shipping Bot")
-    fetch_emails()
+    schedule.every().hour.do(fetch_emails)
+    while True:
+        schedule.run_pending()
+        time.sleep(20) # Sleep for 20 seconds before checking again
</code_context>

<issue_to_address>
Consider graceful shutdown handling for the infinite loop.

The loop should handle termination signals to ensure resources are properly cleaned up, especially in production or Docker environments.

Suggested implementation:

```python
import pymysql
import signal
import sys
from params import Params
from pirate_ship import scrape_pirateship

```

```python
shutdown = False

def handle_shutdown(signum, frame):
    global shutdown
    logger.info(f"Received signal {signum}, shutting down gracefully...")
    shutdown = True

if __name__ == "__main__":
    signal.signal(signal.SIGINT, handle_shutdown)
    signal.signal(signal.SIGTERM, handle_shutdown)
    logger.info("Initializing Shipping Bot")
    schedule.every().hour.do(fetch_emails)
    while not shutdown:
        schedule.run_pending()
        time.sleep(20) # Sleep for 20 seconds before checking again
    logger.info("Shipping Bot stopped.")

```
</issue_to_address>

## Security Issues

### Issue 1
<location> `Dockerfile:9` </location>

<issue_to_address>
**security (opengrep-rules.dockerfile.security.missing-user):** By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'.

```suggestion
USER non-root
CMD ["python", "get_emails.py"]
```

*Source: opengrep*
</issue_to_address>

### Issue 2
<location> `get_emails.py:342` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@justinkumpe
Copy link
Member Author

@sourcery-ai review

sourcery-ai[bot]
sourcery-ai bot previously requested changes Jun 28, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @justinkumpe - I've reviewed your changes and they look great!

Blocking issues:

  • time.sleep() call; did you mean to leave this in? (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `get_emails.py:353` </location>
<code_context>
     logger.info("Initializing Shipping Bot")
-    fetch_emails()
+    schedule.every().hour.do(fetch_emails)
+    while not shutdown:
+        schedule.run_pending()
+        time.sleep(20)  # Sleep for 20 seconds before checking again
</code_context>

<issue_to_address>
Consider reducing the sleep interval for more responsive shutdown.

A 20-second sleep may slow shutdown. Use a shorter interval or schedule.idle_seconds() for quicker response.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
    while not shutdown:
        schedule.run_pending()
        time.sleep(20)  # Sleep for 20 seconds before checking again
=======
    while not shutdown:
        schedule.run_pending()
        idle = schedule.idle_seconds()
        # Sleep for the lesser of idle time or 1 second for responsive shutdown
        time.sleep(min(idle if idle is not None else 1, 1))
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `get_emails.py:355` </location>

<issue_to_address>
**security (opengrep-rules.python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
@justinkumpe
Copy link
Member Author

@sourcery-ai dismiss @sourcery-ai review

@sourcery-ai sourcery-ai bot dismissed stale reviews from themself June 28, 2025 20:27

Automated Sourcery review dismissed.

@justinkumpe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @justinkumpe - I've reviewed your changes - here's some feedback:

  • The Dockerfile removed the CMD instruction without adding a new ENTRYPOINT/CMD, so the container won’t actually launch your script on startup.
  • Since you removed the initial fetch_emails() call, your first run could be delayed up to an hour—consider invoking the job once at startup or using schedule.run_all().
  • Wrap your scheduled fetch_emails() invocation in a try/except to catch and log errors so the scheduler doesn’t silently stop on exceptions.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Dockerfile removed the CMD instruction without adding a new ENTRYPOINT/CMD, so the container won’t actually launch your script on startup.
- Since you removed the initial fetch_emails() call, your first run could be delayed up to an hour—consider invoking the job once at startup or using schedule.run_all().
- Wrap your scheduled fetch_emails() invocation in a try/except to catch and log errors so the scheduler doesn’t silently stop on exceptions.

## Individual Comments

### Comment 1
<location> `get_emails.py:352` </location>
<code_context>
+    signal.signal(signal.SIGTERM, handle_shutdown)
     logger.info("Initializing Shipping Bot")
-    fetch_emails()
+    schedule.every().hour.do(fetch_emails)
+    while not shutdown:
+        schedule.run_pending()
</code_context>

<issue_to_address>
Potential for missed executions if fetch_emails takes longer than the interval.

If fetch_emails exceeds one hour, scheduled runs may be skipped. Evaluate if this behavior is acceptable, or if you need to handle overlapping or queued jobs.
</issue_to_address>

### Comment 2
<location> `Dockerfile:9` </location>
<code_context>

-CMD [ "python", "get_emails.py" ]
\ No newline at end of file
+USER non-root
+CMD ["python", "get_emails.py"]
</code_context>

<issue_to_address>
Switching to a non-root user is a good security practice, but ensure permissions are set correctly.

Ensure /app and any writable files are owned by the non-root user to prevent runtime permission errors.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@justinkumpe
Copy link
Member Author

@sourcery-ai dismiss

@justinkumpe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @justinkumpe - I've reviewed your changes - here's some feedback:

  • Wrap the scheduled fetch_emails job in a try/except block so that an exception in one run doesn’t prevent future executions.
  • Move the chown step before switching to USER non-root in the Dockerfile so that file permissions can actually be changed.
  • Replace the global shutdown flag with a threading.Event or similar construct to avoid using globals for lifecycle management.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Wrap the scheduled fetch_emails job in a try/except block so that an exception in one run doesn’t prevent future executions.
- Move the chown step before switching to USER non-root in the Dockerfile so that file permissions can actually be changed.
- Replace the global shutdown flag with a threading.Event or similar construct to avoid using globals for lifecycle management.

## Individual Comments

### Comment 1
<location> `get_emails.py:352` </location>
<code_context>
+    signal.signal(signal.SIGTERM, handle_shutdown)
     logger.info("Initializing Shipping Bot")
-    fetch_emails()
+    schedule.every().hour.do(fetch_emails)
+    while not shutdown:
+        schedule.run_pending()
</code_context>

<issue_to_address>
Potential for missed executions if fetch_emails takes longer than an hour.

Consider whether job overlap or missed runs are acceptable, or if you need to implement job queuing or prevent overlapping executions.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

signal.signal(signal.SIGTERM, handle_shutdown)
logger.info("Initializing Shipping Bot")
fetch_emails()
schedule.every().hour.do(fetch_emails)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Potential for missed executions if fetch_emails takes longer than an hour.

Consider whether job overlap or missed runs are acceptable, or if you need to implement job queuing or prevent overlapping executions.

@justinkumpe justinkumpe merged commit a02eb33 into master Jun 28, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant