Replace cron with in-script scheduler#25
Conversation
Reviewer's GuideEmbed hourly email fetching and graceful shutdown directly in the script, add the schedule library dependency, and update the Dockerfile to run the container as a non-root user. Sequence diagram for in-script email fetching and graceful shutdownsequenceDiagram
participant Script as get_emails.py
participant Scheduler as schedule library
participant Signal as OS Signal (SIGINT/SIGTERM)
participant Email as fetch_emails()
Script->>Scheduler: schedule hourly fetch_emails
loop While not shutdown
Scheduler->>Email: run fetch_emails if scheduled
end
Signal-->>Script: SIGINT/SIGTERM received
Script->>Script: Set shutdown=True
Script->>Scheduler: Stop scheduling
Script->>Script: Log "Shipping Bot stopped."
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey @justinkumpe - I've reviewed your changes and they look great!
Blocking issues:
- time.sleep() call; did you mean to leave this in? (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `Dockerfile:9` </location>
<code_context>
# Note: If additional files required for dependency installation (e.g., setup.py, pyproject.toml) are needed,
# copy them here to maintain efficient caching.
RUN pip install --no-cache-dir -r requirements.txt
RUN service cron start
COPY . /app
-RUN chmod 0644 /etc/cron.d/my_cron
</code_context>
<issue_to_address>
Starting the cron service is now redundant.
This line can be removed to simplify the Docker image.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
RUN pip install --no-cache-dir -r requirements.txt
RUN service cron start
COPY . /app
=======
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
>>>>>>> REPLACE
</suggested_fix>
## Security Issues
### Issue 1
<location> `get_emails.py:342` </location>
<issue_to_address>
**security (opengrep-rules.python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey @justinkumpe - I've reviewed your changes and they look great!
Blocking issues:
- By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'. (link)
- time.sleep() call; did you mean to leave this in? (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `get_emails.py:339` </location>
<code_context>
if __name__ == "__main__":
logger.info("Initializing Shipping Bot")
- fetch_emails()
+ schedule.every().hour.do(fetch_emails)
+ while True:
+ schedule.run_pending()
+ time.sleep(20) # Sleep for 20 seconds before checking again
</code_context>
<issue_to_address>
Consider graceful shutdown handling for the infinite loop.
The loop should handle termination signals to ensure resources are properly cleaned up, especially in production or Docker environments.
Suggested implementation:
```python
import pymysql
import signal
import sys
from params import Params
from pirate_ship import scrape_pirateship
```
```python
shutdown = False
def handle_shutdown(signum, frame):
global shutdown
logger.info(f"Received signal {signum}, shutting down gracefully...")
shutdown = True
if __name__ == "__main__":
signal.signal(signal.SIGINT, handle_shutdown)
signal.signal(signal.SIGTERM, handle_shutdown)
logger.info("Initializing Shipping Bot")
schedule.every().hour.do(fetch_emails)
while not shutdown:
schedule.run_pending()
time.sleep(20) # Sleep for 20 seconds before checking again
logger.info("Shipping Bot stopped.")
```
</issue_to_address>
## Security Issues
### Issue 1
<location> `Dockerfile:9` </location>
<issue_to_address>
**security (opengrep-rules.dockerfile.security.missing-user):** By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'.
```suggestion
USER non-root
CMD ["python", "get_emails.py"]
```
*Source: opengrep*
</issue_to_address>
### Issue 2
<location> `get_emails.py:342` </location>
<issue_to_address>
**security (opengrep-rules.python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey @justinkumpe - I've reviewed your changes and they look great!
Blocking issues:
- time.sleep() call; did you mean to leave this in? (link)
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `get_emails.py:353` </location>
<code_context>
logger.info("Initializing Shipping Bot")
- fetch_emails()
+ schedule.every().hour.do(fetch_emails)
+ while not shutdown:
+ schedule.run_pending()
+ time.sleep(20) # Sleep for 20 seconds before checking again
</code_context>
<issue_to_address>
Consider reducing the sleep interval for more responsive shutdown.
A 20-second sleep may slow shutdown. Use a shorter interval or schedule.idle_seconds() for quicker response.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
while not shutdown:
schedule.run_pending()
time.sleep(20) # Sleep for 20 seconds before checking again
=======
while not shutdown:
schedule.run_pending()
idle = schedule.idle_seconds()
# Sleep for the lesser of idle time or 1 second for responsive shutdown
time.sleep(min(idle if idle is not None else 1, 1))
>>>>>>> REPLACE
</suggested_fix>
## Security Issues
### Issue 1
<location> `get_emails.py:355` </location>
<issue_to_address>
**security (opengrep-rules.python.lang.best-practice.arbitrary-sleep):** time.sleep() call; did you mean to leave this in?
*Source: opengrep*
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
|
@sourcery-ai dismiss @sourcery-ai review |
Automated Sourcery review dismissed.
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey @justinkumpe - I've reviewed your changes - here's some feedback:
- The Dockerfile removed the CMD instruction without adding a new ENTRYPOINT/CMD, so the container won’t actually launch your script on startup.
- Since you removed the initial fetch_emails() call, your first run could be delayed up to an hour—consider invoking the job once at startup or using schedule.run_all().
- Wrap your scheduled fetch_emails() invocation in a try/except to catch and log errors so the scheduler doesn’t silently stop on exceptions.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The Dockerfile removed the CMD instruction without adding a new ENTRYPOINT/CMD, so the container won’t actually launch your script on startup.
- Since you removed the initial fetch_emails() call, your first run could be delayed up to an hour—consider invoking the job once at startup or using schedule.run_all().
- Wrap your scheduled fetch_emails() invocation in a try/except to catch and log errors so the scheduler doesn’t silently stop on exceptions.
## Individual Comments
### Comment 1
<location> `get_emails.py:352` </location>
<code_context>
+ signal.signal(signal.SIGTERM, handle_shutdown)
logger.info("Initializing Shipping Bot")
- fetch_emails()
+ schedule.every().hour.do(fetch_emails)
+ while not shutdown:
+ schedule.run_pending()
</code_context>
<issue_to_address>
Potential for missed executions if fetch_emails takes longer than the interval.
If fetch_emails exceeds one hour, scheduled runs may be skipped. Evaluate if this behavior is acceptable, or if you need to handle overlapping or queued jobs.
</issue_to_address>
### Comment 2
<location> `Dockerfile:9` </location>
<code_context>
-CMD [ "python", "get_emails.py" ]
\ No newline at end of file
+USER non-root
+CMD ["python", "get_emails.py"]
</code_context>
<issue_to_address>
Switching to a non-root user is a good security practice, but ensure permissions are set correctly.
Ensure /app and any writable files are owned by the non-root user to prevent runtime permission errors.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
@sourcery-ai dismiss |
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey @justinkumpe - I've reviewed your changes - here's some feedback:
- Wrap the scheduled fetch_emails job in a try/except block so that an exception in one run doesn’t prevent future executions.
- Move the chown step before switching to USER non-root in the Dockerfile so that file permissions can actually be changed.
- Replace the global shutdown flag with a threading.Event or similar construct to avoid using globals for lifecycle management.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Wrap the scheduled fetch_emails job in a try/except block so that an exception in one run doesn’t prevent future executions.
- Move the chown step before switching to USER non-root in the Dockerfile so that file permissions can actually be changed.
- Replace the global shutdown flag with a threading.Event or similar construct to avoid using globals for lifecycle management.
## Individual Comments
### Comment 1
<location> `get_emails.py:352` </location>
<code_context>
+ signal.signal(signal.SIGTERM, handle_shutdown)
logger.info("Initializing Shipping Bot")
- fetch_emails()
+ schedule.every().hour.do(fetch_emails)
+ while not shutdown:
+ schedule.run_pending()
</code_context>
<issue_to_address>
Potential for missed executions if fetch_emails takes longer than an hour.
Consider whether job overlap or missed runs are acceptable, or if you need to implement job queuing or prevent overlapping executions.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| signal.signal(signal.SIGTERM, handle_shutdown) | ||
| logger.info("Initializing Shipping Bot") | ||
| fetch_emails() | ||
| schedule.every().hour.do(fetch_emails) |
There was a problem hiding this comment.
suggestion (bug_risk): Potential for missed executions if fetch_emails takes longer than an hour.
Consider whether job overlap or missed runs are acceptable, or if you need to implement job queuing or prevent overlapping executions.
Summary by Sourcery
Replace cron-based scheduling with an in-script scheduler, add graceful shutdown handling, and harden the Docker container by running as a non-root user.
New Features:
Enhancements:
Build:
Chores: