nttg8100 commented on Jan 9, 2026

This pull request introduces several important security and reliability improvements to job submission and credential handling, as well as updates to job status management. The main focus is on preventing sensitive credentials from being exposed via the job queue, enhancing error handling for disconnected compute resources, and improving process cleanup. Below are the most significant changes:

Security & Credential Handling:

  • Storage credentials are now validated before job submission, and only minimal identifiers (not actual credentials) are passed to the job queue. Temporary credentials are regenerated inside the worker, preventing exposure through Redis/Celery. [1] [2] [3] [4]
  • The token field in the Github model is now stored as an encrypted text field, and a migration is included to update its database type. [1] [2]
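The credential flow described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `JobQueuePayload`, `validate_storage_access`, `build_payload`, `regenerate_credentials`, and `run_in_worker` are hypothetical names, the in-memory `_STORAGE_OWNERS` dict stands in for the real credential store, and the "temporary credentials" are random strings. The point is only that the serialized message carries identifiers and the secret is created on the worker side.

```python
import json
import secrets
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class JobQueuePayload:
    """What travels through the broker: identifiers only, no secrets."""
    job_id: str
    storage_id: str
    user_email: str


@dataclass(frozen=True)
class TemporaryCredentials:
    access_key: str
    secret_key: str


# Hypothetical stand-in for the real credential store (a database in practice).
_STORAGE_OWNERS = {"storage-1": "alice@example.com"}


def validate_storage_access(storage_id: str, user_email: str) -> None:
    """Fail early if the user has no usable credentials for this storage."""
    if _STORAGE_OWNERS.get(storage_id) != user_email:
        raise PermissionError(f"no valid credentials for storage {storage_id!r}")


def build_payload(job_id: str, storage_id: str, user_email: str) -> str:
    """Controller side: validate first, then serialize identifiers only."""
    validate_storage_access(storage_id, user_email)
    return json.dumps(asdict(JobQueuePayload(job_id, storage_id, user_email)))


def regenerate_credentials(storage_id: str, user_email: str) -> TemporaryCredentials:
    """Worker side: create short-lived credentials just in time (random here)."""
    validate_storage_access(storage_id, user_email)
    return TemporaryCredentials(secrets.token_hex(8), secrets.token_hex(16))


def run_in_worker(serialized: str) -> TemporaryCredentials:
    """Worker entry point: decode identifiers, then regenerate credentials locally."""
    payload = JobQueuePayload(**json.loads(serialized))
    return regenerate_credentials(payload.storage_id, payload.user_email)
```

Under these assumptions, anything that can read the queue sees only `job_id`, `storage_id`, and `user_email`; the secret exists only inside the worker process.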

Job Status & Error Handling:

  • The SERVER_ERROR status is replaced with HPC_DISCONNECTED in job models and database comments, reflecting more accurate error states when compute resources are unreachable. [1] [2] [3]
  • Job monitoring now handles HPC_DISCONNECTED status, and process cleanup for credentials is performed on job completion or failure, with improved logging and error handling. [1] [2] [3]
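As a rough sketch of the status handling described above: only the `HPC_DISCONNECTED` name comes from this PR. The other enum members, the `handle_status` and `cleanup_credentials` helpers, the dict-based credential store, and the choice to keep retrying while the cluster is unreachable are assumptions made for illustration.

```python
import logging
from enum import Enum

logger = logging.getLogger("job_monitor")


class JobStatus(str, Enum):
    # Only HPC_DISCONNECTED is taken from the PR; the other members are assumed.
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    HPC_DISCONNECTED = "HPC_DISCONNECTED"


TERMINAL_STATUSES = {JobStatus.COMPLETED, JobStatus.FAILED}


def cleanup_credentials(job_id: str, store: dict) -> bool:
    """Drop temporary credentials held for the job; report whether any existed."""
    removed = store.pop(job_id, None) is not None
    if not removed:
        logger.info("no temporary credentials found for job %s", job_id)
    return removed


def handle_status(job_id: str, status: JobStatus, store: dict) -> str:
    """Return the next monitoring action for a job in the given status."""
    if status is JobStatus.HPC_DISCONNECTED:
        # Assumption: the job may still be running remotely, so keep monitoring
        # and leave its credentials in place.
        logger.warning("job %s: HPC unreachable, will retry", job_id)
        return "retry"
    if status in TERMINAL_STATUSES:
        try:
            cleanup_credentials(job_id, store)
        except Exception:
            # Cleanup problems are logged and do not mask the job's final status.
            logger.exception("credential cleanup failed for job %s", job_id)
        return "stop"
    return "poll"
```

In this sketch, cleanup runs only on terminal statuses, so a temporary disconnect does not discard credentials the job may still need.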

Process Management & Logging:

  • The SSH tunnel cleanup logic is improved to more reliably kill processes on the expected port and log when no process is found.
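One way such port-based cleanup could look is sketched below. It is not the code from `ssh.py`: the function names are hypothetical, PID discovery assumes the `lsof` command-line tool is installed, and the lookup is injectable so the kill logic can be exercised without a real tunnel.

```python
import logging
import os
import signal
import subprocess
from typing import Callable, List

logger = logging.getLogger("ssh_tunnel")


def pids_listening_on(port: int) -> List[int]:
    """Find PIDs listening on a local TCP port via lsof (assumed to be installed)."""
    try:
        result = subprocess.run(
            ["lsof", "-t", f"-iTCP:{port}", "-sTCP:LISTEN"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        logger.warning("lsof is not available; cannot inspect port %d", port)
        return []
    return [int(token) for token in result.stdout.split() if token.isdigit()]


def kill_tunnel(port: int, find_pids: Callable[[int], List[int]] = pids_listening_on) -> int:
    """Send SIGTERM to every process found on the port; return how many were signalled."""
    pids = find_pids(port)
    if not pids:
        logger.info("no process found on port %d", port)
        return 0
    killed = 0
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
            killed += 1
        except ProcessLookupError:
            logger.info("process %d exited before it could be killed", pid)
        except PermissionError:
            logger.warning("not permitted to kill process %d", pid)
    return killed
```

Returning the count lets the caller tell "nothing to clean up" apart from "tunnel processes were terminated" without parsing the log.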

Database & Migration Updates:

  • Database migrations update status comments and the token field type for consistency and security. [1] [2]

These changes collectively harden the system against credential leaks, improve error reporting, and ensure that job lifecycle management is robust and secure.

nttg8100 requested a review from Copilot on January 9, 2026 12:56

Copilot AI left a comment

Pull request overview

This PR enhances security and reliability by preventing credential exposure in job queues, improving error handling for disconnected compute resources, and ensuring proper cleanup of sensitive data. The changes address credential leakage risks by regenerating temporary credentials within workers rather than passing them through Redis/Celery.

  • Credential handling now uses just-in-time regeneration in workers instead of passing credentials through the job queue
  • Job status reporting updated from SERVER_ERROR to HPC_DISCONNECTED for better clarity when compute resources are unreachable
  • GitHub tokens are now encrypted at the database level with corresponding migration

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| backend/app/service_job/controller.py | Validates storage credentials before job submission and passes only storage ID and user email instead of actual credentials |
| backend/app/service_job/tasks.py | Regenerates temporary credentials in worker, adds credential cleanup on job completion, and handles HPC_DISCONNECTED status |
| backend/app/service_job/models.py | Renames SERVER_ERROR to HPC_DISCONNECTED in job status enum |
| backend/app/utils/executor/ssh.py | Improves SSH tunnel cleanup with better error handling and adds HPC_DISCONNECTED to monitored statuses |
| backend/app/service_credential/models/personal.py | Changes GitHub token field from CharField to EncryptedTextField |
| backend/migrations/models/5_20260105205918_update.py | Updates database comments to replace SERVER_ERROR with HPC_DISCONNECTED |
| backend/migrations/models/6_20260105224743_update.py | Migrates GitHub token column from VARCHAR(200) to TEXT for encrypted storage |

Comments suppressed due to low confidence (1)

backend/app/service_job/tasks.py:1

  • Using positional arguments for this many parameters (6 total) makes the function call fragile and hard to maintain. Consider using keyword arguments in apply_async: submit_job.apply_async(kwargs={'job_id': str(job.id), 'is_web_job': analysis.allow_access, ...}) for better readability and to prevent argument order mismatches.
from celery import shared_task
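
To illustrate the suggestion above, here is a small sketch of why keyword arguments are safer than positional ones. The `submit_job` below is a plain-function stand-in, not the real Celery task. Only `job_id` and `is_web_job` are named in the review comment, so `storage_id` and `user_email` are assumed parameter names and the remaining parameters are left out.

```python
from typing import Any, Dict


def submit_job(job_id: str, is_web_job: bool, storage_id: str, user_email: str) -> Dict[str, Any]:
    """Plain-function stand-in for the task; returns its arguments by name."""
    return {
        "job_id": job_id,
        "is_web_job": is_web_job,
        "storage_id": storage_id,
        "user_email": user_email,
    }


# Positional call with two values accidentally swapped: Python accepts it silently.
swapped = submit_job("job-42", "storage-1", True, "alice@example.com")

# Keyword call: order no longer matters, and a misspelled name raises TypeError.
task_kwargs = {
    "user_email": "alice@example.com",
    "storage_id": "storage-1",
    "is_web_job": True,
    "job_id": "job-42",
}
correct = submit_job(**task_kwargs)

# With a real Celery task, the same dict would be passed as:
#     submit_job.apply_async(kwargs=task_kwargs)
```

In the swapped call, `is_web_job` ends up holding the storage ID and nothing fails until the worker uses it, which is the kind of mismatch the comment warns about.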


nttg8100 and others added 2 commits January 9, 2026 20:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
nttg8100 self-assigned this on Jan 9, 2026
nttg8100 merged commit 819d862 into dev on Jan 9, 2026
3 checks passed
nttg8100 deleted the RC-153-chore-improve-security branch on January 9, 2026 13:27
nttg8100 added a commit that referenced this pull request Jan 9, 2026
* feat: handle server disconnect

* fix: add github token to encrypt

* feat: remove credentials from params in celery

* feat: increase max_length token

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>