Setup execute-kfp-localrunner on EC2 instance and add an option to provide the BASE_IMAGE URLs for qualification by shruthis4 · Pull Request #76 · opendatahub-io/data-processing

shruthis4 · 2025-12-22T20:29:09Z

Description

Update the execute-kfp-localrunners workflow to run on an EC2 instance and enable docling-vlm.
Add a gate mechanism to the execute-kfp-localrunners workflow so that CI jobs requiring secrets run only on the upstream repository.

How Has This Been Tested?

This has been tested with a draft PR: #75.

Merge criteria:

[x] The commits are squashed in a cohesive manner and have meaningful messages.
Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
The developer has manually tested the changes and verified that the changes work.

Summary by CodeRabbit

New Features
- Configurable base-image inputs and environment-driven defaults for Python and Docling images.
Chores
- CI flow enhanced to provision and run tests on ephemeral EC2 runners and to stop them afterward.
- Added registry authentication and improved artifact/log upload on failures.
- Standardized container image references across pipeline deployments for consistency.



coderabbitai · 2025-12-22T20:29:19Z

Walkthrough

Introduce EC2-backed GitHub Actions runner flow for local pipeline tests, add workflow_dispatch image overrides and Quay registry env, make image constants environment-driven, and update compiled pipeline manifests to use a single quay.io/amaredia/aipcc-docling-image.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workflows: `.github/workflows/compile-kfp.yml`, `.github/workflows/execute-kfp-localrunners.yml` | Minor formatting in compile workflow; reworked execute workflow to add `workflow_dispatch` inputs (`python_base_image`, `docling_base_image`), env vars (`QUAY_REGISTRY`, `INSTANCE_TYPE`), and a multi-job flow: `pr-check`, `launch-ec2-runner` (provisions EC2 runner, outputs label & instance-id), `test-local-pipelines` (runs on EC2 runner with environment setup, Docker login, conditional image overrides, artifact upload on failure), and `stop-ec2-runner` (terminates runner). |
| Constants: `kubeflow-pipelines/common/constants.py` | Replaced hard-coded `PYTHON_BASE_IMAGE` and `DOCLING_BASE_IMAGE` with environment-driven values using `os.getenv(..., "quay.io/amaredia/aipcc-docling-image")`. |
| Compiled Pipelines: `kubeflow-pipelines/docling-standard/standard_convert_pipeline_compiled.yaml`, `kubeflow-pipelines/docling-vlm/vlm_convert_pipeline_compiled.yaml` | Replaced multiple container image references for executor tasks (exec-create-pdf-splits, exec-docling-chunk, exec-docling-convert-*, exec-download-docling-models, exec-import-pdfs) with `quay.io/amaredia/aipcc-docling-image`. |

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant AWS as AWS OIDC / EC2 API
    participant EC2 as EC2 Runner (instance)
    participant Quay as Quay Registry
    participant Tests as Local Pipeline Tests

    GH->>GH: pr-check
    GH->>AWS: launch-ec2-runner (request instance, role via OIDC)
    AWS->>EC2: Provision instance
    EC2-->>GH: Return runner label & instance-id

    GH->>EC2: test-local-pipelines (dispatch job on runner)
    EC2->>EC2: Install system deps, Python, Docker, pip packages
    EC2->>Quay: Docker login (QUAY_REGISTRY)
    EC2->>EC2: Apply python_base_image/docling_base_image env (if provided)
    EC2->>Tests: Execute pipeline tests (local)
    Tests-->>EC2: Emit results & logs
    EC2-->>GH: Upload artifacts on failure

    GH->>AWS: stop-ec2-runner (terminate by label/instance-id)
    AWS->>EC2: Terminate instance
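
The job ordering in the diagram can be sketched as a dependency graph. This is only an illustration: the `needs` list for `stop-ec2-runner` is taken from the workflow excerpt in this review, while the other two edges are assumptions inferred from the diagram, not confirmed by the diff shown here.

```python
from graphlib import TopologicalSorter

# Maps each job to the jobs it waits for (its `needs:` list).
NEEDS = {
    "pr-check": set(),
    "launch-ec2-runner": {"pr-check"},               # assumed from the diagram
    "test-local-pipelines": {"launch-ec2-runner"},   # assumed: it runs on the launched runner
    "stop-ec2-runner": {"launch-ec2-runner", "test-local-pipelines"},  # from the workflow excerpt
}

# static_order() yields jobs so that every job appears after all of its dependencies.
order = list(TopologicalSorter(NEEDS).static_order())
print(order)
```

With these edges the graph is a simple chain, so the cleanup job is always scheduled last.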

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main changes: setting up EC2 instance execution and adding BASE_IMAGE URL configuration options for the execute-kfp-localrunner workflow. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b02cb4 and 9634515.

📒 Files selected for processing (1)

.github/workflows/compile-kfp.yml

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/compile-kfp.yml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: compile (docling-vlm, kubeflow-pipelines/docling-vlm, python vlm_convert_pipeline.py, vlm_convert...
GitHub Check: compile (docling-standard, kubeflow-pipelines/docling-standard, python standard_convert_pipeline....
GitHub Check: Summary


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

.github/workflows/compile-kfp.yml (1)
54-59: Quay login may be unnecessary for pipeline compilation.

The compile workflow runs Python scripts to generate YAML files. It doesn't pull or push container images. Unless the compilation step validates image existence, this login step adds latency without benefit.

Also, the indentation under with: uses 1 space (lines 57-59) instead of 2, which is inconsistent with the rest of the file.
🔎 Proposed fix for indentation
       - name: Log in to Quay Container Registry
         uses: docker/login-action@v3
         with:
-         registry: ${{ env.QUAY_REGISTRY }}
-         username: ${{ secrets.QUAY_USERNAME }}
-         password: ${{ secrets.QUAY_PASSWORD }}
+          registry: ${{ env.QUAY_REGISTRY }}
+          username: ${{ secrets.QUAY_USERNAME }}
+          password: ${{ secrets.QUAY_PASSWORD }}
.github/workflows/execute-kfp-localrunners.yml (2)
108-108: Inconsistent checkout action version.

Line 108 uses actions/checkout@v4 while line 62 uses actions/checkout@v6. Unify to the same version for consistency across jobs.
🔎 Proposed fix
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
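
A check like the one below could catch this kind of drift before review. It is a minimal sketch, not part of the PR: the function name `find_mixed_action_refs` and the sample workflow text are hypothetical, and the regex only handles single-line `uses:` entries.

```python
import re
from collections import defaultdict

# Matches lines such as "- uses: actions/checkout@v4" and captures the action and its ref.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.\-/]+)@(\S+)")


def find_mixed_action_refs(workflow_text: str) -> dict[str, dict[str, list[int]]]:
    """Return actions pinned to more than one ref, as action -> ref -> line numbers."""
    seen: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match:
            action, ref = match.groups()
            seen[action][ref].append(lineno)
    # Keep only actions that appear with at least two different refs.
    return {action: dict(refs) for action, refs in seen.items() if len(refs) > 1}


# Hypothetical, shortened workflow used only to exercise the function.
SAMPLE = """\
jobs:
  launch:
    steps:
      - uses: actions/checkout@v6
  test:
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
"""

mixed = find_mixed_action_refs(SAMPLE)
print(mixed)
```
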
141-146: YAML indentation inconsistency under with:.

The indentation uses 1 space instead of 2 under the with: block, inconsistent with the rest of the file.
🔎 Proposed fix
       - name: Log in to Quay Container Registry
         uses: docker/login-action@v3
         with:
-         registry: ${{ env.QUAY_REGISTRY }}
-         username: ${{ secrets.QUAY_USERNAME }}
-         password: ${{ secrets.QUAY_PASSWORD }}
+          registry: ${{ env.QUAY_REGISTRY }}
+          username: ${{ secrets.QUAY_USERNAME }}
+          password: ${{ secrets.QUAY_PASSWORD }}
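
The indentation nitpick is purely stylistic, since YAML accepts either offset as long as sibling keys align. A rough way to flag it automatically is sketched below; `find_with_indent_issues` and the sample text are hypothetical, and the scan only inspects the first key under each `with:` line.

```python
def find_with_indent_issues(workflow_text: str, step: int = 2) -> list[tuple[int, int]]:
    """Return (line_number, offset) for the first key under each `with:` whose
    indentation relative to `with:` differs from `step` spaces."""
    lines = workflow_text.splitlines()
    issues = []
    for i, line in enumerate(lines):
        if line.strip() != "with:":
            continue
        parent_indent = len(line) - len(line.lstrip(" "))
        # The next non-blank line is treated as the first key of the mapping.
        for j in range(i + 1, len(lines)):
            if lines[j].strip():
                child_indent = len(lines[j]) - len(lines[j].lstrip(" "))
                if child_indent - parent_indent != step:
                    issues.append((j + 1, child_indent - parent_indent))
                break
    return issues


# Hypothetical steps: the first `with:` block is indented by 1 space, the second by 2.
SAMPLE = (
    "steps:\n"
    "  - name: Log in to Quay Container Registry\n"
    "    uses: docker/login-action@v3\n"
    "    with:\n"
    "     registry: quay.io\n"
    "  - name: Checkout\n"
    "    uses: actions/checkout@v6\n"
    "    with:\n"
    "      fetch-depth: 1\n"
)

issues = find_with_indent_issues(SAMPLE)
print(issues)
```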

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8537f16 and 8b02cb4.

📒 Files selected for processing (5)

.github/workflows/compile-kfp.yml
.github/workflows/execute-kfp-localrunners.yml
kubeflow-pipelines/common/constants.py
kubeflow-pipelines/docling-standard/standard_convert_pipeline_compiled.yaml
kubeflow-pipelines/docling-vlm/vlm_convert_pipeline_compiled.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: compile (docling-standard, kubeflow-pipelines/docling-standard, python standard_convert_pipeline....
GitHub Check: compile (docling-vlm, kubeflow-pipelines/docling-vlm, python vlm_convert_pipeline.py, vlm_convert...
GitHub Check: Summary

🔇 Additional comments (5)

kubeflow-pipelines/common/constants.py (1)

1-9: LGTM!

The environment-driven image configuration allows flexible overrides for CI qualification workflows while maintaining sensible defaults. The consolidation to a single image simplifies maintenance.
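
For readers unfamiliar with the pattern, an environment-driven constant with a default can be sketched as follows. This is not the contents of `constants.py`: the helper `resolve_image`, the assumption that the environment variables share the constants' names, and the override image used in the example are all hypothetical. Unlike a plain `os.getenv(name, default)`, this sketch also treats an empty string as "not provided", which may or may not match the PR's behavior.

```python
import os

# Default image named in the review summary.
DEFAULT_IMAGE = "quay.io/amaredia/aipcc-docling-image"


def resolve_image(var_name: str, env=None, default: str = DEFAULT_IMAGE) -> str:
    """Return the image from the environment, falling back to the default.

    An unset or empty variable yields the default, so a blank workflow input
    does not produce an empty image reference.
    """
    env = os.environ if env is None else env
    return env.get(var_name) or default


# Assumed variable names, mirroring the constant names mentioned in the summary.
PYTHON_BASE_IMAGE = resolve_image("PYTHON_BASE_IMAGE")
DOCLING_BASE_IMAGE = resolve_image("DOCLING_BASE_IMAGE")

# Example with an explicit mapping instead of the real process environment.
override = resolve_image(
    "DOCLING_BASE_IMAGE",
    env={"DOCLING_BASE_IMAGE": "quay.io/example/custom-image:test"},  # hypothetical image
)
print(override)
```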

.github/workflows/execute-kfp-localrunners.yml (2)

35-48: Good addition: Clear PR feedback for fork contributors.

The pr-check job provides helpful messaging for contributors from forks who won't have access to secrets. This is a good UX improvement.

50-91: EC2 runner provisioning looks well-structured.

The multi-AZ configuration with proper tagging for CloudTrail tracking and job outputs for downstream consumption is well designed.

kubeflow-pipelines/docling-vlm/vlm_convert_pipeline_compiled.yaml (1)

304-304: Image consolidation applied correctly.

All executor images in this compiled pipeline are consistently updated to use quay.io/amaredia/aipcc-docling-image, aligning with the centralized image strategy in constants.py.

kubeflow-pipelines/docling-standard/standard_convert_pipeline_compiled.yaml (1)

386-386: Image consolidation applied consistently with VLM pipeline.

All executor images are updated to quay.io/amaredia/aipcc-docling-image, matching the VLM pipeline and central constants configuration.

coderabbitai · 2025-12-22T20:33:34Z

.github/workflows/execute-kfp-localrunners.yml

+          # Add current user to docker group
+          sudo usermod -aG docker $(whoami)
+
+          # Apply group membership without logout (temporary for this session)
+          sudo chmod 666 /var/run/docker.sock 


⚠️ Potential issue | 🟡 Minor

Security concern: Overly permissive Docker socket permissions.

chmod 666 /var/run/docker.sock makes the socket readable and writable by every user on the host, which is a security risk in shared or multi-tenant environments. While this is an ephemeral EC2 instance, the approach sets a bad precedent.

🔎 Suggested alternative

           # Add current user to docker group
           sudo usermod -aG docker $(whoami)
-          # Apply group membership without logout (temporary for this session)
-          sudo chmod 666 /var/run/docker.sock
+          # Verify docker access works with sudo (group takes effect next login)
+          sudo docker version

Or run subsequent docker commands with sudo since this is an ephemeral instance anyway.
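
To make the concern concrete, the sketch below checks whether a path grants write access to "other" users, which is what mode 666 does. It assumes a POSIX system and uses a temporary regular file as a stand-in for `/var/run/docker.sock`; the helper name `is_world_writable` is hypothetical.

```python
import os
import stat
import tempfile


def is_world_writable(path: str) -> bool:
    """Return True if the file mode grants write permission to all users."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the Docker socket; a regular file is enough to show the mode bits.
    fake_socket = os.path.join(tmp, "docker.sock")
    open(fake_socket, "w").close()

    os.chmod(fake_socket, 0o666)  # the mode set by the workflow step under review
    loose = is_world_writable(fake_socket)

    os.chmod(fake_socket, 0o660)  # owner and group only
    tight = is_world_writable(fake_socket)

print(loose, tight)
```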


🤖 Prompt for AI Agents

.github/workflows/execute-kfp-localrunners.yml lines 122-126: do not set /var/run/docker.sock to 666; instead remove the chmod line and either run subsequent docker commands with sudo (keep ephemeral security) or apply the new docker group membership for the current shell using newgrp docker (or exec sg docker "$SHELL" / command) after running sudo usermod -aG docker $(whoami); ensure the docker group exists before adding the user and avoid making the socket world-writable.

coderabbitai · 2025-12-22T20:33:34Z

.github/workflows/execute-kfp-localrunners.yml

+  stop-ec2-runner:
+    if: github.repository == 'opendatahub-io/data-processing' && always()
+    permissions:
+      id-token: write # This is required for OIDC (AWS auth)
+      contents: read
+    needs:
+      - launch-ec2-runner
+      - test-local-pipelines
+
+    runs-on: ubuntu-latest
+    steps:
+      - name: Configure AWS credentials
+        uses: aws-actions/configure-aws-credentials@61815dcd50bd041e203e49132bacad1fd04d2708 # v5.1.1
+        with:
+          role-to-assume: "arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/${{ vars.DATA_PROCESSING_IAM_ROLE }}"
+          aws-region: us-east-2
+          role-session-name: odh-data-processing  # For tracking in CloudTrail
+
+      - name: Stop EC2 runner
+        uses: machulav/ec2-github-runner@a6dbcefcf8a31a861f5e078bb153ed332130c512 # v2.4.3
+        with:
+          mode: stop
+          github-token: "${{ secrets.DATA_PROCESSING_GH_PERSONAL_ACCESS_TOKEN }}"
+          label: ${{ needs.launch-ec2-runner.outputs.label }}
+          ec2-instance-id: ${{ needs.launch-ec2-runner.outputs.ec2-instance-id }}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n .github/workflows/execute-kfp-localrunners.yml | sed -n '1,200p'

Repository: opendatahub-io/data-processing

Length of output: 9175

🏁 Script executed:

fd execute-kfp-localrunners.yml

Repository: opendatahub-io/data-processing

Length of output: 56

Add condition to stop-ec2-runner to skip if launch-ec2-runner fails.

The stop-ec2-runner job uses always() in its if: condition so that cleanup runs even on failure, but if launch-ec2-runner fails before emitting its outputs, the job receives empty label and ec2-instance-id values. These inputs are required by the machulav/ec2-github-runner action in stop mode, and the action cannot terminate the instance with empty values. Tighten the condition to if: github.repository == 'opendatahub-io/data-processing' && always() && needs.launch-ec2-runner.result == 'success' so that cleanup runs only when the outputs are guaranteed to be available.
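
The effect of the tightened condition can be modeled in a few lines. This is only a truth-table sketch of the expression, not how GitHub Actions evaluates it internally; the function name `should_run_stop_job` is hypothetical, and the result strings are the standard job results (`success`, `failure`, `cancelled`, `skipped`).

```python
UPSTREAM_REPO = "opendatahub-io/data-processing"


def should_run_stop_job(repository: str, launch_result: str) -> bool:
    """Model of: github.repository == UPSTREAM && always() && needs.launch-ec2-runner.result == 'success'.

    always() is modeled implicitly: the result of test-local-pipelines is ignored,
    so cleanup still runs when the tests fail or are cancelled.
    """
    return repository == UPSTREAM_REPO and launch_result == "success"


# Cleanup runs when the runner was launched, regardless of how the tests ended.
print(should_run_stop_job(UPSTREAM_REPO, "success"))
# Cleanup is skipped when the launch job never produced its outputs.
print(should_run_stop_job(UPSTREAM_REPO, "failure"))
# Cleanup is skipped in any repository other than the upstream one.
print(should_run_stop_job("someone/data-processing", "success"))
```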

🤖 Prompt for AI Agents

.github/workflows/execute-kfp-localrunners.yml lines 167-191: the stop-ec2-runner job currently uses if: always() which causes it to run even when launch-ec2-runner failed and its outputs (label and ec2-instance-id) are empty; update the job-level if condition to require that needs.launch-ec2-runner.result == 'success' in addition to the existing repository check and always() guard so the job only runs when the launch job completed successfully and outputs are present.

mergify · 2025-12-22T21:16:59Z

🎉 Auto-merged successfully!

✅ All reviewers approved: 1
✅ CI checks passed: All

Approved by:

@alimaredia

Setup execute-kfp-localrunner on EC2 instance and add an option to pr…

8b02cb4

…ovide the BASE_IMAGE URLs for qualification

shruthis4 requested a review from alimaredia December 22, 2025 20:29

shruthis4 requested a review from a team as a code owner December 22, 2025 20:29

coderabbitai bot reviewed Dec 22, 2025


Removed quay login step as we point to public quay account

9634515

alimaredia approved these changes Dec 22, 2025


mergify bot merged commit 37b97a0 into opendatahub-io:main Dec 22, 2025
7 of 9 checks passed

mergify[bot] merged 2 commits into opendatahub-io:main from shruthis4:KFPWorkflow
