feat: add GitHub Actions workflow for verifying reproducible builds#415

Open
Evrard-Nil wants to merge 3 commits into main from feat/gh-action-check-repro

Evrard-Nil commented Feb 4, 2026

Note

Low Risk
CI-only change that doesn’t affect runtime code paths, but it can introduce new build flakiness or longer CI times due to multi-runner builds and Docker pruning.

Overview
Adds a new GitHub Actions workflow (verify-reproducible-build.yml) that validates container build reproducibility by running ./build-image.sh on multiple Ubuntu runners and comparing the resulting OCI image digest.

The workflow also performs two sequential builds on the same runner with aggressive Docker/buildx cleanup to catch nondeterminism, uploads digests (and the OCI archives only on mismatch), and posts a step summary table plus a pass/fail result based on digest equality.

Written by Cursor Bugbot for commit 4fa37d1.

Copilot AI review requested due to automatic review settings February 4, 2026 14:42
@Evrard-Nil Evrard-Nil temporarily deployed to Cloud API test env February 4, 2026 14:42 — with GitHub Actions Inactive
@gemini-code-assist

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@Evrard-Nil Evrard-Nil temporarily deployed to Cloud API test env February 4, 2026 14:43 — with GitHub Actions Inactive

claude bot commented Feb 4, 2026

Code Review

This PR adds a reproducibility verification workflow for Docker builds. The approach is solid, but there are a few critical issues that could cause false positives/negatives or confusing failures:

Critical Issues

1. Builder cleanup hardcodes name (lines 98-104)

  • The cleanup step assumes the builder is named buildkit_20, which is hardcoded in build-image.sh:27
  • If build-image.sh changes the builder name, sequential builds won't be properly isolated
  • Risk: False positives if cached layers persist between builds
  • Suggested fix: Extract builder name dynamically from build-image.sh to stay in sync

2. Summary shows empty digests on early failure (lines 240-266)

  • If verification fails before setting GITHUB_OUTPUT variables (e.g., at line 199), the summary will display an empty table
  • The summary step runs with if: always() but doesn't handle missing outputs gracefully
  • Suggested fix: Set default empty values at the start of the verify step (before line 166)

3. Potential disk space exhaustion

  • Uploads 5 OCI tar files simultaneously (each could be 500 MB to 2 GB or more)
  • GitHub Actions runners have limited disk space (~14 GB available)
  • Risk: Intermittent failures unrelated to reproducibility
  • Recommendation: Consider uploading only digests by default, and OCI tars only on mismatch/failure for debugging
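
For issue #2, a minimal sketch of the suggested fix, written as a bash helper that the verify step could call before it reads any digest. The output names are the ones the summary step in this PR reads; the helper name `write_output_defaults` and the placeholder text are hypothetical. It assumes that when an output name is written to `$GITHUB_OUTPUT` more than once, the last value wins, so the real digests written later in the step replace the placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the verify step: write a placeholder for every
# output the summary step reads, before any digest is read or compared.
# Assumption: if a key is written to $GITHUB_OUTPUT again later in the step,
# the later (real) value is the one the summary step sees.

# Output names used by the summary step in this PR.
DIGEST_KEYS="digest_ubuntu_latest digest_ubuntu_22 digest_ubuntu_24 digest_seq_first digest_seq_second"

write_output_defaults() {
  local placeholder="${1:-not-available}"
  local key
  for key in $DIGEST_KEYS; do
    echo "${key}=${placeholder}" >> "$GITHUB_OUTPUT"
  done
  # Default to "failed" so an early exit can never be reported as a pass.
  echo "result=failed" >> "$GITHUB_OUTPUT"
}
```

Calling `write_output_defaults` as the first line of the verify step would be enough; the existing `echo "digest_...=..." >> "$GITHUB_OUTPUT"` lines then overwrite the placeholders once the digests have been read.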

Minor Suggestions

  • Line 31: Add || true to apt-get update to handle transient repository failures
  • Lines 14-15: A weekly cron may be unnecessary for reproducibility verification, since the build setup doesn't change often; consider monthly or on-release only

Overall the workflow logic is sound, but these issues could make it unreliable or confusing when failures occur. Recommend addressing issues #1 and #2 before merge.

⚠️ Issues found

Copilot AI left a comment

Pull request overview

This PR adds a GitHub Actions workflow to verify that Docker image builds are reproducible across different environments and sequential builds. The workflow tests reproducibility by building on multiple Ubuntu versions in parallel and performing sequential builds on the same runner with a clean environment between builds, then verifying that all builds produce identical image digests.

Changes:

  • Added a new workflow file verify-reproducible-build.yml that runs on pull requests affecting build files, manually via workflow_dispatch, and weekly on schedule
  • Implemented three jobs: parallel builds on different runners, sequential builds on the same runner, and verification that all digests match
  • Configured artifact uploads for digests and OCI archives with 1-day retention
Comments suppressed due to low confidence (1)

.github/workflows/verify-reproducible-build.yml:266

  • The summary generation step runs with 'if: always()' which means it will run even if the verify step fails. However, if the verify step fails early (e.g., during digest extraction), the output variables may not be set, and the summary will show empty digests. Consider adding error handling to display a more helpful message when outputs are missing, or checking if the outputs exist before using them in the summary.
            echo ""
            echo "| Environment | Digest |"
            echo "|-------------|--------|"
            echo "| ubuntu-latest | \`${{ steps.verify.outputs.digest_ubuntu_latest }}\` |"
            echo "| ubuntu-22.04 | \`${{ steps.verify.outputs.digest_ubuntu_22 }}\` |"
            echo "| ubuntu-24.04 | \`${{ steps.verify.outputs.digest_ubuntu_24 }}\` |"
            echo "| Sequential (first) | \`${{ steps.verify.outputs.digest_seq_first }}\` |"
            echo "| Sequential (second) | \`${{ steps.verify.outputs.digest_seq_second }}\` |"
            echo ""

            if [ "${{ steps.verify.outputs.result }}" = "passed" ]; then
              echo "### Result: REPRODUCIBLE"
              echo ""
              echo "All builds produced identical digests."
            else
              echo "### Result: NOT REPRODUCIBLE"
              echo ""
              echo "Build digests do not match across environments."
            fi
          } >> "$GITHUB_STEP_SUMMARY"
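
One way to handle missing outputs in this step, sketched in bash under the assumption that the outputs are passed in through the step's `env:` block (for example `DIGEST_UBUNTU_LATEST: ${{ steps.verify.outputs.digest_ubuntu_latest }}`) instead of being interpolated into the script text. The helper names are hypothetical. An empty or unset digest is rendered as `(missing)` instead of an empty cell.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: render the digest table, showing a visible marker when
# a digest is empty or unset (e.g. the verify step failed before writing it).
summary_row() {
  local label="$1" digest="$2"
  printf '| %s | `%s` |\n' "$label" "${digest:-(missing)}"
}

render_summary_table() {
  echo "| Environment | Digest |"
  echo "|-------------|--------|"
  # Assumed to be mapped from steps.verify.outputs.* via the step's env: block.
  summary_row "ubuntu-latest" "${DIGEST_UBUNTU_LATEST:-}"
  summary_row "ubuntu-22.04" "${DIGEST_UBUNTU_22:-}"
  summary_row "ubuntu-24.04" "${DIGEST_UBUNTU_24:-}"
  summary_row "Sequential (first)" "${DIGEST_SEQ_FIRST:-}"
  summary_row "Sequential (second)" "${DIGEST_SEQ_SECOND:-}"
}
```

Reading the values from environment variables also keeps expression output out of the script text, which is the pattern GitHub's security hardening guidance recommends for `run:` steps.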


Comment on lines 3 to 15
on:
  workflow_dispatch:
  pull_request:
  schedule:
    # Run weekly on Sunday at 00:00 UTC
    - cron: '0 0 * * 0'

jobs:
  # Build on multiple runners in parallel to test cross-machine reproducibility
  build:
    name: Build (${{ matrix.runner }})
    runs-on: ${{ matrix.runner }}
    strategy:

Copilot AI Feb 4, 2026

The workflow runs on every pull request that modifies build-related files, plus weekly on schedule. Each run performs 5 complete builds (3 parallel + 2 sequential), which can be resource-intensive. Consider whether this frequency is necessary, or if the workflow should only run on schedule or manual trigger, with PR runs being optional via a label or comment.

Comment on lines +189 to +200
done

if [ "$FAILED" = true ]; then
  exit 1
fi

# Compare all digests - they should all match
REFERENCE_DIGEST="$DIGEST_UBUNTU_LATEST"
MISMATCH=false

echo "=== Comparing all digests against reference (ubuntu-latest) ==="
echo ""

Copilot AI Feb 4, 2026

The script sets FAILED=true and keeps checking the remaining digests, but when FAILED is true it exits with code 1 before reaching the digest comparison logic. As a result, if any digest is empty, the comparison never runs and the summary does not show which specific builds failed. Consider moving the empty-check failure handling to after all comparisons, or printing more specific error information before exiting.
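
A minimal sketch of that reordering in bash, assuming the digests are passed to a hypothetical `verify_digests` function as `NAME=DIGEST` arguments: each build is checked for an empty digest and for a match against the reference in a single pass, every problem is printed, and the function signals failure only at the end through its return status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check every digest for emptiness and for a match
# against the reference, report all problems, and decide pass/fail last.
# Usage: verify_digests REFERENCE NAME=DIGEST [NAME=DIGEST ...]
verify_digests() {
  local reference="$1"; shift
  local status=0 entry name digest
  if [ -z "$reference" ]; then
    echo "[FAIL] reference: digest is empty"
    status=1
  fi
  for entry in "$@"; do
    name="${entry%%=*}"    # text before the first '='
    digest="${entry#*=}"   # text after the first '='
    if [ -z "$digest" ]; then
      echo "[FAIL] ${name}: digest is empty"
      status=1
    elif [ "$digest" != "$reference" ]; then
      echo "[FAIL] ${name}: MISMATCH (expected ${reference}, got ${digest})"
      status=1
    else
      echo "[PASS] ${name}"
    fi
  done
  return "$status"
}
```

Because nothing exits early, the step log names every failing build, and the caller can map the return status to `result=passed` or `result=failed`.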

# Remove any cached layers
docker builder prune -af || true

Copilot AI Feb 4, 2026

The cleanup step removes the buildkit builder by its hardcoded name 'buildkit_20', but build-image.sh only creates this builder if it does not already exist. If the builder is absent when the cleanup runs, '|| true' suppresses the error. However, if build-image.sh skipped creating the builder because it already existed from a previous run, the first build does not start from a truly fresh state. Consider also removing the builder before the first build to guarantee a clean state, or document that testing reproducibility with an existing builder is intentional.

Comment on lines +207 to +218
  else
    echo "[FAIL] ${name}: MISMATCH"
    echo " Expected: ${REFERENCE_DIGEST}"
    echo " Got: ${digest}"
    MISMATCH=true
  fi
}

compare_digest "ubuntu-22.04" "$DIGEST_UBUNTU_22"
compare_digest "ubuntu-24.04" "$DIGEST_UBUNTU_24"
compare_digest "Sequential first" "$DIGEST_SEQ_FIRST"
compare_digest "Sequential second" "$DIGEST_SEQ_SECOND"

Copilot AI Feb 4, 2026

The variable MISMATCH is set inside the compare_digest function but is used after the function calls to determine the exit status. In bash, variables set inside functions are global by default (unless declared with 'local'), so this will work. However, this pattern can be error-prone and makes the code harder to understand. Consider making the function return a status code or using a more explicit approach to track mismatches.

Suggested change
-   else
-     echo "[FAIL] ${name}: MISMATCH"
-     echo " Expected: ${REFERENCE_DIGEST}"
-     echo " Got: ${digest}"
-     MISMATCH=true
-   fi
- }
- compare_digest "ubuntu-22.04" "$DIGEST_UBUNTU_22"
- compare_digest "ubuntu-24.04" "$DIGEST_UBUNTU_24"
- compare_digest "Sequential first" "$DIGEST_SEQ_FIRST"
- compare_digest "Sequential second" "$DIGEST_SEQ_SECOND"
+     return 0
+   else
+     echo "[FAIL] ${name}: MISMATCH"
+     echo " Expected: ${REFERENCE_DIGEST}"
+     echo " Got: ${digest}"
+     return 1
+   fi
+ }
+ compare_digest "ubuntu-22.04" "$DIGEST_UBUNTU_22" || MISMATCH=true
+ compare_digest "ubuntu-24.04" "$DIGEST_UBUNTU_24" || MISMATCH=true
+ compare_digest "Sequential first" "$DIGEST_SEQ_FIRST" || MISMATCH=true
+ compare_digest "Sequential second" "$DIGEST_SEQ_SECOND" || MISMATCH=true
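
For reference, a self-contained version of the suggested pattern. Only the `else` branch of `compare_digest` appears in the PR, so the `[PASS]` branch below is an assumption, as is the wrapper name `compare_all`.

```shell
#!/usr/bin/env bash
# Sketch of the suggested change: compare_digest reports through its return
# status and the caller records the mismatch.
# Assumption: the PASS branch; only the else branch is shown in the PR.
compare_digest() {
  local name="$1" digest="$2"
  if [ "$digest" = "$REFERENCE_DIGEST" ]; then
    echo "[PASS] ${name}: ${digest}"
    return 0
  else
    echo "[FAIL] ${name}: MISMATCH"
    echo " Expected: ${REFERENCE_DIGEST}"
    echo " Got: ${digest}"
    return 1
  fi
}

# Hypothetical wrapper: run every comparison, then fail if any mismatched.
compare_all() {
  local MISMATCH=false
  compare_digest "ubuntu-22.04" "$DIGEST_UBUNTU_22" || MISMATCH=true
  compare_digest "ubuntu-24.04" "$DIGEST_UBUNTU_24" || MISMATCH=true
  compare_digest "Sequential first" "$DIGEST_SEQ_FIRST" || MISMATCH=true
  compare_digest "Sequential second" "$DIGEST_SEQ_SECOND" || MISMATCH=true
  [ "$MISMATCH" = false ]
}
```

The `|| MISMATCH=true` form also matters because GitHub Actions runs `run:` scripts on Linux with `bash -e` by default: a bare call that returns 1 would end the step before the remaining comparisons, while a command on the left of `||` does not trigger `-e`.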

Comment on lines +40 to +45
    # Save digest to file for artifact upload
    echo "${DIGEST}" > digest.txt

- name: Upload digest artifact
  uses: actions/upload-artifact@v4
  with:

Copilot AI Feb 4, 2026

The digest extraction uses 'jq -r' to get the raw digest value, which is correct. However, nothing validates that the digest matches the expected format (e.g., sha256:...). If skopeo returns an unexpected format or the '.Digest' field doesn't exist, 'jq -r' prints the literal string 'null', which an empty-string check will not catch. Consider validating the digest format to catch this case and provide clearer error messages.
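
A minimal sketch of such a check in bash. The helper name is hypothetical, and it assumes the expected format is `sha256:` followed by 64 lowercase hex characters; it rejects the literal `null` as well as empty strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only "sha256:" followed by 64 lowercase hex
# characters, so "null" (missing .Digest) and "" (no output) are rejected.
validate_digest() {
  local digest="$1"
  if [[ "$digest" =~ ^sha256:[0-9a-f]{64}$ ]]; then
    return 0
  fi
  echo "Unexpected digest format: '${digest}' (expected sha256:<64 hex chars>)" >&2
  return 1
}
```

Alternatively, `jq -e` sets a non-zero exit status when the last output is `null` or `false`, which would fail the extraction itself; the format check still catches values that are present but malformed.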

Comment on lines 22 to 24
- name: Install build dependencies
  run: |
    sudo apt-get update

Copilot AI Feb 4, 2026

The matrix includes 'ubuntu-latest', 'ubuntu-22.04', and 'ubuntu-24.04', but 'ubuntu-latest' is currently an alias that points to one of these specific versions (typically the latest LTS). This means two runners in the matrix will be running the same underlying OS version, which doesn't add value for testing cross-machine reproducibility. Consider either removing 'ubuntu-latest' from the matrix or clarifying in comments that this is intentional to test the alias behavior.

Comment on lines 154 to 170
run: |
  echo "=== Reproducibility Verification Results ==="
  echo ""

  # Read digests from artifact files
  DIGEST_UBUNTU_LATEST=$(cat digest-ubuntu-latest/digest.txt)
  DIGEST_UBUNTU_22=$(cat digest-ubuntu-22.04/digest.txt)
  DIGEST_UBUNTU_24=$(cat digest-ubuntu-24.04/digest.txt)
  DIGEST_SEQ_FIRST=$(cat digest-sequential/digest-first.txt)
  DIGEST_SEQ_SECOND=$(cat digest-sequential/digest-second.txt)

  # Export for summary step
  echo "digest_ubuntu_latest=${DIGEST_UBUNTU_LATEST}" >> "$GITHUB_OUTPUT"
  echo "digest_ubuntu_22=${DIGEST_UBUNTU_22}" >> "$GITHUB_OUTPUT"
  echo "digest_ubuntu_24=${DIGEST_UBUNTU_24}" >> "$GITHUB_OUTPUT"
  echo "digest_seq_first=${DIGEST_SEQ_FIRST}" >> "$GITHUB_OUTPUT"
  echo "digest_seq_second=${DIGEST_SEQ_SECOND}" >> "$GITHUB_OUTPUT"

Copilot AI Feb 4, 2026

The workflow downloads artifacts with 'merge-multiple: false', which means each artifact will be downloaded into its own directory. This is correct given the subsequent code that reads from 'digest-ubuntu-latest/digest.txt', 'digest-ubuntu-22.04/digest.txt', etc. However, if the artifact names change (for example, if the runner names change), the hardcoded paths in lines 166-170 will break. Consider adding error handling or using a more dynamic approach to read from the downloaded artifacts.
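
A sketch of the more dynamic approach, assuming the current layout (one directory per artifact named `digest-*`, each holding one or more `digest*.txt` files). The function name and output format are hypothetical. It prints one `path digest` line per file found and returns non-zero when nothing matched, so a renamed artifact fails loudly instead of producing an empty digest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every downloaded digest file instead of
# hardcoding one path per runner. Assumes artifacts are downloaded with
# merge-multiple: false into directories named digest-*.
collect_digests() {
  local root="${1:-.}"
  local file found=0
  for file in "$root"/digest-*/digest*.txt; do
    # Without a match the unexpanded pattern is left as-is; skip it.
    [ -e "$file" ] || continue
    # One line per file: "<artifact-dir>/<file> <digest>", whitespace stripped.
    printf '%s %s\n' "${file#"$root"/}" "$(tr -d '[:space:]' < "$file")"
    found=1
  done
  # Non-zero status when no digest files were found at all.
  [ "$found" -eq 1 ]
}
```

With this, adding or renaming a runner in the matrix would not require editing the verify step, provided the artifact naming convention stays the same.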

think-in-universe left a comment

Thanks Evrard. I was also thinking about automating the reproducible build verification yesterday. Thanks for adding this!

  pull_request:
  schedule:
    # Run weekly on Sunday at 00:00 UTC
    - cron: '0 0 * * 0'

For the weekly cron job, I think we can compare the generated image digest with the one already published in the Docker registry.
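
A rough sketch of that comparison for the scheduled run. `fetch_published_digest` uses `skopeo inspect --format`, which reads the digest from the registry without pulling the image; the image reference is a caller-supplied placeholder because the PR does not name the published image, and both function names are hypothetical. The check may also report a mismatch for reasons unrelated to the build, for example if the published tag points to a multi-platform index rather than the single manifest the local build produces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the scheduled run.

# Query the registry for the digest of a published image (no pull).
# The image reference is a placeholder supplied by the caller.
fetch_published_digest() {
  local image_ref="$1"
  skopeo inspect --format '{{.Digest}}' "docker://${image_ref}"
}

# Succeed only when both digests are non-empty and identical.
check_against_published() {
  local local_digest="$1" published_digest="$2"
  if [ -n "$local_digest" ] && [ "$local_digest" = "$published_digest" ]; then
    echo "[PASS] local build matches published image (${published_digest})"
    return 0
  fi
  echo "[FAIL] local ${local_digest:-<empty>} != published ${published_digest:-<empty>}"
  return 1
}
```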

think-in-universe Feb 5, 2026

Also, I think we could consider running this on each push to main instead of on pull_request.

@Evrard-Nil Evrard-Nil temporarily deployed to Cloud API test env February 6, 2026 11:45 — with GitHub Actions Inactive

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

echo "=== Cleaning build environment ==="

# Extract builder name from build-image.sh to stay in sync
BUILDER_NAME=$(grep -oP '(?<=--name )\w+' build-image.sh | head -1) || true

Regex fails to extract builder names with hyphens

Medium Severity

The regex pattern \w+ used to extract the builder name from build-image.sh only matches word characters (letters, digits, underscore) and stops at hyphens. If the builder name contains a hyphen (e.g., my-builder), only the portion before the hyphen is captured, resulting in docker buildx rm attempting to remove a non-existent builder. The actual builder remains with its cached state, potentially invalidating the sequential build reproducibility test.
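
A minimal sketch of a hyphen-safe extraction. It assumes build-image.sh passes the name as `--name NAME` or `--name=NAME` and that builder names contain only letters, digits, `.`, `_` and `-`; the function name is hypothetical. It uses `grep -E` instead of `grep -P`, so it does not depend on PCRE support, and it returns non-zero when no name is found, so the cleanup step can fail instead of silently skipping `docker buildx rm`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the first builder name passed to --name,
# keeping hyphens and dots (the \w+ pattern stops at the first hyphen).
extract_builder_name() {
  local script="${1:-build-image.sh}" name
  # Match "--name NAME" or "--name=NAME", keep the first hit, strip the flag.
  name="$(grep -oE -- '--name[ =][A-Za-z0-9._-]+' "$script" \
    | head -n 1 \
    | sed -E 's/^--name[ =]//' || true)"
  if [ -z "$name" ]; then
    echo "no --name argument found in ${script}" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```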
