feat: add GitHub Actions workflow for verifying reproducible builds #415
Evrard-Nil wants to merge 3 commits into main
Note: Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.
Code Review
This PR adds a reproducibility verification workflow for Docker builds. The approach is solid, but there are a few critical issues that could cause false positives/negatives or confusing failures:
Critical Issues
1. Builder cleanup hardcodes name (lines 98-104)
2. Summary shows empty digests on early failure (lines 240-266)
3. Potential disk space exhaustion
Minor Suggestions
Overall the workflow logic is sound, but these issues could make it unreliable or confusing when failures occur. Recommend addressing issues #1 and #2 before merge.
Pull request overview
This PR adds a GitHub Actions workflow to verify that Docker image builds are reproducible across different environments and sequential builds. The workflow tests reproducibility by building on multiple Ubuntu versions in parallel and performing sequential builds on the same runner with a clean environment between builds, then verifying that all builds produce identical image digests.
Changes:
- Added a new workflow file verify-reproducible-build.yml that runs on pull requests affecting build files, manually via workflow_dispatch, and weekly on schedule
- Implemented three jobs: parallel builds on different runners, sequential builds on the same runner, and verification that all digests match
- Configured artifact uploads for digests and OCI archives with 1-day retention
Comments suppressed due to low confidence (1)
.github/workflows/verify-reproducible-build.yml:266
- The summary generation step runs with 'if: always()' which means it will run even if the verify step fails. However, if the verify step fails early (e.g., during digest extraction), the output variables may not be set, and the summary will show empty digests. Consider adding error handling to display a more helpful message when outputs are missing, or checking if the outputs exist before using them in the summary.
echo ""
echo "| Environment | Digest |"
echo "|-------------|--------|"
echo "| ubuntu-latest | \`${{ steps.verify.outputs.digest_ubuntu_latest }}\` |"
echo "| ubuntu-22.04 | \`${{ steps.verify.outputs.digest_ubuntu_22 }}\` |"
echo "| ubuntu-24.04 | \`${{ steps.verify.outputs.digest_ubuntu_24 }}\` |"
echo "| Sequential (first) | \`${{ steps.verify.outputs.digest_seq_first }}\` |"
echo "| Sequential (second) | \`${{ steps.verify.outputs.digest_seq_second }}\` |"
echo ""
if [ "${{ steps.verify.outputs.result }}" = "passed" ]; then
echo "### Result: REPRODUCIBLE"
echo ""
echo "All builds produced identical digests."
else
echo "### Result: NOT REPRODUCIBLE"
echo ""
echo "Build digests do not match across environments."
fi
} >> "$GITHUB_STEP_SUMMARY"
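One way to address the empty-digest concern is to route every table row through a small helper that substitutes a placeholder when an output was never set. This is a minimal sketch, not the PR's code: the `summary_row` function, the placeholder wording, and the sample digest values are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print one Markdown table row,
# substituting a placeholder when the digest output is empty or unset.
summary_row() {
  local label="$1"
  local digest="${2:-}"
  if [ -z "$digest" ]; then
    digest="not available (verify step failed early)"
  fi
  printf '| %s | `%s` |\n' "$label" "$digest"
}

# Demo with made-up values: one populated digest and one missing digest.
echo "| Environment | Digest |"
echo "|-------------|--------|"
summary_row "ubuntu-22.04" "sha256:abc123"
summary_row "ubuntu-24.04" ""
```

In the workflow, the same calls would be grouped and appended to "$GITHUB_STEP_SUMMARY" exactly as the existing step does, with each `${{ steps.verify.outputs.* }}` expression passed as the second argument.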
on:
  workflow_dispatch:
  pull_request:
  schedule:
    # Run weekly on Sunday at 00:00 UTC
    - cron: '0 0 * * 0'

jobs:
  # Build on multiple runners in parallel to test cross-machine reproducibility
  build:
    name: Build (${{ matrix.runner }})
    runs-on: ${{ matrix.runner }}
    strategy:
The workflow runs on every pull request that modifies build-related files, plus weekly on schedule. Each run performs 5 complete builds (3 parallel + 2 sequential), which can be resource-intensive. Consider whether this frequency is necessary, or if the workflow should only run on schedule or manual trigger, with PR runs being optional via a label or comment.
done

if [ "$FAILED" = true ]; then
  exit 1
fi

# Compare all digests - they should all match
REFERENCE_DIGEST="$DIGEST_UBUNTU_LATEST"
MISMATCH=false

echo "=== Comparing all digests against reference (ubuntu-latest) ==="
echo ""
The script sets FAILED=true but continues checking all digests before exiting. However, if FAILED is true, the script will exit with code 1 before reaching the digest comparison logic. This means if any digest is empty, the comparison step never runs and the summary will not show which specific builds failed. Consider moving the empty check failure handling to after all comparisons, or providing more specific error information in the output before exiting.
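A sketch of how the emptiness check could name every failed build before the script exits. The `report_missing_digests` helper and the sample values are hypothetical; only the `FAILED` flag comes from the quoted script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): list every build whose digest is
# empty. Arguments come in pairs: a display name, then the digest value.
report_missing_digests() {
  local missing=0
  local name value
  while [ "$#" -ge 2 ]; do
    name="$1"
    value="$2"
    shift 2
    if [ -z "$value" ]; then
      echo "[ERROR] ${name}: digest is empty"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when no digest was missing.
  [ "$missing" -eq 0 ]
}

# Demo with made-up values: the second build produced no digest.
FAILED=false
report_missing_digests \
  "ubuntu-22.04" "sha256:aaaa" \
  "ubuntu-24.04" "" || FAILED=true
echo "FAILED=${FAILED}"
```

Because all missing builds are printed before the flag is tested, the log identifies each one even though the comparison step is still skipped.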
# Remove any cached layers
docker builder prune -af || true
The cleanup step removes the buildkit builder with a hardcoded name 'buildkit_20', but this builder is created with a conditional check in build-image.sh. If the builder doesn't exist when the cleanup runs, the '|| true' will suppress the error. However, if the builder was never created (because it already existed from a previous run), the state won't be truly fresh. Consider also cleaning up the builder before the first build to ensure a completely clean state, or document that this is intentional to test reproducibility with an existing builder.
  else
    echo "[FAIL] ${name}: MISMATCH"
    echo " Expected: ${REFERENCE_DIGEST}"
    echo " Got: ${digest}"
    MISMATCH=true
  fi
}

compare_digest "ubuntu-22.04" "$DIGEST_UBUNTU_22"
compare_digest "ubuntu-24.04" "$DIGEST_UBUNTU_24"
compare_digest "Sequential first" "$DIGEST_SEQ_FIRST"
compare_digest "Sequential second" "$DIGEST_SEQ_SECOND"
The variable MISMATCH is set inside the compare_digest function but is used after the function calls to determine the exit status. In bash, variables set inside functions are global by default (unless declared with 'local'), so this will work. However, this pattern can be error-prone and makes the code harder to understand. Consider making the function return a status code or using a more explicit approach to track mismatches.
    return 0
  else
    echo "[FAIL] ${name}: MISMATCH"
    echo " Expected: ${REFERENCE_DIGEST}"
    echo " Got: ${digest}"
    return 1
  fi
}
compare_digest "ubuntu-22.04" "$DIGEST_UBUNTU_22" || MISMATCH=true
compare_digest "ubuntu-24.04" "$DIGEST_UBUNTU_24" || MISMATCH=true
compare_digest "Sequential first" "$DIGEST_SEQ_FIRST" || MISMATCH=true
compare_digest "Sequential second" "$DIGEST_SEQ_SECOND" || MISMATCH=true
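For reference, a self-contained sketch of the return-code pattern the suggestion describes. The success branch and its "[OK]" message are assumptions, since the quoted hunk starts mid-function, and the digest values are placeholders.

```shell
#!/usr/bin/env bash
# Placeholder reference digest; the workflow reads the real one from an artifact.
REFERENCE_DIGEST="sha256:aaaa"

# Return 0 on match and 1 on mismatch, instead of setting a global flag.
compare_digest() {
  local name="$1"
  local digest="$2"
  if [ "$digest" = "$REFERENCE_DIGEST" ]; then
    # Assumed success message; the original branch is not shown in the PR hunk.
    echo "[OK] ${name}: matches reference"
    return 0
  else
    echo "[FAIL] ${name}: MISMATCH"
    echo " Expected: ${REFERENCE_DIGEST}"
    echo " Got: ${digest}"
    return 1
  fi
}

# The caller owns the flag, so the data flow is visible at the call site.
MISMATCH=false
compare_digest "ubuntu-22.04" "sha256:aaaa" || MISMATCH=true
compare_digest "ubuntu-24.04" "sha256:bbbb" || MISMATCH=true
echo "MISMATCH=${MISMATCH}"
```

Using `|| MISMATCH=true` also keeps the script safe under `set -e`, because a non-zero return on the left side of `||` does not abort the shell.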
    # Save digest to file for artifact upload
    echo "${DIGEST}" > digest.txt

- name: Upload digest artifact
  uses: actions/upload-artifact@v4
  with:
The digest extraction uses 'jq -r' to get the raw digest value, which is correct. However, there's no validation that the digest matches the expected format (e.g., sha256:...). If skopeo returns an unexpected format or the '.Digest' field doesn't exist, jq will print the literal string 'null', which is non-empty and so slips past the empty check. Consider adding validation that the digest matches the expected format to provide clearer error messages.
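A minimal sketch of such a format check, assuming digests are always SHA-256 (the `is_valid_digest` name is hypothetical). The sample value is the SHA-256 of empty input, used only because it has the right shape.

```shell
#!/usr/bin/env bash
# Hypothetical validator (not part of the PR): succeed only for "sha256:"
# followed by exactly 64 lowercase hexadecimal characters.
is_valid_digest() {
  local digest="$1"
  [[ "$digest" =~ ^sha256:[0-9a-f]{64}$ ]]
}

# Demo: a well-formed digest, jq's literal "null", and an empty string.
for candidate in \
  "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" \
  "null" \
  ""; do
  if is_valid_digest "$candidate"; then
    echo "valid: ${candidate}"
  else
    echo "invalid digest: '${candidate}'"
  fi
done
```

Running this check right after the extraction would let the step fail with a message naming the malformed value, rather than failing later during comparison.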
- name: Install build dependencies
  run: |
    sudo apt-get update
The matrix includes 'ubuntu-latest', 'ubuntu-22.04', and 'ubuntu-24.04', but 'ubuntu-latest' is currently an alias that points to one of these specific versions (typically the latest LTS). This means two runners in the matrix will be running the same underlying OS version, which doesn't add value for testing cross-machine reproducibility. Consider either removing 'ubuntu-latest' from the matrix or clarifying in comments that this is intentional to test the alias behavior.
run: |
  echo "=== Reproducibility Verification Results ==="
  echo ""

  # Read digests from artifact files
  DIGEST_UBUNTU_LATEST=$(cat digest-ubuntu-latest/digest.txt)
  DIGEST_UBUNTU_22=$(cat digest-ubuntu-22.04/digest.txt)
  DIGEST_UBUNTU_24=$(cat digest-ubuntu-24.04/digest.txt)
  DIGEST_SEQ_FIRST=$(cat digest-sequential/digest-first.txt)
  DIGEST_SEQ_SECOND=$(cat digest-sequential/digest-second.txt)

  # Export for summary step
  echo "digest_ubuntu_latest=${DIGEST_UBUNTU_LATEST}" >> "$GITHUB_OUTPUT"
  echo "digest_ubuntu_22=${DIGEST_UBUNTU_22}" >> "$GITHUB_OUTPUT"
  echo "digest_ubuntu_24=${DIGEST_UBUNTU_24}" >> "$GITHUB_OUTPUT"
  echo "digest_seq_first=${DIGEST_SEQ_FIRST}" >> "$GITHUB_OUTPUT"
  echo "digest_seq_second=${DIGEST_SEQ_SECOND}" >> "$GITHUB_OUTPUT"
The workflow downloads artifacts with 'merge-multiple: false', which means each artifact will be downloaded into its own directory. This is correct given the subsequent code that reads from 'digest-ubuntu-latest/digest.txt', 'digest-ubuntu-22.04/digest.txt', etc. However, if the artifact names change (for example, if the runner names change), the hardcoded paths in lines 166-170 will break. Consider adding error handling or using a more dynamic approach to read from the downloaded artifacts.
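A more dynamic approach could discover the digest files instead of naming each path. The sketch below assumes the layout described above, with one directory per artifact containing files named digest*.txt; the `collect_digests` helper and the temporary demo tree are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print "<artifact>/<file> <digest>"
# for every digest*.txt found one level below the given download directory.
collect_digests() {
  local root="$1"
  local file
  for file in "$root"/*/digest*.txt; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$file" ] || continue
    printf '%s %s\n' "${file#"$root"/}" "$(cat "$file")"
  done
}

# Demo against a throwaway tree that mimics two downloaded artifacts.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/digest-ubuntu-22.04" "$demo_dir/digest-sequential"
echo "sha256:aaaa" > "$demo_dir/digest-ubuntu-22.04/digest.txt"
echo "sha256:aaaa" > "$demo_dir/digest-sequential/digest-first.txt"
collect_digests "$demo_dir"
```

The verify step could then compare every listed digest against the first one and fail with a clear message when fewer files than expected are found, so a renamed runner no longer breaks a hardcoded path silently.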
think-in-universe left a comment
Thanks Evrard. I was also thinking about automating the reproducible build verification yesterday. Thanks for adding this!
  pull_request:
  schedule:
    # Run weekly on Sunday at 00:00 UTC
    - cron: '0 0 * * 0'
For the weekly cron job, I think we can compare the generated image digest with the one already published in the Docker registry.
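A rough sketch of what that comparison might look like, assuming skopeo and jq are available on the runner (both already appear in this workflow's digest extraction) and that the registry serves the same manifest the build produces. The function names are hypothetical and no real image reference is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): read the digest of a published
# image. The caller supplies the reference, e.g. docker://<registry>/<repo>:<tag>.
published_digest() {
  local image_ref="$1"
  skopeo inspect "$image_ref" | jq -r '.Digest'
}

# Compare a locally built digest against a published one.
digests_match() {
  local built="$1"
  local published="$2"
  if [ -n "$built" ] && [ "$built" = "$published" ]; then
    echo "[OK] build matches published image"
    return 0
  fi
  echo "[FAIL] built '${built}' but registry has '${published}'"
  return 1
}
```

In a scheduled run this could be wired up as `digests_match "$DIGEST" "$(published_digest "$IMAGE_REF")"`, where `IMAGE_REF` is a placeholder for whatever image the project actually publishes.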
Also, I think we may want to run this on each push to main instead of on pull_request.
…e error handling and artifact management
Cursor Bugbot has reviewed your changes and found 1 potential issue.
echo "=== Cleaning build environment ==="

# Extract builder name from build-image.sh to stay in sync
BUILDER_NAME=$(grep -oP '(?<=--name )\w+' build-image.sh | head -1) || true
Regex fails to extract builder names with hyphens
Medium Severity
The regex pattern \w+ used to extract the builder name from build-image.sh only matches word characters (letters, digits, underscore) and stops at hyphens. If the builder name contains a hyphen (e.g., my-builder), only the portion before the hyphen is captured, resulting in docker buildx rm attempting to remove a non-existent builder. The actual builder remains with its cached state, potentially invalidating the sequential build reproducibility test.
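One possible fix is to widen the character class so hyphens are captured. The sketch below swaps the `grep -oP` call for bash's built-in regex matching, which avoids depending on PCRE support in grep; the `extract_builder_name` helper and the demo script contents are hypothetical. Widening the class in the existing grep pattern to `[\w-]+` would likely achieve the same result.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the first value that follows
# "--name" in a script, accepting letters, digits, underscores, dots, hyphens.
extract_builder_name() {
  local script="$1"
  local pattern='--name[[:space:]]+([A-Za-z0-9_.-]+)'
  local line
  # The "|| [ -n ... ]" clause also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ "$line" =~ $pattern ]]; then
      echo "${BASH_REMATCH[1]}"
      return 0
    fi
  done < "$script"
  return 1
}

# Demo against a throwaway file standing in for build-image.sh.
demo_script="$(mktemp)"
printf 'docker buildx create --name my-builder --use\n' > "$demo_script"
extract_builder_name "$demo_script"
```

In the cleanup step this would replace the grep pipeline, for example `BUILDER_NAME=$(extract_builder_name build-image.sh) || true`.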


Note
Low Risk
CI-only change that doesn’t affect runtime code paths, but it can introduce new build flakiness or longer CI times due to multi-runner builds and Docker pruning.
Overview
Adds a new GitHub Actions workflow (verify-reproducible-build.yml) that validates container build reproducibility by running ./build-image.sh on multiple Ubuntu runners and comparing the resulting OCI image digest.
The workflow also performs two sequential builds on the same runner with aggressive Docker/buildx cleanup to catch nondeterminism, uploads digests (and the OCI archives only on mismatch), and posts a step summary table plus a pass/fail result based on digest equality.
Written by Cursor Bugbot for commit 4fa37d1.