feat: Add GitHub Actions workflow for GHCR image publishing #207

Open

jevy wants to merge 9 commits into main from feat/ghcr-image-publishing

Conversation

@jevy
Contributor

@jevy jevy commented Jan 27, 2026

Summary

  • Add automated Docker image publishing to GitHub Container Registry (GHCR)
  • Update server Jib config to publish to ghcr.io/typestreamio/server
  • Publish 3 images: server, demo-data, kafka-connect
  • Only triggers on version tags (v*) or manual dispatch
  • Smoke tests verify images before publishing

Changes

New workflow: .github/workflows/publish-images.yml

  • Triggers: Release tags (v*), manual dispatch only (no push to main)
  • Build & Test Jobs (run in parallel):
    • build-and-test-server: Builds with Jib, verifies server starts and stays running
    • build-and-test-demo-data: Builds image, verifies container can run
    • build-and-test-kafka-connect: Builds image, verifies Debezium connectors installed
  • Publish Job: Only runs after all smoke tests pass, pushes all 3 images to GHCR
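
For orientation, the trigger and gating structure described above looks roughly like this (a sketch only; job names are the ones listed above, step contents elided):

name: Publish Docker Images

on:
  push:
    tags: ['v*']        # release tags only
  workflow_dispatch:    # manual runs

permissions:
  contents: read
  packages: write

jobs:
  build-and-test-server:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... build with Jib, verify the server starts and stays running ...
  build-and-test-demo-data:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... build the image, verify the container can run ...
  build-and-test-kafka-connect:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... build the image, verify the Debezium connectors are installed ...
  publish:
    needs: [build-and-test-server, build-and-test-demo-data, build-and-test-kafka-connect]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... log in to GHCR and push all three images ...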

Updated: server/build.gradle.kts

  • Changed image from typestream/server (Docker Hub) to ghcr.io/typestreamio/server (GHCR)
  • Added latest tag alongside version tag
  • Added GHCR authentication via GITHUB_ACTOR and GITHUB_TOKEN env vars

Test plan

  • Create a test tag and verify workflow triggers
  • Verify smoke tests run before publishing
  • Check images appear at https://github.com/orgs/typestreamio/packages
  • Verify images are tagged with latest, version, and commit SHA
  • Test manual workflow dispatch

Closes typestream-0c2

@claude

claude bot commented Jan 27, 2026

Code Review - PR #207: Add GitHub Actions workflow for GHCR image publishing

Overall, this is a well-structured PR that adds automated Docker image publishing to GitHub Container Registry. The implementation is solid with good use of modern GitHub Actions practices. Below are my findings:

✅ Strengths

  1. Good Action Versions: Using up-to-date and stable action versions (@v3, @v4, @v5, @v6)
  2. Proper Permissions: Correctly scoped permissions (contents: read, packages: write)
  3. Efficient Caching: Using GitHub Actions cache for both Docker layers and Nix
  4. Proper Metadata: Using docker/metadata-action for consistent tagging
  5. Nix Integration: Correctly integrates with existing Nix development environment
  6. Multiple Triggers: Supports push, tags, and manual dispatch

🐛 Potential Issues

Critical

  1. Missing Dockerfile Dependencies (server/build.gradle.kts:56-59, .github/workflows/publish-images.yml:105-108)
    • The kafka-connect Dockerfile references scripts that need to be copied from the docker/ directory:
      COPY register-connector.sh /usr/local/bin/register-connector.sh
      COPY custom-entrypoint.sh /usr/local/bin/custom-entrypoint.sh
    • The build context is . (repo root) but these scripts are in docker/
    • Fix: Update COPY commands to:
      COPY docker/register-connector.sh /usr/local/bin/register-connector.sh
      COPY docker/custom-entrypoint.sh /usr/local/bin/custom-entrypoint.sh

High Priority

  1. Hardcoded Version Instead of Dynamic (server/build.gradle.kts:55)

    • Using project.version.toString() which defaults to "unspecified" based on the version-info plugin
    • This means images will be tagged as "unspecified" and "latest"
    • Recommendation: Consider using commit SHA or a proper version from git tags:
      tags = mutableSetOf("latest", System.getenv("GITHUB_SHA")?.take(7) ?: "dev")
  2. No Workflow Path Triggers (.github/workflows/publish-images.yml:2-6)

    • Workflow triggers on any push to main, even if no relevant files changed
    • Recommendation: Add path filters like other workflows:
      on:
        push:
          branches: [main]
          paths:
            - 'server/**'
            - 'connectors/demo-data/**'
            - 'docker/**'
            - '.github/workflows/publish-images.yml'
          tags: ['v*']

Medium Priority

  1. Redundant Docker Login (.github/workflows/publish-images.yml:31-36)

    • The server job logs into GHCR but Jib handles auth via GITHUB_TOKEN env var
    • The docker/login-action step might be unnecessary here
    • Recommendation: Verify if this step is needed or if Jib's auth is sufficient
  2. Missing Build Failure Handling

    • No strategy for failed builds (continue-on-error, fail-fast, etc.)
    • Recommendation: Add explicit failure handling or at least document expected behavior
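
For the failure-handling point above, one lightweight option is a per-job timeout plus a diagnostic step gated on if: failure(), along these lines (a sketch; server-smoke-test is the container name used by the workflow's smoke test):

jobs:
  build-and-test-server:
    runs-on: ubuntu-latest
    timeout-minutes: 30          # fail instead of hanging if a build stalls
    steps:
      # ... existing build and smoke-test steps ...
      - name: Dump container state on failure
        if: failure()
        run: |
          docker ps -a
          docker logs --tail 50 server-smoke-test || true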

🔒 Security Considerations

  1. GITHUB_TOKEN Permissions: ✅ Correctly scoped to minimum necessary
  2. Public Registry: Images will be public by default on GHCR - ensure this is intended
  3. No Image Scanning: Consider adding vulnerability scanning with Trivy or similar:
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ghcr.io/typestreamio/server:${{ github.sha }}
        format: 'sarif'
        output: 'trivy-results.sarif'

⚡ Performance Considerations

  1. Parallel Jobs: ✅ Three images build in parallel - good for CI time
  2. Cache Strategy: ✅ Using cache-from: type=gha and cache-to: type=gha,mode=max
  3. Nix Magic Cache: ✅ Should speed up Gradle builds significantly

📋 Best Practices & Suggestions

  1. Add Workflow Status Badge to README.md:

    [![Publish Docker Images](https://github.com/typestreamio/typestream/actions/workflows/publish-images.yml/badge.svg)](https://github.com/typestreamio/typestream/actions/workflows/publish-images.yml)
  2. Consider Multi-arch Builds for demo-data and kafka-connect:

    platforms: linux/amd64,linux/arm64
  3. Add Image Labels for better metadata:

    labels: |
      org.opencontainers.image.source=${{ github.event.repository.html_url }}
      org.opencontainers.image.revision=${{ github.sha }}
  4. Document Image URLs in README or CONTRIBUTING guide

🧪 Test Coverage

  • Manual testing checklist in PR body is good
  • Consider adding a smoke test job that pulls and verifies images after publish
  • No automated tests for the workflow itself (acceptable for CI/CD workflows)

📝 Code Quality

  • Clean, readable YAML
  • Consistent naming conventions
  • Good job/step naming for CI visibility
  • Follows existing repository patterns (Nix usage, action choices)

Summary

Recommended Actions Before Merge:

  1. ✅ Fix kafka-connect Dockerfile COPY paths (Critical)
  2. ⚠️ Add path triggers to prevent unnecessary builds (High)
  3. ⚠️ Review version tagging strategy for server image (High)
  4. 💡 Consider adding image vulnerability scanning (Nice-to-have)

Approval Status: Approve with minor changes requested

The workflow is well-designed and follows good practices. The critical issue with the kafka-connect Dockerfile paths must be fixed to prevent build failures. The version tagging concern should be addressed to ensure proper image versioning.

@jevy
Contributor Author

jevy commented Jan 27, 2026

Thanks for the review! Fixed the issues:

  1. Fixed Dockerfile COPY paths - Updated to docker/register-connector.sh and docker/custom-entrypoint.sh
  2. Added path triggers - Workflow now only runs on changes to:
    • server/**
    • connectors/demo-data/**
    • docker/Dockerfile.kafka-connect
    • docker/register-connector.sh
    • docker/custom-entrypoint.sh
    • .github/workflows/publish-images.yml

Re: version tagging - the current project.version.toString() is fine for now. The metadata-action also adds SHA-based tags for the Docker-built images. We can improve server versioning in a follow-up.

@claude

claude bot commented Jan 27, 2026

Code Review: GitHub Container Registry Image Publishing

Summary

This PR adds automated Docker image publishing to GHCR for the server, demo-data, and kafka-connect images. The implementation is well-structured and follows GitHub Actions best practices.


✅ Strengths

  1. Good workflow organization: Three separate jobs with appropriate path filters to avoid unnecessary builds
  2. Proper authentication: Uses GITHUB_TOKEN with correct permissions (packages: write)
  3. Smart caching: Uses Docker BuildX with GitHub Actions cache for demo-data and kafka-connect
  4. Consistent with existing patterns: Follows the same Nix setup used in server-check.yml and connectors-check.yml
  5. Build context fix: Correctly fixed the Dockerfile.kafka-connect COPY paths to work from repo root

🔍 Issues & Recommendations

1. Critical: Missing version.txt for Server Image

The server build uses Jib with project.version.toString() but there's no explicit version management visible. The typestream.version-info plugin is referenced but:

  • Ensure the version is properly generated before the Jib task runs
  • Consider adding git describe --tags --always output for traceability
  • Recommendation: Add a step to display the version being built for audit purposes:
- name: Display version
  run: nix develop --command gradle properties | grep ^version:

2. Missing Jib dependencies path filter

The workflow triggers on changes to key paths but is missing important Jib dependencies:

paths:
  - 'server/**'
  # Add these:
  - 'buildSrc/**'      # Convention plugins
  - 'gradle/**'         # Gradle wrapper
  - 'build.gradle.kts'  # Root build config
  - 'settings.gradle.kts'
  - 'gradle.properties'
  - 'flake.nix'         # Nix environment
  - 'flake.lock'

3. Security: Consider image scanning

For production images, add vulnerability scanning:

- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ${{ env.REGISTRY }}/typestreamio/server:latest
    format: 'sarif'
    output: 'trivy-results.sarif'

4. Tag strategy consideration

Currently using latest + version tag. Consider:

  • For main branch: latest, commit SHA, and main-<sha>
  • For release tags: v1.2.3, 1.2, 1, and latest
  • This gives users more control over image pinning
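
One way to produce that tag set for the Docker-built images is docker/metadata-action's semver patterns, roughly (a sketch, not part of this PR; the action's default latest=auto flavor adds the latest tag for semver tags):

# For a v1.2.3 tag this yields 1.2.3, 1.2, 1 and a short-SHA tag.
- name: Server image metadata
  id: meta
  uses: docker/metadata-action@v5
  with:
    images: ghcr.io/typestreamio/server
    tags: |
      type=semver,pattern={{version}}
      type=semver,pattern={{major}}.{{minor}}
      type=semver,pattern={{major}}
      type=sha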

The docker/metadata-action already generates SHA tags for demo-data and kafka-connect but not for server. Consider adding this to the server job:

- name: Set image tags
  id: meta
  run: |
    TAGS="${{ env.REGISTRY }}/typestreamio/server:latest"
    TAGS="${TAGS},${{ env.REGISTRY }}/typestreamio/server:${{ github.sha }}"
    echo "tags=${TAGS}" >> $GITHUB_OUTPUT

- name: Build and push server image
  run: nix develop --command gradle :server:jib
  env:
    GITHUB_ACTOR: ${{ github.actor }}
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    JIB_TAGS: ${{ steps.meta.outputs.tags }}

5. Build matrix for demo-data Dockerfile

The demo-data Dockerfile has a multi-stage build with Gradle. Consider:

  • Using --cache-from and --cache-to for the Gradle layer too
  • Or switch to using Nix + Gradle like the server for consistency
  • Current approach is fine but may have slower builds than Nix

6. Missing workflow concurrency control

To prevent multiple simultaneous publishes:

concurrency:
  group: publish-images-${{ github.ref }}
  cancel-in-progress: false  # Don't cancel, let it finish

🐛 Potential Bugs

Minor: Kafka Connect registration script robustness

In docker/register-connector.sh, the connector check could fail silently:

# Line 17: Consider adding error context
if curl -sf "${CONNECT_URL}/connectors/postgres-cdc" > /dev/null 2>&1; then
  echo "Connector postgres-cdc already exists, skipping registration"
  exit 0
fi

Recommendation: Add -v flag when debugging or add a final status check after registration.


📊 Performance Considerations

  1. Nix cache hit rate: The Magic Nix Cache should speed up builds significantly
  2. Docker layer caching: Demo-data and kafka-connect use GitHub Actions cache effectively
  3. Parallel jobs: All three images build concurrently - good for CI time

Estimated build times:

  • Server (Nix + Jib): ~3-5 minutes (after cache)
  • Demo-data: ~5-8 minutes (Gradle download + build)
  • Kafka-connect: ~8-10 minutes (downloads multiple connectors)

✅ Test Coverage

The PR description mentions a test plan but it's incomplete:

  • ✅ Tests that workflow triggers correctly
  • ✅ Tests image publishing
  • Missing: Test that images actually work
    • Consider adding a smoke test job that pulls and runs each image
    • Example: Pull server image and check /health endpoint

🔒 Security Considerations

  1. ✅ Proper use of GITHUB_TOKEN (scoped to packages)
  2. ✅ No hardcoded credentials
  3. ⚠️ Consider: Making images private initially until ready for public release
  4. ⚠️ Consider: Adding SBOM (Software Bill of Materials) generation
  5. ⚠️ Consider: Signing images with cosign
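
As a rough illustration of the cosign point, keyless signing in the publish job could look something like this (a sketch, not part of this PR; it needs the id-token: write permission and sigstore's cosign-installer action):

permissions:
  contents: read
  packages: write
  id-token: write          # required for keyless (OIDC) signing

steps:
  # ... push the image first ...
  - uses: sigstore/cosign-installer@v3
  - name: Sign the published server image
    # signing by digest rather than a moving tag is preferable
    run: cosign sign --yes ghcr.io/typestreamio/server:latest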

📝 Documentation Needs

Consider updating:

  1. README.md: Document the new GHCR images and how to pull them
  2. docker-compose.yml: Update to use GHCR images instead of local builds (or make it configurable)
  3. CONTRIBUTING.md: Explain the image publishing process for contributors

🎯 Verdict

Recommendation: Approve with minor changes

This is a solid foundation for automated image publishing. The critical issues are:

  1. Add missing path filters for server job
  2. Verify version generation works correctly
  3. Add concurrency control

The optional improvements (scanning, signing, smoke tests) can be added in follow-up PRs.


📋 Suggested Follow-ups

  1. Add integration tests that pull and validate published images
  2. Implement image scanning with Trivy or similar
  3. Add SBOM generation for compliance
  4. Update docker-compose files to reference GHCR images
  5. Set up image retention policies in GHCR settings

@claude

claude bot commented Jan 28, 2026

Code Review: GitHub Actions GHCR Publishing

Summary

This PR adds automated Docker image publishing to GitHub Container Registry with smoke tests. The approach is well-structured with proper separation of concerns. Here's my detailed feedback:


Strengths

  1. Smart workflow design: Separating build/test from publish ensures images are validated before pushing
  2. Proper triggers: Only on version tags (v*) and manual dispatch prevents accidental publishes
  3. Good smoke tests: Each image has appropriate verification (server startup, container runs, connectors installed)
  4. Build caching: Proper use of cache-from/cache-to for Docker builds
  5. Clean Dockerfile fix: Correcting COPY paths in kafka-connect Dockerfile (docker/register-connector.sh) is the right approach

🔍 Issues & Concerns

1. Duplicate Server Build (Performance/Cost)

The server is built twice:

  • Line 42: gradle :server:jibDockerBuild (smoke test)
  • Line 177: gradle :server:jib (publish)

Impact: Wastes ~5-10 minutes and GitHub Actions minutes

Recommendation: Build once, save as artifact, then push in publish job:

- name: Build and save server image
  run: |
    nix develop --command gradle :server:jibDockerBuild
    docker save ghcr.io/typestreamio/server:latest > /tmp/server.tar
    
- uses: actions/upload-artifact@v4
  with:
    name: server-image
    path: /tmp/server.tar

Then in publish job:

- uses: actions/download-artifact@v4
- run: docker load < /tmp/server.tar
- run: docker tag ... && docker push ...

2. Weak Server Smoke Test (Reliability)

Line 53: grep -q "started" is fragile

Issues:

  • What if log message changes?
  • Could match partial startup before crash
  • No verification of actual functionality

Recommendation: Add health check:

# After waiting for "started"
- name: Health check
  run: |
    # Wait for gRPC server to respond (port 4242 based on server config)
    timeout 10 bash -c 'until nc -z localhost 4242; do sleep 1; done'
    echo "Server is responding on gRPC port"

Or use grpcurl to verify the server actually responds:

- name: Install grpcurl
  run: |
    curl -sSL https://github.com/fullstorydev/grpcurl/releases/download/v1.8.9/grpcurl_1.8.9_linux_x86_64.tar.gz | tar -xz
    
- name: Health check
  run: ./grpcurl -plaintext localhost:4242 list

3. Missing Output Reuse (Minor)

Line 19: outputs.image-tag is defined but never used

Fix: Either use it in the publish job or remove it:

# Remove this if not needed:
outputs:
  image-tag: ${{ steps.meta.outputs.version }}

4. Inconsistent Tagging Strategy

  • Server (line 55): Uses latest + version
  • Demo-data (line 188-191): Uses latest + version + SHA
  • Kafka-connect (line 211-214): Uses latest + version + SHA

Recommendation: Be consistent. Either all get SHA tags or none:

# In server build.gradle.kts, add SHA tag:
tags = mutableSetOf(
    project.version.toString(), 
    "latest",
    System.getenv("GITHUB_SHA")?.take(7) ?: "dev"
)

5. Demo-data Smoke Test Could Be Better (Minor)

Line 110-111: Using || true swallows all errors

Current:

docker run --rm demo-data:test --help || true
timeout 5 docker run --rm demo-data:test coinbase 2>&1 | head -20 || true

Better approach:

# Verify --help exits with 0
if ! docker run --rm demo-data:test --help; then
  echo "FAIL: --help failed"
  exit 1
fi

# Verify coinbase starts (expect it to fail on Kafka connection, not crash)
if ! timeout 5 docker run --rm demo-data:test coinbase 2>&1 | grep -E "(coinbase|Kafka)"; then
  echo "FAIL: No expected output"
  exit 1
fi

6. Security: Consider Image Scanning

No vulnerability scanning before publishing to registry

Recommendation: Add Trivy scanning:

- name: Scan server image for vulnerabilities
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'ghcr.io/typestreamio/server:latest'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'
    
- name: Upload Trivy results to GitHub Security
  uses: github/codeql-action/upload-sarif@v3
  if: always()
  with:
    sarif_file: 'trivy-results.sarif'

7. Missing Rollback Strategy

If a bad image is published, there's no easy way to revert

Recommendation: Document rollback process or add:

# In publish job, save previous latest tag
- name: Backup current latest
  run: |
    docker pull ghcr.io/typestreamio/server:latest || true
    docker tag ghcr.io/typestreamio/server:latest ghcr.io/typestreamio/server:latest-backup || true
    docker push ghcr.io/typestreamio/server:latest-backup || true

📋 Minor Improvements

  1. Add workflow status badge: Update README with:

    ![Publish Images](https://github.com/typestreamio/typestream/actions/workflows/publish-images.yml/badge.svg)
  2. Add timeout to jobs: Prevent runaway jobs:

    jobs:
      build-and-test-server:
        runs-on: ubuntu-latest
        timeout-minutes: 30  # Add this
  3. Job naming: Consider renaming build-and-test-server to test-server-image for clarity


🔒 Security Review

  • ✅ Proper use of GITHUB_TOKEN (scoped to workflow)
  • ✅ Minimal permissions (contents: read, packages: write)
  • ✅ No secrets hardcoded
  • ⚠️ Consider adding image signing (cosign) for supply chain security

🧪 Test Coverage

Good smoke test coverage, but consider:

  • Testing image size (regression check)
  • Testing startup time (performance regression)
  • Testing with actual Kafka for demo-data

📊 Priority Fixes

P0 (Must fix before merge):

  1. Fix duplicate server build (wastes resources)
  2. Strengthen server smoke test (reliability)

P1 (Should fix):

  3. Make tagging consistent across all images
  4. Improve demo-data smoke test

P2 (Nice to have):

  5. Add vulnerability scanning
  6. Add job timeouts
  7. Document rollback strategy


🎯 Recommendation

Approve with changes. The core workflow is solid, but the duplicate build and weak health check should be addressed before merging. The other issues can be tackled in follow-up PRs if time is tight.

Great work on the smoke test approach - that's a pattern more teams should follow! 🚀

@claude

claude bot commented Jan 28, 2026

PR Review: GitHub Actions GHCR Publishing Workflow

Thanks for this comprehensive PR! The workflow structure is solid with good separation of concerns (build/test → publish). Here's my detailed feedback:


Strengths

  1. Excellent smoke testing strategy - Testing images before publishing prevents broken releases
  2. Parallel job execution - All three builds run concurrently, maximizing efficiency
  3. Proper gating - Publish job only runs after all smoke tests pass
  4. Multiple tagging strategy - latest, version tag, and SHA provide flexibility
  5. Proper permissions - Minimal permissions with contents: read and packages: write
  6. Conservative triggers - Only tags and manual dispatch prevents accidental publishes

🐛 Critical Issues

1. Version extraction will fail on manual dispatch (.github/workflows/publish-images.yml:197-213)

When triggered via workflow_dispatch, GITHUB_REF_NAME won't have a v prefix to strip. This will cause tagging failures.

Fix:

# Tag with version and SHA
if [[ "$GITHUB_REF_TYPE" == "tag" ]]; then
  VERSION="${GITHUB_REF_NAME#v}"
else
  VERSION="dev-${GITHUB_SHA::7}"
fi
docker tag ghcr.io/typestreamio/server:latest ghcr.io/typestreamio/server:${VERSION}
docker tag ghcr.io/typestreamio/server:latest ghcr.io/typestreamio/server:${GITHUB_SHA::7}

# Push all tags
docker push ghcr.io/typestreamio/server:latest
if [[ "$GITHUB_REF_TYPE" == "tag" ]]; then
  docker push ghcr.io/typestreamio/server:${VERSION}
fi
docker push ghcr.io/typestreamio/server:${GITHUB_SHA::7}

Apply the same pattern to demo-data and kafka-connect publishing steps.


2. Server smoke test grep pattern may be too fragile (.github/workflows/publish-images.yml:46)

The pattern grep -q "started" is very generic and might match startup messages before the server is actually ready.

Recommendation:
Check the actual server logs to identify a more specific startup message. For example:

if docker logs server-smoke-test 2>&1 | grep -q "gRPC server started on port 4242"; then

3. Race condition in server health check (.github/workflows/publish-images.yml:45-56)

The loop checks logs and immediately breaks on finding "started", but doesn't verify the container is still running afterward. A container could print "started" and then crash.

Fix: Move the container status check inside the loop:

for i in {1..30}; do
  # First check container is still running
  if ! docker ps | grep -q server-smoke-test; then
    echo "Server container exited unexpectedly"
    docker logs server-smoke-test
    exit 1
  fi
  
  if docker logs server-smoke-test 2>&1 | grep -q "started"; then
    echo "Server started successfully"
    break
  fi
  
  if [ $i -eq 30 ]; then
    echo "Server failed to start in time"
    docker logs server-smoke-test
    exit 1
  fi
  sleep 1
done

⚠️ Medium Priority Issues

4. Demo-data smoke test is too lenient (.github/workflows/publish-images.yml:116)

The regex (coinbase|Kafka|bootstrap) will pass even on error messages. An error like "Failed to connect to Kafka bootstrap servers" would incorrectly pass.

Recommendation:

# Verify coinbase generator starts without immediate crashes
timeout 5 docker run --rm demo-data:test coinbase 2>&1 | tee /tmp/demo-output.log || true

# Check for expected startup patterns (not errors)
if grep -qE "(Starting coinbase|Connecting to|Generating)" /tmp/demo-output.log; then
  echo "Coinbase generator starts correctly"
else
  echo "Unexpected output - generator may have crashed:"
  cat /tmp/demo-output.log
  exit 1
fi

5. Missing dependency validation in kafka-connect smoke test (.github/workflows/publish-images.yml:159)

The test only checks if the directory exists, not if it contains valid connector files.

Enhancement:

docker run --rm --entrypoint /bin/bash kafka-connect:test -c "
  ls -la /kafka/connect/ && 
  test -d /kafka/connect/debezium-connector-postgres && 
  test -f /kafka/connect/debezium-connector-postgres/*.jar &&
  echo 'Debezium connector JARs verified'
"

💡 Best Practice Suggestions

6. Add image size reporting

Track image sizes to catch bloat over time:

- name: Report image sizes
  run: |
    echo "## Image Sizes" >> $GITHUB_STEP_SUMMARY
    echo "- server: $(docker images ghcr.io/typestreamio/server:latest --format '{{.Size}}')" >> $GITHUB_STEP_SUMMARY

7. Add cleanup on failure

Ensure containers are cleaned up even on failure:

- name: Smoke test - server starts and responds
  run: |
    # Your smoke test script here
  
- name: Cleanup smoke test container
  if: always()
  run: |
    docker stop server-smoke-test 2>/dev/null || true
    docker rm server-smoke-test 2>/dev/null || true

8. Consider adding SBOM/vulnerability scanning

For production images, consider adding:

- name: Scan for vulnerabilities
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ghcr.io/typestreamio/server:latest
    format: 'sarif'
    output: 'trivy-results.sarif'

📝 Minor Issues

9. Dockerfile path fix is correct but undocumented (docker/Dockerfile.kafka-connect:75-77)

The change from COPY register-connector.sh to COPY docker/register-connector.sh is necessary because GitHub Actions builds from repo root, but this differs from local builds. Consider adding a comment explaining this dual-context support.

10. Missing workflow documentation

Add a comment at the top of the workflow explaining:

  • When it triggers
  • What it publishes
  • How to manually trigger
  • How to test changes
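
A header comment along these lines would cover those points (wording is illustrative only):

# Publish Docker Images
#
# Builds and smoke-tests the server, demo-data, and kafka-connect images,
# then publishes them to ghcr.io/typestreamio/*.
#
# Triggers: version tags (v*) and manual workflow_dispatch runs.
# To test changes, run the workflow manually via workflow_dispatch.
name: Publish Docker Images
on:
  push:
    tags: ['v*']
  workflow_dispatch: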

🔒 Security Considerations

  1. GITHUB_TOKEN permissions are appropriate - Read-only for contents, write for packages
  2. No secrets are exposed - Credentials properly handled via GitHub Actions context
  3. Build context is trusted - Using official actions and base images

🧪 Test Coverage Assessment

Missing test scenarios:

  • No verification that published images are actually pullable from GHCR
  • No cross-platform build testing (only linux/amd64 implicit)
  • No rollback strategy if a broken image is published

Recommendation: Consider adding a final validation step:

- name: Verify published images are pullable
  run: |
    docker logout ${{ env.REGISTRY }}
    docker pull ghcr.io/typestreamio/server:latest
    docker pull ghcr.io/typestreamio/demo-data:latest
    docker pull ghcr.io/typestreamio/kafka-connect:latest
    echo "All images successfully pulled from registry"

Summary

Must fix before merge: the critical issues above (1-3: version extraction on manual dispatch, the fragile "started" grep, and the health-check race condition).

Should fix: the medium-priority issues (4-5: the lenient demo-data smoke test and the missing kafka-connect dependency validation).

Nice to have: the remaining suggestions (6-10: image size reporting, cleanup on failure, vulnerability scanning, the Dockerfile context comment, and workflow documentation).

Overall, this is a well-structured workflow with good separation of concerns. The main concerns are around edge cases (manual dispatch) and test reliability. Once the critical issues are addressed, this will be production-ready.

Great work on the comprehensive smoke testing approach! 🎉

@claude

claude bot commented Jan 28, 2026

PR Review: GitHub Actions Workflow for GHCR Image Publishing

Overall Assessment

This is a well-structured PR that adds automated Docker image publishing to GHCR. The workflow includes comprehensive smoke tests before publishing, which is excellent. However, there are a few issues that should be addressed.


Critical Issues

1. Race Condition in Server Smoke Test (.github/workflows/publish-images.yml:46-49)

The smoke test checks if the container is running with docker ps | grep -q server-smoke-test, but this creates a race condition. If the container crashes immediately after the grep succeeds but before the log check, the test could miss the failure.

Recommendation:

# Check container health first, then logs
if ! docker inspect server-smoke-test --format='{{.State.Running}}' | grep -q true; then
  echo "Server container exited unexpectedly"
  docker logs server-smoke-test
  exit 1
fi

2. Incomplete Demo-Data Smoke Test (.github/workflows/publish-images.yml:114-122)

The demo-data smoke test only verifies that the container starts but times out after 5 seconds. This doesn't verify that the connector actually works - it only checks that it fails in an expected way (Kafka connection error). A successful start that immediately exits would also pass this test.

Recommendation:

# Run in background and check exit code
docker run -d --name demo-test demo-data:test coinbase
sleep 2
# Should still be running (waiting for Kafka)
if ! docker inspect demo-test --format='{{.State.Running}}' | grep -q true; then
  docker logs demo-test
  exit 1
fi
docker rm -f demo-test

Code Quality Issues

3. Duplicate Version Logic (.github/workflows/publish-images.yml:193-199, 210-216, 227-233)

The version tag extraction logic is duplicated three times. This violates DRY principles and makes maintenance harder.

Recommendation:
Add a step before the publish steps to extract version once:

- name: Determine version tags
  id: version
  run: |
    if [[ "$GITHUB_REF_TYPE" == "tag" ]]; then
      VERSION="${GITHUB_REF_NAME#v}"
    else
      VERSION="dev-${GITHUB_SHA::7}"
    fi
    echo "version=${VERSION}" >> $GITHUB_OUTPUT
    echo "sha=${GITHUB_SHA::7}" >> $GITHUB_OUTPUT

Then use ${{ steps.version.outputs.version }} in subsequent steps.

4. Hardcoded Image Names

The workflow uses hardcoded image names like ghcr.io/typestreamio/server. Consider using environment variables at the top of the workflow for easier maintenance.

Recommendation:

env:
  REGISTRY: ghcr.io
  IMAGE_PREFIX: ghcr.io/typestreamio

Security Considerations

5. GITHUB_TOKEN Permissions

Good: The workflow correctly requests only contents: read and packages: write permissions. This follows the principle of least privilege.

6. Image Artifact Retention

The artifacts are retained for only 1 day, which is appropriate for build artifacts. However, consider whether this is sufficient for debugging failed releases.
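
For reference, the retention period is the retention-days input on the upload step, so extending it for debugging is a one-line change (artifact name and path are illustrative):

- uses: actions/upload-artifact@v4
  with:
    name: server-image
    path: /tmp/server.tar
    retention-days: 1      # raise (e.g. to 5) if failed releases need post-hoc debugging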


Performance Considerations

7. Parallel Job Execution

Excellent: The build-and-test jobs run in parallel, which is efficient. The publish job correctly waits for all tests via needs:.

8. Docker Layer Caching

Good: Demo-data and kafka-connect builds use GitHub Actions cache (cache-from: type=gha), which will speed up subsequent builds.

9. Image Compression

The images are compressed with gzip before upload, which is good. However, consider using zstd compression for faster decompression:

docker save image:tag | zstd -T0 > /tmp/image.tar.zst

Best Practices

10. Dockerfile Context Fix

The fix to docker/Dockerfile.kafka-connect (lines 75-77) correctly updates the COPY paths to account for the repo root build context. This was necessary and properly documented.

11. Missing Workflow Timeout Protection

While individual jobs have timeouts (20-30 min), the overall workflow could theoretically run indefinitely if jobs keep getting queued.

Recommendation:
GitHub Actions has no workflow-level timeout, so consider reducing the per-job timeout-minutes values (for example to 15 minutes) to keep the total runtime bounded.

12. No Rollback Strategy

If a bad image is published with the latest tag, there's no automated rollback mechanism. Consider:

  • Publishing with SHA tags first
  • Only updating latest after additional validation
  • Keeping the previous latest as previous-latest for quick rollback

Testing Gaps

13. Integration Between Images

The workflow tests each image independently but doesn't verify they work together (e.g., kafka-connect + server). Consider adding an integration test job that runs all three containers together.

14. Missing Test: Image Size

Consider adding a check for image size to catch bloat:

SIZE=$(docker image inspect image:tag --format='{{.Size}}' | awk '{print $1/1024/1024}')
if (( $(echo "$SIZE > 500" | bc -l) )); then
  echo "Warning: Image larger than 500MB"
fi

Documentation

15. Missing Documentation

The PR should document:

  • Where published images can be found (https://github.com/orgs/typestreamio/packages)
  • How to manually trigger a release
  • How to use the published images (update docker-compose.yml examples)
  • Rollback procedures if a bad release is published
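
For the docker-compose point above, the change is essentially swapping build: sections for image: references, e.g. (a sketch; service names and any required configuration are illustrative):

services:
  server:
    image: ghcr.io/typestreamio/server:latest
  demo-data:
    image: ghcr.io/typestreamio/demo-data:latest
  kafka-connect:
    image: ghcr.io/typestreamio/kafka-connect:latest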

Minor Issues

16. Inconsistent Quoting

Some bash variables use quotes, others don't. Be consistent for safety:

docker tag demo-data:test "ghcr.io/typestreamio/demo-data:${VERSION}"

17. Error Messages Could Be More Helpful

The smoke tests print logs on failure, but could benefit from more context:

echo "ERROR: Server failed to start within 30 seconds"
echo "Last 50 lines of logs:"
docker logs --tail 50 server-smoke-test

Summary

Strengths:

  • Well-structured workflow with proper separation of concerns
  • Comprehensive smoke tests before publishing
  • Proper GHCR authentication and permissions
  • Good use of parallel execution
  • Appropriate Docker layer caching

Must Fix:

  1. Server smoke test race condition
  2. Demo-data smoke test reliability

Should Fix:

  3. Duplicate version logic (DRY violation)
  4. Add version extraction step
  5. Add integration tests between images

Nice to Have:

  • Improved error messages
  • Image size checks
  • Better documentation
  • Rollback strategy

This is solid work overall. The smoke tests are a great addition that will catch issues before they reach production. Once the critical issues are addressed, this will be production-ready.
