Add GitHub Action for automated schema validation #18

@MALathon

Summary

Create a GitHub Action workflow that periodically validates all schemas and creates issues for broken ones.

Workflow Design

# .github/workflows/validate-schemas.yml
name: Validate Schemas

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly on Sunday at 00:00 UTC
  workflow_dispatch:  # Allow manual trigger
  push:
    paths:
      - 'fetcharoo/schemas/**'  # Run on schema changes

jobs:
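  validate:
    runs-on: ubuntu-latest
    # Creating and commenting on issues requires write access for GITHUB_TOKEN
    permissions:
      contents: read
      issues: write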
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      
      - name: Install dependencies
        run: |
          pip install -e .
          pip install pytest
      
      - name: Validate schemas
        id: validate
        run: |
          # Later steps read validation-results.json directly; writing multi-line
          # JSON to $GITHUB_OUTPUT as key=value would produce a malformed output file.
          fetcharoo --validate-schemas --validation-output json > validation-results.json
        continue-on-error: true
      
      - name: Check for broken schemas
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('validation-results.json', 'utf8'));
            
            const broken = Object.entries(results)
              .filter(([name, health]) => health.status === 'broken');
            
            if (broken.length === 0) {
              console.log('All schemas healthy!');
              return;
            }
            
            // Check for existing open issues (first page only, up to 100 results)
            const issues = await github.rest.issues.listForRepo({
              owner: context.repo.owner,
              repo: context.repo.repo,
              labels: 'schema-broken',
              state: 'open',
              per_page: 100
            });
            
            for (const [name, health] of broken) {
              const title = `Schema broken: ${name}`;
              const existing = issues.data.find(i => i.title === title);
              
              if (existing) {
                // Update existing issue
                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: existing.number,
                  body: `Schema still broken as of ${new Date().toISOString()}:\n\`\`\`\n${health.error || 'No PDFs found'}\n\`\`\``
                });
              } else {
                // Create new issue
                await github.rest.issues.create({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  title: title,
                  body: `## Schema Validation Failed\n\n**Schema:** ${name}\n**Status:** ${health.status}\n**Error:** ${health.error || 'Found 0 PDFs'}\n**Expected:** ${health.expected_pdfs} PDFs\n\nThis issue was automatically created by the schema validation workflow.`,
                  labels: ['schema-broken', 'automated']
                });
              }
            }
            
            // Fail the workflow; reaching this point means at least one schema is broken
            core.setFailed(`${broken.length} schema(s) are broken`);
      
      - name: Upload validation results
        # Run even when the previous step fails the job, so results are always stored
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: schema-validation-results
          path: validation-results.json
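
The "Check for broken schemas" step assumes that validation-results.json maps each schema name to a health object. The sketch below shows that assumed shape and the filter in isolation so it can be run locally with Node. The field names (status, error, expected_pdfs) are inferred from the workflow script above, not from fetcharoo's actual output, and the schema names and helper functions (findBroken, describeError) are hypothetical.

```javascript
// Hypothetical sample of validation-results.json; the field names are
// inferred from the workflow script, not from fetcharoo's documented output.
const sampleResults = {
  'example-site-a': { status: 'healthy', error: null, expected_pdfs: 3 },
  'example-site-b': { status: 'broken', error: 'HTTP 404', expected_pdfs: 5 },
  'example-site-c': { status: 'broken', error: null, expected_pdfs: 2 },
};

// Return [name, health] pairs for every schema whose status is 'broken'.
function findBroken(results) {
  return Object.entries(results).filter(([, health]) => health.status === 'broken');
}

// Build the short error text used in issue bodies, with the same fallback
// the workflow uses when no error message is present.
function describeError(health) {
  return health.error || 'Found 0 PDFs';
}

for (const [name, health] of findBroken(sampleResults)) {
  console.log(`${name}: ${describeError(health)}`);
}
```

If the real output uses a different structure (for example a list of results instead of a name-keyed object), the filter in the workflow would need to change accordingly.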

Features

  1. Scheduled runs: Weekly validation
  2. Manual trigger: Can run on-demand via workflow_dispatch
  3. Push validation: Runs on pushes that modify files under fetcharoo/schemas/
  4. Auto-issue creation: Creates GitHub issues for broken schemas
  5. Issue updates: Comments on existing issues if still broken
  6. Artifact storage: Saves validation results
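
The create-versus-comment decision behind features 4 and 5 can be sketched as a pure function, which makes the deduplication rule easy to test without calling the GitHub API. This is only an illustration: planIssueActions and the sample data are hypothetical, and the exact-title match mirrors the `Schema broken: ${name}` convention used in the workflow above.

```javascript
// Decide, for each broken schema, whether to open a new issue or comment on
// an existing one. Pure function: no GitHub API calls are made here.
function planIssueActions(brokenNames, openIssues) {
  // Index open issues by title; the first issue with a given title wins,
  // matching the behavior of Array.prototype.find in the workflow script.
  const byTitle = new Map();
  for (const issue of openIssues) {
    if (!byTitle.has(issue.title)) byTitle.set(issue.title, issue.number);
  }
  return brokenNames.map((name) => {
    const title = `Schema broken: ${name}`;
    return byTitle.has(title)
      ? { action: 'comment', issue_number: byTitle.get(title), title }
      : { action: 'create', title };
  });
}

// Hypothetical data for illustration.
const openIssues = [{ number: 42, title: 'Schema broken: example-site-b' }];
const plan = planIssueActions(['example-site-b', 'example-site-c'], openIssues);
console.log(plan);
```

Because matching is by exact title, renaming a schema or editing an issue title by hand would cause a new issue to be opened on the next run.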

Labels

Create these labels in the repository:

  • schema-broken: For auto-created schema issues
  • automated: For issues created by workflows
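
As an alternative to creating the labels by hand, a one-off script step could create them if they are missing. The following is a rough sketch, assuming an Octokit-style client such as the `github` object that actions/github-script provides. The ensureLabels helper, the colors, and the descriptions are illustrative assumptions, not part of the proposed workflow.

```javascript
// Label definitions; the colors and descriptions are illustrative assumptions.
const REQUIRED_LABELS = [
  { name: 'schema-broken', color: 'd73a4a', description: 'Auto-created schema issues' },
  { name: 'automated', color: 'ededed', description: 'Issues created by workflows' },
];

// Create each required label unless it already exists.
// `github` is an Octokit-style client, as provided by actions/github-script.
// Returns the names of the labels that were created.
async function ensureLabels(github, owner, repo) {
  const created = [];
  for (const label of REQUIRED_LABELS) {
    try {
      await github.rest.issues.getLabel({ owner, repo, name: label.name });
    } catch (err) {
      // Only a 404 means the label is missing; rethrow anything else.
      if (err.status !== 404) throw err;
      await github.rest.issues.createLabel({ owner, repo, ...label });
      created.push(label.name);
    }
  }
  return created;
}
```

Inside a github-script step this could be called as `await ensureLabels(github, context.repo.owner, context.repo.repo)`. The job would need `issues: write` permission for the label creation to succeed.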

Tasks

  • Create .github/workflows/validate-schemas.yml
  • Create schema-broken and automated labels
  • Test workflow with manual dispatch
  • Document workflow in README or CONTRIBUTING.md
  • Consider adding Slack/Discord notifications

Acceptance Criteria

  • Workflow runs weekly on schedule
  • Can be triggered manually
  • Creates issues for broken schemas
  • Updates existing issues instead of duplicating
  • Fails workflow when schemas are broken
  • Stores results as artifacts

Dependencies

Part of

Parent issue: #10
