This repository contains several GitHub Actions workflows that automate various tasks related to issue management, data collection, and repository maintenance. Below is a detailed explanation of each workflow.
Generated dataset items, including items that consist of multiple files, are stored in the "dataset" repository. The dataset generation process works as follows:
- Individual dataset items are generated using the "Update Issue dataset data" workflow, which creates a pull request with the generated file to the "dataset" repository.
- Multiple dataset items can be generated at once using the "Generate Dataset Data" workflow, which processes issues matching a GitHub search query.
- Dataset items are automatically generated by a scheduled task in the "Generate Dataset Data" workflow for issues matching the query "-label:Epic label:Verified".
- Pull requests created in the "dataset" repository must be reviewed by users and merged into the main branch.
- The "Export Dataset" workflow collects already generated dataset items into one file and creates a pull request with the resulting dataset to the "ee-dataset" repository.
File: .github/workflows/update-issue-data.yml
Purpose: Generates a single benchmark dataset item from issue commits and creates a pull request to the "dataset" repository.
Trigger: Manual (workflow_dispatch)
Inputs:
- generator: Profile to use (default: 'java')
- organization: GitHub organization name (default: 'dpaia')
- repository: Repository name
- issue_id: Issue number
- auto_merge: Automatically merge data updates (default: false)
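Besides the Actions tab, a workflow_dispatch workflow can be started through GitHub's REST endpoint `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`, which takes a `ref` and an `inputs` object (the workflow file name is accepted as `workflow_id`). The sketch below only builds the URL and JSON body without sending anything. The repository that hosts these workflows (`workflows-repo`), the `main` ref, the target repository, the issue number, and the helper name are hypothetical placeholders, and input values are stringified as an assumption.

```python
import json

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, inputs):
    """Return (url, body) for a workflow_dispatch REST call.

    Hypothetical helper: it does not send the request. All input values are
    converted to strings, with booleans rendered as "true"/"false".
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    str_inputs = {
        key: (str(value).lower() if isinstance(value, bool) else str(value))
        for key, value in inputs.items()
    }
    body = json.dumps({"ref": ref, "inputs": str_inputs})
    return url, body


# Example: dispatch update-issue-data.yml for a hypothetical issue.
url, body = build_dispatch_request(
    owner="dpaia",
    repo="workflows-repo",            # hypothetical: repository hosting the workflows
    workflow_file="update-issue-data.yml",
    ref="main",                       # assumed default branch
    inputs={
        "generator": "java",
        "organization": "dpaia",
        "repository": "example-repo",  # hypothetical target repository
        "issue_id": 42,                # hypothetical issue number
        "auto_merge": False,
    },
)
print(url)
print(body)
```

The request itself would additionally need an `Authorization` header carrying a token that is allowed to run workflows.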
Description: This workflow generates a single benchmark dataset item from the commits of one issue. It extracts data from the specified issue, retrieves its commit information, and turns it into structured benchmark data. The resulting dataset item is submitted to the "dataset" repository as a pull request, which must be verified by users and merged into the main branch. Use this workflow to generate dataset items one at a time.
File: .github/workflows/generate-dataset-data.yml
Purpose: Generates dataset items for multiple issues matching a GitHub search query and creates pull requests to the "dataset" repository.
Trigger: Manual (workflow_dispatch) or Scheduled (daily at 2:00 AM UTC)
Inputs:
- organization: GitHub organization name (default: 'dpaia')
- topic: Repositories topic (default: 'Java')
- generator: Profile to use (default: 'java')
- search_query: GitHub issue search query (default: '-label:Epic label:Verified')
- update: Update mode (options: create-new, update-outdated, force-update)
- auto_merge: Automatically merge data updates (default: false)
Description: This workflow automates the generation of multiple dataset items by processing issues that match a specified search query. It can create new dataset items, update outdated ones, or force update all matching issues. The workflow runs automatically on a daily schedule with the default search query "-label:Epic label:Verified", generating dataset items that must be verified by users and merged into the main branch of the "dataset" repository. This workflow is ideal for batch processing multiple issues at once.
Update Mode Options:
- create-new: Processes only issues that haven't been marked as "Done" in the project. This option is useful for generating dataset items for newly added issues without modifying existing ones.
- update-outdated: Checks whether the latest commit for an issue matches the commit stored in the project. If they don't match, or if the commit field is empty, it processes the issue to update the dataset. This option is useful for updating dataset items when the underlying issue has been updated with new commits.
- force-update: Processes all issues matching the search query regardless of their current status or commit information. This option is useful when you need to regenerate all dataset items, such as after changing the data generation process.
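The three update modes amount to a per-issue filter. The following is a minimal sketch of that decision, assuming each issue exposes a project status value, the commit stored in the project, and its latest commit; the `IssueState` type and its field names are hypothetical and not taken from the workflow itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueState:
    """Hypothetical view of one issue and its project fields."""
    number: int
    status: Optional[str]          # project status value, e.g. "Done" (assumed field)
    stored_commit: Optional[str]   # commit recorded in the project (assumed field)
    latest_commit: str             # latest commit linked to the issue


def should_process(issue: IssueState, mode: str) -> bool:
    if mode == "create-new":
        # Only issues not yet marked "Done" in the project.
        return issue.status != "Done"
    if mode == "update-outdated":
        # Reprocess when no commit is stored or it differs from the latest one.
        return not issue.stored_commit or issue.stored_commit != issue.latest_commit
    if mode == "force-update":
        # Every issue matching the search query.
        return True
    raise ValueError(f"unknown update mode: {mode}")


issues = [
    IssueState(1, "Done", "abc123", "abc123"),  # done and up to date
    IssueState(2, "Done", "abc123", "def456"),  # done but has a newer commit
    IssueState(3, None, None, "789aaa"),        # never processed
]
for mode in ("create-new", "update-outdated", "force-update"):
    print(mode, [i.number for i in issues if should_process(i, mode)])
```

With this sample data, create-new selects only issue 3, update-outdated selects issues 2 and 3, and force-update selects all three.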
File: .github/workflows/export-dataset.yml
Purpose: Collects already generated dataset items into one file and, optionally, creates a pull request with the resulting dataset to the "ee-dataset" repository.
Trigger: Manual (workflow_dispatch)
Inputs:
- organization: GitHub organization name (default: 'dpaia')
- search_query: GitHub issue search query (default: '-label:Epic label:Verified')
- output_file: Export file name (default: "dataset.json")
- datasets_repository: Repository for exported dataset (default: "dpaia/ee-dataset")
- create_pull_request: Create a pull request to the result datasets repository (default: false)
Description: This workflow aggregates already generated dataset items from the "dataset" repository into a single dataset file. It searches for issues matching the provided query, retrieves the data for each issue from the "Data" field, and combines the individual issue data into one JSON file. When create_pull_request is enabled, the workflow then opens a pull request with the resulting dataset to the datasets repository ("dpaia/ee-dataset" by default). This is the final step of the dataset generation process: it collects the benchmark data produced by the "Update Issue dataset Data" and "Generate Dataset Data" workflows into a unified dataset for software engineering benchmarks, analysis, reporting, or machine learning.
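The aggregation step can be pictured as follows. This is a sketch under the assumption that each issue's "Data" field holds one JSON object as text and that the export is a JSON array; the real file layout may differ, and both function names are hypothetical.

```python
import json


def combine_dataset(items):
    """Combine per-issue "Data" values into one list.

    `items` is a list of (issue_number, data) pairs, where `data` is the raw
    text of the issue's "Data" field (assumed to hold one JSON object).
    Issues with an empty field are skipped.
    """
    dataset = []
    for issue_number, data in items:
        if not data or not data.strip():
            continue
        try:
            dataset.append(json.loads(data))
        except json.JSONDecodeError as exc:
            raise ValueError(f"issue #{issue_number}: invalid JSON in Data field") from exc
    return dataset


def write_dataset(items, output_file="dataset.json"):
    """Write the combined dataset to `output_file` as a JSON array (assumed format)."""
    with open(output_file, "w", encoding="utf-8") as fh:
        json.dump(combine_dataset(items), fh, indent=2)
```

Failing on malformed data, rather than silently skipping it, keeps a broken item from disappearing from the export unnoticed.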
File: .github/workflows/sync-labels.yml
Purpose: Synchronizes GitHub issue labels across multiple repositories.
Trigger: Manual (workflow_dispatch)
Inputs:
- profiles: Comma-separated list of label profiles (e.g., common,spring)
- topics: Repository topics to filter by
- repositories: Optional specific repositories to target
Description: This workflow helps maintain consistent issue labels across multiple repositories in the organization. It can target repositories based on topics or a specific list, and apply different label profiles to them.
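How the comma-separated inputs might combine is sketched below. The profile contents, the rule that later profiles override earlier ones, the preference for an explicit repository list over topics, and the "all repositories when no topic is given" fallback are all assumptions made for illustration, not documented behavior of sync-labels.yml.

```python
# Hypothetical label profiles: profile name -> {label name: hex color}.
PROFILES = {
    "common": {"Epic": "3e4b9e", "Verified": "0e8a16"},
    "spring": {"spring-boot": "6db33f"},
}


def parse_csv(value):
    """Split a comma-separated input, dropping blanks and surrounding spaces."""
    return [part.strip() for part in value.split(",") if part.strip()]


def merge_profiles(profiles_csv, profiles=PROFILES):
    """Merge the requested profiles in order; later profiles override earlier ones."""
    merged = {}
    for name in parse_csv(profiles_csv):
        if name not in profiles:
            raise KeyError(f"unknown label profile: {name}")
        merged.update(profiles[name])
    return merged


def select_repositories(all_repos, topics_csv="", repositories_csv=""):
    """Pick target repositories.

    `all_repos` maps repository name -> list of topics. An explicit
    repository list wins; otherwise repositories sharing any requested
    topic are chosen, or all of them when no topic is given (assumed).
    """
    explicit = parse_csv(repositories_csv)
    if explicit:
        return explicit
    wanted = set(parse_csv(topics_csv))
    if not wanted:
        return list(all_repos)
    return [name for name, topics in all_repos.items() if wanted & set(topics)]


repos = {"service-a": ["java"], "service-b": ["java", "spring"]}
print(merge_profiles("common, spring"))
print(select_repositories(repos, topics_csv="spring"))
```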
File: .github/workflows/add-issues-to-project.yml
Purpose: Automatically adds GitHub issues to a specified project board.
Trigger: Manual (workflow_dispatch) or Scheduled (daily at midnight UTC)
Inputs:
- organization: GitHub organization name (default: 'dpaia')
- project_number: Project number (default: '2')
- search_query: GitHub issue search query (default: '-label:Epic is:issue')
Description: This workflow automates the process of adding GitHub issues to a project board. It searches for issues matching the specified query and adds them to the designated project. The workflow consists of three main jobs: finding issues that match the search criteria, retrieving the project data, and adding each matching issue to the project.
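If the project board is a GitHub Projects (v2) board (an assumption; the workflow may do this differently), the "retrieve project data" and "add each issue" jobs map onto two GraphQL operations: looking up the project's node ID from the organization and project number, then calling `addProjectV2ItemById` once per issue node ID. The sketch below only builds the request bodies; the helper names are hypothetical.

```python
import json

# Resolve an organization project number to the project's node ID.
PROJECT_ID_QUERY = """
query($org: String!, $number: Int!) {
  organization(login: $org) {
    projectV2(number: $number) { id }
  }
}
"""

# Add one issue (by node ID) to the project.
ADD_ITEM_MUTATION = """
mutation($projectId: ID!, $contentId: ID!) {
  addProjectV2ItemById(input: {projectId: $projectId, contentId: $contentId}) {
    item { id }
  }
}
"""


def build_project_id_request(org, project_number):
    """Body for the project lookup; the workflow input is a string, the API wants an Int."""
    return json.dumps({
        "query": PROJECT_ID_QUERY,
        "variables": {"org": org, "number": int(project_number)},
    })


def build_add_item_requests(project_id, issue_node_ids):
    """One GraphQL request body per matching issue."""
    return [
        json.dumps({
            "query": ADD_ITEM_MUTATION,
            "variables": {"projectId": project_id, "contentId": node_id},
        })
        for node_id in issue_node_ids
    ]
```

Each body would be POSTed to `https://api.github.com/graphql` with a token that has access to the project.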
File: .github/workflows/share-custom-workflows.yml
Purpose: Shares GitHub workflow files between repositories by creating pull requests.
Trigger: Manual (workflow_dispatch)
Inputs:
- organization: GitHub organization name (default: 'dpaia')
- topic: Repository topic filter (default: empty, which means all repositories)
- workflow_path: Path to GitHub workflow file to share (relative to shared/)
Description: This workflow automates the process of sharing GitHub workflow files across multiple repositories within an organization. It finds all repositories matching the specified organization and topic criteria, then creates pull requests to add the specified workflow file to each repository. The workflow consists of three main jobs: finding repositories that match the criteria, creating pull requests for each repository, and summarizing the results.
The workflow handles various scenarios gracefully:
- If the workflow file already exists in a target repository, it skips creating a pull request
- If a pull request already exists for the workflow file, it uses the existing PR URL
- If there's an error creating a pull request, it captures the error and continues with other repositories
After completion, the workflow generates a summary report that categorizes the results into "Pull Requests Created", "Repositories with No Changes Needed", and "Failed Pull Requests", making it easy to see the outcome at a glance. This workflow is particularly useful for maintaining consistent CI/CD processes across multiple repositories in an organization.
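The summary step can be sketched as a grouping of per-repository outcomes into the three report sections. The `ShareResult` type, its status values, and the choice to list an already existing pull request under "Pull Requests Created" with its URL are assumptions; the Markdown output is the kind of text a job could append to `$GITHUB_STEP_SUMMARY`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShareResult:
    """Hypothetical per-repository outcome of the pull-request job."""
    repository: str
    status: str                  # "created", "exists", "unchanged", or "error"
    pr_url: Optional[str] = None
    error: Optional[str] = None


def summarize(results):
    """Render the three report sections as Markdown."""
    created = [r for r in results if r.status in ("created", "exists")]
    unchanged = [r for r in results if r.status == "unchanged"]
    failed = [r for r in results if r.status == "error"]
    lines = ["## Pull Requests Created"]
    lines += [f"- {r.repository}: {r.pr_url}" for r in created]
    lines.append("## Repositories with No Changes Needed")
    lines += [f"- {r.repository}" for r in unchanged]
    lines.append("## Failed Pull Requests")
    lines += [f"- {r.repository}: {r.error}" for r in failed]
    return "\n".join(lines)


results = [
    ShareResult("repo-a", "created", pr_url="https://github.com/dpaia/repo-a/pull/1"),
    ShareResult("repo-b", "unchanged"),
    ShareResult("repo-c", "error", error="permission denied"),
]
print(summarize(results))
```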
File: .github/workflows/shared-collect-process-tests.yml
Purpose: Collects and processes test information from issues to identify tests that should change from FAIL to PASS and tests that should remain PASS.
Trigger: Reusable workflow (workflow_call)
Inputs:
issue-number: Optional issue number to extract test names from (default: '')
Secrets:
github-token: Required GitHub token for API access
Outputs:
- fail_to_pass: Comma-separated list of tests that should change from FAIL to PASS
- pass_to_pass: Comma-separated list of tests that should remain PASS
- tests: Comma-separated list of all tests to run
- comment_id: ID of the comment where FAIL_TO_PASS or PASS_TO_PASS was manually described
Description: This reusable workflow collects test names from issues and processes them to identify tests that should change from FAIL to PASS and tests that should remain PASS. It consists of several jobs: collecting issue numbers based on event type, extracting test names from issues, combining test results from all issues, and checking if FAIL_TO_PASS or PASS_TO_PASS were found. This workflow is designed to be called by other workflows, such as the Shared Run Tests Maven workflow.
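The "combine test results from all issues" and "check if FAIL_TO_PASS or PASS_TO_PASS were found" steps can be illustrated as an order-preserving merge of comma-separated lists. This sketch assumes `tests` is the de-duplicated union of the two lists; the function names and the `found` flag are hypothetical.

```python
def split_tests(csv):
    """Split a comma-separated test list, ignoring blanks."""
    return [t.strip() for t in csv.split(",") if t.strip()]


def unique(items):
    """De-duplicate while keeping first-seen order."""
    return list(dict.fromkeys(items))


def combine(per_issue):
    """Merge (fail_to_pass, pass_to_pass) CSV pairs collected from several issues."""
    f2p = unique(t for fail_csv, _ in per_issue for t in split_tests(fail_csv))
    p2p = unique(t for _, pass_csv in per_issue for t in split_tests(pass_csv))
    tests = unique(f2p + p2p)
    return {
        "fail_to_pass": ",".join(f2p),
        "pass_to_pass": ",".join(p2p),
        "tests": ",".join(tests),
        "found": bool(f2p or p2p),
    }


print(combine([("FooTest#a, BarTest#b", "BazTest"), ("BarTest#b", "BazTest,QuxTest")]))
```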
File: .github/workflows/shared-run-tests-maven.yml
Purpose: Runs Maven tests for a project, focusing on tests that should change from FAIL to PASS and tests that should remain PASS.
Trigger: Reusable workflow (workflow_call)
Inputs:
- java-version: Java version to set up (default: '24')
- distribution: Java distribution to use (default: 'temurin')
- pom-file: Path to the pom.xml file (default: 'pom.xml')
- issue-number: Issue number to extract test names from (default: '')
Secrets:
github-token: Required GitHub token for API access
Description: This reusable workflow runs Maven tests for a project, focusing on tests that should change from FAIL to PASS and tests that should remain PASS. It uses the Shared Collect and Process Tests workflow to collect and process tests, creates a placeholder comment on the issue, sets up Java/Maven and runs the tests, and updates the issue comment with the final test status. This workflow is designed to be called by other workflows that need to run Maven tests.
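Running only the collected tests maps naturally onto Maven Surefire's `test` parameter, which accepts a comma-separated list and `Class#method` selectors. The sketch below assumes the workflow passes the `tests` output this way together with the `pom-file` input; the exact command the workflow runs may differ, and the helper name is hypothetical.

```python
import shlex


def maven_test_command(tests_csv, pom_file="pom.xml"):
    """Build a `mvn` invocation limited to the listed tests via Surefire's -Dtest."""
    cmd = ["mvn", "-B", "-f", pom_file, "test"]  # -B: batch (non-interactive) mode
    tests = [t.strip() for t in tests_csv.split(",") if t.strip()]
    if tests:
        cmd.append("-Dtest=" + ",".join(tests))
    return cmd


print(shlex.join(maven_test_command("FooTest#testA, BarTest")))
```

Returning an argument list rather than a single string avoids shell-quoting problems with `#` in method selectors; `shlex.join` is only used for display.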
File: shared/.github/workflows/maven.yml
Purpose: Runs Maven tests for a project, focusing on tests that should change from FAIL to PASS and tests that should remain PASS.
Trigger:
- Push to branches: main, scenario/*, eval/*, feature/*
- Pull requests to branches: main, scenario/*, eval/*, feature/*
- Issue comments (created)
Description: This shared workflow is designed to be distributed to other repositories using the "Share Custom Workflows" workflow. It runs Maven tests for a project, focusing on tests that should change from FAIL to PASS and tests that should remain PASS. The workflow collects and processes test information from issues, creates a placeholder comment on the issue, sets up Java/Maven and runs the tests, and updates the issue comment with the final test status. Unlike the "Shared Run Tests Maven" workflow, this is a standalone workflow rather than a reusable workflow, making it easier to share across repositories.
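In GitHub Actions branch filters, `*` matches any characters except `/`, while `**` also crosses `/`. The sketch below applies that rule to the branch filters listed for this workflow, assuming they use the single-level `*` form (`scenario/*`, `eval/*`, `feature/*`). It handles only `*` and `**`, not the rest of GitHub's filter syntax, and the helper names are hypothetical.

```python
import re


def filter_to_regex(pattern):
    """Translate a simplified GitHub branch filter into a compiled regex.

    Only `*` (any characters except "/") and `**` (any characters) are
    supported; `?`, `+`, `[]` and `!` negation are not.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


BRANCH_FILTERS = ["main", "scenario/*", "eval/*", "feature/*"]  # assumed filter list


def triggers_on(branch):
    return any(filter_to_regex(p).fullmatch(branch) for p in BRANCH_FILTERS)


for branch in ["main", "feature/login", "feature/a/b", "release/1.0"]:
    print(branch, triggers_on(branch))
```

Under this reading, a nested branch such as `feature/a/b` would not trigger the workflow; a `feature/**` filter would be needed for that.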