GitHub Issues #35

@ncoop57

GitHub Issues

Dataset URL - here

Does the dataset exist in a scraped format?
URL if Yes - here
Only for HF datasets repository

Description

GitHub Issues are bug reports, feature requests, and discussions related to a repository. Each issue contains text in GitHub's Markdown format, along with comments and reactions.

Procedure

We can use the procedure discussed in this blog post, which outlines how to collect issues for a single repository. We just need to apply the same procedure to multiple repositories.
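As a rough sketch of what "the same procedure, for multiple repositories" could look like, the snippet below pages through the GitHub REST API issues endpoint for each repository in a list. It is not taken from the blog post: the helper names (`fetch_json`, `iter_issues`, `collect_issues`) are made up for illustration, requests are unauthenticated (so rate limits would apply in practice), and the fetch function is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, List

API_ROOT = "https://api.github.com"


def fetch_json(url: str) -> list:
    """Fetch one page of JSON from the GitHub REST API (unauthenticated)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(
    repo: str,
    fetch: Callable[[str], list] = fetch_json,
    per_page: int = 100,
) -> Iterator[dict]:
    """Yield every issue of `repo` ("owner/name"), following pagination."""
    page = 1
    while True:
        url = (
            f"{API_ROOT}/repos/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}"
        )
        batch = fetch(url)
        if not batch:
            return  # an empty page means there is nothing left to read
        for issue in batch:
            # The issues endpoint also returns pull requests; skip them.
            if "pull_request" not in issue:
                yield issue
        page += 1


def collect_issues(
    repos: Iterable[str],
    fetch: Callable[[str], list] = fetch_json,
) -> Dict[str, List[dict]]:
    """Apply the single-repository procedure to several repositories."""
    return {repo: list(iter_issues(repo, fetch)) for repo in repos}
```

Calling `collect_issues(["owner/repo_a", "owner/repo_b"])` would return a dictionary keyed by repository name. Comments live behind a separate endpoint and would need an additional request per issue, which this sketch leaves out.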

Tests

Include a dummy_dataset.parquet file to test your code against. This dummy_dataset should include the columns for the data and metadata associated with the dataset, which will then be converted into the final format for language model consumption, along with an example row or rows that you can verify your code correctly collects. In addition to this file, include the unit test that evaluates your code against this dummy_dataset.

Give an example of the columns and data:

| issue_post | comments | authors | reactions |
| --- | --- | --- | --- |
| issue_text | [comment_1, comment_2, ...] | [issue_author, comment_1_author, comment_2_author, ...] | [[reactions], [reactions], ...] |
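To make the column layout concrete, here is a minimal sketch of how one row could be assembled and checked. The input dictionaries are hand-written and only mimic the `body`, `user.login`, and `reactions` fields of GitHub API objects. Representing each post's reactions as a sorted list of reaction names with a non-zero count is an assumption, as are the helper names `build_row` and `check_row`.

```python
from typing import Dict, List

COLUMNS = ("issue_post", "comments", "authors", "reactions")


def reaction_names(post: dict) -> List[str]:
    """Names of reactions with a non-zero count (assumed representation)."""
    counts = post.get("reactions") or {}
    return sorted(
        name
        for name, count in counts.items()
        if name != "total_count" and isinstance(count, int) and count > 0
    )


def author_of(post: dict) -> str:
    """Login of the post's author, or "" if the account is missing."""
    return (post.get("user") or {}).get("login", "")


def build_row(issue: dict, comments: List[dict]) -> Dict[str, object]:
    """Flatten one issue and its comments into the four columns above."""
    posts = [issue, *comments]  # the issue post comes first, then comments
    return {
        "issue_post": issue.get("body") or "",
        "comments": [c.get("body") or "" for c in comments],
        "authors": [author_of(p) for p in posts],
        "reactions": [reaction_names(p) for p in posts],
    }


def check_row(row: dict) -> None:
    """Raise ValueError if the row does not match the expected layout."""
    if tuple(row) != COLUMNS:
        raise ValueError(f"unexpected columns: {list(row)}")
    n_posts = 1 + len(row["comments"])
    if len(row["authors"]) != n_posts or len(row["reactions"]) != n_posts:
        raise ValueError("authors and reactions need one entry per post")


# Hand-written example data (hypothetical, not real issues).
example_issue = {
    "body": "Crash on load",
    "user": {"login": "alice"},
    "reactions": {"total_count": 3, "+1": 2, "heart": 1},
}
example_comments = [
    {"body": "Same here", "user": {"login": "bob"}, "reactions": {"+1": 0}},
]
example_row = build_row(example_issue, example_comments)
check_row(example_row)
```

Rows built this way could be written to `dummy_dataset.parquet` with, for example, `pandas.DataFrame.to_parquet` (which needs `pyarrow` or `fastparquet` installed), and the unit test could then run `check_row` over every row it reads back.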

Metadata

Labels: dataset-request (Request for addition of new dataset)

Status: Done