Read the docs dataset from scratch #518

Open
LawrenceBorst wants to merge 2 commits into master from read-the-docs-dataset-from-scratch

Conversation

@LawrenceBorst
Member

Add a bit of documentation on how to create a project from scratch. It loosely follows the Diataxis "how-to" approach.

I have to teach Camila how to use ChildProject, and it's probably best to make this page available to everyone. The need for a page like this became clear to me at LFRAZ. Still in draft.

This page is intended for research teams outside of LAAC. I'm guessing a typical research team will want to set up the dataset, obtain segmentations, conversational data, and gold-standard annotations, and run benchmarking to know how well the data fares.

Nice to have, but optional:

  • An example of a script for ChildProject
  • Adding human annotations
  • Show a bit of Datalad as well
  • Model benchmarking

Also added a CONTRIBUTOR file. It's quite useful, and I needed a place to record how the docs are generated.

@LawrenceBorst marked this pull request as ready for review on October 13, 2025 13:27