@Camier (Member) commented Nov 14, 2025

Summary

  • Reframe the Oct 27 practice report to note that the session was only a workflow rehearsal and did not validate retrieval quality
  • Link to the Nov 5 validation README and summarize the flaws that invalidated the 10-query dataset (undersized corpus, page-based chunking, circular ground truth, NaN-prone metrics)
  • Direct readers to the updated evaluation roadmap and explicitly document what, if anything, from the session remains useful

Testing

  • Not run (documentation-only change)

Codex Task

Copilot AI review requested due to automatic review settings November 14, 2025 19:20
Copilot AI left a comment

Pull Request Overview

This PR updates the Phase 2b Day 1 practice report to clarify that the October 27 RAGAs validation was only a workflow rehearsal, not a production-quality evaluation. The update acknowledges that the dataset was later invalidated by fundamental methodological flaws identified on November 5, 2025.

  • Reframes the practice session as a workflow/infrastructure rehearsal rather than a validation of retrieval quality
  • Documents the specific flaws that invalidated the dataset (undersized corpus, page-based chunking, circular ground truth, NaN-prone metrics)
  • Clarifies what remains valuable from the session (workflow validation, cost/latency measurements, team readiness) while discarding the metrics
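
To illustrate why "NaN-prone metrics" can invalidate a small evaluation set, here is a minimal sketch. It is not taken from the project's evaluation code: the function names and the score values are hypothetical, and it assumes per-query metric scores arrive as plain floats, with NaN marking a query the metric could not score.

```python
import math

def naive_mean(scores):
    """Plain average; a single NaN makes the whole result NaN."""
    return sum(scores) / len(scores)

def nan_aware_mean(scores):
    """Average only the valid scores and report how many were NaN.

    Returns (mean, nan_count). The mean is NaN when no score is valid.
    """
    valid = [s for s in scores if not math.isnan(s)]
    nan_count = len(scores) - len(valid)
    mean = sum(valid) / len(valid) if valid else math.nan
    return mean, nan_count

# Hypothetical per-query scores (illustrative only, not project data).
scores = [0.5, 1.0, math.nan, 0.75]

print(naive_mean(scores))      # NaN: one failed query hides the rest
print(nan_aware_mean(scores))  # mean over valid scores, plus the NaN count
```

Reporting the NaN count next to the mean matters on a small dataset: with only 10 queries, each unscored query removes a tenth of the evidence, so a mean computed over the survivors can look more stable than it is.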

2 participants