lpi-tn (Collaborator) commented Jan 8, 2026

This pull request adds document quantity reporting per source and corpus, and adds a new column to the corpus table. The main changes are new materialized views for reporting, their corresponding data models, and a main_url field on corpus-related data.

Database schema and migration changes:

  • Added a new nullable main_url column to the corpus table in the corpus_related schema, with an Alembic migration for upgrade and downgrade. (welearn_database/alembic/versions/4f5a188dd614_add_main_url_column.py, welearn_database/data/models/corpus_related.py)
  • Created two new materialized views: qty_document_in_qdrant_per_corpus and qty_document_per_corpus in the document_related schema, with an Alembic migration to manage their lifecycle. (welearn_database/alembic/versions/0e0bc0fca384_doc_qty_per_source.py)
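
The view lifecycle described above could be sketched as follows. This is a minimal illustration, not the migration from this PR: it assumes PostgreSQL syntax, the helper names are hypothetical, and the SELECT body is a placeholder because the real queries are not shown here. Inside an Alembic migration, each generated string would typically be passed to `op.execute` from `upgrade()` or `downgrade()`.

```python
# Sketch only: the SELECT bodies are placeholders, not the PR's real queries.
SCHEMA = "document_related"
VIEWS = ("qty_document_in_qdrant_per_corpus", "qty_document_per_corpus")


def create_view_sql(name: str, select_sql: str) -> str:
    """Return the statement an upgrade step would run for one view."""
    return f"CREATE MATERIALIZED VIEW {SCHEMA}.{name} AS {select_sql}"


def drop_view_sql(name: str) -> str:
    """Return the statement a downgrade step would run for one view."""
    return f"DROP MATERIALIZED VIEW IF EXISTS {SCHEMA}.{name}"


def upgrade_statements(selects: dict[str, str]) -> list[str]:
    """Build CREATE statements in declaration order."""
    return [create_view_sql(name, selects[name]) for name in VIEWS]


def downgrade_statements() -> list[str]:
    """Build DROP statements in reverse order, mirroring the upgrade."""
    return [drop_view_sql(name) for name in reversed(VIEWS)]


# Hypothetical query; the real table and column names may differ.
placeholder = (
    "SELECT source_name, COUNT(*) AS count FROM some_table GROUP BY source_name"
)
for statement in upgrade_statements({name: placeholder for name in VIEWS}):
    print(statement)
for statement in downgrade_statements():
    print(statement)
```

Dropping the views in reverse order keeps the downgrade a mirror image of the upgrade, which matters if one view ever depends on the other.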

Data model updates:

  • Added new read-only ORM models QtyDocumentInQdrantPerCorpus and QtyDocumentPerCorpus to represent the new materialized views for document quantity reporting, including relevant columns and primary keys. (welearn_database/data/models/document_related.py)

Project metadata:

  • Updated the project version in pyproject.toml to 1.1.1 to reflect these changes. (pyproject.toml)

lpi-tn requested a review from Copilot on January 8, 2026, 11:06

Copilot AI left a comment

Pull request overview

This PR introduces document quantity reporting features per corpus/source by adding new materialized views and expanding the corpus data model with a URL field.

  • Creates two materialized views for tracking document counts per corpus (all documents and Qdrant-indexed documents)
  • Adds a main_url field to the corpus table to store primary URLs for each corpus
  • Updates project version from 1.0.0 to 1.1.1

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| welearn_database/alembic/versions/0e0bc0fca384_doc_qty_per_source.py | Creates materialized views for document quantity reporting |
| welearn_database/alembic/versions/4f5a188dd614_add_main_url_column.py | Adds nullable main_url column to corpus table |
| welearn_database/data/models/document_related.py | Defines ORM models for the new materialized views |
| welearn_database/data/models/corpus_related.py | Adds main_url field to Corpus model |
| pyproject.toml | Bumps version to 1.1.1 |

__read_only__ = True

source_name: Mapped[str] = mapped_column(primary_key=True)
count: Mapped[int] = mapped_column(primary_key=True)

Copilot AI Jan 8, 2026

The count column should not be part of the primary key. A count value is an aggregate result that can change when the materialized view is refreshed, making it unsuitable as a primary key component. Consider using only source_name as the primary key, or add a composite key with source_name and a timestamp if uniqueness across refreshes is needed.
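
The point about unstable keys can be illustrated with a small standalone sketch. It uses SQLite from the standard library, which has no materialized views, so a plain table rebuilt by a hypothetical `refresh` function stands in for the view; the `document` table and the `example_source` value are assumptions made for the demo, not names from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document (id INTEGER PRIMARY KEY, source_name TEXT NOT NULL);
    CREATE TABLE qty_document_per_corpus (
        source_name TEXT PRIMARY KEY,   -- stable identity for each row
        "count" INTEGER NOT NULL        -- aggregate value, changes on refresh
    );
    """
)


def refresh(conn):
    """Rebuild the snapshot table, standing in for a materialized view refresh."""
    with conn:
        conn.execute("DELETE FROM qty_document_per_corpus")
        conn.execute(
            'INSERT INTO qty_document_per_corpus (source_name, "count") '
            "SELECT source_name, COUNT(*) FROM document GROUP BY source_name"
        )


def snapshot_keys(conn):
    """Return the row identities under a composite key and a single-column key."""
    rows = conn.execute(
        'SELECT source_name, "count" FROM qty_document_per_corpus'
    ).fetchall()
    composite = {(name, qty) for name, qty in rows}
    single = {name for name, _ in rows}
    return composite, single


# Two documents for one hypothetical source, then a first refresh.
conn.executemany(
    "INSERT INTO document (source_name) VALUES (?)",
    [("example_source",), ("example_source",)],
)
refresh(conn)
composite_before, single_before = snapshot_keys(conn)

# One more document arrives, then a second refresh.
conn.execute("INSERT INTO document (source_name) VALUES (?)", ("example_source",))
refresh(conn)
composite_after, single_after = snapshot_keys(conn)

print("composite key stable:", composite_before == composite_after)
print("single key stable:", single_before == single_after)
```

Under the composite key, the same source appears as a different row identity after every refresh that changes its count, whereas keying on source_name alone keeps the identity fixed.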

__read_only__ = True

source_name: Mapped[str] = mapped_column(primary_key=True)
count: Mapped[int] = mapped_column(primary_key=True)

Copilot AI Jan 8, 2026

The count column should not be part of the primary key. A count value is an aggregate result that can change when the materialized view is refreshed, making it unsuitable as a primary key component. Consider using only source_name as the primary key, or add a composite key with source_name and a timestamp if uniqueness across refreshes is needed.

lpi-tn merged commit a1b5340 into main on Jan 8, 2026
4 checks passed
lpi-tn deleted the Feature/doc-qty-per-source branch on January 8, 2026, 11:10