@MagnusSandgren (Collaborator) commented Dec 12, 2025

@coderabbitai (bot, Contributor) commented Dec 12, 2025

📝 Walkthrough

Adds an EF Core migration that creates a tsvector aggregate and applies a new version (V3) of the search."VDialogDocument" SQL view on Up and reverts to V2 on Down; the V3 view builds weighted, language-aware tsvector documents with truncation and token cleanup.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Migration: src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs | Adds public partial class CapSearchVectorSize : Migration. Up(MigrationBuilder) uses MigrationSqlLoader to load and execute the SQL scripts that create the tsvector aggregate and apply Dialog/Search/View.VDialogDocument.V3.sql. Down(MigrationBuilder) loads and executes Dialog/Search/View.VDialogDocument.V2.sql and drops the public.tsvector_agg(tsvector) aggregate. |
| SQL View Definition (V3): src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql | Defines the search."VDialogDocument" view, which aggregates weighted tsvectors from related VDialogContent rows using language-specific tsconfigs with a fallback to 'simple', concatenates the weighted vectors, truncates the result to 1,048,575 characters, and cleans trailing partial tokens with REGEXP_REPLACE; projects DialogId and Party. |
| SQL Aggregate: src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/Aggregate.TsVector_Agg.sql | Adds CREATE OR REPLACE AGGREGATE public.tsvector_agg(tsvector) using SFUNC = pg_catalog.tsvector_concat, STYPE = tsvector, and INITCOND = ''. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify migration loads the correct SQL file paths and that the SQL files exist in the expected locations.
  • Confirm the aggregate creation is idempotent and compatible with existing database objects.
  • Review language-selection logic and truncation/REGEXP_REPLACE for edge cases and performance.
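
One way to check the second point by hand is to look the aggregate up in the system catalogs after running the migration twice. The query below is only a sketch: it assumes PostgreSQL 11 or later (for `prokind`) and that the aggregate lives in the `public` schema, as stated in the walkthrough.

```sql
-- Returns one row if public.tsvector_agg(tsvector) exists as an aggregate.
SELECT n.nspname                                  AS schema_name,
       p.proname                                  AS aggregate_name,
       pg_get_function_identity_arguments(p.oid)  AS argument_types
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_namespace n
    ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
  AND p.proname = 'tsvector_agg'
  AND p.prokind = 'a';  -- 'a' marks aggregate functions
```

Seeing exactly one row after repeated runs would suggest that re-running the script replaces the aggregate rather than failing or duplicating it.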

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The pull request description is largely incomplete, missing most required template sections including detailed description, verification checklist, and documentation updates. | Expand description to include detailed changes, mark verification checklist items, and clarify documentation status. Link to issue #3100 with more context about the fix. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: capping overly large dialog content to avoid indexing errors in search functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql (1)

12-12: Document the rationale for the 100,000 character limit.

The truncation to 100,000 characters per content piece addresses the PR objective to cap overly large content. However, since multiple content pieces are aggregated, the total tsvector size could still grow large.

Consider documenting:

  • Why 100,000 characters was chosen
  • Whether aggregate size should also be monitored
  • Any PostgreSQL tsvector size limits that informed this decision
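
For context on the last bullet: PostgreSQL's documented text search limitations state that the length of a tsvector (lexemes plus positions) must be less than 1 megabyte. A rough way to see which dialogs are at risk is to total the raw content size per dialog. The query below is a sketch that assumes the search."VDialogContent" view exposes "DialogId" and "Value" as used in the view definition. Raw text size is only a proxy for the resulting tsvector size.

```sql
-- Largest dialogs by raw content size in bytes.
-- Bytes are used because the tsvector limit is byte-based, not character-based.
SELECT c."DialogId",
       COUNT(*)                      AS content_pieces,
       SUM(OCTET_LENGTH(c."Value"))  AS total_bytes,
       MAX(OCTET_LENGTH(c."Value"))  AS largest_piece_bytes
FROM search."VDialogContent" c
GROUP BY c."DialogId"
ORDER BY total_bytes DESC NULLS LAST
LIMIT 20;
```

Dialogs near the top of this list are the ones where a per-piece cap alone may not keep the aggregated document under the limit.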
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3985162 and c9f69b0.

⛔ Files ignored due to path filters (1)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.Designer.cs is excluded by !**/Migrations/**/*Designer.cs
📒 Files selected for processing (2)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs (1 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.cs

📄 CodeRabbit inference engine (AGENTS.md)

**/*.cs: Use file-scoped namespaces with using directives outside the namespace
Use PascalCase for classes and methods in C#
Use camelCase for variables and parameters in C#
Prefer expression bodies for single-line members in C#
Use var when the type is apparent in C#
Enable nullable reference types and keep entities immutable in C#
Use OneOf for union returns when applicable in C#
All code must compile with TreatWarningsAsErrors=true and pass .NET analyzers

Files:

  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs
**/*.{cs,json,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Use 4-space indentation with LF line endings

Files:

  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs
🧠 Learnings (3)
📓 Common learnings
Learnt from: elsand
Repo: Altinn/dialogporten PR: 2849
File: src/Digdir.Domain.Dialogporten.Janitor/README.md:35-88
Timestamp: 2025-10-13T18:40:25.376Z
Learning: In the Dialogporten project, when reviewing documentation changes, prefer to maintain consistency with existing formatting conventions in the file, even if unconventional, rather than suggesting formatting changes that are out-of-scope for the PR's main objectives.
📚 Learning: 2025-10-13T08:14:27.518Z
Learnt from: MagnusSandgren
Repo: Altinn/dialogporten PR: 2841
File: src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Repositories/DialogSearchRepository.cs:82-83
Timestamp: 2025-10-13T08:14:27.518Z
Learning: In PostgreSQL SQL queries within the Dialogporten codebase, prefer explicit type casts (e.g., `''::tsvector`) over relying on implicit conversions for clarity and to avoid ambiguity.

Applied to files:

  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql
📚 Learning: 2025-10-09T08:50:14.740Z
Learnt from: oskogstad
Repo: Altinn/dialogporten PR: 2841
File: src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251009083408_AddDialogSearch.cs:9-10
Timestamp: 2025-10-09T08:50:14.740Z
Learning: Files in `src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/*/**` are auto-generated Entity Framework Core migrations and should not be reviewed for formatting or coding style issues.

Applied to files:

  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs
🧬 Code graph analysis (1)
src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs (1)
src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/MigrationSqlLoader.cs (1)
  • MigrationSqlLoader (5-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Dry run deploy apps / Deploy web-api-so to test
  • GitHub Check: Dry run deploy apps / Deploy service to test
  • GitHub Check: Dry run deploy apps / Deploy web-api-eu to test
  • GitHub Check: Dry run deploy apps / Deploy graphql to test
  • GitHub Check: build / build-and-test
🔇 Additional comments (2)
src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql (1)

10-10: Good use of explicit regconfig cast.

The explicit cast to regconfig follows PostgreSQL best practices and matches the coding patterns established in this codebase.

Based on learnings, explicit type casts are preferred in this codebase for clarity.

src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs (1)

14-22: Migration structure looks correct.

The Up migration properly loads and executes the V3 SQL script using MigrationSqlLoader, which aligns with the established pattern in this codebase.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70d3935 and c8dd1fa.

📒 Files selected for processing (3)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs (1 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/Aggregate.TsVector_Agg.sql (1 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Migrations/20251212140020_CapSearchVectorSize.cs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: MagnusSandgren
Repo: Altinn/dialogporten PR: 2841
File: src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Repositories/DialogSearchRepository.cs:82-83
Timestamp: 2025-10-13T08:14:27.518Z
Learning: In PostgreSQL SQL queries within the Dialogporten codebase, prefer explicit type casts (e.g., `''::tsvector`) over relying on implicit conversions for clarity and to avoid ambiguity.
📚 Learning: 2025-10-13T08:14:27.518Z
Learnt from: MagnusSandgren
Repo: Altinn/dialogporten PR: 2841
File: src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Repositories/DialogSearchRepository.cs:82-83
Timestamp: 2025-10-13T08:14:27.518Z
Learning: In PostgreSQL SQL queries within the Dialogporten codebase, prefer explicit type casts (e.g., `''::tsvector`) over relying on implicit conversions for clarity and to avoid ambiguity.

Applied to files:

  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql
  • src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/Aggregate.TsVector_Agg.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Dry run deploy apps / Deploy web-api-eu to test
  • GitHub Check: Dry run deploy apps / Deploy service to test
  • GitHub Check: Dry run deploy apps / Deploy graphql to test
  • GitHub Check: Dry run deploy apps / Deploy web-api-so to test
  • GitHub Check: build / build-and-test
🔇 Additional comments (3)
src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/Aggregate.TsVector_Agg.sql (1)

1-5: Excellent implementation of tsvector aggregation.

This aggregate correctly implements the pattern recommended in the previous review using pg_catalog.tsvector_concat to preserve position information and weights when aggregating tsvectors across rows.


Note: This implementation directly addresses the previous review concern about using string_agg with text casting.
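
As a rough illustration of the pattern described in this comment, the aggregate and a self-contained usage could look like the following. It is reconstructed from the walkthrough summary, not copied from the file in the PR. The sample rows in the `VALUES` list are made up, and `CREATE OR REPLACE AGGREGATE` assumes PostgreSQL 12 or later.

```sql
-- Folds tsvectors with the same function that backs the || operator,
-- so positions are shifted and weights are kept when rows are combined.
CREATE OR REPLACE AGGREGATE public.tsvector_agg(tsvector) (
    SFUNC    = pg_catalog.tsvector_concat,
    STYPE    = tsvector,
    INITCOND = ''
);

-- Hypothetical sample rows: each row is weighted before aggregation.
SELECT public.tsvector_agg(
           SETWEIGHT(TO_TSVECTOR('simple'::regconfig, t.val), t.weight::"char")
       ) AS doc
FROM (VALUES ('annual report', 'A'),
             ('report attachment', 'B')) AS t(val, weight);
```

By contrast, concatenating the text form of each vector with string_agg and casting back would not shift positions between rows, which is the concern the earlier review raised.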

src/Digdir.Domain.Dialogporten.Infrastructure/Persistence/Sql/Dialog/Search/View.VDialogDocument.V3.sql (2)

6-10: Language-aware search configuration is well implemented.

The use of language-specific regconfig with a fallback to 'simple' ensures proper tokenization and stemming for multilingual content while gracefully handling unmapped languages.


Note: The explicit type casts (::"char" and ::regconfig) align with the codebase preference for clarity.
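
The fallback behavior can be tried in isolation with inline data. In the sketch below the two CTEs are hypothetical stand-ins for search."Iso639TsVectorMap" and search."VDialogContent"; only the COALESCE and cast expression mirrors the view. The 'english', 'norwegian', and 'simple' configurations ship with a default PostgreSQL installation.

```sql
-- Hypothetical language map and content rows, for demonstration only.
WITH isomap("IsoCode", "TsConfigName") AS (
    VALUES ('en', 'english'),
           ('nb', 'norwegian')
),
content("LanguageCode", "Value") AS (
    VALUES ('en', 'The reports were submitted'),
           ('xx', 'The reports were submitted')  -- no mapping, so 'simple' is used
)
SELECT c."LanguageCode",
       COALESCE(m."TsConfigName", 'simple') AS config_used,
       TO_TSVECTOR(COALESCE(m."TsConfigName", 'simple')::regconfig, c."Value") AS doc
FROM content c
LEFT JOIN isomap m
    ON c."LanguageCode" = m."IsoCode";
```

The mapped row is stemmed and has stop words removed by the language configuration, while the unmapped row keeps every lowercased word because 'simple' applies neither.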


12-15: Aggregation scope is correctly constrained.

The WHERE c."DialogId" = d."Id" clause properly ensures each dialog's document only includes its own content, and the LEFT JOIN appropriately handles content without mapped language codes.

Comment on lines +1 to +18
-- Produces weighted tsvectors per dialog so upserts remain a simple INSERT ... SELECT.
CREATE OR REPLACE VIEW search."VDialogDocument" AS
SELECT d."Id" AS "DialogId"
     , (
         SELECT public.tsvector_agg(
                    SETWEIGHT(
                        -- Fall back to simple when the language map lacks a match.
                        TO_TSVECTOR(COALESCE(isomap."TsConfigName", 'simple')::regconfig, c."Value"),
                        c."Weight"::"char"
                    )
                )
         FROM search."VDialogContent" c
         LEFT JOIN search."Iso639TsVectorMap" isomap
             ON c."LanguageCode" = isomap."IsoCode"
         WHERE c."DialogId" = d."Id"
       ) AS "Document"
     , d."Party" AS "Party"
FROM "Dialog" d;

⚠️ Potential issue | 🔴 Critical

Missing size capping logic contradicts PR objectives.

The view correctly uses the new tsvector_agg aggregate (addressing the previous review concern), but the PR is titled "Cap overly large dialog content to avoid errors when indexing dialogs" and references issue #3100. However, this implementation contains no size limiting, truncation, or cleanup logic.

The AI summary mentions "truncated to 1,048,575 characters" and "REGEXP_REPLACE cleanup," but neither appears in this code. Without capping, this PR may not resolve the indexing errors it's intended to fix.

Verify whether:

  1. Size capping logic was accidentally omitted from this version
  2. The truncation is applied elsewhere in the pipeline
  3. The summary description is outdated

If truncation is needed, consider wrapping the aggregated document with a length check and substring operation, for example:

, (
    SELECT 
        CASE 
            WHEN LENGTH(doc_text) > 1048575 
            THEN REGEXP_REPLACE(
                SUBSTRING(doc_text, 1, 1048575),
                '\s+\S*$',  -- Remove trailing partial token
                ''
            )::tsvector
            ELSE doc_text::tsvector
        END
    FROM (
        SELECT public.tsvector_agg(
            SETWEIGHT(
                TO_TSVECTOR(COALESCE(isomap."TsConfigName", 'simple')::regconfig, c."Value"),
                c."Weight"::"char"
            )
        )::text AS doc_text
        FROM search."VDialogContent" c
        LEFT JOIN search."Iso639TsVectorMap" isomap
            ON c."LanguageCode" = isomap."IsoCode"
        WHERE c."DialogId" = d."Id"
    ) agg
) AS "Document"
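
An alternative worth weighing is to cap each content piece before tokenization, since TO_TSVECTOR can itself raise the size error on very large input, before any post-aggregation truncation runs. The sketch below is not taken from the PR. The 100000 figure is the per-piece limit mentioned in the earlier review comment, and the query assumes the aggregate and search objects shown above exist.

```sql
-- Sketch: truncate each piece to a fixed number of characters, then weight and aggregate.
SELECT c."DialogId",
       public.tsvector_agg(
           SETWEIGHT(
               TO_TSVECTOR(
                   COALESCE(isomap."TsConfigName", 'simple')::regconfig,
                   LEFT(c."Value", 100000)  -- assumed cap, counted in characters
               ),
               c."Weight"::"char"
           )
       ) AS "Document"
FROM search."VDialogContent" c
LEFT JOIN search."Iso639TsVectorMap" isomap
    ON c."LanguageCode" = isomap."IsoCode"
GROUP BY c."DialogId";
```

A per-piece cap does not bound the aggregated total, so a dialog with many large pieces could still exceed the tsvector size limit.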

Run this script to check if there are references to truncation or size limits in related migration or configuration files:

#!/bin/bash
# Search for truncation, size limits, or the specific 1048575 limit mentioned in summary
rg -i "1048575|truncat|substring.*tsvector|regexp_replace.*tsvector" --type sql --type cs -C 3

@sonarqubecloud (bot) commented Jan 8, 2026
