
fix: Make content_all search work with words containing dashes #1162

Open
eikek wants to merge 7 commits into main from eikek/fix/1160-search-with-dash

eikek (Member) commented Jan 8, 2026

Changes the configuration of the text fields content_all, name and description. They now use a simple whitespace tokenizer and leave the more complex splitting to the wordDelimiterGraphFilter (docs).
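For readers less familiar with Solr analysis chains, here is a minimal sketch of such a field type, expressed as a Solr Schema API `add-field-type` command built in Python. The type name `text_word_delimited`, the extra lowercase filter and the exact filter parameters are assumptions for illustration; the schema definition in this PR may differ.

```python
import json

# Hypothetical field-type name; the PR does not state the name it uses.
FIELD_TYPE_NAME = "text_word_delimited"


def word_delimiter_filter() -> dict:
    """Assumed WordDelimiterGraphFilter settings matching the behaviour described in the PR."""
    return {
        "class": "solr.WordDelimiterGraphFilterFactory",
        "generateWordParts": "1",  # "my-project" -> "my", "project"
        "splitOnCaseChange": "1",  # "dataServices" -> "data", "Services"
        "catenateWords": "1",      # also emit "myproject"
        "preserveOriginal": "1",   # keep "my-project" itself
        "splitOnNumerics": "0",    # keep tokens such as "a56bd3e" whole (later commit)
    }


def add_field_type_command() -> dict:
    """Build a Schema API "add-field-type" command (a sketch, not the PR's code)."""
    tokenizer = {"class": "solr.WhitespaceTokenizerFactory"}
    # Assumption: lowercase after splitting, so case changes are still visible to the delimiter filter.
    lowercase = {"class": "solr.LowerCaseFilterFactory"}
    return {
        "add-field-type": {
            "name": FIELD_TYPE_NAME,
            "class": "solr.TextField",
            "indexAnalyzer": {
                "tokenizer": tokenizer,
                # Graph filters used at index time must be followed by FlattenGraphFilter.
                "filters": [
                    word_delimiter_filter(),
                    lowercase,
                    {"class": "solr.FlattenGraphFilterFactory"},
                ],
            },
            "queryAnalyzer": {
                "tokenizer": tokenizer,
                "filters": [word_delimiter_filter(), lowercase],
            },
        }
    }


if __name__ == "__main__":
    # This JSON would be POSTed to /solr/<collection>/schema to apply it.
    print(json.dumps(add_field_type_command(), indent=2))
```

The whitespace tokenizer keeps `my-project` as a single token, which gives the delimiter filter the chance to emit both the original and its parts.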

This adds more variants of concatenated and split phrases to the index: for example, words are split on camelCase and hyphens, but the concatenated and original versions are indexed as well.
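To make the "more variants" concrete, here is a toy Python approximation of what ends up in the index for a single whitespace token. It is not Solr's implementation: real catenation treats word and number parts separately and lowercasing is left out. The function name `delimiter_variants` is hypothetical.

```python
import re


def delimiter_variants(token: str, split_on_numerics: bool = False) -> list[str]:
    """Toy approximation of WordDelimiterGraphFilter output for one token."""
    # Split on non-alphanumeric delimiters such as '-', '_' or '.'.
    chunks = [c for c in re.split(r"[^0-9A-Za-z]+", token) if c]
    parts: list[str] = []
    for chunk in chunks:
        # Split on lower-to-upper case changes: "dataServices" -> "data", "Services".
        sub = re.split(r"(?<=[a-z])(?=[A-Z])", chunk)
        if split_on_numerics:
            # Optionally split letter/digit boundaries: "a56bd3e" -> "a", "56", ...
            sub = [p for s in sub for p in re.split(r"(?<=\d)(?=\D)|(?<=\D)(?=\d)", s) if p]
        parts.extend(sub)
    variants = [token]  # like preserveOriginal
    if len(parts) > 1:
        variants.extend(parts)           # like generateWordParts
        variants.append("".join(parts))  # simplified catenation of all parts
    # Drop duplicates while keeping the order.
    return list(dict.fromkeys(variants))
```

For example, `delimiter_variants("my-project")` returns `["my-project", "my", "project", "myproject"]`, so a query for any of these forms can match.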

While testing, I noticed that the re-indexing after a migration that requires it no longer happened. I tried many approaches but couldn't pass data from the main_process_start handler to the after_server_start handler, and tasks can only be submitted in the latter. This change therefore writes a temp file to communicate across these hooks.
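A minimal sketch of the temp-file handoff, kept framework-agnostic. We assume `main_process_start` and `after_server_start` are the Sanic listener hooks; the marker path and the helper names `request_reindex` and `claim_reindex` are hypothetical and not taken from the PR.

```python
import tempfile
from pathlib import Path

# Hypothetical marker location; the PR does not state which file it writes.
MARKER = Path(tempfile.gettempdir()) / "search-reindex-required"


def request_reindex(marker: Path = MARKER) -> None:
    """Call from the main process hook after a migration that requires re-indexing."""
    marker.write_text("reindex\n")


def claim_reindex(marker: Path = MARKER) -> bool:
    """Call from the after-server-start hook; returns True at most once per request."""
    try:
        # unlink is atomic: if several workers race, only one of them succeeds.
        marker.unlink()
    except FileNotFoundError:
        return False
    return True
```

A file works where in-memory state does not because the two hooks may run in different processes. Claiming by deleting the file means only one worker submits the re-index task.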

/deploy extra-values=enableInternalGitlab=false

RenkuBot (Contributor) commented Jan 8, 2026

You can access the deployment of this PR at https://renku-ci-ds-1162.dev.renku.ch

eikek force-pushed the eikek/fix/1160-search-with-dash branch 3 times, most recently from 7f2f13b to 2cab73b, January 9, 2026 15:38
eikek marked this pull request as ready for review, January 9, 2026 15:38
eikek requested review from a team, SalimKayal and sgaist as code owners, January 9, 2026 15:38
eikek linked an issue on Jan 9, 2026 that may be closed by this pull request
coveralls

Pull Request Test Coverage Report for Build 20951512889

  • 3 of 3 (100.0%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.07%) to 86.325%

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| components/renku_data_services/connected_services/core.py | 1 | 81.08% |
| components/renku_data_services/crc/core.py | 2 | 79.59% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 20856172354 | 0.07% |
| Covered Lines | 24461 |
| Relevant Lines | 28336 |

💛 - Coveralls
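As a quick sanity check, the reported overall coverage follows from the covered and relevant line counts in the totals:

```python
covered_lines = 24461   # "Covered Lines" from the report
relevant_lines = 28336  # "Relevant Lines" from the report

# Coverage is covered / relevant, expressed as a percentage with three decimals.
coverage_pct = round(100 * covered_lines / relevant_lines, 3)
print(coverage_pct)  # 86.325
```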

eikek force-pushed the eikek/fix/1160-search-with-dash branch from df44217 to 75112e0, January 13, 2026 17:16
olevski self-requested a review, January 14, 2026 09:44
olevski previously approved these changes Jan 14, 2026
eikek force-pushed the eikek/fix/1160-search-with-dash branch 2 times, most recently from 57d7a30 to 5346e09, January 15, 2026 15:57
eikek requested a review from olevski, January 15, 2026 16:11
eikek added 7 commits January 20, 2026 10:21
WIP: the re-indexing after migration on start doesn't work anymore
Using app.ctx doesn't work anymore, nor does any other in-memory variant I tried.
It is not needed and I couldn't find it in the documentation. It is now
consistent with the other calls.
- on reprovision, first run the migration, then delete + insert, so that the
correct schema is ensured
- remove the fuzzy operator, which doesn't play well with multiple
tokens in a query
- don't split words on numbers, so that unique tokens like
`a56bd3e` used in our tests are retained
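The reprovision ordering from the commit message can be sketched as follows. The function names `reprovision` and `recorder` and the three step callables are hypothetical stand-ins for the real migration, delete and insert operations.

```python
import asyncio
from typing import Awaitable, Callable

Step = Callable[[], Awaitable[None]]


async def reprovision(migrate: Step, delete_all: Step, insert_all: Step) -> None:
    """Hypothetical sketch of the reprovision order described in the commit."""
    # 1. Migrate first, so the schema (e.g. changed field types) is current.
    await migrate()
    # 2. Delete all documents that were analysed with the old configuration.
    await delete_all()
    # 3. Re-insert everything; documents are analysed with the migrated schema.
    await insert_all()


def recorder(calls: list[str], name: str) -> Step:
    """Build a fake step that only records that it ran (for demonstration)."""
    async def step() -> None:
        calls.append(name)
    return step


if __name__ == "__main__":
    calls: list[str] = []
    asyncio.run(reprovision(recorder(calls, "migrate"), recorder(calls, "delete"), recorder(calls, "insert")))
    print(calls)  # ['migrate', 'delete', 'insert']
```

Running the migration before re-inserting matters because documents are analysed at insert time: inserting into the old schema would index them with the old tokenization.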
eikek force-pushed the eikek/fix/1160-search-with-dash branch from 5346e09 to 16de543, January 20, 2026 09:21


Development

Successfully merging this pull request may close these issues.

Projects with dashes in their name not found via generic search

4 participants