feat(solr): add exact-match field and ranking results #417

PouyaMohseni · 2025-11-13T21:47:40Z

Issue #407 (feature requests: ranking, fuzzy matching) discussed the ranking and fuzzy matching of search results. Previously, the system used only n-gram matching to compare the query with language labels.
This PR adds an exact-match field which, in contrast to n-gram matching, matches only when the whole term matches.
In the final results, exact matches are weighted twice as strongly as n-gram matches.

dchiller · 2025-11-14T13:23:34Z

Can you explain how in the PR description? The solution discussed in the issue doesn't seem to be the solution you went with here...that's fine! But you should briefly discuss the steps here.

dchiller

Can you say more about why we are adding this? What problem does this solve?

I see in the linked issue that both fuzzy matching and ranking are discussed. It looks to me like this is more related to the ranking than the fuzzy matching. Is that right?

What is the type of ranking that we are hoping to see? And how does this approach get us towards that ranking?

For example, I see in the issue that we want a search for "tar" to rank the instrument named "tar" above the ones named "guitar". I imagine your solution will achieve this result. But what if I search "ta"? Do I still want "tar" to come up before "guitar" in the results? Will this achieve that? Would something like Solr's Edge N-gram Tokenizer be closer to what we want?

I'm not sure that's exactly what we want, but it certainly seems to me that a case where "tar" returns "tar" over "guitar" while "ta" doesn't isn't necessarily what we want.
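For reference, a minimal sketch of what an edge n-gram field type could look like in schema.xml. The name text_edge_ngram and the gram sizes are assumptions for illustration and are not part of this PR. It uses solr.EdgeNGramFilterFactory after a standard tokenizer, rather than the edge n-gram tokenizer itself, so that multi-word labels are still split into words first.

```xml
<!-- Hypothetical field type: indexes only the prefixes of each token -->
<fieldType name="text_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "tar" is indexed as "ta", "tar"; "guitar" as "gu", "gui", "guit", "guita", "guitar" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- The query is not n-grammed, so it must equal one of the indexed prefixes -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field of this type, a query for "ta" would match "tar" but not "guitar", because "ta" is not a prefix of "guitar".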

dchiller · 2025-11-19T21:04:45Z

solr/cores/conf/schema.xml

    <field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

+    <!-- exact match field -->
+    <field name="text_exact" type="text_vector" indexed="true" stored="true" multiValued="true"/>


Does this field really need to be stored? Are we ever going to use its value? Is it really a multiValued field?

I think both text (text_ngram) and text_exact should not be stored but are multiValued.

Agreed about not storing them. Not sure about multiValued though... what are the multiple values?

As far as I can see, both are created by copying a number of other fields into this field and indexing the result...

I believe text fields, particularly those that are used as copyField targets, should generally be multiValued. As fields are copied into the text field, I believe each one is kept as a distinct value in that field rather than being appended into one big string.

See: https://solr.apache.org/guide/solr/latest/indexing-guide/copy-fields.html

"In the example above, if the text destination field has data of its own in the input documents, the contents of the cat field will be added as additional values – just as if all of the values had originally been specified by the client. Remember to configure your fields as multivalued="true" if they will ultimately get multiple values (either from a multivalued source or from multiple copyField directives)."
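A minimal sketch of the arrangement discussed above, assuming the field and type names from the diff in this PR. wikidata_id_s is named in this PR; label_s is a hypothetical source field used only for illustration.

```xml
<!-- Search-only target: indexed for matching, not stored, one value per copied source -->
<field name="text_exact" type="text_vector" indexed="true" stored="false" multiValued="true"/>

<!-- Each copyField adds its source value as a separate value of text_exact -->
<copyField source="wikidata_id_s" dest="text_exact"/>
<!-- label_s is a hypothetical source field, not taken from this PR -->
<copyField source="label_s" dest="text_exact"/>
```

Because two copyField directives target text_exact, the field can receive more than one value per document, which is the case the quoted documentation says requires a multiValued field.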

- remove unused fieldType
- add and weight text_exact in /select compared to text_ngram
- moved wikidata_id_s to text_exact from text_ngram

PouyaMohseni · 2025-12-04T17:52:34Z

Here, exact_match gives higher weight to queries that match the labels or aliases exactly, without changing the overall matching and ranking performed by the n-gram field. For example, "ta" still ranks "guitar" higher, as before.
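One way such a weighting could be expressed in solrconfig.xml is sketched below. It assumes the /select handler uses the edismax query parser and that the two search fields are named text_exact and text_ngram, as in the commit message. The 2:1 boost follows the PR description, and the rest of the handler configuration is omitted.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Assumption: edismax, which reads the qf (query fields) parameter -->
    <str name="defType">edismax</str>
    <!-- A match in text_exact is boosted by a factor of 2 relative to text_ngram -->
    <str name="qf">text_exact^2 text_ngram</str>
  </lst>
</requestHandler>
```

Under this setup, a document that matches only text_ngram is still returned, so the n-gram results are unchanged. A document that also matches text_exact would typically score higher.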

solr/cores/conf/solrconfig.xml

dchiller · 2025-12-10T18:57:43Z

solr/cores/conf/schema.xml

-    <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
+
+    <!-- Exact match text field with term vectors -->
+    <fieldType name="text_vector" class="solr.TextField" positionIncrementGap="100" termVectors="true">


What's the purpose of termVectors = true here? What is our current use-case for them?

Thank you. I removed that. It was originally used for experimenting with highlighting matched parts, and I forgot to clear it.
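For illustration, an exact-match field type without term vectors might look like the sketch below. The name text_exact_sketch and the analyzer chain are assumptions, since the analyzer used in this PR is not shown in the quoted diff.

```xml
<!-- Hypothetical exact-match type: the whole value is a single lowercased token -->
<fieldType name="text_exact_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- KeywordTokenizer emits the entire input as one token, so only whole-value matches hit -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, "tar" matches a label whose full value is "Tar" but not "guitar", and no termVectors attribute is needed for matching or scoring.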

kyrieb-ekat · 2026-01-07T21:09:06Z

Quick note about the failed E2E test: it looks like Google Translate isn't able to find the element .nav-link:has-text("À propos"), resulting in timeouts when checking its visibility. Is there an actual navigation link for "À propos" when the site language is French? Is it an encoding issue with the language switch, given the accent?

yinanazhou · 2026-01-08T00:18:41Z

I've updated the E2E test in another PR. Should not be a problem after the changes get merged.

feat(solr): add exact-match field and ranking results

cc989fc

dchiller reviewed Nov 19, 2025

PouyaMohseni marked this pull request as draft December 2, 2025 17:34

fix: enhance Solr exact match

be3d0cc

PouyaMohseni marked this pull request as ready for review December 4, 2025 17:52

ahankinson reviewed Dec 5, 2025

solr/cores/conf/solrconfig.xml (outdated, resolved)

PouyaMohseni requested a review from dchiller December 9, 2025 22:43

dchiller reviewed Dec 10, 2025

PouyaMohseni marked this pull request as draft December 17, 2025 17:11

fix: update qf configuration and remove redundant df setting

52b7b5f

PouyaMohseni force-pushed the solr-rank branch from 2091127 to 52b7b5f Compare December 17, 2025 18:02

PouyaMohseni marked this pull request as ready for review December 17, 2025 19:36
