feat(solr): add exact-match field and ranking results #417
base: develop
Can you explain how in the PR description? The solution discussed in the issue doesn't seem to be the one you went with here, and that's fine! But you should briefly describe the steps here.
dchiller left a comment
Can you say more about why we are adding this? What problem does this solve?
I see in the linked issue that both fuzzy matching and ranking are discussed. It looks to me like this is more related to the ranking than the fuzzy matching. Is that right?
What is the type of ranking that we are hoping to see? And how does this approach get us towards that ranking?
For example, I see in the issue that we want a search for "tar" to rank the instrument named "tar" above the ones named "guitar". I imagine your solution will achieve this result. But what if I search "ta"? Do I still want "tar" to come up before "guitar" in the results? Will this achieve that? Would something like Solr's Edge N-gram Tokenizer be closer to what we want?
I'm not sure that's exactly what we want, but it certainly seems to me that a result where "tar" ranks "tar" above "guitar" while "ta" does not isn't necessarily what we want.
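For reference, an edge n-gram setup along the lines suggested above might look roughly like the schema fragment below. This is a sketch, not what the PR implements: the `text_edge` type name, the `text_prefix` field name, and the gram sizes are all hypothetical. Grams are generated only at index time, so a query for "ta" matches the indexed prefix "ta" of the token "tar", while "guitar" only yields prefixes starting with "g" and does not match.

```xml
<!-- Hypothetical field type: indexes the leading prefixes of each token -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "tar" -> t, ta, tar; "guitar" -> g, gu, gui, guit, guita, guitar -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: "ta" is matched as a single term -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical prefix field: searched only, so indexed but not stored -->
<field name="text_prefix" type="text_edge" indexed="true" stored="false" multiValued="true"/>
```

Whether "tar" then outranks "guitar" for the query "ta" would still depend on how such a field is weighted against the other searched fields.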
solr/cores/conf/schema.xml (outdated)
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- exact match field -->
<field name="text_exact" type="text_vector" indexed="true" stored="true" multiValued="true"/>
Does this field really need to be stored? Are we ever going to use its value? Is it really a multiValued field?
I think both text (text_ngram) and text_exact should not be stored but are multiValued.
> I think both text (text_ngram) and text_exact should not be stored but are multiValued.

Agreed about not storing them. Not sure about multiValued, though: what are the multiple values?
As far as I can see, both are created by copying a number of other fields into this field and indexing the result...
I believe text fields, particularly those used as copyField targets, should generally be multiValued. As fields are copied into the text field, I believe each one is kept as a distinct value in that field rather than simply appended to one big string.
See: https://solr.apache.org/guide/solr/latest/indexing-guide/copy-fields.html
"In the example above, if the text destination field has data of its own in the input documents, the contents of the cat field will be added as additional values – just as if all of the values had originally been specified by the client. Remember to configure your fields as multivalued="true" if they will ultimately get multiple values (either from a multivalued source or from multiple copyField directives)."
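Putting the agreed points together, both catch-all fields would be indexed but not stored, and multiValued because several copyField directives feed them. A minimal sketch, assuming `text` uses the `text_ngram` type as described in this thread and `wikidata_id_s` (mentioned in the follow-up commit) is one source; `name_s` is a hypothetical placeholder for the other copied fields.

```xml
<!-- Catch-all n-gram field: searched but never returned, so not stored -->
<field name="text" type="text_ngram" indexed="true" stored="false" multiValued="true"/>

<!-- Exact-match field: also a copyField target, so multiValued -->
<field name="text_exact" type="text_vector" indexed="true" stored="false" multiValued="true"/>

<!-- Each copyField adds its source values as additional values of the destination -->
<!-- name_s is a hypothetical source field -->
<copyField source="name_s" dest="text"/>
<copyField source="name_s" dest="text_exact"/>
<copyField source="wikidata_id_s" dest="text_exact"/>
```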
- remove unused fieldType
- add text_exact to /select and weight it relative to text_ngram
- move wikidata_id_s from text_ngram to text_exact
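Per-field weighting like this is commonly done with the eDisMax `qf` parameter in the `/select` handler defaults in `solrconfig.xml`. A minimal sketch, assuming the n-gram field is named `text`; the boost of 10 is a hypothetical value, not necessarily the one used in the PR. With such a boost, a document whose `text_exact` field contains the token "tar" should generally score higher than one that matches "tar" only as an n-gram inside "guitar".

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- eDisMax allows a separate boost per queried field -->
    <str name="defType">edismax</str>
    <!-- Hypothetical boosts: exact-token matches weigh 10x n-gram matches -->
    <str name="qf">text_exact^10 text</str>
  </lst>
</requestHandler>
```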
solr/cores/conf/schema.xml (outdated)
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">

<!-- Exact match text field with term vectors -->
<fieldType name="text_vector" class="solr.TextField" positionIncrementGap="100" termVectors="true">
What's the purpose of termVectors = true here? What is our current use-case for them?
Thank you. I removed that. It was originally used for experimenting with highlighting matched parts, and I forgot to clear it.
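With `termVectors` dropped, the exact-match type reduces to an ordinary tokenized text type. The analyzer chain below is an assumption, since the diff excerpt does not show it: whole words are lowercased but not split into n-grams, so "tar" matches a document named "tar" but not one named "guitar".

```xml
<!-- Exact-match text field (termVectors removed; analyzer chain assumed) -->
<fieldType name="text_vector" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on word boundaries; no n-grams, so only whole tokens match -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```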
2091127 to 52b7b5f
Quick note about the failed E2E test: it looks like Google Translate isn't able to find the element .nav-link:has-text("À propos"), resulting in timeouts when checking its visibility. Is there an actual navigation link for "À propos" when the site language is set to French? Is it an encoding issue with the language switch, given the accent?
I've updated the E2E test in another PR. It should not be a problem once those changes are merged.