-
Notifications
You must be signed in to change notification settings - Fork 42
adds include_vector to .query() #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds include_vector to .query() #104
Conversation
|
@olirice what do you think? |
|
nice feature... getting vectors back would be great and matches with the behavior of most other vector db libs. right now I have to re-query the point ids again to get the vectors back |
olirice
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give me an example of where its important to get the actual vector back?
src/vecs/collection.py
Outdated
| filters: Optional[Dict] = None, | ||
| measure: Union[IndexMeasure, str] = IndexMeasure.cosine_distance, | ||
| include_value: bool = False, | ||
| include_vector: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new arguments have to go at the end of the args list but before kwargs. Otherwise, the meaning of
bar.query(
data=query_vec,
limit=top_k,
filters=None,
measure="cosine_distance",
True,
True
)
would change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use case - say we have the financial report of a company, and we vectorize the doc into N chunks. From here, one simple way to construct a vector representation of the company is to take the average of these N chunks, for which we need the query to return the raw vectors themselves.
|
to get the pre-commit hooks to pass you can run then commit the changes it makes and push that back up |
|
Oops, that's my bad. Linted and pushed. |
|
Just bumping this -- I'm pretty sure it's ready to merge :) |
|
Could you please update the corresponding docs for the query function? |
What kind of change does this PR introduce?
Feature: adds a
include_vectorargument to thecollections.query()method. Defaults false, but it'll return the actual vector along with theids.What is the current behavior?
Current behavior is you can't get back the vectors along with the
idsin the response. See the issue I made.What is the new behavior?
You can get back the vector itself.
Additional context
Ideally I'd actually re-order some of the
include_*on the query method. I think the best ordering is:include_vector,include_metadata,include_score.Because that way if you want to get everything back, you still get the (id, vec, metadata) in the same order as the original record, just with
scoreappended. However, I think that'd technically be a breaking change because of re-ordering returns.Idk, I don't feel that strongly about it.