@dan-jacobson
Contributor

@dan-jacobson dan-jacobson commented Mar 14, 2025

What kind of change does this PR introduce?

Feature: adds an include_vector argument to the collections.query() method. It defaults to False; when set to True, the query returns the actual vector along with the ids.

What is the current behavior?

Currently you can't get the vectors back along with the ids in the response. See the issue I made.

What is the new behavior?

You can get back the vector itself.

Additional context

Ideally I'd actually re-order some of the include_* on the query method. I think the best ordering is:

include_vector,
include_metadata,
include_score.

That way, if you want to get everything back, you still get (id, vec, metadata) in the same order as the original record, just with the score appended. However, I think that'd technically be a breaking change, because it re-orders the returned fields.

Idk, I don't feel that strongly about it.
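
To make the breaking-change concern concrete: a caller that unpacks result tuples by position would silently mis-bind fields if the order of the returned fields changed. The sketch below is only an illustration; build_row is a hypothetical helper, and the "current" ordering shown is an assumption, not the library's actual code.

```python
def build_row(record, score, order):
    """Assemble one result tuple: the id, then the optional fields named in `order`."""
    fields = {"vector": record["vec"], "metadata": record["meta"], "score": score}
    return (record["id"], *(fields[name] for name in order))


record = {"id": "a", "vec": [0.1, 0.2], "meta": {"year": 2024}}

# Assumed existing ordering: score comes before metadata.
current = build_row(record, 0.12, ["score", "metadata"])
# Proposed ordering from the comment above: metadata comes before score.
proposed = build_row(record, 0.12, ["metadata", "score"])

# A caller written against the assumed existing ordering...
_id, score, metadata = proposed
# ...now holds the metadata dict in `score` and the float in `metadata`.
```

No exception is raised here, which is why a re-ordering like this is easy to miss in callers.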

@dan-jacobson
Contributor Author

@olirice what do you think?

@wang-sanity

nice feature... getting vectors back would be great and matches with the behavior of most other vector db libs. right now I have to re-query the point ids again to get the vectors back

Collaborator

@olirice olirice left a comment

Could you give me an example of where it's important to get the actual vector back?

filters: Optional[Dict] = None,
measure: Union[IndexMeasure, str] = IndexMeasure.cosine_distance,
include_value: bool = False,
include_vector: bool = False,
Collaborator

new arguments have to go at the end of the args list but before kwargs. Otherwise, the meaning of

bar.query(
        query_vec,           # data
        top_k,               # limit
        None,                # filters
        "cosine_distance",   # measure
        True,                # include_value
        True,                # next flag; would bind to include_vector after this change
)

would change
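
A minimal sketch of the point above, using hypothetical stand-in functions rather than the library's real signatures (the parameter defaults and the position of include_metadata are assumptions). It shows how the same positional call binds differently depending on where the new argument is placed.

```python
def query_before(data, limit=10, filters=None, measure="cosine_distance",
                 include_value=False, include_metadata=False):
    """Stand-in for the signature before the new argument exists."""
    return {"include_value": include_value, "include_metadata": include_metadata}


def query_inserted_mid(data, limit=10, filters=None, measure="cosine_distance",
                       include_value=False, include_vector=False,
                       include_metadata=False):
    """New argument inserted in the middle of the positional list."""
    return {"include_value": include_value, "include_vector": include_vector,
            "include_metadata": include_metadata}


def query_appended(data, limit=10, filters=None, measure="cosine_distance",
                   include_value=False, include_metadata=False,
                   include_vector=False):
    """New argument appended after the existing positional arguments."""
    return {"include_value": include_value, "include_vector": include_vector,
            "include_metadata": include_metadata}


# The same fully positional call, as an existing user might have written it.
args = ([0.1, 0.2], 5, None, "cosine_distance", True, True)

before = query_before(*args)          # second True means include_metadata
mid = query_inserted_mid(*args)       # second True now means include_vector
appended = query_appended(*args)      # second True still means include_metadata
```

With the argument appended at the end, existing positional callers keep their behavior and only opt in to include_vector explicitly.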

use case - say we have the financial report of a company, and we vectorize the doc into N chunks. From here, one simple way to construct a vector representation of the company is to take the average of these N chunks, for which we need the query to return the raw vectors themselves.
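
The averaging step in this use case could look roughly like the sketch below. It assumes, purely for illustration, that a query with include_vector=True yields (id, vector) pairs; mean_vector is a hypothetical helper, not part of the library, and the sample data is made up.

```python
def mean_vector(vectors):
    """Return the element-wise mean of equal-length vectors."""
    vectors = list(vectors)
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    n = len(vectors)
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(component) / n for component in zip(*vectors)]


# Assumed result shape: one (id, vector) pair per document chunk.
results = [
    ("chunk_0", [1.0, 0.0]),
    ("chunk_1", [0.0, 1.0]),
    ("chunk_2", [2.0, 2.0]),
]

company_vec = mean_vector(vec for _id, vec in results)
```

Without the vectors in the response, the caller would need a second lookup by id before this averaging could run.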

@olirice olirice self-requested a review March 24, 2025 21:34
@olirice
Collaborator

olirice commented Mar 24, 2025

to get the pre-commit hooks to pass you can run

pip install pre-commit
pre-commit install
pre-commit run --all-files

then commit the changes it makes and push that back up

@dan-jacobson
Contributor Author

Oops, that's my bad. Linted and pushed.

@dan-jacobson
Contributor Author

Just bumping this -- I'm pretty sure it's ready to merge :)

@olirice
Collaborator

olirice commented Apr 8, 2025

Could you please update the corresponding docs for the query function?

@olirice olirice merged commit 240d870 into supabase:main Apr 9, 2025
4 checks passed

3 participants