I've been building AI-related skills lately: playing around with AI platforms and LLMs, and learning Langchain. I want to build something usable, so RAG is a good start.
This post is a good walkthrough of how to do it.
This notebook is awesome, with visualisations for different types of chunkers as well.
An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness, and large distances suggest low relatedness.
In this demo, we use the Azure OpenAI embedding model text-embedding-ada-002.
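To make the distance idea concrete, here is a minimal Python sketch that embeds two strings and computes their cosine distance. The endpoint, API key, and deployment name are placeholders, not values from this demo; swap in your own.

```python
import numpy as np
from openai import AzureOpenAI

# Placeholder endpoint/key/version -- replace with your own Azure resource.
client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="...",
    api_version="2024-02-01",
)

def embed(text: str) -> np.ndarray:
    # "text-embedding-ada-002" is assumed to also be the deployment name
    # in your Azure resource.
    resp = client.embeddings.create(input=text, model="text-embedding-ada-002")
    return np.array(resp.data[0].embedding)

a, b = embed("a cat sat on the mat"), embed("a kitten rested on the rug")
# Cosine distance = 1 - cosine similarity; smaller means more related.
print(1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```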
For the tsvector column in the document_chunk table that parses document content, see the Postgres docs here.
On the same page, see also:
- to_tsquery, for parsing queries;
- ts_rank, for ranking search results;
- ts_headline, for highlighting results.
An example can be found here.
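As a rough sketch of how those pieces fit together (the `document_chunk(id, content)` schema and the psycopg driver are my assumptions, not taken from the linked example):

```python
import psycopg

# A minimal sketch, assuming a document_chunk(id, content) table in a
# local "rag" database.
with psycopg.connect("dbname=rag") as conn:
    # Generated tsvector column plus a GIN index for full-text search.
    conn.execute("""
        ALTER TABLE document_chunk
        ADD COLUMN content_tsv tsvector
        GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
    """)
    conn.execute("CREATE INDEX ON document_chunk USING GIN (content_tsv);")

    # to_tsquery parses the query; ts_rank orders results;
    # ts_headline highlights the matching fragments.
    rows = conn.execute("""
        SELECT id,
               ts_rank(content_tsv, query) AS rank,
               ts_headline('english', content, query) AS snippet
        FROM document_chunk, to_tsquery('english', 'vector & search') AS query
        WHERE content_tsv @@ query
        ORDER BY rank DESC
        LIMIT 5;
    """).fetchall()
    for id_, rank, snippet in rows:
        print(id_, round(rank, 4), snippet)
```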
To calculate the distance or similarity between two vectors, pgvector supports the operators below (a query sketch follows the list):
- [vector] <-> [vector]: L2 distance, or Euclidean distance
- [vector] <+> [vector]: L1 distance, also known as Manhattan or taxicab distance
- [vector] <=> [vector]: cosine distance, which equals (1 - cosine similarity), where cosine similarity is the cosine of the angle between the two vectors
- [vector] <#> [vector]: negative inner product; it returns the negative of the usual inner product, since Postgres only supports ASC-order index scans on operators
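Putting it together, a nearest-neighbour query with the cosine operator might look like the sketch below; the table and column names are my assumptions, and `embed()` is the helper from the embedding sketch above.

```python
import psycopg
from pgvector.psycopg import register_vector

# Assumes document_chunk has an "embedding vector(1536)" column filled
# with text-embedding-ada-002 embeddings; embed() was defined earlier.
with psycopg.connect("dbname=rag") as conn:
    register_vector(conn)  # lets psycopg pass numpy arrays as pgvector values
    q = embed("how do I pick a distance operator?")
    rows = conn.execute(
        """
        SELECT id, content, embedding <=> %(q)s AS cosine_distance
        FROM document_chunk
        ORDER BY embedding <=> %(q)s  -- same expression, so an index can be used
        LIMIT 5;
        """,
        {"q": q},
    ).fetchall()
    for id_, content, dist in rows:
        print(id_, round(dist, 4), content[:60])
```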
For understanding vector distance and similarity, this blog is pretty neat.
Some open questions I still have:
- How to manage outdated documents?
- How to evaluate search results?
- How to tune the pipeline, for example the choice of distance operator for the embeddings? What else?