A simple example of implementing semantic search using Wikipedia as a test corpus.
It uses a PostgreSQL database with the pgvector extension and creates embeddings with the nomic-embed-text model using Ollama.
See the corresponding blog post (in Norwegian).
Run npm install
Run npm run dotenv-init
Alternatively, copy the contents of the .env.example file into a new .env file.
Make sure you have Docker and docker-compose installed.
According to Ollama's documentation, the Ollama Docker container on macOS doesn't support GPU acceleration, so we will install Ollama locally instead.
Install Ollama: brew install ollama
And start the Ollama server: ollama serve
Then you should download the embedding model we will use: ollama pull nomic-embed-text
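To give a sense of what the embedding step involves, here is a minimal sketch of requesting an embedding from a locally running Ollama server. It assumes Ollama's default address http://localhost:11434 and its /api/embeddings endpoint. The names buildEmbeddingRequest and embed are illustrative and are not taken from this repository's code.

```typescript
// Hypothetical helpers for illustration; not the repository's actual code.

// Assumption: Ollama is listening on its default local address.
const OLLAMA_URL = "http://localhost:11434";

interface EmbeddingRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request for Ollama's /api/embeddings endpoint.
function buildEmbeddingRequest(text: string, baseUrl: string = OLLAMA_URL): EmbeddingRequest {
  return {
    url: `${baseUrl}/api/embeddings`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
    },
  };
}

// Send the request and return the embedding vector.
// Requires a runtime with a global fetch (for example Node 18 or later).
async function embed(text: string): Promise<number[]> {
  const { url, init } = buildEmbeddingRequest(text);
  const res = await (globalThis as any).fetch(url, init);
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { embedding?: number[] };
  if (!Array.isArray(data.embedding)) {
    throw new Error("Response did not contain an embedding");
  }
  return data.embedding;
}
```

With ollama serve running and the model pulled, calling embed("Rock artist") should resolve to an array of numbers that can be stored in a pgvector column.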
Start the database: docker-compose up -d
Before using the database for the first time, run the migrations to create its tables:
npm run migrate:up
If something goes wrong, you can roll back the latest migration with: npm run migrate:down.
If something goes even more wrong, you can always stop the database first with docker-compose down and remove it completely with rm -rf db/.pgdata, and then start over.
Create embeddings for 100 random Wikipedia articles and store them in the local Postgres database:
npm run index 100
Check if there are any rock artists among our indexed data:
npm run search "Rock artist"
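Each search hit is reported with a cosine_distance, where a smaller value means the article's embedding points in a direction closer to that of the query. In pgvector this distance is available through the <=> operator. The following is a minimal sketch of the same calculation in TypeScript, for illustration only. The function name cosineDistance is an assumption, and the repository presumably computes the distance in the database rather than in application code.

```typescript
// Illustrative only: cosine distance = 1 - cosine similarity.
// Ranges from 0 (same direction) to 2 (opposite direction).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for zero vectors");
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

console.log(cosineDistance([1, 0], [1, 0]));  // 0: identical direction
console.log(cosineDistance([1, 0], [0, 1]));  // 1: orthogonal
console.log(cosineDistance([1, 0], [-1, 0])); // 2: opposite direction
```

Because the measure depends only on direction, not on vector length, two texts of very different sizes can still end up close to each other if they are about the same topic.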
Sample output from my 1000 random indexed articles:
> embeddings-node@0.0.1 search
> tsx src/index.ts -s Rock artist
Connecting to database postgres://dev:dev@localhost:5432/dev
Searching for content: Rock artist
Search Results: [
{
resource_link: 'https://en.wikipedia.org/wiki/Dark_Horse_%E2%80%93_A_Live_Collection',
caption: 'Dark Horse – A Live Collection',
cosine_distance: 0.39703900903717815
},
{
resource_link: 'https://en.wikipedia.org/wiki/Phil_Stack',
caption: 'Phil Stack',
cosine_distance: 0.4036332663110951
},
{
resource_link: 'https://en.wikipedia.org/wiki/Russell_B_Jackson',
caption: 'Russell B Jackson',
cosine_distance: 0.4270296447349413
},