From 98c5bf3d38e213f23ba221ea17357fc55ddc96e4 Mon Sep 17 00:00:00 2001
From: Chris Pietschmann
Date: Sun, 26 Oct 2025 11:29:25 -0400
Subject: [PATCH 1/6] Update README to emphasize local database feature

Clarify that Build5Nines SharpVector is a local text vector database for .NET applications.
---
 README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 665d328..38cfb72 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Build5Nines SharpVector - The lightweight, in-memory, Semantic Search, Text Vector Database for any C# / .NET Applications
+# Build5Nines SharpVector - The lightweight, in-memory, local, Semantic Search, Text Vector Database for any C# / .NET Applications
 
 `Build5Nines.SharpVector` is an in-memory vector database library designed for .NET applications. It allows you to store, search, and manage text data using vector representations. The library is customizable and extensible, enabling support for different vector comparison methods, preprocessing techniques, and vectorization strategies.
 
@@ -14,7 +14,9 @@
 
 Vector databases are used with Semantic Search and [Generative AI](https://build5nines.com/what-is-generative-ai/?utm_source=github&utm_medium=sharpvector) solutions augmenting the LLM (Large Language Model) with the ability to load additional context data with the AI prompt using the [RAG (Retrieval-Augmented Generation)](https://build5nines.com/what-is-retrieval-augmented-generation-rag/?utm_source=github&utm_medium=sharpvector) design pattern.
 
-While there are lots of large databases that can be used to build Vector Databases (like Azure CosmosDB, PostgreSQL w/ pgvector, Azure AI Search, Elasticsearch, and more), there are not many options for a lightweight vector database that can be embedded into any .NET application. Build5Nines SharpVector is the lightweight in-memory Text Vector Database for use in any .NET application that you're looking for!
+While there are lots of large databases that can be used to build Vector Databases (like Azure CosmosDB, PostgreSQL w/ pgvector, Azure AI Search, Elasticsearch, and more), there are not many options for a lightweight vector database that can be embedded into any .NET application to provide a local text vector database.
+
+> Build5Nines SharpVector is the lightweight, local, in-memory Text Vector Database for implementing semantic search into any .NET application!
 
 ### [Documentation](https://sharpvector.build5nines.com) | [Get Started](https://sharpvector.build5nines.com/get-started/) | [Samples](https://sharpvector.build5nines.com/samples/)
 

From 9d8e7a1ba81b93f050ba3f56aa22ce884802dedd Mon Sep 17 00:00:00 2001
From: Chris Pietschmann
Date: Sun, 26 Oct 2025 11:31:12 -0400
Subject: [PATCH 2/6] Add testimonial for Build5Nines.SharpVector

Added a quote from Tulika Chaudharie about Build5Nines.SharpVector and its use in RAG implementation.
---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 38cfb72..8b09f5a 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,11 @@ Vector databases are used with Semantic Search and [Generative AI](https://build
 
 While there are lots of large databases that can be used to build Vector Databases (like Azure CosmosDB, PostgreSQL w/ pgvector, Azure AI Search, Elasticsearch, and more), there are not many options for a lightweight vector database that can be embedded into any .NET application to provide a local text vector database.
 
-> Build5Nines SharpVector is the lightweight, local, in-memory Text Vector Database for implementing semantic search into any .NET application!
+> "For the in-memory vector database, we're using Build5Nines.SharpVector, an excellent open-source project by Chris Pietschmann. SharpVector makes it easy to store and retrieve vectorized data, making it an ideal choice for our sample RAG implementation."
+>
+> [Tulika Chaudharie, Principal Product Manager at Microsoft for Azure App Service](https://azure.github.io/AppService/2024/09/03/Phi3-vector.html)
+
+Build5Nines SharpVector is the lightweight, local, in-memory Text Vector Database for implementing semantic search into any .NET application!
 
 ### [Documentation](https://sharpvector.build5nines.com) | [Get Started](https://sharpvector.build5nines.com/get-started/) | [Samples](https://sharpvector.build5nines.com/samples/)
 

From c0e1860e5d490808c2c908562eb4d5e1e126657c Mon Sep 17 00:00:00 2001
From: Chris Pietschmann
Date: Sun, 23 Nov 2025 10:44:42 -0500
Subject: [PATCH 3/6] Fix database file load issue where Ids get reset #76

---
 .../Build5Nines.SharpVector.csproj | 2 +-
 .../Id/ISequentialIdGenerator.cs | 15 +++
 .../Id/NumericIdGenerator.cs | 9 +-
 .../MemoryVectorDatabaseBase.cs | 24 ++++-
 .../BasicOpenAIMemoryVectorDatabaseTest.cs | 95 +++++++++++++++++++
 .../SharpVectorOpenAITest.csproj | 3 +-
 src/SharpVectorTest/VectorDatabaseTests.cs | 34 +++++++
 7 files changed, 178 insertions(+), 4 deletions(-)
 create mode 100644 src/Build5Nines.SharpVector/Id/ISequentialIdGenerator.cs

diff --git a/src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj b/src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj
index 2ac8e59..5d8ec2d 100644
--- a/src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj
+++ b/src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj
@@ -9,7 +9,7 @@
 Build5Nines.SharpVector
 https://sharpvector.build5nines.com
 https://github.com/Build5Nines/SharpVector
- 2.1.1
+ 2.1.2
 Lightweight In-memory Vector Database to embed in any .NET Applications
 Copyright (c) 2025 Build5Nines LLC
 README.md
diff --git a/src/Build5Nines.SharpVector/Id/ISequentialIdGenerator.cs b/src/Build5Nines.SharpVector/Id/ISequentialIdGenerator.cs
new file mode 100644
index 0000000..ff78c8c
--- /dev/null
+++ b/src/Build5Nines.SharpVector/Id/ISequentialIdGenerator.cs
@@ -0,0 +1,15 @@
+namespace Build5Nines.SharpVector.Id;
+
+/// <summary>
+/// Interface for ID generators that support setting the most recent generated ID (sequential/numeric style).
+/// </summary>
+/// <typeparam name="TId">The ID type.</typeparam>
+public interface ISequentialIdGenerator<TId> : IIdGenerator<TId>
+    where TId : notnull
+{
+    /// <summary>
+    /// Sets the most recent ID value so the next generated ID will continue the sequence.
+    /// </summary>
+    /// <param name="mostRecentId">The most recently used/generated ID.</param>
+    void SetMostRecent(TId mostRecentId);
+}
diff --git a/src/Build5Nines.SharpVector/Id/NumericIdGenerator.cs b/src/Build5Nines.SharpVector/Id/NumericIdGenerator.cs
index 7842a32..e8c3bea 100644
--- a/src/Build5Nines.SharpVector/Id/NumericIdGenerator.cs
+++ b/src/Build5Nines.SharpVector/Id/NumericIdGenerator.cs
@@ -1,6 +1,6 @@
 namespace Build5Nines.SharpVector.Id;
 
-public class NumericIdGenerator<TId> : IIdGenerator<TId>
+public class NumericIdGenerator<TId> : ISequentialIdGenerator<TId>
     where TId : struct
 {
     public NumericIdGenerator()
@@ -22,4 +22,11 @@ public TId NewId() {
             return _lastId;
         }
     }
+
+    public void SetMostRecent(TId mostRecentId)
+    {
+        lock(_lock) {
+            _lastId = mostRecentId;
+        }
+    }
 }
\ No newline at end of file
diff --git a/src/Build5Nines.SharpVector/MemoryVectorDatabaseBase.cs b/src/Build5Nines.SharpVector/MemoryVectorDatabaseBase.cs
index 53cee62..d701881 100644
--- a/src/Build5Nines.SharpVector/MemoryVectorDatabaseBase.cs
+++ b/src/Build5Nines.SharpVector/MemoryVectorDatabaseBase.cs
@@ -11,6 +11,7 @@
 using Build5Nines.SharpVector.Embeddings;
 using System.Runtime.ExceptionServices;
 using System.Collections;
+using System.Linq;
 
 namespace Build5Nines.SharpVector;
 
@@ -351,8 +352,18 @@ await DatabaseFile.LoadDatabaseFromZipArchiveAsync(
             async (archive) =>
             {
                 await DatabaseFile.LoadVectorStoreAsync(archive, VectorStore);
-
                 await DatabaseFile.LoadVocabularyStoreAsync(archive, VectorStore.VocabularyStore);
+
+                // Re-initialize the IdGenerator with the max Id value from the VectorStore if it supports sequential numeric IDs
+                if (_idGenerator is ISequentialIdGenerator<TId> seqIdGen)
+                {
+                    // Re-seed the sequence only if there are existing IDs
+                    var ids = VectorStore.GetIds();
+                    if (ids.Any())
+                    {
+                        seqIdGen.SetMostRecent(ids.Max()!);
+                    }
+                }
             }
         );
     }
@@ -708,6 +719,17 @@ await DatabaseFile.LoadDatabaseFromZipArchiveAsync(
             async (archive) =>
             {
                 await DatabaseFile.LoadVectorStoreAsync(archive, VectorStore);
+
+                // Re-initialize the IdGenerator with the max Id value from the VectorStore if it supports sequential numeric IDs
+                if (_idGenerator is ISequentialIdGenerator<TId> seqIdGen)
+                {
+                    // Re-seed the sequence only if there are existing IDs
+                    var ids = VectorStore.GetIds();
+                    if (ids.Any())
+                    {
+                        seqIdGen.SetMostRecent(ids.Max()!);
+                    }
+                }
             }
         );
     }
diff --git a/src/SharpVectorOpenAITest/BasicOpenAIMemoryVectorDatabaseTest.cs b/src/SharpVectorOpenAITest/BasicOpenAIMemoryVectorDatabaseTest.cs
index e12c1ed..5e8b285 100644
--- a/src/SharpVectorOpenAITest/BasicOpenAIMemoryVectorDatabaseTest.cs
+++ b/src/SharpVectorOpenAITest/BasicOpenAIMemoryVectorDatabaseTest.cs
@@ -7,6 +7,9 @@
 using System.Threading;
 using System.Threading.Tasks;
 using System.Collections.Generic;
+using System.ClientModel.Primitives;
+using System.IO;
+using System;
 
 namespace Build5Nines.SharpVector.OpenAI.Tests
 {
@@ -20,9 +23,49 @@ public class BasicMemoryVectorDatabaseTest
         public void Setup()
         {
             _mockEmbeddingClient = new Mock<EmbeddingClient>();
+
+            // Mock the OpenAI EmbeddingClient to return a deterministic embedding vector
+            // GenerateEmbeddingAsync(string input, EmbeddingGenerationOptions? options = null, CancellationToken cancellationToken = default)
+            // returns ClientResult. We create one using the Model Factory helpers.
+            var embeddingVector = new float[] { 0.1f, 0.2f, 0.3f }; // small deterministic vector for tests
+            var openAiEmbedding = OpenAIEmbeddingsModelFactory.OpenAIEmbedding(index: 0, vector: embeddingVector);
+            // Create minimal concrete PipelineResponse implementation to satisfy ClientResult.FromValue without relying on Moq for abstract type
+            var response = new TestPipelineResponse();
+            var clientResult = ClientResult.FromValue(openAiEmbedding, response);
+
+            _mockEmbeddingClient
+                .Setup(c => c.GenerateEmbeddingAsync(
+                    It.IsAny<string>(),
+                    It.IsAny<EmbeddingGenerationOptions>(),
+                    It.IsAny<CancellationToken>()))
+                .ReturnsAsync(clientResult);
+
             _database = new BasicOpenAIMemoryVectorDatabase(_mockEmbeddingClient.Object);
         }
 
+        // Minimal headers implementation for TestPipelineResponse
+        internal class EmptyPipelineResponseHeaders : PipelineResponseHeaders
+        {
+            public override IEnumerator<KeyValuePair<string, string>> GetEnumerator() => (new List<KeyValuePair<string, string>>()).GetEnumerator();
+            public override bool TryGetValue(string name, out string? value) { value = null; return false; }
+            public override bool TryGetValues(string name, out IEnumerable<string>? values) { values = null; return false; }
+        }
+
+        // Minimal PipelineResponse implementation
+        internal class TestPipelineResponse : PipelineResponse
+        {
+            private Stream? _contentStream = Stream.Null;
+            private readonly EmptyPipelineResponseHeaders _headers = new EmptyPipelineResponseHeaders();
+            public override int Status => 200;
+            public override string ReasonPhrase => "OK";
+            public override Stream? ContentStream { get => _contentStream; set => _contentStream = value; }
+            protected override PipelineResponseHeaders HeadersCore => _headers;
+            public override BinaryData Content => BinaryData.FromBytes(Array.Empty<byte>());
+            public override BinaryData BufferContent(CancellationToken cancellationToken = default) => Content;
+            public override ValueTask<BinaryData> BufferContentAsync(CancellationToken cancellationToken = default) => ValueTask.FromResult(Content);
+            public override void Dispose() { _contentStream?.Dispose(); }
+        }
+
         [TestMethod]
         public void TestInitialization()
         {
@@ -40,5 +83,57 @@ public async Task Test_SaveLoad_01()
             await _database.LoadFromFileAsync(filename);
         }
 
+        [TestMethod]
+        public async Task Test_SaveLoad_TestIds_01()
+        {
+            _database.AddText("Sample text for testing IDs.", "111");
+            _database.AddText("Another sample text for testing IDs.", "222");
+
+            var results = _database.Search("testing IDs");
+            Assert.AreEqual(2, results.Texts.Count());
+
+            var filename = "openai_test_saveload_testids_01.b59vdb";
+#pragma warning disable CS8604 // Possible null reference argument.
+            await _database.SaveToFileAsync(filename);
+#pragma warning restore CS8604 // Possible null reference argument.
+
+            await _database.LoadFromFileAsync(filename);
+
+            _database.AddText("A new text after loading to check ID assignment.", "333");
+
+            var newResults = _database.Search("testing IDs");
+            Assert.AreEqual(3, newResults.Texts.Count());
+            var texts = newResults.Texts.OrderBy(x => x.Metadata).ToArray();
+            Assert.AreEqual("111", texts[0].Metadata);
+            Assert.AreEqual("222", texts[1].Metadata);
+            Assert.AreEqual("333", texts[2].Metadata);
+        }
+
+        [TestMethod]
+        public async Task Test_SaveLoad_TestIds_02()
+        {
+            _database.AddText("Sample text for testing IDs.", "111");
+            _database.AddText("Another sample text for testing IDs.", "222");
+
+            var results = _database.Search("testing IDs");
+            Assert.AreEqual(2, results.Texts.Count());
+
+            var filename = "openai_test_saveload_testids_02.b59vdb";
+#pragma warning disable CS8604 // Possible null reference argument.
+            await _database.SaveToFileAsync(filename);
+#pragma warning restore CS8604 // Possible null reference argument.
+
+            var newdb = new BasicOpenAIMemoryVectorDatabase(_mockEmbeddingClient.Object);
+            await newdb.LoadFromFileAsync(filename);
+
+            newdb.AddText("A new text after loading to check ID assignment.", "333");
+
+            var newResults = newdb.Search("testing IDs");
+            Assert.AreEqual(3, newResults.Texts.Count());
+            var texts = newResults.Texts.OrderBy(x => x.Metadata).ToArray();
+            Assert.AreEqual("111", texts[0].Metadata);
+            Assert.AreEqual("222", texts[1].Metadata);
+            Assert.AreEqual("333", texts[2].Metadata);
+        }
     }
 }
\ No newline at end of file
diff --git a/src/SharpVectorOpenAITest/SharpVectorOpenAITest.csproj b/src/SharpVectorOpenAITest/SharpVectorOpenAITest.csproj
index d454955..6f82bd4 100644
--- a/src/SharpVectorOpenAITest/SharpVectorOpenAITest.csproj
+++ b/src/SharpVectorOpenAITest/SharpVectorOpenAITest.csproj
@@ -10,7 +10,7 @@
-
+
@@ -25,6 +25,7 @@
+
diff --git a/src/SharpVectorTest/VectorDatabaseTests.cs b/src/SharpVectorTest/VectorDatabaseTests.cs
index c7117c7..ced9326 100644
--- a/src/SharpVectorTest/VectorDatabaseTests.cs
+++ b/src/SharpVectorTest/VectorDatabaseTests.cs
@@ -174,6 +174,40 @@ public void BasicMemoryVectorDatabase_SaveLoad_01()
         Assert.AreEqual(0.3396831452846527, results.Texts.First().VectorComparison);
     }
 
+    [TestMethod]
+    public void BasicMemoryVectorDatabase_SaveLoad_TestIds()
+    {
+        var vdb = new BasicMemoryVectorDatabase();
+
+        // // Load Vector Database with some sample text
+        vdb.AddText("The Lion King is a 1994 Disney animated film about a young lion cub named Simba who is the heir to the throne of an African savanna.", "First");
+        vdb.AddText("Build5Nines is awesome!", "Second");
+        var results = vdb.Search("Lion King");
+
+        Assert.AreEqual(2, results.Texts.Count());
+
+        var filename = "BasicMemoryVectorDatabase_SaveLoad_TestIds.b59vdb";
+        vdb.SaveToFile(filename);
+
+        var newvdb = new BasicMemoryVectorDatabase();
+        newvdb.LoadFromFile(filename);
+
+        // Add a new text entry after loading
+        // This should get the next available ID (3) and not overwrite existing entries
+        newvdb.AddText("A new string that should be added, not replacing existing one.", "Third");
+
+        results = newvdb.Search("Lion King");
+
+        Assert.AreEqual(3, results.Texts.Count());
+        var listOfTexts = results.Texts.OrderBy(x => x.Id).ToArray();
+        Assert.AreEqual(listOfTexts[0].Id, 1);
+        Assert.AreEqual(listOfTexts[0].Metadata, "First");
+        Assert.AreEqual(listOfTexts[1].Id, 2);
+        Assert.AreEqual(listOfTexts[1].Metadata, "Second");
+        Assert.AreEqual(listOfTexts[2].Id, 3);
+        Assert.AreEqual(listOfTexts[2].Metadata, "Third");
+    }
+
     [TestMethod]
     public async Task BasicMemoryVectorDatabase_SaveLoadBinaryStreamAsync_01()
     {

From 69b277a964ae337d8462742cc96f02e41f16022a Mon Sep 17 00:00:00 2001
From: Chris Pietschmann
Date: Sun, 23 Nov 2025 10:51:15 -0500
Subject: [PATCH 4/6] Update CHANGELOG.md

---
 CHANGELOG.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 868ef80..c0ddd7b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## v2.1.2
+
+Fixed:
+
+- Fixed a bug when loading saved database from file/stream where `IntIdGenerator` or `NumericIdGenerator` lose max Id, resulting in adding new texts to database causes existing texts to be overwritten. This specifically affected `SharpVector.OpenAI` and `SharpVector.Ollama` libraries but the fix is implemented within the core `Build5Nines.SharpVector` library.
+
 ## v2.1.1
 
 Add:

From d1e87f77640eb6bfde68b543e0bd0f41f19c2426 Mon Sep 17 00:00:00 2001
From: Chris Pietschmann
Date: Sun, 23 Nov 2025 11:10:45 -0500
Subject: [PATCH 5/6] Remove NuGet version badge

Removed NuGet version badge from README.
---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index 8b09f5a..423a266 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,6 @@
 
 [![Build and Release](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml/badge.svg)](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml)
 ![Libraries.io dependency status for GitHub repo](https://img.shields.io/librariesio/github/build5nines/sharpvector)
-[![NuGet](https://img.shields.io/nuget/v/Build5Nines.SharpVector.svg)](https://www.nuget.org/packages/Build5Nines.SharpVector/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 ![Framework: .NET 8+](https://img.shields.io/badge/framework-.NET%208%2B-blue)
 ![Semantic Search: Enabled](https://img.shields.io/badge/semantic%20search-enabled-purple)

From 34292d42a9a3d218701ebaa23d0ddfb4d3061aea Mon Sep 17 00:00:00 2001
From: Chris Pietschmann
Date: Sun, 23 Nov 2025 11:11:47 -0500
Subject: [PATCH 6/6] Remove NuGet badge from index.md

Removed NuGet badge from documentation.
---
 docs/docs/index.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index 2b32c5c..36ef9cc 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -10,7 +10,6 @@ description: The lightweight, in-memory, semantic search, text vector database f
 
 [![Build and Release](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml/badge.svg)](https://github.com/Build5Nines/SharpVector/actions/workflows/build-release.yml)
 ![Libraries.io dependency status for GitHub repo](https://img.shields.io/librariesio/github/build5nines/sharpvector)
-[![NuGet](https://img.shields.io/nuget/v/Build5Nines.SharpVector.svg)](https://www.nuget.org/packages/Build5Nines.SharpVector/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
 ![Framework: .NET 8+](https://img.shields.io/badge/framework-.NET%208%2B-blue)
 ![Semantic Search: Enabled](https://img.shields.io/badge/semantic%20search-enabled-purple)
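
Note on PATCH 3/6: the fix re-seeds a sequential ID generator from the largest stored ID after a database file is loaded. The block below is a minimal sketch of that failure mode and its remedy, not the library's code. `SketchIntIdGenerator` and `ReseedSketch` are hypothetical names invented for illustration, and the sketch assumes plain `int` IDs starting at 1; only the `SetMostRecent` idea is taken from the patch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a sequential numeric ID generator (not a library type).
public sealed class SketchIntIdGenerator
{
    private readonly object _lock = new object();
    private int _lastId; // starts at 0, so the first generated ID is 1

    public int NewId()
    {
        lock (_lock) { return ++_lastId; }
    }

    // Same idea as SetMostRecent in the patch: continue the sequence after this ID.
    public void SetMostRecent(int mostRecentId)
    {
        lock (_lock) { _lastId = mostRecentId; }
    }
}

public static class ReseedSketch
{
    // Simulates loading saved IDs into a fresh database and then adding one item.
    // Returns the ID the new item would receive.
    public static int NextIdAfterLoad(IReadOnlyCollection<int> loadedIds, bool reseed)
    {
        var generator = new SketchIntIdGenerator();

        // Re-seed only when there are existing IDs, as the patch does.
        if (reseed && loadedIds.Count > 0)
        {
            generator.SetMostRecent(loadedIds.Max());
        }

        return generator.NewId();
    }
}
```

With loaded IDs 1 and 2, `ReseedSketch.NextIdAfterLoad(new[] { 1, 2 }, reseed: false)` returns 1, which collides with an existing entry, while `reseed: true` returns 3. This is the behavior the new `SaveLoad_TestIds` tests check against the real database classes.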