.Net: Add PostgresVectorStore Memory connector. #9324

Open
wants to merge 73 commits into base: main

Changes from 20 commits

Commits (73)
8778d5f
Add PostgresVectorStore Memory connector.
Oct 18, 2024
ddad99a
Add UpsertBatch, GetBatch, and DeleteBatch
Oct 18, 2024
5447815
Remove unused CreateMapping
Oct 18, 2024
7533f8c
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 18, 2024
9a4f836
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 21, 2024
68a000e
Add vector search to PostgresVectorStore
Oct 22, 2024
317f6af
create index on collection creation
Oct 23, 2024
f4f5ba2
Support Guid, test distance functions
Oct 23, 2024
2acf118
Format tests
Oct 23, 2024
5db2c59
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 23, 2024
f4b4dc5
Add service and kernel extensions
Oct 24, 2024
5c58400
Default to Euclidean distance if no distance function is specified
Oct 24, 2024
8ea21cd
Add Postgres sample to concepts
Oct 24, 2024
4dcd222
Add docs for setting configuration in samples\Concepts
Oct 24, 2024
74b3764
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 24, 2024
5c3e63f
Enforce dimension size in index creation
Oct 24, 2024
6d9f1fd
Create index for CreateTableIfNotExistsAsyc
Oct 24, 2024
b4266cc
Log warning when index not created due to dimensions
Oct 24, 2024
f86613a
Refactor and tests; make SqlBuilder internal
Oct 24, 2024
716b794
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 24, 2024
8d8283b
Remove old migration note
Oct 25, 2024
89027fc
Fix docstring
Oct 25, 2024
8f45d9c
Use parameter for tableName
Oct 25, 2024
48811bd
Fix support for DateTime, DateTimeOffset
Oct 25, 2024
1d6082d
Fix warnings in test
Oct 25, 2024
eb0a683
Remove kernel extensions, improve service extensions
Oct 25, 2024
a66d835
Make PostgresSqlCommandInfo internal
Oct 25, 2024
53f1009
Default to a Hnsw index
Oct 25, 2024
08ea55f
Default to cosine distance
Oct 25, 2024
319648b
Consistently use includeVectors
Oct 25, 2024
5b52bdc
Simplify AsyncEnumerable return
Oct 25, 2024
cd845ee
Pass properties instead of full definition
Oct 25, 2024
1d09a21
Throw instead of log for too high dimensionality
Oct 25, 2024
74e9757
Remove DefaultVectorSize
Oct 25, 2024
ad5628c
Remove unused using statements
Oct 25, 2024
dbf1aef
Remove VectorStore constructor that creates datsaource
Oct 25, 2024
a355bf7
Fix duplicate mapper call
Oct 25, 2024
e499a80
Fix docstring typo
Oct 25, 2024
c95e2b3
Comment clarifying that multiple keys should be previously validated
Oct 25, 2024
9d972b3
Refactor ExecuteNonQueryAsync calls to reduce code dupe
Oct 25, 2024
6eb3793
Forward Schema option.
Oct 25, 2024
ed59fed
Make PostgresVectorStoreDbClient internal
Oct 25, 2024
1749adb
Support more enumerable types
Oct 25, 2024
ea7b01c
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 25, 2024
86486d7
Refactor to support default + transactions
Oct 28, 2024
b9b4a44
Fix issue with converting readonly array on upsert
Oct 28, 2024
c53a8ee
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 28, 2024
97ef60a
Fix SLN merge error
Oct 28, 2024
81e1805
Improve error handling
Oct 28, 2024
a587260
Avoid CA1859 in test class
Oct 28, 2024
e8fe800
Account for ngpsql missing func in .net std 2.0
Oct 28, 2024
96c088e
Fix servicecollection tests
Oct 28, 2024
0fc76f6
Logic for dimension max moved and tested elsewhere
Oct 28, 2024
266310b
Remove unused using statement
Oct 28, 2024
08f110c
Remove logger from PostgresVectorStoreRecordCollection
Oct 29, 2024
26516c5
Merge branch 'main' into feature/postgres-vector-store-dotnet
lossyrob Oct 30, 2024
5b44a80
Use Flat instead of None index kind
Oct 30, 2024
b9b2487
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 31, 2024
24577a0
Remove unnecessary overloads
Oct 31, 2024
60d6512
Change tests to be true to name
Oct 31, 2024
5a66a13
Remove reduntant key type based test
Oct 31, 2024
581b6ab
Remove unnecessary overloads
Oct 31, 2024
494a0d4
Better error handling for IAsyncEnumerable
Oct 31, 2024
5f19889
Default to Flat (no index) instead of Hnsw
Oct 31, 2024
62ac8eb
Add enumerable to record mapper test
Oct 31, 2024
364b592
Remove unused fixture properties
Oct 31, 2024
bf58cab
Test StoragePropertyName in sql builder tests
Oct 31, 2024
aa592de
Remove dynamic from integration test
Nov 1, 2024
9a3b216
Add test to read from manually inserted record
Nov 1, 2024
1ee09c1
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Nov 1, 2024
b037075
Formatting, spelling
Nov 1, 2024
29d91ba
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Nov 1, 2024
c2937e0
Fix test.
Nov 1, 2024
9 changes: 9 additions & 0 deletions dotnet/SK-dotnet.sln
@@ -399,6 +399,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "AotCompatibility", "samples
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SemanticKernel.AotTests", "src\SemanticKernel.AotTests\SemanticKernel.AotTests.csproj", "{39EAB599-742F-417D-AF80-95F90376BB18}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Connectors.Postgres.UnitTests", "src\Connectors\Connectors.Postgres.UnitTests\Connectors.Postgres.UnitTests.csproj", "{232E1153-6366-4175-A982-D66B30AAD610}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -1030,6 +1032,12 @@ Global
{6F591D05-5F7F-4211-9042-42D8BCE60415}.Publish|Any CPU.Build.0 = Debug|Any CPU
{6F591D05-5F7F-4211-9042-42D8BCE60415}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6F591D05-5F7F-4211-9042-42D8BCE60415}.Release|Any CPU.Build.0 = Release|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Debug|Any CPU.Build.0 = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Publish|Any CPU.ActiveCfg = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Publish|Any CPU.Build.0 = Debug|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Release|Any CPU.ActiveCfg = Release|Any CPU
{232E1153-6366-4175-A982-D66B30AAD610}.Release|Any CPU.Build.0 = Release|Any CPU
{E82B640C-1704-430D-8D71-FD8ED3695468}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{E82B640C-1704-430D-8D71-FD8ED3695468}.Debug|Any CPU.Build.0 = Debug|Any CPU
{E82B640C-1704-430D-8D71-FD8ED3695468}.Publish|Any CPU.ActiveCfg = Debug|Any CPU
@@ -1179,6 +1187,7 @@ Global
{E82B640C-1704-430D-8D71-FD8ED3695468} = {5A7028A7-4DDF-4E4F-84A9-37CE8F8D7E89}
{6ECFDF04-2237-4A85-B114-DAA34923E9E6} = {5D4C0700-BBB5-418F-A7B2-F392B9A18263}
{39EAB599-742F-417D-AF80-95F90376BB18} = {831DDCA2-7D2C-4C31-80DB-6BDB3E1F7AE0}
{232E1153-6366-4175-A982-D66B30AAD610} = {0247C2C9-86C3-45BA-8873-28B0948EDC0C}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {FBDC56A3-86AD-4323-AA0F-201E59123B83}
3 changes: 3 additions & 0 deletions dotnet/samples/Concepts/Concepts.csproj
@@ -102,6 +102,9 @@
</ItemGroup>

<ItemGroup>
<None Update="appsettings.Development.json">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Include="Resources\Plugins\ApiManifestPlugins\**\apimanifest.json">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
@@ -10,6 +10,51 @@ namespace Memory.VectorStoreFixtures;
/// </summary>
internal static class VectorStoreInfra
{
/// <summary>
/// Set up the Postgres pgvector container by pulling the image and running it.
/// </summary>
/// <param name="client">The docker client to create the container with.</param>
/// <returns>The id of the container.</returns>
public static async Task<string> SetupPostgresContainerAsync(DockerClient client)
{
await client.Images.CreateImageAsync(
new ImagesCreateParameters
{
FromImage = "pgvector/pgvector",
Tag = "pg16",
},
null,
new Progress<JSONMessage>());

var container = await client.Containers.CreateContainerAsync(new CreateContainerParameters()
{
Image = "pgvector/pgvector:pg16",
HostConfig = new HostConfig()
{
PortBindings = new Dictionary<string, IList<PortBinding>>
{
{"5432", new List<PortBinding> {new() {HostPort = "5432" } }},
},
PublishAllPorts = true
},
ExposedPorts = new Dictionary<string, EmptyStruct>
{
{ "5432", default },
},
Env = new List<string>
{
"POSTGRES_USER=postgres",
"POSTGRES_PASSWORD=example",
},
});

await client.Containers.StartContainerAsync(
container.ID,
new ContainerStartParameters());

return container.ID;
}

/// <summary>
/// Setup the qdrant container by pulling the image and running it.
/// </summary>
@@ -0,0 +1,67 @@
// Copyright (c) Microsoft. All rights reserved.

using Docker.DotNet;
using Npgsql;

namespace Memory.VectorStoreFixtures;

/// <summary>
/// Fixture that creates a Postgres container before tests and deletes it after tests.
/// </summary>
public class VectorStorePostgresContainerFixture : IAsyncLifetime
{
private DockerClient? _dockerClient;
private string? _postgresContainerId;

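public Task InitializeAsync()
{
// Nothing to do here: tests start the container on demand via ManualInitializeAsync.
return Task.CompletedTask;
}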

public async Task ManualInitializeAsync()
{
if (this._postgresContainerId == null)
{
// Connect to docker and start the docker container.
using var dockerClientConfiguration = new DockerClientConfiguration();
this._dockerClient = dockerClientConfiguration.CreateClient();
this._postgresContainerId = await VectorStoreInfra.SetupPostgresContainerAsync(this._dockerClient);

// Delay until the Postgres server is ready.
var connectionString = "Host=localhost;Port=5432;Username=postgres;Password=example;Database=postgres;";
var succeeded = false;
var attemptCount = 0;
while (!succeeded && attemptCount++ < 10)
{
try
{
NpgsqlDataSourceBuilder dataSourceBuilder = new(connectionString);
dataSourceBuilder.UseVector();
using var dataSource = dataSourceBuilder.Build();
NpgsqlConnection connection = await dataSource.OpenConnectionAsync().ConfigureAwait(false);

await using (connection)
{
// Create extension vector if it doesn't exist
await using (NpgsqlCommand command = new("CREATE EXTENSION IF NOT EXISTS vector", connection))
{
await command.ExecuteNonQueryAsync();
}
}

// The server accepted a connection and the extension exists, so stop retrying.
succeeded = true;
}
catch (Exception)
{
await Task.Delay(1000);
}
}
}
}

public async Task DisposeAsync()
{
if (this._dockerClient != null && this._postgresContainerId != null)
{
// Delete docker container.
await VectorStoreInfra.DeleteContainerAsync(this._dockerClient, this._postgresContainerId);
}
}
}
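
The readiness wait in the fixture above follows a bounded-retry pattern: attempt an operation a fixed number of times, pause after each failure, and stop at the first success. For checking the same container by hand outside the test run, a minimal shell sketch of that pattern might look like the following. The `retry` helper is hypothetical and not part of this PR, and the commented `pg_isready` call assumes the PostgreSQL client tools are installed locally.

```bash
# Hypothetical helper, not part of the PR: run a command until it succeeds,
# giving up after max_attempts tries and sleeping delay_seconds between tries.
retry() {
  local max_attempts=$1
  local delay_seconds=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1  # every attempt failed
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  return 0  # stopped at the first successful attempt
}

# Example (assumes pg_isready is on PATH and the container publishes port 5432):
#   retry 10 1 pg_isready -h localhost -p 5432 -U postgres
```

Returning a non-zero status once the attempts run out lets a calling script fail fast instead of continuing against a server that never came up.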
@@ -0,0 +1,86 @@
// Copyright (c) Microsoft. All rights reserved.

using Azure.Identity;
using Memory.VectorStoreFixtures;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using Microsoft.SemanticKernel.Connectors.Postgres;

namespace Memory;

/// <summary>
/// An example showing how to use common code, that can work with any vector database, with a Postgres database.
/// The common code is in the <see cref="VectorStore_VectorSearch_MultiStore_Common"/> class.
/// The common code ingests data into the vector store and then searches over that data.
/// This example is part of a set of examples each showing a different vector database.
///
/// For other databases, see the following classes:
/// <para><see cref="VectorStore_VectorSearch_MultiStore_AzureAISearch"/></para>
/// <para><see cref="VectorStore_VectorSearch_MultiStore_Redis"/></para>
/// <para><see cref="VectorStore_VectorSearch_MultiStore_InMemory"/></para>
///
/// To run this sample, you need a local instance of Docker running, since the associated fixture will try to start a Postgres container in the local Docker instance.
/// </summary>
public class VectorStore_VectorSearch_MultiStore_Postgres(ITestOutputHelper output, VectorStorePostgresContainerFixture PostgresFixture) : BaseTest(output), IClassFixture<VectorStorePostgresContainerFixture>
{
/// <summary>
/// The connection string to the Postgres database hosted in the docker container.
/// </summary>
private const string ConnectionString = "Host=localhost;Port=5432;Username=postgres;Password=example;Database=postgres;";

[Fact]
public async Task ExampleWithDIAsync()
{
// Use the kernel for DI purposes.
var kernelBuilder = Kernel
.CreateBuilder();

// Register an embedding generation service with the DI container.
kernelBuilder.AddAzureOpenAITextEmbeddingGeneration(
deploymentName: TestConfiguration.AzureOpenAIEmbeddings.DeploymentName,
endpoint: TestConfiguration.AzureOpenAIEmbeddings.Endpoint,
credential: new AzureCliCredential());

// Initialize the Postgres docker container via the fixtures and register the Postgres VectorStore.
await PostgresFixture.ManualInitializeAsync();
kernelBuilder.AddPostgresVectorStore(ConnectionString);

// Register the test output helper common processor with the DI container.
kernelBuilder.Services.AddSingleton<ITestOutputHelper>(this.Output);
kernelBuilder.Services.AddTransient<VectorStore_VectorSearch_MultiStore_Common>();

// Build the kernel.
var kernel = kernelBuilder.Build();

// Build a common processor object using the DI container.
var processor = kernel.GetRequiredService<VectorStore_VectorSearch_MultiStore_Common>();

// Run the process and pass a key generator function to it, to generate unique record keys.
// The key generator function is required, since different vector stores may require different key types.
// E.g. Postgres supports Guid and ulong keys, but others may support strings only.
await processor.IngestDataAndSearchAsync("skglossaryWithDI", () => Guid.NewGuid());
}

[Fact]
public async Task ExampleWithoutDIAsync()
{
// Create an embedding generation service.
var textEmbeddingGenerationService = new AzureOpenAITextEmbeddingGenerationService(
TestConfiguration.AzureOpenAIEmbeddings.DeploymentName,
TestConfiguration.AzureOpenAIEmbeddings.Endpoint,
new AzureCliCredential());

// Initialize the Postgres docker container via the fixtures and construct the Postgres VectorStore.
await PostgresFixture.ManualInitializeAsync();
var vectorStore = new PostgresVectorStore(ConnectionString);

// Create the common processor that works for any vector store.
var processor = new VectorStore_VectorSearch_MultiStore_Common(vectorStore, textEmbeddingGenerationService, this.Output);

// Run the process and pass a key generator function to it, to generate unique record keys.
// The key generator function is required, since different vector stores may require different key types.
// E.g. Postgres supports Guid and ulong keys, but others may support strings only.
await processor.IngestDataAndSearchAsync("skglossaryWithoutDI", () => Guid.NewGuid());
}
}
85 changes: 85 additions & 0 deletions dotnet/samples/Concepts/README.md
@@ -204,3 +204,88 @@ dotnet test -l "console;verbosity=detailed" --filter "FullyQualifiedName=ChatCom
### TextToImage - Using [`TextToImage`](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/SemanticKernel.Abstractions/AI/TextToImage/ITextToImageService.cs) services to generate images

- [OpenAI_TextToImage](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/TextToImage/OpenAI_TextToImageDalle3.cs)

## Configuration

### Option 1: Use Secret Manager

Concept samples require secrets and credentials to access OpenAI, Azure OpenAI,
Bing and other resources.

We suggest using .NET [Secret Manager](https://learn.microsoft.com/en-us/aspnet/core/security/app-secrets)
to avoid the risk of leaking secrets into the repository, branches and pull requests.
You can also use environment variables if you prefer.

To set your secrets with Secret Manager:

```
cd dotnet/samples/Concepts

dotnet user-secrets init

dotnet user-secrets set "OpenAI:ServiceId" "gpt-3.5-turbo-instruct"
dotnet user-secrets set "OpenAI:ModelId" "gpt-3.5-turbo-instruct"
dotnet user-secrets set "OpenAI:ChatModelId" "gpt-4"
dotnet user-secrets set "OpenAI:ApiKey" "..."

...
```

### Option 2: Use Configuration File
1. Create an `appsettings.Development.json` file next to the `Concepts.csproj` file. This file is ignored by git, so its
content will not end up in pull requests and it is safe to use for personal settings. Keep the file safe.
2. Edit `appsettings.Development.json` and set the appropriate configuration for the samples you are running.

For example:

```json
{
"OpenAI": {
"ServiceId": "gpt-3.5-turbo-instruct",
"ModelId": "gpt-3.5-turbo-instruct",
"ChatModelId": "gpt-4",
"ApiKey": "sk-...."
},
"AzureOpenAI": {
"ServiceId": "azure-gpt-35-turbo-instruct",
"DeploymentName": "gpt-35-turbo-instruct",
"ChatDeploymentName": "gpt-4",
"Endpoint": "https://contoso.openai.azure.com/",
"ApiKey": "...."
},
// etc.
}
```

### Option 3: Use Environment Variables
You can also provide these settings through environment variables. Environment variables override the settings in the `appsettings.Development.json` file.

When setting environment variables, use a double underscore (i.e. "\_\_") to separate parent and child properties. For example:

- bash:

```bash
export OpenAI__ApiKey="sk-...."
export AzureOpenAI__ApiKey="...."
export AzureOpenAI__DeploymentName="gpt-35-turbo-instruct"
export AzureOpenAI__ChatDeploymentName="gpt-4"
export AzureOpenAIEmbeddings__DeploymentName="azure-text-embedding-ada-002"
export AzureOpenAI__Endpoint="https://contoso.openai.azure.com/"
export HuggingFace__ApiKey="...."
export Bing__ApiKey="...."
export Postgres__ConnectionString="...."
```

- PowerShell:

```powershell
$env:OpenAI__ApiKey = "sk-...."
$env:AzureOpenAI__ApiKey = "...."
$env:AzureOpenAI__DeploymentName = "gpt-35-turbo-instruct"
$env:AzureOpenAI__ChatDeploymentName = "gpt-4"
$env:AzureOpenAIEmbeddings__DeploymentName = "azure-text-embedding-ada-002"
$env:AzureOpenAI__Endpoint = "https://contoso.openai.azure.com/"
$env:HuggingFace__ApiKey = "...."
$env:Bing__ApiKey = "...."
$env:Postgres__ConnectionString = "...."
```
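
The double-underscore names map onto the same colon-separated keys used in the Secret Manager commands earlier in this README: `OpenAI__ApiKey` corresponds to `OpenAI:ApiKey`. As a rough illustration of that naming rule, the sketch below rewrites an environment variable name into its configuration key. The `to_config_key` helper is hypothetical, written only for this example, and is not part of the repository.

```bash
# Hypothetical helper for illustration only: convert an environment variable
# name such as Section__Key into the Section:Key form used by .NET configuration.
to_config_key() {
  printf '%s\n' "$1" | sed 's/__/:/g'
}

to_config_key "Postgres__ConnectionString"             # prints Postgres:ConnectionString
to_config_key "AzureOpenAIEmbeddings__DeploymentName"  # prints AzureOpenAIEmbeddings:DeploymentName
```

This can help when checking that a variable set in the shell corresponds to the key a sample expects to read.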
@@ -27,4 +27,9 @@
<ProjectReference Include="..\..\SemanticKernel.Core\SemanticKernel.Core.csproj" />
</ItemGroup>

<ItemGroup>
<InternalsVisibleTo Include="SemanticKernel.Connectors.Postgres.UnitTests" />
<InternalsVisibleTo Include="DynamicProxyGenAssembly2" />
</ItemGroup>

</Project>
@@ -11,6 +11,11 @@ namespace Microsoft.SemanticKernel.Connectors.Postgres;
/// <summary>
/// Interface for client managing postgres database operations.
/// </summary>
/// <remarks>
/// This interface is used with the PostgresMemoryStore, which is being deprecated.
/// Use the <see cref="IPostgresVectorStoreDbClient"/> interface with the PostgresVectorStore
/// and related classes instead.
/// </remarks>
public interface IPostgresDbClient
{
/// <summary>