Python: Azure Cosmos DB NoSQL Vector Store & Collection implementation #9296

Open
wants to merge 24 commits into
base: main
from

Conversation

TaoChenOSU
Contributor

@TaoChenOSU TaoChenOSU commented Oct 16, 2024

Motivation and Context

We are implementing the Azure Cosmos DB NoSQL vector store and vector collection.

Description

Azure Cosmos DB NoSQL vector store & collection implementation.

Contribution Checklist

@TaoChenOSU TaoChenOSU added the "PR: in progress" (Under development and/or addressing feedback) and "python" (Pull requests for the Python Semantic Kernel) labels Oct 16, 2024
@TaoChenOSU TaoChenOSU self-assigned this Oct 16, 2024
@TaoChenOSU TaoChenOSU requested a review from a team as a code owner October 16, 2024 22:30
@github-actions github-actions bot changed the title Azure Cosmos DB NoSQL Vector Store & Collection implementation Python: Azure Cosmos DB NoSQL Vector Store & Collection implementation Oct 16, 2024
@TaoChenOSU TaoChenOSU linked an issue Oct 16, 2024 that may be closed by this pull request
8 tasks
@markwallace-microsoft
Member

markwallace-microsoft commented Oct 16, 2024

Python Test Coverage

Python Test Coverage Report
File | Stmts | Miss | Cover | Missing
semantic_kernel
   kernel.py | 199 | 47 | 76% | 148, 159, 163, 313–316, 423, 437–480
semantic_kernel/agents/group_chat
   agent_chat.py | 124 | 2 | 98% | 78, 171
   agent_group_chat.py | 100 | 2 | 98% | 151, 201
   broadcast_queue.py | 72 | 1 | 99% | 35
semantic_kernel/agents/open_ai
   assistant_content_generation.py | 141 | 9 | 94% | 97–98, 329–337, 379, 381
   azure_assistant_agent.py | 107 | 2 | 98% | 284, 304
   open_ai_assistant_agent.py | 105 | 2 | 98% | 252, 272
   open_ai_assistant_base.py | 467 | 8 | 98% | 260, 338–339, 747, 868, 871, 945, 1007
semantic_kernel/connectors/ai
   audio_to_text_client_base.py | 9 | 1 | 89% | 51
   chat_completion_client_base.py | 116 | 2 | 98% | 382, 392
   completion_usage.py | 8 | 1 | 88% | 17
semantic_kernel/connectors/ai/anthropic/services
   anthropic_chat_completion.py | 176 | 5 | 97% | 147, 165, 169, 223, 419
semantic_kernel/connectors/ai/azure_ai_inference/services
   azure_ai_inference_chat_completion.py | 119 | 7 | 94% | 120, 146–149, 159, 180, 202
   azure_ai_inference_text_embedding.py | 41 | 1 | 98% | 86
semantic_kernel/connectors/ai/bedrock/services
   bedrock_chat_completion.py | 136 | 14 | 90% | 117, 138, 163, 167–170, 228, 246–265, 324
   bedrock_text_completion.py | 57 | 2 | 96% | 95, 118
   bedrock_text_embedding.py | 45 | 1 | 98% | 94
semantic_kernel/connectors/ai/bedrock/services/model_provider
   bedrock_ai21_labs.py | 13 | 1 | 92% | 67
   bedrock_anthropic_claude.py | 12 | 1 | 92% | 54
   bedrock_cohere.py | 20 | 1 | 95% | 75
   utils.py | 80 | 20 | 75% | 68, 71, 102, 106–115, 132–150, 171–174
semantic_kernel/connectors/ai/embeddings
   embedding_generator_base.py | 8 | 1 | 88% | 50
semantic_kernel/connectors/ai/google/google_ai/services
   google_ai_chat_completion.py | 119 | 4 | 97% | 126, 152, 175, 177
   google_ai_text_completion.py | 63 | 2 | 97% | 98, 121
   utils.py | 65 | 3 | 95% | 139, 159–164
semantic_kernel/connectors/ai/google/vertex_ai/services
   utils.py | 66 | 3 | 95% | 140, 160–165
   vertex_ai_chat_completion.py | 119 | 4 | 97% | 121, 147, 170, 172
   vertex_ai_text_completion.py | 62 | 2 | 97% | 95, 116
semantic_kernel/connectors/ai/hugging_face/services
   hf_text_completion.py | 60 | 3 | 95% | 103, 112, 127
   hf_text_embedding.py | 32 | 5 | 84% | 79–83
semantic_kernel/connectors/ai/mistral_ai/services
   mistral_ai_chat_completion.py | 118 | 7 | 94% | 118–121, 307–310
semantic_kernel/connectors/ai/ollama/services
   ollama_chat_completion.py | 107 | 11 | 90% | 114, 139, 143–144, 154, 186, 223, 233–234, 256, 283
   ollama_text_completion.py | 57 | 3 | 95% | 93, 103, 130
   utils.py | 46 | 25 | 46% | 29, 44–52, 64–86, 98–102, 119–122
semantic_kernel/connectors/ai/onnx
   utils.py | 53 | 3 | 94% | 50–51, 112
semantic_kernel/connectors/ai/onnx/services
   onnx_gen_ai_chat_completion.py | 72 | 7 | 90% | 67–68, 98, 122, 167, 173, 179
   onnx_gen_ai_completion_base.py | 58 | 21 | 64% | 59–71, 79–90
   onnx_gen_ai_text_completion.py | 46 | 5 | 89% | 54–55, 87, 117, 133
semantic_kernel/connectors/ai/open_ai/prompt_execution_settings
   open_ai_prompt_execution_settings.py | 95 | 1 | 99% | 113
   open_ai_text_to_image_execution_settings.py | 36 | 1 | 97% | 60
semantic_kernel/connectors/ai/open_ai/services
   azure_chat_completion.py | 103 | 3 | 97% | 140, 149, 152
   open_ai_audio_to_text_base.py | 27 | 3 | 89% | 38–39, 44
   open_ai_chat_completion_base.py | 127 | 5 | 96% | 71, 121, 141, 177, 287
   open_ai_handler.py | 98 | 6 | 94% | 133, 141–142, 154, 163–164
   open_ai_text_completion_base.py | 80 | 2 | 98% | 56, 161
semantic_kernel/connectors/ai/open_ai/settings
   azure_open_ai_settings.py | 23 | 4 | 83% | 104–107
semantic_kernel/connectors/memory/azure_ai_search
   azure_ai_search_collection.py | 134 | 33 | 75% | 171, 173, 246–284, 294–304, 308, 312, 317–320
   azure_ai_search_store.py | 42 | 2 | 95% | 130–131
   utils.py | 66 | 2 | 97% | 127, 129
semantic_kernel/connectors/memory/azure_cosmos_db
   azure_cosmos_db_no_sql_base.py | 46 | 9 | 80% | 75, 78–79, 85, 95–99
   azure_cosmos_db_no_sql_collection.py | 101 | 25 | 75% | 98–103, 114–119, 123–130, 154, 187–200, 211–212
   azure_cosmos_db_no_sql_store.py | 30 | 6 | 80% | 82–87
   utils.py | 58 | 4 | 93% | 139, 153–156
semantic_kernel/connectors/memory/in_memory
   in_memory_collection.py | 142 | 16 | 89% | 65, 117, 119, 139, 149, 174, 189, 201, 219–222, 226, 228, 230–231
semantic_kernel/connectors/memory/qdrant
   qdrant_collection.py | 95 | 2 | 98% | 262–263
   qdrant_store.py | 48 | 2 | 96% | 139–140
semantic_kernel/connectors/memory/redis
   redis_collection.py | 163 | 4 | 98% | 148, 153–154, 324
   redis_store.py | 42 | 2 | 95% | 108–109
   utils.py | 45 | 11 | 76% | 145–146, 164, 166, 173–188
semantic_kernel/connectors/memory/weaviate
   utils.py | 61 | 4 | 93% | 85–90, 253
   weaviate_collection.py | 130 | 32 | 75% | 149–158, 162–182, 186–191, 275–280, 286
   weaviate_store.py | 59 | 17 | 71% | 110–118, 122–127, 132–137, 142–143
semantic_kernel/connectors/openapi_plugin
   openapi_manager.py | 58 | 2 | 97% | 110–111
   openapi_parser.py | 88 | 2 | 98% | 71, 128
   openapi_runner.py | 105 | 2 | 98% | 181–182
semantic_kernel/connectors/openapi_plugin/models
   rest_api_operation.py | 129 | 1 | 99% | 242
semantic_kernel/contents
   audio_content.py | 18 | 1 | 94% | 53
   function_call_content.py | 100 | 2 | 98% | 185, 213
   streaming_chat_message_content.py | 68 | 1 | 99% | 210
   streaming_content_mixin.py | 39 | 2 | 95% | 37, 64
semantic_kernel/core_plugins/sessions_python_tool
   sessions_python_plugin.py | 134 | 8 | 94% | 69, 82–91, 99
   sessions_python_settings.py | 39 | 4 | 90% | 84–87
semantic_kernel/data
   search_filter.py | 25 | 1 | 96% | 7
semantic_kernel/data/record_definition
   vector_store_record_utils.py | 28 | 2 | 93% | 55, 57
semantic_kernel/data/text_search
   text_search.py | 72 | 4 | 94% | 125, 165, 205, 293
   utils.py | 33 | 7 | 79% | 23, 54–60, 69–70
   vector_store_text_search.py | 76 | 17 | 78% | 167–174, 180–187, 192
semantic_kernel/data/vector_search
   vector_search_filter.py | 20 | 1 | 95% | 6
   vector_text_search.py | 16 | 1 | 94% | 45
   vectorizable_text_search.py | 15 | 1 | 93% | 50
   vectorized_search.py | 15 | 1 | 93% | 45
semantic_kernel/data/vector_storage
   vector_store.py | 16 | 2 | 88% | 41, 51
   vector_store_record_collection.py | 248 | 20 | 92% | 65, 431, 491–495, 503–507, 547–550, 557–560
semantic_kernel/functions
   kernel_function_decorator.py | 98 | 1 | 99% | 102
   kernel_function_from_method.py | 96 | 1 | 99% | 153
   kernel_function_from_prompt.py | 154 | 7 | 95% | 165–166, 180, 201, 219, 239, 322
   kernel_function_log_messages.py | 36 | 6 | 83% | 37–43
   kernel_plugin.py | 199 | 5 | 97% | 468, 471, 500, 521, 546
semantic_kernel/planners
   plan.py | 234 | 45 | 81% | 54, 163–165, 197, 214–227, 264, 269, 277–278, 288–291, 308, 313, 329, 332–337, 355, 360, 363, 365, 372, 386–388, 393–397
semantic_kernel/planners/function_calling_stepwise_planner
   function_calling_stepwise_planner.py | 116 | 4 | 97% | 145, 189–190, 198
semantic_kernel/planners/sequential_planner
   sequential_planner.py | 64 | 6 | 91% | 71, 75, 109, 125, 134–135
   sequential_planner_extensions.py | 50 | 9 | 82% | 31–32, 56, 110–124
   sequential_planner_parser.py | 77 | 12 | 84% | 66–74, 93, 117–120
semantic_kernel/processes
   process_builder.py | 68 | 39 | 43% | 43–52, 56–58, 64–74, 78–85, 89–92, 96–100, 105, 109–114
   process_end_step.py | 19 | 2 | 89% | 37, 41
   process_function_target_builder.py | 25 | 3 | 88% | 37–40
   process_step_builder.py | 105 | 24 | 77% | 44, 89, 103, 110–123, 135–142, 151, 160–169, 178, 192, 209
   process_step_edge_builder.py | 35 | 3 | 91% | 43, 58, 68
   process_types.py | 25 | 1 | 96% | 35
semantic_kernel/processes/kernel_process
   kernel_process_step_context.py | 17 | 1 | 94% | 37
semantic_kernel/processes/local_runtime
   local_kernel_process.py | 20 | 2 | 90% | 23, 30
   local_kernel_process_context.py | 32 | 2 | 94% | 66–67
   local_process.py | 134 | 52 | 61% | 92, 102, 120–130, 163–190, 194–199, 203, 207–213, 217–227, 231–232
   local_step.py | 178 | 114 | 36% | 61, 72, 81–169, 173, 177, 181–182, 187–249, 253–270, 274–277, 281–284, 288–297, 303–306, 310–312
semantic_kernel/prompt_template
   kernel_prompt_template.py | 78 | 7 | 91% | 144–151
semantic_kernel/schema
   kernel_json_schema_builder.py | 131 | 9 | 93% | 54, 93, 189, 197, 208, 216, 231, 235–236
semantic_kernel/services
   ai_service_client_base.py | 22 | 1 | 95% | 64
semantic_kernel/template_engine/blocks
   code_block.py | 77 | 1 | 99% | 119
   named_arg_block.py | 43 | 1 | 98% | 98
semantic_kernel/utils/authentication
   entra_id_authentication.py | 15 | 2 | 87% | 26, 38
semantic_kernel/utils/telemetry
   user_agent.py | 16 | 2 | 88% | 18–19
semantic_kernel/utils/telemetry/model_diagnostics
   decorators.py | 171 | 4 | 98% | 364–367
TOTAL | 14549 | 883 | 94% |

Python Unit Test Overview

Tests Skipped Failures Errors Time
2845 4 💤 0 ❌ 0 🔥 1m 7s ⏱️

@TaoChenOSU TaoChenOSU removed the "PR: in progress" (Under development and/or addressing feedback) label Oct 28, 2024
Member

@eavanvalkenburg eavanvalkenburg left a comment

couple of small comments

create_database (bool): If True, the database will be created if it does not exist.
Defaults to False.
"""
super().__init__(
Member

I've introduced a concept called managed_client in VectorStore and VectorStoreRecordCollection. Together with a context manager, it can be used to make sure we clean up what we create, but not what someone else created, so have a look at Azure AI Search for the mechanics!

Member

Azure AI Search actually uses two clients (one for the service, one for an index), so it introduces an additional one, which can be applied to the database creation.
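
For illustration, the managed-client idea described in the comments above might be sketched roughly as follows. This is not the Semantic Kernel or Azure AI Search implementation: `FakeClient` and `Store` are hypothetical stand-ins, and only the `managed_client` flag name comes from the review comment. The assumption is that an object which creates its own client closes it on exit, while a client passed in by the caller is left alone.

```python
import asyncio
from typing import Optional, Tuple


class FakeClient:
    """Hypothetical stand-in for a database client; it only records close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class Store:
    """Hypothetical store that cleans up only the client it created itself."""

    def __init__(self, client: Optional[FakeClient] = None) -> None:
        # managed_client is True only when the store creates the client.
        self.managed_client = client is None
        self.client = client if client is not None else FakeClient()

    async def __aenter__(self) -> "Store":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Close what we created, but not what someone else created.
        if self.managed_client:
            await self.client.close()


async def demo() -> Tuple[bool, bool]:
    # Case 1: the store creates (and therefore owns) its client.
    async with Store() as owned:
        pass
    # Case 2: the caller supplies the client and keeps responsibility for it.
    external = FakeClient()
    async with Store(client=external):
        pass
    return owned.client.closed, external.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The same flag could in principle be kept per resource (for example, one for the client and one for a database the store created), which is how the two-client remark above might carry over to database creation.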

if isinstance(field, VectorStoreRecordVectorField):
vector_embedding_policy["vectorEmbeddings"].append({
"path": f'/"{field.name}"',
"dataType": "float32",
Member

A vector field can also have int as the data type, which would map to int8: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/vector-search#vector-indexing-policies
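
As a rough sketch of that suggestion, the hard-coded "float32" could be replaced by a lookup on the field's property type. The helper name `vector_embedding_entry`, the mapping table, and the "float"/"int" keys are assumptions made for illustration; only the `path` and `dataType` keys and the float32/int8 values come from the diff and the comment above, and the other keys of a real embedding entry are omitted.

```python
from typing import Dict, Optional

# Assumed mapping from a field's property type to a Cosmos DB vector data type.
PROPERTY_TYPE_TO_DATA_TYPE: Dict[str, str] = {
    "float": "float32",
    "int": "int8",
}


def vector_embedding_entry(field_name: str, property_type: Optional[str] = None) -> Dict[str, str]:
    """Build the path/dataType part of one vector embedding entry (hypothetical helper)."""
    # Fall back to float when the field does not declare a property type.
    key = property_type or "float"
    if key not in PROPERTY_TYPE_TO_DATA_TYPE:
        raise ValueError(f"Unsupported vector property type: {key!r}")
    return {
        "path": f'/"{field_name}"',
        "dataType": PROPERTY_TYPE_TO_DATA_TYPE[key],
    }
```

Raising on an unknown type, rather than silently falling back to float32, is one possible design choice here; it surfaces a mismatched record definition early.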

Labels
memory python Pull requests for the Python Semantic Kernel
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Python: Updated Cosmos Db Memory Connector
5 participants