Commit

fixed links
HeidiSteen committed Mar 4, 2024
1 parent e01d16c commit fcfd950
Showing 4 changed files with 251 additions and 6 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -12,10 +12,10 @@ Vector support consists of generally available features and preview features.
|---------|--------|
| [vector indexing](https://learn.microsoft.com/azure/search/vector-search-how-to-create-index) | generally available (2023-11-01 and stable SDK packages) |
| [vector queries](https://learn.microsoft.com/azure/search/vector-search-how-to-query) | generally available (2023-11-01 and stable SDK packages)|
-| [integrated data chunking (Text Split skill)](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit) | public preview (2023-10-01-preview and beta SDK packages) |
-| [integrated embedding (AzureOpenAIEmbedding skill)](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) | public preview (2023-10-01-preview and beta SDK packages) |
-| [index projections](https://learn.microsoft.com/azure/search/index-projections-concept-intro) in skillsets | public preview (2023-10-01-preview and beta SDK packages) |
-| [vectorizer](https://learn.microsoft.com/azure/search/vector-search-how-to-configure-vectorizer) in index schema | public preview (2023-10-01-preview and beta SDK packages) |
+| [integrated data chunking](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit) | public preview (2023-10-01-preview and beta SDK packages) |
+| [integrated embedding](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) | public preview (2023-10-01-preview and beta SDK packages) |
+| [index projections](https://learn.microsoft.com/azure/search/index-projections-concept-intro) | public preview (2023-10-01-preview and beta SDK packages) |
+| [vectorizer](https://learn.microsoft.com/azure/search/vector-search-how-to-configure-vectorizer) | public preview (2023-10-01-preview and beta SDK packages) |

Preview features are available under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

4 changes: 4 additions & 0 deletions demo-python/code/backup-restore/azure-search-backup-and-restore-requirements.txt
@@ -0,0 +1,4 @@
python-dotenv
azure-search-documents==11.6.0b1
azure-identity
tqdm
241 changes: 241 additions & 0 deletions demo-python/code/backup-restore/azure-search-backup-and-restore.ipynb
@@ -0,0 +1,241 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure AI Search backup and restore sample\n",
"\n",
"This notebook demonstrates how to back up a search index and restore it to another instance of Azure AI Search. The target instance can be a different tier and configuration, but make sure it has available storage and quota, and that the [region has the features you require](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?products=search).\n",
"\n",
"### Prerequisites\n",
"\n",
"+ The search index you're backing up must have a `key` field that is `filterable` and `sortable`. If your document key doesn't meet these criteria, you can create and populate a new key field and remove the `key=true` flag from the previous key field.\n",
"\n",
"+ Only fields marked as `retrievable` can be successfully backed up and restored. You can toggle `retrievable` between true and false on any field, but as of this writing, the Azure portal doesn't allow you to modify `retrievable` on vector fields. As a workaround, use an Azure SDK or Postman with an Update Index REST call, as sketched after these prerequisites.\n",
"\n",
"  Setting `retrievable` to true doesn't increase index size. Retrieval pulls from content that already exists in your index."
]
},
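{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of the `retrievable` workaround, using the Azure SDK for Python to update the index definition in place. The `endpoint` and `credential` variables and the \"my-index\" and \"contentVector\" names are hypothetical placeholders; substitute your own."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: make a field retrievable by updating the index definition.\n",
"# `endpoint`, `credential`, \"my-index\", and \"contentVector\" are hypothetical.\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"\n",
"index_client = SearchIndexClient(endpoint=endpoint, credential=credential)\n",
"index = index_client.get_index(name=\"my-index\")\n",
"for field in index.fields:\n",
"    if field.name == \"contentVector\":\n",
"        field.hidden = False  # hidden=False is the SDK's spelling of retrievable=true\n",
"index_client.create_or_update_index(index)"
]
},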
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up a Python virtual environment in Visual Studio Code\n",
"\n",
"1. Open the Command Palette (Ctrl+Shift+P).\n",
"1. Search for **Python: Create Environment**.\n",
"1. Select **Venv**.\n",
"1. Select a Python interpreter. Choose 3.10 or later.\n",
"\n",
"It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install -r azure-search-backup-and-restore-requirements.txt --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the .env file (copy `.env-sample` to `.env` and update it accordingly)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"from azure.identity import DefaultAzureCredential\n",
"from azure.core.credentials import AzureKeyCredential\n",
"import os\n",
"\n",
"load_dotenv(override=True) # take environment variables from .env.\n",
"\n",
"# Variables not used here do not need to be updated in your .env file\n",
"source_endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n",
"source_credential = AzureKeyCredential(os.environ[\"AZURE_SEARCH_ADMIN_KEY\"]) if len(os.environ[\"AZURE_SEARCH_ADMIN_KEY\"]) > 0 else DefaultAzureCredential()\n",
"source_index_name = os.environ[\"AZURE_SEARCH_INDEX\"]\n",
"# Default to same service for copying index\n",
"target_endpoint = os.environ[\"AZURE_TARGET_SEARCH_SERVICE_ENDPOINT\"] if len(os.environ[\"AZURE_TARGET_SEARCH_SERVICE_ENDPOINT\"]) > 0 else source_endpoint\n",
"target_credential = AzureKeyCredential(os.environ[\"AZURE_TARGET_SEARCH_ADMIN_KEY\"]) if len(os.environ[\"AZURE_TARGET_SEARCH_ADMIN_KEY\"]) > 0 else DefaultAzureCredential()\n",
"target_index_name = os.environ[\"AZURE_TARGET_SEARCH_INDEX\"] "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This script backs up an Azure AI Search index and restores it to another service. The `backup_and_restore_index` function retrieves the source index definition, creates the target index from that definition, reads all retrievable documents from the source (paging on the key field when it's filterable and sortable), and uploads them to the target index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"import tqdm\n",
"\n",
"def create_clients(endpoint, credential, index_name):\n",
"    search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)\n",
"    index_client = SearchIndexClient(endpoint=endpoint, credential=credential)\n",
"    return search_client, index_client\n",
"\n",
"def total_count(search_client):\n",
"    # top=0 returns the total count without returning any documents\n",
"    response = search_client.search(include_total_count=True, search_text=\"*\", top=0)\n",
"    return response.get_count()\n",
"\n",
"def search_results_with_filter(search_client, key_field_name):\n",
"    # Keyset pagination: read results ordered by the key field, then reissue the\n",
"    # query with a filter on key > last-seen-key until no documents remain\n",
"    last_item = None\n",
"    response = search_client.search(search_text=\"*\", top=100000, order_by=key_field_name).by_page()\n",
"    while True:\n",
"        for page in response:\n",
"            page = list(page)\n",
"            if len(page) > 0:\n",
"                last_item = page[-1]\n",
"                yield page\n",
"            else:\n",
"                last_item = None\n",
"\n",
"        if last_item:\n",
"            response = search_client.search(search_text=\"*\", top=100000, order_by=key_field_name, filter=f\"{key_field_name} gt '{last_item[key_field_name]}'\").by_page()\n",
"        else:\n",
"            break\n",
"\n",
"def search_results_without_filter(search_client):\n",
"    # Without a filterable, sortable key, only the first 100,000 documents are read\n",
"    response = search_client.search(search_text=\"*\", top=100000).by_page()\n",
"    for page in response:\n",
"        page = list(page)\n",
"        yield page\n",
"\n",
"def backup_and_restore_index(source_endpoint, source_credential, source_index_name, target_endpoint, target_credential, target_index_name):\n",
"    # Create search and index clients\n",
"    source_search_client, source_index_client = create_clients(source_endpoint, source_credential, source_index_name)\n",
"    target_search_client, target_index_client = create_clients(target_endpoint, target_credential, target_index_name)\n",
"\n",
"    # Get the source index definition and find its key and non-retrievable fields\n",
"    source_index = source_index_client.get_index(name=source_index_name)\n",
"    key_field = None\n",
"    non_retrievable_fields = []\n",
"    for field in source_index.fields:\n",
"        if field.hidden:\n",
"            non_retrievable_fields.append(field)\n",
"        if field.key:\n",
"            key_field = field\n",
"\n",
"    if not key_field:\n",
"        raise Exception(\"No key field found in the source index\")\n",
"\n",
"    if len(non_retrievable_fields) > 0:\n",
"        print(f\"WARNING: The following fields are not marked as retrievable and cannot be backed up and restored: {', '.join(f.name for f in non_retrievable_fields)}\")\n",
"\n",
"    # Create the target index with the same definition\n",
"    source_index.name = target_index_name\n",
"    target_index_client.create_or_update_index(source_index)\n",
"\n",
"    document_count = total_count(source_search_client)\n",
"    can_use_filter = key_field.sortable and key_field.filterable\n",
"    if not can_use_filter:\n",
"        print(\"WARNING: The key field is not filterable or not sortable. A maximum of 100,000 records can be backed up and restored.\")\n",
"    # Back up and restore documents\n",
"    all_documents = search_results_with_filter(source_search_client, key_field.name) if can_use_filter else search_results_without_filter(source_search_client)\n",
"\n",
"    print(\"Backing up and restoring documents:\")\n",
"    failed_documents = 0\n",
"    failed_keys = []\n",
"    with tqdm.tqdm(total=document_count) as progress_bar:\n",
"        for page in all_documents:\n",
"            result = target_search_client.upload_documents(documents=page)\n",
"            progress_bar.update(len(result))\n",
"\n",
"            for item in result:\n",
"                if not item.succeeded:\n",
"                    failed_documents += 1\n",
"                    failed_keys.append(item.key)\n",
"                    print(f\"Document upload error: {item.error_message}\")\n",
"\n",
"    if failed_documents > 0:\n",
"        print(f\"Failed documents: {failed_documents}\")\n",
"        print(f\"Failed document keys: {failed_keys}\")\n",
"    else:\n",
"        print(\"All documents uploaded successfully.\")\n",
"\n",
"    print(f\"Successfully backed up '{source_index_name}' and restored to '{target_index_name}'\")\n",
"    return source_search_client, target_search_client\n",
"\n",
"source_search_client, target_search_client = backup_and_restore_index(source_endpoint, source_credential, source_index_name, target_endpoint, target_credential, target_index_name)\n"
]
},
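{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Pages yielded by the SDK are normally small enough to upload in one call, but Azure AI Search caps a single indexing request at 1,000 documents. If you tune page sizes upward, a minimal batching helper like the following sketch keeps each `upload_documents` call within the limit. The `upload_in_batches` name is hypothetical and isn't part of the sample above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch (hypothetical helper): split a page into batches of at most\n",
"# 1,000 documents so each upload_documents call stays under the service limit.\n",
"def upload_in_batches(search_client, documents, batch_size=1000):\n",
"    results = []\n",
"    for i in range(0, len(documents), batch_size):\n",
"        batch = documents[i:i + batch_size]\n",
"        results.extend(search_client.upload_documents(documents=batch))\n",
"    return results"
]
},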
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The `verify_counts` function compares document counts between the source and target indexes after the backup and restore, and prints whether the counts match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def verify_counts(source_search_client, target_search_client):\n",
"    source_document_count = source_search_client.get_document_count()\n",
"    target_document_count = target_search_client.get_document_count()\n",
"\n",
"    print(f\"Source document count: {source_document_count}\")\n",
"    print(f\"Target document count: {target_document_count}\")\n",
"\n",
"    if source_document_count == target_document_count:\n",
"        print(\"Document counts match.\")\n",
"    else:\n",
"        print(\"Document counts do not match.\")\n",
"\n",
"# Call verify_counts with the search clients returned by backup_and_restore_index\n",
"verify_counts(source_search_client, target_search_client)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
4 changes: 2 additions & 2 deletions demo-python/readme.md
@@ -10,16 +10,16 @@ Once you understand the basics, continue with the following notebooks for more e

| Sample | Description |
|--------|-------------|
+| [backup-restore](./code/backup-restore/azure-search-backup-and-restore.ipynb) | Back up retrievable index fields and restore them to a new index on a different search service. |
| [basic-vector-workflow](./code/basic-vector-workflow/azure-search-vector-python-sample.ipynb) | Basic vector indexing and queries using push model APIs. **Start here**. |
| [community-integration/hugging-face](./code/community-integration/hugging-face/azure-search-vector-python-huggingface-model-sample.ipynb) | Vectorize using the Hugging Face [E5-small-V2](https://huggingface.co/intfloat/e5-small-v2) embedding model. |
| [community-integration/langchain](./code/community-integration/langchain/azure-search-vector-python-langchain-sample.ipynb) | LangChain integration using the [Azure AI Search vector store integration module](https://python.langchain.com/docs/integrations/vectorstores/azuresearch). |
| [community-integration/llamaindex](./code/community-integration/llamaindex/azure-search-vector-python-llamaindex-sample.ipynb) | LlamaIndex integration. |
| [custom-vectorizer](./code/custom-vectorizer/azure-search-custom-vectorization-sample.ipynb) | Use an open source embedding model such as Hugging Face sentence-transformers [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) to vectorize content and queries. This sample uses azd and bicep to deploy Azure resources for a fully operational solution. It uses a custom skill with a function app that calls an embedding model. |
| [data-chunking](./code/data-chunking) | Examples used in the [Chunk documents](https://learn.microsoft.com/azure/search/vector-search-how-to-chunk-documents) article on the documentation website. |
-| [index-backup-restore](./code/index-backup-restore/azure-search-backup-and-restore.ipynb) | Backup retrievable index fields and restore them on a new index on a different search service. |
| [integrated-vectorization](./code/integrated-vectorization/azure-search-integrated-vectorization-sample.ipynb) | Demonstrates integrated data chunking and vectorization (preview) using skills to split text and call an Azure OpenAI embedding model. |
| [multimodal](./code/multimodal/azure-search-vector-image-index-creation-python-sample.ipynb) | Vectorize images using [Azure AI Vision multimodal embedding](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval). In contrast with the multimodal-custom-skill example, this notebook uses the push API (no indexers or skillsets) for indexing. It calls the embedding model directly for a pure image vector search. |
-| [multimodal-custom-skill](./code/multimodal-custom-skill/azure-search-custom-vectorization-sample.ipynb) | End-to-end text-to-image sample that creates and calls a custom embedding model using a custom skill. Includes source code for an Azure function that calls the [Azure AI Vision Image Retrieval REST API](https://learn.microsoft.com/rest/api/computervision/image-retrieval) for text-to-image vectorization. Includes an azure-search-vector-image notebook for all steps, from deployment to queries. |
+| [multimodal-custom-skill](./code/multimodal-custom-skill/azure-search-vector-image-python-sample.ipynb) | End-to-end text-to-image sample that creates and calls a custom embedding model using a custom skill. Includes source code for an Azure function that calls the [Azure AI Vision Image Retrieval REST API](https://learn.microsoft.com/rest/api/computervision/image-retrieval) for text-to-image vectorization. Includes an azure-search-vector-image notebook for all steps, from deployment to queries. |

## Prerequisites
