commit fcfd950 (1 parent: e01d16c)
Showing 4 changed files with 251 additions and 6 deletions.
4 changes: 4 additions & 0 deletions
demo-python/code/index-backup-restore/azure-search-backup-and-restore-requirements.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
python-dotenv | ||
azure-search-documents==11.6.0b1 | ||
azure-identity | ||
tqdm |
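The notebook in this commit reads its connection settings from environment variables and defaults the target endpoint to the source service when none is given. A minimal sketch of that fallback, assuming the variable names used in the notebook; `resolve_settings` is a hypothetical helper (not part of the commit) that takes a plain mapping so it can be tried without a real `.env` file, and the endpoint values are placeholders:

```python
import os


def resolve_settings(env=None):
    """Hypothetical helper: resolve source/target settings from a mapping of env vars."""
    env = os.environ if env is None else env
    # Required settings raise KeyError when missing.
    source_endpoint = env["AZURE_SEARCH_SERVICE_ENDPOINT"]
    # Optional setting: an unset or empty value falls back to the source service.
    target_endpoint = env.get("AZURE_TARGET_SEARCH_SERVICE_ENDPOINT") or source_endpoint
    return {
        "source_endpoint": source_endpoint,
        "source_index": env["AZURE_SEARCH_INDEX"],
        "target_endpoint": target_endpoint,
        "target_index": env["AZURE_TARGET_SEARCH_INDEX"],
        # True means key-based auth; False means a token credential would be used.
        "source_uses_key": bool(env.get("AZURE_SEARCH_ADMIN_KEY")),
        "target_uses_key": bool(env.get("AZURE_TARGET_SEARCH_ADMIN_KEY")),
    }


# Placeholder values for illustration only.
example_settings = resolve_settings({
    "AZURE_SEARCH_SERVICE_ENDPOINT": "https://source.example.net",
    "AZURE_SEARCH_INDEX": "my-index",
    "AZURE_TARGET_SEARCH_INDEX": "my-index-copy",
    "AZURE_TARGET_SEARCH_SERVICE_ENDPOINT": "",
})
print(example_settings["target_endpoint"])
```

Using `.get(...)` for the optional variables means an unset variable behaves the same as an empty one, which matches the intent of copying within the same service by default.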
241 changes: 241 additions & 0 deletions
demo-python/code/index-backup-restore/azure-search-backup-and-restore.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,241 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Azure AI Search backup and restore sample\n", | ||
"\n", | ||
"This notebook demonstrates how to back up and restore a search index and migrate it to another instance of Azure AI Search. The target instance can be a different tier and configuration, but make sure it has available storage and quota, and that the [region has the features you require](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?products=search).\n", | ||
"\n", | ||
"### Prerequisites\n", | ||
"\n", | ||
"+ The search index you're backing up must have a `key` field that is `filterable` and `sortable`. If your document key doesn't meet these criteria, you can create and populate a new key field, and remove the `key=true` flag from the previous key field.\n", | ||
"\n", | ||
"+ Only fields marked as `retrievable` can be successfully backed up and restored. You can toggle `retrievable` between true and false on any field, but as of this writing, the Azure portal doesn't allow you to modify `retrievable` on vector fields. As a workaround, use an Azure SDK or Postman with an Update Index REST call.\n", | ||
"\n", | ||
"  Setting `retrievable` to true doesn't increase index size. A `retrievable` action pulls from content that already exists in your index." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Set up a Python virtual environment in Visual Studio Code\n", | ||
"\n", | ||
"1. Open the Command Palette (Ctrl+Shift+P).\n", | ||
"1. Search for **Python: Create Environment**.\n", | ||
"1. Select **Venv**.\n", | ||
"1. Select a Python interpreter. Choose 3.10 or later.\n", | ||
"\n", | ||
"It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Install packages" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"! pip install -r azure-search-backup-and-restore-requirements.txt --quiet" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Load .env file (Copy .env-sample to .env and update accordingly)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from dotenv import load_dotenv\n", | ||
"from azure.identity import DefaultAzureCredential\n", | ||
"from azure.core.credentials import AzureKeyCredential\n", | ||
"import os\n", | ||
"\n", | ||
"load_dotenv(override=True) # take environment variables from .env.\n", | ||
"\n", | ||
"# Variables not used here do not need to be updated in your .env file\n", | ||
"# Optional variables are read with .get() so an unset value falls back instead of raising KeyError\n", | ||
"source_endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n", | ||
"source_credential = AzureKeyCredential(os.environ[\"AZURE_SEARCH_ADMIN_KEY\"]) if os.environ.get(\"AZURE_SEARCH_ADMIN_KEY\") else DefaultAzureCredential()\n", | ||
"source_index_name = os.environ[\"AZURE_SEARCH_INDEX\"]\n", | ||
"# Default to same service for copying index\n", | ||
"target_endpoint = os.environ.get(\"AZURE_TARGET_SEARCH_SERVICE_ENDPOINT\") or source_endpoint\n", | ||
"target_credential = AzureKeyCredential(os.environ[\"AZURE_TARGET_SEARCH_ADMIN_KEY\"]) if os.environ.get(\"AZURE_TARGET_SEARCH_ADMIN_KEY\") else DefaultAzureCredential()\n", | ||
"target_index_name = os.environ[\"AZURE_TARGET_SEARCH_INDEX\"]" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This script demonstrates backing up and restoring an Azure AI Search index between two services. The `backup_and_restore_index` function retrieves the source index definition, creates a new target index, backs up all documents, and restores them to the target index." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from azure.search.documents import SearchClient\n", | ||
"from azure.search.documents.indexes import SearchIndexClient\n", | ||
"import tqdm\n", | ||
"\n", | ||
"def create_clients(endpoint, credential, index_name):\n", | ||
"    search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)\n", | ||
"    index_client = SearchIndexClient(endpoint=endpoint, credential=credential)\n", | ||
"    return search_client, index_client\n", | ||
"\n", | ||
"def total_count(search_client):\n", | ||
"    response = search_client.search(include_total_count=True, search_text=\"*\", top=0)\n", | ||
"    return response.get_count()\n", | ||
"\n", | ||
"def search_results_with_filter(search_client, key_field_name):\n", | ||
"    # Page through the index in key order, resuming each query after the last key seen\n", | ||
"    response = search_client.search(search_text=\"*\", top=100000, order_by=key_field_name).by_page()\n", | ||
"    while True:\n", | ||
"        # Reset per query so a query that yields nothing ends the loop\n", | ||
"        last_item = None\n", | ||
"        for page in response:\n", | ||
"            page = list(page)\n", | ||
"            if len(page) > 0:\n", | ||
"                last_item = page[-1]\n", | ||
"                yield page\n", | ||
"            else:\n", | ||
"                last_item = None\n", | ||
"\n", | ||
"        if last_item:\n", | ||
"            # Double any single quotes so the key is a valid OData string literal\n", | ||
"            last_key = str(last_item[key_field_name]).replace(\"'\", \"''\")\n", | ||
"            response = search_client.search(search_text=\"*\", top=100000, order_by=key_field_name, filter=f\"{key_field_name} gt '{last_key}'\").by_page()\n", | ||
"        else:\n", | ||
"            break\n", | ||
"\n", | ||
"def search_results_without_filter(search_client):\n", | ||
"    response = search_client.search(search_text=\"*\", top=100000).by_page()\n", | ||
"    for page in response:\n", | ||
"        page = list(page)\n", | ||
"        yield page\n", | ||
"\n", | ||
"def backup_and_restore_index(source_endpoint, source_key, source_index_name, target_endpoint, target_key, target_index_name):\n", | ||
"    # Create search and index clients\n", | ||
"    source_search_client, source_index_client = create_clients(source_endpoint, source_key, source_index_name)\n", | ||
"    target_search_client, target_index_client = create_clients(target_endpoint, target_key, target_index_name)\n", | ||
"\n", | ||
"    # Get the source index definition\n", | ||
"    source_index = source_index_client.get_index(name=source_index_name)\n", | ||
"    non_retrievable_fields = []\n", | ||
"    key_field = None  # stays None if the index has no key field\n", | ||
"    for field in source_index.fields:\n", | ||
"        if field.hidden:\n", | ||
"            non_retrievable_fields.append(field)\n", | ||
"        if field.key:\n", | ||
"            key_field = field\n", | ||
"\n", | ||
"    if not key_field:\n", | ||
"        raise Exception(\"Key Field Not Found\")\n", | ||
"\n", | ||
"    if len(non_retrievable_fields) > 0:\n", | ||
"        print(f\"WARNING: The following fields are not marked as retrievable and cannot be backed up and restored: {', '.join(f.name for f in non_retrievable_fields)}\")\n", | ||
"\n", | ||
"    # Create target index with the same definition\n", | ||
"    source_index.name = target_index_name\n", | ||
"    target_index_client.create_or_update_index(source_index)\n", | ||
"\n", | ||
"    document_count = total_count(source_search_client)\n", | ||
"    can_use_filter = key_field.sortable and key_field.filterable\n", | ||
"    if not can_use_filter:\n", | ||
"        print(\"WARNING: The key field is not filterable or not sortable. A maximum of 100,000 records can be backed up and restored.\")\n", | ||
"    # Backup and restore documents\n", | ||
"    all_documents = search_results_with_filter(source_search_client, key_field.name) if can_use_filter else search_results_without_filter(source_search_client)\n", | ||
"\n", | ||
"    print(\"Backing up and restoring documents:\")\n", | ||
"    failed_documents = 0\n", | ||
"    failed_keys = []\n", | ||
"    with tqdm.tqdm(total=document_count) as progress_bar:\n", | ||
"        for page in all_documents:\n", | ||
"            result = target_search_client.upload_documents(documents=page)\n", | ||
"            progress_bar.update(len(result))\n", | ||
"\n", | ||
"            # Each IndexingResult carries the document key and any error message\n", | ||
"            for item in result:\n", | ||
"                if not item.succeeded:\n", | ||
"                    failed_documents += 1\n", | ||
"                    failed_keys.append(item.key)\n", | ||
"                    print(f\"Document upload error: {item.error_message}\")\n", | ||
"\n", | ||
"    if failed_documents > 0:\n", | ||
"        print(f\"Failed documents: {failed_documents}\")\n", | ||
"        print(f\"Failed document keys: {failed_keys}\")\n", | ||
"    else:\n", | ||
"        print(\"All documents uploaded successfully.\")\n", | ||
"\n", | ||
"    print(f\"Successfully backed up '{source_index_name}' and restored to '{target_index_name}'\")\n", | ||
"    return source_search_client, target_search_client, all_documents\n", | ||
"\n", | ||
"source_search_client, target_search_client, all_documents = backup_and_restore_index(source_endpoint, source_credential, source_index_name, target_endpoint, target_credential, target_index_name)\n" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The `verify_counts` function compares the document counts of the source and target indexes after backup and restore, and prints whether they match." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def verify_counts(source_search_client, target_search_client):\n", | ||
"    source_document_count = source_search_client.get_document_count()\n", | ||
"    target_document_count = target_search_client.get_document_count()\n", | ||
"\n", | ||
"    print(f\"Source document count: {source_document_count}\")\n", | ||
"    print(f\"Target document count: {target_document_count}\")\n", | ||
"\n", | ||
"    if source_document_count == target_document_count:\n", | ||
"        print(\"Document counts match.\")\n", | ||
"    else:\n", | ||
"        print(\"Document counts do not match.\")\n", | ||
"\n", | ||
"# Call the verify_counts function with the search_clients returned by the backup_and_restore_index function\n", | ||
"verify_counts(source_search_client, target_search_client)\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.3" | ||
}, | ||
"orig_nbformat": 4 | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
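The `search_results_with_filter` function in the notebook above gets past the 100,000-result limit by sorting on the key field and restarting each query after the last key it saw. A minimal sketch of that resume-after-last-key loop, run against an in-memory list rather than the Azure SDK; `escape_odata_string`, `build_resume_filter`, `fake_search`, and `iterate_all` are hypothetical names introduced here for illustration, and Python string ordering is assumed to stand in for the service's sort order:

```python
def escape_odata_string(value):
    """Double single quotes so the value is safe inside an OData string literal."""
    return str(value).replace("'", "''")


def build_resume_filter(key_field_name, last_key):
    """Build the filter that restarts a sorted query after the last key seen."""
    return f"{key_field_name} gt '{escape_odata_string(last_key)}'"


def fake_search(docs, key_field_name, after_key=None, top=3):
    """In-memory stand-in for one sorted, filtered query (not the Azure SDK)."""
    ordered = sorted(docs, key=lambda d: d[key_field_name])
    if after_key is not None:
        ordered = [d for d in ordered if d[key_field_name] > after_key]
    return ordered[:top]


def iterate_all(docs, key_field_name, top=3):
    """Yield batches until a query comes back empty."""
    last_key = None
    while True:
        batch = fake_search(docs, key_field_name, after_key=last_key, top=top)
        if not batch:
            break
        yield batch
        # Remember where this batch ended so the next query resumes after it.
        last_key = batch[-1][key_field_name]


docs = [{"id": k} for k in ["a", "b", "c", "d", "e", "f", "g"]]
batches = list(iterate_all(docs, "id"))
print([len(b) for b in batches])             # [3, 3, 1]
print(build_resume_filter("id", "o'brien"))  # id gt 'o''brien'
```

The loop only works when the key field can be both sorted and compared in a filter, which is why the notebook requires the key to be `sortable` and `filterable`. Escaping the key matters because a key containing a single quote would otherwise produce an invalid filter expression.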