Fine-Tuning BGE Embeddings Using Synthetic Data from Amazon Bedrock

Getting Started

This guide assumes you have already set up a SageMaker Studio environment as described in the blog post. Follow these steps to prepare the project environment and run the code:

Clone the repository:

```
git clone https://github.com/aws-samples/finetune-bge-embeddings-blog.git
cd finetune-bge-embeddings-blog
```

Create the Conda environment:
```
conda env create -f environment.yml
```
This step may take several minutes to complete.

Activate the environment:

```
conda init
source ~/.bashrc
conda activate ft-embedding-blog
```
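Before moving on, you can confirm that the activation step took effect in the current shell. This is a minimal sketch that relies on the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets to the name of the active environment; `active_env` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print the active Conda environment name, or "none".
# `conda activate` sets CONDA_DEFAULT_ENV; it is unset outside any environment.
active_env() {
  echo "${CONDA_DEFAULT_ENV:-none}"
}

if [ "$(active_env)" = "ft-embedding-blog" ]; then
  echo "ft-embedding-blog is active"
else
  echo "Active environment is '$(active_env)'; run: conda activate ft-embedding-blog"
fi
```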

Add the new Conda environment to Jupyter:

```
python -m ipykernel install --user --name=ft-embedding-blog
```
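If the kernel later fails to appear in the notebook, you can check from a terminal whether the `ipykernel install` step registered it. This sketch assumes the usual output of `jupyter kernelspec list`: an "Available kernels:" header followed by one name/path pair per line. `kernel_registered` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a Jupyter kernel named $1 is registered.
# Returns 2 if the jupyter command is not on PATH, 1 if the kernel is missing.
kernel_registered() {
  local name="$1"
  command -v jupyter >/dev/null 2>&1 || return 2
  # Skip the header line, keep the first column (the kernel name),
  # and require an exact match on the requested name.
  jupyter kernelspec list 2>/dev/null | awk 'NR > 1 { print $1 }' | grep -x -- "$name" >/dev/null
}

if kernel_registered ft-embedding-blog; then
  echo "ft-embedding-blog kernel is registered"
else
  echo "Kernel not found; rerun: python -m ipykernel install --user --name=ft-embedding-blog"
fi
```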

Open the Jupyter notebook:
- From the SageMaker Studio Launcher, open the repository folder named finetune-bge-embeddings-blog.
- Open the file finetune-bge-embeddings.ipynb.

Select the correct kernel:
- From the "Kernel" dropdown menu in the notebook, select "Change Kernel...".
- Choose "ft-embedding-blog".
- If you don't see the kernel, try refreshing your browser.

You're now ready to run the code in the notebook. Follow the instructions in each cell to generate synthetic data, fine-tune the BGE model, evaluate its performance, and deploy it using Amazon SageMaker.

Note: Make sure you have the necessary permissions and quotas set up in your AWS account to use Amazon Bedrock and SageMaker services as described in the blog post.
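A quick pre-flight check from the terminal can catch missing credentials or permissions before you start the notebook. This is a sketch that assumes AWS CLI v2 is installed and that credentials come from the Studio execution role or your local AWS configuration; `preflight` is a hypothetical helper name. A successful `list-foundation-models` call only shows that the caller can reach the Amazon Bedrock API in the current Region. It does not confirm that access to a particular model has been granted or that your SageMaker quotas are sufficient, so check those as described in the blog post.

```shell
# Hypothetical pre-flight check; assumes AWS CLI v2 and configured credentials.
# Returns 2 if the aws CLI is missing, 1 if an API call fails, 0 otherwise.
preflight() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found"
    return 2
  fi
  # Print the ARN of the identity the CLI resolves (e.g. the Studio execution role).
  aws sts get-caller-identity --query Arn --output text || return 1
  # Count the foundation models visible to this identity in the current Region.
  aws bedrock list-foundation-models --query 'length(modelSummaries)' --output text || return 1
}

preflight || echo "Pre-flight check failed; review IAM permissions and Region settings."
```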

Security

See the CONTRIBUTING file for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
