Multilingual RAG Issue with Vector Search and Azure OpenAI Integration #1234

reminegrier · 2025-01-16T14:38:25Z

Hello everyone,

I'm having trouble querying a multilingual dataset using Azure OpenAI with RAG, where retrieval runs on Azure AI Search over documents stored in ADLS Gen2.

The issue:

When I ask a question in French, the system primarily relies on French documents in the dataset, even if English documents sometimes provide more relevant answers. Similarly, when I query in English, it tends to ignore the French documents.

What I need:

The assistant should always respond in the language of the query (e.g., French for a question in French), but it should consider all documents in the dataset (regardless of their language) to provide the best possible answer.
It should therefore handle translation seamlessly during the search or generation process.
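
To illustrate the requirement above, here is a minimal sketch of one possible approach, not something taken from the current setup: run the search once per language (the original query plus a translated copy) and merge the ranked results with reciprocal rank fusion, so that documents in either language can reach the top. The translation and search calls are deliberately left out; the document ids, the `rrf_merge` helper, and the constant `k=60` are assumptions made for illustration only.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, top_n=5):
    """Fuse several ranked lists of document ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep the order stable.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_n]


# Hypothetical results: the same question searched once in French, once in English.
hits_fr = ["fr_policy", "fr_faq", "en_manual"]
hits_en = ["en_manual", "en_guide", "fr_policy"]

merged = rrf_merge([hits_fr, hits_en])
print(merged)
```

Documents that appear in both result lists accumulate score from each, so a relevant English document is not pushed out just because the question was asked in French. The merged passages would then be handed to the model together with an instruction to answer in the language of the question.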

Observations:

It seemed to work better with a smaller dataset, but as the dataset grows, the precision drops significantly (likely due to vector search limitations with multilingual embeddings).
With ~100 documents, I’m seeing inconsistent behavior, where some relevant documents are completely missed.
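
One way to put numbers on the "missed documents" observation is to measure recall separately for each document language. The sketch below is only an illustration under stated assumptions: it presumes a small hand-labelled set of questions with known relevant documents, and the names `recall_by_language`, `doc_lang`, `relevant`, and `results` are hypothetical, as is the sample data.

```python
def recall_by_language(results, relevant, doc_lang, k=5):
    """Return {language: share of relevant docs in that language found in the top k}."""
    found = {}
    total = {}
    for query, relevant_ids in relevant.items():
        top_k = set(results.get(query, [])[:k])
        for doc_id in relevant_ids:
            lang = doc_lang[doc_id]
            total[lang] = total.get(lang, 0) + 1
            if doc_id in top_k:
                found[lang] = found.get(lang, 0) + 1
    return {lang: found.get(lang, 0) / total[lang] for lang in total}


# Hypothetical labels: the language of each document, and which documents
# are relevant to a question asked in French.
doc_lang = {"fr_1": "fr", "fr_2": "fr", "en_1": "en", "en_2": "en"}
relevant = {"question en français": ["fr_1", "en_1", "en_2"]}
# Hypothetical ranked ids returned by the search for that question.
results = {"question en français": ["fr_1", "fr_2", "en_1"]}

recall = recall_by_language(results, relevant, doc_lang, k=3)
print(recall)
```

A large gap between the per-language values for same-language and cross-language questions would point to the retrieval step rather than the generation step as the source of the problem.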

I tried:

Several search index configurations, including vector search with the text-embedding-ada-002 model.
Adjusting the system prompt via the AZURE_OPENAI_SYSTEM_MESSAGE key.

Despite these efforts, I can't achieve consistent or satisfactory results on larger and multilingual datasets.

Questions:

Has anyone managed to make RAG work effectively on large datasets with multilingual documents?
Are there best practices or additional configurations I should consider to improve the behavior in this scenario?

Thank you in advance for your help!

reminegrier added the bug label Jan 16, 2025
