You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm facing challenges with querying a dataset containing documents in multiple languages using Azure OpenAI with RAG, based on Azure AI Search on ADLS Gen2.
The issue:
When I ask a question in French, the system primarily relies on French documents in the dataset, even if English documents sometimes provide more relevant answers. Similarly, when I query in English, it tends to ignore the French documents.
What I need:
The assistant should always respond in the language of the query (e.g., French for a question in French), but it should consider all documents in the dataset (regardless of their language) to provide the best possible answer.
It should therefore handle translation seamlessly during the search or generation process.
Observations:
It seemed to work better with a smaller dataset, but as the dataset grows, the precision drops significantly (likely due to vector search limitations with multilingual embeddings).
With ~100 documents, I’m seeing inconsistent behavior, where some relevant documents are completely missed.
I tried:
Multiple search indexing configurations, including vector search with the text-embedding-ada-002 model.
System prompt adjustment using the AZURE_OPENAI_SYSTEM_MESSAGE key.
Despite these efforts, I can't achieve consistent or satisfactory results on larger and multilingual datasets.
Questions:
Has anyone managed to make RAG work effectively on large datasets with multilingual documents?
Are there best practices or additional configurations I should consider to improve the behavior in this scenario?
Thank you in advance for your help!
The text was updated successfully, but these errors were encountered:
Hello everyone,
I'm facing challenges with querying a dataset containing documents in multiple languages using Azure OpenAI with RAG, based on Azure AI Search on ADLS Gen2.
The issue:
When I ask a question in French, the system primarily relies on French documents in the dataset, even if English documents sometimes provide more relevant answers. Similarly, when I query in English, it tends to ignore the French documents.
What I need:
Observations:
I tried:
Despite these efforts, I can't achieve consistent or satisfactory results on larger and multilingual datasets.
Questions:
Thank you in advance for your help!
The text was updated successfully, but these errors were encountered: