Multilingual RAG Issue with Vector Search and Azure OpenAI Integration #1234

Open
reminegrier opened this issue Jan 16, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@reminegrier

Hello everyone,

I'm having trouble querying a dataset that contains documents in multiple languages using Azure OpenAI with RAG, where retrieval runs on Azure AI Search over data in ADLS Gen2.

The issue:

When I ask a question in French, the system primarily relies on French documents in the dataset, even if English documents sometimes provide more relevant answers. Similarly, when I query in English, it tends to ignore the French documents.

What I need:

  • The assistant should always respond in the language of the query (e.g., French for a question in French), but it should consider all documents in the dataset (regardless of their language) to provide the best possible answer.
  • It should therefore handle translation seamlessly during the search or generation process.
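
A minimal sketch of one way the search side of this could behave, offered as an illustration rather than something that has been tried in this setup. It assumes the query is already available in each language present in the dataset (the translation step is left out), that a hypothetical `search` callable returns a ranked list of document IDs for a query string, and that the per-language result lists are merged with reciprocal rank fusion. The names `reciprocal_rank_fusion`, `cross_lingual_search`, and `query_variants` are made up for this example and are not part of any Azure SDK.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well in more than one language rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def cross_lingual_search(
    query_variants: Dict[str, str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Run the same question once per language and fuse the results.

    query_variants maps a language code to the query text in that language
    (hypothetical input; producing the translations is out of scope here).
    search is a stand-in for whatever retrieval call the application uses.
    """
    rankings = [search(text) for text in query_variants.values()]
    return reciprocal_rank_fusion(rankings)
```

With something like this, a question asked in French would also be searched in English, and the fused list would be passed to the model, which would still be instructed to answer in French.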

Observations:

  • It seemed to work better with a smaller dataset, but precision drops significantly as the dataset grows (likely due to limitations of vector search with multilingual embeddings).
  • With ~100 documents, I’m seeing inconsistent behavior, where some relevant documents are completely missed.

I tried:

  • Multiple search indexing configurations, including vector search with the text-embedding-ada-002 model.
  • System prompt adjustment using the AZURE_OPENAI_SYSTEM_MESSAGE key.
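
For reference, a rough sketch of the kind of system message adjustment meant here. Only the `AZURE_OPENAI_SYSTEM_MESSAGE` key comes from the setup described above; the wording of the message and the `build_system_message` helper are assumptions made for illustration, not the exact text that was used. Setting the variable in-process is also just for the example, since the application would normally read it from its own environment or settings.

```python
import os


def build_system_message(extra_rules: str = "") -> str:
    """Compose a system message asking for replies in the query language.

    The wording below is illustrative only (an assumption, not the
    message actually used in this issue).
    """
    base = (
        "You are an assistant that answers questions using the retrieved documents. "
        "Always reply in the same language as the user's question, "
        "even when the retrieved documents are written in another language. "
        "Use every relevant document regardless of its language, translating as needed."
    )
    # Append optional extra rules, then trim the trailing space if there are none.
    return f"{base} {extra_rules}".strip()


# Key name taken from the issue; the value is the hypothetical message above.
os.environ["AZURE_OPENAI_SYSTEM_MESSAGE"] = build_system_message()
```

A message like this can steer the language of the reply, but it presumably has no effect on which documents the search step returns, which would match the behavior described above.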

Despite these efforts, I can't achieve consistent or satisfactory results on larger and multilingual datasets.

Questions:

  • Has anyone managed to make RAG work effectively on large datasets with multilingual documents?
  • Are there best practices or additional configurations I should consider to improve the behavior in this scenario?

Thank you in advance for your help!

@reminegrier reminegrier added the bug Something isn't working label Jan 16, 2025