When upgrading tantivy from 0.21 to 0.22 (used within tantivy-py), we observed a performance regression in query parsing. Rolling back resolved the issue. The issue occurred with queries that were deeply nested.
Reproducer
It's been difficult to get an exact reproducer, but I think I have a small example that demonstrates the issue:
It turns out that on 0.21 we had been benefiting from the faster path taken when a leading + is in front of the query text. In 0.22, and on the main branch, that benefit is gone. With 0.22, we observed some queries that took several minutes to parse. It is somewhat difficult to extract those exact queries, but I can try if necessary.
It is also clear from this simple benchmark that query parsing time is exponential in the depth of nesting, even for the very simple query above. That is its own issue, but my main question is whether the performance profile of 0.21 can be restored in 0.22.x or on the main branch. As it stands, we will be unable to upgrade to 0.22 without additional work on our query generation code.
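For anyone wanting to probe the depth dependence independently, here is a minimal, std-only sketch of a harness that generates nested queries with and without a leading + and times a parse call at each depth. The field name `body`, the term `a`, the `AND` nesting shape, and the helpers `nested_query` and `time_parse` are all hypothetical and are not the queries or code from our system. The closure in `main` is a stand-in, so to measure real behaviour it would need to be replaced with a call to the actual parser (for example tantivy's `QueryParser::parse_query`).

```rust
use std::time::{Duration, Instant};

/// Build a query nested `depth` levels deep.
/// For example, depth 2 gives "(body:a AND (body:a AND body:a))".
/// The field `body`, the term `a`, and the shape are hypothetical.
fn nested_query(depth: usize, leading_plus: bool) -> String {
    let mut q = String::from("body:a");
    for _ in 0..depth {
        // Wrap the query built so far in one more level of parentheses.
        q = format!("(body:a AND {q})");
    }
    if leading_plus {
        // Prefix the whole query with `+`, the case that was fast on 0.21.
        q.insert(0, '+');
    }
    q
}

/// Time a single call of `parse` on `query`.
fn time_parse<F: FnMut(&str)>(query: &str, mut parse: F) -> Duration {
    let start = Instant::now();
    parse(query);
    start.elapsed()
}

fn main() {
    // Stand-in parser (assumption): swap this closure for a call to the
    // real query parser to observe actual parse times.
    let parse = |q: &str| {
        std::hint::black_box(q.len());
    };

    for depth in [1, 2, 4, 8, 16] {
        for leading_plus in [false, true] {
            let q = nested_query(depth, leading_plus);
            let elapsed = time_parse(&q, parse);
            println!("depth={depth:2} leading_plus={leading_plus:5} elapsed={elapsed:?}");
        }
    }
}
```

If parse time really is exponential in depth, the elapsed time should grow by a roughly constant factor for each added level, so stepping the depth by one at a time and comparing consecutive timings would make that visible.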
Here are the files to reproduce the project:
On Rust 1.81, Linux x86_64, I run the test case like this: