Performance issue on basic string operations #742
I think most of the time is spent on the Python->C++->Python conversion and on allocating/freeing (new/delete) objects in C++. Is there a real case where a model needs more string operations than MatMul and other operations, or is this just a test? FYI, the input strings should be UTF-8 encoded.
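To illustrate the per-call overhead argument, here is a minimal pure-Python sketch. It is not the onnxruntime-extensions code path. It assumes only that each operator call pays a fixed cost for crossing the Python/native boundary, plus a per-string UTF-8 copy. `BoundaryCounter` and `cross_boundary_upper` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BoundaryCounter:
    """Hypothetical tally of simulated Python -> native -> Python round trips."""
    crossings: int = 0  # number of operator calls (fixed cost paid each time)
    bytes_in: int = 0   # UTF-8 bytes copied into the "native" side


def cross_boundary_upper(batch: List[str], counter: BoundaryCounter) -> List[str]:
    """Stand-in for one string-op call: copy in as UTF-8, uppercase, copy back out."""
    counter.crossings += 1
    # Copy in: each Python str is encoded into a new UTF-8 buffer.
    encoded = [s.encode("utf-8") for s in batch]
    counter.bytes_in += sum(len(b) for b in encoded)
    # Stand-in for the kernel plus copy out: new Python str objects are created.
    return [b.decode("utf-8").upper() for b in encoded]


features = ["paris", "london", "berlin", "madrid", "rome"]

batched = BoundaryCounter()
out_batched = cross_boundary_upper(features, batched)  # one call for all features

split = BoundaryCounter()
out_split = [cross_boundary_upper([f], split)[0] for f in features]  # one call per feature

print(out_batched == out_split)            # True
print(batched.crossings, split.crossings)  # 1 5
print(batched.bytes_in, split.bytes_in)    # 27 27
```

Both variants copy the same number of bytes, but the split variant pays the fixed per-call cost five times. Under this assumption, total time grows with the number of calls rather than with the amount of string data, which matches the near-linear scaling reported in the issue.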
Hello @wenbingl, sorry for the delay in answering. This is actually something we are experiencing in production: string preprocessing takes 70% of the total CPU time (using the unstack, upper, and stack methods). Our issue is in Scala, but it seems reproducible in Python, so I will just give you a minimal example in Python to reproduce it. Using two benchmarks, one doing the
(here is the benchmark reproducer) Regarding performance, I could take a look by profiling the C++ code, if you think this would be a valuable contribution! Thanks for your time and help.
Yes, C++ profiling would be very helpful to see how much time is spent in this upper function versus the ORT session. Then we can decide on the next steps.
Hello,
First of all thank you for this amazing project.
We have a few questions regarding string manipulation in ONNX runtime extensions. Specifically, we are trying to incorporate simple string manipulations directly in our ONNX models, such as "string_upper" and "string_join".
However, we have observed a significant performance impact. These operations are unexpectedly expensive; they appear to cost more than matrix multiplication, even though our strings are always rather short (under 30-40 characters). For instance, adding a "string_upper" operation on five features triples the inference time at batch size 1 and doubles it at batch size 10 in our benchmarks.
Even worse, though expected, when the operations are not fully vectorised (for example, when we want to uppercase some features and lowercase others), the times add up almost linearly: processing one vector of 10 features is almost 10 times faster than splitting that vector and applying the operation separately to each of the 10 features. This can be a real performance issue if we want to apply different simple transforms depending on the feature.
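One way to keep the number of calls independent of the number of features is to group features by the transform they need, apply each transform once over its whole group, and scatter the results back to their original positions. The sketch below does this in plain Python. `apply_grouped` and the `transforms` mapping are hypothetical, and it is only an illustration of the idea, not a feature of onnxruntime-extensions. In an ONNX graph, the analogous approach would be to route all features that need the same string op into a single tensor before applying that op once.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

BatchFn = Callable[[List[str]], List[str]]


def apply_grouped(row: List[str], plan: List[str],
                  transforms: Dict[str, BatchFn]) -> Tuple[List[str], int]:
    """Apply per-feature transforms with one batched call per distinct transform.

    row[i] is feature i; plan[i] names the transform for feature i.
    Returns the transformed row and the number of batched calls made.
    """
    # Group feature indices by the transform they need.
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, name in enumerate(plan):
        groups[name].append(idx)

    out = list(row)
    calls = 0
    for name, indices in groups.items():
        # One call per transform, covering every feature that needs it.
        results = transforms[name]([row[i] for i in indices])
        calls += 1
        # Scatter results back to the original feature positions.
        for i, value in zip(indices, results):
            out[i] = value
    return out, calls


transforms = {
    "upper": lambda batch: [s.upper() for s in batch],
    "lower": lambda batch: [s.lower() for s in batch],
}

row = ["Paris", "London", "Berlin", "Madrid"]
plan = ["upper", "lower", "upper", "lower"]
result, calls = apply_grouped(row, plan, transforms)
print(result)  # ['PARIS', 'london', 'BERLIN', 'madrid']
print(calls)   # 2
```

With four features and two distinct transforms, this makes two calls instead of four. The number of calls is bounded by the number of distinct transforms rather than by the number of features.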
We created a benchmark that shows this; you can reproduce it with the linked Python notebook.
Benchmark_example.ipynb.zip
Here is the profiling of the split and unsplit string upper operations. It shows that a simple operation like toUpper by itself takes more time than the whole MLP:
We suspect that the high cost may be due to copying overhead. Is there a reason for all those copies? Why are they necessary?
PS: We have verified that we are correctly using the en_US.utf8 locale.
Have you encountered this issue before? Is this performance impact expected? Could you provide any insights or recommendations on how to optimize these operations?
Thanks a lot