
Support string type inputs from Python application code. #2629

Merged
merged 66 commits into from
Jan 29, 2024

Conversation

negiyas
Collaborator

@negiyas negiyas commented Nov 20, 2023

This PR supports string type inputs from Python application code.

  • This PR supports utils/RunONNXModel.py and test/backend/inference_backend.py, except the JNI cases. We assume string inputs are given as numpy arrays of strings (e.g. numpy.array([["abc", "d"], ["e", "fghi"]], dtype=np.str_)).
  • This PR enables two backend tests with string inputs (test_equal_string_cpu and test_equal_string_broadcast_cpu) via "cmake check-onnx-backend".
  • The JNI interface with string type inputs is not supported, so tests with string type inputs are skipped with the emit-JNI option.
  • This PR is similar to PR #2478. The objective might be the same, but the approach of using the "object" dtype for strings is different.

The basic strategies are as follows.

  • At the Python side, use dtype=object instead of dtype=np.str_ so that string values are handled as pointers, which avoids their being stored as an array of constant-length strings. This removes any limitation on string length.
  • At the C++ runtime side,
  1. Check whether the dtype is object in order to determine if the original data type is string. (cf. pybind11 issue "dtype caster does not accept strings or type objects", pybind/pybind11#1538)
  2. Convert the array to a vector of strings (e.g. by inputPyArray.cast<std::vector<std::string>>(); for a 1D array).
  3. Generate an OMTensor for each argument by allocating buffers for the strings and copying the strings into the allocated buffers (cf. strdup is used in the code), in order to convert the C++ multi-dimensional vector into a C-style one-dimensional array (used in the OMTensor structure).
  4. Invoke the compiled code with the generated OMTensor(s) as input arguments.
  5. Free the string buffers in omTensorDestroy.
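The Python-side choice in the first bullet can be illustrated with plain numpy (a sketch using the example array from the description above, not onnx-mlir code):

```python
import numpy as np

# With dtype=np.str_, numpy stores fixed-width unicode elements sized
# to the longest string in the array ("<U4" here), so string lengths
# are capped at array-creation time.
fixed = np.array([["abc", "d"], ["e", "fghi"]], dtype=np.str_)

# With dtype=object, each element is a pointer to an ordinary Python
# string object, so there is no limitation on string length.
boxed = np.array([["abc", "d"], ["e", "fghi"]], dtype=object)

print(fixed.dtype, boxed.dtype, boxed.shape)
```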

(Additional comments)

  • Confirmed that onnx-mlir can run the bidaf-9 model in the model zoo correctly with this PR. (N.b. the reference input files in the model zoo are not correct; the input files for the second and third arguments (test_data_set_?/input_{1,2}.pb) should be swapped.)

Comments on the basic strategy and testing policies are very welcome.


@negiyas negiyas marked this pull request as draft November 20, 2023 08:04
@negiyas negiyas marked this pull request as ready for review November 21, 2023 23:38
@negiyas negiyas changed the title [WIP] Support string type inputs from Python application code. Support string type inputs from Python application code. Nov 21, 2023
@negiyas negiyas requested a review from gongsu832 November 22, 2023 04:13
@negiyas
Collaborator Author

negiyas commented Nov 22, 2023

@gongsu832 This PR does not support JNI. Could you kindly give me comments on how to enable the JNI part, or create another PR to support string inputs with JNI?

@gongsu832
Collaborator

@gongsu832 This PR does not support JNI. Can you kindly give me comments to support to enable the JNI part, or create another PR to support string inputs with JNI?

Let's first decide how users would typically input a tensor of strings for the languages we support. For example, let's say a model specifies an input tensor [d1, d2, d3] of strings, how would a user input it?

  • for C/C++, it could be a multi-dimensional array of pointers, each pointing to a string, e.g., char *tensor[d1][d2][d3];
  • for Java, it could be a multi-dimensional array of strings, e.g., Strings[d1][d2][d3] tensor;
  • for Python, I'm not very familiar with pybind11, how would a user input such a string tensor in Python?

Regardless of how the string tensor is input in various languages, ultimately they will all be converted to the expected OMTensor input format, which is a one-dimensional d1*d2*d3 array with shape info. For C/C++, multi-dimensional and one-dimensional arrays don't really differ much, since the array elements are stored contiguously in memory in both cases, so conversion between the two is straightforward. But for Java (and, I suspect, for Python as well) the memory layouts of multi-dimensional and one-dimensional arrays are very different. So in order to minimize copying, the choice of the input format matters quite a bit. In Java, so far for non-string data, we have used one-dimensional array input, combined with bytebuffer backing, to avoid having to copy input data when crossing the Java/native boundary. But I'm not yet sure if we can do the same for string data, so I'll have to think a bit. I think we need to do the same for Python, i.e., carefully consider the input format to minimize copying when crossing the Python/native boundary.
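The point about memory layout can be checked directly with numpy: a C-contiguous multi-dimensional numeric array is already backed by a flat d1*d2*d3 buffer, so flattening is only a metadata change, not a copy (a small sketch, not onnx-mlir code):

```python
import numpy as np

# A C-contiguous 3-D array is one flat buffer plus shape/stride metadata.
t = np.arange(2 * 3 * 4, dtype=np.int64).reshape(2, 3, 4)
flat = t.ravel()  # for a contiguous array this is a view, not a copy

assert t.flags["C_CONTIGUOUS"]
assert np.shares_memory(t, flat)   # no data were copied
assert flat.shape == (2 * 3 * 4,)
```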

Also, I would not change the meaning of the _owning flag because of the string tensor. When _owning is true, it means the API will free the tensor data. For string tensor, data means the pointer array plus the strings themselves. So freeing tensor data means freeing both. There shouldn't be a case where the pointer array is freed but the strings are not, or the strings are freed but the pointer array is not.

@negiyas
Collaborator Author

negiyas commented Nov 24, 2023

@gongsu832 Thanks for your valuable comments.

Let's first decide how users would typically input a tensor of strings for the languages we support. For example, let's say a model specifies an input tensor [d1, d2, d3] of strings, how would a user input it?
* for C/C++, it could be a multi-dimensional array of pointers, each pointing to a string, e.g., char *tensor[d1][d2][d3];
* for Java, it could be a multi-dimensional array of strings, e.g., Strings[d1][d2][d3] tensor;
* for Python, I'm not very familiar with pybind11, how would a user input such a string tensor in Python?

Yes. Although this PR focuses on supporting Python, other languages should be considered.
For Python, we assume numpy arrays of strings (e.g. numpy.array([["abc", "d"], ["e", "fghi"]], dtype=np.str_)). (I added this info to the description of this PR.)

Regardless of how the string tensor is input in various languages, ultimately they will all be converted to the expected OMTensor input format, which is a one-dimensional d1*d2*d3 array with shape info.

For Python, pybind11 converts multi-dimensional Python arrays into multi-dimensional std::vector types in C++ (e.g. std::vector<std::vector<std::string>> for 2D), which are different from the OMTensor style.
And this PR uses the pointers of the strings directly for the string data themselves when crossing the Python/native boundary.

So conversion between the two is straightforward. But for Java (and, I suspect, for Python as well) the memory layouts of multi-dimensional and one-dimensional arrays are very different.

The code copies strings from the C++ vector types into a one-dimensional array with shape info, (1) in order to change the array style (from a C++ multi-dimensional vector to a C one-dimensional array) and (2) to duplicate buffers for the string data so that the string buffers are kept until omTensorDestroy. This is only one copy per string. I agree that copies should be minimized, and I believe this copy in the PR is both minimal and mandatory.

Also, I would not change the meaning of the _owning flag because of the string tensor. When _owning is true, it means the API will free the tensor data. For string tensor, data means the pointer array plus the strings themselves. So freeing tensor data means freeing both. There shouldn't be a case where the pointer array is freed but the strings are not, or the strings are freed but the pointer array is not.

Yes, exactly. I suppose there are no such cases now, but I am not sure, especially about the other language cases.
It might be better to use bitmasks to manage the buffers for pointer arrays and strings independently.
Anyway, I suppose we need to manage the string data themselves in the OMTensor structure to keep them until omTensorDestroy.

Thank you for your useful comments. I am not sure my comments answer all of your questions, but more comments and suggestions are very welcome!

@gongsu832
Collaborator

I think we should make string tensors behave as similarly to non-string tensors as possible. Currently, the C/C++ and Java APIs use the combination of a one-dimensional data array with a shape array, instead of a multi-dimensional array, for the input/output tensors. This has the advantage that the one-dimensional data array is tightly packed in contiguous memory, so it can be passed into the model runtime via the OMTensor struct without copying the data array. As I mentioned before, one-dimensional vs. multi-dimensional arrays don't make much difference in C/C++ but matter quite a bit in Java, and I suspect they matter in Python too. I don't know why we used a multi-dimensional array with the Python API. If I were doing it, I would have used a one-dimensional data array + shape array for Python as well, if only for consistency with the other languages.

So back to the string tensor: I think it should also use a combination of a one-dimensional string array plus a shape array for input/output string tensors, i.e., something like [ "foo", "bar", ... ] (string data) + [ 2, 3, 4 ] (shape). Internally, this is represented as a one-dimensional array of pointers to string data (with a total of 2*3*4=24 pointers and strings). This makes a string tensor behave "almost identically" to an int64 tensor, the difference being that after you get the int64 element you need to dereference it to get the actual string data. This representation of string tensors should in most cases allow passing them around by manipulating the pointers, and therefore avoids copying the actual string data.
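The "flat string array + shape array" representation can be sketched in Python (hypothetical helper name, not the onnx-mlir API): the index arithmetic is the same row-major formula a numeric tensor of the same shape would use.

```python
def string_tensor_at(data, shape, index):
    """Return element `index` (a tuple) of a flat `data` list with `shape`,
    using row-major (C-order) index arithmetic."""
    flat = 0
    for dim, i in zip(shape, index):
        flat = flat * dim + i
    return data[flat]

# A 2x3 string tensor stored as 6 strings plus a shape.
data = ["a", "b", "c", "d", "e", "f"]
shape = (2, 3)
assert string_tensor_at(data, shape, (1, 2)) == "f"
```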

Regarding the _owning flag, I don't think it has anything to do with the frontend language being used. It's a flag for the model runtime API, which is in C.

@negiyas
Collaborator Author

negiyas commented Nov 27, 2023

Thank you for the comments!

I think we should make string tensors behave as similarly to non-string tensors as possible. Currently, the C/C++ and Java APIs use the combination of a one-dimensional data array with a shape array, instead of a multi-dimensional array, for the input/output tensors. This has the advantage that the one-dimensional data array is tightly packed in contiguous memory, so it can be passed into the model runtime via the OMTensor struct without copying the data array. As I mentioned before, one-dimensional vs. multi-dimensional arrays don't make much difference in C/C++ but matter quite a bit in Java, and I suspect they matter in Python too. I don't know why we used a multi-dimensional array with the Python API. If I were doing it, I would have used a one-dimensional data array + shape array for Python as well, if only for consistency with the other languages.

I suppose that this PR uses a one-dimensional data array (of pointers to strings) and does not use a multi-dimensional array.
In order to keep the one-dimensional data array structure, this PR copies the string data pointers from the multi-dimensional data array (generated by pybind11) into the one-dimensional array (used in the original OMTensor structure).
(I inserted some words into the description; I am sorry if my descriptions were not enough.)

In this PR, string data are kept in the one-dimensional data array, and shapes are kept in the shape structure. This is the same as in the numerical data cases. The only difference is that the one-dimensional data array keeps pointers instead of the actual values, since the lengths of the string elements in an array differ, unlike in the numerical data cases.

So back to the string tensor: I think it should also use a combination of a one-dimensional string array plus a shape array for input/output string tensors, i.e., something like [ "foo", "bar", ... ] (string data) + [ 2, 3, 4 ] (shape). Internally, this is represented as a one-dimensional array of pointers to string data (with a total of 2*3*4=24 pointers and strings). This makes a string tensor behave "almost identically" to an int64 tensor, the difference being that after you get the int64 element you need to dereference it to get the actual string data. This representation of string tensors should in most cases allow passing them around by manipulating the pointers, and therefore avoids copying the actual string data.

I suppose that this PR keeps the one-dimensional array structure. A string tensor behaves "almost identically" to an int64 tensor, as described in this comment.

Regarding the _owning flag, I don't think it has anything to do with the frontend language being used. It's a flag for the model runtime API, which is in C.

I understand the idea, but the onnx-mlir runtime code needs to hold the string data (not just pointers) in the OMTensor structure until omTensorDestroy is called, because Python applications may free the string data used for the first argument's OMTensor before generating another OMTensor for the next argument.

Thank you very much for the comments, and I am sorry if my explanations were not enough or if I have misunderstood your comments. Your comments are important and very welcome.

@gongsu832
Collaborator

I suppose that this PR uses a one-dimensional data array (of pointers to strings) and does not use a multi-dimensional array. In order to keep the one-dimensional data array structure, this PR copies the string data pointers from the multi-dimensional data array (generated by pybind11) into the one-dimensional array (used in the original OMTensor structure). (I inserted some words into the description; I am sorry if my descriptions were not enough.)

In this PR, string data are kept in the one-dimensional data array, and shapes are kept in the shape structure. This is the same as in the numerical data cases. The only difference is that the one-dimensional data array keeps pointers instead of the actual values, since the lengths of the string elements in an array differ, unlike in the numerical data cases.

What I meant is that the Python interface, instead of using a multi-dimensional numpy array, should have used a one-dimensional array plus a shape array, like the other languages.

I understand the idea, but onnx-mlir runtime code needs to hold the string data (not pointers) in the OMTensor structure until "omTensorDestroy" is called, because Python applications free string data used for an OMTensor for the first argument before generating another OMTensor for the next argument.

I'm not sure I understand this. Can you provide an example?

* @return pointer to OMTensor created, NULL if creation failed.
*
*/
#define OMTENSOR_OWNING_DATA_PTR 1
#define OMTENSOR_OWNING_DATA_PTR_AND_STRING 2
Collaborator

@chentong319 chentong319 Nov 27, 2023

Your change can support pointers to data types other than string, right?
The ownership is a kind of pointer depth for data_ptr:
0: data_ptr
1: *data_ptr
2: *(*data_ptr+i)

For an input tensor of string type, I feel that its OMTensor owns *data_ptr like other types of tensors, but not **data_ptr, because those objects are allocated by Python. Is my assumption correct?

Is there any case in the ONNX operations where a totally new string is generated? Or do we just copy the pointer of the string to the new tensor, in a similar way to how we copy integer or float data?

Collaborator

@gongsu832 gongsu832 Nov 27, 2023

The _owning flag tells whether the generated model runtime owns the tensor data. It is not related to whether a language wrapper like JNI or pybind11 creates its own copy of the tensor data for whatever purpose before calling the runtime OMTensorList/OMTensor API. Those copies should be managed by the wrapper itself, not the model runtime. So typically, for input tensors the _owning flag should always be false, and for output tensors the _owning flag should always be true. The one exception where the _owning flag is false for output tensors is when the tensor data is static and cannot be freed.

This is the reason why I'm not sure why we may have the cases here where the pointers and string data may be owned by different parties and asked @negiyas for an example.

Collaborator

@chentong319 chentong319 Nov 27, 2023

The current implementation stores char * pointers in the tensor of strings. If a tensor of strings and the strings themselves are created inside the compiler, we will have to allocate the space both for data_ptr and for the strings. That would be a new case for the ownership. If the tensor of strings just propagates pointers from one tensor to another, the ownership will be the same as for the float or integer data types.

Collaborator

@gongsu832 gongsu832 Nov 27, 2023

You mean there can be multiple output string tensors sharing the same string data? Internally how the strings are passed around doesn't really matter. What matters is how the strings are passed out of the model runtime in the output tensor(s).

Collaborator Author

@gongsu832 Thanks for the comments. I changed the code according to your comments: it now uses the OMTensor->_allocated buffer for the string data themselves, and there are no longer any modifications to src/Runtime/OMTensor.inc relative to the original code.

The string data allocated by Python applications are copied into a memory area in the OMTensor->_allocated buffer, which follows the area for the array of string pointers (in the PyExecutionSessionBase::pyRun function in PyExecutionSessionBase.cpp). The _allocated buffer is freed in omTensorDestroy, as in the other cases.

I hope that this update can answer your concerns related to the OMTensor->_owning flag.
Please let me know your additional comments and suggestions.

(Let me explain my thoughts about the multi-dimensional array in Python in other comments.)

@negiyas
Collaborator Author

negiyas commented Nov 28, 2023

What I meant is that the Python interface, instead of using a multi-dimensional numpy array, should have used a one-dimensional array plus a shape array, like the other languages.

Thanks. I have got the point now. My comments are as follows.

(1) The current implementation uses multi-dimensional numpy arrays for input/output tensors, which is the standard array format in Python. This PR simply follows the current approach for the numerical data cases and does not change the policy, including the multi-dimensional array treatment.
(2) numpy natively has the shape attribute for managing the array shape explicitly, analogous to OMTensor. The shape attribute defines the shape of the numpy array: users get an array's shape by reading the shape attribute and change the shape by modifying it.
(3) The pybind module and the current onnx-mlir runtime module use the numpy shape attribute of the input/output arrays to pass their shapes. To utilize them, we need to use the standard numpy shape attribute for multi-dimensional arrays.
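The shape-attribute behavior described in (2) is standard numpy and is easy to check (a small illustration, not onnx-mlir code):

```python
import numpy as np

a = np.arange(6)
assert a.shape == (6,)     # read the shape attribute
a.shape = (2, 3)           # reshape in place by assigning to it
assert a.shape == (2, 3)
assert a[1, 2] == 5        # the underlying row-major buffer is unchanged
```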

@gongsu832
Collaborator

What I meant is that the Python interface, instead of using a multi-dimensional numpy array, should have used a one-dimensional array plus a shape array, like the other languages.

Thanks. I have got the point now. My comments are as follows.

(1) The current implementation uses multi-dimensional numpy arrays for input/output tensors, which is the standard array format in Python. This PR simply follows the current approach for the numerical data cases and does not change the policy, including the multi-dimensional array treatment. (2) numpy natively has the shape attribute for managing the array shape explicitly, analogous to OMTensor. The shape attribute defines the shape of the numpy array: users get an array's shape by reading the shape attribute and change the shape by modifying it. (3) The pybind module and the current onnx-mlir runtime module use the numpy shape attribute of the input/output arrays to pass their shapes. To utilize them, we need to use the standard numpy shape attribute for multi-dimensional arrays.

I understand you are trying to follow the current way of using numpy multi-dimensional arrays for the Python wrapper. But that's exactly what I'm questioning: whether this is the best way of doing it. I think using a multi-dimensional numpy array actually makes things unnecessarily complicated, because you have to "flatten" it to what OMTensor expects, a one-dimensional array. If you had used a one-dimensional array (plus a shape array) to begin with, pybind11 probably would have converted it to a vector instead of a vector of vectors, so there would be no need to "flatten" it further. In addition, you would also be able to handle an arbitrary number of dimensions instead of hardcoding a maximum of 4 dimensions in PyExecutionSessionBase.cpp, which looks rather hacky.

…instead of buffers besides the _allocated buffer.

Signed-off-by: Yasushi Negishi <[email protected]>
Signed-off-by: Yasushi Negishi <[email protected]>
Signed-off-by: Yasushi Negishi <[email protected]>
@negiyas
Collaborator Author

negiyas commented Nov 28, 2023

I understand you are trying to follow the current way of using numpy multi-dimensional arrays for the Python wrapper. But that's exactly what I'm questioning: whether this is the best way of doing it.

Because pybind11 uses the shape defined by the numpy shape attribute, we cannot flatten multi-dimensional arrays in the Python applications if we are to keep the original shape for pybind11.

I think using a multi-dimensional numpy array actually makes things unnecessarily complicated, because you have to "flatten" it to what OMTensor expects, a one-dimensional array. If you had used a one-dimensional array (plus a shape array) to begin with, pybind11 probably would have converted it to a vector instead of a vector of vectors, so there would be no need to "flatten" it further. In addition, you would also be able to handle an arbitrary number of dimensions instead of hardcoding a maximum of 4 dimensions in PyExecutionSessionBase.cpp, which looks rather hacky.

Yes, exactly. As described in your comments, for the numerical data cases we need not flatten multi-dimensional arrays manually. I tried to follow the same approach for string data, but I could not, because of a pybind11 issue (pybind/pybind11#1538; that page shows a workaround, but it does not work for our case). I will update the code if we find a better way to avoid the issue, or if pybind11 fixes the issue.

I appended the following comments in the code to explain the situation.

      //
      // Convert the multi-dimensional array (of string data pointers) to
      // a one-dimensional array, to manage multi-dimensional arrays in an
      // integrated way.
      //
      // For numerical arrays, pybind11 can convert a multi-dimensional
      // array into a one-dimensional array without manual conversion, but
      // pybind11 has an issue, "dtype caster does not accept strings or
      // type objects" (https://github.com/pybind/pybind11/issues/1538).
      // The issue page shows a workaround, but it does not work for our
      // case. The following part solves the issue temporarily, and will be
      // updated if we find a better way to avoid the issue or pybind11
      // fixes the issue.
      //

@gongsu832
Collaborator

The current PyExecutionSessionBase::pyRun function

std::vector<py::array> PyExecutionSessionBase::pyRun(const std::vector<py::array> &inputsPyArray)

takes a vector of py::array, which is converted from numpy.ndarray by pybind11. What I was trying to say is that there is nothing that says we have to do this. We could instead make the function take two vectors: one is the inputsPyArray like we have now but with the original numpy.ndarray flattened, and the other a new shapesPyArray that carries the shape of the original unflattened numpy.ndarray.

This way you would be able to handle arbitrary dimensions without having to hardcode a maximum supported dimension; that part of the code is really ugly. Yes, this means that more test code will have to be changed, e.g., code like

outputs = session.run(inputs)

will need to be changed to something like

outputs = session.run(map(lambda t: t.flatten(), inputs), map(lambda t: t.shape, inputs))

But IMO this is the right way to do it.
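A minimal sketch of such a wrapper (hypothetical class and method names; the real PR interface may differ) that flattens each input and forwards shapes separately:

```python
import numpy as np

class RunWrapper:
    """Adapts a 1-argument run() interface to a hypothetical native run()
    that takes flattened data plus shapes (a sketch, not the onnx-mlir API)."""

    def __init__(self, native_session):
        self._sess = native_session  # assumed to expose run(flats, shapes)

    def run(self, inputs):
        # ravel() on a contiguous array is a view change, not a data copy
        flats = [np.ascontiguousarray(t).ravel() for t in inputs]
        shapes = [np.array(t.shape, dtype=np.int64) for t in inputs]
        return self._sess.run(flats, shapes)

class FakeNativeSession:
    """Stand-in for the pybind11 session, for illustration only."""
    def run(self, flats, shapes):
        return flats, shapes

flats, shapes = RunWrapper(FakeNativeSession()).run([np.zeros((2, 3))])
assert flats[0].shape == (6,)
assert tuple(shapes[0]) == (2, 3)
```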

@negiyas
Collaborator Author

negiyas commented Jan 26, 2024

@AlexandreEichenberger Thanks for your thorough comments.

I have a bigger question, which I would like to have addressed.

Let me answer this bigger question first; I will answer the other questions next.
Basically speaking, I updated this PR so that it preserves the external Python interfaces while still having the same internal ("pybind") interface with three arguments.

Before: we had OMExecutionSession and OMCompileExecutionSession python classes to perform execution and compilation&execution.
Then you also added a new python class, PyOMExecutionSession that provides back the 1 parameter run method.

Thanks for the summary. It is correct.

2 Questions:
Why is there not also a PyOMCompileExecutionSession? It feels just as needed as for the execution alone?
Do you ever anticipate the user to need to access the old classes (the one with the 3-argument run class)?

The answers are yes and yes.

As a result of that, there is no need to change any of the existing code base as we preserve the same (external) interfaces of having only 1 parameter for the run operation.
Now you have added (for reasons that @gongsu832 and you agreed on, I assume) to modify the run method of these 2 python class to take 3 parameters instead of 1.

Yes. I agreed with @gongsu832 to have a class with a 3-argument run method for execution, and developed this PR based on your summary. Now I have realized that we can preserve the external Python interfaces while still having the same internal ("pybind") interface with three arguments, and have updated the PR accordingly.

  1. Rename the PyRuntime/PyCompileAndRuntime modules to PyRuntimeC/PyCompileAndRuntimeC, which contain the OMExecutionSession/OMCompileExecutionSession classes with the 3-argument "run" function.
  2. Introduce PyRuntime.py/PyCompileAndRuntime.py, which contain the OMExecutionSession/OMCompileExecutionSession classes with the 1-argument run function and fully preserve the existing Python interfaces.

I hope that the latest PR answers most comments of @AlexandreEichenberger and @gongsu832.
Further comments on the latest PR are very welcome!

Quick question: does this new 2 step interface have an impact on performance? E.g. does it increase the number of data copy (as they are possibly very large), especially for the NON-String inputs?

I suppose that this PR has a negligible impact on performance. Although we need to pass two additional arguments (shapes and strides) through the "pybind" module, the arguments are quite small (1D, with one element per dimension of each input) and are passed only once per whole-model execution.

@negiyas
Collaborator Author

negiyas commented Jan 26, 2024

@jenkins-droid test this please

Signed-off-by: Yasushi Negishi <[email protected]>
Collaborator

@AlexandreEichenberger AlexandreEichenberger left a comment

LGTM, thanks for the changes. I am grateful that you found a way to not change the external interface while satisfying the need for the 3-argument run method. Please confirm that this is indeed the case.

I left 2 small nits and approved the PR, but I believe you should correct them. One is about the relative include path, and the other is about the copyright not being updated to 2024.

Thanks for the good work.

@@ -21,7 +21,7 @@

namespace py = pybind11;

#include "ExecutionSession.hpp"
#include "../ExecutionSession.hpp"
Collaborator

Nit, I believe in most places we give the path from the onnx-mlir root dir. I would use that here if you don't mind.

Collaborator Author

Fixed. Thanks!


############# PyOMRuntime.py #######################################
#
# Copyright 2021-2023 The IBM Research Authors.
Collaborator

Nit, please update all of the copyright to include year 2024, here and elsewhere.

Collaborator Author

I updated "2023" to "2024" in the files included in this PR; other files containing "-2023" still remain.

@AlexandreEichenberger
Collaborator

PS: I see the python format check failing for other PRs as well, so no need to worry about this.

I asked @gongsu832 to look into why it reports a failure

@gongsu832
Collaborator

LGTM, thanks for the changes. I am grateful that you found a way to not change the external interface while satisfying the need for the 3-argument run method. Please confirm that this is indeed the case.

I think there might be some misunderstanding. The external interface will change to be 3 arguments. The original 1-argument interface uses shape info embedded in the numpy array and the python-to-native conversion is done by pybind11 directly. But it has trouble supporting string data type because the multi-dimensional array must be "flattened" to 1-dimension by copying each dimension manually in C/C++ since pybind11 currently does not directly support string data type. With the 3-argument interface, the numpy array is "flattened" in the new python wrapper (and hence the need for additional arguments to carry shape info) and this "flatten" typically is a simple "view change" without copying. This 3-argument interface is also more similar to C/C++ and Java interfaces.

@gongsu832
Collaborator

PS: I see the python format check failing for other PRs as well, so no need to worry about this.

I asked @gongsu832 to look into why it reports a failure

The problem is that MacOS is now using a newer version of black 24.1.0. So on your local machine you should upgrade black to that version.

@negiyas
Collaborator Author

negiyas commented Jan 27, 2024

@jenkins-droid test this please

@negiyas
Collaborator Author

negiyas commented Jan 29, 2024

@gongsu832 I updated the black command, and also updated the format of utils/analyze-simd.py and utils/pre-onnx-mlir.py to fix the black format errors. Thanks.

@negiyas
Collaborator Author

negiyas commented Jan 29, 2024

@gongsu832 The external interface will change to be 3 arguments. The original 1-argument interface uses shape info embedded in the numpy array and the python-to-native conversion is done by pybind11 directly. But it has trouble supporting string data type because the multi-dimensional array must be "flattened" to 1-dimension by copying each dimension manually in C/C++ since pybind11 currently does not directly support string data type.

Yes, exactly. Thanks for pointing out this issue; I remember it again now :-).
@gongsu832 Thank you again for reviewing this complicated PR!

@negiyas negiyas merged commit d4cb9a9 into onnx:main Jan 29, 2024
8 checks passed
@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #13930 [push] Support string type inpu... started at 22:09

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #13957 [push] Support string type inpu... started at 23:09

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #12954 [push] Support string type inpu... started at 23:18

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #13957 [push] Support string type inpu... passed after 1 hr 27 min

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #13930 [push] Support string type inpu... passed after 1 hr 31 min

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #12954 [push] Support string type inpu... passed after 1 hr 51 min

@cjvolzka
Collaborator

cjvolzka commented Feb 1, 2024

After pulling in this change, I'm noticing that the PyRuntime files are coming out as PyRuntimeC.cpython-*-linux-gnu.so. Also, to get our Python examples to work, I have to do from PyRuntimeC import OMExecutionSession.

As is, that would break existing client applications; their programs would need to be updated for the change. Also, I haven't tried it yet, but I'm guessing the onnx-mlir examples at https://github.com/onnx/onnx-mlir/blob/main/docs/mnist_example/mnist-runPyRuntime.py#L4 are broken, because they still have from PyRuntime instead of from PyRuntimeC.

If it breaks clients, then this should be a major version change (i.e. 0.5.0) for onnx-mlir. Is that desired? For zDLC, after generating the PyRuntimeC.cpython-*-linux-gnu.so, I can probably rename it to PyRuntime.cpython-*-linux-gnu.so to avoid breaking exploiters, but I wonder if we should revert the name in onnx-mlir as a whole.

@gongsu832
Collaborator

Sorry, I didn't notice this when reviewing the PR. I think the change is due to the original PyRuntime generated by pybind11 conflicting with the new PyRuntime.py wrapper. @negiyas I think we should keep the original PyRuntime name in order not to break the existing Python clients. We could perhaps name the new Python wrapper PyOMRuntime.py.

@cjvolzka keep in mind that the original PyRuntime will not support string data type. To use string data type, you must switch to PyOMRuntime.

@cjvolzka
Collaborator

cjvolzka commented Feb 2, 2024

I can probably rename it to PyRuntime.cpython-*-linux-gnu.so to avoid breaking exploiters, but I wonder if we should revert the name in onnx-mlir as a whole.

Actually, I was wrong about this. I missed that the .run() interface changed, so renaming wouldn't actually avoid breaking exploiters. See the comment at #2629 (comment)
