Model Builder API #23223

Draft: wants to merge 6 commits into base: main

skottmckay
Contributor

Description

Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.

TODO: Validate that the API is feature complete and add additional tests.

Motivation and Context

Supports creating a model programmatically using the ORT C or C++ API.
Supports augmenting an existing model to add nodes.
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects
// "0": Disabled. [DEFAULT]
// "1": Enable Model Builder Session
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session";

Check notice

Code scanning / CodeQL

Unused static variable Note

Static variable kOrtSessionOptionsEnableModelBuilder is never read.
#include "core/framework/error_code_helper.h"
#include "core/framework/execution_provider.h"
#include "core/session/abi_session_options_impl.h"
// #include "core/session/environment.h"

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
// FUTURE: This will also allow CopyTensors to utilize the IDataTransfer objects
// "0": Disabled. [DEFAULT]
// "1": Enable Model Builder Session
static const char* const kOrtSessionOptionsEnableModelBuilder = "session.model_builder_session";
Contributor Author

This isn't currently used. Initially I was thinking it would enable copying inputs/initializers to the correct device AOT, but...

a) that requires knowing where the value would be used, which is dependent on partitioning that happens later and depends on which EPs are enabled so easy to get wrong;
and
b) would be counter-productive if an optimizer wanted to update the initializer as we'd have to copy it back to CPU to do that.

TBD if needed.

@Craigacp
Contributor

Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.

@skottmckay
Contributor Author

> Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.

You can use the SessionOption that's typically used to save the optimized ONNX model.

ORT_API2_STATUS(SetOptimizedModelFilePath, _Inout_ OrtSessionOptions* options,
_In_ const ORTCHAR_T* optimized_model_filepath);

The caveat is that this does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but it could be updated to do so if required.

api.ReleaseTensorTypeAndShapeInfo(tensor_type_info); // input_type_info took a copy

// create ValueInfo and release the type info as CreateValueInfo takes a copy.
OrtValueInfo* input_value_info = nullptr;

Where to release OrtValueInfo?

It seems SetGraphInputs() (and SetGraphOutputs()) would take ownership of OrtValueInfos. Resolving.

@Craigacp
Contributor

Craigacp commented Jan 1, 2025

> > Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.
>
> You can use the SessionOption that's typically used to save the optimized ONNX model.
>
> ORT_API2_STATUS(SetOptimizedModelFilePath, _Inout_ OrtSessionOptions* options,
> _In_ const ORTCHAR_T* optimized_model_filepath);
>
> The caveat is that this does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but it could be updated to do so if required.

Is the optimized model one which has had op fusion and other passes done so it's no longer using ONNX standard ops everywhere, or is that a different process?

@skottmckay
Contributor Author

> Is the optimized model one which has had op fusion and other passes done so it's no longer using ONNX standard ops everywhere, or is that a different process?

You can specify the optimization level. If you keep it to level 1 (GraphOptimizationLevel.ORT_ENABLE_BASIC) it will only use standard ONNX ops.

typedef enum GraphOptimizationLevel {
ORT_DISABLE_ALL = 0,
ORT_ENABLE_BASIC = 1,

/** \brief Set the optimization level to apply when loading a graph
*
* Please see https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html for an in-depth explanation
* \param[in,out] options The session options object
* \param[in] graph_optimization_level The optimization level
*
* \snippet{doc} snippets.dox OrtStatus Return Value
*/
ORT_API2_STATUS(SetSessionGraphOptimizationLevel, _Inout_ OrtSessionOptions* options,
GraphOptimizationLevel graph_optimization_level);

https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html

huningxin added a commit to huningxin/chromium-src that referenced this pull request Jan 2, 2025
mingmingtasd pushed a commit to shiyi9801/chromium that referenced this pull request Jan 2, 2025
* Pass ORT_API_VERSION to `OrtApiBase::GetApi()`

Also removes the inclusion of onnx.pb.h header.

* Add third_party/onnxruntime_headers

Import https://github.com/microsoft/onnxruntime/tree/main/include

Commit is based on microsoft/onnxruntime#23223

* Use ORT Model Builder API

* Refactor scoped ORT type ptr

1. Rename to ScopedOrtTypePtr
2. Use macros
3. Introduce `operator T*()`
4. Introduce `Release()` method
5. Rename `get_ptr()` to `Get()`
6. Rename `get_pptr()` to `GetAddressOf()`

* Remove ONNX Runtime headers from third_party/microsoft_dxheaders
if (attributes != nullptr) {
n->attributes.reserve(attribs_len);
for (size_t i = 0; i < attribs_len; ++i) {
n->attributes.push_back(*reinterpret_cast<const ONNX_NAMESPACE::AttributeProto*>(attributes[i]));
Contributor Author

Should we call ReleaseOpAttr after it's copied into the node so the user doesn't have to? Would be more consistent with the rest of the API to 'take ownership' of them.

Contributor Author

The same question applies to the CreateXXXTypeInfo/CreateValueInfo calls.

Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.) but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.

> Those felt a little more re-usable (e.g. if you were constructing a model with KV cache you're going to be using the same TypeInfo for multiple inputs/outputs.)

OpAttr might also be re-used?

> but maybe it's better overall to have a consistent pattern of ownership transferring when you add to a containing class instead of taking a copy in some places.

I feel the consistency is about the AddXxxToXxx() methods, which transfer ownership. SetGraphInputs()/SetGraphOutputs() also take ownership (I was not aware of that when I initially used the API). Should they be renamed to AddInputsToGraph()/AddOutputsToGraph()?

Contributor Author

OpAttr could be re-used but that seems less likely. e.g. are you going to keep an OpAttr value around and look it up to re-use in a later node that might need the same value?

My concern with AddInputsToGraph would be that it doesn't accumulate unlike the others, so 'set' is a better description.

> OpAttr could be re-used but that seems less likely. e.g. are you going to keep an OpAttr value around and look it up to re-use in a later node that might need the same value?

WebNN POC code doesn't keep an OpAttr.

> My concern with AddInputsToGraph would be that it doesn't accumulate unlike the others, so 'set' is a better description.

That's a good point. I am fine with SetGraphInputs()/SetGraphOutputs().
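The naming distinction discussed above can be sketched with a small stand-in class. This is only an illustration under stated assumptions: `Graph`, `Node`, `ValueInfo`, `SetInputs`, `AddNode` and `MakeInputs` are hypothetical names that model the semantics, not the ORT types or functions. A "set" call replaces the previous contents, an "add" call accumulates, and both take ownership of what is passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the ORT types.
struct ValueInfo { std::string name; };
struct Node { std::string op_type; };

class Graph {
 public:
  // "Set": replaces any previously set inputs and takes ownership of the new ones.
  void SetInputs(std::vector<std::unique_ptr<ValueInfo>> inputs) { inputs_ = std::move(inputs); }

  // "Add": accumulates, and takes ownership of the node.
  void AddNode(std::unique_ptr<Node> node) {
    assert(node != nullptr);
    nodes_.push_back(std::move(node));
  }

  std::size_t NumInputs() const { return inputs_.size(); }
  std::size_t NumNodes() const { return nodes_.size(); }

 private:
  std::vector<std::unique_ptr<ValueInfo>> inputs_;
  std::vector<std::unique_ptr<Node>> nodes_;
};

// Hypothetical helper that builds an owned list of inputs from names.
inline std::vector<std::unique_ptr<ValueInfo>> MakeInputs(std::initializer_list<const char*> names) {
  std::vector<std::unique_ptr<ValueInfo>> result;
  for (const char* name : names) {
    result.push_back(std::make_unique<ValueInfo>(ValueInfo{name}));
  }
  return result;
}
```

Calling `SetInputs` twice leaves only the second list in place, while each `AddNode` call grows the node list. After `AddNode(std::move(n))` the caller's pointer is null, which mirrors the "caller no longer releases it" contract.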

* Pre-existing memory:
* Use CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue to create an OrtValue
* with a tensor that contains a pointer to the existing data.
* User must keep pointer valid for lifetime of the inference session.

Is it true if using CreateTensorWithDataAndDeleterAsOrtValue()?

Contributor Author

Ownership transfers to ORT so the pointer must remain valid, but in that case the user shouldn't be freeing the memory at any point. Will update the comment to clarify.
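The two ownership contracts can be modelled with a small sketch. `WrappedBuffer`, `Borrow` and `Adopt` are hypothetical names used only for illustration; they are not ORT types, and the real functions take an allocator rather than a callable. Borrowing corresponds to the "user must keep the pointer valid" case, and adopting corresponds to the "deleter runs, user must not free" case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a tensor that wraps caller-provided memory.
// It only models the two ownership contracts discussed above.
class WrappedBuffer {
 public:
  using Deleter = std::function<void(void*)>;

  // Borrowing: the caller still owns `data` and must keep it valid
  // for as long as this object (or any copy of it) is in use.
  static WrappedBuffer Borrow(void* data, std::size_t len) {
    return WrappedBuffer(std::shared_ptr<void>(data, [](void*) {}), len);
  }

  // Adopting: ownership moves here. The caller must not free `data`;
  // `deleter` runs once when the last reference goes away.
  static WrappedBuffer Adopt(void* data, std::size_t len, Deleter deleter) {
    assert(deleter != nullptr);
    return WrappedBuffer(std::shared_ptr<void>(data, std::move(deleter)), len);
  }

  void* data() const { return data_.get(); }
  std::size_t size() const { return len_; }

 private:
  WrappedBuffer(std::shared_ptr<void> data, std::size_t len)
      : data_(std::move(data)), len_(len) {}

  std::shared_ptr<void> data_;
  std::size_t len_;
};
```

In both cases the pointer has to stay valid while the wrapper is alive. The difference is only in who frees it and when.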

ORT_API(const OrtModelBuilderApi*, GetModelBuilderApi);

ORT_API_STATUS_IMPL(CreateTensorWithDataAndDeleterAsOrtValue, _In_ OrtAllocator* deleter,
_In_ void* p_data, size_t p_data_len,

Would p_data be written by ORT? Or should it take const void* if it is only read by ORT? A similar question is for CreateTensorWithDataAsOrtValue() which takes _Inout_ void* p_data.

https://github.com/microsoft/onnxruntime/blob/skottmckay/ModelBuilderAPI_PR/include/onnxruntime/core/session/onnxruntime_c_api.h#L1359C83-L1359C103

Contributor Author

ORT wouldn't write to p_data itself, but a user could update the data in the returned OrtValue using the ORT API (e.g. call GetTensorMutableData and make changes).

@@ -27,6 +27,7 @@
#include "core/common/span_utils.h"
#include "core/common/status.h"
#include "core/common/logging/logging.h"
#include "core/framework/ort_value.h"
Member

It introduces a circular dependency between onnxruntime_graph and onnxruntime_framework. ort_value is a core concept in onnxruntime_framework, which also depends on MemoryInfo, Allocators, etc. It means that the lifetime of a graph will be bound to an allocator. Furthermore, people may ask if the OrtValue can be put on GPU devices, etc.

Contributor Author

That dependency already existed with InjectExternalInitializedTensors in this file. And there are lots of places in the graph code where we use types from the framework library. If we want to fix that we might need to limit the graph library to fairly pure ONNX related types, and have ORT things built on top of those in the framework library. e.g. you'd have an ONNX Graph class as well as an onnxruntime Graph class, and things like OrtValue usage would be in the latter.

Long term I think it would be better to convert initializers to OrtValue when loading from the ONNX model so we detach from the protobuf types asap. There are many reasons for doing so. Having to add things like InjectExternalInitializedTensors to efficiently manage memory is a good sign the current setup isn't working well.

Can you elaborate on how the lifetime of the Graph is bound to an allocator? The OrtValue instances internally have a Tensor instance where the deleter is in a shared_ptr, so I would have thought the Graph instance can go away at any time, and the shared_ptr for the allocator in the Tensor deleter would also keep the allocator alive for as long as needed.

The problem I'm trying to address is that there's pre-existing memory where we want to transfer ownership to ORT. e.g. to free CPU based memory if we copy it to GPU. Because we have protobuf based initializers there's no way to attach the deleter to them, and the ORT API deals in OrtValue. So this was the best option I could find to essentially pass through that OrtValue to session state finalization.

The OrtValue could theoretically be on GPU. If you did that you could avoid a copy (if you knew for sure the value would be used on GPU) but you'd break the current setup with optimizers as they expect initializer data to be on CPU. Not clear we want to allow that.

Member

@snnn, Jan 3, 2025

A tensor cannot live longer than the allocator that allocated the buffer.
An allocator cannot live longer than the corresponding EP (e.g. the CUDA EP), because the EP manages many handles and the allocator may need a device handle to do malloc/free. All such handles are destroyed when the EP is destroyed.
That could make things complicated. For example, in InferenceSession class, we have:

  std::shared_ptr<onnxruntime::Model> model_;

  // The file path of where the model was loaded. e.g. /tmp/test_squeezenet/model.onnx
  PathString model_location_;

  // The list of execution providers.
  ExecutionProviders execution_providers_;

  //...
  std::unique_ptr<SessionState> session_state_;

The model_ variable contains a graph, which contains OrtValues, which should be deleted before the execution_providers_ . But they are not ordered in that way. We had similar issues with "execution_providers_" and "session_state_". So, this is very subtle.

Contributor Author

Ah ok. So whilst the Tensor has a shared_ptr for the allocator, if the allocator depends on internals of the EP, and the EP goes away, it breaks due to that?

And if we add OrtValue to Graph, which is in InferenceSession::model_, which will be released after execution_providers_ it may break?

Should execution_providers_ therefore be declared prior to model_ in InferenceSession?

Member

Yes, I think so.
However, the code is OK for now if the graph's OrtValues only use CPU allocators, which are relatively simple.
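The ordering concern in this thread follows from a C++ rule: non-static data members are destroyed in reverse order of declaration. The sketch below demonstrates that rule with stand-in types. `Tracked`, `SessionLikeOriginal`, `SessionLikeFixed` and `DestroyAndReport` are hypothetical names for illustration and do not reflect the real `InferenceSession` layout beyond the two members named in the discussion.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which tracked members are destroyed.
inline std::vector<std::string>& DestructionLog() {
  static std::vector<std::string> log;
  return log;
}

// Stand-in for any member whose destruction we want to observe.
struct Tracked {
  explicit Tracked(std::string name) : name_(std::move(name)) {}
  ~Tracked() { DestructionLog().push_back(name_); }
  std::string name_;
};

// Layout as quoted in the discussion: model declared first, so it is destroyed LAST.
struct SessionLikeOriginal {
  Tracked model{"model"};
  Tracked execution_providers{"execution_providers"};
};

// Proposed layout: providers declared first, so the model (and any values
// it holds) is destroyed BEFORE the providers and their allocators.
struct SessionLikeFixed {
  Tracked execution_providers{"execution_providers"};
  Tracked model{"model"};
};

// Constructs and destroys a session-like object, returning the destruction order.
template <typename Session>
std::vector<std::string> DestroyAndReport() {
  DestructionLog().clear();
  {
    Session session;
    (void)session;
  }
  assert(DestructionLog().size() == 2);  // both members were destroyed
  return DestructionLog();
}
```

With the original layout the providers go away while the model still holds values, which is the case that can break if a value's allocator depends on EP internals.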

* If using CreateTensorWithDataAsOrtValue you must keep the pointer valid for lifetime of the inference session.
* Set `data_is_external` to true.
*
* Allocated memory:

According to our testing, some ops seem to require shape-inference-related initializers being in allocated memory, including:

  • Reshape's shape
  • Reduce's axes
  • Expand's shape
  • Slice's starts, ends and steps

If using pre-existing memory, there will be shape inference error, e.g.

[ShapeInferenceError] Cannot parse data from external tensors. Please load external data into raw data for tensor: x

If that is the case, it would be helpful to be documented.

Contributor Author

As a general approach, there's about 60 bytes or so of overhead to use the external memory structure for pre-allocated memory, so if the value is less than say 128 bytes you're probably better off using allocated memory.

I think doing so would almost guarantee that shape inferencing isn't going to break, as I can't think of an input that shape inferencing would read that would have 128 bytes of data (16x int64_t dimension or indices values).

If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails as that would be a much smaller set of operators if any.
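The size rule proposed here could be expressed on the caller's side roughly as follows. This is a sketch under stated assumptions: `kMinExternalDataBytes`, `InitializerStorage` and `ChooseStorage` are hypothetical names, and the 128-byte threshold is the value suggested in this comment rather than a documented constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 128 bytes is the minimum size proposed in the discussion for
// initializers that point at pre-existing (external) memory.
constexpr std::size_t kMinExternalDataBytes = 128;

enum class InitializerStorage { kAllocated, kExternalPointer };

// Hypothetical helper: small values are copied into allocated memory so that
// shape inferencing can read them; larger ones may reference existing memory.
constexpr InitializerStorage ChooseStorage(std::size_t num_elements, std::size_t element_size) {
  return num_elements * element_size >= kMinExternalDataBytes
             ? InitializerStorage::kExternalPointer
             : InitializerStorage::kAllocated;
}

// A 4-element int64_t shape tensor (32 bytes) stays in allocated memory.
static_assert(ChooseStorage(4, sizeof(std::int64_t)) == InitializerStorage::kAllocated,
              "small shape tensors should be copied");
```

A typical Reshape shape or Slice starts/ends tensor has only a handful of int64_t values, so it falls well under the threshold and is copied.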

> If it seems reasonable to do, we could enforce that pre-allocated data is a minimum size of 128 to reduce the chance of a user hitting a shape inferencing error, and document any edge cases we find in the ONNX ops where shape inferencing fails as that would be a much smaller set of operators if any.

SGTM, that would be very helpful.

@huningxin

> > Is it possible to save out the model from the builder via the C API? It'll be a nice alternative to building ONNX models with raw protobufs in languages which don't have a native ONNX library.
>
> You can use the SessionOption that's typically used to save the optimized ONNX model.
>
> ORT_API2_STATUS(SetOptimizedModelFilePath, _Inout_ OrtSessionOptions* options,
> _In_ const ORTCHAR_T* optimized_model_filepath);
>
> The caveat is that this does not currently support saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue, but it could be updated to do so if required.

@skottmckay, saving tensors created with CreateTensorWithDataAsOrtValue or CreateTensorWithDataAndDeleterAsOrtValue would be useful for debugging. Please support it if possible. Thanks!


// create an initializer for the Y input. add to `weights` so the memory remains valid
OrtValue* y_tensor = nullptr;
auto info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeDefault);

MemoryInfo seems not to be used when calling CreateTensorWithDataAndDeleterAsOrtValue()?

Contributor Author

@skottmckay, Jan 7, 2025

MemoryInfo comes from the OrtAllocator deleter.

Then this info variable is unnecessary?

Contributor Author

Ah no. This code initially used CreateTensorWithDataAsOrtValue but I changed it to CreateTensorWithDataAndDeleterAsOrtValue to test that functionality and overlooked removing the info variable. Will remove.

- enforce 128 byte minimum for tensors with external data to avoid shape inferencing issues
  - update unit tests to use 128 byte initializer so external data can be tested
- support saving initializer with in-memory external data to ONNX model by copying into TensorProto's raw_data property.
Contributor

github-actions bot left a comment

You can commit the suggested changes from lintrunner.

Comment on lines 629 to 631

// The list of execution providers.
// This MUST be prior to model_ in case there are values in the model that were allocated using an allocator

Suggested change
// The list of execution providers.
// This MUST be prior to model_ in case there are values in the model that were allocated using an allocator
// The list of execution providers.
// This MUST be prior to model_ in case there are values in the model that were allocated using an allocator

@@ -39,6 +40,9 @@
#include "core/graph/node_arg.h"
#include "core/graph/ort_format_load_options.h"

// Type from Graph API in ORT C API so can't be in a namespace
Member

I like Model Builder API better.

* Create attributes with CreateOpAttr. OrtOpAttr instances are copied.
*
* \param[in] operator_name The name of the operator.
* \param[in] domain_name The domain of the operator. Use an empty string for ONNX operators.
Member

can nullptr also be used?

* Existing input/outputs will only be updated if the OrtGraph inputs/outputs are set in the OrtModel.
* i.e. you don't need to call SetGraphInputs/Outputs if they are unchanged.
*
* \snippet{doc} snippets.dox OrtStatus Return Value
Member

The arguments are not documented (as in some other APIs too). The semantics here are not clear. It sounds like the changes accumulated in the Model are applied to the Session (similar to CreateSessionFromModel?). However, both args are non-const, so the Model is also modified. After thinking about it, it looks like entities are being transferred from the Model to the Session.
I feel this needs to be clarified/documented.

const std::vector<std::string>& output_names,
std::vector<OpAttr>& attributes);

ConstNode GetConst() const { return ConstNode{this->p_}; }
Member

Here and below: the method is useless when the type does not have any const methods.

for (size_t i = 0; i < num_inputs; ++i) {
char* name = nullptr;
ThrowOnError(GetApi().SessionGetInputName(this->p_, i, allocator, &name));
input_names.push_back(name);
Member

nit: In theory, one allocation leaks if push_back throws.
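One common way to close this kind of gap is to hand the raw pointer to an owning guard before anything that can throw. The sketch below uses only the standard library. `AllocName`, `FreeName`, `AppendName` and `ThrowingSink` are hypothetical stand-ins for the allocator-backed name lookup and its matching free, and the sketch assumes the destination container stores `std::string` copies.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Counts live buffers so the example can show that nothing leaks.
inline int& LiveAllocations() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for a call that returns an allocator-owned C string.
inline char* AllocName(const char* text) {
  const std::size_t len = std::strlen(text) + 1;
  char* p = static_cast<char*>(std::malloc(len));
  if (p == nullptr) throw std::bad_alloc();
  std::memcpy(p, text, len);
  ++LiveAllocations();
  return p;
}

// Hypothetical matching free function.
inline void FreeName(char* p) {
  std::free(p);
  --LiveAllocations();
}

struct NameDeleter {
  void operator()(char* p) const { FreeName(p); }
};
using NamePtr = std::unique_ptr<char, NameDeleter>;

// Take ownership first, then append. If push_back throws, the guard
// frees the buffer during stack unwinding instead of leaking it.
template <typename Container>
void AppendName(Container& names, const char* text) {
  NamePtr guard(AllocName(text));
  names.push_back(guard.get());  // copies into a std::string
}

// Sink whose push_back always throws, to simulate an allocation failure.
struct ThrowingSink {
  void push_back(const std::string&) { throw std::bad_alloc(); }
};
```

The guard costs nothing on the success path beyond the free that had to happen anyway, and it removes the need for a try/catch around the append.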

@@ -2167,4 +2327,136 @@ inline const OrtOpAttr* ShapeInferContext::GetAttrHdl(const char* attr_name) con
return attr_hdl;
}

namespace ModelBuilderAPI {
inline std::vector<const char*> StringsToCharPtrs(const std::vector<std::string>& strings) {
Member

Should this be under detail?
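For reference, a helper with this signature might look like the sketch below, placed in a `detail` namespace as the question suggests. Only the signature comes from the diff. The body and the namespace placement are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

namespace detail {
// Returns one C-string pointer per input string. The pointers borrow from
// `strings`, so they are valid only while `strings` is alive and unmodified.
inline std::vector<const char*> StringsToCharPtrs(const std::vector<std::string>& strings) {
  std::vector<const char*> ptrs;
  ptrs.reserve(strings.size());
  std::transform(strings.begin(), strings.end(), std::back_inserter(ptrs),
                 [](const std::string& s) { return s.c_str(); });
  assert(ptrs.size() == strings.size());
  return ptrs;
}
}  // namespace detail
```

Because the result does not own anything, the usual pattern is to build it immediately before the C API call and let it go out of scope right after.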

ONNX_NAMESPACE::StringStringEntryProto* entry = tensor_proto.mutable_external_data()->Add();
entry->set_key("location");
// magic tag for existing memory that causes 'offset' to be treated as a pointer to the memory
entry->set_value(ToUTF8String(onnxruntime::utils::kTensorProtoMemoryAddressTag));
Member

@yuslepukhin, Jan 13, 2025

The ExternalData class now has a helper to set the external location. Maybe we can put all the related code there and re-use it.

@@ -386,6 +386,9 @@ common::Status SaveInitializedTensors(
if (user_supplied_initializer_ids.find(entry.first) != user_supplied_initializer_ids.end()) {
ort_value = *(session_options.initializers_to_share_map.at(name));
LOGS(logger, INFO) << "Using user supplied initializer with name (" << name << ").";

} else if (graph.GetOrtValueInitializer(name, ort_value)) {
Member

The initializers stored in the graph are different in nature. Should we rename GetOrtValueInitializer() as GetOrtValuePopulatedInitializer()?

shiyi9801 pushed a commit to shiyi9801/chromium that referenced this pull request Jan 16, 2025