
[WIP] Add framework for version converter API #1926

Open
wants to merge 8 commits into base: main

Conversation

shubhambhokare1
Contributor

No description provided.


@github-advanced-security bot left a comment


lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

)
model = ir.serde.deserialize_model(model_proto)
self.assertEqual(model.graph._nodes[4].op_type, "GridSample")
self.assertEqual(model.graph._nodes[4]._attributes['mode'].value, 'bilinear')
Collaborator

Suggested change
self.assertEqual(model.graph._nodes[4]._attributes['mode'].value, 'bilinear')
self.assertEqual(model.graph[4].attributes['mode'].value, 'bilinear')


codecov bot commented Oct 31, 2024

❌ 15 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 14275 | 15 | 14260 | 1625 |
View the full list of 3 ❄️ flaky tests
tests.eager_mode_test.TestEagerModeArguments_0_reference_runtime test_function_input_and_attribute_by_kwargs_out_of_order

Flake rate in main: 38.05% (Passed 5976 times, Failed 3671 times)

Stack Traces | 0.002s run time
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:91: in run
    res = self._run(x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:139: in _run
    res = (convert_from_ml_dtypes(res[0]),)
..../test_torch_nightly/lib/python3.12.../onnx/reference/custom_element_types.py:50: in convert_from_ml_dtypes
    return array.view(dtype=dtype)
E   ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

The above exception was the direct cause of the following exception:
tests/eager_mode_test.py:115: in test_function_input_and_attribute_by_kwargs_out_of_order
    self.assertEqual(add_with_alpha(alpha=3.0, other=2.0, this=1.0), 7.0)
onnxscript/values.py:576: in __call__
    return evaluator.default().eval_function(self, args, kwargs)
onnxscript/evaluator.py:307: in eval_function
    result = function.function(*adapted_args, **adapted_kwargs)
tests/eager_mode_test.py:59: in add_with_alpha
    other = op.Mul(other, alpha)
.../onnx_opset/_impl/opset14.py:696: in Mul
    return op(*self._prepare_inputs(schema, A, B))
onnxscript/values.py:304: in __call__
    return evaluator.default().eval(schema, args, kwargs)
onnxscript/evaluator.py:194: in eval
    outputs = self._eval(schema, inputs, attributes, closure)
onnxscript/evaluator.py:524: in _eval
    result = session.run(None, session_run_input)
..../test_torch_nightly/lib/python3.12.../onnx/reference/reference_evaluator.py:599: in run
    outputs = node.run(*inputs, **linked_attributes)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:114: in run
    res = OpRunBinary.run(self, x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:93: in run
    raise TypeError(
E   TypeError: Issues with types <class 'numpy.ndarray'>, <class 'numpy.ndarray'> (binary operator 'Mul').
tests.eager_mode_test.TestEagerModeArguments_0_reference_runtime test_function_some_input_by_kwargs

Flake rate in main: 38.05% (Passed 5976 times, Failed 3671 times)

Stack Traces | 0.002s run time
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:91: in run
    res = self._run(x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:139: in _run
    res = (convert_from_ml_dtypes(res[0]),)
..../test_torch_nightly/lib/python3.12.../onnx/reference/custom_element_types.py:50: in convert_from_ml_dtypes
    return array.view(dtype=dtype)
E   ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

The above exception was the direct cause of the following exception:
tests/eager_mode_test.py:106: in test_function_some_input_by_kwargs
    self.assertEqual(add_with_alpha(1.0, other=2.0), 3.0)
onnxscript/values.py:576: in __call__
    return evaluator.default().eval_function(self, args, kwargs)
onnxscript/evaluator.py:307: in eval_function
    result = function.function(*adapted_args, **adapted_kwargs)
tests/eager_mode_test.py:59: in add_with_alpha
    other = op.Mul(other, alpha)
.../onnx_opset/_impl/opset14.py:696: in Mul
    return op(*self._prepare_inputs(schema, A, B))
onnxscript/values.py:304: in __call__
    return evaluator.default().eval(schema, args, kwargs)
onnxscript/evaluator.py:194: in eval
    outputs = self._eval(schema, inputs, attributes, closure)
onnxscript/evaluator.py:524: in _eval
    result = session.run(None, session_run_input)
..../test_torch_nightly/lib/python3.12.../onnx/reference/reference_evaluator.py:599: in run
    outputs = node.run(*inputs, **linked_attributes)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:114: in run
    res = OpRunBinary.run(self, x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:93: in run
    raise TypeError(
E   TypeError: Issues with types <class 'numpy.ndarray'>, <class 'numpy.ndarray'> (binary operator 'Mul').
tests.eager_mode_test.TestEagerModeArguments_0_reference_runtime test_function_all_input_by_kwargs

Flake rate in main: 38.05% (Passed 5976 times, Failed 3671 times)

Stack Traces | 0.003s run time
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:91: in run
    res = self._run(x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:139: in _run
    res = (convert_from_ml_dtypes(res[0]),)
..../test_torch_nightly/lib/python3.12.../onnx/reference/custom_element_types.py:50: in convert_from_ml_dtypes
    return array.view(dtype=dtype)
E   ValueError: Changing the dtype of a 0d array is only supported if the itemsize is unchanged

The above exception was the direct cause of the following exception:
tests/eager_mode_test.py:109: in test_function_all_input_by_kwargs
    self.assertEqual(add_with_alpha(this=1.0, other=2.0), 3.0)
onnxscript/values.py:576: in __call__
    return evaluator.default().eval_function(self, args, kwargs)
onnxscript/evaluator.py:307: in eval_function
    result = function.function(*adapted_args, **adapted_kwargs)
tests/eager_mode_test.py:59: in add_with_alpha
    other = op.Mul(other, alpha)
.../onnx_opset/_impl/opset14.py:696: in Mul
    return op(*self._prepare_inputs(schema, A, B))
onnxscript/values.py:304: in __call__
    return evaluator.default().eval(schema, args, kwargs)
onnxscript/evaluator.py:194: in eval
    outputs = self._eval(schema, inputs, attributes, closure)
onnxscript/evaluator.py:524: in _eval
    result = session.run(None, session_run_input)
..../test_torch_nightly/lib/python3.12.../onnx/reference/reference_evaluator.py:599: in run
    outputs = node.run(*inputs, **linked_attributes)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:114: in run
    res = OpRunBinary.run(self, x, y)
..../test_torch_nightly/lib/python3.12.../reference/ops/_op.py:93: in run
    raise TypeError(
E   TypeError: Issues with types <class 'numpy.ndarray'>, <class 'numpy.ndarray'> (binary operator 'Mul').

To view more test analytics, go to the Test Analytics Dashboard

### Adapters

# Compatibility Adapter
def adapter_compatible(op: ir.Node, target_opset):
Collaborator

nit: I recommend node instead of op. We frequently use op for building a node via the op.MatMul(x, y) syntax.

Collaborator

Furthermore, the general interface for a version-converter would likely need an op/node builder as an input, and that would be best called op. We would need it when creating new nodes as part of the version-conversion (e.g., even in the example below, to create a Constant node from an attribute value).
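A minimal sketch of what such a signature might look like, assuming the builder is passed in alongside the node; the names and types here are illustrative, not the API in this PR:

```python
from __future__ import annotations

from typing import Sequence

from onnxscript import ir


def example_adapter(node: ir.Node, op) -> Sequence[ir.Node] | None:
    """Hypothetical adapter: `node` is the node being upgraded; `op` is a
    node builder used to emit any new nodes (e.g. op.Constant(...) when an
    attribute has to become an input)."""
    # Return None when the node can stay as-is, or a list of replacement nodes.
    return None
```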



_ADAPTERS_18_19 = {
"Equal": adapter_compatible,
Collaborator

Minor nit: I think it would be better not to have to register an adapter for compatible extensions, since it is basically an identity operation.

I can see value in explicitly documenting that a particular opset update is backward-compatible (to catch the case where we forget to register an adapter). We could do that using a separate set of all compatible ops.
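For illustration, such a registry split might look like this (the names are hypothetical, not from this PR):

```python
# Ops whose 18 -> 19 change is a backward-compatible extension: no adapter
# needed, but listing them documents that the update was considered.
_COMPATIBLE_OPS_18_19 = frozenset({"Equal"})

# Only ops that actually need rewriting get an adapter registered.
_ADAPTERS_18_19 = {
    # "SomeOp": adapter_someop,  # placeholder; real adapters go here
}
```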


# Compatibility Adapter
def adapter_compatible(op: ir.Node, target_opset):
op.version = target_opset
Collaborator

Setting the version on a node need not be part of the adapter logic. It should be in the converter logic that calls the adapter, since this logic is the same for all adapters. There is no point in duplicating this line in every adapter. That would also make a compatible adapter an identity function, and we wouldn't even need to register one for compatible extensions.

@gramalingam
Collaborator

One of the main design questions relates to the "adapter" signature: what form should it take? Essentially it is a function that takes a single node as a parameter and modifies it in some form. The changes are typically a simple mutation of a node along with potentially other changes (such as the insertion of extra nodes).

For now, I think it might be fine to follow the pattern used in the optimizer and rewriter, which are based on node-transformers that, given an input node, return a sequence of replacement nodes or None (if no replacement is required). This allows a simple loop over all nodes in the graph that transforms each node in sequence. This can be generalized later if necessary.
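A rough sketch of that pattern, assuming adapters return replacement nodes (or None) and the converter itself records the new opset version, as suggested above; everything here is illustrative, not code from this PR:

```python
from __future__ import annotations

from typing import Callable, Sequence

from onnxscript import ir

# A node-transformer maps one node to its replacement nodes, or None if the
# node is unchanged (a compatible extension).
Adapter = Callable[[ir.Node], Sequence[ir.Node] | None]


def convert_one_step(graph: ir.Graph, adapters: dict[str, Adapter], target: int) -> None:
    for node in list(graph):  # snapshot, since adapters may insert/remove nodes
        adapter = adapters.get(node.op_type)
        replacements = adapter(node) if adapter is not None else None
        if replacements is not None:
            # Splicing the replacements into the graph is left to the IR API;
            # each replacement node gets the target version too.
            for new_node in replacements:
                new_node.version = target
        else:
            # No rewrite needed: the converter, not the adapter, bumps the version.
            node.version = target
```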

name=_attr.name,
)
# Add the ir.Value as inputs to the node and graph
node._inputs = node._inputs + (attr_as_input,)
Collaborator

note: avoid using and modifying private fields. To change inputs to a node, always initialize a new node to replace the current one.
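A sketch of the suggested approach; the exact ir.Node constructor arguments are an assumption about the IR API, not code from this PR:

```python
from onnxscript import ir


def with_extra_input(node: ir.Node, attr_as_input: ir.Value) -> ir.Node:
    """Return a replacement for `node` with `attr_as_input` appended to its inputs."""
    return ir.Node(
        node.domain,
        node.op_type,
        inputs=[*node.inputs, attr_as_input],
        attributes=list(node.attributes.values()),
        num_outputs=len(node.outputs),
    )
```

The replacement node would then be spliced in place of the original instead of mutating node._inputs.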

Collaborator

+1 ... we could do what the optimizer and rewriter currently do: use a more generic interface for a node-adapter that returns a list of replacement nodes (or None if no replacement is needed).

Collaborator

Further, we can't just create a value as above ... we need to create a constant value with the given value. For now, the simplest way is to create a new Constant node. (Orthogonally to this, we should extend the "builder" API we currently use to create initializers as well, but that's a separate issue; for now, Constant nodes should be fine.)
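For example, the attribute value could be materialized as a Constant node along these lines; the ir.tensor and ir.AttrTensor helpers here are assumptions about the IR, so treat this as a sketch only:

```python
import numpy as np

from onnxscript import ir


def constant_from_attr(attr) -> ir.Node:
    """Build a Constant node carrying the attribute's value (sketch only)."""
    # Assumed helpers: ir.tensor(...) wraps a numpy array and ir.AttrTensor(...)
    # builds a tensor attribute; adjust to whatever the IR actually provides.
    return ir.Node(
        "",
        "Constant",
        inputs=[],
        attributes=[ir.AttrTensor("value", ir.tensor(np.array(attr.value)))],
        num_outputs=1,
    )
```

The Constant node's single output would then be wired in as the extra input, rather than a bare ir.Value.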

self.assertEqual(nodes[1].version, 19)
self.assertEqual(nodes[4].op_type, "GridSample")
self.assertEqual(nodes[4].version, 20)
self.assertEqual(model.graph._nodes[4]._attributes["mode"].value, "cubic")
Collaborator

note: Avoid accessing internal fields

@shubhambhokare1 self-assigned this on Nov 6, 2024
@shubhambhokare1 added the enhancement (New feature or request), topic: api, and topic: IR (Intermediate representation) labels on Nov 6, 2024
self.custom_adapters = custom_adapter_list

def graph_version_convert(self, graph: ir.Graph, target_version: int) -> None:
if self.model_version == target_version:
Collaborator

(Extension) I think we will soon need to support the case where the incoming model has nodes with different opset versions. At that point, such checks should happen at the node level, not at the model level.


# Iterate from current model version -> target version
# Updating each node based on the correct adapter [ver->ver+1]
for opset_version in range(self.model_version, target_version):
Collaborator

It may be better to explicitly check for target_version being > or < the current version. (It's ok to focus on up-conversion in the first version, but it may be better to emit an error/warning message if we run into down-conversion while it is unimplemented.)

Collaborator

Actually, this could be bundled into a check after pick_adapter_set to more generally handle the case when we don't have an adapter set (for either 23 to 24 or for 18 to 17).
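Something along these lines, as a sketch; pick_adapter_set is the helper referenced above, but the tuple-keyed registry and function names are assumptions:

```python
import warnings


def check_direction(model_version: int, target_version: int) -> None:
    # Fail fast on down-conversion until it is implemented.
    if target_version < model_version:
        raise NotImplementedError(
            f"Down-conversion from opset {model_version} to {target_version} "
            "is not supported yet."
        )


def get_adapter_set(adapter_sets: dict, opset_version: int):
    # Handles both unimplemented steps (e.g. 23 -> 24) and down-conversion
    # (e.g. 18 -> 17) uniformly: no registered adapter set means warn and skip.
    adapters = adapter_sets.get((opset_version, opset_version + 1))
    if adapters is None:
        warnings.warn(
            f"No adapters registered for opset {opset_version} -> {opset_version + 1}"
        )
    return adapters
```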

@justinchuby
Collaborator

justinchuby commented Nov 7, 2024

Questions related to the design doc that come to mind:

  1. How do we ensure down-conversion can be supported in the future?
  2. What invariants do we preserve in the nodes? An opset version may or may not be associated with an ir.Node; how are both cases handled?
  3. How does the design relate to the rewriter? Conceptually version conversion is a model rewriting process. How do we plan to maintain a consistent dev/user experience when authoring and debugging subgraph replacement logic?
  4. When version conversion fails, how should it fail? Succeed partially, abort, etc.? What are the guarantees/invariants of the model state when conversion is not possible?
  5. Any performance considerations?
  6. What is the path for supporting new opsets?
  7. How is the conversion logic tested to ensure it is robust? How is it designed so that future maintenance is simple and scalable?

def __init__(self, target_version: int):
self.target_version = target_version

def process_node(self, node: ir.Node, opset_version):

Check notice — Code scanning / CodeQL: Explicit returns mixed with implicit (fall-through) returns

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
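A minimal illustration of the finding (not the actual process_node in this PR): the first variant falls through and implicitly returns None on one path, the second makes every return explicit.

```python
def process_node_implicit(node, opset_version):
    if node.op_type in ("Equal", "GridSample"):
        return [node]
    # fall-through: implicitly returns None


def process_node_explicit(node, opset_version):
    if node.op_type in ("Equal", "GridSample"):
        return [node]
    return None  # explicit: no replacement needed
```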
model = ir.serde.deserialize_model(model_proto)
target_version = 17
version_converter.convert_version(model, target_version=target_version)
nodes = model.graph._nodes

Check notice — Code scanning / CodeQL: Unused local variable

Variable nodes is not used.