Including tokenizer to onnx model / basic usage of the onnxruntime-extensions #798
Comments
If you want to combine any processing model into the ONNX model, please use this function https://onnx.ai/onnx/api/compose.html#onnx.compose.merge_models |
Thank you so much for the quick reply! I will give it a try! Are there any alternatives?
Would it then be possible to create a pipeline using:
Or in other words, which one is the easiest and most straight-forward method? :-) Really appreciate your help! Cheers, M |
This approach works at a lower level, which requires knowledge of ONNX and of the tokenization data, and it is prone to errors. So, it is recommended to only use |
Alright, I think I solved it using the gen_processing_models() and merge_models() functions :-) I attach my solution as a reference for others who encounter a similar problem:
import torch
from onnxruntime_extensions import gen_processing_models
from onnxruntime_extensions import get_library_path
import onnx
import onnxruntime as ort
import numpy as np
from transformers import RobertaForSequenceClassification, RobertaTokenizer
# Step 1: Load the Huggingface Roberta tokenizer and model
input_text = "A test text!"
model_type = "roberta-base"
model = RobertaForSequenceClassification.from_pretrained(model_type)
tokenizer = RobertaTokenizer.from_pretrained(model_type)
# Step 2: Export the tokenizer to ONNX using gen_processing_models
onnx_tokenizer_path = "tokenizer.onnx"
# Generate the tokenizer ONNX model
tokenizer_onnx_model = gen_processing_models(tokenizer, pre_kwargs={})[0]
# Save the tokenizer ONNX model
with open(onnx_tokenizer_path, "wb") as f:
    f.write(tokenizer_onnx_model.SerializeToString())
# Step 3: Export the Huggingface Roberta model to ONNX
onnx_model_path = "model.onnx"
dummy_input = tokenizer("This is a dummy input", return_tensors="pt")
# Export the model to ONNX
torch.onnx.export(
    model,  # model to be exported
    (dummy_input["input_ids"], dummy_input["attention_mask"]),  # model inputs (dummy input)
    onnx_model_path,  # where to save the ONNX model
    input_names=["input_ids", "attention_mask_input"],  # input tensor names
    output_names=["logits"],  # output tensor names
    dynamic_axes={  # axes whose size may change at inference time
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        # the mask has the same shape as input_ids, so it needs the same dynamic axes
        "attention_mask_input": {0: "batch_size", 1: "sequence_length"},
        "logits": {0: "batch_size"},
    },
)
# Step 4: Merge the tokenizer and model ONNX files into one
onnx_combined_model_path = "combined_model_tokenizer.onnx"
# Load the tokenizer and model ONNX files
tokenizer_onnx_model = onnx.load(onnx_tokenizer_path)
model_onnx_model = onnx.load(onnx_model_path)
# Inspect the ONNX models to find the correct input/output names
print("Tokenizer Model Inputs:", [node.name for node in tokenizer_onnx_model.graph.input])
print("Tokenizer Model Outputs:", [node.name for node in tokenizer_onnx_model.graph.output])
print("Model Inputs:", [node.name for node in model_onnx_model.graph.input])
print("Model Outputs:", [node.name for node in model_onnx_model.graph.output])
# Merge the tokenizer and model ONNX files
combined_model = onnx.compose.merge_models(
    tokenizer_onnx_model,
    model_onnx_model,
    io_map=[("input_ids", "input_ids"), ("attention_mask", "attention_mask_input")],
)
# Save the combined model
onnx.save(combined_model, onnx_combined_model_path)
# Step 5: Test the combined ONNX model using an Inference session with ONNX Runtime Extensions
# Initialize ONNX Runtime SessionOptions and load custom ops library
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())
# Initialize ONNX Runtime Inference session with Extensions
session = ort.InferenceSession(onnx_combined_model_path, sess_options=sess_options, providers=['CPUExecutionProvider'])
# Prepare dummy input text
input_feed = {"input_text": np.asarray([input_text])} # Assuming "input_text" is the input expected by the tokenizer
# Run the model
outputs = session.run(None, input_feed)
# Print the outputs
print("logits:", outputs[1][0])
Thanks for the help! Cheers, M |
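Before calling onnx.compose.merge_models as in the solution above, the io_map pairs can be checked against the names printed in the inspection step. The following is only a minimal sketch: check_io_map is a hypothetical helper (it is not part of onnx or onnxruntime-extensions), and the name lists are the ones used in this thread, which may differ for other tokenizers or exports.

```python
def check_io_map(first_outputs, second_inputs, io_map):
    """Check (output, input) pairs before merging two ONNX graphs.

    Hypothetical helper for illustration only.
    Returns (problems, remaining_inputs):
      problems          - human-readable descriptions of bad pairs
      remaining_inputs  - inputs of the second model that no pair feeds;
                          these stay as inputs of the merged model
    """
    problems = []
    fed_inputs = set()
    for out_name, in_name in io_map:
        if out_name not in first_outputs:
            problems.append(f"'{out_name}' is not an output of the first model")
        if in_name not in second_inputs:
            problems.append(f"'{in_name}' is not an input of the second model")
        if in_name in fed_inputs:
            problems.append(f"'{in_name}' is fed by more than one output")
        fed_inputs.add(in_name)
    remaining_inputs = [name for name in second_inputs if name not in fed_inputs]
    return problems, remaining_inputs


# Names as used in this thread (assumed; print your own graphs to confirm).
tokenizer_outputs = ["input_ids", "attention_mask"]
model_inputs = ["input_ids", "attention_mask_input"]

good_map = [("input_ids", "input_ids"), ("attention_mask", "attention_mask_input")]
bad_map = [("input_ids", "input_ids"), ("attention_mask", "attention_mask")]

good_result = check_io_map(tokenizer_outputs, model_inputs, good_map)
bad_result = check_io_map(tokenizer_outputs, model_inputs, bad_map)
print(good_result)
print(bad_result)
```

In a real script the two name lists would come from [node.name for node in graph.output] of the tokenizer model and [node.name for node in graph.input] of the exported model, exactly as printed in Step 4.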
Hi @MLRadfys, I am currently stuck on decoding the ids back to text. Is there an example that I can use for reference? |
There is an example here:
The second model returned by gen_processing_models (the post-processing model, requested through post_kwargs) is the ONNX model that can decode the ids into text.
|
Hi @MLRadfys and @wenbingl ,
I did not see attention_mask included in the output. I understand that attention_mask is typically used for filtering out padding tokens. To address this, I could add an additional step that generates an array of all ones with the same length as input_ids (my goal is to generate embeddings for each sentence separately, so I don't need the padding step). However, I would like to confirm whether this approach is correct or whether there might be an issue in my conversion process.
The results are:
|
Yes, you are right. The attention_mask is a binary mask that is 1 for every non-padding id and 0 for padding ids in tokenization scenarios. |
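The mask described above can be sketched with NumPy. This is an illustrative sketch, not code from the library: the two helper names are hypothetical, the token ids in the example are made up, and pad_id=1 assumes the roberta-base vocabulary, where the padding token has id 1.

```python
import numpy as np


def ones_attention_mask(input_ids):
    """Mask for unpadded input: every position is a real token, so every entry is 1."""
    ids = np.asarray(input_ids, dtype=np.int64)
    return np.ones_like(ids)


def padded_attention_mask(input_ids, pad_id):
    """Mask for padded input: 1 where the id is a real token, 0 where it equals pad_id."""
    ids = np.asarray(input_ids, dtype=np.int64)
    return (ids != pad_id).astype(np.int64)


# Made-up ids for illustration; the two trailing 1s play the role of padding.
unpadded = [[0, 250, 1296, 2]]
padded = [[0, 250, 1296, 2, 1, 1]]

mask_unpadded = ones_attention_mask(unpadded)
mask_padded = padded_attention_mask(padded, pad_id=1)
print(mask_unpadded)
print(mask_padded)
```

For a single unpadded sentence both functions give the same result, which is why an all-ones array is sufficient in that case. Either array can be passed to the model under whatever name its attention mask input has.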
Hi and thanks for this great library!
I am very new to ONNX and I am trying to include the RoBERTa tokenizer in a RoBERTa ONNX model.
As far as I understand, one can get the ONNX graph for the tokenizer using:
import onnxruntime as _ort
from transformers import RobertaTokenizer
from onnxruntime_extensions import OrtPyFunction, gen_processing_models
# Roberta tokenizer
tokenizer = RobertaTokenizer.from_pretrained("roberta-base", model_max_length=512)
# Convert the tokenizer into an ONNX pre-processing model and wrap it as a callable
tokenizer_onnx = OrtPyFunction(gen_processing_models(tokenizer, pre_kwargs={})[0])
Now I am wondering what the next step is. How can I combine the ONNX tokenizer (or graph) with a model?
Thanks in advance for any help,
cheers,
M