Adding tests for "quantize" function for CNN PTQ #908
I've started working on making unit tests for the `quantize` function.
So yeah, I was wondering if this was all expected behavior. If so, I can add some appropriate documentation! If not, I can start working on "fixes".
The parts of the pre-processing that might be needed are mostly the following: https://github.com/Xilinx/brevitas/blob/master/src/brevitas/graph/quantize.py#L275-L280 These are not always needed and there are cases where they can be skipped, except perhaps for symbolic trace, which is required by the FX quantization backend. Having them makes the quantization process easier.
Conceptually, they do very different things. They are coupled for the sake of these examples, but there are cases where those transformations should not be applied or are not interesting for the model at hand.
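As an illustration of the tracing part, here is a minimal sketch using plain torch.fx on a toy CNN. The model and the call below are placeholders for this example only; the actual entry points in brevitas/graph/quantize.py wrap their own tracing and rewriting passes.

```python
import torch
import torch.nn as nn

# Toy stand-in for an ImageNet-style CNN (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1000),
).eval()

# Symbolic tracing is the one step that is effectively required by the FX
# quantization backend: it turns the model into a torch.fx.GraphModule that
# graph rewrites can operate on.
traced = torch.fx.symbolic_trace(model)

# The other pre-processing steps (e.g. merging BatchNorm into the preceding
# conv, equalization) are optional: they make quantization easier but can be
# skipped when they do not apply to the model at hand.
print(traced.graph)
```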
That is expected. We have a separate entrypoint for LLM quantization and we would like to unify the two at some point. To do that, we might first need tests to ensure we preserve all the existing functionality.
Could you post an example?
Sounds good! I'll experiment a bit with the pre-processing, but will use symbolic trace as the default for now.
That makes sense, in that case I'll delay my Transformer-based test until then.
I'm not entirely sure what the issue was: I was getting 8-bit quantization in every case in my larger example. I'm going to dig into it, see what my error was, and post a minimal example. However, in the meantime, below is an example using the layerwise backend.
So, the issue I ran into, where the requested bit widths were not being applied: using the same setup, I stepped through the model and saw that the input activation and weight tensor were being quantized to 8 bits, not the desired 6 and 3 respectively. Is this expected behavior?
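(For reference, this is roughly how I checked the effective bit widths while stepping through, shown here on a standalone brevitas layer rather than the full example pipeline, so exact attributes may differ slightly across versions.)

```python
import brevitas.nn as qnn

# A standalone quant layer honours the requested 3-bit weight quantization,
# which is what made the 8-bit result from the layerwise example surprising.
conv = qnn.QuantConv2d(4, 8, kernel_size=3, weight_bit_width=3)
print(conv.quant_weight().bit_width)  # expected: tensor(3.)
```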
For layerwise quantization, there are special rules that we have for the first and last layer. The way we identify first/last in that function is a bit hard-coded around the ImageNet examples: we check that the first layer has 3 input channels and that the last has 1000 output channels. If you change the number of input channels in your conv, you should see a difference.
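Roughly, the check boils down to something like the following simplified sketch (the real logic lives in the examples' ptq_common.py and differs in detail):

```python
import torch.nn as nn

# Simplified restatement of the heuristic used by the layerwise backend:
# first/last layers are kept at a wider (default) bit width, and they are
# identified purely by channel counts, which assumes ImageNet-style models.

def looks_like_first_layer(module: nn.Module) -> bool:
    # First layer: the conv consuming the 3 RGB input channels.
    return isinstance(module, nn.Conv2d) and module.in_channels == 3

def looks_like_last_layer(module: nn.Module) -> bool:
    # Last layer: the one producing the 1000 ImageNet logits.
    out = getattr(module, "out_features", None) or getattr(module, "out_channels", None)
    return out == 1000
```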
I found the function that does the 3/1000 input/output channel identification:
I can confirm that changing the number of input channels gives the "normal" behavior. Is there any desire to un-hard-code the 3 vs 1000 channel checks? It seems a bit fragile, but I'd imagine one would need to use FX mode to get insight into which layer is actually first or last. Alternatively, we could add a logged warning that we're falling back to the default bit widths when one chooses the layerwise backend (roughly what I sketch below).

I've opened a PR with some preliminary tests, and I will be adding more (e.g. whatever tests you guys want!). The ones that are currently failing are cases where I give invalid inputs (e.g. zero-valued or negative-valued bit widths), where no error is currently raised.
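Something along these lines is what I had in mind for that warning (purely a sketch, all names hypothetical):

```python
import logging

log = logging.getLogger(__name__)

def resolve_bit_width(layer_name, requested, is_first_or_last, default=8):
    # Hypothetical helper: if the first/last-layer heuristic fires, fall back
    # to the default bit width but tell the user instead of silently
    # overriding what they asked for.
    if is_first_or_last and requested != default:
        log.warning(
            "Layer %s matched the first/last-layer heuristic; using the default "
            "bit width %d instead of the requested %d.",
            layer_name, default, requested)
        return default
    return requested
```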
I've done a bit of testing, and negative/zero bit widths are considered valid as long as the model isn't actually used for anything; quantization itself completes without error.
If one feeds data through the model with zero bit widths, it outputs NaNs. However, if one feeds data through a model with negative bit widths, it still outputs values. I'm digging into this a bit more because I'm curious about what's happening inside the model, but in either case I imagine we should add some asserts to make sure all provided bit widths are positive integers. I opened a PR for it. I'm not sure if the integer constraint I added is the desired behavior, or if I should add it to other functions as well.
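For reference, the kind of check and test I have in mind looks roughly like this (a sketch; the actual function and argument names in the PR may differ):

```python
import pytest

def validate_bit_width(bit_width):
    # Proposed guard (sketch): bit widths must be positive integers.
    if isinstance(bit_width, bool) or not isinstance(bit_width, int) or bit_width <= 0:
        raise ValueError(f"Bit width must be a positive integer, got {bit_width!r}")
    return bit_width

@pytest.mark.parametrize("bad_bit_width", [0, -1, -8, 2.5])
def test_invalid_bit_widths_are_rejected(bad_bit_width):
    with pytest.raises(ValueError):
        validate_bit_width(bad_bit_width)
```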
Here we keep track of which parts of `quantize` in `ptq_common.py` are tested and which are still missing.