[OpenVINO Quantizer] Update OpenVINO Quantizer Tutorial #3889
| * ``target_device`` - defines the target device, the specificity of which will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``NPU``. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| OpenVINOQuantizer(target_device=nncf.TargetDevice.CPU) | ||
|
|
@daniil-lyakhov I couldn't find `target_device` being used anywhere inside the OpenVINO quantizer.
| * ``mode`` - defines quantization scheme for the model. Multiple modes are supported: | ||
|
|
||
| * ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations | ||
| * ``INT8_SYM`` (default) - defines symmetric quantization of weights and activations. This is the best for performance |
Why leave weight compression behind? Can we extend the example with WC?
In the optional part? I agree, I will add it there.
Also, maybe we can change the link so that it points to a PTQ example in ExecuTorch, such as YOLO, instead of the NNCF ResNet example. What do you think?
What do you mean by "this is the best for performance"? It is unclear.
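For context on the question above, here is a minimal sketch of what symmetric INT8 quantization means in general. It is plain Python, not the OpenVINO quantizer or NNCF implementation, and the function names are hypothetical. The zero-point is fixed at 0, so dequantization is a single multiply by the scale; that is one common reason symmetric schemes are described as performance-friendly, though whether the tutorial text means exactly this is an assumption.

```python
def quantize_symmetric_int8(values):
    """Symmetric per-tensor INT8 quantization with the zero-point fixed at 0."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values; no zero-point term is needed."""
    return [q * scale for q in quantized]


weights = [-1.27, -0.5, 0.0, 0.25, 1.0]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

An asymmetric scheme would add a nonzero zero-point to cover skewed ranges more tightly, at the cost of extra correction terms in integer arithmetic.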
| float_model(Python) Example Input | ||
| \ / | ||
| \ / | ||
| ---------------------------------------------------------- |
| exported_model, quantizer, calibration_dataset, smooth_quant=True, fast_bias_correction=False | ||
| ) | ||
|
|
||
| Weights Only Quantization |
| Weights Only Quantization | |
| Weights Only Compression |
| Data-free algorithms | ||
| ~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| When no calibration data is available, ``compress_pt2e`` can perform weight compression relying solely on the pretrained weights. Data-Free Compression uses only the weight tensor statistics, with no activations observed at any point. It can be combined with the AWQ and Mixed Precision algorithms when richer behavior is needed without giving up the no-dataset workflow. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from nncf.experimental.torch.fx import compress_pt2e | ||
|
|
||
| compressed_model = compress_pt2e(exported_model, quantizer, awq=True, ratio=0.8) | ||
|
|
||
| Mixed Precision algorithms | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| Mixed Precision assigns different bit-widths (e.g. INT4 vs INT8) to individual layers based on their sensitivity, keeping more sensitive layers at higher precision while aggressively compressing the rest. NNCF supports several sensitivity-ranking criteria: | ||
|
|
||
| - **Weight Quantization Error** - Data-free metric that measures the per-layer error introduced by quantizing the weights themselves, requiring no calibration data. | ||
| - **Hessian** - Activation-aware metric that uses second-order information about the loss to estimate how much the model output changes when a layer's weights are perturbed by quantization. | ||
| - **Mean Variance** and **Max Variance** - Activation-aware metrics that rank layers by the mean or maximum variance of their input activations, on the intuition that layers with more spread-out activations are harder to quantize. | ||
| - **Mean Magnitude** - Activation-aware metric that ranks layers by the average magnitude of their input activations. |
Too many characters. Please add it before the AWQ/scale estimation section, at the same hierarchy level. Also put a comment in the code next to the calibration dataset saying that it is optional in data-free mode.
| - **Weight Quantization Error** - Data-free metric that measures the per-layer error introduced by quantizing the weights themselves, requiring no calibration data. | ||
| - **Hessian** - Activation-aware metric that uses second-order information about the loss to estimate how much the model output changes when a layer's weights are perturbed by quantization. | ||
| - **Mean Variance** and **Max Variance** - Activation-aware metrics that rank layers by the mean or maximum variance of their input activations, on the intuition that layers with more spread-out activations are harder to quantize. |
Do we have a doc with all the details? I would prefer a link here.
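To illustrate the data-free criterion listed in the hunk above, the following is a rough sketch of ranking layers by weight quantization error and assigning bit-widths from that ranking. It is plain Python and not NNCF's implementation: `quant_error`, `assign_bit_widths`, and the toy `layers` dictionary are hypothetical, and `ratio` is counted in layers here as a simplification.

```python
def quant_error(weights, bits):
    """Mean squared error after symmetric round-to-nearest quantization."""
    levels = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    total = 0.0
    for w in weights:
        q = max(-levels, min(levels, round(w / scale)))
        total += (w - q * scale) ** 2
    return total / len(weights)


def assign_bit_widths(layers, ratio):
    """Give INT4 to the share `ratio` of layers with the lowest INT4 error, INT8 to the rest."""
    ranked = sorted(layers, key=lambda name: quant_error(layers[name], bits=4))
    n_int4 = int(len(ranked) * ratio)
    return {name: (4 if i < n_int4 else 8) for i, name in enumerate(ranked)}


# Toy weights only; no calibration data is involved at any point.
layers = {
    "uniform": [-0.7, -0.3, 0.0, 0.4, 0.7],     # values sit close to the INT4 grid
    "outlier": [0.01, 0.02, -0.03, 0.04, 7.0],  # one outlier stretches the scale
    "mid": [0.1, -0.2, 0.3, -0.4, 0.5],
}
plan = assign_bit_widths(layers, ratio=0.67)
```

The layer whose small weights are wiped out by the outlier-driven scale ranks as most sensitive and stays at INT8, while the other two are compressed to INT4.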
Description
The OpenVINO quantizer was recently moved from NNCF to ExecuTorch, which breaks the imports used in this tutorial.
This PR fixes the imports to use executorch.
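As a sketch of how code that must run against both layouts could cope with the move, the snippet below probes the two locations in turn. Both module paths are assumptions: `nncf.experimental.torch.fx` is inferred from the imports quoted in this PR, and `executorch.backends.openvino.quantizer` is the presumed new home. The helper name `find_openvino_quantizer` is hypothetical.

```python
import importlib


def find_openvino_quantizer():
    """Return (class, module_path) for OpenVINOQuantizer, or (None, None) if unavailable."""
    candidates = (
        "executorch.backends.openvino.quantizer",  # assumed new location
        "nncf.experimental.torch.fx",              # assumed old location
    )
    for module_path in candidates:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            # Package missing or module moved; try the next candidate.
            continue
        quantizer_cls = getattr(module, "OpenVINOQuantizer", None)
        if quantizer_cls is not None:
            return quantizer_cls, module_path
    return None, None


quantizer_cls, source = find_openvino_quantizer()
if quantizer_cls is None:
    print("OpenVINOQuantizer not found; install executorch with the OpenVINO backend.")
else:
    print(f"Using OpenVINOQuantizer from {source}")
```

The tutorial itself only needs the direct ExecuTorch import; a probe like this is mainly useful for scripts that have to support installations from before and after the move.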