[OpenVINO Quantizer] Update OpenVINO Quantizer Tutorial #3889

Draft
anzr299 wants to merge 7 commits into pytorch:main from anzr299:patch-1

@anzr299 anzr299 commented May 11, 2026

Description

The OpenVINO quantizer was recently moved from NNCF to ExecuTorch, which breaks the imports used in this tutorial.
This PR updates the imports to use ExecuTorch.
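
For environments that may still have the old package layout, here is a minimal sketch of resolving the quantizer from whichever package provides it. The module paths in `CANDIDATE_MODULES` and the helper `resolve_symbol` are assumptions for illustration and are not confirmed by this PR; check the installed packages for the actual locations.

```python
import importlib

# Assumed module paths, newest first. Neither path is confirmed by this PR.
CANDIDATE_MODULES = [
    "executorch.backends.openvino.quantizer",  # assumed new home in ExecuTorch
    "nncf.experimental.torch.fx",              # assumed previous home in NNCF
]


def resolve_symbol(name, candidate_modules):
    """Return (module_path, attribute) from the first module that exposes `name`.

    Modules that are not installed, or that lack the attribute, are skipped.
    Raises ImportError when no candidate provides the symbol.
    """
    for module_path in candidate_modules:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # package not installed, try the next candidate
        if hasattr(module, name):
            return module_path, getattr(module, name)
    raise ImportError(f"{name!r} not found in any of: {list(candidate_modules)}")
```

Calling `resolve_symbol("OpenVINOQuantizer", CANDIDATE_MODULES)` would return the class from the first package that has it, or raise `ImportError` if none of the candidates is installed.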

Checklist

  • The issue that is being fixed is referenced in the description (see above "Fixes #ISSUE_NUMBER")
  • [x] Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • [x] No unnecessary issues are included in this pull request.

pytorch-bot Bot commented May 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3889

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the cla signed label May 11, 2026
Comment on lines -168 to -173
* ``target_device`` - defines the target device, the specificity of which will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``NPU``.

.. code-block:: python

    OpenVINOQuantizer(target_device=nncf.TargetDevice.CPU)

Author

@daniil-lyakhov I couldn't find `target_device` being used anywhere inside the OpenVINO quantizer.

@anzr299 anzr299 marked this pull request as draft May 11, 2026 13:22
* ``mode`` - defines quantization scheme for the model. Multiple modes are supported:

* ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations
* ``INT8_SYM`` (default) - defines symmetric quantization of weights and activations. This is the best for performance

Contributor

Why leave weight compression behind? Can we extend the example with WC?

Author

In the optional part? I agree, I will add it there.
Also, maybe we can change the link so that it points to a PTQ example in ExecuTorch, such as YOLO, instead of the NNCF ResNet example. What do you think?

Contributor

Good idea

Contributor

What do you mean by "this is the best for performance"? It is unclear.
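
One possible reading of "best for performance", offered as an assumption rather than the tutorial's stated rationale: symmetric quantization fixes the zero point at 0, so integer kernels can skip the zero-point correction terms that an asymmetric scheme needs. The sketch below contrasts generic symmetric and asymmetric INT8 schemes in plain Python. The function names are hypothetical, and the code is not the OpenVINO or NNCF implementation.

```python
def quantize_symmetric(values, num_bits=8):
    """Signed symmetric quantization: zero point is implicitly 0."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_symmetric(q, scale):
    # Only a multiply is needed: no zero-point subtraction.
    return [x * scale for x in q]


def quantize_asymmetric(values, num_bits=8):
    """Unsigned asymmetric quantization: needs an explicit zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1  # 0..255 for INT8
    lo = min(min(values), 0.0)  # the range must contain 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_asymmetric(q, scale, zero_point):
    # The extra subtraction is the cost symmetric schemes avoid.
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.3, 0.0, 0.25, 1.0]
q_sym, scale_sym = quantize_symmetric(weights)
q_asym, scale_asym, zp = quantize_asymmetric(weights)
print("symmetric :", q_sym, "scale =", scale_sym)
print("asymmetric:", q_asym, "scale =", scale_asym, "zero_point =", zp)
```

The trade-off is that the asymmetric scheme uses the full integer range for skewed data (for example, non-negative activations), which is why mixed schemes also exist.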

    float_model(Python)                          Example Input
        \                                              /
         \                                            /
    --------------------------------------------------------

Contributor

?

exported_model, quantizer, calibration_dataset, smooth_quant=True, fast_bias_correction=False
)

Weights Only Quantization

Contributor

Suggested change
- Weights Only Quantization
+ Weights Only Compression

Comment on lines +267 to +286
Data-free algorithms
~~~~~~~~~~~~~~~~~~~~

When no calibration data is available, ``compress_pt2e`` can perform weight compression relying solely on the pretrained weights. Data-Free Compression uses only the weight tensor statistics, with no activations observed at any point. It can be combined with the AWQ and Mixed Precision algorithms when richer behavior is needed without giving up the no-dataset workflow.

.. code-block:: python

    from nncf.experimental.torch.fx import compress_pt2e

    compressed_model = compress_pt2e(exported_model, quantizer, awq=True, ratio=0.8)

Mixed Precision algorithms
~~~~~~~~~~~~~~~~~~~~~~~~~~

Mixed Precision assigns different bit-widths (e.g. INT4 vs INT8) to individual layers based on their sensitivity, keeping more sensitive layers at higher precision while aggressively compressing the rest. NNCF supports several sensitivity-ranking criteria:

- **Weight Quantization Error** - Data-free metric that measures the per-layer error introduced by quantizing the weights themselves, requiring no calibration data.
- **Hessian** - Activation-aware metric that uses second-order information about the loss to estimate how much the model output changes when a layer's weights are perturbed by quantization.
- **Mean Variance** and **Max Variance** - Activation-aware metrics that rank layers by the mean or maximum variance of their input activations, on the intuition that layers with more spread-out activations are harder to quantize.
- **Mean Magnitude** - Activation-aware metric that ranks layers by the average magnitude of their input activations.
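
The data-free Weight Quantization Error criterion can be illustrated with a minimal, library-free sketch: quantize each layer's weights at the low bit-width, rank layers by the resulting error, and keep the most sensitive ones at the higher precision. `quantization_error` and `assign_bit_widths` are hypothetical helpers, the mean-squared error is an assumed stand-in for NNCF's metric, and `ratio` is applied here to the layer count for simplicity, which may differ from how `compress_pt2e` interprets it.

```python
def quantization_error(weights, num_bits):
    """Mean squared error of symmetric per-tensor quantization (assumed metric)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return 0.0  # an all-zero tensor quantizes losslessly
    scale = max_abs / qmax
    total = 0.0
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))
        total += (w - q * scale) ** 2
    return total / len(weights)


def assign_bit_widths(layers, ratio, low_bits=4, high_bits=8):
    """Give the least sensitive `ratio` share of layers `low_bits`, the rest `high_bits`.

    `layers` maps a layer name to a flat list of its weights.
    """
    # Least sensitive first: smallest error when quantized to low_bits.
    ranked = sorted(layers, key=lambda name: quantization_error(layers[name], low_bits))
    num_low = int(ratio * len(ranked))
    return {
        name: (low_bits if index < num_low else high_bits)
        for index, name in enumerate(ranked)
    }


# Toy weights, invented for illustration only.
example_layers = {
    "zeros": [0.0, 0.0, 0.0, 0.0],
    "outlier": [0.01, 0.02, -0.01, 8.0],
    "uniform": [-1.0, -0.5, 0.5, 1.0],
    "big": [-10.0, -3.3, 4.7, 10.0],
}
bit_widths = assign_bit_widths(example_layers, ratio=0.5)
print(bit_widths)
```

Because this metric needs only the weights, it fits the no-dataset workflow; the activation-aware criteria would additionally require calibration inputs.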

Contributor

Too many characters. Please add it before the AWQ/scale estimation section, at the same hierarchy level. Also add a comment in the code next to the calibration dataset noting that it is optional in data-free mode.

Comment on lines +283 to +285
- **Weight Quantization Error** - Data-free metric that measures the per-layer error introduced by quantizing the weights themselves, requiring no calibration data.
- **Hessian** - Activation-aware metric that uses second-order information about the loss to estimate how much the model output changes when a layer's weights are perturbed by quantization.
- **Mean Variance** and **Max Variance** - Activation-aware metrics that rank layers by the mean or maximum variance of their input activations, on the intuition that layers with more spread-out activations are harder to quantize.

Contributor

Do we have a doc with all the details? I would prefer a link here.
