[Backend][Relax] Add NPU BYOC backend example #18247
Aristide021 wants to merge 1 commit into apache:main
cc @mshr-h, could you help take a look?
Force-pushed from 8fcab1b to f58f54a.
@tvm-bot rerun
Very nice work, congratulations! As a simple user who might have such interests, walking through the [...] E.g., the description given here in the header of the PR is very useful, but users don't "immediately" read the originating PR.
@Aristide021
Thank you for the PR! Overall looks good to me. Please fix the CI error.
Force-pushed from f58f54a to a8ddc87.
@cbalint13 Thank you for the feedback! I've added a comprehensive README.md in the latest commit that
Force-pushed from fdc0fa3 to 10825bb.
Hi @Aristide021, this still looks very helpful. Do you have bandwidth to continue pushing it forward and address the requested changes? I'd be happy to help with follow-up if that would be useful.
Thanks for checking in. I was busy with school, but I'm currently on a term break. I'll be picking this back up this weekend. Should have updates soon.
…#18247)
- Add cmake/modules/contrib/ExampleNPU.cmake with USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME flags
- Wire cmake flags into CMakeLists.txt, LibInfo.cmake, and libinfo.cc
- Add softmax pattern to patterns.py
- Restructure README: Context section moved to top, add build instructions, inline MatmulReLU definition in Quick Start
- Add docs/how_to/tutorials/byoc_npu_example.py tutorial
cc @mshr-h @FrozenGene
Adds a vendor-neutral example NPU backend demonstrating the BYOC (Bring Your Own Codegen) pattern for custom accelerator integration in TVM's Relax framework.

Components added:
- python/tvm/relax/backend/contrib/example_npu/: pattern registry with op support for matmul, conv1d/2d, depthwise conv2d, pooling, batch norm, softmax, activations, elementwise ops, quantization, and a fused conv2d+relu pattern
- src/relax/backend/contrib/example_npu/codegen.cc: JSON serializer registered as relax.ext.example_npu
- src/runtime/contrib/example_npu/example_npu_runtime.cc: JSON runtime demonstrating NPU architectural concepts (memory hierarchy, tiling, execution engines, quantization) via CPU emulation
- cmake/modules/contrib/ExampleNPU.cmake: build integration via USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME flags
- docs/how_to/tutorials/byoc_npu_example.py: tutorial walking through the full BYOC flow from pattern registration to runtime execution
- tests/python/contrib/test_example_npu.py: test suite covering pattern registration, graph partitioning, codegen, and end-to-end execution

CI is enabled via tests/scripts/task_config_build_cpu.sh.

Addresses reviewer feedback from apache#18247: cmake integration, self-contained README with build instructions, tutorial in docs/how_to, and Context section reorganization.
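For readers new to BYOC, the pattern-registration and graph-partitioning steps named in the commit message can be pictured with a small pure-Python toy. This is only a sketch and does not use TVM's pattern registry or any TVM API: the `PATTERNS` table, the `Region` class, the `partition` function, and composite names such as `example_npu.conv2d_relu` are hypothetical and chosen for illustration. It assumes a linear op sequence rather than a real dataflow graph.

```python
from dataclasses import dataclass

# Toy pattern table: composite name -> op sequence it covers.
# Hypothetical names; NOT TVM's pattern registry. Longer (fused) patterns
# are listed first so they win over their single-op prefixes.
PATTERNS = [
    ("example_npu.conv2d_relu", ("conv2d", "relu")),
    ("example_npu.conv2d", ("conv2d",)),
    ("example_npu.matmul", ("matmul",)),
    ("example_npu.softmax", ("softmax",)),
]


@dataclass(frozen=True)
class Region:
    target: str  # "example_npu" for offloaded regions, "host" otherwise
    name: str    # composite pattern name, or the op name for host ops
    ops: tuple   # ops covered by this region


def partition(ops):
    """Greedily split a linear op sequence into offloaded and host regions."""
    regions = []
    i = 0
    while i < len(ops):
        for name, pattern in PATTERNS:
            if tuple(ops[i:i + len(pattern)]) == pattern:
                regions.append(Region("example_npu", name, pattern))
                i += len(pattern)
                break
        else:
            # No pattern matched at position i: the op stays on the host.
            regions.append(Region("host", ops[i], (ops[i],)))
            i += 1
    return regions


graph = ["conv2d", "relu", "reshape", "matmul", "softmax"]
regions = partition(graph)
for region in regions:
    print(f"{region.target:12s} {region.name}")
```

In the actual backend, the matched regions would be handed to the codegen registered as relax.ext.example_npu and executed by the JSON runtime; the toy stops at the partitioning decision.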
Force-pushed from 10825bb to c96323f.
Hi @tlopex, there's a CI block that appears to be a workflow approval issue from the original PR, which mentioned mshr-h. I've removed the mention. Could you approve the workflow run so the checks can proceed? I've also added a C++ codegen, runtime, and test suite beyond the earlier commit.
We made some changes in our CI, so please consider creating a new PR against the main branch and resubmitting. Sorry about the trouble, @Aristide021.
This commit introduces a vendor-neutral NPU backend that demonstrates architectural patterns common across Neural Processing Units.
The implementation covers key NPU concepts including multi-tier memory hierarchy management, automatic tiling for large tensors, quantization handling, and specialized execution engines. It shows how NPUs manage memory across different tiers (L0/L1/L2/L3), tile operations to fit in on-chip SRAM, and dispatch operations to dedicated compute units.
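As a rough illustration of the tiling idea described above, the following pure-Python sketch picks a tile shape whose footprint fits a given SRAM budget and then applies ReLU tile by tile. It is not the runtime's code: `plan_tile_shape`, `iter_tiles`, `tiled_relu`, the halving heuristic, and the byte budgets are all assumptions made for this example.

```python
def plan_tile_shape(rows, cols, bytes_per_elem, sram_bytes):
    """Halve the larger tile dimension until one tile fits in sram_bytes."""
    tile_r, tile_c = rows, cols
    while tile_r * tile_c * bytes_per_elem > sram_bytes:
        if tile_r == 1 and tile_c == 1:
            raise ValueError("a single element does not fit in the SRAM budget")
        if tile_r >= tile_c:
            tile_r = (tile_r + 1) // 2
        else:
            tile_c = (tile_c + 1) // 2
    return tile_r, tile_c


def iter_tiles(rows, cols, tile_r, tile_c):
    """Yield (row_start, row_end, col_start, col_end) for each tile."""
    for r0 in range(0, rows, tile_r):
        for c0 in range(0, cols, tile_c):
            yield r0, min(r0 + tile_r, rows), c0, min(c0 + tile_c, cols)


def tiled_relu(matrix, sram_bytes, bytes_per_elem=4):
    """Apply ReLU one tile at a time, emulating load/compute/store per tile."""
    rows, cols = len(matrix), len(matrix[0])
    tile_r, tile_c = plan_tile_shape(rows, cols, bytes_per_elem, sram_bytes)
    out = [[0.0] * cols for _ in range(rows)]
    for r0, r1, c0, c1 in iter_tiles(rows, cols, tile_r, tile_c):
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = max(0.0, matrix[r][c])
    return out


data = [[float(r - c) for c in range(5)] for r in range(3)]
result = tiled_relu(data, sram_bytes=16)
print(plan_tile_shape(3, 5, 4, 16), result[2])
```

A real backend would also account for the operand and output buffers that must reside on chip together, and for alignment constraints; the sketch budgets for a single tile only.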
This serves as an educational template for developers creating NPU backends, demonstrating BYOC integration while teaching NPU-specific optimization strategies. It uses CPU emulation for testing, so no actual NPU hardware is required.
Update (April 2026): addresses prior review feedback and refreshes the PR on top of current main.

This update includes:
- tests/python/contrib/test_example_npu.py
- USE_EXAMPLE_NPU_CODEGEN
- USE_EXAMPLE_NPU_RUNTIME
- cmake/modules/contrib/ExampleNPU.cmake
- src/relax/backend/contrib/example_npu/codegen.cc
- python/tvm/relax/backend/contrib/example_npu/README.md
- docs/how_to/tutorials/byoc_npu_example.py
- [...] main, reruns lint/tests, and resolves requested review changes
- example_npu.softmax with tests