[Backend][Relax] Add NPU BYOC backend example #18247

Open
Aristide021 wants to merge 1 commit into apache:main from Aristide021:contrib-npu-generic

Aristide021 commented Aug 28, 2025

This commit introduces a vendor-neutral NPU backend that demonstrates architectural patterns common across Neural Processing Units.

The implementation covers key NPU concepts including multi-tier memory hierarchy management, automatic tiling for large tensors, quantization handling, and specialized execution engines. It shows how NPUs manage memory across different tiers (L0/L1/L2/L3), tile operations to fit in on-chip SRAM, and dispatch operations to dedicated compute units.
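To make the tiling idea concrete, here is a minimal sketch in plain Python. It is not the PR's runtime code: the 64 KiB budget, the float32 element size, and the helper names `pick_tile` and `tiled_matmul` are assumptions chosen for illustration. It shows how a large matmul could be split into blocks whose working set (one tile each of A, B, and C) fits a fixed on-chip budget.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical on-chip SRAM budget and element size; not taken from the PR.
SRAM_BYTES = 64 * 1024
ELEM_BYTES = 4  # float32


def pick_tile(m: int, n: int, k: int, budget: int = SRAM_BYTES) -> int:
    """Return a square tile edge t such that one t x t tile each of
    A, B and C fits in the budget at the same time."""
    t = max(m, n, k)
    while t > 1 and 3 * t * t * ELEM_BYTES > budget:
        t = (t + 1) // 2  # halve (rounding up) until the working set fits
    return t


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Compute a @ b block by block, visiting one tile-sized block at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One tile-sized unit of work. On real hardware the three
                # blocks would be staged in on-chip memory at this point.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

Calling `tiled_matmul(a, b, pick_tile(len(a), len(b[0]), len(b)))` gives the same result as an untiled matmul; only the order in which the data is visited changes.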

This serves as an educational template for developers creating NPU backends, demonstrating BYOC integration while teaching NPU-specific optimization strategies. It uses CPU emulation for testing, so no actual NPU hardware is required.


Update (April 2026): addresses prior review feedback and refreshes the PR on top of current main.

This update:

  • places the tests under tests/python/contrib/test_example_npu.py
  • adds build integration for example NPU codegen/runtime:
    • USE_EXAMPLE_NPU_CODEGEN
    • USE_EXAMPLE_NPU_RUNTIME
    • cmake/modules/contrib/ExampleNPU.cmake
    • CMake wiring for codegen/runtime sources
  • adds BYOC codegen entrypoint at src/relax/backend/contrib/example_npu/codegen.cc
  • documents enablement and usage in python/tvm/relax/backend/contrib/example_npu/README.md
  • adds a docs tutorial at docs/how_to/tutorials/byoc_npu_example.py
  • rebases the branch to current main, reruns lint/tests, and resolves requested review changes
  • extends pattern coverage to include example_npu.softmax with tests
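As a rough illustration of what pattern coverage means in a BYOC flow, the sketch below groups a sequence of ops into regions that are offloaded to the backend and regions left on the host. It is plain Python and does not use TVM's API. Only the name `example_npu.softmax` comes from this PR; the other registry entries, the `PATTERNS` table, the `partition` helper, and the `"host"` label are hypothetical, and a real graph is a DAG rather than a flat list.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: pattern name -> the op sequence it matches.
# Only "example_npu.softmax" is named in the PR; the rest are illustrative.
PATTERNS: Dict[str, Tuple[str, ...]] = {
    "example_npu.conv2d_relu": ("conv2d", "relu"),
    "example_npu.matmul": ("matmul",),
    "example_npu.softmax": ("softmax",),
}


def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Greedily group a linear op sequence into (target, ops) regions.

    Longer patterns are tried first so a fused pattern wins over
    single-op patterns. Unmatched ops fall back to the "host" target.
    """
    ordered = sorted(PATTERNS.items(), key=lambda kv: -len(kv[1]))
    regions: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(ops):
        for name, seq in ordered:
            if tuple(ops[i:i + len(seq)]) == seq:
                regions.append((name, list(seq)))
                i += len(seq)
                break
        else:
            # No pattern matched at position i: leave this op on the host.
            regions.append(("host", [ops[i]]))
            i += 1
    return regions
```

In this toy model, adding an entry to the registry is enough to turn a previously host-only op into an offloaded region.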

@tqchen
Member

tqchen commented Aug 28, 2025

cc @mshr-h, could you help take a look?

Aristide021 force-pushed the contrib-npu-generic branch 5 times, most recently from 8fcab1b to f58f54a, on August 28, 2025 20:03
@mshr-h
Contributor

mshr-h commented Aug 29, 2025

@tvm-bot rerun

@cbalint13
Contributor

cbalint13 commented Aug 29, 2025

@Aristide021 ,

Very nice work, congratulations!

As a simple user who might have such interests, walking through the contrib section, could there be a simple README.md companion in python/tvm/relax/backend/contrib/example_npu with a basic description of the folder's contents? It could give a summary, purpose, technical notes, or a diagram (not necessarily all of these), perhaps even output from the examples.

E.g. the description given here in the header of the PR is very useful but users don't "immediately" read the originating PR.

@mshr-h
Contributor

mshr-h left a comment

@Aristide021
Thank you for the PR! Overall looks good to me. Please fix the CI error.

Comment thread tests/python/relax/test_example_npu.py
@Aristide021
Author

@cbalint13 Thank you for the feedback! I've added a comprehensive README.md in the latest commit. It includes the context and documentation a user would need when implementing an NPU backend.

Aristide021 force-pushed the contrib-npu-generic branch 3 times, most recently from fdc0fa3 to 10825bb, on August 30, 2025 14:52
Comment thread python/tvm/relax/backend/contrib/example_npu/README.md
Comment thread python/tvm/relax/backend/contrib/example_npu/README.md Outdated
Comment thread src/runtime/contrib/example_npu/example_npu_runtime.cc
Comment thread python/tvm/relax/backend/contrib/example_npu/README.md Outdated
Comment thread python/tvm/relax/backend/contrib/example_npu/README.md Outdated
@tlopex
Member

tlopex commented Apr 17, 2026

Hi @Aristide021, this still looks very helpful. Do you have bandwidth to continue pushing it forward and address the requested changes? I’d be happy to help with follow-up if that would be useful.

@Aristide021
Author

Thanks for checking in. I was busy with school, but I'm currently on a term break. I'll be picking this back up this weekend. Should have updates soon.

Aristide021 added a commit to Aristide021/tvm that referenced this pull request Apr 19, 2026
…#18247)

- Add cmake/modules/contrib/ExampleNPU.cmake with USE_EXAMPLE_NPU_CODEGEN
  and USE_EXAMPLE_NPU_RUNTIME flags
- Wire cmake flags into CMakeLists.txt, LibInfo.cmake, and libinfo.cc
- Add softmax pattern to patterns.py
- Restructure README: Context section moved to top, add build instructions,
  inline MatmulReLU definition in Quick Start
- Add docs/how_to/tutorials/byoc_npu_example.py tutorial
@tlopex
Member

tlopex commented Apr 20, 2026

cc @mshr-h @FrozenGene
Please have a look. I think we can merge this first and follow up with any small modifications if needed.

Aristide021 added a commit to Aristide021/tvm that referenced this pull request Apr 20, 2026
Adds a vendor-neutral example NPU backend demonstrating the BYOC
(Bring Your Own Codegen) pattern for custom accelerator integration
in TVM's Relax framework.

Components added:
- python/tvm/relax/backend/contrib/example_npu/: pattern registry with
  op support for matmul, conv1d/2d, depthwise conv2d, pooling, batch
  norm, softmax, activations, elementwise ops, quantization, and a
  fused conv2d+relu pattern
- src/relax/backend/contrib/example_npu/codegen.cc: JSON serializer
  registered as relax.ext.example_npu
- src/runtime/contrib/example_npu/example_npu_runtime.cc: JSON runtime
  demonstrating NPU architectural concepts (memory hierarchy, tiling,
  execution engines, quantization) via CPU emulation
- cmake/modules/contrib/ExampleNPU.cmake: build integration via
  USE_EXAMPLE_NPU_CODEGEN and USE_EXAMPLE_NPU_RUNTIME flags
- docs/how_to/tutorials/byoc_npu_example.py: tutorial walking through
  the full BYOC flow from pattern registration to runtime execution
- tests/python/contrib/test_example_npu.py: test suite covering pattern
  registration, graph partitioning, codegen, and end-to-end execution

CI is enabled via tests/scripts/task_config_build_cpu.sh.

Addresses reviewer feedback from apache#18247: cmake integration, self-
contained README with build instructions, tutorial in docs/how_to,
and Context section reorganization.
Aristide021 force-pushed the contrib-npu-generic branch from 10825bb to c96323f on April 20, 2026 13:14
@Aristide021
Author

Hi @tlopex, CI is blocked by what appears to be a workflow approval issue from the original PR, which mentioned mshr-h. I've removed the mention; could you approve the workflow run so the checks can proceed? I've also added a C++ codegen, runtime, and test suite beyond the earlier commit.

@mshr-h
Contributor

mshr-h commented Apr 20, 2026

We made some changes to our CI. Please consider creating a new PR against the main branch and resubmitting. Sorry about the trouble, @Aristide021.
