ggml-cpu: extend RVV quantization vec dot to higher VLENs by rehan-10xengineer · Pull Request #22754 · ggml-org/llama.cpp

rehan-10xengineer · 2026-05-06T10:45:48Z

Summary

This PR adds RVV implementations of the quantized vector dot kernels for 512-bit and 1024-bit VLENs.

Each VLEN-dependent kernel now has a separate function per VLEN, for example ggml_vec_dot_tq1_0_q8_K_vl512.
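To illustrate the per-VLEN function layout, here is a minimal sketch of choosing a variant by vector length. It is not the code from this PR: the `vec_dot_example_*` names, the simplified `float` signature, and `select_vec_dot_example` are hypothetical, and the per-VLEN bodies just reuse a scalar loop so the sketch runs on any host.

```c
#include <stddef.h>

/* Hypothetical, simplified kernel signature. The real ggml vec-dot kernels
   operate on quantized blocks and take more parameters. */
typedef float (*vec_dot_fn)(size_t n, const float *x, const float *y);

/* Portable scalar fallback. */
static float vec_dot_example_generic(size_t n, const float *x, const float *y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Stand-ins for the per-VLEN variants. In a real build each body would use
   RVV intrinsics written for that register width; here they call the scalar
   loop so the example is runnable without RVV hardware. */
static float vec_dot_example_vl256(size_t n, const float *x, const float *y) {
    return vec_dot_example_generic(n, x, y);
}

static float vec_dot_example_vl512(size_t n, const float *x, const float *y) {
    return vec_dot_example_generic(n, x, y);
}

static float vec_dot_example_vl1024(size_t n, const float *x, const float *y) {
    return vec_dot_example_generic(n, x, y);
}

/* Pick the widest variant the reported vector length supports.
   Assumption: vlen_bits is queried once by the caller (on an RVV target this
   could come from the __riscv_vlenb() intrinsic, which returns bytes). */
static vec_dot_fn select_vec_dot_example(size_t vlen_bits) {
    if (vlen_bits >= 1024) return vec_dot_example_vl1024;
    if (vlen_bits >= 512)  return vec_dot_example_vl512;
    if (vlen_bits >= 256)  return vec_dot_example_vl256;
    return vec_dot_example_generic;
}
```

Resolving the function pointer once and reusing it keeps the VLEN check out of the hot loop; whether the PR dispatches this way or via a switch inside each kernel is not stated in the description.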

Kernels were functionally tested with test-quantize-fns under QEMU at VLENs of 512 and 1024 bits.

taimur-10x and others added 4 commits May 6, 2026 15:35

ggml-cpu: add rvv 512b,1024b impls for iq4_xs

c7a20ea

ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants

da99a59

added 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s, iq2_xs, …

9565a40

…iq2_xxs

ggml-cpu: refactor; improve iq2_xs impl for rvv 256

e3e2c6c

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

rehan-10xengineer requested a review from ggerganov as a code owner May 6, 2026 10:45

rehan-10xengineer changed the title ~~10x/riscv quant vec dot vlens~~ ggml-cpu: extend RVV quantization vec dot to higher VLENs May 6, 2026

github-actions Bot added the ggml label (changes relating to the ggml tensor library for machine learning) May 6, 2026