Draft: ggml-opencl: Early proof-of-concept implementation of plans via command buffers #22764

jansol wants to merge 5 commits into ggml-org:master from jansol:experimental

jansol commented May 6, 2026

Overview

This is still very incomplete and far from ready for proper review, but there has been some interest in it in offline discussions, so I'm opening a draft PR for easier discussion & collaboration.

This is a not-at-all baked implementation of the ggml plan API with OpenCL command buffers that I've been experimenting with, built on top of the shared execution plan code from #16548. It has only been tested with gemma-3-1b-it-f16.gguf, and only on the PoCL-CUDA and ROCm drivers.

I've also included some changes towards async support in ggml-opencl, as I saw a lot of time being spent on synchronous I/O when running on PoCL-Remote.

In my PoCL-Remote experiments I see some nice performance improvements from both the command buffers and the partial async-ification. PoCL-Remote is of course an extreme corner case of slow host<->device communication, but other setups should also see either slightly improved performance or no meaningful change.

Additional information

There are a bunch of changes to make various kernels build at all on ROCm and PoCL-CUDA. I would expect that most of those become unnecessary when #21310 and its follow-up PRs land.

The biggest open question is how to handle the temporary (sub) buffers that are created in several kernels. This will need to be solved for both proper async support and command buffers. I've drafted an attempt at deferred freeing, but with command buffers it should also be possible to tie temporary buffers directly to the lifetime of the command buffer itself.
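
To make the two options concrete for discussion, here is a minimal, driver-free sketch. It is not the code in this PR: `TempBuffer`, `DeferredFreeQueue`, and `RecordedPlan` are hypothetical names, and `TempBuffer` only counts live instances where the real backend would hold a `cl_mem`. In a real implementation the `done` predicate would presumably query the event of the last command that uses the buffer (for example via `clGetEventInfo` with `CL_EVENT_COMMAND_EXECUTION_STATUS`).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a temporary device allocation (for example a
// cl_mem sub-buffer). It only counts live instances so the sketch runs
// without an OpenCL driver.
int g_live_buffers = 0;

struct TempBuffer {
    TempBuffer() { ++g_live_buffers; }
    ~TempBuffer() { --g_live_buffers; }
    TempBuffer(const TempBuffer &) = delete;
    TempBuffer & operator=(const TempBuffer &) = delete;
};

// Option 1: deferred freeing. A buffer is parked together with a predicate
// that reports whether the work using it has finished (assumption: in a real
// backend this would check an OpenCL event). collect() releases whatever is
// finished and leaves the rest pending.
class DeferredFreeQueue {
public:
    void defer(std::unique_ptr<TempBuffer> buf, std::function<bool()> done) {
        assert(buf != nullptr && done);
        pending_.push_back(Entry{std::move(buf), std::move(done)});
    }

    // Frees every buffer whose predicate returns true; returns how many.
    std::size_t collect() {
        std::size_t freed = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->done()) {
                it = pending_.erase(it);  // destroying the entry frees the buffer
                ++freed;
            } else {
                ++it;
            }
        }
        return freed;
    }

private:
    struct Entry {
        std::unique_ptr<TempBuffer> buf;
        std::function<bool()>       done;
    };
    std::vector<Entry> pending_;
};

// Option 2: tie temporaries to the recorded plan. The plan (standing in for
// a command buffer) owns every temporary created while recording, so they
// stay valid across replays and are freed when the plan is destroyed.
class RecordedPlan {
public:
    TempBuffer * adopt(std::unique_ptr<TempBuffer> buf) {
        assert(buf != nullptr);
        owned_.push_back(std::move(buf));
        return owned_.back().get();
    }

private:
    std::vector<std::unique_ptr<TempBuffer>> owned_;
};
```

With option 1, something has to call `collect()` periodically (for instance at each graph compute or synchronize call), and buffers can outlive their last use by an unbounded amount if it is not called. Option 2 needs no polling, but keeps every temporary allocated for as long as the plan exists, which trades memory for simplicity.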

@wishstudio I took the liberty of cherry-picking only the relevant bits from your PR into a new commit and marking you as the author of that. Let me know if you'd prefer some other way of handling that.

ggml-gh-bot Bot commented May 6, 2026

Hi @jansol, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Large PR: Large changes require prior discussion (e.g. an issue or RFC) and maintainers may not be able to review this PR as-is. Consider splitting it into smaller, focused PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

github-actions Bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels May 6, 2026