Draft: ggml-opencl: Early proof-of-concept implementation of plans via command buffers #22764
jansol wants to merge 5 commits into ggml-org:master
Hi @jansol, thanks for your contribution! Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:
Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.
Overview
This is still very incomplete and far from ready for proper review, but there has been some interest in this in offline discussions, so I'm opening a draft PR for easier discussion & collaboration.
This is a not-at-all baked implementation of the ggml plan API with OpenCL command buffers that I've been experimenting with, built on top of the shared execution plan code from #16548. It has only been tested with gemma-3-1b-it-f16.gguf, and only on the PoCL-CUDA and ROCm drivers.
I've also included some changes towards async support in ggml-opencl, as I saw a lot of time being spent on synchronous I/O when running on PoCL-Remote.
In my PoCL-Remote experiments I see some nice performance improvements from both the command buffers and the partial async-ification alike. PoCL-Remote is of course an extreme corner case of slow host<->device communication, but other setups should also see either slightly improved performance or no meaningful change.
Additional information
There are a bunch of changes to make various kernels build at all on ROCm and PoCL-CUDA. I would expect that most of those become unnecessary when #21310 and its follow-up PRs land.
The biggest open question is how to handle the temporary (sub) buffers that are created in several kernels. This will need to be solved for both proper async support and command buffers. I've drafted an attempt at deferred freeing, but with command buffers it should also be possible to tie temporary buffers directly to the lifetime of the command buffer itself.
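To make the two options concrete, here is a minimal sketch of both lifetime strategies. It is not the code in this PR: `temp_buffer`, `release_temp_buffer`, `deferred_free_list` and `recorded_plan` are hypothetical names, and the release call only bumps a counter so the sketch runs without an OpenCL runtime. In a real backend the release would presumably be `clReleaseMemObject`, and the "completed sequence number" would come from an event or a queue flush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a temporary cl_mem (sub) buffer.
struct temp_buffer {
    int id;
};

// Counts releases so the behaviour can be checked without OpenCL.
// Assumption: a real backend would call clReleaseMemObject here.
static int g_released = 0;
static void release_temp_buffer(const temp_buffer &) { ++g_released; }

// Option 1: deferred freeing. Each temporary is tagged with the sequence
// number of the last submission that uses it and is released only after
// that submission is known to have completed.
class deferred_free_list {
public:
    void defer(temp_buffer buf, uint64_t last_use_seq) {
        // Sequence number 0 is reserved for "nothing has completed yet".
        assert(last_use_seq != 0);
        pending_.push_back({buf, last_use_seq});
    }

    // Releases every buffer whose last use is <= completed_seq and
    // returns how many were released.
    size_t collect(uint64_t completed_seq) {
        size_t released = 0;
        std::vector<entry> keep;
        for (const entry & e : pending_) {
            if (e.seq <= completed_seq) {
                release_temp_buffer(e.buf);
                ++released;
            } else {
                keep.push_back(e);
            }
        }
        pending_ = std::move(keep);
        return released;
    }

    size_t pending() const { return pending_.size(); }

private:
    struct entry {
        temp_buffer buf;
        uint64_t    seq;
    };
    std::vector<entry> pending_;
};

// Option 2: the recorded plan (command buffer) owns its temporaries and
// releases them when it is destroyed, so they stay valid across replays.
class recorded_plan {
public:
    recorded_plan() = default;
    recorded_plan(const recorded_plan &) = delete;
    recorded_plan & operator=(const recorded_plan &) = delete;

    ~recorded_plan() {
        for (const temp_buffer & b : temps_) {
            release_temp_buffer(b);
        }
    }

    void adopt(temp_buffer buf) { temps_.push_back(buf); }
    size_t temp_count() const { return temps_.size(); }

private:
    std::vector<temp_buffer> temps_;
};
```

The trade-off in this sketch: the deferred list works for plain async enqueues but needs some completion signal to drive `collect`, while the plan-owned variant needs no tracking at all but keeps the temporaries alive for as long as the plan exists.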
@wishstudio I took the liberty of cherry-picking only the relevant bits from your PR into a new commit and marking you as the author of that. Let me know if you'd prefer some other way of handling that.
Requirements