
ggml-sycl : use malloc_shared for UMA/integrated GPU devices#22766

Open
vmartirosyan wants to merge 1 commit into ggml-org:master from vmartirosyan:sycl-uma-shared-buffers

Conversation

@vmartirosyan

Overview

I have been running llama.cpp on my Intel Arc integrated platform and found that it does not take advantage of shared memory. I had already done something similar in my auto-parallelizer compiler, and it noticeably boosts performance. With this change I achieved a speedup of [ Prompt: 141.0 t/s | Generation: 10.7 t/s ].

Additional information

Tested on an Intel Arc 140T iGPU. Should not affect any discrete GPU, as the newly added code is guarded by if blocks.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES. I designed the solution based on my prior OpenCL work; AI assisted me with the SYCL API translation.

@vmartirosyan vmartirosyan requested a review from a team as a code owner May 6, 2026 15:11
@github-actions github-actions Bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels May 6, 2026
Contributor

@arthw arthw left a comment


I want to test it.

Could you share the test results that show the benefit?
Also the OS, oneAPI version, Intel CPU/GPU model, and LLM gguf name.

Thank you for your contribution!

