Write a readme on Multi-GPU usage in llama.cpp #22729
JohannesGaessler merged 4 commits into ggml-org:master
Conversation
@JohannesGaessler could you please take a look at this? |
Sorry, I had looked at this from my phone and thought this was an issue asking me to do it rather than a PR that already did it. I'll review it later today. |
It seems like some information about the environment variable GGML_CUDA_P2P is missing? |
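For reference, environment variables like this are set at launch time. A minimal sketch of toggling GGML_CUDA_P2P, assuming it accepts 0/1 to disable/enable peer-to-peer GPU copies (the accepted values and exact effect are assumptions here, not confirmed by this thread; the model path and flags are placeholders):

```shell
# Assumption: GGML_CUDA_P2P=1 enables peer-to-peer GPU copies,
# GGML_CUDA_P2P=0 disables them. Check the llama.cpp docs for the
# authoritative semantics before relying on this.
export GGML_CUDA_P2P=1

# Hypothetical invocation; model path is a placeholder.
# llama-cli -m ./models/model.gguf -ngl 99
```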
JohannesGaessler left a comment:
This is not so much an issue with this particular PR but rather a question of how to handle legacy code going forward: --split-mode row is definitely deprecated and I intend to remove it. But I would say that eventually we should also deprecate and remove --split-mode none and instead handle this via -dev, which I would say is the recommended way to select GPUs.
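For context, a sketch of selecting GPUs via -dev rather than --split-mode none. The device names and model path below are illustrative assumptions; llama.cpp's --list-devices flag prints the actual names on a given system:

```shell
# Illustrative device names; the real names come from:
#   llama-cli --list-devices
DEVICES=CUDA0,CUDA1

# Pin the run to a single GPU (the -dev way of doing what
# --split-mode none did):
# llama-cli -m ./models/model.gguf -dev CUDA0

# Use both GPUs with the default layer split:
# llama-cli -m ./models/model.gguf -dev "$DEVICES" -sm layer
```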
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Thanks for the review and edits @JohannesGaessler. I have addressed the remaining comments and also added a small section on
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
@ggerganov can you re-approve since the last commit has me as a co-author? |
@JohannesGaessler Sorry for chiming in, as I'm just a regular user, but could you please reconsider removing '-sm row', at least until '-sm tensor' becomes fully functional? It gives a decent speed-up for generation with dense models compared to the default (-sm layer).
I intend to remove |
Document known issues and provide troubleshooting for multi-GPU usage in llama.cpp