feat(llvm): add llvm19 support for compute_100+#375

Open
brandonros wants to merge 2 commits into Rust-GPU:main from brandonros:llvm19-cfg

Conversation

@brandonros

@brandonros brandonros commented Apr 14, 2026

attempt 2 of #227

@brandonros brandonros changed the title feat(llvm19): scaffold Layer 0 and record progress feat(llvm): add llvm19 support for compute_100+ Apr 14, 2026
@brandonros brandonros marked this pull request as ready for review April 14, 2026 21:58
@brandonros
Author

@LegNeato this is a much cleaner approach, what do you think? can we see if CI passes?

@brandonros brandonros force-pushed the llvm19-cfg branch 10 times, most recently from 2b397fc to f53e57d Compare April 15, 2026 12:30
@brandonros
Author

proof it works in a limited capacity?

$ ./scripts/vast-ai.sh 
>> Building on brandon@asusrogstrix.local
warning: Git tree '/home/brandon/Rust-CUDA' has uncommitted changes
rust-cuda llvm19 shell
  CUDA_HOME=/usr/local/cuda-13.2
  LLVM_CONFIG_19=/nix/store/a7rsrh7cdbc8vzv72j1vc7936d4mapqm-llvm-19.1.7-dev/bin/llvm-config
  NVIDIA_DRIVER_LIB=/home/brandon/Rust-CUDA/.nix-driver-libs/libcuda.so.1
warning: vecadd@0.1.0: Building rustc_codegen_nvvm to satisfy cuda_builder requirements
   Compiling vecadd v0.1.0 (/home/brandon/Rust-CUDA/examples/vecadd)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.69s
>> Staging binary locally
vecadd                                                                                                                                        100% 6980KB  12.9MB/s   00:00    
>> Uploading to root@ssh6.vast.ai:34929
vecadd                                                                                                                                        100% 6980KB  11.4MB/s   00:00    
>> Running on vast.ai
GPU 0: NVIDIA GeForce RTX 5070 (UUID: GPU-cd9e55d4-294f-8a32-1e79-b3c13506c2c8)
[vecadd] cust::quick_init ...
using 131 blocks and 768 threads per block
0.09988744 + 0.3485085 = 0.44839594
[vecadd] cust::quick_init ok
[vecadd] CudaApiVersion::get ...
[vecadd] CudaApiVersion::get ok
[vecadd] CUDA driver API version: 13.2
[vecadd] Device::get_device(0) ...
[vecadd] Device::get_device(0) ok
[vecadd] Device::get_attribute(ComputeCapabilityMajor) ...
[vecadd] Device::get_attribute(ComputeCapabilityMajor) ok
[vecadd] Device::get_attribute(ComputeCapabilityMinor) ...
[vecadd] Device::get_attribute(ComputeCapabilityMinor) ok
[vecadd] Device::name ...
[vecadd] Device::name ok
[vecadd] GPU: NVIDIA GeForce RTX 5070 (compute 12.0)
[vecadd] PTX size: 1320 bytes
[vecadd] PTX header: // | // Generated by NVIDIA NVVM Compiler | // | // Compiler Build ID: UNKNOWN | // Cuda compilation tools, release 13.2, V13.2.78 | // Based on NVVM 22.0.0 | // |  | .version 9.2 | .target sm_100
[vecadd] cuModuleLoadDataEx (with JIT log buffers) ...
[vecadd] cuModuleLoadDataEx raw result code: CUDA_SUCCESS
[vecadd] cuModuleLoadDataEx (with JIT log buffers) ok
[vecadd] Stream::new ...
[vecadd] Stream::new ok
[vecadd] DeviceBuffer::from lhs ...
[vecadd] DeviceBuffer::from lhs ok
[vecadd] DeviceBuffer::from rhs ...
[vecadd] DeviceBuffer::from rhs ok
[vecadd] DeviceBuffer::from out ...
[vecadd] DeviceBuffer::from out ok
[vecadd] Module::get_function("vecadd") ...
[vecadd] Module::get_function("vecadd") ok
[vecadd] suggested_launch_configuration ...
[vecadd] suggested_launch_configuration ok
[vecadd] launching kernel ...
[vecadd] launch queued ok
[vecadd] stream.synchronize ...
[vecadd] stream.synchronize ok
[vecadd] copy_to ...
[vecadd] copy_to ok

@brandonros brandonros force-pushed the llvm19-cfg branch 3 times, most recently from 2feadba to f37288a Compare April 20, 2026 00:21
@brandonros
Author

@CharryWu thoughts?

#[cfg(cuMemPrefetchAsync_v2)]
driver_sys::CUmemLocation {
    type_: driver_sys::CUmemLocationType::CU_MEM_LOCATION_TYPE_DEVICE,
    #[cfg(cuMemLocation_anon_id)]
Author

this is from #368

Add the initial llvm19 cargo/build.rs plumbing while preserving the llvm7
check path. Assemble a v19 libintrinsics bitcode at build time and route
nvvm.rs through the build-script-provided path.

Document the validated baseline on the current host and the first Layer 1
blocker: the existing C++ shim no longer builds unchanged against LLVM 19
because rustllvm.h still expects headers like llvm/ADT/Triple.h.

RUST_CUDA_ALLOW_LEGACY_ARCH_WITH_LLVM19

compute_100 target

working through compilation errors

working through sigsegv on vecadd

nix flake

libintrinsics

libintrinsics

chore(llvm19): close out Layer 3 pre-smoke work

Finalize the Layer 3 plan, add env-driven final-module and LLVM IR capture hooks to vecadd, and validate the harness locally so the next phase can move straight to CUDA 12.9+ smoke testing.

refactor(llvm19): close out Layer 2 containment

Add named Rust-side containment helpers for debug info and target machine creation, make the current ThinLTO behavior explicit, and update LLVM19_PLAN.md to mark Layers 2c and 2d complete.

refactor(llvm19): start Layer 2 helper containment

Add a small Rust-side helper surface in src/llvm.rs for call-building, symbol insertion, and debug-location setting, then migrate the obvious callers without introducing LLVM-version cfg branching.

Update LLVM19_PLAN.md to reflect the real Layer 2 state: 2a is complete, 2b is complete, 2c is partially landed, and 2d is still pending. Include the current .gitignore change in this checkpoint as requested.

feat(llvm19): complete Layer 1 C++ shim bridge

Bridge the wrapper headers and C++ shims so rustc_codegen_nvvm now builds against both LLVM 7 and LLVM 19.

This adds the LLVM 19 wrapper headers, ports RustWrapper.cpp and PassWrapper.cpp through the current checkpoint, and records the completed Layer 1 progress and remaining Layer 2 caveats in the plan.

ptxjitcompiler.so

load_ptx_with_log

unified?

Co-Authored-By: OpenAI Codex <codex@openai.com>
@brandonros
Author

CI passed all the way to docs, if we could allow it once more. thank you!!!

Contributor

@LegNeato LegNeato left a comment

Thanks for jamming on this! Some small comments.

Comment thread crates/cuda_builder/src/lib.rs Outdated
/// LLVM 7 NVVM dialect, so pairing them with an LLVM 19 backend is never the right choice.
/// Callers can still override via [`CudaBuilder::arch`].
fn default_arch() -> NvvmArch {
    if env::var_os("LLVM_CONFIG_19").is_some() {
Contributor

Not a fan of the env variables.

Is there any way to tell so we can just do the right thing automatically in the default case? Maybe query rustc / the nvvm backend and expose which llvm it supports there (via rustflags?)?

Author

ebfe81b solved here or no?

Comment thread crates/cuda_builder/src/lib.rs

# Exclude crates that require cuDNN, not available on Windows CI: cudnn, cudnn-sys.
# Exclude rustc_codegen_nvvm: `--all-features` enables its `llvm19` feature,
# whose build.rs requires an LLVM 19 toolchain not present in the CI image.
Contributor

I guess we should add this to the images for the build step?

Author

  • Linux LLVM 19 in CI images
  • Windows LLVM 19 prebuilt
  • RockyLinux 9 specifically (the genuinely awkward one)
  • Dual LLVM 7 + LLVM 19 testing in CI

could I land those in a separate follow-up PR? I'll create a tracking issue and then go figure out how to get Linux and Windows both working with LLVM 19: https://github.com/rust-gpu/rustc_codegen_nvvm-llvm/releases/

…eature

Replace the LLVM_CONFIG_19 env-var sniffing in `cuda_builder` with a proper
`llvm19` cargo feature, addressing review feedback on Rust-GPU#375.

- `nvvm` gains an `llvm19` feature; `NvvmArch`'s `#[default]` is moved off
  `Compute75` and onto `Compute100` via `cfg_attr` when it's enabled, so
  `NvvmArch::default()` returns the right answer for the active dialect.
- `cuda_builder` gains a matching `llvm19` feature that propagates to
  `nvvm/llvm19` and (when the optional dep is on) `rustc_codegen_nvvm/llvm19`.
  `CudaBuilder::new` goes back to plain `NvvmArch::default()`.
- The build script's nested `cargo build -p rustc_codegen_nvvm` now keys the
  `--features llvm19` flag off `cfg!(feature = "llvm19")` instead of the env
  var, so the toggle lives in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>