feat(llvm): add llvm19 support for compute_100+#375

Open
brandonros wants to merge 2 commits into Rust-GPU:main from brandonros:llvm19-cfg

Conversation

@brandonros

@brandonros brandonros commented Apr 14, 2026

attempt 2 of #227

@brandonros brandonros changed the title feat(llvm19): scaffold Layer 0 and record progress feat(llvm): add llvm19 support for compute_100+ Apr 14, 2026
@brandonros brandonros marked this pull request as ready for review April 14, 2026 21:58
@brandonros
Author

@LegNeato this is a much cleaner approach, what do you think? can we see if CI passes?

@brandonros brandonros force-pushed the llvm19-cfg branch 10 times, most recently from 2b397fc to f53e57d Compare April 15, 2026 12:30
@brandonros
Author

proof it works in a limited capacity?

$ ./scripts/vast-ai.sh 
>> Building on brandon@asusrogstrix.local
warning: Git tree '/home/brandon/Rust-CUDA' has uncommitted changes
rust-cuda llvm19 shell
  CUDA_HOME=/usr/local/cuda-13.2
  LLVM_CONFIG_19=/nix/store/a7rsrh7cdbc8vzv72j1vc7936d4mapqm-llvm-19.1.7-dev/bin/llvm-config
  NVIDIA_DRIVER_LIB=/home/brandon/Rust-CUDA/.nix-driver-libs/libcuda.so.1
warning: vecadd@0.1.0: Building rustc_codegen_nvvm to satisfy cuda_builder requirements
   Compiling vecadd v0.1.0 (/home/brandon/Rust-CUDA/examples/vecadd)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.69s
>> Staging binary locally
vecadd                                                                                                                                        100% 6980KB  12.9MB/s   00:00    
>> Uploading to root@ssh6.vast.ai:34929
vecadd                                                                                                                                        100% 6980KB  11.4MB/s   00:00    
>> Running on vast.ai
GPU 0: NVIDIA GeForce RTX 5070 (UUID: GPU-cd9e55d4-294f-8a32-1e79-b3c13506c2c8)
[vecadd] cust::quick_init ...
using 131 blocks and 768 threads per block
0.09988744 + 0.3485085 = 0.44839594
[vecadd] cust::quick_init ok
[vecadd] CudaApiVersion::get ...
[vecadd] CudaApiVersion::get ok
[vecadd] CUDA driver API version: 13.2
[vecadd] Device::get_device(0) ...
[vecadd] Device::get_device(0) ok
[vecadd] Device::get_attribute(ComputeCapabilityMajor) ...
[vecadd] Device::get_attribute(ComputeCapabilityMajor) ok
[vecadd] Device::get_attribute(ComputeCapabilityMinor) ...
[vecadd] Device::get_attribute(ComputeCapabilityMinor) ok
[vecadd] Device::name ...
[vecadd] Device::name ok
[vecadd] GPU: NVIDIA GeForce RTX 5070 (compute 12.0)
[vecadd] PTX size: 1320 bytes
[vecadd] PTX header: // | // Generated by NVIDIA NVVM Compiler | // | // Compiler Build ID: UNKNOWN | // Cuda compilation tools, release 13.2, V13.2.78 | // Based on NVVM 22.0.0 | // |  | .version 9.2 | .target sm_100
[vecadd] cuModuleLoadDataEx (with JIT log buffers) ...
[vecadd] cuModuleLoadDataEx raw result code: CUDA_SUCCESS
[vecadd] cuModuleLoadDataEx (with JIT log buffers) ok
[vecadd] Stream::new ...
[vecadd] Stream::new ok
[vecadd] DeviceBuffer::from lhs ...
[vecadd] DeviceBuffer::from lhs ok
[vecadd] DeviceBuffer::from rhs ...
[vecadd] DeviceBuffer::from rhs ok
[vecadd] DeviceBuffer::from out ...
[vecadd] DeviceBuffer::from out ok
[vecadd] Module::get_function("vecadd") ...
[vecadd] Module::get_function("vecadd") ok
[vecadd] suggested_launch_configuration ...
[vecadd] suggested_launch_configuration ok
[vecadd] launching kernel ...
[vecadd] launch queued ok
[vecadd] stream.synchronize ...
[vecadd] stream.synchronize ok
[vecadd] copy_to ...
[vecadd] copy_to ok

@brandonros brandonros force-pushed the llvm19-cfg branch 3 times, most recently from 2feadba to f37288a Compare April 20, 2026 00:21
@brandonros
Author

@CharryWu thoughts?

#[cfg(cuMemPrefetchAsync_v2)]
driver_sys::CUmemLocation {
    type_: driver_sys::CUmemLocationType::CU_MEM_LOCATION_TYPE_DEVICE,
    #[cfg(cuMemLocation_anon_id)]
Author

this is from #368

Add the initial llvm19 cargo/build.rs plumbing while preserving the llvm7
check path. Assemble a v19 libintrinsics bitcode at build time and route
nvvm.rs through the build-script-provided path.

Document the validated baseline on the current host and the first Layer 1
blocker: the existing C++ shim no longer builds unchanged against LLVM 19
because rustllvm.h still expects headers like llvm/ADT/Triple.h.

RUST_CUDA_ALLOW_LEGACY_ARCH_WITH_LLVM19

compute_100 target

working through compilation errors

working through sigsegv on vecadd

nix flake

libintrinsics

libintrinsics

chore(llvm19): close out Layer 3 pre-smoke work

Finalize the Layer 3 plan, add env-driven final-module and LLVM IR capture hooks to vecadd, and validate the harness locally so the next phase can move straight to CUDA 12.9+ smoke testing.

refactor(llvm19): close out Layer 2 containment

Add named Rust-side containment helpers for debug info and target machine creation, make the current ThinLTO behavior explicit, and update LLVM19_PLAN.md to mark Layers 2c and 2d complete.

refactor(llvm19): start Layer 2 helper containment

Add a small Rust-side helper surface in src/llvm.rs for call-building, symbol insertion, and debug-location setting, then migrate the obvious callers without introducing LLVM-version cfg branching.

Update LLVM19_PLAN.md to reflect the real Layer 2 state: 2a is complete, 2b is complete, 2c is partially landed, and 2d is still pending. Include the current .gitignore change in this checkpoint as requested.

feat(llvm19): complete Layer 1 C++ shim bridge

Bridge the wrapper headers and C++ shims so rustc_codegen_nvvm now builds against both LLVM 7 and LLVM 19.

This adds the LLVM 19 wrapper headers, ports RustWrapper.cpp and PassWrapper.cpp through the current checkpoint, and records the completed Layer 1 progress and remaining Layer 2 caveats in the plan.

ptxjitcompiler.so

load_ptx_with_log

unified?

Co-Authored-By: OpenAI Codex <codex@openai.com>
@brandonros
Author

CI passed all the way to docs, if we could allow it once more. thank you!!!

Contributor

@LegNeato LegNeato left a comment

Thanks for jamming on this! Some small comments.

Comment thread crates/cuda_builder/src/lib.rs Outdated
/// LLVM 7 NVVM dialect, so pairing them with an LLVM 19 backend is never the right choice.
/// Callers can still override via [`CudaBuilder::arch`].
fn default_arch() -> NvvmArch {
    if env::var_os("LLVM_CONFIG_19").is_some() {
Contributor

Not a fan of the env variables.

Is there any way to tell so we can just do the right thing automatically in the default case? Maybe query rustc / the nvvm backend and expose which llvm it supports there (via rustflags?)?

Author

ebfe81b solved here or no?

Comment thread crates/cuda_builder/src/lib.rs

# Exclude crates that require cuDNN, not available on Windows CI: cudnn, cudnn-sys.
# Exclude rustc_codegen_nvvm: `--all-features` enables its `llvm19` feature,
# whose build.rs requires an LLVM 19 toolchain not present in the CI image.
Contributor

I guess we should add this to the images for the build step?

Author

  • Linux LLVM 19 in CI images
  • Windows LLVM 19 prebuilt
  • RockyLinux 9 specifically (the genuinely awkward one)
  • Dual LLVM 7 + LLVM 19 testing in CI

could I land those in a separate follow-up PR? I'll create a tracking issue and then go figure out how to get Linux and Windows both working with LLVM 19: https://github.com/rust-gpu/rustc_codegen_nvvm-llvm/releases/

…eature

Replace the LLVM_CONFIG_19 env-var sniffing in `cuda_builder` with a proper
`llvm19` cargo feature, addressing review feedback on Rust-GPU#375.

- `nvvm` gains an `llvm19` feature; `NvvmArch`'s `#[default]` is moved off
  `Compute75` and onto `Compute100` via `cfg_attr` when it's enabled, so
  `NvvmArch::default()` returns the right answer for the active dialect.
- `cuda_builder` gains a matching `llvm19` feature that propagates to
  `nvvm/llvm19` and (when the optional dep is on) `rustc_codegen_nvvm/llvm19`.
  `CudaBuilder::new` goes back to plain `NvvmArch::default()`.
- The build script's nested `cargo build -p rustc_codegen_nvvm` now keys the
  `--features llvm19` flag off `cfg!(feature = "llvm19")` instead of the env
  var, so the toggle lives in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>