feat(cuda): add C-level simple API mirroring CPU one-shot wrappers #863
DiamonDinoia wants to merge 1 commit into
Adds 36 entry points (3 dims x 3 types x {single,many} x {single,double})
that wrap the cuFINUFFT plan -> setpts -> execute -> destroy sequence
into a single call. Pointers are device pointers; argument order matches
the CPU finufft1d1 family in src/c_interface.cpp. The plan is held in a
unique_ptr so error paths still release GPU resources.
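
A minimal sketch of the unique_ptr pattern described above, for illustration only. Every `demo_*` name, the `plan_deleter` struct, and the `g_live_plans` counter are hypothetical stand-ins that run on the host; they are not the cuFINUFFT functions or the code in this PR. The sketch only shows how handing the plan to a `std::unique_ptr` with a custom deleter lets an early return on an error path still run the destroy step.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for a plan-based C API (NOT the real cuFINUFFT calls).
struct demo_plan_s {
  int type;
  int dim;
};
typedef demo_plan_s *demo_plan;

static int g_live_plans = 0; // number of plans created but not yet destroyed

int demo_makeplan(int type, int dim, demo_plan *out) {
  assert(out != nullptr);
  *out = new demo_plan_s{type, dim};
  ++g_live_plans;
  return 0;
}

// Returns nonzero when the point set is empty or missing.
int demo_setpts(demo_plan, long M, const double *x) { return (M > 0 && x) ? 0 : 1; }

// Returns nonzero when either buffer is missing.
int demo_execute(demo_plan, const double *c, double *fk) { return (c && fk) ? 0 : 2; }

int demo_destroy(demo_plan p) {
  delete p;
  --g_live_plans;
  return 0;
}

// Deleter that routes unique_ptr cleanup through the C-style destroy call.
struct plan_deleter {
  void operator()(demo_plan_s *p) const {
    if (p) demo_destroy(p);
  }
};

// One-shot wrapper: plan -> setpts -> execute -> destroy in a single call.
int demo_simple_1d1(long M, const double *x, const double *c, double *fk) {
  demo_plan raw = nullptr;
  int ier = demo_makeplan(1, 1, &raw);
  if (ier != 0) return ier;
  // From here on, every return path destroys the plan automatically.
  std::unique_ptr<demo_plan_s, plan_deleter> plan(raw);
  ier = demo_setpts(plan.get(), M, x);
  if (ier != 0) return ier; // early exit still releases the plan
  return demo_execute(plan.get(), c, fk);
}
```

In this sketch the destroy step is never written out in the wrapper body; it happens when `plan` goes out of scope, which is what keeps the early returns short.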
A cross-check test exercises every (dim, type, precision) combination
against the 4-step plan path, and the C ABI is smoke-tested from
test/cuda/public_api_test.c. The new surface is documented in
docs/c_gpu.rst.
Reference: src/c_interface.cpp (CPU finufft1d1 family) and the prior
cufinufft simple-interface draft by @janden on janden/update_cu_api
(commit 61e2ae9 "cuda: first attempt at simple cufinufft interface").
Co-authored-by: Joakim Andén <3976052+janden@users.noreply.github.com>
Numbers are advisory: GitHub-hosted runners have variable hardware. Treat <1.10× as noise.
Good idea!