feat(cuda): add C-level simple API mirroring CPU one-shot wrappers #863

Open
DiamonDinoia wants to merge 1 commit into
flatironinstitute:master from
DiamonDinoia:feat/cufinufft-simple-api
Collaborator

@DiamonDinoia DiamonDinoia commented May 16, 2026

Adds 36 entry points (3 dims x 3 types x {single, many} transforms x {single, double} precision) that wrap the cuFINUFFT plan -> setpts -> execute -> destroy sequence into a single call. Pointers are device pointers; argument order matches the CPU finufft1d1 family in src/c_interface.cpp. The plan is held in a unique_ptr so that error paths still release GPU resources.
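
For readers unfamiliar with the pattern, here is a minimal sketch of how a one-shot wrapper can keep the plan in a unique_ptr with a custom deleter so that an early return still runs the destroy stage. Everything prefixed `mock_` is a hypothetical stand-in that runs on the CPU; none of these names or signatures are taken from cuFINUFFT or from this PR.

```cpp
#include <memory>

// Hypothetical stand-ins for the four plan-stage calls. These are NOT the
// cuFINUFFT signatures, only mocks so the control flow can run without a GPU.
struct mock_plan { int npts = 0; };
static int live_plans = 0;  // number of plans created but not yet destroyed

int mock_makeplan(mock_plan **p) { *p = new mock_plan; ++live_plans; return 0; }
int mock_setpts(mock_plan *p, int M) { if (M < 0) return 1; p->npts = M; return 0; }
int mock_execute(mock_plan *p) { return p->npts >= 0 ? 0 : 2; }
int mock_destroy(mock_plan *p) { delete p; --live_plans; return 0; }

// Deleter so that unique_ptr calls the destroy stage on every exit path.
struct plan_deleter {
  void operator()(mock_plan *p) const { mock_destroy(p); }
};

// One-shot wrapper: plan -> setpts -> execute, with destroy handled by RAII.
// Returns 0 on success or the first nonzero stage error code.
int mock_oneshot(int M) {
  mock_plan *raw = nullptr;
  if (int ier = mock_makeplan(&raw)) return ier;
  std::unique_ptr<mock_plan, plan_deleter> plan(raw);
  if (int ier = mock_setpts(plan.get(), M)) return ier;  // plan is still freed
  return mock_execute(plan.get());
}
```

Calling `mock_oneshot(100)` returns 0 and `mock_oneshot(-1)` returns the setpts error code 1; in both cases `live_plans` is back to 0 afterwards, which is the property the unique_ptr is there to guarantee.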

A cross-check test exercises every (dim, type, precision) combination against the 4-step plan path, and the C ABI is smoke-tested from test/cuda/public_api_test.c. The new surface is documented in docs/c_gpu.rst.
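
As a rough illustration of what such a cross-check can look like, the sketch below loops over every (dim, type) pair for one precision and compares two result vectors by relative l2 error. The metric, the tolerance, and the `mock_` functions are assumptions made for illustration; the actual test in this PR may compare results differently.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Relative l2 difference between two result vectors of equal length.
template <typename T>
T rel_l2_error(const std::vector<std::complex<T>> &a,
               const std::vector<std::complex<T>> &b) {
  T num = 0, den = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    num += std::norm(a[i] - b[i]);  // |a_i - b_i|^2
    den += std::norm(b[i]);         // |b_i|^2
  }
  return std::sqrt(num / den);
}

// Hypothetical stand-in for the 4-step plan path: returns fixed nonzero data.
template <typename T>
std::vector<std::complex<T>> mock_plan_path(int dim, int type) {
  std::vector<std::complex<T>> out(8);
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = std::complex<T>(T(dim + i), T(type));
  return out;
}

// Hypothetical stand-in for the one-shot path; here it simply reuses the
// plan-path mock, so the two paths agree by construction.
template <typename T>
std::vector<std::complex<T>> mock_simple_path(int dim, int type) {
  return mock_plan_path<T>(dim, type);
}

// Count the (dim, type) combinations whose two paths differ by more than tol.
template <typename T>
int count_mismatches(T tol) {
  int failures = 0;
  for (int dim = 1; dim <= 3; ++dim)
    for (int type = 1; type <= 3; ++type)
      if (rel_l2_error(mock_simple_path<T>(dim, type),
                       mock_plan_path<T>(dim, type)) > tol)
        ++failures;
  return failures;
}
```

Instantiating `count_mismatches` once for `float` and once for `double` covers the precision axis, giving the 3 x 3 x 2 combinations mentioned above.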

Reference: src/c_interface.cpp (CPU finufft1d1 family) and the prior
cufinufft simple-interface draft by @janden on janden/update_cu_api
(commit 61e2ae9 "cuda: first attempt at simple cufinufft interface").

Co-authored-by: Joakim Andén <3976052+janden@users.noreply.github.com>
@DiamonDinoia DiamonDinoia requested a review from janden May 16, 2026 00:04
github-actions Bot added a commit that referenced this pull request May 16, 2026
@github-actions

Perftest plot

FFT backend: DUCC

Numbers are advisory: GitHub-hosted runners have variable hardware. Treat <1.10× as noise.

CPU and compiler configuration

CPU name: AMD EPYC 9V74 80-Core Processor.

Arch: X86_64.

Core count: 2.

ISA extensions: 3dnowext, 3dnowprefetch, abm, adx, aes, aperfmperf, apic, arat, avx, avx2, avx512_bf16, avx512_bitalg, avx512_vbmi2, avx512_vnni, avx512_vpopcntdq, avx512bitalg, avx512bw, avx512cd, avx512dq, avx512f, avx512ifma, avx512vbmi, avx512vbmi2, avx512vl, avx512vnni, avx512vpopcntdq, bmi1, bmi2, clflush, clflushopt, clwb, clzero, cmov, cmp_legacy, constant_tsc, cpuid, cr8_legacy, cx16, cx8, de, decodeassists, erms, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fsrm, fxsr, fxsr_opt, gfni, ht, hypervisor, invpcid, lahf_lm, lm, mca, mce, misalignsse, mmx, mmxext, movbe, msr, mtrr, nonstop_tsc, nopl, npt, nrip_save, nx, osvw, osxsave, pae, pat, pausefilter, pcid, pclmulqdq, pdpe1gb, pfthreshold, pge, pni, popcnt, pse, pse36, rdpid, rdpru, rdrand, rdrnd, rdseed, rdtscp, rep_good, sep, sha, sha_ni, smap, smep, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, svm, syscall, topoext, tsc, tsc_known_freq, tsc_reliable, tsc_scale, umip, user_shstk, v_vmsave_vmload, vaes, vmcb_clean, vme, vmmcall, vpclmulqdq, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves, xtopology.

Compiler version: c++ (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0.

Compiler flags: -march=native.

perftest commands
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=0.0001 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=10000.0 --N2=1 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-05 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-05 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-05 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
taskset -c 0 /home/runner/work/finufft/finufft/builds/master/perftest/perftest --arg --prec=d --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=1 --M=10000000.0 --tol=1e-09 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-05 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-05 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=f --N1=320 --N2=320 --N3=1 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-05 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=d --N1=192 --N2=192 --N3=128 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-07 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=1
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=d --N1=192 --N2=192 --N3=128 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-07 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=2
/home/runner/work/finufft/finufft/builds/master/perftest/perftest --prec=d --N1=192 --N2=192 --N3=128 --ntransf=1 --threads=0 --M=10000000.0 --tol=1e-07 --n_runs=15 --sort=1 --upsampfact=0 --kerevalmethod=1 --debug=0 --bandwidth=1.0 --type=3

@mreineck
Collaborator

Good idea!
