Use tweaked PSWF implementation#854
Conversation
|
I leave the decision to @ahbarnett and @lu1and10, My suggestion would be to make all the related changes in this PR in one go instead of follow up PR. As I want to use this code in the GPU next, I'd rather make all the changes now than making GPU changes on a moving target.
I would measure the performance of 2 and 3 and see where we end up. @mreineck I do not want you to taks with something that might be reverted so I would wait for @lu1and10 and @ahbarnett to confirm what they like. I think 2/3 can be a follow up PR as @lu1and10 prefers but to me we can just do tree commits here in this PR so the history is meaningful. |
|
Thanks for the comments. I like this PR overall and would be in favor of moving it forward. For (1), I agree that having an error path is better, even if it should basically never trigger. For (2), I am OK with either keeping the current cache for now or revisiting it if benchmarks show it matters. For (3), I would prefer to leave SIMD/batched evaluation for a separate PR. A follow-up PR would also let us benchmark the batched path properly and decide on the SIMD abstraction without expanding the scope here. So my preference is: merge this PR once the error handling/details are settled, and track cache/SIMD batching separately if needed. |
|
Thanks for the comments! I'll look at the error handling tomorrow or perhaps later today! Removing the cache will require structural changes in the rest of the code in order to keep efficiency, but perhaps they are minor (I haven't looked closely yet.) |
|
Just a small word of caution:
Using the PSWF setup code (not necessarily the function evaluation) with anything but double precision will require changes I don't feel competent to do, and the performance gains will probably not be worth the effort. |
|
Here is a version that avoids the caches and only builds one |
|
Interesting ... |
|
Thanks all, Re half-prec: Otherwise I'll bring it in... |
Okay.
Right you are correct.
I have no objections. |
Took the chance to update the compilers in Ci. It should work now. |
|
FFT backend: Numbers are advisory: GitHub-hosted runners have variable hardware. Treat <1.10× as noise. CPU and compiler configurationCPU name: Arch: Core count: ISA extensions: Compiler version: Compiler flags: perftest commands |
|
Hi @mreineck,
Fix: // kf==4
return (common::cyl_bessel_i(0, beta * std::sqrt(1.0 - z*z)) - 1.0) / besselbeta;
// kf==6
return (std::cosh(beta * std::sqrt(1.0 - z*z)) - 1.0) / coshbeta;
|
…al finufft::exception in PSWF setup
|
Wow, thank you for spotting the parenthesis errors - these were embarassing! In
I don't really understand that one. If I move the construction of the PSWF object into the lambda, the setup machinery will run every time the lambda is called, which we don't want. Copying the two vectors once is very cheap in comparison. |
Oh my bad! The copy happens once, at lambda construction time |
|
I guess only rebasing master is needed then I am happy. Though @lu1and10 and @ahbarnett are the ones that understand more about this than me. |
|
Master is merged. Please feel free to squash the commits on the branch in any way you see fit - I don't have enough experience with this to do it myself. I also re-added the "return 0 if z is outside [-1;1]" to the kernel evaluation; I had forgotten this when switching to the lambda. |
"Move-capturing" would have been a nice feature here, but I suspect that this was left out of the standard for good reasons ... |
|
looks good, should we merge? |
|
According to the meeting minutes, @ahbarnett wanted to have a look, so perhaps we should wait. |
ahbarnett
left a comment
There was a problem hiding this comment.
Hi Martin. Looks good.
Shame about the code duplication in the kernel defn func now :( (I liked my old short kernel func, but is there no way to write such a code?).
One issue is that we don't actually test all these other unused kernels, so a rewrite such as this can break them and we don't know. I guess they are listed in the docs as not for users anyway.
Could you comment if the PSWF code is capable of eval outside of [-1,1] ? My memory is that it isn't, and that we don't need that even if we use pswf for direct type-3 deconvolution weight calc.
Thanks, Alex
| else if (kf == 3) | ||
| const double expbeta = std::exp(beta); | ||
| return [beta, expbeta](double z) { | ||
| if (std::abs(z) > 1.0) return 0.0; // restrict support to [-1,1] |
There was a problem hiding this comment.
it's a bit sad how much code duplication this PR created. I guess there's no way to have the clean return of 0.0 as in my original l.43 ? Same for the arg which is now repeated several times...
There was a problem hiding this comment.
I think I can write a version with less code duplication, which is more similar in spirit to the current structure. That will have to wait till next week though.
My goal was to minimize the individual functions that are doing the actual evaluation; if we keep the current style, the evaluation function itself will contain the if(kf==<something>) sequence, which felt worse to me.
Some degree of duplication will be unavoidable however, if we want to separate the initialization part (i.r. precomputing the expbeta or setting up the PSWF0) and the evaluation proper - and I think we really want to do that.
| */ | ||
| double pswf(double c, double x); | ||
|
|
||
| /* Class for evaluation of the prolate spheroidal wavefunction |
There was a problem hiding this comment.
I like this doc - thanks!
My personal feeling is that we are close to the point where all the other kernel shapes can be removed - then we don't have to worry about testing them and keeping them up to date. I don't see any reason why anyone should prefer non-PSWF kernels, except for accuracy comparisons of course, but that can always be done by running two different versions of the library against each other. Once the Cuda part has switched to PSWF, I think this should be seriously considered. |
That's correct, the code can't produce anything meaningful outside of If values outside this range are required for type-3 deconvolution (and I honestly don't know if they are), then this patch shouldn't go in. But please note that the current PSWF implementation doesn't support this range extension either yet - some of the ingredients are there, but I think the code needs quite some overhaul to activate them, |
|
THanks for the responses, Martin. If you switch this PR from Draft, I'll bring it in. |
|
Done! Sorry, I was offline yesterday. |
This PR introduces an optimized implementation of the PSWF code.
Advantages:
Open issues: