
storeChunk: Workaround for bug with parallel flushing#5510

Merged
PrometheusPi merged 3 commits into ComputationalRadiationPhysics:dev from franzpoeschel:fix-span-parallel-flushing
Mar 23, 2026

Conversation

@franzpoeschel
Contributor

@franzpoeschel franzpoeschel commented Oct 17, 2025

Workaround for this bug: openPMD/openPMD-api#1794

I think this has little importance for PIConGPU, since we process Iterations collectively anyway, which makes the bug unlikely to trigger.
But I did stumble over this issue a while ago. Of course I did not document it and have no reproducer at hand now…

TODO:

  • Check if other places are affected. I think not; only the particles may have zero contributions from some ranks.
  • Check for a reproducer. I guess replacing `.writeIterations()` with `.snapshots()` and using `WRITE_RANDOM_ACCESS` might trigger the issue, but in that case it would be restricted to dev versions of openPMD, hence not so important.

Reproducer: Either find a simulation that does not write particles on some rank, or fake it:

```diff
diff --git a/include/picongpu/plugins/openPMD/writer/ParticleAttribute.hpp b/include/picongpu/plugins/openPMD/writer/ParticleAttribute.hpp
index 6435b6e86..ae35be4f3 100644
--- a/include/picongpu/plugins/openPMD/writer/ParticleAttribute.hpp
+++ b/include/picongpu/plugins/openPMD/writer/ParticleAttribute.hpp
@@ -155,7 +155,11 @@ namespace picongpu
                     float_X const timeOffset = 0.0;
                     record.setAttribute("timeOffset", timeOffset);
                 }
-                if(elements == 0)
+                DataConnector& dc = Environment<>::get().DataConnector();
+                GridController<simDim>& gc = Environment<simDim>::get().GridController();
+                uint64_t mpiSize = gc.getGlobalSize();
+                uint64_t mpiRank = gc.getGlobalRank();
+                if(elements == 0 || mpiRank != 0)
                 {
                     // accumulateWrittenBytes += 0;
```
Then run `mpirun -n 2 picongpu -g 192 192 192 -d 1 2 1 --openPMD.period 100:100 -s 100 --openPMD.ext h5`, e.g. within a LaserWakefield simulation.
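For context, the underlying pattern can be sketched outside of PIConGPU as well. The following is a hypothetical minimal sketch (not PIConGPU or openPMD test code, and the dataset names are made up), assuming openPMD-api built with MPI support: it uses the span-based `storeChunk` overload while one rank contributes zero elements, the situation in which openPMD/openPMD-api#1794 can desynchronize collective flushes.

```cpp
// Minimal sketch of the zero-contribution span-based storeChunk pattern.
// Requires openPMD-api built with MPI + HDF5; compile with mpic++ and run
// with at least 2 ranks so that one rank really writes nothing.
#include <mpi.h>
#include <openPMD/openPMD.hpp>

#include <cstdint>
#include <cstddef>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    openPMD::Series series(
        "data_%T.h5", openPMD::Access::CREATE, MPI_COMM_WORLD);
    auto iteration = series.writeIterations()[0];
    auto component = iteration.particles["e"]["position"]["x"];

    std::uint64_t const globalExtent = 100;
    component.resetDataset({openPMD::Datatype::DOUBLE, {globalExtent}});

    // Fake a rank with zero contributions: only rank 0 writes anything.
    std::uint64_t const localExtent = (rank == 0) ? globalExtent : 0;
    auto view = component.storeChunk<double>({0}, {localExtent});

    // The span API hands out a writable buffer owned by the backend;
    // on the zero-size ranks this buffer is empty.
    auto span = view.currentBuffer();
    double* data = span.data();
    for (std::size_t i = 0; i < span.size(); ++i)
        data[i] = static_cast<double>(i);

    // Collective flush: this is where zero-size ranks may take a
    // different code path and desynchronize from the writing ranks.
    iteration.close();

    MPI_Finalize();
    return 0;
}
```

The workaround in this PR sidesteps exactly this situation on the PIConGPU side rather than relying on the openPMD-side fix.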

@franzpoeschel franzpoeschel marked this pull request as draft October 29, 2025 12:42
@psychocoderHPC psychocoderHPC added this to the 0.9.0 / next stable milestone Jan 14, 2026
@psychocoderHPC psychocoderHPC added the bug a bug in the project's code label Jan 14, 2026
@franzpoeschel franzpoeschel force-pushed the fix-span-parallel-flushing branch from 6a4f325 to 250408d Compare March 13, 2026 13:59
@franzpoeschel
Contributor Author

Alright, it seems we do need to merge this after all, HDF5 became a lot more painful about collective metadata setup as of HDF5 2.0, see openPMD/openPMD-api#1862.
Trying to fix this within openPMD, but for now this workaround is fine for PIConGPU.

@franzpoeschel franzpoeschel marked this pull request as ready for review March 13, 2026 14:01
@PrometheusPi PrometheusPi merged commit 8d018f7 into ComputationalRadiationPhysics:dev Mar 23, 2026
10 checks passed