
Using Linkcells to accelerate the NL #1403

Open
Iximiel wants to merge 5 commits into plumed:master from Iximiel:feature/combineNLandLC


@Iximiel (Member) commented Apr 24, 2026

Description

Hello, I think this is the last (or at least the latest) PR in the NL series.

First of all, I still need to make some extra modifications to the code (such as compacting it, moving the new template function into the LC header, and rebasing this on the changes in #1401 if that gets accepted), but I wanted to show the current state to get some feedback.

I tried combining LinkCells with the standard NL algorithm to see whether that speeds things up.

Along the way, I refactored the LinkCells loops into a template function and tried to parallelize the LC version of the NL.
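For readers less familiar with the idea, here is a minimal sketch of what factoring the LinkCells loop into a template function can look like: the template walks every cell and its neighboring cells and hands each candidate pair to a callback, so the same loop can serve different consumers (here, building a neighbor list). All names (CellGrid, forEachCandidatePair, buildNeighborList) are hypothetical, and the box is assumed cubic and non-periodic, unlike PLUMED's LinkCells, which handles periodic boundaries.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pos = std::array<double, 3>;

// Hypothetical minimal cell grid: cubic, NON-periodic box [0, box)^3 split into
// n^3 cells whose side is at least the cutoff.
struct CellGrid {
  int n;                                        // cells per dimension
  double side;                                  // cell side length
  std::vector<std::vector<std::size_t>> cells;  // atom indices stored in each cell

  CellGrid(const std::vector<Pos>& pos, double box, double cutoff)
      : n(std::max(1, static_cast<int>(box / cutoff))), side(box / n),
        cells(static_cast<std::size_t>(n) * n * n) {
    for (std::size_t i = 0; i < pos.size(); ++i) {
      int c[3];
      for (int d = 0; d < 3; ++d)  // clamp so atoms on the boundary stay in range
        c[d] = std::min(n - 1, std::max(0, static_cast<int>(pos[i][d] / side)));
      cells[(c[0] * n + c[1]) * n + c[2]].push_back(i);
    }
  }
};

// Calls f(i, j) exactly once for every unordered pair of atoms lying in the
// same cell or in adjacent cells; f decides what to do with the candidate.
template <typename F>
void forEachCandidatePair(const CellGrid& g, F&& f) {
  const int n = g.n;
  for (int x = 0; x < n; ++x)
    for (int y = 0; y < n; ++y)
      for (int z = 0; z < n; ++z) {
        const auto& home = g.cells[(x * n + y) * n + z];
        for (int dx = -1; dx <= 1; ++dx)
          for (int dy = -1; dy <= 1; ++dy)
            for (int dz = -1; dz <= 1; ++dz) {
              const int nx = x + dx, ny = y + dy, nz = z + dz;
              if (nx < 0 || ny < 0 || nz < 0 || nx >= n || ny >= n || nz >= n)
                continue;  // no periodic images in this sketch
              const auto& other = g.cells[(nx * n + ny) * n + nz];
              for (std::size_t i : home)
                for (std::size_t j : other)
                  if (i < j) f(i, j);  // skip the mirrored visit of the same pair
            }
      }
}

// Neighbor list built on top of the generic loop: keep pairs with r^2 <= rc^2.
std::vector<std::pair<std::size_t, std::size_t>>
buildNeighborList(const std::vector<Pos>& pos, double box, double cutoff) {
  assert(box > 0 && cutoff > 0);
  CellGrid grid(pos, box, cutoff);
  const double d2 = cutoff * cutoff;
  std::vector<std::pair<std::size_t, std::size_t>> nl;
  forEachCandidatePair(grid, [&](std::size_t i, std::size_t j) {
    double r2 = 0;
    for (int d = 0; d < 3; ++d) {
      const double dd = pos[i][d] - pos[j][d];
      r2 += dd * dd;
    }
    if (r2 <= d2) nl.push_back({i, j});
  });
  return nl;
}
```

Passing the pair action as a template parameter lets the compiler inline it, so the generic loop should cost no more than a hand-written one.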

Here are a few graphs. They follow the #1401 nomenclature and I used the same commands: the series ending in -v are this PR, and those ending in -noomp are this PR with the OpenMP code for the LinkCells commented out.

These are the results on the NL algorithm:

[Plot: nllc_plot_NL, timings of the NL algorithm]

The LC-NL "linearizes" the time and, unlike the LC (see below), it seems to benefit from the OpenMP parallelization.
As we predicted, the old implementation starts faster but scales worse than the new one. So I was thinking of keeping both implementations and choosing between them, with a default derived from some measurements that the user can override in the input.
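A minimal sketch of the selection logic proposed here: a default picked from a measured crossover system size, which an explicit user choice always overrides. NLAlgorithm, chooseNLAlgorithm and the crossoverAtoms parameter are hypothetical names; the actual crossover would have to come from measurements like the ones in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical names: the PR does not define how the choice is exposed.
enum class NLAlgorithm { Plain, LinkCells };

// crossoverAtoms is an ASSUMED tunable number of atoms above which the
// LinkCells-based build is expected to win; it is not a measured PLUMED value.
NLAlgorithm chooseNLAlgorithm(std::size_t nAtoms, std::size_t crossoverAtoms,
                              std::optional<NLAlgorithm> userChoice = std::nullopt) {
  assert(crossoverAtoms > 0);
  if (userChoice) return *userChoice;  // explicit user input overrides the heuristic
  // The plain O(N^2) loop has less overhead on small systems; LinkCells scale ~O(N).
  return nAtoms < crossoverAtoms ? NLAlgorithm::Plain : NLAlgorithm::LinkCells;
}
```

Keeping the crossover as a parameter rather than a hard-coded constant leaves room for machine-dependent defaults.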


These are the timings for the LC: -v has OpenMP, while -noomp has the new interface but no OpenMP.

[Plot: nllc_plot_LC, timings of the LC]

I do not think that adding OpenMP to the LC is a good idea, perhaps because it works directly on the result array; in any case, that was not the point of these modifications.
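The guess above, that the LC gains nothing from OpenMP because threads work directly on the result array, points to a common alternative pattern: each thread fills a private buffer and the buffers are merged once at the end. The sketch shows only that pattern in isolation; collectPairsParallel and the pairsForItem callback are hypothetical names, not PLUMED code, and whether it helps the LC here would have to be measured. When compiled without OpenMP the pragmas are skipped and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pair = std::pair<std::size_t, std::size_t>;

// pairsForItem(k, out) is a hypothetical callback that appends to `out` the
// pairs found while processing work item k (for example, one cell).
template <typename F>
std::vector<Pair> collectPairsParallel(std::size_t nItems, F pairsForItem) {
  std::vector<Pair> result;
#ifdef _OPENMP
#pragma omp parallel
#endif
  {
    std::vector<Pair> local;  // private to this thread: no shared writes in the loop
#ifdef _OPENMP
#pragma omp for nowait
#endif
    for (long k = 0; k < static_cast<long>(nItems); ++k)
      pairsForItem(static_cast<std::size_t>(k), local);
#ifdef _OPENMP
#pragma omp critical
#endif
    result.insert(result.end(), local.begin(), local.end());  // one merge per thread
  }
  return result;
}
```

With more than one thread, the order of the merged pairs depends on scheduling, so any output that depends on pair order can change between runs.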

Target release

I would like my code to appear in release v2.11

Type of contribution
  • changes to code or doc authored by PLUMED developers, or additions of code in the core or within the default modules
  • changes to a module not authored by you
  • new module contribution or edit of a module authored by you
Copyright
  • I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.
  • the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.
Tests
  • I added a new regtest or modified an existing regtest to validate my changes.
  • I verified that all regtests are passed successfully on GitHub Actions.

}
double value=modulo2(distance);
if(value<=d2) {
//neighbors_.push_back({A,B});
Comment thread src/tools/NeighborList.cpp Fixed
Comment on lines +299 to +302
//const unsigned elementsPerRank = std::ceil(double(nc)/stride);
const unsigned int start=0;// rank*elementsPerRank;
const unsigned int end = nc;//((start + elementsPerRank)< nc)?(start + elementsPerRank): nc;
//Initialization of List A and B is here because the access to them is threadsafe (at the moment of writing this)
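The commented-out lines in this snippet describe a contiguous block split of the nc cells among ranks. A minimal sketch of that split, using integer ceiling division instead of std::ceil(double(nc)/stride) and clamping both ends so that trailing ranks simply get an empty range; blockRange and Range are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

struct Range {
  unsigned start, end;  // half-open interval [start, end)
};

// Rank `rank` out of `stride` ranks gets at most ceil(nc / stride) consecutive cells.
Range blockRange(unsigned nc, unsigned rank, unsigned stride) {
  assert(stride > 0 && rank < stride);
  const unsigned perRank = (nc + stride - 1) / stride;  // integer ceil(nc / stride)
  const unsigned start = std::min(nc, rank * perRank);
  const unsigned end = std::min(nc, start + perRank);
  return {start, end};
}
```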
@Iximiel Iximiel force-pushed the feature/combineNLandLC branch from 6ab15f8 to dc376ac April 24, 2026 14:45
@Iximiel Iximiel force-pushed the feature/combineNLandLC branch from dc376ac to aa4dd63 May 7, 2026 13:52
@Iximiel Iximiel force-pushed the feature/combineNLandLC branch from 0b3036c to 948aadb May 7, 2026 13:55
@GiovanniBussi (Member)

Let's leave this dangling.

This provides a speedup in some corner cases, but to use it properly we should either (a) implement a proper heuristic or (b) allow the user to control it with an environment variable or, better, from the input file.

@Iximiel (Member, Author) commented May 11, 2026

Here are some measurements (as usual, -v is this branch). With 125 atoms (simple cubic) this is slower than the plain NL; with more atoms it starts to scale more linearly, thanks to the LinkCells:
[Plot: NLwithLC, timings of the NL with LinkCells]

and the histogram to compare the results:
[Histogram: NLwithLCHisto]

In the last commit I added an environment variable to control this, together with the machinery in the NL to support the option. I added a toggle method instead of an extra constructor argument, since changing the option has no impact on the setup; this will also make it easier to expose the option wherever the NL is used.
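A minimal sketch of the "environment variable plus toggle method" design described above: the default is read from the environment when the object is built, so the constructor signature does not change, and a setter can flip it later. The variable name PLUMED_NL_USE_LINKCELLS and the class and method names are hypothetical, not the ones used in the PR.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// HYPOTHETICAL names throughout (environment variable, class, methods).
class NeighborListSketch {
  bool useLinkCells_;

  static bool envDefault() {
    const char* v = std::getenv("PLUMED_NL_USE_LINKCELLS");  // hypothetical name
    // Unset, empty or "0" means: use the plain algorithm.
    return v != nullptr && v[0] != '\0' && std::string(v) != "0";
  }

 public:
  // No extra constructor argument: the default comes from the environment.
  NeighborListSketch() : useLinkCells_(envDefault()) {}

  // Only changes how the list is rebuilt, so the rest of the setup is untouched.
  void setUseLinkCells(bool on) { useLinkCells_ = on; }
  bool usesLinkCells() const { return useLinkCells_; }
};
```

Reading the variable once at construction keeps the check out of the list-rebuild loop, and an input-file keyword could later call the same setter.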

@Iximiel Iximiel marked this pull request as ready for review May 12, 2026 06:48
@Iximiel (Member, Author) commented May 12, 2026

@GiovanniBussi the nvhpc and pycv tests failed while checking out the repo; I think rerunning them will make them pass.

I do not understand what is happening with the macOS tests.

@GiovanniBussi (Member)

@Iximiel only 5 tests are crashing. Are those tests using any of the features you implemented here? There might be numerical issues, though the differences seem significant (I only skimmed them quickly).

@Iximiel (Member, Author) commented May 12, 2026

I do not understand why they pass on Linux and Intel but not on macOS; they are all about the neighbor list.
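When only the neighbor-list tests differ between platforms, one quick sanity check is whether the LinkCells build and a plain double loop find the same set of pairs, independently of the order in which the pairs are generated. A minimal sketch, assuming a non-periodic box (no PBC); bruteForceNL, canonical and samePairs are hypothetical helpers, not PLUMED functions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Pair = std::pair<std::size_t, std::size_t>;

// Reference O(N^2) build: every pair with squared distance <= cutoff^2.
std::vector<Pair> bruteForceNL(const std::vector<std::array<double, 3>>& pos, double cutoff) {
  assert(cutoff > 0);
  const double d2 = cutoff * cutoff;
  std::vector<Pair> nl;
  for (std::size_t i = 0; i < pos.size(); ++i)
    for (std::size_t j = i + 1; j < pos.size(); ++j) {
      double r2 = 0;
      for (int d = 0; d < 3; ++d) {
        const double dd = pos[i][d] - pos[j][d];
        r2 += dd * dd;
      }
      if (r2 <= d2) nl.push_back({i, j});
    }
  return nl;
}

// Store each pair as (min, max) and sort, so lists with the same pairs in any
// order (and with either orientation of a pair) become identical.
std::vector<Pair> canonical(std::vector<Pair> nl) {
  for (auto& p : nl)
    if (p.first > p.second) std::swap(p.first, p.second);
  std::sort(nl.begin(), nl.end());
  return nl;
}

bool samePairs(const std::vector<Pair>& a, const std::vector<Pair>& b) {
  return canonical(a) == canonical(b);
}
```

If the pair sets agree but the outputs still differ, the difference lies in what is computed from the list (for example, summation order) rather than in the list itself.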
