
coll: CSEL redesign #7547

Open
hzhou wants to merge 67 commits into pmodels:main from hzhou:2508_csel

@hzhou
Contributor

hzhou commented Aug 25, 2025

Pull Request Description

  • coll_algorithms.txt catalogs all collective algorithms and their conditions
  • coll_selection.json specifies the decision tree
  • JSON subtrees allow composition and local customization
  • MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS prints a debug summary

MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS

[0] ==== Dump collective algorithm counters ====
[0]          4  MPIR_Bcast_intra_scatter_ring_allgather
[0]         16  MPIDI_POSIX_mpi_bcast_release_gather
[0]          1  MPIR_Reduce_intra_binomial
[0] ==== END collective algorithm counters ====

[skip warnings]

Discussion

Reference: #7544
Also see comments in #7598 and #7666


Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@hzhou hzhou force-pushed the 2508_csel branch 3 times, most recently from 23d90b4 to 98abcd7 September 2, 2025 19:00
@hzhou hzhou force-pushed the 2508_csel branch 17 times, most recently from bc86294 to e3acd5a September 5, 2025 20:06
@hzhou hzhou mentioned this pull request Apr 8, 2026
hzhou added 6 commits April 8, 2026 21:09
We will use a single-level JSON for algorithm selection including
device-specific algorithms. Remove the collective ADI for now. We'll add
the mechanism of selecting device-level algorithms later.

gen_coll.py is updated to skip calling MPID_ collectives.

Device collective CVARs are removed.
We will add the mechanism of selecting device-layer algorithms later.
Temporarily comment out the composition code that calls netmod/shm
collectives, since we will remove these APIs next.

Some NULL composition functions are removed.
We will replace the device-algorithm selection later at the MPIR layer.
The auto selection should take care of restrictions: raise an error
rather than fall back.

If the user uses a CVAR to select a specific algorithm, we should check
its restrictions before jumping to the algorithm. We will design a common
fallback handling there.
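The restriction-check-then-jump behavior described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the `ALGORITHMS` registry, `select_algorithm`, `auto_select`, and the `CollSig` fields are hypothetical and are not MPICH's API. A CVAR value of 0 stands for auto selection; any other value names a specific algorithm whose restriction is checked first, and a violation raises an error instead of silently falling back.

```python
from dataclasses import dataclass


@dataclass
class CollSig:
    # Hypothetical subset of the fields a collective signature might carry.
    coll_type: str
    comm_size: int
    is_intranode: bool


# Hypothetical registry: CVAR value -> (algorithm name, restriction predicate).
# Value 0 is reserved for "auto".
ALGORITHMS = {
    1: ("bcast_binomial", lambda sig: sig.coll_type == "bcast"),
    2: ("bcast_release_gather",
        lambda sig: sig.coll_type == "bcast" and sig.is_intranode),
}


def auto_select(sig):
    # Placeholder for the JSON decision tree, which is expected to only
    # return algorithms whose restrictions already hold.
    return "bcast_release_gather" if sig.is_intranode else "bcast_binomial"


def select_algorithm(cvar_value, sig):
    if cvar_value == 0:
        return auto_select(sig)
    name, restriction = ALGORITHMS[cvar_value]
    if not restriction(sig):
        # Error rather than silently falling back to another algorithm.
        raise ValueError(f"{name} cannot be used for this collective")
    return name
```

Keeping the check on the CVAR path only reflects the point above that the auto selection is expected to honor restrictions itself.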
@hzhou hzhou force-pushed the 2508_csel branch 3 times, most recently from b1bcb29 to b8aa4f6 Compare April 18, 2026 02:24
hzhou and others added 15 commits April 17, 2026 22:41
Replace the impl functions (e.g. MPIR_Bcast_impl) so that they assemble
coll_sig and call MPIR_Coll_auto.

Note that we need to generate nonblocking tag functions such as
MPIR_Iallreduce_tag, introduced in the previous commits 7784578 and
abc3423 (PR 7648).
Current compositional algorithms call MPIR collectives. We will refactor
them later, but for now, generate wrapper MPIR functions that call the
_impl functions.
Remove the old routines that are now unused.
Add MPIR_init_coll_sig and MPID_init_coll_sig so we can add arbitrary
attr bits or additional fields without hacking maint/gen_coll.py.
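As an illustration of how a generated `_impl` wrapper might assemble `coll_sig` before selection, here is a hypothetical Python sketch. `bcast_impl`, `init_coll_sig`, `device_init_coll_sig`, `coll_auto`, the `ATTR_INTRANODE` bit, and the dictionary-based communicator are stand-ins for MPIR_Bcast_impl, MPIR_init_coll_sig, MPID_init_coll_sig, and MPIR_Coll_auto; the real C signatures are not shown in this PR description.

```python
from dataclasses import dataclass, field

ATTR_INTRANODE = 0x1  # hypothetical attribute bit


@dataclass
class CollSig:
    coll_type: str
    comm_size: int
    count: int
    attr: int = 0
    extra: dict = field(default_factory=dict)


def init_coll_sig(sig, comm):
    # MPIR-layer hook: set generic attribute bits.
    if comm["num_external"] == 1:
        sig.attr |= ATTR_INTRANODE


def device_init_coll_sig(sig, comm):
    # Device-layer hook: attach device-specific fields
    # without touching the code generator.
    sig.extra["shm_ready"] = comm.get("shm_ready", False)


def coll_auto(sig):
    # Stand-in for the CSEL decision tree.
    return "release_gather" if sig.attr & ATTR_INTRANODE else "binomial"


def bcast_impl(count, comm):
    # What a generated *_impl wrapper would do: assemble coll_sig, then select.
    sig = CollSig("bcast", comm["size"], count)
    init_coll_sig(sig, comm)
    device_init_coll_sig(sig, comm)
    return coll_auto(sig)
```

Because the two hooks own the extra attribute bits and fields, adding a new one only touches a hook, not the generator.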
Provide a simple mechanism for a rank to dump collective algorithm
counters.

Set MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS to the global rank of the process
that should dump. It is undesirable for every process to dump, yet it
does not always make sense for rank 0 to dump, especially when we don't
always use comm world.

Counting happens in the CSEL framework, so internal collectives are not
counted when we internally use _fallback algorithms.
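A minimal Python sketch of the counter mechanism, assuming a per-process tally that is updated only on the selection path and printed only by the rank named in MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS. The `AlgoCounters` class and its methods are hypothetical; the output lines imitate the sample dump in the PR description.

```python
from collections import Counter


class AlgoCounters:
    """Per-process tally of algorithms chosen by the selection framework."""

    def __init__(self, dump_rank):
        # dump_rank plays the role of MPIR_CVAR_DUMP_COLL_ALGO_COUNTERS:
        # the global rank that should print the summary.
        self.dump_rank = dump_rank
        self.counts = Counter()

    def record(self, algo_name):
        # Called only from the selection path, so internal collectives
        # that call _fallback algorithms directly are never counted.
        self.counts[algo_name] += 1

    def dump(self, my_rank):
        # Every other rank stays silent.
        if my_rank != self.dump_rank:
            return []
        lines = [f"[{my_rank}] ==== Dump collective algorithm counters ===="]
        for name, n in self.counts.items():
            lines.append(f"[{my_rank}] {n:10d}  {name}")
        lines.append(f"[{my_rank}] ==== END collective algorithm counters ====")
        return lines
```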
A universal nb algorithm for blocking collectives.
They are replaced by MPIR_Coll_nb.
In coll_algorithms.txt, add the "inline" attribute to skip adding a
prototype for the corresponding algorithm function, since it is inlined
in the headers.

Add "func_name" to directly specify algorithm function name.

Add "macro_guard" to specify a preprocessor condition for the algorithm
function. For example, the ch4 posix algorithm function needs to be
protected by "#if defined(MPIDI_CH4_SHM_POSIX)" (to be defined).
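The effect of the three catalog attributes on generated prototypes might look like the following Python sketch, in the spirit of maint/gen_coll.py but not taken from it. `gen_prototype`, `default_func_name`, the naming rule, and the `coll_sig_t` parameter list are hypothetical, and it assumes "macro_guard" holds only the condition expression (e.g. `defined(MPIDI_CH4_SHM_POSIX)`).

```python
def default_func_name(coll, algo):
    # Hypothetical naming rule:
    # ("Bcast", "intra_binomial") -> "MPIR_Bcast_intra_binomial".
    return f"MPIR_{coll}_{algo}"


def gen_prototype(coll, algo, attrs):
    """Return the C prototype lines for one catalog entry, honoring the
    "inline", "func_name", and "macro_guard" attributes."""
    if attrs.get("inline"):
        # Inlined in a header, so no prototype is emitted.
        return []
    name = attrs.get("func_name", default_func_name(coll, algo))
    # The parameter list is a placeholder, not MPICH's real signature.
    lines = [f"int {name}(coll_sig_t *coll_sig);"]
    guard = attrs.get("macro_guard")
    if guard:
        # Wrap the prototype in the preprocessor condition.
        lines = [f"#if {guard}", *lines, "#endif"]
    return lines
```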
Add conditional conditions: condition functions that can only be called
inside a preprocessor macro guard.

We need to generate another header file, coll_autogen.h, that is loaded
after mpidpost.h. "coll_algos.h" goes into mpir_coll.h, which is included
in between mpidpre.h and mpidpost.h.

Refactor a bit so that all the condition-parsing logic is wrapped in
functions such as get_conditon_name, get_condition_func, etc., and these
functions are defined together.
Sometimes we may want to behave differently in a restriction check than
in a condition check. For example, an algorithm like release_gather
normally gets selected only after the user calls the collective a certain
number of times. But if the user selects the algorithm by CVAR, it makes
no sense to do this repeat check in the restriction check.
Rather than adding individual boolean flags, use a bit mask "flags" instead.
It is easier to make sure we zero-initialize all the flags that way.
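The restriction-versus-condition distinction and the `flags` bit mask from the two commits above can be sketched in Python as below. Everything here is hypothetical: `CheckFlags`, `RESTRICTION_ONLY`, `REPEAT_THRESHOLD`, `check_release_gather`, and the dictionary communicator with `num_calls` are illustrations, not MPICH's actual names or thresholds. A zero flags value is the plain condition check, which is why zero-initialization is the safe default.

```python
from enum import IntFlag


class CheckFlags(IntFlag):
    # Hypothetical bit mask; zero means a plain condition check.
    NONE = 0
    RESTRICTION_ONLY = 0x1  # caller is validating a CVAR-forced choice


REPEAT_THRESHOLD = 5  # hypothetical number of calls before release_gather pays off


def check_release_gather(comm, flags=CheckFlags.NONE):
    if comm["num_external"] != 1:
        # Hard restriction: the comm must be node-local.
        return False
    if flags & CheckFlags.RESTRICTION_ONLY:
        # Explicitly requested via CVAR: skip the repeat-count heuristic.
        return True
    # Condition check: only select after enough repeated calls.
    comm["num_calls"] += 1
    return comm["num_calls"] >= REPEAT_THRESHOLD
```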
Enable CVARs and JSONs to select ch4-posix layer release_gather
algorithms.

Select MPIDI_POSIX_mpi_bcast_release_gather if it passes the
MPIDI_CH4_release_gather condition check, which only passes if the comm
is a POSIX intranode comm.
Extend the previous commit to activate release_gather algorithm for
reduce, allreduce, and barrier.
hzhou and others added 9 commits April 20, 2026 18:13
In MPIDI_POSIX_check_release_gather we check comm's hierarchical fields
to ensure the comm is a node-local comm, i.e. comm->num_external is 1.
Set these fields for subcomms so we can run release_gather checks on
these subcomms.
It is not a fatal condition if a comm is missing hierarchical
information, but it likely indicates an oversight. Thus we add an
assertion so we can catch such cases during CI testing. In production,
with assertions turned off, it is okay to just return false in a
hierarchical condition check.
Treat the fail path in check_hierarchy as if it's no_local. This
simplifies lower-layer condition checks since we can always directly
check the hierarchical info.
Remove MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE. It is now replaced
with MPIR_CVAR_COLL_SELECTION_JSON_FILE.

Although we could reuse the same CVAR name, using a different name
prevents potential confusion, since we altered the JSON syntax.
Parse the json as a list of named subtrees such as:
{
  "name=main": {...},
  "name=bcast-intra-auto": {...},
  ...
}

Inside the subtree, we can refer to the named subtree using "call=name".

If the json does not contain named subtrees, treat it as a single tree
with the name "main".
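Loading the selection JSON as named subtrees, with the single-tree fallback to "main", might look like the following Python sketch. `load_named_trees` is hypothetical, and it assumes a document is treated as named subtrees only when every top-level key has the "name=" prefix; how the real parser handles mixed keys is not stated here.

```python
import json


def load_named_trees(text):
    """Parse selection JSON into {name: subtree}. Top-level keys of the
    form "name=X" define subtrees; otherwise the whole document is "main"."""
    doc = json.loads(text)
    if isinstance(doc, dict) and doc and all(k.startswith("name=") for k in doc):
        # Strip the "name=" prefix to get the subtree name.
        return {k[len("name="):]: v for k, v in doc.items()}
    # No named subtrees: treat the document as a single tree called "main".
    return {"main": doc}
```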
Load src/mpi/coll/coll_selection.json as named subtrees.

Add MPIR_Coll_run_tree which runs the selection on a subtree.

Replace MPIR_Coll_auto with MPIR_Coll_json, and add
MPIR_Coll_run_tree(csel_tree_auto, coll_sig) to allow recursive
algorithms such as compositional algorithms.

csel_tree_auto will fall back to csel_tree_main if it is not defined in
the JSON file. Similarly, we can easily introduce more predefined
subtrees later, e.g. bcast-intra-auto.

In CVAR selection, "auto" should be the default and its value should be 0,
so it automatically falls through and runs on csel_tree_main.
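A hypothetical Python sketch of running a subtree with "call=name" recursion, plus the auto-to-main fallback, is shown below. The node shapes are assumptions, not the real coll_selection.json syntax: dict nodes map condition names (looked up in a made-up `CONDITIONS` table, with "any" always matching) to children; string leaves are either "call=NAME" or an algorithm name. `run_tree`, `coll_json`, and `auto_tree` only mirror the roles of MPIR_Coll_run_tree, MPIR_Coll_json, and csel_tree_auto.

```python
# Hypothetical condition table: condition name -> predicate on coll_sig.
CONDITIONS = {
    "intranode": lambda sig: sig["num_external"] == 1,
    "small_msg": lambda sig: sig["msg_size"] <= 4096,
}


def run_tree(trees, node, sig):
    """Walk a subtree: dict nodes test conditions in order, "call=NAME"
    leaves recurse into another named subtree, other strings are algorithms."""
    if isinstance(node, str):
        if node.startswith("call="):
            return run_tree(trees, trees[node[len("call="):]], sig)
        return node
    for cond, child in node.items():
        if cond == "any" or CONDITIONS[cond](sig):
            return run_tree(trees, child, sig)
    return None  # no branch matched


def coll_json(trees, sig):
    # Top-level entry point: always start from the "main" subtree.
    return run_tree(trees, trees["main"], sig)


def auto_tree(trees):
    # Subtree for recursive (e.g. compositional) selection; falls back to
    # "main" when the JSON does not define an "auto" subtree.
    return trees.get("auto", trees["main"])
```

With this shape, a compositional algorithm would call `run_tree(trees, auto_tree(trees), sub_sig)` for its inner collective, which is the kind of recursion the commit describes.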
@hzhou
Contributor Author

hzhou commented Apr 20, 2026

test:mpich/ch4/most
test:mpich/ch3/most
test:mpich/ch4/ofi/more
