prov/lnx: add FI_MSG and FI_RMA support#12209

Open
aingerson wants to merge 13 commits into ofiwg:main from aingerson:lnx2

@aingerson
Contributor

This is on top of the refactor in #12188.
It fixes various bugs in lnx in addition to adding FI_MSG and FI_RMA support. Opening this up for CI testing and initial comments, but it is not finalized. There are still some lingering holes (for example, supporting FI_MR_VIRT_ADDR properly).

@aingerson
Contributor Author

@amirshehataornl @jfillers FYI this is what I have so far

aingerson added 13 commits May 7, 2026 08:02
This patch doesn't include any functional changes.
Just cleans up the code in various ways:

- rename lnx_ops.c to lnx_srx.c to include only shared receive code
- move tag ops to lnx_msg.c to prepare for adding more functionality
  while not having lnx_ops.c get too out of hand
- reduce extern functions declared in lnx.h/add static to appropriate
  functions
- remove unused functions and definitions
- add missing definition of LNX_SUB_ID_BITS instead of hardcoded value
- standardize function declaration formatting
- fix formatting (line length, alignment, trailing whitespace)
- remove a few unnecessary comments
- cleanup unnecessary headers

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
…cess

Consolidate environment variables into a single struct that is initialized
in one place on getinfo. This helps the code organization and makes the
environment variables easily findable.

This also removes redundant environment variable lookups (like the lookup
happening on every single fi_av_insert).

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
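
As a rough illustration of the "read once, cache in a struct" pattern this commit describes, here is a minimal sketch. It is not the lnx code: the struct, its fields, and the `EXAMPLE_*` variable names are all hypothetical, and it uses plain `getenv()` to stay self-contained, whereas a libfabric provider would normally go through the `fi_param_define`/`fi_param_get` interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings; the real lnx variables are not listed in the
 * commit message, so these names are assumptions for illustration. */
struct example_env {
	bool initialized;
	bool disable_shm;
	int max_links;
};

static struct example_env ex_env = { .max_links = 1 };

/* Read every variable exactly once. Later callers get the cached struct
 * instead of repeating the lookup on a hot path such as address insertion. */
static const struct example_env *example_env_init(void)
{
	const char *val;

	if (ex_env.initialized)
		return &ex_env;

	val = getenv("EXAMPLE_DISABLE_SHM");
	if (val)
		ex_env.disable_shm = (strcmp(val, "1") == 0);

	val = getenv("EXAMPLE_MAX_LINKS");
	if (val && atoi(val) > 0)
		ex_env.max_links = atoi(val);

	/* Defaults and parsed values must leave the struct usable. */
	assert(ex_env.max_links > 0);

	ex_env.initialized = true;
	return &ex_env;
}
```

With this shape, the getinfo path would call `example_env_init()` once and everything else would read the returned struct, so changing the environment after initialization has no effect.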
fi_mr_test requires FI_RMA which lnx does not support yet

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
LNX has limitations in capabilities (for example, it does not support
FI_RMA). Even if the linked providers all support a requested
capability, it does not mean the lnx provider can be used.
We need to check against the supported lnx capabilities and properly
return -FI_ENODATA if the application requested something the provider
does not support. This adds a call to ofi_check_info and updates the
lnx capabilities to the correct subset of supported capabilities for
validation. It also modifies the shared tx/rx ctx attributes, since
lnx does not support them, and the mr mode, since lnx does not
require FI_MR_RAW.

In addition to the improper checking, lnx was not setting the
info->caps, tx_attr->caps, and rx_attr->caps returned to the application
and was always returning 0 for all capabilities. This also adds checking
of linked provider capabilities during generation to properly set the
capabilities returned to the application.

Request FI_PEER and FI_AV_USER_ID for linked providers. A provider must
support the peer API to be linked using the lnx provider.

Switch lnx_generate_link_info params to match convention of (input, output).

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
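
The two checks described in this commit (reject what the link layer cannot do, then report only what the linked providers also offer) can be sketched with plain bitmasks. This is an illustration under assumptions, not the lnx implementation: the `EX_CAP_*` bits and `EX_ENODATA` stand in for the real `FI_*` flags and `FI_ENODATA`, the supported set is made up, and intersecting across all core providers is one plausible policy rather than a confirmed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bits; real code would use the FI_* capability flags. */
#define EX_CAP_TAGGED (1ULL << 0)
#define EX_CAP_MSG    (1ULL << 1)
#define EX_CAP_RMA    (1ULL << 2)
#define EX_CAP_ATOMIC (1ULL << 3)

#define EX_ENODATA 61 /* stand-in for FI_ENODATA */

/* What the link layer itself implements (assumed set for the example). */
static const uint64_t ex_link_caps = EX_CAP_TAGGED | EX_CAP_MSG | EX_CAP_RMA;

static int ex_check_caps(uint64_t requested, const uint64_t *core_caps,
			 size_t num_core, uint64_t *granted)
{
	uint64_t usable = ex_link_caps;
	size_t i;

	assert(granted != NULL);

	/* Reject anything the link layer cannot do, regardless of what
	 * the core providers underneath offer. */
	if (requested & ~ex_link_caps)
		return -EX_ENODATA;

	/* Only capabilities shared by every linked provider are usable. */
	for (i = 0; i < num_core; i++)
		usable &= core_caps[i];

	if (requested & ~usable)
		return -EX_ENODATA;

	/* Report real capabilities instead of returning 0. */
	*granted = usable;
	return 0;
}
```

The first check is the one that matters for the bug described above: a request for a capability outside `ex_link_caps` fails even when every core provider supports it.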
Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
The current method for registering MRs with the core providers
works if there is only one domain or if the domains can somehow
use each other's keys. However, the keys for the domains could be
different and, since only one core mr fid is stored, lnx will always
use the mr fid from the first domain it was used on.

Change the core mr fids into an array so we can register on every
domain. The ep/domain will contain the index so we can make sure
to register on the correct domain and return the correct core fid.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
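
To make the "array of core MRs indexed by domain" idea concrete, here is a small sketch. All names are hypothetical and the registration is faked with made-up key values; the real code would hold core provider `fid_mr` pointers and register through each core domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_DOMAINS 4

/* Stand-in for a core provider MR fid. */
struct ex_core_mr {
	size_t domain_idx;
	uint64_t key;
};

/* One link-level MR holds one core MR per domain, not a single fid. */
struct ex_link_mr {
	struct ex_core_mr core[EX_MAX_DOMAINS];
	bool registered[EX_MAX_DOMAINS];
};

/* Register on every domain up front; keys may differ per domain. */
static void ex_mr_reg_all(struct ex_link_mr *mr, size_t num_domains)
{
	size_t i;

	assert(num_domains <= EX_MAX_DOMAINS);
	for (i = 0; i < num_domains; i++) {
		mr->core[i].domain_idx = i;
		mr->core[i].key = 0x1000 + i; /* fake per-domain key */
		mr->registered[i] = true;
	}
}

/* The ep/domain carries its index, so a lookup returns the MR that
 * belongs to that domain rather than whichever was registered first. */
static struct ex_core_mr *ex_mr_for_domain(struct ex_link_mr *mr,
					   size_t domain_idx)
{
	if (domain_idx >= EX_MAX_DOMAINS || !mr->registered[domain_idx])
		return NULL;
	return &mr->core[domain_idx];
}
```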
lnx was just taking the first iov/descriptor but advertising
support for multiple IOVs.
Support for multiple IOVs requires translating the array of descriptors
into an array of core provider descriptors.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
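
The translation step described here can be sketched as a loop over the descriptor array. This assumes, for illustration only, that each application descriptor points at a link-level structure holding one core descriptor per domain; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_DOMAINS 4

/* Hypothetical link-level descriptor: one core descriptor per domain. */
struct ex_mr_desc {
	void *core_desc[EX_MAX_DOMAINS];
};

/* Translate every entry, not just the first, so an operation with
 * several IOVs hands the core provider a matching descriptor array. */
static void ex_translate_descs(void **app_desc, size_t count,
			       size_t domain_idx, void **core_desc)
{
	size_t i;

	assert(domain_idx < EX_MAX_DOMAINS);
	for (i = 0; i < count; i++) {
		/* A missing array or entry translates to a NULL descriptor. */
		struct ex_mr_desc *mr = app_desc ? app_desc[i] : NULL;

		core_desc[i] = mr ? mr->core_desc[domain_idx] : NULL;
	}
}
```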
We shouldn't be relying on global resources. There's no reason
to have the entries come from different locations. We can just
use the lep receive bufpool. We also don't need a separate
lock for accessing the bufpool; we can just use the util_ep lock,
which has the bonus of being able to be optimized out when not
necessary.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Add support for the FI_RMA APIs. This is done by requiring
FI_MR_RAW if FI_RMA support is requested. The keys for all
underlying core providers are stored in an array (accessed
by domain index), so the application key is 8 * num_domains
bytes (thus requiring the larger key).
The app will exchange the raw key and then map it on the
remote side to get a local uint64_t key for use in the RMA
calls. This key will be a pointer to an internal structure
(lnx_mr_key) which will hold all the core provider keys for
use in the actual RMA calls.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
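
The key flow in this commit (pack one 8-byte core key per domain into a raw key, exchange it, map it to a local 64-bit handle that is really a pointer) can be sketched as follows. The struct layout and all `ex_*` names are assumptions; only the name `lnx_mr_key` and the 8 * num_domains sizing come from the commit message. In an application, the corresponding libfabric calls would be `fi_mr_raw_attr`, `fi_mr_map_raw`, and `fi_mr_unmap_key`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Loosely modeled on the lnx_mr_key mentioned above; layout is assumed. */
struct ex_mr_key {
	size_t num_domains;
	uint64_t core_key[]; /* one core provider key per domain */
};

/* Raw key size: 8 bytes per domain. */
static size_t ex_raw_key_size(size_t num_domains)
{
	return sizeof(uint64_t) * num_domains;
}

/* Owner side: serialize the per-domain keys for exchange. */
static void ex_pack_raw_key(const uint64_t *core_keys, size_t num_domains,
			    uint8_t *raw)
{
	memcpy(raw, core_keys, ex_raw_key_size(num_domains));
}

/* Remote side: turn the raw key into a local 64-bit handle. The handle
 * is a pointer to a structure holding every core key. */
static int ex_map_raw_key(const uint8_t *raw, size_t num_domains,
			  uint64_t *key)
{
	struct ex_mr_key *k;

	k = malloc(sizeof(*k) + ex_raw_key_size(num_domains));
	if (!k)
		return -1;

	k->num_domains = num_domains;
	memcpy(k->core_key, raw, ex_raw_key_size(num_domains));
	*key = (uint64_t) (uintptr_t) k;
	return 0;
}

/* RMA path: pick the core key for the domain carrying the operation. */
static uint64_t ex_core_key(uint64_t key, size_t domain_idx)
{
	const struct ex_mr_key *k = (const struct ex_mr_key *) (uintptr_t) key;

	assert(domain_idx < k->num_domains);
	return k->core_key[domain_idx];
}

static void ex_unmap_key(uint64_t key)
{
	free((void *) (uintptr_t) key);
}
```

One consequence of the pointer-as-key design is that the mapped key is only meaningful in the process that mapped it and must be released with the matching unmap call.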
Add FI_MSG support by using (existing) regular message queues.

Consolidates and refactors some code to be used in both sets of
functions.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Ubertest was sometimes skipping the MR raw attr/map steps for FI_MR_RAW,
causing a map failure with providers that required the mapping.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
This lets the fi_av_xfer test pass, which was failing on reinsert
because the buffer was already allocated and could not be allocated
for the reinsert.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Remove exclusions for tests now valid with addition of FI_MSG and FI_RMA.
Add fi_ubertest configurations for new functionality.

Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>