Skip to content

prov/efa: Refactor AV to split it for efa/efa-direct#12222

Open
a-szegel wants to merge 16 commits intoofiwg:mainfrom
a-szegel:refactor-av-component
Open

prov/efa: Refactor AV to split it for efa/efa-direct#12222
a-szegel wants to merge 16 commits intoofiwg:mainfrom
a-szegel:refactor-av-component

Conversation

@a-szegel
Copy link
Copy Markdown
Contributor

@a-szegel a-szegel commented May 6, 2026

Refactor EFA AV layer into base and protocol components

What

Splits the EFA provider's AV layer into base (efa_av) and protocol (efa_proto_av) components, mirroring the existing pattern used elsewhere in the provider. Does the same split for efa_ah. Deletes efa_conn.{c,h}.

Why

Three interrelated design decisions drive this refactor:

  1. Split AV into base + protocol. struct efa_av carried implicit-AV fields (LRU list, implicit util_av, evicted_peers hashset) that efa-direct never uses. struct efa_av_entry embedded struct efa_conn, which bundled RDM-specific concerns (AH lifetime, peer map, SHM insertion, implicit-to-explicit migration). After this refactor, the efa-direct data path uses minimal base structs; all RDM-specific state lives in efa_proto_* wrappers that container_of-embed the base types.

  2. Delete efa_conn entirely. Once struct efa_proto_av_entry exists as the RDM-side AV entry, struct efa_conn is redundant — it was always just"the RDM-specific tail" of struct efa_av_entry. Keeping both would leave the codebase with two types representing the same thing. The cleaner resolution is to absorb efa_conn's responsibilities into efa_proto_av_entry (for protocol behaviors) and efa_av_entry (for base behaviors), then remove efa_conn.c/h outright. struct efa_rdm_peer's backing pointer is retyped from efa_conn * to efa_proto_av_entry * in the same change.

  3. Split AH along the same base/protocol lines. struct efa_ah carried protocol-only state: split implicit/explicit refcounts, an
    implicit_conn_list, and an LRU-list entry for AH eviction. None of this matters on the efa-direct path. Worse, efa_ah.c iterated struct efa_proto_av_entry for its eviction logic — a layering violation where the base AH translation unit had a hard dependency on protocol-layer types. The fix is to shrink struct efa_ah to a single unified refcount and wrap it in a new struct efa_proto_ah that holds all the protocol concerns, then move the eviction logic out of efa_ah.c into efa_proto_av.c.

Testing

Every commit individually validated with --enable-debug --enable-asan --enable-efa-mem-poisoning plus the cmocka unit-test suite. 13 new unit tests added covering insert/lookup/remove cycles, reverse-AV behaviors, implicit-to-explicit migration, QPN reuse, and error paths.

a-szegel added 16 commits May 6, 2026 17:49
Add a shared helper that checks whether an efa_ep_addr has a non-zero
GID. This validation is needed by both the base AV insert path
(efa_av.c) and the protocol AV insert path (efa_proto_av.c) to reject
all-zero addresses. Placing it in efa_av.h alongside the efa_ep_addr
definition avoids duplicating the check in both files.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add rdm/efa_proto_av.h containing the struct definitions for the
protocol AV layer. The header defines efa_proto_av_entry as a flat
layout AV entry with ep_addr, ah, fi_addr, implicit_fi_addr, peer map,
shm_fi_addr, implicit-AV LRU linkage, AH implicit-conn linkage, and a
back-pointer to the owning AV. efa_proto_av embeds struct efa_av as
its first member so a base-layer efa_av pointer can be recovered via
container_of. efa_proto_av_entry_ep_peer_map_entry describes a single
(endpoint, peer) mapping stored on each entry's per-entry peer hash.

Header-only commit — no implementation is added here. The structs are
referenced by the efa_proto_av.c implementation added in a subsequent
commit, and the protocol-AV-aware peer switch several commits later
changes efa_rdm_peer's backing object from struct efa_conn * to
struct efa_proto_av_entry *. Introducing the types ahead of their
first use lets each later commit focus on a single concern.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_av_init_util_av currently hard-codes util_attr.context_len to
sizeof(struct efa_av_entry) - EFA_EP_ADDR_LEN. Add a context_len
parameter and update the sole caller to pass this value explicitly.

This prepares for efa_proto_av_open (added in a later commit) to
call efa_av_init_util_av with a different context_len reflecting
struct efa_proto_av_entry's size.

No behavior change.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Introduce two base-layer helpers on struct efa_av_entry so callers no
longer need to reach through the embedded struct efa_conn to get at
the entry's identity or its ep_addr byte array.

efa_av_addr_to_entry wraps the existing efa_av_addr_to_conn with a
container_of so it returns the enclosing efa_av_entry directly.
efa_av_entry_ep_addr returns a typed struct efa_ep_addr * view over
the ep_addr[] byte array stored at the front of the entry. Both are
inline. Callers in the base path can use these helpers today; more
importantly, when the protocol-AV variant struct efa_proto_av_entry
is added later it will have the same leading byte array but no conn
pointer, so the byte-array accessor makes it possible to share
reverse-AV helpers between the two entry types without a type shim.

This commit also promotes efa_av_init_util_av from file-local to a
public declaration in efa_av.h so the protocol AV implementation,
added in a subsequent commit, can call it.

Purely additive — no existing callers are migrated in this commit.
Subsequent commits update base-path callers to use the new helpers.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Update efa_msg.c, efa_rma.c, efa_base_ep.c, and efa_domain.c to use
the new efa_av_addr_to_entry() / efa_av_entry_ep_addr() helpers
instead of efa_av_addr_to_conn() / conn->ep_addr.

Reads are still routed through the embedded conn for ah
(av_entry->conn.ah) because struct efa_conn is still embedded in
efa_av_entry. A later commit strips the conn embedding and collapses
these accesses to av_entry->ah directly.

Mechanical rename; no behavior change.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
…av_entry *

The reverse-AV helpers previously took struct efa_conn *. Change them
to take struct efa_av_entry * so they can be shared across the base
(efa-direct) and protocol (RDM, added in a later commit) layers
without a type shim.

Update internal field reads to go through the byte-array accessor
efa_av_entry_ep_addr() rather than conn->ep_addr pointer deref.
The ep_addr pointer and the byte array reference the same storage,
so behavior is unchanged — but efa_av_entry_ep_addr() also works
for struct efa_proto_av_entry (the protocol's flat layout variant
added later), which lacks an ep_addr pointer field.

Store struct efa_av_entry * in the efa_cur_reverse_av /
efa_prv_reverse_av structs instead of struct efa_conn *.

Update callers in efa_av.c and efa_conn.c using container_of() to
pass the enclosing efa_av_entry when they only have a conn pointer.

No behavior change.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add the full efa_proto_av.c implementation as a single unit. Protocol
AV concentrates all RDM-specific AV concerns in one file: implicit
AV insert and release, implicit-to-explicit migration, per-endpoint
peer map, implicit-AV LRU tracking, SHM AV insertion, connid-aware
reverse lookup, and peer allocation tied to entry lifetime. Moving
these responsibilities out of efa_av.c / efa_conn.c is what lets the
base AV shrink to an efa-direct-only layer later in this series.

This commit drops the file in but deliberately leaves it unwired.
fi_ops_domain_rdm.av_open still points at efa_av_open, so nothing in
the RDM domain path reaches any code in efa_proto_av.c yet. Functions
are given external linkage so the library still links. A later commit
registers efa_proto_av_open as the RDM domain's av_open, switches
struct efa_rdm_peer's backing object from struct efa_conn to struct
efa_proto_av_entry, and updates all RDM callers in one atomic step.

One interim-state detail: the newly added pke peer-fi_addr callback
reads peer->conn->fi_addr because struct efa_rdm_peer still carries a
struct efa_conn * at this point. When the peer switch lands, that
single expression becomes peer->av_entry->fi_addr along with the rest
of the peer-type change.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_rdm_ep_destroy_buffer_pools tests whether a peer's explicit fi_addr
is populated; if not, falls through to the implicit AV. The test was
written with FI_ADDR_UNSPEC, which is semantically "unspecified peer
address" (used by e.g. fi_recvfrom for any source) rather than
"this slot has no valid fi_addr".

FI_ADDR_UNSPEC and FI_ADDR_NOTAVAIL are both defined as (uint64_t)-1
in fabric.h, so the runtime behavior is identical. This is a naming
correction, not a behavior change.

Every other site in the provider that tests the same
peer->conn->fi_addr / ->implicit_fi_addr condition uses
FI_ADDR_NOTAVAIL (e.g. efa_rdm_cq.c, efa_proto_av.c, efa_av.c).
Bring this site in line for consistency.

No behavior change.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
…o_av_entry

This is the atomic switch that moves the RDM protocol layer off of
struct efa_conn and onto struct efa_proto_av_entry. Because the peer's
backing object changes type and every caller that reaches through a
peer pointer must be updated in lockstep, splitting this work across
multiple commits would leave the tree uncompilable between them.

At the domain level, fi_ops_domain_rdm.av_open is set to
efa_proto_av_open so RDM domains instantiate an efa_proto_av instead
of the base efa_av. A proto_av field is added to struct efa_rdm_ep
and populated in efa_rdm_ep_bind via container_of(av, struct
efa_proto_av, efa_av); caching it on the endpoint avoids repeating
the container_of on the hot path.

struct efa_rdm_peer's backing object changes from struct efa_conn *
to struct efa_proto_av_entry *. The field is renamed from conn to
av_entry to reflect the new type, and efa_rdm_peer_construct's
signature is updated to take the new pointer. All RDM sources that
dereferenced peer->conn are updated to peer->av_entry: efa_rdm_cq.c,
efa_rdm_msg.c, efa_rdm_ope.c, efa_rdm_pke*.c, efa_rdm_util.c,
efa_domain.c, and efa_rdm_ep_utils.c. The efa_rdm_ep_get_peer_*
helpers in efa_rdm_ep_utils.c are migrated from efa_av_addr_to_conn /
conn-based peer-map lookup to efa_proto_av_addr_to_entry and
efa_proto_av_entry_ep_peer_map_lookup.

efa_rdm_ep_get_explicit_shm_fi_addr moves out of efa_rdm_ep.h into
efa_rdm_ep_utils.c. Keeping the accessor inline in the header would
force efa_rdm_ep.h to include efa_proto_av.h, and efa_proto_av.h
depends transitively on efa_rdm_peer.h; an out-of-line definition
keeps the include graph acyclic.

The lone pke callback introduced with efa_proto_av.c that still read
peer->conn->fi_addr is flipped to peer->av_entry->fi_addr as part of
this commit — it is the single expression that had to wait for the
peer-type switch to land. efa_rdm_ep_destroy_buffer_pools similarly
migrates to the efa_proto_av_entry peer-map entry type.

efa_ah_implicit_av_evict_ah in efa_ah.c is updated to iterate
efa_proto_av_entry on the AH LRU list and call
efa_proto_av_entry_release_ah_unsafe. This function is relocated to
efa_proto_av.c entirely in the subsequent AH-layering commit; the
update here keeps the intermediate state functional.

Test mocks and the efa AV test file are updated for the new peer
field name, the new efa_rdm_peer_construct signature, and the new
peer-map entry type. test_av_verify_av_hash_cnt gains a struct
efa_proto_av * parameter so it can reach both the base efa_av hash
counters and the protocol-only implicit-AV fields through a single
call.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_ah_alloc acquires util_domain.lock before looking up the AH in the
domain's hash map. If the subsequent malloc for a new efa_ah fails, the
function sets errno to FI_ENOMEM, logs a warning, and returns NULL
without releasing the lock. The caller has no way to know the lock is
still held, so every subsequent operation that touches util_domain.lock
(including the caller's own cleanup path and any other thread trying to
allocate an AH) deadlocks or UBs.

Release util_domain.lock on the malloc-failure path before returning
NULL, matching the two other failure paths in the function that do the
unlock via err_free_efa_ah.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_ah.c reached into protocol-layer types to run AH eviction on
ENOMEM: it iterated an AH's implicit_conn_list as struct
efa_proto_av_entry and called efa_proto_av_entry_release_ah_unsafe to
drop the associated AV entries. Those types are defined in
rdm/efa_proto_av.h, which efa_ah.c picks up transitively through
efa.h; the base AH translation unit should not be aware of them.
The circularity shows up in practice whenever the protocol layer
grows a new concern — every change to efa_proto_av_entry forces a
recompile of efa_ah.c and risks dragging protocol-only behavior into
the efa-direct build.

The fix splits AH allocation into a base path and a protocol path.
efa_ah_alloc stays in efa_ah.c but returns NULL on ENOMEM without
attempting eviction; its only responsibilities are now creating the
ibv_ah, populating gid/ahn/refcnt, and inserting into the domain's
AH hash. The pre-existing EINVAL diagnostic (pretty-printing local
and remote GIDs plus the PD pointer for the common cross-AZ /
invalid-GID / invalid-PD causes) is preserved in the base path. A
new efa_proto_ah_alloc is added in efa_proto_av.c. It calls
efa_ah_alloc and, on ENOMEM, invokes efa_proto_ah_evict to release
the AH at the head of the LRU whose explicit_refcnt is zero, then
retries. The eviction logic itself is lifted wholesale from efa_ah.c
into efa_proto_av.c, where it belongs alongside the other
implicit-AV bookkeeping.

efa_ah_destroy_ah is promoted from a file-local function prototype to
a public declaration in efa_ah.h so efa_proto_ah_evict can call it
after dropping the last reference. efa_ah.c no longer references
efa_proto_av_entry.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
The efa_ah struct contained protocol-specific fields (implicit_refcnt,
explicit_refcnt, implicit_conn_list, domain_lru_ah_list_entry) that
are only used by the RDM protocol path for implicit AV management and
AH LRU eviction. The efa-direct path only needs the ibv_ah, GID, AHN,
and a reference count. This commit separates them following the same
pattern used for efa_av / efa_proto_av.

The base efa_ah struct is stripped to: gid, ibv_ah, ahn, refcnt, and
the hash handle. The split implicit/explicit refcounts are replaced
with a single refcnt. efa_ah_alloc now takes an alloc_size parameter
instead of bool insert_implicit_av, allowing the protocol layer to
allocate a larger efa_proto_ah. efa_ah_release no longer takes a bool
parameter — it simply decrements refcnt and destroys at zero. The
base layer has no knowledge of LRU lists, implicit AV entries, or
eviction.

A new efa_proto_ah struct is added to efa_proto_av.h, embedding
efa_ah as its first member (castable via container_of). It adds the
protocol-specific fields: implicit_refcnt, explicit_refcnt,
implicit_conn_list (list of implicit AV entries using this AH), and
lru_list_entry (position in the domain's AH LRU list).

efa_proto_ah_alloc in efa_proto_av.c allocates sizeof(efa_proto_ah)
via efa_ah_alloc, initializes the protocol fields for new AHs,
manages the implicit/explicit refcount split, maintains the LRU list,
and handles ENOMEM with eviction retry. efa_proto_ah_release
decrements the appropriate protocol refcount and removes from the LRU
list when both reach zero before calling efa_ah_release to decrement
the base refcount.

The efa_proto_ah_lru_move function (formerly efa_ah_implicit_av_lru_ah_move
in efa_ah.c) is now static in efa_proto_av.c, operating on the
efa_proto_ah lru_list_entry. The efa_proto_ah_from_ah inline accessor
is added to efa_proto_av.h for converting base efa_ah pointers to
their protocol wrapper.

Update efa_conn.c and efa_av.c to access the moved fields
(implicit_refcnt, explicit_refcnt, implicit_conn_list) through
efa_proto_ah_from_ah(). These files contain the old implicit-AV code
paths that remain alive until the next commit strips efa_av and
deletes efa_conn. Update efa_ah_alloc/release callers in these files
for the new signatures.

Split the efa_ah_cnt_av_impl test helper into two path-specific
variants: efa_ah_cnt_av_efa_impl uses efa_proto_ah_from_ah() to
check the protocol refcnts (implicit/explicit), while
efa_ah_cnt_av_efa_direct_impl checks the base efa_ah refcnt
directly. The combined helper would have dereferenced efa_proto_ah
fields past the end of a base-sized efa_ah allocation on the
efa-direct path; splitting the helpers removes the runtime
branching and the out-of-bounds access.

Test mocks are updated for the new efa_ah_alloc (size_t alloc_size)
and efa_ah_release (no bool) signatures.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
With every protocol-specific AV concern now living in efa_proto_av.c,
remove the remnants from the base AV layer. efa_av.h and efa_av.c are
reduced to what the efa-direct path actually needs: the efa_av struct
now contains only domain, used, type, cur_reverse_av, prv_reverse_av,
and util_av. All implicit-AV fields, LRU tracking, peer-map logic,
SHM AV handling, and implicit reverse-lookup functions are gone from
this file.

struct efa_av_entry becomes a flat 48-byte record of ep_addr[32], a
struct efa_ah *ah, and fi_addr_t fi_addr. struct efa_conn is no
longer embedded, and the base path's callers switch from
av_entry->conn.ah to av_entry->ah directly. The _Static_assert that
previously tied efa_av_entry.conn.ah and efa_proto_av_entry.ah to a
common offset is updated to reference efa_av_entry.ah directly.

Two new static helpers efa_av_entry_init and efa_av_entry_release
replace the old efa_conn_alloc / efa_conn_release paths for the
base-only lifecycle. efa_av_insert_one and efa_av_insert lose their
external linkage because they are only called from within efa_av.c
now (RDM uses efa_proto_av_insert_one).

Drop efa_conn.c and efa_conn.h from prov/efa/Makefile.include and
libfabric.vcxproj. The files themselves are left on disk and deleted
in a follow-up commit so this commit is strictly a functionality
change and the next one is a pure file deletion. Leaving the files
out of the build list means their compile errors against the new
efa_av / efa_av_entry shapes are harmless — nothing includes or
compiles them from this commit forward.

Update efa_ah.c to drop the now-unused #include "efa_conn.h". Update
the efa_unit_test_proto_av.c test file to call efa_proto_av_reverse_lookup
in place of the deleted efa_av_reverse_lookup_rdm.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
The previous commit removed the last references to efa_conn from the
tree (both as source and as a build target). The files are no longer
compiled or included by anything. Delete them.

Pure file deletion — no other changes.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Split efa_unit_test_av.c into two files:
- efa_unit_test_av.c: base AV tests (efa-direct, AH counting, EP binding)
- efa_unit_test_proto_av.c: protocol AV tests (implicit AV, LRU, peer map,
  reverse lookup, implicit-to-explicit migration, AH eviction)

The new efa_unit_test_proto_av.c also contains the body of
test_av_reverse_av_remove_qpn_collision, whose declaration and
registration were added earlier but whose implementation had not
been defined. Co-locating it with the other protocol AV tests
keeps declaration, registration, and definition in sync.

Update prov/efa/Makefile.include to build the new file.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add thirteen new unit tests that exercise the refactored AV code
paths. The existing suite focused mostly on AH refcount and
endpoint-binding behavior; these tests add direct coverage of entry
insert/remove/lookup, protocol-AV-only behaviors, and a handful of
error paths that previously had no assertions.

Base AV coverage adds test_av_insert_remove_lookup_efa_direct, which
walks a full insert/lookup/remove cycle against the base efa_av, and
test_av_base_addr_to_entry_invalid, which confirms that
efa_av_addr_to_entry returns NULL for FI_ADDR_NOTAVAIL and
FI_ADDR_UNSPEC inputs.

Protocol AV coverage exercises lookup and lifetime of
efa_proto_av_entry: test_av_proto_reverse_lookup_explicit covers
reverse lookup after insert and remove, test_av_proto_prv_reverse_av
covers QPN reuse falling through to prv_reverse_av,
test_av_proto_addr_to_entry_after_remove checks the validity bit
(ah != NULL) after a remove, test_av_proto_insert_remove_with_peer
drives the per-endpoint peer-map lifecycle alongside insert and
remove, test_av_proto_batch_insert covers a multi-address
fi_av_insert, and test_av_proto_remove_nonexistent covers the error
path for a fi_addr that was never inserted.

Implicit-AV behavior adds test_av_implicit_to_explicit_peer_updated,
which verifies that migrating a peer from the implicit AV to the
explicit AV preserves peer and AH identity;
test_av_implicit_av_unbounded, which confirms the implicit AV grows
without an explicit cap; and test_av_implicit_av_lru_move_single,
which exercises LRU move when only a single implicit entry exists.

Input validation adds test_av_proto_insert_invalid_address for
inserts of all-zero endpoint addresses and
test_av_proto_open_unsupported_attrs for fi_av_open with attributes
the protocol AV does not accept.

Signed-off-by: Seth Zegelstein <szegel@amazon.com>
@a-szegel a-szegel requested a review from a team May 7, 2026 03:06
Comment thread prov/efa/src/efa_av.h Outdated
* @param[in] addr address to check
* @return non-zero if valid, 0 if all-zeros
*/
static inline int efa_av_is_valid_address(struct efa_ep_addr *addr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message says "add" but it's really being moved

* ep_addr[32] off=0 — TX hot (qpn@+16, qkey@+20)
* ah* off=32 — TX hot (EFA send path)
* fi_addr off=40 — RX hot (explicit peer lookup, CQ poll)
* implicit_fi_addr off=48 — RX hot (implicit peer lookup, CQ poll)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicit AV entries are not considered hot path. In the hot path, the application inserts the peer into the AV

I think switching implicit_fi_addr and shm_fi_addr would be better

* efa_av.prv_reverse_av (off=32) — HASH_FIND fallback for QPN reuse (connid mismatch)
* These are in cacheline 0 — explicit peer reverse lookup stays in one line.
*
* RX hot path for implicit (unknown) peers:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Let's not call the implicit peer path "hot"

/**
* @brief Protocol AV entry — flat layout with same field prefix as efa_av_entry
*
* pahole:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The layout is nice to have in the comments but we must make sure the comment is up to date

return NULL;

entry = (struct efa_proto_av_entry *)util_av_entry->data;
return entry->ah ? entry : NULL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be assert(entry->ah)? Based on the util_av removal code below, you set entry->ah to NULL after calling ofi_av_remove_addr which calls ofi_ibuf_free. So we shouldn't ever have the case where ofi_bufpool_ibuf_is_valid returns true but ah is NULL

#include "efa_data_path_ops.h"
#include "ofi_util.h"
#include "efa_av.h"
#include "rdm/efa_proto_av.h"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rdm/ prefix shouldn't be necessary because this file is also in rdm/ folder. Also happens in multiple other files.

* ah_implicit_conn_list_entry off=88 — implicit AV insert/release
* av* off=104 — back-pointer for AH eviction
*/
struct efa_proto_av_entry {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use the prefix efa_proto? Both the MR and counter refactors used efa_rdm as the prefix 8dd874e 251031c

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdm stands for reliable datagram, and all of efa is reliable datagram.... Its a terrible name, and I think that a pure rename change %s/rdm/proto won't go so well... so I want to do it in bits and pieces.

Copy link
Copy Markdown
Contributor

@shijin-aws shijin-aws May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will go well if we do s/efa_rdm/efa_proto finally, AFAIU we don't have such combo beyond the code


struct efa_rdm_ep {
struct efa_base_ep base_ep;
struct efa_proto_av *proto_av; /* set during fi_ep_bind, avoids container_of on hot path */
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment about prefix. rdm_av would be consistent with the previous refactors.

* The protocol extension fields start at offset 88 (cacheline 1), so
* accessing them on the eviction path does not pollute the TX cache line.
*/
struct efa_proto_ah {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment about prefix. efa_rdm_ah would be better.

Comment thread prov/efa/test/efa_unit_test_av.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants