prov/efa: Refactor AV to split it for efa/efa-direct#12222
prov/efa: Refactor AV to split it for efa/efa-direct#12222a-szegel wants to merge 16 commits intoofiwg:mainfrom
Conversation
Add a shared helper that checks whether an efa_ep_addr has a non-zero GID. This validation is needed by both the base AV insert path (efa_av.c) and the protocol AV insert path (efa_proto_av.c) to reject all-zero addresses. Placing it in efa_av.h alongside the efa_ep_addr definition avoids duplicating the check in both files. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add rdm/efa_proto_av.h containing the struct definitions for the protocol AV layer. The header defines efa_proto_av_entry as a flat layout AV entry with ep_addr, ah, fi_addr, implicit_fi_addr, peer map, shm_fi_addr, implicit-AV LRU linkage, AH implicit-conn linkage, and a back-pointer to the owning AV. efa_proto_av embeds struct efa_av as its first member so a base-layer efa_av pointer can be recovered via container_of. efa_proto_av_entry_ep_peer_map_entry describes a single (endpoint, peer) mapping stored on each entry's per-entry peer hash. Header-only commit — no implementation is added here. The structs are referenced by the efa_proto_av.c implementation added in a subsequent commit, and the protocol-AV-aware peer switch several commits later changes efa_rdm_peer's backing object from struct efa_conn * to struct efa_proto_av_entry *. Introducing the types ahead of their first use lets each later commit focus on a single concern. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_av_init_util_av currently hard-codes util_attr.context_len to sizeof(struct efa_av_entry) - EFA_EP_ADDR_LEN. Add a context_len parameter and update the sole caller to pass this value explicitly. This prepares for efa_proto_av_open (added in a later commit) to call efa_av_init_util_av with a different context_len reflecting struct efa_proto_av_entry's size. No behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Introduce two base-layer helpers on struct efa_av_entry so callers no longer need to reach through the embedded struct efa_conn to get at the entry's identity or its ep_addr byte array. efa_av_addr_to_entry wraps the existing efa_av_addr_to_conn with a container_of so it returns the enclosing efa_av_entry directly. efa_av_entry_ep_addr returns a typed struct efa_ep_addr * view over the ep_addr[] byte array stored at the front of the entry. Both are inline. Callers in the base path can use these helpers today; more importantly, when the protocol-AV variant struct efa_proto_av_entry is added later it will have the same leading byte array but no conn pointer, so the byte-array accessor makes it possible to share reverse-AV helpers between the two entry types without a type shim. This commit also promotes efa_av_init_util_av from file-local to a public declaration in efa_av.h so the protocol AV implementation, added in a subsequent commit, can call it. Purely additive — no existing callers are migrated in this commit. Subsequent commits update base-path callers to use the new helpers. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Update efa_msg.c, efa_rma.c, efa_base_ep.c, and efa_domain.c to use the new efa_av_addr_to_entry() / efa_av_entry_ep_addr() helpers instead of efa_av_addr_to_conn() / conn->ep_addr. Reads are still routed through the embedded conn for ah (av_entry->conn.ah) because struct efa_conn is still embedded in efa_av_entry. A later commit strips the conn embedding and collapses these accesses to av_entry->ah directly. Mechanical rename; no behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
…av_entry * The reverse-AV helpers previously took struct efa_conn *. Change them to take struct efa_av_entry * so they can be shared across the base (efa-direct) and protocol (RDM, added in a later commit) layers without a type shim. Update internal field reads to go through the byte-array accessor efa_av_entry_ep_addr() rather than conn->ep_addr pointer deref. The ep_addr pointer and the byte array reference the same storage, so behavior is unchanged — but efa_av_entry_ep_addr() also works for struct efa_proto_av_entry (the protocol's flat layout variant added later), which lacks an ep_addr pointer field. Store struct efa_av_entry * in the efa_cur_reverse_av / efa_prv_reverse_av structs instead of struct efa_conn *. Update callers in efa_av.c and efa_conn.c using container_of() to pass the enclosing efa_av_entry when they only have a conn pointer. No behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add the full efa_proto_av.c implementation as a single unit. Protocol AV concentrates all RDM-specific AV concerns in one file: implicit AV insert and release, implicit-to-explicit migration, per-endpoint peer map, implicit-AV LRU tracking, SHM AV insertion, connid-aware reverse lookup, and peer allocation tied to entry lifetime. Moving these responsibilities out of efa_av.c / efa_conn.c is what lets the base AV shrink to an efa-direct-only layer later in this series. This commit drops the file in but deliberately leaves it unwired. fi_ops_domain_rdm.av_open still points at efa_av_open, so nothing in the RDM domain path reaches any code in efa_proto_av.c yet. Functions are given external linkage so the library still links. A later commit registers efa_proto_av_open as the RDM domain's av_open, switches struct efa_rdm_peer's backing object from struct efa_conn to struct efa_proto_av_entry, and updates all RDM callers in one atomic step. One interim-state detail: the newly added pke peer-fi_addr callback reads peer->conn->fi_addr because struct efa_rdm_peer still carries a struct efa_conn * at this point. When the peer switch lands, that single expression becomes peer->av_entry->fi_addr along with the rest of the peer-type change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_rdm_ep_destroy_buffer_pools tests whether a peer's explicit fi_addr is populated; if not, falls through to the implicit AV. The test was written with FI_ADDR_UNSPEC, which is semantically "unspecified peer address" (used by e.g. fi_recvfrom for any source) rather than "this slot has no valid fi_addr". FI_ADDR_UNSPEC and FI_ADDR_NOTAVAIL are both defined as (uint64_t)-1 in fabric.h, so the runtime behavior is identical. This is a naming correction, not a behavior change. Every other site in the provider that tests the same peer->conn->fi_addr / ->implicit_fi_addr condition uses FI_ADDR_NOTAVAIL (e.g. efa_rdm_cq.c, efa_proto_av.c, efa_av.c). Bring this site in line for consistency. No behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
…o_av_entry This is the atomic switch that moves the RDM protocol layer off of struct efa_conn and onto struct efa_proto_av_entry. Because the peer's backing object changes type and every caller that reaches through a peer pointer must be updated in lockstep, splitting this work across multiple commits would leave the tree uncompilable between them. At the domain level, fi_ops_domain_rdm.av_open is set to efa_proto_av_open so RDM domains instantiate an efa_proto_av instead of the base efa_av. A proto_av field is added to struct efa_rdm_ep and populated in efa_rdm_ep_bind via container_of(av, struct efa_proto_av, efa_av); caching it on the endpoint avoids repeating the container_of on the hot path. struct efa_rdm_peer's backing object changes from struct efa_conn * to struct efa_proto_av_entry *. The field is renamed from conn to av_entry to reflect the new type, and efa_rdm_peer_construct's signature is updated to take the new pointer. All RDM sources that dereferenced peer->conn are updated to peer->av_entry: efa_rdm_cq.c, efa_rdm_msg.c, efa_rdm_ope.c, efa_rdm_pke*.c, efa_rdm_util.c, efa_domain.c, and efa_rdm_ep_utils.c. The efa_rdm_ep_get_peer_* helpers in efa_rdm_ep_utils.c are migrated from efa_av_addr_to_conn / conn-based peer-map lookup to efa_proto_av_addr_to_entry and efa_proto_av_entry_ep_peer_map_lookup. efa_rdm_ep_get_explicit_shm_fi_addr moves out of efa_rdm_ep.h into efa_rdm_ep_utils.c. Keeping the accessor inline in the header would force efa_rdm_ep.h to include efa_proto_av.h, and efa_proto_av.h depends transitively on efa_rdm_peer.h; an out-of-line definition keeps the include graph acyclic. The lone pke callback introduced with efa_proto_av.c that still read peer->conn->fi_addr is flipped to peer->av_entry->fi_addr as part of this commit — it is the single expression that had to wait for the peer-type switch to land. efa_rdm_ep_destroy_buffer_pools similarly migrates to the efa_proto_av_entry peer-map entry type. efa_ah_implicit_av_evict_ah in efa_ah.c is updated to iterate efa_proto_av_entry on the AH LRU list and call efa_proto_av_entry_release_ah_unsafe. This function is relocated to efa_proto_av.c entirely in the subsequent AH-layering commit; the update here keeps the intermediate state functional. Test mocks and the efa AV test file are updated for the new peer field name, the new efa_rdm_peer_construct signature, and the new peer-map entry type. test_av_verify_av_hash_cnt gains a struct efa_proto_av * parameter so it can reach both the base efa_av hash counters and the protocol-only implicit-AV fields through a single call. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_ah_alloc acquires util_domain.lock before looking up the AH in the domain's hash map. If the subsequent malloc for a new efa_ah fails, the function sets errno to FI_ENOMEM, logs a warning, and returns NULL without releasing the lock. The caller has no way to know the lock is still held, so every subsequent operation that touches util_domain.lock (including the caller's own cleanup path and any other thread trying to allocate an AH) deadlocks or UBs. Release util_domain.lock on the malloc-failure path before returning NULL, matching the two other failure paths in the function that do the unlock via err_free_efa_ah. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_ah.c reached into protocol-layer types to run AH eviction on ENOMEM: it iterated an AH's implicit_conn_list as struct efa_proto_av_entry and called efa_proto_av_entry_release_ah_unsafe to drop the associated AV entries. Those types are defined in rdm/efa_proto_av.h, which efa_ah.c picks up transitively through efa.h; the base AH translation unit should not be aware of them. The circularity shows up in practice whenever the protocol layer grows a new concern — every change to efa_proto_av_entry forces a recompile of efa_ah.c and risks dragging protocol-only behavior into the efa-direct build. The fix splits AH allocation into a base path and a protocol path. efa_ah_alloc stays in efa_ah.c but returns NULL on ENOMEM without attempting eviction; its only responsibilities are now creating the ibv_ah, populating gid/ahn/refcnt, and inserting into the domain's AH hash. The pre-existing EINVAL diagnostic (pretty-printing local and remote GIDs plus the PD pointer for the common cross-AZ / invalid-GID / invalid-PD causes) is preserved in the base path. A new efa_proto_ah_alloc is added in efa_proto_av.c. It calls efa_ah_alloc and, on ENOMEM, invokes efa_proto_ah_evict to release the AH at the head of the LRU whose explicit_refcnt is zero, then retries. The eviction logic itself is lifted wholesale from efa_ah.c into efa_proto_av.c, where it belongs alongside the other implicit-AV bookkeeping. efa_ah_destroy_ah is promoted from a file-local function prototype to a public declaration in efa_ah.h so efa_proto_ah_evict can call it after dropping the last reference. efa_ah.c no longer references efa_proto_av_entry. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
The efa_ah struct contained protocol-specific fields (implicit_refcnt, explicit_refcnt, implicit_conn_list, domain_lru_ah_list_entry) that are only used by the RDM protocol path for implicit AV management and AH LRU eviction. The efa-direct path only needs the ibv_ah, GID, AHN, and a reference count. This commit separates them following the same pattern used for efa_av / efa_proto_av. The base efa_ah struct is stripped to: gid, ibv_ah, ahn, refcnt, and the hash handle. The split implicit/explicit refcounts are replaced with a single refcnt. efa_ah_alloc now takes an alloc_size parameter instead of bool insert_implicit_av, allowing the protocol layer to allocate a larger efa_proto_ah. efa_ah_release no longer takes a bool parameter — it simply decrements refcnt and destroys at zero. The base layer has no knowledge of LRU lists, implicit AV entries, or eviction. A new efa_proto_ah struct is added to efa_proto_av.h, embedding efa_ah as its first member (castable via container_of). It adds the protocol-specific fields: implicit_refcnt, explicit_refcnt, implicit_conn_list (list of implicit AV entries using this AH), and lru_list_entry (position in the domain's AH LRU list). efa_proto_ah_alloc in efa_proto_av.c allocates sizeof(efa_proto_ah) via efa_ah_alloc, initializes the protocol fields for new AHs, manages the implicit/explicit refcount split, maintains the LRU list, and handles ENOMEM with eviction retry. efa_proto_ah_release decrements the appropriate protocol refcount and removes from the LRU list when both reach zero before calling efa_ah_release to decrement the base refcount. The efa_proto_ah_lru_move function (formerly efa_ah_implicit_av_lru_ah_move in efa_ah.c) is now static in efa_proto_av.c, operating on the efa_proto_ah lru_list_entry. The efa_proto_ah_from_ah inline accessor is added to efa_proto_av.h for converting base efa_ah pointers to their protocol wrapper. Update efa_conn.c and efa_av.c to access the moved fields (implicit_refcnt, explicit_refcnt, implicit_conn_list) through efa_proto_ah_from_ah(). These files contain the old implicit-AV code paths that remain alive until the next commit strips efa_av and deletes efa_conn. Update efa_ah_alloc/release callers in these files for the new signatures. Split the efa_ah_cnt_av_impl test helper into two path-specific variants: efa_ah_cnt_av_efa_impl uses efa_proto_ah_from_ah() to check the protocol refcnts (implicit/explicit), while efa_ah_cnt_av_efa_direct_impl checks the base efa_ah refcnt directly. The combined helper would have dereferenced efa_proto_ah fields past the end of a base-sized efa_ah allocation on the efa-direct path; splitting the helpers removes the runtime branching and the out-of-bounds access. Test mocks are updated for the new efa_ah_alloc (size_t alloc_size) and efa_ah_release (no bool) signatures. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
With every protocol-specific AV concern now living in efa_proto_av.c, remove the remnants from the base AV layer. efa_av.h and efa_av.c are reduced to what the efa-direct path actually needs: the efa_av struct now contains only domain, used, type, cur_reverse_av, prv_reverse_av, and util_av. All implicit-AV fields, LRU tracking, peer-map logic, SHM AV handling, and implicit reverse-lookup functions are gone from this file. struct efa_av_entry becomes a flat 48-byte record of ep_addr[32], a struct efa_ah *ah, and fi_addr_t fi_addr. struct efa_conn is no longer embedded, and the base path's callers switch from av_entry->conn.ah to av_entry->ah directly. The _Static_assert that previously tied efa_av_entry.conn.ah and efa_proto_av_entry.ah to a common offset is updated to reference efa_av_entry.ah directly. Two new static helpers efa_av_entry_init and efa_av_entry_release replace the old efa_conn_alloc / efa_conn_release paths for the base-only lifecycle. efa_av_insert_one and efa_av_insert lose their external linkage because they are only called from within efa_av.c now (RDM uses efa_proto_av_insert_one). Drop efa_conn.c and efa_conn.h from prov/efa/Makefile.include and libfabric.vcxproj. The files themselves are left on disk and deleted in a follow-up commit so this commit is strictly a functionality change and the next one is a pure file deletion. Leaving the files out of the build list means their compile errors against the new efa_av / efa_av_entry shapes are harmless — nothing includes or compiles them from this commit forward. Update efa_ah.c to drop the now-unused #include "efa_conn.h". Update the efa_unit_test_proto_av.c test file to call efa_proto_av_reverse_lookup in place of the deleted efa_av_reverse_lookup_rdm. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
The previous commit removed the last references to efa_conn from the tree (both as source and as a build target). The files are no longer compiled or included by anything. Delete them. Pure file deletion — no other changes. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Split efa_unit_test_av.c into two files: - efa_unit_test_av.c: base AV tests (efa-direct, AH counting, EP binding) - efa_unit_test_proto_av.c: protocol AV tests (implicit AV, LRU, peer map, reverse lookup, implicit-to-explicit migration, AH eviction) The new efa_unit_test_proto_av.c also contains the body of test_av_reverse_av_remove_qpn_collision, whose declaration and registration were added earlier but whose implementation had not been defined. Co-locating it with the other protocol AV tests keeps declaration, registration, and definition in sync. Update prov/efa/Makefile.include to build the new file. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add thirteen new unit tests that exercise the refactored AV code paths. The existing suite focused mostly on AH refcount and endpoint-binding behavior; these tests add direct coverage of entry insert/remove/lookup, protocol-AV-only behaviors, and a handful of error paths that previously had no assertions. Base AV coverage adds test_av_insert_remove_lookup_efa_direct, which walks a full insert/lookup/remove cycle against the base efa_av, and test_av_base_addr_to_entry_invalid, which confirms that efa_av_addr_to_entry returns NULL for FI_ADDR_NOTAVAIL and FI_ADDR_UNSPEC inputs. Protocol AV coverage exercises lookup and lifetime of efa_proto_av_entry: test_av_proto_reverse_lookup_explicit covers reverse lookup after insert and remove, test_av_proto_prv_reverse_av covers QPN reuse falling through to prv_reverse_av, test_av_proto_addr_to_entry_after_remove checks the validity bit (ah != NULL) after a remove, test_av_proto_insert_remove_with_peer drives the per-endpoint peer-map lifecycle alongside insert and remove, test_av_proto_batch_insert covers a multi-address fi_av_insert, and test_av_proto_remove_nonexistent covers the error path for a fi_addr that was never inserted. Implicit-AV behavior adds test_av_implicit_to_explicit_peer_updated, which verifies that migrating a peer from the implicit AV to the explicit AV preserves peer and AH identity; test_av_implicit_av_unbounded, which confirms the implicit AV grows without an explicit cap; and test_av_implicit_av_lru_move_single, which exercises LRU move when only a single implicit entry exists. Input validation adds test_av_proto_insert_invalid_address for inserts of all-zero endpoint addresses and test_av_proto_open_unsupported_attrs for fi_av_open with attributes the protocol AV does not accept. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
| * @param[in] addr address to check | ||
| * @return non-zero if valid, 0 if all-zeros | ||
| */ | ||
| static inline int efa_av_is_valid_address(struct efa_ep_addr *addr) |
There was a problem hiding this comment.
Commit message says "add" but it's really being moved
| * ep_addr[32] off=0 — TX hot (qpn@+16, qkey@+20) | ||
| * ah* off=32 — TX hot (EFA send path) | ||
| * fi_addr off=40 — RX hot (explicit peer lookup, CQ poll) | ||
| * implicit_fi_addr off=48 — RX hot (implicit peer lookup, CQ poll) |
There was a problem hiding this comment.
Implicit AV entries are not considered hot path. In the hot path, the application inserts the peer into the AV
I think switching implicit_fi_addr and shm_fi_addr would be better
| * efa_av.prv_reverse_av (off=32) — HASH_FIND fallback for QPN reuse (connid mismatch) | ||
| * These are in cacheline 0 — explicit peer reverse lookup stays in one line. | ||
| * | ||
| * RX hot path for implicit (unknown) peers: |
There was a problem hiding this comment.
Same as above. Let's not call the implicit peer path "hot"
| /** | ||
| * @brief Protocol AV entry — flat layout with same field prefix as efa_av_entry | ||
| * | ||
| * pahole: |
There was a problem hiding this comment.
The layout is nice to have in the comments but we must make sure the comment is up to date
| return NULL; | ||
|
|
||
| entry = (struct efa_proto_av_entry *)util_av_entry->data; | ||
| return entry->ah ? entry : NULL; |
There was a problem hiding this comment.
Shouldn't this be assert(entry->ah)? Based on the util_av removal code below, you set entry->ah to NULL after calling ofi_av_remove_addr which calls ofi_ibuf_free. So we shouldn't ever have the case where ofi_bufpool_ibuf_is_valid returns true but ah is NULL
| #include "efa_data_path_ops.h" | ||
| #include "ofi_util.h" | ||
| #include "efa_av.h" | ||
| #include "rdm/efa_proto_av.h" |
There was a problem hiding this comment.
The rdm/ prefix shouldn't be necessary because this file is also in rdm/ folder. Also happens in multiple other files.
| * ah_implicit_conn_list_entry off=88 — implicit AV insert/release | ||
| * av* off=104 — back-pointer for AH eviction | ||
| */ | ||
| struct efa_proto_av_entry { |
There was a problem hiding this comment.
rdm stands for reliable datagram, and all of efa is reliable datagram.... Its a terrible name, and I think that a pure rename change %s/rdm/proto won't go so well... so I want to do it in bits and pieces.
There was a problem hiding this comment.
I think it will go well if we do s/efa_rdm/efa_proto finally, AFAIU we don't have such combo beyond the code
|
|
||
| struct efa_rdm_ep { | ||
| struct efa_base_ep base_ep; | ||
| struct efa_proto_av *proto_av; /* set during fi_ep_bind, avoids container_of on hot path */ |
There was a problem hiding this comment.
Similar comment about prefix. rdm_av would be consistent with the previous refactors.
| * The protocol extension fields start at offset 88 (cacheline 1), so | ||
| * accessing them on the eviction path does not pollute the TX cache line. | ||
| */ | ||
| struct efa_proto_ah { |
There was a problem hiding this comment.
Similar comment about prefix. efa_rdm_ah would be better.
Refactor EFA AV layer into base and protocol components
What
Splits the EFA provider's AV layer into base (efa_av) and protocol (efa_proto_av) components, mirroring the existing pattern used elsewhere in the provider. Does the same split for efa_ah. Deletes efa_conn.{c,h}.
Why
Three interrelated design decisions drive this refactor:
Split AV into base + protocol. struct efa_av carried implicit-AV fields (LRU list, implicit util_av, evicted_peers hashset) that efa-direct never uses. struct efa_av_entry embedded struct efa_conn, which bundled RDM-specific concerns (AH lifetime, peer map, SHM insertion, implicit-to-explicit migration). After this refactor, the efa-direct data path uses minimal base structs; all RDM-specific state lives in efa_proto_* wrappers that container_of-embed the base types.
Delete efa_conn entirely. Once struct efa_proto_av_entry exists as the RDM-side AV entry, struct efa_conn is redundant — it was always just"the RDM-specific tail" of struct efa_av_entry. Keeping both would leave the codebase with two types representing the same thing. The cleaner resolution is to absorb efa_conn's responsibilities into efa_proto_av_entry (for protocol behaviors) and efa_av_entry (for base behaviors), then remove efa_conn.c/h outright. struct efa_rdm_peer's backing pointer is retyped from efa_conn * to efa_proto_av_entry * in the same change.
Split AH along the same base/protocol lines. struct efa_ah carried protocol-only state: split implicit/explicit refcounts, an
implicit_conn_list, and an LRU-list entry for AH eviction. None of this matters on the efa-direct path. Worse, efa_ah.c iterated struct efa_proto_av_entry for its eviction logic — a layering violation where the base AH translation unit had a hard dependency on protocol-layer types. The fix is to shrink struct efa_ah to a single unified refcount and wrap it in a new struct efa_proto_ah that holds all the protocol concerns, then move the eviction logic out of efa_ah.c into efa_proto_av.c.
Testing
Every commit individually validated with --enable-debug --enable-asan --enable-efa-mem-poisoning plus the cmocka unit-test suite. 13 new unit tests added covering insert/lookup/remove cycles, reverse-AV behaviors, implicit-to-explicit migration, QPN reuse, and error paths.