prov/efa: Introduce new protocol interface and implement it for the eager protocol in the TX path#12144
prov/efa: Introduce new protocol interface and implement it for the eager protocol in the TX path#12144sunkuamzn wants to merge 11 commits into
Conversation
99faebb to
9e224cf
Compare
| #endif | ||
| efa_rdm_pke_handle_send_completion(pkt_entry); | ||
| efa_rdm_cq_increment_pkt_entry_gen(pkt_entry); | ||
| if (pkt_entry->callback) { |
There was a problem hiding this comment.
nit: a function would be neat here, e.g. - packet_completion_handle.
Also, should we assert on certain packet types having the callbacks set or not?
| dlist_insert_tail(&txe->peer_entry, &txe->peer->txe_list); | ||
| } | ||
|
|
||
| txe->internal_flags = 0; |
There was a problem hiding this comment.
memset to 0 at the top?
There was a problem hiding this comment.
I think the goal is to zero out only the fields we care about and save a few instructions
| txe->cq_entry.buf = | ||
| OFI_LIKELY(txe->cq_entry.len > 0) ? txe->iov[0].iov_base : NULL; | ||
|
|
||
| /* set flags */ |
There was a problem hiding this comment.
nit: would be neat to have txe_flags_set function
talavr-amazon
left a comment
There was a problem hiding this comment.
minor styling comments
| */ | ||
|
|
||
| /* | ||
| * Description of the protocol |
There was a problem hiding this comment.
nit: should it be on top?
| } else { | ||
| pkt_entry->callback(pkt_entry); | ||
| } | ||
| efa_rdm_cq_increment_pkt_entry_gen(pkt_entry); |
There was a problem hiding this comment.
will it be better to make efa_rdm_cq_increment_pkt_entry_gen part of the callback .. ?
There was a problem hiding this comment.
I considered that. The callback is part of the protocol. The gen counter is used by the progress engine to keep track of duplicated completions. So I felt that it's better to leave the gen counter increment here
|
|
||
| switch (op) { | ||
| case ofi_op_tagged: | ||
| txe->cq_entry.flags = FI_TRANSMIT | FI_MSG | FI_TAGGED; |
There was a problem hiding this comment.
Nit: maybe we can define an array which index is ofi_op_code and value is the flag
, since the op code is an enum
enum {
ofi_op_msg,
ofi_op_tagged,
ofi_op_read_req,
ofi_op_read_rsp,
ofi_op_write,
ofi_op_write_async,
ofi_op_atomic,
ofi_op_atomic_fetch,
ofi_op_atomic_compare,
ofi_op_read_async,
ofi_op_max,
};
so can get rid of the switch case by using the index, and keep an if for tag case to fill the tag fields
| break; | ||
| case EFA_RDM_DC_EAGER_MSGRTM_PKT: | ||
| case EFA_RDM_DC_EAGER_TAGRTM_PKT: | ||
| assert(0 && "Eager protocol moved to refactored code path"); |
There was a problem hiding this comment.
This is not a packet construction ? seems it should be part of your next commit
| else | ||
| delivery_complete_requested = flags & FI_DELIVERY_COMPLETE; | ||
|
|
||
| req_pkt_type = delivery_complete_requested ? |
There was a problem hiding this comment.
It seems the protocol selection function already implement the same logic for delivery_complete_requested, req_pkt_type, etc. Why implementing them here again?
| #include "efa_rdm_pke_utils.h" | ||
| #include "efa_rdm_pke_req.h" | ||
|
|
||
| #include "efa_mr.h" |
There was a problem hiding this comment.
Since we only active the new code in this commit, should the earlier assertion in the switch cases added only in this commit?
| return txe->total_len <= max_rtm_data_capacity; | ||
| } | ||
|
|
||
| struct efa_rdm_proto efa_rdm_proto_eager = { |
There was a problem hiding this comment.
nit: this should go at the bottom
|
|
||
| /* TX path callbacks - one callback for each packet type that this protocol uses | ||
| */ | ||
| void efa_rdm_proto_eager_handle_rtm_send_completion( |
| struct efa_rdm_proto **proto) | ||
| { | ||
| return 0; | ||
| /* TODO: Handle memory registration of user buffers. |
There was a problem hiding this comment.
Is this something you're planning to complete as part of this refactor?
There was a problem hiding this comment.
yes, it's already implemented for medium and read based protocols in the full PR #12002
| assert(data_offset == 0 && data_size == -1); | ||
| ret = efa_rdm_pke_init_eager_msgrtm(pkt_entry, ope); | ||
| break; | ||
| case EFA_RDM_EAGER_TAGRTM_PKT: |
There was a problem hiding this comment.
is it not an opportunity to united these cases?
There was a problem hiding this comment.
This entire function will be removed when the refactor is complete, so I'm not sure it matters
|
|
||
| /* First try to use the refactored code path */ | ||
| err = efa_rdm_proto_select_send_protocol(ep, peer, msg, op, flags, txe, | ||
| &proto); |
There was a problem hiding this comment.
Please split the new and old paths to two different functions
9e224cf to
d12fd4c
Compare
cff7c7a to
9561f23
Compare
Add a callback function pointer to the efa_rdm_pke struct. When set, the CQ send completion handler invokes the callback instead of dispatching through the pkt_type-based switch statement in efa_rdm_pke_handle_send_completion. This allows each protocol to define its own send completion logic directly on the packet entry, decoupling completion handling from the centralized switch. The callback is checked conditionally so the old code path continues to work for protocols that have not yet been migrated. The member is has a generic name "callback" because the same callback will be used for two different purposes in the RX path in the future. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Add efa_rdm_proto.h defining the efa_rdm_proto struct which serves as the interface for all EFA RDM protocols. The interface includes: - can_use_protocol_for_send: determines if this protocol can handle a given send operation based on message size, peer capabilities, etc. - construct_tx_pkes: builds the TX packet entries for the operation - handle_tx_pkes_posted: post-send hook for protocol-specific bookkeeping - Packet type fields for msg/tagged/DC variants Add efa_rdm_proto.c with the protocol registry array and the efa_rdm_proto_send_pkes_posted_no_op default implementation. The RX path will be introduced in later commits. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Add efa_rdm_proto_txe_fill() which initializes a TXE for use by the new protocol interface. The logic is derived from efa_rdm_txe_construct but differs in that it does not set the mr, desc, or total_len fields. These fields will be populated earlier in efa_rdm_proto_select_send_protocol where they are needed for memory registration via efa_rdm_ope_try_fill_desc. This function is called by each protocol's construct_tx_pkes implementation to set up the remaining TXE fields (op, peer, flags, iov, cq_entry, etc.) before building the packet entries. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Add efa_rdm_proto_eager.c and efa_rdm_proto_eager.h defining the eager protocol using the efa_rdm_proto interface. The eager protocol sends messages that fit within a single MTU-sized packet. This commit adds the protocol struct with packet type definitions and a can_use_for_send function that checks if the message fits in a single packet after accounting for header size. The construct_tx_pkes and send completion callbacks are placeholders that will be implemented in subsequent commits. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Add efa_rdm_proto_select_send_protocol() which iterates through the registered protocols in priority order and selects the first one whose can_use_protocol_for_send() returns true. The function precomputes header flags and tagged/DC packet type variants once before the loop to avoid redundant computation across protocols. The function also initializes TXE fields (iov, desc, mr) from the fi_msg before protocol selection so that efa_rdm_ope_try_fill_desc can attempt internal memory registration for non-eager protocols. For now only the eager protocol is registered. Messages that do not fit eager fall through and return proto=NULL, signaling the caller to use the old code path. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Move the eager RTM packet construction logic from efa_rdm_pke_rtm.c into efa_rdm_proto_eager.c, implementing the construct_tx_pkes callback. The function initializes the TXE via efa_rdm_proto_txe_fill, allocates a single TX packet entry, initializes the RTM header with the message payload, and sets the per-packet send completion callback. Both regular and delivery-complete (DC) eager packets are supported, with DC packets using a separate callback and header struct that includes the send_id. Remove the old efa_rdm_pke_init_eager_msgrtm/tagrtm functions and their DC variants from efa_rdm_pke_rtm.c. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Move the eager RTM send completion handling from efa_rdm_pke_rtm.c into efa_rdm_proto_eager.c. Two callbacks are implemented: - Regular eager: verifies payload size matches total length, calls efa_rdm_ope_handle_send_completed to report the CQ completion and release the TXE, then releases the TX packet entry. - DC eager: checks efa_rdm_txe_dc_ready_for_release which requires both efa_outstanding_tx_ops == 0 and receipt received. Only releases the TXE when both conditions are met. Replace the old eager case in efa_rdm_pke_handle_send_completion with an assertion to catch any packets that incorrectly bypass the new callback. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Wire the new protocol interface into efa_rdm_msg_generic_send. The function now calls efa_rdm_proto_select_send_protocol first. If a protocol is selected, it calls construct_tx_pkes to build the packets, sends them via efa_rdm_pke_sendv, and calls handle_tx_pkes_posted. If no protocol is selected (proto == NULL), the function falls back to the existing code path using efa_rdm_txe_construct and efa_rdm_msg_post_rtm. This allows incremental migration: as each protocol is added to the registry, it will be picked up automatically while unimplemented protocols continue to use the old path. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Add efa_unit_test_proto.c with tests verifying protocol selection: - Eager for small/zero-length messages Add eager protocol tests to efa_unit_test_proto.c: - test_proto_eager_construct_pkes_single_pke: verifies 1 PKE with callback - test_proto_eager_send_completion_releases_txe: verifies TXE release - test_proto_eager_assigns_msg_id: verifies msg_id from peer->next_msg_id Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Some unit tests call efa_rdm_pke_handle_send_completion directly on eager packet entries, which now triggers an assertion since eager packets have been migrated to the callback path. Add efa_unit_test_pke_handle_send_completion utility in efa_unit_tests.h that checks for a callback first and falls through to the old handler if none is set. Update the RNR and DC OPE tests to use this utility. Add a TODO to remove the utility once all protocols are migrated. Also set the DC eager callback on manually constructed packet entries in the DC TXE release tests. Signed-off-by: Sai Sunku <sunkusa@amazon.com>
Add test verifying that: - A send is queued when peer_may_have_zcpy_rx is true and handshake has not been received yet - The queued OPE is dequeued and the packet is posted once handshake arrives and progress is called Signed-off-by: Sai Sunku <sunkusa@amazon.com>
9561f23 to
ae68528
Compare
A series of commits that introduce the new protocol interface for the efa fabric. The TX path for the eager protocols is implemented. Other protocols, RX, RMA and atomic paths are implemented in #12002