prov/efa: Fix the use-after-free bugs for all protocols#12162

Open
shijin-aws wants to merge 3 commits into ofiwg:main from shijin-aws:fix_use_after_free_txe

@shijin-aws (Contributor) commented Apr 21, 2026

Fix use-after-free in ope lifecycle by deferring release
until all outstanding TX ops complete. Multiple protocols
(CTS, SHORT_RTR, FETCH_RTA, COMPARE_RTA, DC eager/longcts,
EOR, RECEIPT) could release the ope before all associated
packet send completions arrived, allowing the buffer pool
slot to be reused and stale completions to corrupt the new
ope's tx ops counter. Introduce three flags --
EFA_RDM_TXE_REMOTE_ACK_RECEIVED, EFA_RDM_OPE_RECV_COMPLETED,
and EFA_RDM_RXE_ACK_IN_FLIGHT -- with per-protocol
ready-for-release helpers that gate release on both the
response/ack condition and efa_outstanding_tx_ops == 0. Add
a CTS send completion handler and an assertion in
efa_rdm_ep_record_tx_op_completed to catch counter underflow.
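
The gating described above can be illustrated with a minimal sketch. This is not the code from this PR: `struct sketch_ope`, the `sketch_*` functions, and the bit value of `SKETCH_REMOTE_ACK_RECEIVED` are hypothetical stand-ins (the flag plays the role of `EFA_RDM_TXE_REMOTE_ACK_RECEIVED`, and `outstanding_tx_ops` the role of `efa_outstanding_tx_ops`). It only shows the idea that release is attempted from both the send-completion path and the ack path, and succeeds only once both conditions hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EFA_RDM_TXE_REMOTE_ACK_RECEIVED.
 * The bit value is an assumption made for this sketch. */
#define SKETCH_REMOTE_ACK_RECEIVED (1u << 0)

/* Hypothetical, simplified operation entry; not the provider's real struct. */
struct sketch_ope {
	uint32_t flags;
	size_t outstanding_tx_ops; /* posted sends whose completions are pending */
	bool released;             /* stands in for returning the entry to its pool */
};

/* Ready only when the remote ack has arrived AND no send completion is pending. */
static bool sketch_ready_for_release(const struct sketch_ope *ope)
{
	return (ope->flags & SKETCH_REMOTE_ACK_RECEIVED) &&
	       ope->outstanding_tx_ops == 0;
}

/* Called from every path that can satisfy one of the two conditions. */
static void sketch_try_release(struct sketch_ope *ope)
{
	if (!ope->released && sketch_ready_for_release(ope))
		ope->released = true;
}

/* A packet send was posted for this entry. */
static void sketch_post_tx_op(struct sketch_ope *ope)
{
	ope->outstanding_tx_ops++;
}

/* A send completion arrived; the assertion catches counter underflow. */
static void sketch_on_send_completion(struct sketch_ope *ope)
{
	assert(ope->outstanding_tx_ops > 0);
	ope->outstanding_tx_ops--;
	sketch_try_release(ope);
}

/* The peer's response/ack arrived, possibly before the send completions. */
static void sketch_on_remote_ack(struct sketch_ope *ope)
{
	ope->flags |= SKETCH_REMOTE_ACK_RECEIVED;
	sketch_try_release(ope);
}
```

In this sketch the order of events does not matter: if the ack arrives while sends are still outstanding, the entry stays allocated until the last completion decrements the counter to zero, so a late completion can never land on a reused slot.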

@shijin-aws
Contributor Author

bot:aws:retest

shijin-aws force-pushed the fix_use_after_free_txe branch 4 times, most recently from 69bbdf8 to e3630ad (April 24, 2026 00:25)
@shijin-aws
Contributor Author

bot:aws:retest

shijin-aws force-pushed the fix_use_after_free_txe branch 6 times, most recently from 084238c to c493ece (April 27, 2026 22:26)
@shijin-aws
Contributor Author

bot:aws:retest

shijin-aws force-pushed the fix_use_after_free_txe branch 4 times, most recently from c1cafce to 8bb6f56 (May 1, 2026 18:06)
shijin-aws added 3 commits May 5, 2026 05:06
If a CTSDATA packet is used by a DC protocol, the tx entry should be
released only when all of its TX ops have completed and the receipt
has been received.

Signed-off-by: Shi Jin <sjina@amazon.com>
Fix use-after-free in ope lifecycle by deferring release
until all outstanding TX ops complete. Multiple protocols
(CTS, SHORT_RTR, FETCH_RTA, COMPARE_RTA, DC eager/longcts,
EOR, RECEIPT) could release the ope before all associated
packet send completions arrived, allowing the buffer pool
slot to be reused and stale completions to corrupt the new
ope's tx ops counter. Introduce three flags --
EFA_RDM_TXE_REMOTE_ACK_RECEIVED, EFA_RDM_OPE_RECV_COMPLETED,
and EFA_RDM_RXE_ACK_IN_FLIGHT -- with per-protocol
ready-for-release helpers that gate release on both the
response/ack condition and efa_outstanding_tx_ops == 0. Add
a CTS send completion handler and an assertion in
efa_rdm_ep_record_tx_op_completed to catch counter underflow.

Signed-off-by: Shi Jin <sjina@amazon.com>
shijin-aws force-pushed the fix_use_after_free_txe branch from 32f3a72 to c7c9d0f (May 5, 2026 05:06)
@shijin-aws
Contributor Author

bot:aws:retest
