
prov/efa: Fix the completion report for delivery complete #12115

Closed
shijin-aws wants to merge 4 commits into ofiwg:main from shijin-aws:fix_dc

@shijin-aws
Contributor

Today, we report tx completions upon the rx completion of the receipt packet. This is wrong, because the tx completions may not have arrived yet at that point, so the device can still be using the buffer. We cannot report a completion in this case, because the application must be able to safely reuse the buffer once it gets the completion.

This patch fixes the bug by making delivery complete protocols report completions only when all tx ops have finished and the receipt packet has been received.
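The ordering problem can be illustrated with a small, self-contained model. This is a hypothetical simplification, not the EFA provider's code: `struct txe_model`, its fields, and the handler functions are invented for illustration. It gates the completion on both the local send completion and the receipt, so the result is the same whichever event arrives first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a tx entry used by a delivery
 * complete (DC) protocol. Names are illustrative, not the EFA provider's. */
struct txe_model {
	bool tx_ops_done;      /* local send completion seen: device no longer uses the buffer */
	bool receipt_received; /* receipt packet from the receiver has arrived */
	int completions;       /* completions reported to the application */
};

/* Report the completion exactly once, and only after both events. */
static void maybe_report_completion(struct txe_model *txe)
{
	if (txe->tx_ops_done && txe->receipt_received && txe->completions == 0)
		txe->completions++;
}

/* Local send completion for the tx entry's packets. */
static void on_tx_completion(struct txe_model *txe)
{
	txe->tx_ops_done = true;
	maybe_report_completion(txe);
}

/* Receipt packet arrived. The buggy behavior reported the completion here
 * unconditionally, even when tx_ops_done was still false. */
static void on_receipt(struct txe_model *txe)
{
	txe->receipt_received = true;
	maybe_report_completion(txe);
}

/* Deliver the two events in the given order and check that nothing is
 * reported after only the first one. Returns the final completion count. */
static int run_scenario(bool receipt_first)
{
	struct txe_model txe = {0};

	if (receipt_first)
		on_receipt(&txe);       /* the race described above */
	else
		on_tx_completion(&txe);
	assert(txe.completions == 0);   /* only one of the two events so far */

	if (receipt_first)
		on_tx_completion(&txe);
	else
		on_receipt(&txe);
	return txe.completions;
}
```

`run_scenario(true)` models the receipt overtaking the local send completion: the single completion is reported by the later `on_tx_completion` call, not by `on_receipt`.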

@shijin-aws shijin-aws changed the title prov/efa: Fix the completion report for delivery complte prov/efa: Fix the completion report for delivery complete Apr 7, 2026
If CTSDATA packets are used by a DC protocol, the tx entry should
be released only once both conditions hold: all TX ops are done
and the receipt has been received.

Signed-off-by: Shi Jin <sjina@amazon.com>
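When the payload goes out as several CTSDATA packets, "all TX ops are done" can be tracked with an outstanding-operation counter. The sketch below is a hypothetical model under that assumption; `struct dc_txe` and the `dc_*` functions are illustrative names, not the provider's. It covers the case where the receipt arrives while the last CTSDATA send completion is still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a DC tx entry that sends its payload as several
 * CTSDATA packets. Field and function names are illustrative only. */
struct dc_txe {
	int outstanding_tx_ops;  /* CTSDATA sends posted but not yet completed */
	bool receipt_received;   /* receipt packet has arrived from the receiver */
	bool released;           /* tx entry has been released */
};

/* Release only when every posted TX op has completed AND the receipt has
 * been received; either condition may become true last. */
static void dc_try_release(struct dc_txe *txe)
{
	if (!txe->released && txe->outstanding_tx_ops == 0 && txe->receipt_received)
		txe->released = true;
}

static void dc_post_ctsdata(struct dc_txe *txe)
{
	txe->outstanding_tx_ops++;
}

static void dc_on_ctsdata_send_completion(struct dc_txe *txe)
{
	assert(txe->outstanding_tx_ops > 0);  /* completion without a posted send */
	txe->outstanding_tx_ops--;
	dc_try_release(txe);
}

static void dc_on_receipt(struct dc_txe *txe)
{
	txe->receipt_received = true;
	dc_try_release(txe);
}
```

For example, after posting three CTSDATA packets and seeing two send completions, a receipt leaves the entry unreleased; only the third send completion releases it.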
We have other protocols (emulated read, fetch/compare atomics)
that can have ack/resp packets delivered before the local tx
packets are completed. Apply the same fix we made earlier for
the DC protocols to these protocols as well: wait for both the
tx and rx packet completions before releasing the tx entry.

This fix also renames the earlier EFA_RDM_TXE_RECEIPT_RECEIVED
bit to EFA_RDM_TXE_RESPONSE_RECEIVED, a general flag for all
protocols (including DC) that rely on an ack/response from the
rx side to decide the tx entry lifecycle.

Signed-off-by: Shi Jin <sjina@amazon.com>
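One way to picture a single, protocol-agnostic "response received" bit is a release check that requires it only for protocols expecting an ack/response. This is a hypothetical sketch: the bit values, `enum proto`, `expects_response`, and the handlers are assumptions for illustration; the real EFA_RDM_TXE_* flags and the provider's release logic are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; TXE_RESPONSE_RECEIVED plays the role of
 * EFA_RDM_TXE_RESPONSE_RECEIVED, but the real value is not shown here. */
#define TXE_TX_OPS_DONE       (1u << 0)
#define TXE_RESPONSE_RECEIVED (1u << 1)

/* Illustrative protocol list; only the first expects no ack/response. */
enum proto {
	PROTO_EAGER_SEND,     /* no ack/response expected from the peer */
	PROTO_DC_SEND,        /* peer sends a receipt */
	PROTO_EMULATED_READ,  /* peer sends a response */
	PROTO_FETCH_ATOMIC,   /* peer sends a response */
	PROTO_COMPARE_ATOMIC, /* peer sends a response */
};

struct txe {
	enum proto proto;
	uint32_t flags;
	bool released;
};

/* Assumption: every protocol except a plain eager send waits for a response. */
static bool expects_response(enum proto p)
{
	return p != PROTO_EAGER_SEND;
}

/* Release once all required bits are set, regardless of arrival order. */
static void try_release(struct txe *txe)
{
	uint32_t needed = TXE_TX_OPS_DONE;

	if (expects_response(txe->proto))
		needed |= TXE_RESPONSE_RECEIVED;
	if (!txe->released && (txe->flags & needed) == needed)
		txe->released = true;
}

static void on_tx_ops_done(struct txe *txe)
{
	txe->flags |= TXE_TX_OPS_DONE;
	try_release(txe);
}

static void on_response(struct txe *txe)
{
	/* A response for a protocol that never asks for one would be a bug. */
	assert(expects_response(txe->proto));
	txe->flags |= TXE_RESPONSE_RECEIVED;
	try_release(txe);
}
```

The same `on_response` path serves a DC receipt, an emulated-read response, or an atomic response, which is the motivation for replacing a receipt-specific bit with a general one.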
@shijin-aws shijin-aws marked this pull request as draft April 21, 2026 00:14
@shijin-aws
Contributor Author

shijin-aws commented Apr 21, 2026

The first 3 commits are from #12162; this needs a rebase after it is merged.

@shijin-aws
Contributor Author

bot:aws:retest

… decrement

Signed-off-by: Shi Jin <sjina@amazon.com>
Today, we report tx completions for delivery complete (DC) protocols
upon getting the receipt packet. This is wrong because the tx send
completions may not have arrived yet, meaning the device can still be
using the buffer. We cannot report completion to the application until
the buffer is safe to reuse. Fix this by making DC protocols report
completions via efa_rdm_ope_handle_send_completed only when all TX
ops have finished and the receipt packet has been received.

Signed-off-by: Shi Jin <sjina@amazon.com>
@shijin-aws
Contributor Author

This PR needs major rework and depends on #12162. Will reopen.

@shijin-aws shijin-aws closed this Apr 24, 2026