xpay can return failure after MPP completes: routes remaining 0msat #9096

@vincenzopalazzo

Privacy / anonymization note

The concrete payment identifiers, invoice text, amounts, part amounts, and recipient details below have been anonymized/faked before posting publicly. The important property preserved is the state-machine relationship: pay was redirected to xpay, the RPC returned code=209 with remaining 0msat, then lower-level sendpay_success notifications with a preimage arrived and the successful parts summed to the intended amount.

Summary

On CLN v25.05.1, an application called pay for a BOLT12 invoice. The xpay plugin intercepted the call (pay -> xpay-as-pay). The RPC returned a failure:

JSONRPCError: code=209, message=Failed after 18 attempts...
... Then routing for remaining 0msat failed: amount must be non-zero

However, immediately afterwards CLN emitted successful sendpay_success notifications for the same payment_hash, including the payment_preimage. The recipient later confirmed receiving the exact amount for the same invoice and preimage.

This looks like an xpay MPP state-machine race/bug: xpay reaches a state where the remaining amount is 0msat, still calls askrene getroutes with amount_msat=0, askrene rejects it with amount must be non-zero, and xpay turns that internal zero-amount routing failure into a final user-facing RPC error even though the lower-level payment completed.

Version

Core Lightning / CLN v25.05.1

Relevant setup

  • Application/plugin calls pay.
  • xpay-handle-pay is enabled, so pay is redirected by xpay.
  • Payment was a BOLT12 invoice.
  • MPP/partial payment attempts were involved.

Payment identifiers / fake example values

payment_hash:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

payment_preimage:
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

completed amount:
12345000msat = 12345 sats

Observed log sequence, with anonymized/fake values

  1. The plugin fetched the BOLT12 invoice successfully.

  2. Application started payment via pay:

plugin-cln4go-plugin: paying the offer ...
plugin-cln-xpay: Got command pay
plugin-cln-xpay: Redirecting pay->xpay

  3. xpay attempted many payment parts. Several failed with routing/payment errors such as:

Error fee_insufficient
Error temporary_channel_failure

  4. The RPC caller received an error:

plugin-cln4go-plugin: WrapError received raw error:
JSONRPCError: code=209, message=Failed after 18 attempts...
... Then routing for remaining 0msat failed: amount must be non-zero

  5. Immediately afterwards, CLN emitted successful sendpay_success notifications for the same payment hash, for example:

{
  "payment_hash": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "payment_preimage": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
  "status": "complete",
  "bolt12": "lni1qqexample..."
}

  6. The successful parts were:

1000000 msat
8000000 msat
3345000 msat

Sum:

12345000msat = 12345 sats

  7. The recipient later confirmed receiving exactly:

Amount: 12,345 sats
Invoice: same anonymized lni1...
Lightning preimage: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

  8. CLN also logged:

UNUSUAL plugin-cln-xpay:
Destination accepted partial payment, failed a part (...), but accepted only 12345000msat of 12345000msat. Winning?!

Note that in this case the log says 12345000msat of 12345000msat, i.e. the amount accepted equals the intended amount.
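
The arithmetic behind that log line can be checked with a small standalone sketch. The part amounts are the anonymized values from the log sequence; the helper names sum_parts_msat and remaining_msat are illustrative only and are not CLN functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not CLN code): total of the successful parts, in msat. */
static uint64_t sum_parts_msat(const uint64_t *parts, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += parts[i];
	return total;
}

/* Hypothetical helper: amount still left to deliver, clamped at zero. */
static uint64_t remaining_msat(uint64_t intended, uint64_t delivered)
{
	return delivered >= intended ? 0 : intended - delivered;
}
```

Called with the parts 1000000, 8000000 and 3345000 and an intended amount of 12345000, sum_parts_msat gives 12345000 and remaining_msat gives 0, which is consistent with the "remaining 0msat" text in the RPC error.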

Suspected code path

pay redirection

In plugins/xpay/xpay.c, handle_rpc_command() logs:

plugin_log(cmd->plugin, LOG_DBG, "Got command %s", ...);
plugin_log(cmd->plugin, LOG_INFORM, "Redirecting pay->xpay");

and replaces the method with:

json_add_string(response, "method", "xpay-as-pay");

Final user-facing error construction

The code=209 error appears to be constructed in plugins/xpay/xpay.c:getroutes_done_err():

if (amount_msat_eq(payment->amount_being_routed, payment->amount))
        complaint = "Then routing failed";
else
        complaint = tal_fmt(tmpctx, "Then routing for remaining %s failed",
                            fmt_amount_msat(tmpctx, payment->amount_being_routed));

payment_give_up(aux_cmd, payment, PAY_UNSPECIFIED_ERROR,
        "Failed after %"PRIu64" attempts. %s%s: %s",
        payment->total_num_attempts,
        payment->prior_results,
        complaint,
        msg);

amount must be non-zero

That text appears to come from plugins/askrene/askrene.c:json_getroutes():

if (amount_msat_is_zero(*amount)) {
        return command_fail(cmd, JSONRPC2_INVALID_PARAMS,
                            "amount must be non-zero");
}

xpay already treats getroutes_for(0msat) as abnormal

In plugins/xpay/xpay.c:getroutes_for():

/* I would normally assert here, but we have reports of this happening... */
if (amount_msat_is_zero(deliver)) {
        payment_log(payment, LOG_BROKEN, "getroutes for 0msat!");
        send_backtrace("getroutes for 0msat!");
}

So it seems xpay knows routing 0msat is an abnormal state, but still continues and sends getroutes to askrene, which then rejects it and causes the final RPC failure.

Winning?! log

The Destination accepted partial payment... Winning?! log is in plugins/xpay/xpay.c:update_knowledge_from_error() and is triggered if a previous attempt succeeded and then another part fails:

if (any_attempts_succeeded(attempt->payment)) {
        payment_log(attempt->payment, LOG_UNUSUAL,
                    "Destination accepted partial payment,"
                    " failed a part (%s), but accepted only %s of %s."
                    "  Winning?!",
                    description,
                    fmt_amount_msat(tmpctx, total_delivered(attempt->payment)),
                    fmt_amount_msat(tmpctx, attempt->payment->amount));
}

In this incident it logged 12345000msat of 12345000msat, which appears to be a completed amount, not a partial one.

Expected behavior

If the remaining amount becomes 0msat, xpay should not call askrene getroutes with amount_msat=0.

If sufficient parts have completed and a preimage is known, the high-level RPC should return success.

If there are still in-flight parts, xpay should wait/reconcile rather than returning a final failure caused only by zero-amount route computation.
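
As a rough illustration of the three expectations above, here is a standalone sketch of the decision I would expect before any getroutes call. This is not xpay code: struct pay_state, enum next_step, uncovered_msat and decide_next_step are all made-up names, and the real xpay bookkeeping may track these amounts differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the "what next?" decision. */
enum next_step {
	STEP_RETURN_SUCCESS,    /* report success to the RPC caller */
	STEP_WAIT_FOR_INFLIGHT, /* nothing to route; wait for pending parts */
	STEP_REQUEST_ROUTES     /* ask for routes for a non-zero remainder */
};

/* Hypothetical snapshot of an MPP payment (all amounts in msat). */
struct pay_state {
	uint64_t amount_msat;    /* intended total */
	uint64_t succeeded_msat; /* parts already settled */
	uint64_t inflight_msat;  /* parts still pending */
	bool have_preimage;      /* at least one part returned the preimage */
};

/* Amount not covered by settled or pending parts, clamped at zero.
 * Assumes the sum of the two fields does not overflow uint64_t. */
static uint64_t uncovered_msat(const struct pay_state *s)
{
	uint64_t covered = s->succeeded_msat + s->inflight_msat;

	return covered >= s->amount_msat ? 0 : s->amount_msat - covered;
}

static enum next_step decide_next_step(const struct pay_state *s)
{
	/* Enough parts settled and preimage known: the payment is done. */
	if (s->have_preimage && s->succeeded_msat >= s->amount_msat)
		return STEP_RETURN_SUCCESS;

	/* Nothing left to route: never request routes for 0msat. */
	if (uncovered_msat(s) == 0)
		return STEP_WAIT_FOR_INFLIGHT;

	return STEP_REQUEST_ROUTES;
}
```

The point of the sketch is only the ordering of the checks: success and "nothing left to route" are decided before a route request can be issued, so a zero-amount request cannot become the final user-facing error.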

Actual behavior

The high-level pay/xpay RPC returned error code 209, while lower-level sendpay state completed and the recipient received the payment/preimage.

Impact

External applications using pay can mark a payout as failed even though the payment completed. This can lead to incorrect accounting, duplicate payout attempts, or manual reconciliation work.

In our case, the application dashboard did not record the Ocean payout as successful because the RPC returned an error, even though the recipient had the preimage and exact amount.

Current mitigation / question

As a temporary mitigation, I am testing/running:

xpay-slow-mode=true

My understanding is that this is intended to make xpay wait until all current payment parts have completed or failed before returning success/failure to the RPC caller. That seems relevant to this issue because the observed failure was that the high-level RPC returned an error while lower-level sendpay_success notifications for the same payment_hash arrived immediately afterwards.

Could you confirm whether xpay-slow-mode=true is the recommended workaround for this class of issue, or whether you would suggest a different temporary mitigation, such as disabling xpay-handle-pay and falling back to the legacy pay plugin?

I am also adding application-level reconciliation after any pay/xpay error using:

lightning-cli listpays payment_hash=<payment_hash>
lightning-cli listsendpays payment_hash=<payment_hash>
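
The reconciliation rule I have in mind is roughly the following standalone sketch. It assumes the listsendpays output has already been parsed into an array of parts, each with a status of "complete", "pending" or "failed" and an amount in msat; struct sendpay_part, enum reconcile_result and reconcile are illustrative names, not part of CLN or of my plugin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum reconcile_result { RECONCILE_PAID, RECONCILE_PENDING, RECONCILE_FAILED };

/* Hypothetical, already-parsed view of one listsendpays entry. */
struct sendpay_part {
	const char *status;   /* "complete", "pending" or "failed" */
	uint64_t amount_msat; /* amount this part delivers */
};

/* Decide the real outcome after a pay/xpay RPC error. */
static enum reconcile_result reconcile(const struct sendpay_part *parts,
				       size_t n, uint64_t intended_msat)
{
	uint64_t completed = 0;
	int any_pending = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(parts[i].status, "complete") == 0)
			completed += parts[i].amount_msat;
		else if (strcmp(parts[i].status, "pending") == 0)
			any_pending = 1;
	}

	/* Completed parts cover the invoice: treat the payout as paid. */
	if (completed > 0 && completed >= intended_msat)
		return RECONCILE_PAID;

	/* Parts still in flight: do not retry or mark as failed yet. */
	if (any_pending)
		return RECONCILE_PENDING;

	return RECONCILE_FAILED;
}
```

With the parts from this incident (three complete parts summing to 12345000msat plus some failed ones), this would classify the payout as paid despite the code=209 RPC error.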

The main question is whether xpay-slow-mode=true should be enough to avoid returning a final RPC failure while in-flight MPP parts can still complete, or whether xpay still needs a code fix to avoid calling getroutes for 0msat and converting that into a user-facing payment failure.
