Skip to content

Alexeyk/runnable future fix netty#11350

Draft
AlexeyKuznetsov-DD wants to merge 4 commits into
masterfrom
alexeyk/runnable-future-fix-netty
Draft

Alexeyk/runnable future fix netty#11350
AlexeyKuznetsov-DD wants to merge 4 commits into
masterfrom
alexeyk/runnable-future-fix-netty

Conversation

@AlexeyKuznetsov-DD
Copy link
Copy Markdown
Contributor

What Does This Do

Fixes context propagation for Netty ScheduledFutureTask on Netty 4.1.44+ by preserving captured continuations through the delayed-task self-enqueue path.

Also prevents Vert.x internal timers from capturing active request spans when Vert.x schedules framework-owned timer handlers. User-created Vert.x timers still keep async context propagation.

Motivation

Netty 4.1.44+ changed delayed scheduling for tasks submitted from outside the event loop. A delayed ScheduledFutureTask can have run() invoked once while the delay is still positive to enqueue itself onto the scheduled task queue, then invoked again when the deadline expires. The runnable-future instrumentation was consuming the captured continuation on the first invocation, so the actual delayed task ran without the scheduling span context.

The first fix preserved the continuation by skipping that early Netty self-enqueue call.

Vert.x 5 is affected because it uses Netty underneath; In the failing tests Vert.x timers are wrapped in Netty ScheduledFutureTask, so the generic Netty fix also applied to Vert.x internal timers.

That exposed a second issue: some Vert.x framework timers can be scheduled while a request span is active. If those long-lived internal timers capture the request continuation, the request trace can stay open until the timer fires, causing test timeouts waiting for traces. The fix suppresses propagation only while Vert.x schedules Vert.x-owned timer handlers, leaving user vertx.setTimer(...) propagation intact.

Additional Notes

Test legacyNettyTest was added to test previous legacy Netty 4.1.9.
Test latestDepTestImplementation was configured to run on latest Netty 4+.
Netty 5 is in its early stage only Alpha2 available so far, skipped until first final release will be available.

@AlexeyKuznetsov-DD AlexeyKuznetsov-DD self-assigned this May 12, 2026
@AlexeyKuznetsov-DD AlexeyKuznetsov-DD added type: bug Bug report and fix inst: netty Netty instrumentation labels May 12, 2026
@Advice.OnMethodEnter(suppress = Throwable.class)
public static boolean before(@Advice.AllArguments Object[] args) {
for (Object arg : args) {
if (isVertxInternalHandler(arg)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often is this called? I'm worried that the reflective type checking could negatively impact throughput.

Copy link
Copy Markdown
Contributor

@mcculls mcculls May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the usual isInstance / isAssignableFrom methods are not available then we should at least consider using a ClassValue to cache the result per-lookup-class.

Also this advice snippet is referring to a method in the surrounding advice class, which is a big no-no.

This would likely result in an error at runtime because the advice bytecode is lifted and patched into the library's class. It won't have access to the surrounding methods in this class because it is no longer part of the class (it has literally been cut out and pasted somewhere else.)

Even if somehow it did find the original class (say you mistakenly added the advice class as a helper) - that would result in the advice class getting pinned, and with it the entire instrumentation class-loader which means advice and instrumentation classes wouldn't unload.

The only code that advice snippets should call are:

  • JDK classes
  • classes from the library being instrumented
  • classes that the agent makes available via the boot-class-path (internal-api etc.)
  • and classes specifically listed as helpers to be injected into the library's class-loader

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see this in the failing instrumentation tests:

java.lang.IllegalAccessError: tried to access method datadog.trace.instrumentation.java.concurrent.AsyncPropagatingDisableInstrumentation$DisableVertxInternalTimerAdvice.isVertxInternalHandler(Ljava/lang/Object;)Z from class io.vertx.core.impl.VertxImpl
	at io.vertx.core.impl.VertxImpl.scheduleTimeout(VertxImpl.java:695)
	at io.vertx.core.impl.ContextInternal.setTimer(ContextInternal.java:476)
	at io.vertx.sqlclient.impl.pool.SqlConnectionPool$1PoolRequest.onEnqueue(SqlConnectionPool.java:309)
	at io.vertx.sqlclient.impl.pool.SqlConnectionPool$1PoolRequest.onConnect(SqlConnectionPool.java:326)
	at io.vertx.core.net.impl.pool.SimpleConnectionPool$Acquire$3.run(SimpleConnectionPool.java:593)
	at io.vertx.core.net.impl.pool.Task.runNextTasks(Task.java:43)
	at io.vertx.core.net.impl.pool.CombinerExecutor.submit(CombinerExecutor.java:91)
	at io.vertx.core.net.impl.pool.SimpleConnectionPool.execute(SimpleConnectionPool.java:244)
	at io.vertx.core.net.impl.pool.SimpleConnectionPool.acquire(SimpleConnectionPool.java:639)
	at io.vertx.sqlclient.impl.pool.SqlConnectionPool.acquire(SqlConnectionPool.java:331)
	at io.vertx.sqlclient.impl.PoolImpl.acquire(PoolImpl.java:189)
	at io.vertx.sqlclient.impl.PoolImpl.getConnection(PoolImpl.java:152)
	at io.vertx.sqlclient.impl.PoolImpl.getConnection(PoolImpl.java:139)
	at io.vertx.sqlclient.impl.PoolBase.getConnection(PoolBase.java:56)

Copy link
Copy Markdown
Contributor

@dougqh dougqh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude flagged these two concerns...
On the first issue, I don't have an opinion. I'll leave it to your discretion.
On the second issue, my preference is also a more precise check; however, I don't feel strongly that it must be changed.

1 . Run.activate — predicate may be too narrow. (RunnableFutureInstrumentation.java:159-170)
if (delayNanos > 0 && state != null && state.getSpan() != null) {
return null;
}
return startTaskScope(state);
The fix skips activation only when state.getSpan() != null. But startTaskScope(state) consumes the captured continuation regardless of whether the span field is
populated — so a State that has a captured continuation but a null span field would still be consumed on the early enqueue call. Confirm that capture() always sets
both, or relax to state != null (simpler and safer). Worth verifying with AdviceUtils#capture / State to be sure the invariant holds.

  1. ByteBuddy matcher uses parameter-count for scheduleTimeout. (AsyncPropagatingDisableInstrumentation.java:178-185)
    Matching 4-, 6-, and 7-arg overloads by takesArguments(N) is fragile across Vert.x versions. If Vert.x adds an 8-arg overload in a future minor (or renames one),
    suppression silently stops applying. Two options:
  • Match on the handler-argument type directly (e.g., takesArgument(K, named("io.vertx.core.Handler"))) instead of arity, and use a single advice.
  • At minimum, leave a short comment pinning the assumption to a specific Vert.x version range so the next person knows what to re-check.

if (arg == null || !arg.getClass().getName().startsWith("io.vertx.")) {
return false;
}
return implementsVertxHandler(arg.getClass());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just checking arg instanceof io.vertx.core.Handler?
If so, why the reflection?
Is it because Handler isn't present in all netty versions?

Even so, I think isAssignableFrom would likely be better.

Copy link
Copy Markdown
Contributor

@dougqh dougqh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude mentions possible interactions with specific handlers...

  1. ReadTimeoutHandler and IdleStateHandler are not in the disable list.
    WriteTimeoutHandler.scheduleTimeout is suppressed, but the analogous ReadTimeoutHandler and IdleStateHandler are not — confirmed by searching the codebase. Both schedule
    long-running framework-owned timers via the same EventExecutor.schedule(...) path. With this PR's fix making Netty 4.1.44+ propagation actually work, any request that
    adds/touches these handlers mid-request can now capture the request span and hold the trace open until the timeout fires (likely seconds). This is the exact problem the
    Vert.x half of the PR addresses — same shape, different framework. Worth adding both to AsyncPropagatingDisableInstrumentation.

  2. HashedWheelTimer has zero coverage.
    No instrumentation in the repo (verified with code search). It's used by Redisson, Hystrix, some HTTP clients, and Netty itself in places. Tasks submitted to it don't
    propagate context at all — neither the request span when scheduling nor any captured continuation when firing. Out of scope for this PR but a real gap.

if (task instanceof ScheduledFuture
&& task.getClass().getName().endsWith(".netty.util.concurrent.ScheduledFutureTask")) {
long delayNanos = ((ScheduledFuture<?>) task).getDelay(TimeUnit.NANOSECONDS);
if (delayNanos > 0 && state != null && state.getSpan() != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is delayNanos always > 0 at this moment?

return false;
}

if (minor > 1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is missing otherwise will make it pass things like 4.0.99

if (minor != 1) {
return false;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

inst: netty Netty instrumentation type: bug Bug report and fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants