Alexeyk/runnable future fix netty#11350
Conversation
| @Advice.OnMethodEnter(suppress = Throwable.class) | ||
| public static boolean before(@Advice.AllArguments Object[] args) { | ||
| for (Object arg : args) { | ||
| if (isVertxInternalHandler(arg)) { |
There was a problem hiding this comment.
How often is this called? I'm worried that the reflective type checking could negatively impact throughput.
There was a problem hiding this comment.
If the usual isInstance / isAssignableFrom methods are not available then we should at least consider using a ClassValue to cache the result per-lookup-class.
Also this advice snippet is referring to a method in the surrounding advice class, which is a big no-no.
This would likely result in an error at runtime because the advice bytecode is lifted and patched into the library's class. It won't have access to the surrounding methods in this class because it is no longer part of the class (it has literally been cut out and pasted somewhere else.)
Even if somehow it did find the original class (say you mistakenly added the advice class as a helper) - that would result in the advice class getting pinned, and with it the entire instrumentation class-loader which means advice and instrumentation classes wouldn't unload.
The only code that advice snippets should call are:
- JDK classes
- classes from the library being instrumented
- classes that the agent makes available via the boot-class-path (
internal-apietc.) - and classes specifically listed as helpers to be injected into the library's class-loader
There was a problem hiding this comment.
You can see this in the failing instrumentation tests:
java.lang.IllegalAccessError: tried to access method datadog.trace.instrumentation.java.concurrent.AsyncPropagatingDisableInstrumentation$DisableVertxInternalTimerAdvice.isVertxInternalHandler(Ljava/lang/Object;)Z from class io.vertx.core.impl.VertxImpl
at io.vertx.core.impl.VertxImpl.scheduleTimeout(VertxImpl.java:695)
at io.vertx.core.impl.ContextInternal.setTimer(ContextInternal.java:476)
at io.vertx.sqlclient.impl.pool.SqlConnectionPool$1PoolRequest.onEnqueue(SqlConnectionPool.java:309)
at io.vertx.sqlclient.impl.pool.SqlConnectionPool$1PoolRequest.onConnect(SqlConnectionPool.java:326)
at io.vertx.core.net.impl.pool.SimpleConnectionPool$Acquire$3.run(SimpleConnectionPool.java:593)
at io.vertx.core.net.impl.pool.Task.runNextTasks(Task.java:43)
at io.vertx.core.net.impl.pool.CombinerExecutor.submit(CombinerExecutor.java:91)
at io.vertx.core.net.impl.pool.SimpleConnectionPool.execute(SimpleConnectionPool.java:244)
at io.vertx.core.net.impl.pool.SimpleConnectionPool.acquire(SimpleConnectionPool.java:639)
at io.vertx.sqlclient.impl.pool.SqlConnectionPool.acquire(SqlConnectionPool.java:331)
at io.vertx.sqlclient.impl.PoolImpl.acquire(PoolImpl.java:189)
at io.vertx.sqlclient.impl.PoolImpl.getConnection(PoolImpl.java:152)
at io.vertx.sqlclient.impl.PoolImpl.getConnection(PoolImpl.java:139)
at io.vertx.sqlclient.impl.PoolBase.getConnection(PoolBase.java:56)
dougqh
left a comment
There was a problem hiding this comment.
Claude flagged these two concerns...
On the first issue, I don't have an opinion. I'll leave it to your discretion.
On the second issue, my preference is also a more precise check; however, I don't feel strongly that it must be changed.
1 . Run.activate — predicate may be too narrow. (RunnableFutureInstrumentation.java:159-170)
if (delayNanos > 0 && state != null && state.getSpan() != null) {
return null;
}
return startTaskScope(state);
The fix skips activation only when state.getSpan() != null. But startTaskScope(state) consumes the captured continuation regardless of whether the span field is
populated — so a State that has a captured continuation but a null span field would still be consumed on the early enqueue call. Confirm that capture() always sets
both, or relax to state != null (simpler and safer). Worth verifying with AdviceUtils#capture / State to be sure the invariant holds.
- ByteBuddy matcher uses parameter-count for scheduleTimeout. (AsyncPropagatingDisableInstrumentation.java:178-185)
Matching 4-, 6-, and 7-arg overloads by takesArguments(N) is fragile across Vert.x versions. If Vert.x adds an 8-arg overload in a future minor (or renames one),
suppression silently stops applying. Two options:
- Match on the handler-argument type directly (e.g., takesArgument(K, named("io.vertx.core.Handler"))) instead of arity, and use a single advice.
- At minimum, leave a short comment pinning the assumption to a specific Vert.x version range so the next person knows what to re-check.
| if (arg == null || !arg.getClass().getName().startsWith("io.vertx.")) { | ||
| return false; | ||
| } | ||
| return implementsVertxHandler(arg.getClass()); |
There was a problem hiding this comment.
Is this just checking arg instanceof io.vertx.core.Handler?
If so, why the reflection?
Is it because Handler isn't present in all netty versions?
Even so, I think isAssignableFrom would likely be better.
dougqh
left a comment
There was a problem hiding this comment.
Claude mentions possible interactions with specific handlers...
ReadTimeoutHandler and IdleStateHandler are not in the disable list.
WriteTimeoutHandler.scheduleTimeout is suppressed, but the analogous ReadTimeoutHandler and IdleStateHandler are not — confirmed by searching the codebase. Both schedule
long-running framework-owned timers via the same EventExecutor.schedule(...) path. With this PR's fix making Netty 4.1.44+ propagation actually work, any request that
adds/touches these handlers mid-request can now capture the request span and hold the trace open until the timeout fires (likely seconds). This is the exact problem the
Vert.x half of the PR addresses — same shape, different framework. Worth adding both to AsyncPropagatingDisableInstrumentation.HashedWheelTimer has zero coverage.
No instrumentation in the repo (verified with code search). It's used by Redisson, Hystrix, some HTTP clients, and Netty itself in places. Tasks submitted to it don't
propagate context at all — neither the request span when scheduling nor any captured continuation when firing. Out of scope for this PR but a real gap.
| if (task instanceof ScheduledFuture | ||
| && task.getClass().getName().endsWith(".netty.util.concurrent.ScheduledFutureTask")) { | ||
| long delayNanos = ((ScheduledFuture<?>) task).getDelay(TimeUnit.NANOSECONDS); | ||
| if (delayNanos > 0 && state != null && state.getSpan() != null) { |
There was a problem hiding this comment.
question: is delayNanos always > 0 at this moment?
| return false; | ||
| } | ||
|
|
||
| if (minor > 1) { |
There was a problem hiding this comment.
I think that this is missing otherwise will make it pass things like 4.0.99
if (minor != 1) {
return false;
}
What Does This Do
Fixes context propagation for Netty
ScheduledFutureTaskon Netty4.1.44+by preserving captured continuations through the delayed-task self-enqueue path.Also prevents Vert.x internal timers from capturing active request spans when Vert.x schedules framework-owned timer handlers. User-created Vert.x timers still keep async context propagation.
Motivation
Netty
4.1.44+changed delayed scheduling for tasks submitted from outside the event loop. A delayedScheduledFutureTaskcan haverun()invoked once while the delay is still positive to enqueue itself onto the scheduled task queue, then invoked again when the deadline expires. The runnable-future instrumentation was consuming the captured continuation on the first invocation, so the actual delayed task ran without the scheduling span context.The first fix preserved the continuation by skipping that early Netty self-enqueue call.
Vert.x 5 is affected because it uses Netty underneath; In the failing tests Vert.x timers are wrapped in Netty
ScheduledFutureTask, so the generic Netty fix also applied to Vert.x internal timers.That exposed a second issue: some Vert.x framework timers can be scheduled while a request span is active. If those long-lived internal timers capture the request continuation, the request trace can stay open until the timer fires, causing test timeouts waiting for traces. The fix suppresses propagation only while Vert.x schedules Vert.x-owned timer handlers, leaving user
vertx.setTimer(...)propagation intact.Additional Notes
Test
legacyNettyTestwas added to test previous legacy Netty4.1.9.Test
latestDepTestImplementationwas configured to run on latest Netty4+.Netty
5is in its early stage onlyAlpha2available so far, skipped until first final release will be available.