HDDS-15114. Replace misconfigured ThreadPoolExecutor with Executors factory methods #10133
base: master
Changes from 2 commits
```diff
@@ -32,9 +32,10 @@
 import java.util.Set;
 import java.util.concurrent.CompletableFuture;
 import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
-import java.util.concurrent.LinkedBlockingQueue;
-import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.ThreadFactory;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicLong;
```
```diff
@@ -61,11 +62,9 @@
 public class DataNodeMetricsService {

   private static final Logger LOG = LoggerFactory.getLogger(DataNodeMetricsService.class);
-  private static final int MAX_POOL_SIZE = 500;
-  private static final int KEEP_ALIVE_TIME = 5;
   private static final int POLL_INTERVAL_MS = 200;

-  private final ThreadPoolExecutor executorService;
+  private final ExecutorService executorService;
   private final ReconNodeManager reconNodeManager;
   private final boolean httpsEnabled;
   private final int minimumApiDelayMs;
```
```diff
@@ -97,13 +96,10 @@ public DataNodeMetricsService(
     this.metricsServiceProviderFactory = metricsServiceProviderFactory;
     this.lastCollectionEndTime.set(-minimumApiDelayMs);
     int corePoolSize = Runtime.getRuntime().availableProcessors() * 2;
-    this.executorService = new ThreadPoolExecutor(
-        corePoolSize, MAX_POOL_SIZE,
-        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
-        new LinkedBlockingQueue<>(),
-        new ThreadFactoryBuilder()
-            .setNameFormat("DataNodeMetricsCollector-%d")
-            .build());
+    ThreadFactory threadFactory = new ThreadFactoryBuilder()
+        .setNameFormat("DataNodeMetricsCollector-%d")
+        .build();
+    this.executorService = Executors.newFixedThreadPool(corePoolSize, threadFactory);
   }

   /**
```
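For context on the misconfiguration named in the title: a ThreadPoolExecutor backed by an unbounded work queue never grows beyond corePoolSize, because extra threads are created only when the queue rejects a task, and an unbounded LinkedBlockingQueue never does. A standalone sketch demonstrating this (class name and pool sizes are illustrative, not from this PR):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
  public static void main(String[] args) throws InterruptedException {
    // Same shape as the old pool: small core, large max, unbounded queue.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 500, 5, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    // Submit far more long-running tasks than corePoolSize.
    for (int i = 0; i < 100; i++) {
      pool.execute(() -> {
        try {
          Thread.sleep(1_000);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    Thread.sleep(200);
    // Prints "pool size = 2": threads beyond the core are created only
    // when the queue rejects an offer, so maxPoolSize = 500 is never reached.
    System.out.println("pool size = " + pool.getPoolSize());
    pool.shutdownNow();
  }
}
```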
|
Contributor

The original pool was intended, even if misconfigured, to scale up to 500 threads under load (e.g., Recon polling thousands of DataNodes simultaneously). The fix correctly reflects today's actual behavior, but it also rules out efficient scaling in the future. If the intent was truly to allow bursting to a larger pool, can we consider using a bounded queue rather than simply removing the cap? At a minimum, add a comment explaining the corePoolSize choice and the impact of removing MAX_POOL_SIZE = 500. But wait, this could make things slower in a large cluster with hundreds of DataNodes, where Recon issues concurrent DataNode queries. The original MAX_POOL_SIZE = 500 captured the right intent; it was simply broken by the unbounded LinkedBlockingQueue. A better fix would be to make the pool size configurable with a sensible default:
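A sketch of what such a configurable, burstable pool could look like. The config key ozone.recon.datanode.metrics.max.pool.size, the createCollectorPool helper, and the queue capacity are illustrative assumptions, not anything defined in this PR:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.ThreadFactoryBuilder;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

final class CollectorPools {
  // Hypothetical config key and default, for illustration only.
  private static final String MAX_POOL_SIZE_KEY =
      "ozone.recon.datanode.metrics.max.pool.size";

  static ThreadPoolExecutor createCollectorPool(OzoneConfiguration conf) {
    int corePoolSize = Runtime.getRuntime().availableProcessors() * 2;
    int maxPoolSize = conf.getInt(MAX_POOL_SIZE_KEY, 500);
    return new ThreadPoolExecutor(
        corePoolSize, maxPoolSize,
        5, TimeUnit.SECONDS,
        // A *bounded* queue is what lets the pool actually grow: extra
        // threads beyond corePoolSize are created only once an offer to
        // the queue fails.
        new ArrayBlockingQueue<>(corePoolSize * 4),
        new ThreadFactoryBuilder()
            .setNameFormat("DataNodeMetricsCollector-%d")
            .build(),
        // Apply back-pressure to the submitter instead of throwing
        // RejectedExecutionException once the pool is saturated.
        new ThreadPoolExecutor.CallerRunsPolicy());
  }
}
```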
Just a thought: the pool size is Math.max(64, Runtime.getRuntime().availableProcessors()) at the call site in OzoneManagerServiceProviderImpl. On high-CPU machines this could be large. With the old code it barely mattered; now it will actually create that many threads. On a 256-core host that's 256 threads all blocking on I/O at the same time. Consider documenting this, or, IMO, we should be capping it.

I have switched from max to min. We should now be capped at 64 threads.
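For reference, a minimal before/after sketch of the cap described above; the surrounding call-site code in OzoneManagerServiceProviderImpl is not shown in this diff, so the variable names are illustrative:

```java
// Before: at least 64 threads, scaling up with core count (256 on a 256-core host).
int before = Math.max(64, Runtime.getRuntime().availableProcessors());
// After: never more than 64 threads, and fewer on small hosts.
int after = Math.min(64, Runtime.getRuntime().availableProcessors());
```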