Component(s)
No response
Is your feature request related to a problem? Please describe.
The batch processor was never formally deprecated, yet it still has well-known defects (mainly error propagation and concurrency). Meanwhile, the exporterhelper still does not entirely replace it, because the metadata_keys feature was never finished.
We have debated whether to modernize it in #13582.
We have tried to remove references to it in #13766.
There are two obstacles to the proposed solution in #13583:
- Metrics are named incorrectly: because exporterhelper is used to implement the "modernized" batch processor, its internal metrics show through.
- Many configurations will gain "double batching" by default. We cannot let the OTel helm chart by default begin double batching.
Describe the solution you'd like
To address stated problem (2) in #13583, consider renaming the component: inlinebatchprocessor would be appropriate, to keep the existing component alive. As we know, there are a few processors that specifically advertise they are better when used with a batch processor before them, and we could say "use the inline batch processor".
However, we can also expand the scope. In open-telemetry/opentelemetry-collector-contrib#37787 I proposed a "pipeline processor" component, which is effectively all of the exporterhelper features bundled into an accessory for introducing a queue (maybe persistent), a batcher, a retry, and a timeout anywhere you need one. This component provides plumbing that includes the batch processor feature set, but it would ideally use configuration closer to exporterhelper than to the legacy batchprocessor.
Considering both concerns, I personally prefer the pipelineprocessor direction: it would be like #13583 except:
- Configuration uses struct { Timeout; Retry; QueueBatch }, i.e., exactly the exporterhelper feature set, including the storage extension to enable mid-pipeline persistence
- Component is named pipelineprocessor
Then, batchprocessor is deprecated. As in #13766, we remove all references to the batch processor except the few places where it continues to make sense, e.g., groupbyattrsprocessor, where we'll say "use the pipeline processor with batch settings".
Describe alternatives you've considered
A similar proposal to "pipeline processor" was made, namely "queue processor":
open-telemetry/opentelemetry-collector-contrib#35803
The reason someone might want a queue has to do with failoverconnector; see:
open-telemetry/opentelemetry-collector-contrib#33007
The failoverconnector eventually added QueueBatch configuration. This has come full circle!
Untested! This means you can use a failover connector as an in-line batching process, and that the "modern" batch processor is already available by configuring a single failover connection. That is, if you're willing to (a) migrate from batchprocessor configuration to exporterhelper configuration and (b) insert a connector for this purpose.
We might argue for "pipeline connector", then, which is exporterhelper's feature set in a connector, like "pipeline processor" is exporterhelper's feature set in a processor. Both appear to be reasonable ideas.
Additional context
I believe the Collector has power because users mostly do not have to think about connectors. The model of multiple receivers fanning in (parallel), multiple processors (serial), and multiple exporters fanning out (parallel) is a concise mental model, and that is part of its success. I would not want to recommend using a connector where a processor will do.