
Resolution: Batching processor and/or connector still needed #15047

@jmacd

Component(s)

No response

Is your feature request related to a problem? Please describe.

The batch processor was never formally deprecated, yet it still has well-known defects (mainly in error propagation and concurrency). At the same time, the exporterhelper still does not entirely replace it, because the metadata_keys feature was never finished.

We have debated whether to modernize it in #13582.

We have tried to remove references to it in #13766.

There are two obstacles to the proposed solution in #13583:

  1. Metrics are named incorrectly: because exporterhelper is used to implement the "modernized" batch processor, its internal metrics show through.
  2. Many configurations will gain "double batching" by default. We cannot let the OTel Helm chart begin double batching by default.

Describe the solution you'd like

To address stated problem (2) in #13583, consider renaming the component: inlinebatchprocessor would be appropriate, to keep the existing component alive. As we know, there are a few processors that specifically advertise they are better when used with a batch processor before them, and we could say "use the inline batch processor".

However, we can also expand the scope. In open-telemetry/opentelemetry-collector-contrib#37787 I proposed a "pipeline processor" component, which is effectively all of the exporterhelper features bundled into an accessory for introducing a queue (maybe persistent), a batcher, a retry, and a timeout anywhere you need one. This component's plumbing would include the batch processor feature set, but ideally it would use configuration closer to exporterhelper than to the legacy batchprocessor.

Considering both concerns, I personally prefer the pipelineprocessor direction: it would be like #13583 except:

  • Configuration uses struct { Timeout; Retry; QueueBatch }, i.e., exactly the exporterhelper feature set, including the storage extension to enable mid-pipeline persistence
  • Component is named pipelineprocessor
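
To make the struct { Timeout; Retry; QueueBatch } shape concrete, here is a minimal sketch of what such a configuration might look like. Every type name, field name, and default below is a hypothetical stand-in defined locally for illustration; none of it is taken from the actual exporterhelper packages, whose types differ and carry more fields.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins for the three exporterhelper-style settings groups.
// These are NOT the real Collector types.
type TimeoutConfig struct {
	Timeout time.Duration
}

type RetryConfig struct {
	Enabled         bool
	InitialInterval time.Duration
	MaxInterval     time.Duration
}

type QueueBatchConfig struct {
	Enabled      bool
	QueueSize    int
	StorageID    string // assumed: empty means an in-memory queue
	FlushTimeout time.Duration
	MinBatchSize int
}

// Config mirrors the struct { Timeout; Retry; QueueBatch } shape proposed above.
type Config struct {
	Timeout    TimeoutConfig
	Retry      RetryConfig
	QueueBatch QueueBatchConfig
}

// Validate performs a few illustrative sanity checks.
func (c Config) Validate() error {
	if c.Timeout.Timeout < 0 {
		return errors.New("timeout must be non-negative")
	}
	if c.Retry.Enabled && c.Retry.InitialInterval > c.Retry.MaxInterval {
		return errors.New("retry initial interval exceeds max interval")
	}
	if c.QueueBatch.Enabled && c.QueueBatch.QueueSize <= 0 {
		return errors.New("queue size must be positive when the queue is enabled")
	}
	return nil
}

// batchOnlyConfig approximates "use the pipeline processor with batch
// settings": queueing and batching on, retry off, no persistent storage.
func batchOnlyConfig() Config {
	return Config{
		Timeout: TimeoutConfig{Timeout: 5 * time.Second},
		Retry:   RetryConfig{Enabled: false},
		QueueBatch: QueueBatchConfig{
			Enabled:      true,
			QueueSize:    1000,
			FlushTimeout: 200 * time.Millisecond,
			MinBatchSize: 8192,
		},
	}
}

func main() {
	cfg := batchOnlyConfig()
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

In this sketch, setting StorageID to the name of a storage extension is how mid-pipeline persistence would be switched on; how the real component would wire that up is left open here.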

Then, batchprocessor is deprecated. As in #13766, we remove all references to the batch processor except the few places where it continues to make sense, e.g., groupbyattrsprocessor, where we'll say "use the pipeline processor with batch settings".

Describe alternatives you've considered

A similar proposal to "pipeline processor" was made, namely "queue processor":
open-telemetry/opentelemetry-collector-contrib#35803

The reason someone might want a queue has to do with failoverconnector; see:
open-telemetry/opentelemetry-collector-contrib#33007

The failoverconnector eventually added QueueBatch configuration. This has come full circle!

Untested! This means you can use a failover connector as an in-line batching processor, and that the "modern" batch processor is already available by configuring a single failover connection. That is, if you're willing to (a) migrate from batchprocessor configuration to exporterhelper configuration, and (b) insert a connector for this purpose.

We might argue for "pipeline connector", then, which is exporterhelper's feature set in a connector, like "pipeline processor" is exporterhelper's feature set in a processor. Both appear to be reasonable ideas.

Additional context

I believe the Collector is powerful largely because users mostly do not have to think about connectors. The model of multiple receivers fanning in (parallel), multiple processors (serial), and multiple exporters fanning out (parallel) is part of its success: a concise mental model. I would not want to recommend using a connector where a processor will do.
