Component(s)
receiver/otlp
What happened?
Describe the bug
Memory leak in the gRPC server: `bufio.Reader` buffers allocated in `newFramer()` are never cleaned up when connections close, despite proper goroutine cleanup.
Steps to reproduce
1. Collector config
Save as collector.yaml:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        read_buffer_size: 524288
        write_buffer_size: 32768
        keepalive:
          enforcement_policy:
            min_time: 5m
            permit_without_stream: false
          server_parameters:
            time: 2h
            timeout: 20s
            max_connection_idle: 3m
            max_connection_age: 5m
            max_connection_age_grace: 45s

processors:
  batch:
    send_batch_max_size: 16384

exporters:
  debug: {}

extensions:
  pprof:
    endpoint: 0.0.0.0:1777

service:
  extensions: [pprof]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

Run the collector:

```sh
otelcol-contrib --config collector.yaml
```
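For orientation, the `grpc` settings above are translated by the collector's configgrpc into `grpc.NewServer` options roughly like the following (an illustrative paraphrase, not the collector's exact code); the `read_buffer_size` value is what becomes the per-connection `bufio.Reader` size discussed later in this report:

```go
// Illustrative paraphrase of the ServerOptions the collector derives from
// the config above; not the exact configgrpc code.
package repro

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func newServerLikeCollector() *grpc.Server {
	return grpc.NewServer(
		grpc.ReadBufferSize(524288), // becomes the per-connection bufio.Reader size
		grpc.WriteBufferSize(32768),
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             5 * time.Minute,
			PermitWithoutStream: false,
		}),
		grpc.KeepaliveParams(keepalive.ServerParameters{
			Time:                  2 * time.Hour,
			Timeout:               20 * time.Second,
			MaxConnectionIdle:     3 * time.Minute,
			MaxConnectionAge:      5 * time.Minute,
			MaxConnectionAgeGrace: 45 * time.Second,
		}),
	)
}
```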
2. Connection-churn client
Save as main.go:
```go
package main

import (
	"context"
	"log"
	"sync"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	const (
		endpoint = "127.0.0.1:4317"
		workers  = 32
		runtime_ = 45 * time.Minute
	)

	deadline := time.Now().Add(runtime_)

	var wg sync.WaitGroup
	for worker := 0; worker < workers; worker++ {
		wg.Add(1)
		go func(workerID int) {
			defer wg.Done()
			for time.Now().Before(deadline) {
				ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
				exporter, err := otlptracegrpc.New(ctx,
					otlptracegrpc.WithEndpoint(endpoint),
					otlptracegrpc.WithTLSCredentials(insecure.NewCredentials()),
					otlptracegrpc.WithDialOption(grpc.WithBlock()),
				)
				if err != nil {
					cancel()
					log.Printf("worker %d: exporter create failed: %v", workerID, err)
					time.Sleep(250 * time.Millisecond)
					continue
				}

				tp := sdktrace.NewTracerProvider(
					sdktrace.WithBatcher(exporter),
					sdktrace.WithResource(resource.NewWithAttributes(
						semconv.SchemaURL,
						semconv.ServiceName("otlp-grpc-repro"),
					)),
				)
				otel.SetTracerProvider(tp)

				tracer := otel.Tracer("repro")
				for i := 0; i < 10; i++ {
					_, span := tracer.Start(ctx, "repro-span")
					span.End()
				}

				if err := tp.ForceFlush(ctx); err != nil {
					log.Printf("worker %d: force flush failed: %v", workerID, err)
				}
				if err := tp.Shutdown(ctx); err != nil {
					log.Printf("worker %d: shutdown failed: %v", workerID, err)
				}
				cancel()
				time.Sleep(200 * time.Millisecond)
			}
		}(worker)
	}
	wg.Wait()
}
```
3. Inspect retained heap after GC
After the client has run long enough to create meaningful churn:
```sh
curl -s 'http://127.0.0.1:1777/debug/pprof/heap?gc=1' > heap.pb.gz
go tool pprof -top -sample_index=inuse_space heap.pb.gz
```
What we expect when the issue reproduces:

- `bufio.NewReaderSize` remains visible after forced GC
- the cumulative stack includes:
  - `google.golang.org/grpc/internal/transport.newFramer`
  - `google.golang.org/grpc/internal/transport.NewServerTransport`
  - `google.golang.org/grpc.(*Server).newHTTP2Transport`
Control experiment that did not reproduce
For comparison, a grpc-go-only harness with the same ReadBufferSize, keepalive settings, and server-driven connection churn stayed flat after quiescence plus forced GC.
That is the main reason I think this issue should start here instead of in grpc-go.
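For reference, the grpc-go-only harness meant here looked roughly like the sketch below (an assumed reconstruction, since the original harness is not included in this report): a bare gRPC server with the same read buffer and keepalive settings, a client loop that churns connections, and pprof exposed so the heap can be inspected the same way.

```go
// Assumed reconstruction of the grpc-go-only control harness, not the
// original code: a bare gRPC server with the collector's read buffer and
// keepalive settings, a connection-churn loop, and pprof for heap inspection.
package main

import (
	"log"
	"net"
	"net/http"
	_ "net/http/pprof" // serves /debug/pprof on the default mux below
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", "127.0.0.1:50051")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}

	srv := grpc.NewServer(
		grpc.ReadBufferSize(524288), // same per-connection read buffer as the collector config
		grpc.WriteBufferSize(32768),
		grpc.KeepaliveParams(keepalive.ServerParameters{
			MaxConnectionIdle:     3 * time.Minute,
			MaxConnectionAge:      5 * time.Minute,
			MaxConnectionAgeGrace: 45 * time.Second,
			Time:                  2 * time.Hour,
			Timeout:               20 * time.Second,
		}),
	)
	go func() {
		if err := srv.Serve(lis); err != nil {
			log.Fatalf("serve: %v", err)
		}
	}()
	go func() {
		// pprof on a separate port so the heap can be inspected the same way
		log.Println(http.ListenAndServe("127.0.0.1:1778", nil))
	}()

	// Churn connections: dial, hold briefly, close, repeat.
	deadline := time.Now().Add(45 * time.Minute)
	for time.Now().Before(deadline) {
		conn, err := grpc.NewClient("127.0.0.1:50051",
			grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err != nil {
			log.Printf("dial: %v", err)
			continue
		}
		conn.Connect() // leave idle mode so the server accepts and allocates a framer
		time.Sleep(200 * time.Millisecond)
		conn.Close()
	}
}
```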
Missing Cleanup in http2Server.Close()
File: internal/transport/http2_server.go:1269-1292
```go
func (t *http2Server) Close(err error) {
	t.mu.Lock()
	if t.state == closing {
		t.mu.Unlock()
		return
	}
	t.state = closing
	streams := t.activeStreams
	t.activeStreams = nil
	t.mu.Unlock()
	t.controlBuf.finish()
	close(t.done)
	if err := t.conn.Close(); err != nil && t.logger.V(logLevel) {
		t.logger.Infof("Error closing underlying net.Conn during Close: %v", err)
	}
	channelz.RemoveEntry(t.channelz.ID)
	for _, s := range streams {
		s.cancel()
	}
	// ❌ MISSING: No cleanup of t.framer!
	// The framer holds a bufio.Reader that is never released
}
```
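Purely to illustrate what a cleanup could even look like, here is a hypothetical sketch against stand-in types (`http2ServerStub` and `framerStub` are made up for this example; grpc-go has no such method, and this is not being proposed as the correct fix):

```go
// Hypothetical sketch only, written against stand-in types; grpc-go has no
// releaseFramer and this report does not claim this is the right fix.
package repro

import "sync"

type framerStub struct {
	reader interface{} // stands in for the bufio.Reader held by the real framer
}

type http2ServerStub struct {
	mu     sync.Mutex
	framer *framerStub
}

// releaseFramer drops the transport's framer reference once nothing else
// needs it, so the wrapped read buffer can be garbage-collected even if the
// transport value itself stays reachable. Where (or whether) grpc-go could
// safely do this relative to the reader goroutine is the open question.
func (t *http2ServerStub) releaseFramer() {
	t.mu.Lock()
	t.framer = nil
	t.mu.Unlock()
}
```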
The Framer Lifecycle

1. Created in `NewServerTransport` (line 172):

```go
framer := newFramer(conn, writeBufSize, readBufSize, ...)
```

2. `newFramer` allocates the `bufio.Reader` (http_util.go:419):

```go
func newFramer(conn io.ReadWriter, writeBufferSize, readBufferSize int, ...) *framer {
	var r io.Reader = conn
	if readBufferSize > 0 {
		r = bufio.NewReaderSize(r, readBufferSize) // ← Allocation
	}
	f := &framer{
		reader: r, // ← bufio.Reader stored
		// ...
	}
	return f
}
```

3. Stored in `http2Server` (line 83):

```go
type http2Server struct {
	framer *framer // ← Never cleaned up!
	// ...
}
```

4. NOT cleaned up when `Close()` is called.
No Cleanup Method Exists

The `framer` struct has no cleanup/close method:

```go
type framer struct {
	writer    *bufWriter
	fr        *http2.Framer
	headerBuf []byte
	reader    io.Reader // ← This is the bufio.Reader!
	dataFrame parsedDataFrame
	pool      mem.BufferPool
	errDetail error
}

// ❌ No cleanup() method exists
```
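To make the suspected mechanism concrete, below is a small self-contained sketch (an assumed simplification, not grpc-go code) of the pattern described above: a large bufio.Reader allocated in a constructor, stored on a long-lived struct, and never released on Close, so its buffer stays in the inuse heap profile for as long as anything keeps the struct reachable.

```go
// Assumed simplification of the pattern described above, not grpc-go code.
// A 512 KiB bufio.Reader is allocated per "connection", stored on a
// long-lived struct, and never released on Close, so it stays live for as
// long as anything still references the struct.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"runtime"
)

type fakeTransport struct {
	reader io.Reader // analogous to framer.reader
}

func newFakeTransport(conn io.Reader, readBufferSize int) *fakeTransport {
	// Mirrors newFramer: wrap the conn in a bufio.Reader of readBufferSize.
	return &fakeTransport{reader: bufio.NewReaderSize(conn, readBufferSize)}
}

// Close mirrors http2Server.Close: the connection would be torn down here,
// but t.reader is left in place. Uncommenting the line below is what a
// "cleanup" would amount to.
func (t *fakeTransport) Close() {
	// t.reader = nil
}

func heapInUseMiB() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapInuse >> 20
}

func main() {
	var retained []*fakeTransport // stands in for whatever keeps transports reachable
	for i := 0; i < 100; i++ {
		t := newFakeTransport(bytes.NewReader(nil), 512<<10)
		t.Close()
		retained = append(retained, t)
	}
	fmt.Printf("closed transports retained: %d, heap in use after GC: %d MiB\n",
		len(retained), heapInUseMiB())
}
```

Run as-is, this reports roughly 50 MiB in use after GC for 100 retained "transports"; with the line in Close uncommented, it drops back to the baseline.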
I will note that the lack of a cleanup method here doesn't necessarily guarantee a memory leak. This report was created with the help of both Codex and Claude, and neither was able to conclusively identify the real cause or a fix. However, since the issue is easily reproducible on the latest version of the collector, I figured I'd better submit an upstream issue for more guidance.
Collector version
v0.149.0
Environment information
Environment
- collector v0.148.0
- Go version: 1.25.1
- gRPC version: v1.79.0
OpenTelemetry Collector configuration
Log output
Additional context
No response