
Potential memory leak in Apache Ignite 3: Millions of CompletableFuture, UUID, and PartitionReplicaListener instances #13282

Description

@echoeslove

I am running an application on Apache Ignite 3 (the heap contains classes from the org.apache.ignite.internal packages) and am seeing high memory usage / OutOfMemoryError.

To investigate, I took a Java heap histogram using jcmd/jmap. It shows roughly 7.45 million instances each of several standard Java classes and Ignite internal classes, with counts that track one another closely.
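
For anyone who wants to sample the same data repeatedly from inside the JVM, here is a minimal sketch. It is not what I ran (I used jcmd/jmap); it assumes a HotSpot JVM that exposes the `com.sun.management:type=DiagnosticCommand` MBean, and the class name `HistogramSnapshot` is made up for illustration. As far as I know the default histogram counts only live objects, so a call can trigger a full GC.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper: captures a class histogram of the current JVM.
final class HistogramSnapshot {

    // Returns the histogram text, in the same tabular layout jcmd prints.
    static String capture() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // HotSpot-specific MBean; assumed present, not guaranteed on other JVMs.
            ObjectName dcmd = new ObjectName("com.sun.management:type=DiagnosticCommand");
            // Empty argument array: request the default histogram with no extra options.
            Object result = server.invoke(
                    dcmd,
                    "gcClassHistogram",
                    new Object[] {new String[0]},
                    new String[] {String[].class.getName()});
            return (String) result;
        } catch (Exception e) {
            throw new IllegalStateException("DiagnosticCommand MBean not available on this JVM", e);
        }
    }

    public static void main(String[] args) {
        // Print only the header and the first few rows.
        capture().lines().limit(12).forEach(System.out::println);
    }
}
```

Taking two snapshots a few minutes apart and comparing the instance counts should show whether the suspicious classes keep growing.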

Here is the top part of the heap histogram:

 num     #instances          #bytes  class name (module)
-------------------------------------------------------
   1:        7521853       727150840  [Ljava.lang.Object; (java.base@25.0.1)
   2:        7456001       596505648  [Ljava.util.HashMap$Node; (java.base@25.0.1)
   3:        7455079       357843792  java.util.HashMap (java.base@25.0.1)
   4:        9388272       300424704  java.util.concurrent.ConcurrentHashMap$Node (java.base@25.0.1)
   5:        9335795       298745440  java.util.UUID (java.base@25.0.1)
   6:        7450168       298006720  java.util.EnumMap (java.base@25.0.1)
   7:        7468385       238988320  java.util.HashMap$Node (java.base@25.0.1)
   8:        7452120       178850880  java.util.concurrent.CompletableFuture (java.base@25.0.1)
   9:        7450136       178803264  org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener$OperationId
  10:        7450136       119202176  org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener$TxCleanupReadyFutureList
  11:        1881021        90289008  org.apache.ignite.internal.tx.TxStateMeta
  14:        1881756        45162144  org.apache.ignite.internal.replicator.ZonePartitionId

Observations:

  • CompletableFuture, HashMap, and EnumMap each have roughly 7.45 million instances; UUID and ConcurrentHashMap$Node are higher, at about 9.3 to 9.4 million.
  • These counts closely match Ignite's internal classes PartitionReplicaListener$OperationId and PartitionReplicaListener$TxCleanupReadyFutureList, which have 7,450,136 instances each.
  • It looks as though a huge backlog of transaction cleanups or replica operations is being held in memory and never released.
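
The correlation above can be checked mechanically. The sketch below parses the histogram rows quoted in this issue, keeps the classes whose instance count is within a tolerance of the OperationId count, and sums their bytes. The class name `HistogramGroups`, the helper names, and the 1% tolerance are my own choices for illustration, not anything from Ignite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for grouping histogram rows by similar instance counts.
final class HistogramGroups {

    // One parsed histogram row.
    record Row(long instances, long bytes, String className) {}

    // Rows copied from the histogram in this issue.
    static final String SAMPLE = """
           1:        7521853       727150840  [Ljava.lang.Object; (java.base@25.0.1)
           2:        7456001       596505648  [Ljava.util.HashMap$Node; (java.base@25.0.1)
           3:        7455079       357843792  java.util.HashMap (java.base@25.0.1)
           4:        9388272       300424704  java.util.concurrent.ConcurrentHashMap$Node (java.base@25.0.1)
           5:        9335795       298745440  java.util.UUID (java.base@25.0.1)
           6:        7450168       298006720  java.util.EnumMap (java.base@25.0.1)
           7:        7468385       238988320  java.util.HashMap$Node (java.base@25.0.1)
           8:        7452120       178850880  java.util.concurrent.CompletableFuture (java.base@25.0.1)
           9:        7450136       178803264  org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener$OperationId
          10:        7450136       119202176  org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener$TxCleanupReadyFutureList
          11:        1881021        90289008  org.apache.ignite.internal.tx.TxStateMeta
          14:        1881756        45162144  org.apache.ignite.internal.replicator.ZonePartitionId
        """;

    // Matches "<rank>: <instances> <bytes> <class name> [(module)]".
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    // Parses every line that looks like a data row; header and separator lines are skipped.
    static List<Row> parse(String histogram) {
        List<Row> rows = new ArrayList<>();
        for (String line : histogram.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                rows.add(new Row(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), m.group(3)));
            }
        }
        return rows;
    }

    // Keeps rows whose instance count is within `tolerance` (a fraction) of `reference`.
    static List<Row> near(List<Row> rows, long reference, double tolerance) {
        List<Row> matches = new ArrayList<>();
        for (Row r : rows) {
            if (Math.abs(r.instances() - reference) <= reference * tolerance) {
                matches.add(r);
            }
        }
        return matches;
    }

    // Sums the shallow bytes reported for the given rows.
    static long totalBytes(List<Row> rows) {
        long sum = 0;
        for (Row r : rows) {
            sum += r.bytes();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Row> correlated = near(parse(SAMPLE), 7_450_136L, 0.01);
        correlated.forEach(r -> System.out.println(r.instances() + "  " + r.className()));
        System.out.println("classes: " + correlated.size() + ", bytes: " + totalBytes(correlated));
    }
}
```

With a 1% tolerance this selects eight of the twelve rows shown, totaling 2,695,351,640 bytes (about 2.7 GB) of shallow size; UUID, ConcurrentHashMap$Node, TxStateMeta, and ZonePartitionId fall outside the band.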

My Environment:

  • Java Version: 25.0.1 (as seen in the log)
  • Ignite Version: Ignite 3 (exact release not specified)
  • Deployment: Kubernetes, 3-node cluster

Questions:

  1. What could cause PartitionReplicaListener futures (TxCleanupReadyFutureList) or OperationId to accumulate like this without being garbage collected?
  2. Is this a known issue/bug in Apache Ignite 3 regarding transaction or replication context cleanup?
  3. Are there any specific configuration parameters (e.g., transaction timeouts, replication parameters) I should tune to prevent this build-up?

Any guidance on how to debug this further or configuration fixes would be highly appreciated.
