
Apache Ignite 3.1.0 – Node3 JVM Old Generation Saturation, Replication Future Accumulation, Deadlocks, and Raft Instability Under Concurrent RW_GET_ALL Load #13218


@mdkhusro

Description:

We are testing Apache Ignite 3.1.0 on a 3-node cluster under heavy JMeter load (~3 million records).

Environment:

Ignite 3.1.0
Java 17
3 nodes
Xms/Xmx = 16 GB
G1GC

Problem:
Only node3 experiences severe JVM heap pressure, while node1 and node2 remain relatively stable.

Observed JVM Heap Usage:

node1 → ~50-60%
node2 → ~60-70%
node3 → ~99.96% Old Gen

node3 also shows high CPU usage.
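From outside the process, `jstat -gcutil <pid>` reports old-space utilization as a percentage in its `O` column. As a minimal in-process sketch, the same Old Gen figure can be read through the standard `java.lang.management` API. `OldGenUsage` and its methods are hypothetical names, and matching the pool by the substring "Old Gen" assumes G1 (pool "G1 Old Gen") or Parallel GC.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

/** Sketch: reports Old Generation occupancy of the current JVM (assumes G1 or Parallel GC pool naming). */
final class OldGenUsage {

    /** Percentage of the pool maximum, or -1 when the pool has no defined maximum. */
    static double percentUsed(long used, long max) {
        if (max <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when the maximum is undefined
        }
        return 100.0 * used / max;
    }

    /** First memory pool whose name contains "Old Gen" (e.g. "G1 Old Gen", "PS Old Gen"). */
    static Optional<MemoryPoolMXBean> oldGenPool() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(p -> p.getName().contains("Old Gen"))
                .findFirst();
    }

    public static void main(String[] args) {
        Optional<MemoryPoolMXBean> pool = oldGenPool();
        if (pool.isEmpty()) {
            System.out.println("No 'Old Gen' pool found (collector is probably not G1 or Parallel)");
            return;
        }
        MemoryUsage usage = pool.get().getUsage();
        System.out.printf("%s: used=%d MiB, max=%d MiB, %.2f%%%n",
                pool.get().getName(),
                usage.getUsed() >> 20,
                usage.getMax() >> 20,
                percentUsed(usage.getUsed(), usage.getMax()));
    }
}
```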


GC Statistics on node3:

Full GC Count = 2189
Full GC Time = 16220 sec
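Taken together, these counters amount to roughly 4.5 hours of cumulative Full GC time (16220 s / 3600) at an average of about 7.4 s per Full GC (16220 s / 2189). A minimal sketch of that arithmetic, alongside the equivalent live counters from `GarbageCollectorMXBean`, follows. `GcPauseStats` is a hypothetical name; `getCollectionTime()` is cumulative milliseconds, and the sketch assumes that on Java 17 with G1, Full GCs are reported under the collector named "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: average pause per collection computed from cumulative GC counters. */
final class GcPauseStats {

    /** Average seconds per collection; 0 when the count is zero or undefined (-1). */
    static double averagePauseSeconds(long count, double totalSeconds) {
        return count <= 0 ? 0.0 : totalSeconds / count;
    }

    public static void main(String[] args) {
        // Figures reported for node3 in this issue.
        System.out.printf("node3 average Full GC pause: %.2f s%n", averagePauseSeconds(2189, 16220));

        // Live counters of the current JVM; getCollectionTime() is cumulative milliseconds.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long timeMs = gc.getCollectionTime();
            System.out.printf("%s: count=%d, time=%d ms, avg=%.3f s%n",
                    gc.getName(), count, timeMs, averagePauseSeconds(count, timeMs / 1000.0));
        }
    }
}
```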

Errors Observed:
1. IGN-TX-4 Failed to acquire a lock due to a possible deadlock
2. Replication is timed out
3. SYSTEM_WORKER_BLOCKED, for example:
   A critical thread is blocked for 11772 ms: node3-network-worker-5
4. JRaft PreVote timeout / unsuccessful election rounds
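For the blocked `node3-network-worker-5`, a `jstack <pid>` or `jcmd <pid> Thread.print` dump of node3 taken while the warning fires is the most direct evidence. As an in-process illustration only, the sketch below filters a `ThreadMXBean` snapshot by a thread-name fragment and prints each match's state and lock owner. `ThreadFinder` is a hypothetical name, and the "network-worker" fragment is an assumption based on the thread name in the warning.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: list live threads whose name contains a fragment, with state and lock information. */
final class ThreadFinder {

    static List<ThreadInfo> findThreads(String nameFragment) {
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) snapshots every live thread.
        ThreadInfo[] all = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        return Arrays.stream(all)
                .filter(t -> t.getThreadName().contains(nameFragment))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (ThreadInfo t : findThreads("network-worker")) {
            // getLockName()/getLockOwnerName() are null when the thread is not waiting on a lock.
            System.out.printf("%s state=%s lock=%s owner=%s%n",
                    t.getThreadName(), t.getThreadState(), t.getLockName(), t.getLockOwnerName());
        }
    }
}
```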

Heap Histogram Findings on node3:
Very large accumulation of:

CompletableFuture (~36 million)
PartitionReplicaListener$OperationId (~36 million)
TxCleanupReadyFutureList (~36 million)
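The three classes appear in matching counts (~36 million each), which suggests they may be created and retained together per operation. To compare these counts across successive snapshots (for example, `jcmd <pid> GC.class_histogram` output saved to a file at intervals), a minimal parsing sketch follows. `HistogramScan` is a hypothetical name, the row pattern assumes the JDK 17 histogram layout (`rank: instances bytes class-name [module]`), and classes are matched by simple name because the issue does not give their packages.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: parse class-histogram text and pick out classes of interest. */
final class HistogramScan {

    record Entry(String className, long instances, long bytes) {}

    // "   1:      <instances>     <bytes>  some.pkg.Class (module)" -> instances, bytes, class name
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    /** Parses data rows; header, separator and "Total" lines do not match and are skipped. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                out.add(new Entry(m.group(3), Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
            }
        }
        return out;
    }

    /** Entries whose class is one of the given simple names, largest instance count first. */
    static List<Entry> select(List<Entry> entries, List<String> simpleNames) {
        return entries.stream()
                .filter(e -> simpleNames.stream().anyMatch(
                        n -> e.className().equals(n) || e.className().endsWith("." + n)))
                .sorted(Comparator.comparingLong(Entry::instances).reversed())
                .toList();
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the path of a saved histogram dump as the first argument.
        List<Entry> hits = select(parse(Files.readAllLines(Path.of(args[0]))),
                List.of("CompletableFuture", "PartitionReplicaListener$OperationId",
                        "TxCleanupReadyFutureList"));
        hits.forEach(e -> System.out.printf("%,15d instances %,18d bytes  %s%n",
                e.instances(), e.bytes(), e.className()));
    }
}
```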

We also observed very high Raft activity for Zone 20 partitions (13k+ events in the logs).

Example workload:

requestType=RW_GET_ALL
primaryKeys.size=39
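Each such request is a read-write getAll of 39 keys. One way to test whether the future backlog on node3 tracks the number of concurrently outstanding requests is to cap in-flight requests on the client side and check whether the CompletableFuture / OperationId counts stop growing. A minimal sketch of such a cap using `java.util.concurrent.Semaphore` follows. `InFlightLimiter` is a hypothetical helper, not an Ignite API, and it only matters if the client issues requests asynchronously (JMeter thread groups are already bounded by their thread count).

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Sketch: caps the number of in-flight asynchronous requests issued by a client. */
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a permit is free, starts the request, and releases the permit on completion. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = Objects.requireNonNull(request.get(), "request returned null");
        } catch (RuntimeException e) {
            permits.release(); // the request never started, so give the permit back
            throw e;
        }
        // Released on normal or exceptional completion.
        return future.whenComplete((result, error) -> permits.release());
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Usage would look like `limiter.submit(() -> someAsyncGetAll(keys))`, where `someAsyncGetAll` is a hypothetical stand-in for whatever asynchronous read the client performs.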

Questions:

Is this behavior expected under a heavy concurrent RW_GET_ALL workload?
Could it indicate a replication/transaction-cleanup backlog or a future-accumulation issue?
Are there recommended tuning settings for this workload pattern?
Has this behavior improved in newer Ignite 3 versions?

I can provide the following if needed:

GC logs
heap histogram
thread dumps
JVM graphs
additional logs
