Fix: disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache #27969

Open
Manishnemade12 wants to merge 1 commit into apache:master from Manishnemade12:Leaks

@Manishnemade12

What is the purpose of the change

This pull request fixes severe disk space and file descriptor leaks in ChangelogStreamHandleReaderWithCache that occur during filesystem or network errors.

Previously, if a network or cluster error interrupted the DFS stream copy in downloadToCacheFile, the partially written temporary cache file was left on disk permanently, which could eventually fill the disk and crash TaskManagers. In addition, if openAndSeek threw an IOException while positioning the channel, the newly created FileInputStream was never closed, which leaked a file descriptor each time and could exhaust the OS file handle limit.

Brief change log

  • Explicitly call file.delete() on the temporary cache file when an IOException occurs while copying the DFS stream in downloadToCacheFile.
  • Wrap fin.getChannel().position(offset) in openAndSeek in a try-catch block that calls IOUtils.closeQuietly(fin) if positioning fails, so the file stream is not left open.
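
For reviewers, the two changes above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual diff: the class name CacheLeakSketch and both method signatures are assumptions made for the example, and a plain close() with addSuppressed stands in for IOUtils.closeQuietly so that the sketch has no Flink dependency.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

// Hypothetical stand-in for illustration; not the actual Flink class.
final class CacheLeakSketch {

    // Copies `src` into `file`. If the copy fails, the partial file is deleted
    // before the exception propagates, so no orphaned cache file stays on disk.
    static void downloadToCacheFile(InputStream src, File file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file.toPath())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // try-with-resources has already closed `out` at this point.
            file.delete(); // best effort; the result is intentionally ignored
            throw e;
        }
    }

    // Opens `file` and positions it at `offset`. If positioning fails, the
    // stream is closed before the exception propagates, so the descriptor
    // is released instead of leaking.
    static FileInputStream openAndSeek(File file, long offset) throws IOException {
        FileInputStream fin = new FileInputStream(file);
        try {
            fin.getChannel().position(offset);
            return fin;
        } catch (IOException e) {
            try {
                fin.close();
            } catch (IOException closeFailure) {
                // Keep the original failure as the primary exception.
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }
}
```

In both methods the original exception is rethrown after cleanup, so callers still see the failure; only the leftover file or open descriptor is removed.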

Verifying this change

This change is covered by existing tests, such as the standard Changelog State Backend recovery tests, which implicitly exercise the file system operations and DFS stream caching logic.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes (Improves resilience of Checkpointing recovery on TaskManagers)
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

… on seek errors in ChangelogStreamHandleReaderWithCache
@flinkbot
Collaborator

flinkbot commented Apr 19, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
