-
Notifications
You must be signed in to change notification settings - Fork 4
Optimize OME-TIFF reader performance #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
20b46b2
7474679
72a9411
62de372
32f2cda
ef25e6b
a172d63
36002bb
2746509
019ce12
119ad4b
fc581da
774b929
b064809
7b67c8f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -6,6 +6,7 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| import gc | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| import json | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| import threading | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from concurrent.futures import ThreadPoolExecutor | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from multiprocessing import cpu_count | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -217,6 +218,94 @@ def __init__( | |||||||||||||||||||||||||||||||||||||||||||||||||||
| self.fused_ts = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.center = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Thread-local storage for TiffFile handles (thread-safe concurrent access) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._thread_local = threading.local() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._handles_lock = threading.Lock() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._all_handles: List = [] # Track all handles for cleanup | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| def close(self) -> None: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Close any open file handles to release resources. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| This should be called when finished using a TileFusion instance, | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| or use it as a context manager (``with TileFusion(...) as tf:``) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| for automatic cleanup. Important for OME-TIFF inputs where file | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| handles are kept open for performance. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Close all thread-local handles | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| with self._handles_lock: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| for handle in self._all_handles: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| handle.close() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| except Exception: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| pass # Best-effort cleanup | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._all_handles.clear() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Clear this thread's handle reference | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| if hasattr(self._thread_local, "tiff_handle"): | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._thread_local.tiff_handle = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Clear this thread's handle reference | |
| if hasattr(self._thread_local, "tiff_handle"): | |
| self._thread_local.tiff_handle = None | |
| # Reset thread-local handle container so all threads drop stale references | |
| # A new threading.local() instance has no per-thread attributes, ensuring | |
| # that subsequent access will not see closed handles. | |
| self._thread_local = threading.local() |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The broad exception handler silently swallows all exceptions during cleanup. While this is acceptable for best-effort cleanup, it would be better to catch specific exceptions (e.g., AttributeError, RuntimeError) or at least log the suppressed exceptions for debugging purposes. Consider adding a debug log statement within the except block to aid troubleshooting.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context manager's exit method does not suppress exceptions (return value is None). This means that if an exception occurs within the context and close() also raises an exception, the original exception will be masked by the close() exception. Consider wrapping the close() call in a try-except block to ensure the original exception is preserved.
| def __exit__(self, exc_type, exc_val, exc_tb) -> None: | |
| """Exit the runtime context and close file handles.""" | |
| self.close() | |
| def __exit__(self, exc_type, exc_val, exc_tb) -> bool: | |
| """ | |
| Exit the runtime context and close file handles. | |
| If an exception occurred inside the context, any exception raised by | |
| close() is suppressed so that the original exception is not masked. | |
| """ | |
| if exc_type is not None: | |
| # An exception is already in progress; do best-effort cleanup | |
| # without masking the original error. | |
| try: | |
| self.close() | |
| except Exception: | |
| # Suppress cleanup errors to preserve the original exception. | |
| return False | |
| # Do not suppress the original exception. | |
| return False | |
| # No exception from the with-block; let close() errors propagate. | |
| self.close() | |
| # Always return False to avoid suppressing exceptions. | |
| return False |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The del method is generally discouraged in Python for resource cleanup because its execution timing is unpredictable and it may not be called at all in some circumstances (e.g., circular references, interpreter shutdown). This can lead to unreliable resource cleanup. Consider relying solely on the context manager protocol and explicit close() calls. If del must be kept for defensive cleanup, document this limitation clearly in the close() docstring.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __del__ destructor catches AttributeError for partially initialized objects, but the close() method may raise other exceptions during cleanup (e.g., if threading.Lock() operations fail). Consider catching a broader exception type or being more specific about what can fail during partial initialization. The comment mentions "Object may be partially initialized" but doesn't explain which attributes might be missing.
| self.close() | |
| except AttributeError: | |
| pass # Object may be partially initialized | |
| # During partial initialization or interpreter shutdown, attributes | |
| # such as ``_handles_lock``, ``_all_handles`` or ``_thread_local`` | |
| # may be missing or in an invalid state, and lock operations inside | |
| # ``close()`` may fail. Never let such errors escape from __del__. | |
| self.close() | |
| except Exception: | |
| # Best-effort cleanup; ignore all errors in the destructor. | |
| pass |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _get_thread_local_handle() method lacks a return type annotation. According to the docstring, it returns tifffile.TiffFile or None, which should be annotated as -> Optional[tifffile.TiffFile]. This would improve type checking and code clarity. Note that you'll need to import tifffile at the module level or use a string literal for forward reference.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importing tifffile inside the _get_thread_local_handle() method is unnecessary since it's already used elsewhere in the module (e.g., in the I/O functions). The import statement is executed every time a new thread-local handle is created, adding minor overhead. Consider moving this import to the module level along with the other imports, or removing it if tifffile.TiffFile is already accessible through the imported functions.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tiff_handle stored in metadata (line 87 in ome_tiff.py) is never actually used. The _get_thread_local_handle method always creates new TiffFile handles for each thread (line 300), including the main thread, rather than reusing the metadata handle. This means the original metadata handle just sits unused, consuming resources.
Consider either removing the tiff_handle from metadata and always creating thread-local handles, or reuse the metadata handle for the first/main thread to avoid the redundant handle.
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -5,7 +5,7 @@ | |||||
| """ | ||||||
|
|
||||||
| from pathlib import Path | ||||||
| from typing import Dict, List, Tuple, Any | ||||||
| from typing import Any, Dict, Optional | ||||||
| import xml.etree.ElementTree as ET | ||||||
|
|
||||||
| import numpy as np | ||||||
|
|
@@ -31,8 +31,17 @@ def load_ome_tiff_metadata(tiff_path: Path) -> Dict[str, Any]: | |||||
| - channels: int | ||||||
| - pixel_size: (py, px) | ||||||
| - tile_positions: list of (y, x) tuples | ||||||
| - tiff_handle: tifffile.TiffFile | ||||||
| Open TIFF file handle kept for fast repeated access. The caller is | ||||||
| responsible for closing this handle by calling ``tiff_handle.close()`` | ||||||
| when it is no longer needed to avoid resource leaks. The handle | ||||||
| remains valid until it is explicitly closed, or until a higher-level | ||||||
| context manager (if used) closes it on your behalf. | ||||||
| """ | ||||||
| with tifffile.TiffFile(tiff_path) as tif: | ||||||
| # Keep file handle open for fast repeated access | ||||||
| tif = tifffile.TiffFile(tiff_path) | ||||||
|
Comment on lines
+49
to
+50
|
||||||
|
|
||||||
| try: | ||||||
| if not tif.ome_metadata: | ||||||
| raise ValueError("TIFF file does not contain OME metadata") | ||||||
|
|
||||||
|
|
@@ -66,19 +75,25 @@ def load_ome_tiff_metadata(tiff_path: Path) -> Dict[str, Any]: | |||||
| else: | ||||||
| tile_positions.append((0.0, 0.0)) | ||||||
|
|
||||||
| return { | ||||||
| "n_tiles": n_tiles, | ||||||
| "n_series": n_series, | ||||||
| "shape": (Y, X), | ||||||
| "channels": channels, | ||||||
| "time_dim": time_dim, | ||||||
| "position_dim": position_dim, | ||||||
| "pixel_size": pixel_size, | ||||||
| "tile_positions": tile_positions, | ||||||
| } | ||||||
|
|
||||||
|
|
||||||
| def read_ome_tiff_tile(tiff_path: Path, tile_idx: int) -> np.ndarray: | ||||||
| return { | ||||||
| "n_tiles": n_tiles, | ||||||
| "n_series": n_series, | ||||||
| "shape": (Y, X), | ||||||
| "channels": channels, | ||||||
| "time_dim": time_dim, | ||||||
| "position_dim": position_dim, | ||||||
| "pixel_size": pixel_size, | ||||||
| "tile_positions": tile_positions, | ||||||
| "tiff_handle": tif, | ||||||
| } | ||||||
|
Comment on lines
+86
to
+96
|
||||||
| except Exception: | ||||||
|
||||||
| except Exception: | |
| except (ValueError, ET.ParseError, AttributeError, KeyError): |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handler catches all exceptions too broadly. If an exception occurs after the tiff_handle has been stored in the return dictionary (line 87), the handle will be closed but the partially constructed dictionary with the now-closed handle will not be returned. However, if an exception occurs before line 78 (the return statement starts), the file is properly closed. Consider whether this broad exception handling is needed or if more specific exceptions should be caught.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handling in load_ome_tiff_metadata properly closes the file handle on errors, but this behavior is not tested. Consider adding a test case that triggers an exception (e.g., invalid OME metadata) and verifies the file handle is properly closed, preventing resource leaks.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature uses 'tiff_handle: tifffile.TiffFile = None' which requires the tifffile type to be in scope. Consider using 'Optional[tifffile.TiffFile]' from typing for better clarity and type checking, or document the type annotation convention being used. This makes the optional nature of the parameter more explicit in the function signature.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring states "Warning: A single TiffFile handle is NOT thread-safe for concurrent reads." This is accurate and helpful. However, it would be clearer to also mention that this function creates thread-local handles specifically to address this limitation, so callers don't need to worry about it when using TileFusion.
Consider adding: "TileFusion addresses this by maintaining thread-local handles automatically, so concurrent reads through TileFusion methods are safe."
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cached file handle is accessed using dict.get() which returns None by default. While this is handled correctly by the conditional checks in the read functions, this pattern means users who call these functions directly without the cached handle won't benefit from the performance optimization. Consider adding a note in the function documentation about best practices for repeated tile reads.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a tiff_handle is provided, errors during asarray() do not clean up the handle. If the caller passes an open handle and an exception occurs, the caller may not know the state of the handle. Consider documenting that the caller is responsible for handle cleanup even if this function raises an exception.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a tiff_handle is provided, errors during asarray() do not clean up the handle. If the caller passes an open handle and an exception occurs, the caller may not know the state of the handle. Consider documenting that the caller is responsible for handle cleanup even if this function raises an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation for
_all_handlesis incomplete. It should beList[tifffile.TiffFile]instead of justListto provide proper type hints. This would improve code clarity and enable better static type checking.