-
Notifications
You must be signed in to change notification settings - Fork 4
Optimize OME-TIFF reader performance #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
20b46b2
7474679
72a9411
62de372
32f2cda
ef25e6b
a172d63
36002bb
2746509
019ce12
119ad4b
fc581da
774b929
b064809
7b67c8f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -6,6 +6,7 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| import gc | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| import json | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| import threading | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from concurrent.futures import ThreadPoolExecutor | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from multiprocessing import cpu_count | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -144,6 +145,10 @@ def __init__( | |||||||||||||||||||||||||||||||||||||||||||||||||||
| self._metadata = load_individual_tiffs_metadata(self.tiff_path) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._metadata = load_ome_tiff_metadata(self.tiff_path) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Close the metadata handle immediately - we use thread-local handles | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # for thread-safe concurrent reads instead of sharing this handle. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| if "tiff_handle" in self._metadata: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._metadata.pop("tiff_handle").close() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Extract common properties | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.n_tiles = self._metadata["n_tiles"] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -217,6 +222,96 @@ def __init__( | |||||||||||||||||||||||||||||||||||||||||||||||||||
| self.fused_ts = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self.center = None | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Thread-local storage for TiffFile handles (thread-safe concurrent access) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._thread_local = threading.local() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._handles_lock = threading.Lock() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._all_handles: List = [] # Track all handles for cleanup | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| def close(self) -> None: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Close any open file handles to release resources. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| This should be called when finished using a TileFusion instance, | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| or use it as a context manager (``with TileFusion(...) as tf:``) | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| for automatic cleanup. Important for OME-TIFF inputs where file | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| handles are kept open for performance. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Close all thread-local handles | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| with self._handles_lock: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| for handle in self._all_handles: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| handle.close() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| except Exception: | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| pass # Best-effort cleanup | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| self._all_handles.clear() | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Reset thread-local storage to prevent stale handle access. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Setting individual attributes to None only affects the current thread; | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| # resetting the entire object ensures all threads get fresh handles. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Reset thread-local storage to prevent stale handle access. | |
| # Setting individual attributes to None only affects the current thread; | |
| # resetting the entire object ensures all threads get fresh handles. | |
| # Reset thread-local storage for the *current* thread to prevent stale | |
| # handle access in this thread after close(). Note that each thread | |
| # has its own view of a threading.local() instance; this does *not* | |
| # clear attributes in other threads. Safety therefore relies on the | |
| # contract documented above: close() must only be called once all | |
| # worker threads have finished using this TileFusion instance. |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After resetting _thread_local to a new threading.local() object in the close() method, any thread that previously had a handle and tries to access it will get a fresh threading.local() instance without the tiff_handle attribute. This is correct behavior - the thread will create a new handle on next access. However, this creates a subtle issue: if a thread is still actively reading when close() is called, that thread's handle gets closed while it may be in use, potentially causing errors. Consider adding a note in the docstring warning that close() should only be called when all operations are complete.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment claims this "prevents stale handle access," but creating a new threading.local() object only affects the current thread's view. Other threads that previously accessed self._thread_local.tiff_handle still have their closed handles cached. When those threads call _get_thread_local_handle() later, the check hasattr(self._thread_local, "tiff_handle") will return True (since they still have the attribute in their thread-local namespace), but the handle is now closed. Consider either: 1) tracking thread IDs and clearing their attributes, 2) checking if handles are closed in _get_thread_local_handle(), or 3) removing this line since the warning already states operations during close() will fail.
| # Reset thread-local storage to prevent stale handle access. | |
| # Setting individual attributes to None only affects the current thread; | |
| # resetting the entire object ensures all threads get fresh handles. | |
| self._thread_local = threading.local() | |
| # Clear this thread's cached handle; other threads must not use this | |
| # TileFusion instance after close(), as documented in the warning above. | |
| if hasattr(self._thread_local, "tiff_handle"): | |
| del self._thread_local.tiff_handle |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context manager's exit method does not suppress exceptions (return value is None). This means that if an exception occurs within the context and close() also raises an exception, the original exception will be masked by the close() exception. Consider wrapping the close() call in a try-except block to ensure the original exception is preserved.
| def __exit__(self, exc_type, exc_val, exc_tb) -> None: | |
| """Exit the runtime context and close file handles.""" | |
| self.close() | |
| def __exit__(self, exc_type, exc_val, exc_tb) -> bool: | |
| """ | |
| Exit the runtime context and close file handles. | |
| If an exception occurred inside the context, any exception raised by | |
| close() is suppressed so that the original exception is not masked. | |
| """ | |
| if exc_type is not None: | |
| # An exception is already in progress; do best-effort cleanup | |
| # without masking the original error. | |
| try: | |
| self.close() | |
| except Exception: | |
| # Suppress cleanup errors to preserve the original exception. | |
| return False | |
| # Do not suppress the original exception. | |
| return False | |
| # No exception from the with-block; let close() errors propagate. | |
| self.close() | |
| # Always return False to avoid suppressing exceptions. | |
| return False |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __del__ destructor catches AttributeError for partially initialized objects, but the close() method may raise other exceptions during cleanup (e.g., if threading.Lock() operations fail). Consider catching a broader exception type or being more specific about what can fail during partial initialization. The comment mentions "Object may be partially initialized" but doesn't explain which attributes might be missing.
| self.close() | |
| except AttributeError: | |
| pass # Object may be partially initialized | |
| # During partial initialization or interpreter shutdown, attributes | |
| # such as ``_handles_lock``, ``_all_handles`` or ``_thread_local`` | |
| # may be missing or in an invalid state, and lock operations inside | |
| # ``close()`` may fail. Never let such errors escape from __del__. | |
| self.close() | |
| except Exception: | |
| # Best-effort cleanup; ignore all errors in the destructor. | |
| pass |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _get_thread_local_handle() method lacks a return type annotation. According to the docstring, it returns tifffile.TiffFile or None, which should be annotated as -> Optional[tifffile.TiffFile]. This would improve type checking and code clarity. Note that you'll need to import tifffile at the module level or use a string literal for forward reference.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After close() is called, threads may still have closed TiffFile handles in their thread-local storage. The check only verifies the handle is not None, but doesn't check if the handle is closed. If a thread calls _read_tile() after close() but before the next handle recreation, it will pass a closed handle to read_ome_tiff_tile(), causing a cryptic error. Consider adding a check like and not handle.filehandle.closed or catching and recreating handles that have been closed.
| if ( | |
| hasattr(self._thread_local, "tiff_handle") | |
| and self._thread_local.tiff_handle is not None | |
| ): | |
| return self._thread_local.tiff_handle | |
| if hasattr(self._thread_local, "tiff_handle"): | |
| handle = self._thread_local.tiff_handle | |
| if handle is not None: | |
| # Avoid reusing a closed TiffFile handle (e.g., after close()). | |
| filehandle = getattr(handle, "filehandle", None) | |
| if filehandle is not None and not getattr(filehandle, "closed", False): | |
| return handle |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Importing tifffile inside the _get_thread_local_handle() method is unnecessary since it's already used elsewhere in the module (e.g., in the I/O functions). The import statement is executed every time a new thread-local handle is created, adding minor overhead. Consider moving this import to the module level along with the other imports, or removing it if tifffile.TiffFile is already accessible through the imported functions.
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -5,7 +5,7 @@ | |||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||
| from typing import Dict, List, Tuple, Any | ||||||||||||||||||||||||||||
| from typing import Any, Dict, Optional | ||||||||||||||||||||||||||||
| import xml.etree.ElementTree as ET | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| import numpy as np | ||||||||||||||||||||||||||||
|
|
@@ -31,8 +31,24 @@ def load_ome_tiff_metadata(tiff_path: Path) -> Dict[str, Any]: | |||||||||||||||||||||||||||
| - channels: int | ||||||||||||||||||||||||||||
| - pixel_size: (py, px) | ||||||||||||||||||||||||||||
| - tile_positions: list of (y, x) tuples | ||||||||||||||||||||||||||||
| - tiff_handle: tifffile.TiffFile | ||||||||||||||||||||||||||||
| Open TIFF file handle kept for fast repeated access. The caller is | ||||||||||||||||||||||||||||
| responsible for closing this handle by calling ``tiff_handle.close()`` | ||||||||||||||||||||||||||||
| when it is no longer needed to avoid resource leaks. The handle | ||||||||||||||||||||||||||||
| remains valid until it is explicitly closed, or until a higher-level | ||||||||||||||||||||||||||||
| context manager (if used) closes it on your behalf. | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| .. versionchanged:: 0.2.0 | ||||||||||||||||||||||||||||
| Breaking change: Now returns an open ``tiff_handle`` that requires | ||||||||||||||||||||||||||||
| explicit cleanup. Previously this function returned only data without | ||||||||||||||||||||||||||||
| keeping file handles open. Callers using this function directly must | ||||||||||||||||||||||||||||
| now call ``metadata['tiff_handle'].close()`` when done, or use | ||||||||||||||||||||||||||||
| ``TileFusion`` which handles cleanup automatically. | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
| .. versionchanged:: 0.2.0 | |
| Breaking change: Now returns an open ``tiff_handle`` that requires | |
| explicit cleanup. Previously this function returned only data without | |
| keeping file handles open. Callers using this function directly must | |
| now call ``metadata['tiff_handle'].close()`` when done, or use | |
| ``TileFusion`` which handles cleanup automatically. | |
| Note | |
| ---- | |
| This function returns an open ``tiff_handle`` that requires explicit | |
| cleanup. Previously this function returned only data without keeping | |
| file handles open. Callers using this function directly must call | |
| ``metadata['tiff_handle'].close()`` when done, or use ``TileFusion`` | |
| which handles cleanup automatically. |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TiffFile handle is opened and stored in the returned metadata dictionary, but there's no corresponding mechanism to ensure it gets closed. This can lead to resource leaks if the caller forgets to close it or if an exception occurs before cleanup.
Consider implementing one of these solutions:
- Add a context manager protocol to the TileFusion class (implement enter and exit methods)
- Add a close() method to TileFusion that closes the tiff_handle if present
- Implement del in TileFusion to close the handle on garbage collection (though this is less reliable)
The documentation mentions that "the caller is responsible for closing this handle" but without a clear mechanism in the TileFusion class, this resource will likely leak in practice.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking API change for load_ome_tiff_metadata. The function now returns an open file handle that must be explicitly closed by the caller, changing the contract from a pure data function to one that manages resources. While the change is well-documented and the TileFusion class handles cleanup properly, external callers who import and use this function directly will now be responsible for closing the handle or risk resource leaks. Consider mentioning this breaking change in release notes or migration guides.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handler catches all exceptions, which may hide unrelated errors and make them appear as if they were caused by invalid OME metadata. Consider catching only the specific exceptions related to metadata parsing (e.g., ValueError, ET.ParseError, AttributeError, KeyError) to allow genuine errors to propagate with their original context.
| except Exception: | |
| except (ValueError, ET.ParseError, AttributeError, KeyError): |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handler catches all exceptions too broadly. If an exception occurs after the tiff_handle has been stored in the return dictionary (line 87), the handle will be closed but the partially constructed dictionary with the now-closed handle will not be returned. However, if an exception occurs before line 78 (the return statement starts), the file is properly closed. Consider whether this broad exception handling is needed or if more specific exceptions should be caught.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handling in load_ome_tiff_metadata properly closes the file handle on errors, but this behavior is not tested. Consider adding a test case that triggers an exception (e.g., invalid OME metadata) and verifies the file handle is properly closed, preventing resource leaks.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature uses 'tiff_handle: tifffile.TiffFile = None' which requires the tifffile type to be in scope. Consider using 'Optional[tifffile.TiffFile]' from typing for better clarity and type checking, or document the type annotation convention being used. This makes the optional nature of the parameter more explicit in the function signature.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring states "Warning: A single TiffFile handle is NOT thread-safe for concurrent reads." This is accurate and helpful. However, it would be clearer to also mention that this function creates thread-local handles specifically to address this limitation, so callers don't need to worry about it when using TileFusion.
Consider adding: "TileFusion addresses this by maintaining thread-local handles automatically, so concurrent reads through TileFusion methods are safe."
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cached file handle is accessed using dict.get() which returns None by default. While this is handled correctly by the conditional checks in the read functions, this pattern means users who call these functions directly without the cached handle won't benefit from the performance optimization. Consider adding a note in the function documentation about best practices for repeated tile reads.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a tiff_handle is provided, errors during asarray() do not clean up the handle. If the caller passes an open handle and an exception occurs, the caller may not know the state of the handle. Consider documenting that the caller is responsible for handle cleanup even if this function raises an exception.
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a tiff_handle is provided, errors during asarray() do not clean up the handle. If the caller passes an open handle and an exception occurs, the caller may not know the state of the handle. Consider documenting that the caller is responsible for handle cleanup even if this function raises an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If load_ome_tiff_metadata raises an exception, the tiff_handle in the returned metadata dictionary will not be closed, leading to a resource leak. While load_ome_tiff_metadata has a try-except block to close the handle on error, if it returns successfully but an exception occurs before line 152 (e.g., during dictionary access on line 155), the handle will remain open. Consider wrapping this in a try-except block to ensure the handle is closed even if subsequent operations fail.