Merged
30 commits
f9b3fbc
feat(sdk): add sha1_of_source helper for checksum comparison
minitriga Apr 22, 2026
ccbb740
test(sdk): cover sha1_of_source BinaryIO position rewind from non-zer…
minitriga Apr 22, 2026
1e202f4
feat(sdk): add UploadResult dataclass for idempotent uploads
minitriga Apr 22, 2026
3e6bca0
docs(sdk): clarify UploadResult.checksum None cases per review
minitriga Apr 22, 2026
d33ae67
feat(sdk): add async matches_local_checksum to InfrahubNode
minitriga Apr 22, 2026
98f639c
refactor(sdk): apply Task 3 review feedback to matches_local_checksum
minitriga Apr 22, 2026
63f7a39
feat(sdk): add sync matches_local_checksum to InfrahubNodeSync
minitriga Apr 22, 2026
346c6cd
feat(sdk): add async upload_if_changed with SHA-1 idempotency
minitriga Apr 22, 2026
d961adc
refactor(sdk): apply Task 5 review feedback to upload_if_changed
minitriga Apr 22, 2026
b41827f
feat(sdk): add sync upload_if_changed to InfrahubNodeSync
minitriga Apr 22, 2026
b9932c9
feat(sdk): add skip_if_unchanged to async download_file
minitriga Apr 22, 2026
792d50e
fix(sdk): enforce unsaved-node check before skip_if_unchanged short-c…
minitriga Apr 22, 2026
ef6f927
feat(sdk): add skip_if_unchanged to sync download_file
minitriga Apr 22, 2026
1dae7e9
docs(sdk): add changelog fragment for idempotent file operations
minitriga Apr 22, 2026
0bb779e
docs(sdk): clarify UploadResult.checksum docstring and add fixed frag…
minitriga Apr 22, 2026
9bced92
style(sdk): apply ruff format to new idempotent file operations
minitriga Apr 22, 2026
f9b2177
refactor(sdk): rename UploadResult.uploaded to was_uploaded per review
minitriga Apr 23, 2026
402a73b
test(sdk): drop tautological UploadResult frozen test per review
minitriga Apr 23, 2026
3d492c3
docs(sdk): rewrite unsaved-node fix fragment in user-impact terms
minitriga Apr 23, 2026
e21edeb
test(sdk): add positive-path HTTP assertions to upload/download idemp…
minitriga Apr 23, 2026
2640278
docs(sdk): clarify that idempotency checks use cached node checksum
minitriga Apr 23, 2026
917d9f0
docs(sdk): drop pre-release unsaved-node fragment per review
minitriga Apr 24, 2026
1f5b6d6
test(sdk): add HTTP request assertion to upload_if_changed unsaved-no…
minitriga Apr 24, 2026
b9421fd
docs(sdk): regenerate Python SDK API docs for idempotent file ops doc…
minitriga Apr 24, 2026
af4e846
test(sdk): tighten upload_if_changed HTTP assertions to verify multip…
minitriga Apr 24, 2026
78b7d27
test(sdk): drop cross-test reference from upload_if_changed comments
minitriga Apr 27, 2026
fa272b1
Duplicates docstrings instead of sending the user to another docstring
gmazoyer May 7, 2026
d43387b
`_validate_file_object_support` guarantees attr
gmazoyer May 7, 2026
f540e02
Remove assert needs
gmazoyer May 7, 2026
fdc8be5
Apply post-rebase ruff docstring whitespace fixes
gmazoyer May 11, 2026
7 changes: 7 additions & 0 deletions changelog/+idempotent-file-ops.added.md
@@ -0,0 +1,7 @@
Added SHA-1 idempotency primitives for `CoreFileObject` nodes:

- `InfrahubNode.matches_local_checksum(source)` / sync variant — compare a local `bytes | Path | BinaryIO` source against the node's server-stored checksum without invoking a transfer.
- `InfrahubNode.upload_if_changed(source, name=None)` / sync variant — stage + save only when the local source differs from the server, returning an `UploadResult(was_uploaded, checksum)` dataclass.
- `download_file(..., skip_if_unchanged=True)` — short-circuit the download when `dest` already exists on disk with a matching SHA-1. Returns `0` bytes written when skipped.

A shared `sha1_of_source` helper (streaming, 64 KiB chunks) centralises the hashing convention in `infrahub_sdk.file_handler`.
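Taken together, the additions support a fetch-compare-transfer loop along these lines (a sketch only; the node kind, names, and paths are illustrative, not taken from the PR):

```python
>>> from pathlib import Path
>>> node = await client.get(kind="CoreFileObject", name__value="config")
>>> local = Path("backup.cfg")
>>> result = await node.upload_if_changed(local)  # no-op when SHA-1s match
>>> written = await node.download_file(dest=local, skip_if_unchanged=True)  # 0 when in sync
```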
237 changes: 233 additions & 4 deletions docs/docs/python-sdk/sdk_ref/infrahub_sdk/node/node.mdx
@@ -40,7 +40,22 @@ artifact_fetch(self, name: str) -> str | dict[str, Any]
#### `download_file`

```python
download_file(self, dest: Path | None = None) -> bytes | int
download_file(self, dest: None = None, skip_if_unchanged: bool = ...) -> bytes
```

<details>
<summary>Show 2 other overloads</summary>

#### `download_file`

```python
download_file(self, dest: Path, skip_if_unchanged: bool = ...) -> int
```

#### `download_file`

```python
download_file(self, dest: Path | None = None, skip_if_unchanged: bool = False) -> bytes | int
```

Download the file content from this FileObject node.
@@ -54,16 +69,25 @@ The node must have been saved (have an id) before calling this method.
directly to this path (memory-efficient for large files) and the
number of bytes written will be returned. If not provided, the
file content will be returned as bytes.
- `skip_if_unchanged`: When ``True``, compute the SHA-1 of the file at
``dest`` (which must be provided) and compare against the
node's ``checksum`` attribute. If they match, return ``0``
without hitting the network. The ``checksum`` is the value
loaded when this node was fetched — a later server-side
change to the file will not be detected unless the caller
re-fetches the node first.

**Returns:**

- If ``dest`` is None: The file content as bytes.
- If ``dest`` is provided: The number of bytes written to the file.
- If ``skip_if_unchanged=True`` and the local file matches the server checksum: ``0``.

**Raises:**

- `FeatureNotSupportedError`: If this node doesn't inherit from CoreFileObject.
- `ValueError`: If the node hasn't been saved yet or file not found.
- `ValueError`: If the node hasn't been saved yet, the file was not
  found, or ``skip_if_unchanged=True`` was passed without a ``dest``.
- `AuthenticationError`: If authentication fails.

**Examples:**
@@ -73,8 +97,88 @@ The node must have been saved (have an id) before calling this method.
>>> content = await contract.download_file()
>>> # Stream to file (memory-efficient for large files)
>>> bytes_written = await contract.download_file(dest=Path("/tmp/contract.pdf"))
>>> # Skip download if local file already matches server checksum
>>> bytes_written = await contract.download_file(
... dest=Path("/tmp/contract.pdf"), skip_if_unchanged=True
... )
```

</details>

#### `matches_local_checksum`

```python
matches_local_checksum(self, source: bytes | Path | BinaryIO) -> bool
```

Return True if ``source``'s SHA-1 matches this node's server checksum.

Only available for nodes inheriting from ``CoreFileObject``. Callers
that want to branch on the comparison without invoking a transfer
should use this primitive instead of reading ``node.checksum.value``
and hashing ``source`` themselves, so the hashing convention stays
centralised in the SDK.

The comparison is against the ``checksum`` attribute as loaded
when this node was retrieved from the server. If the server's
file has been replaced since the node was fetched, this method
will not see that change — re-fetch the node to refresh the
checksum before comparing.

**Args:**

- `source`: Local content to hash and compare. Accepts the same
shapes as :func:`infrahub_sdk.file_handler.sha1_of_source`.

**Returns:**

- True if the local digest equals the server's stored checksum.

**Raises:**

- `FeatureNotSupportedError`: Node is not a ``CoreFileObject``.
- `ValueError`: Node has no server-side checksum yet (unsaved or
file never attached).
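
**Examples:**

A usage sketch for this method (the node, kind, and path are illustrative, not from the PR):

```python
>>> from pathlib import Path
>>> contract = await client.get(kind="CoreFileObject", name__value="contract")
>>> # Pure comparison: no bytes are transferred either way
>>> if not await contract.matches_local_checksum(Path("/tmp/contract.pdf")):
...     await contract.download_file(dest=Path("/tmp/contract.pdf"))
```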

#### `upload_if_changed`

```python
upload_if_changed(self, source: bytes | Path | BinaryIO, name: str | None = None) -> UploadResult
```

Upload ``source`` only if its SHA-1 differs from the server checksum.

Composes :meth:`matches_local_checksum` with :meth:`upload_from_path`
(or :meth:`upload_from_bytes`) and :meth:`save`. For unsaved nodes or
nodes that have no prior server-side file, the upload is always
performed — there is nothing to compare against.

Idempotency is content-only: when the local SHA-1 matches the server
checksum the upload is skipped even if ``name`` differs from the
server-side filename. Use a regular :meth:`upload_from_path` /
:meth:`save` round-trip if you need to rename without changing
content.

**Args:**

- `source`: Content to upload. ``bytes`` and ``BinaryIO`` sources
must supply ``name``; for a ``Path`` the filename is derived
from ``source.name`` when ``name`` is omitted.
- `name`: Filename to use on the server. Required for ``bytes`` /
``BinaryIO`` sources.

**Returns:**

- :class:`UploadResult` with ``was_uploaded=False`` (skipped) or
  ``was_uploaded=True`` (transfer occurred), and the resulting server
  checksum (``None`` only when no server checksum was available
  after the operation).

**Raises:**

- `FeatureNotSupportedError`: Node is not a ``CoreFileObject``.
- `ValueError`: ``source`` is ``bytes`` or ``BinaryIO`` and no
``name`` was supplied.
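
**Examples:**

A usage sketch (names and paths are illustrative, not from the PR):

```python
>>> from pathlib import Path
>>> result = await contract.upload_if_changed(Path("/tmp/contract.pdf"))
>>> if result.was_uploaded:
...     print(f"uploaded, server checksum now {result.checksum}")
... else:
...     print("content unchanged, upload skipped")
```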

#### `delete`

```python
@@ -221,7 +325,22 @@ artifact_fetch(self, name: str) -> str | dict[str, Any]
#### `download_file`

```python
download_file(self, dest: Path | None = None) -> bytes | int
download_file(self, dest: None = None, skip_if_unchanged: bool = ...) -> bytes
```

<details>
<summary>Show 2 other overloads</summary>

#### `download_file`

```python
download_file(self, dest: Path, skip_if_unchanged: bool = ...) -> int
```

#### `download_file`

```python
download_file(self, dest: Path | None = None, skip_if_unchanged: bool = False) -> bytes | int
```

Download the file content from this FileObject node.
@@ -235,16 +354,25 @@ The node must have been saved (have an id) before calling this method.
directly to this path (memory-efficient for large files) and the
number of bytes written will be returned. If not provided, the
file content will be returned as bytes.
- `skip_if_unchanged`: When ``True``, compute the SHA-1 of the file at
``dest`` (which must be provided) and compare against the
node's ``checksum`` attribute. If they match, return ``0``
without hitting the network. The ``checksum`` is the value
loaded when this node was fetched — a later server-side
change to the file will not be detected unless the caller
re-fetches the node first.

**Returns:**

- If ``dest`` is None: The file content as bytes.
- If ``dest`` is provided: The number of bytes written to the file.
- If ``skip_if_unchanged=True`` and the local file matches the server checksum: ``0``.

**Raises:**

- `FeatureNotSupportedError`: If this node doesn't inherit from CoreFileObject.
- `ValueError`: If the node hasn't been saved yet or file not found.
- `ValueError`: If the node hasn't been saved yet, the file was not
  found, or ``skip_if_unchanged=True`` was passed without a ``dest``.
- `AuthenticationError`: If authentication fails.

**Examples:**
@@ -254,8 +382,88 @@ The node must have been saved (have an id) before calling this method.
>>> content = contract.download_file()
>>> # Stream to file (memory-efficient for large files)
>>> bytes_written = contract.download_file(dest=Path("/tmp/contract.pdf"))
>>> # Skip download if local file already matches server checksum
>>> bytes_written = contract.download_file(
... dest=Path("/tmp/contract.pdf"), skip_if_unchanged=True
... )
```

</details>

#### `matches_local_checksum`

```python
matches_local_checksum(self, source: bytes | Path | BinaryIO) -> bool
```

Return True if ``source``'s SHA-1 matches this node's server checksum.

Only available for nodes inheriting from ``CoreFileObject``. Callers
that want to branch on the comparison without invoking a transfer
should use this primitive instead of reading ``node.checksum.value``
and hashing ``source`` themselves, so the hashing convention stays
centralised in the SDK.

The comparison is against the ``checksum`` attribute as loaded
when this node was retrieved from the server. If the server's
file has been replaced since the node was fetched, this method
will not see that change — re-fetch the node to refresh the
checksum before comparing.

**Args:**

- `source`: Local content to hash and compare. Accepts the same
shapes as :func:`infrahub_sdk.file_handler.sha1_of_source`.

**Returns:**

- True if the local digest equals the server's stored checksum.

**Raises:**

- `FeatureNotSupportedError`: Node is not a ``CoreFileObject``.
- `ValueError`: Node has no server-side checksum yet (unsaved or
file never attached).
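
**Examples:**

A usage sketch for the sync variant (the node, kind, and path are illustrative, not from the PR):

```python
>>> from pathlib import Path
>>> contract = client.get(kind="CoreFileObject", name__value="contract")
>>> # Pure comparison: no bytes are transferred either way
>>> if not contract.matches_local_checksum(Path("/tmp/contract.pdf")):
...     contract.download_file(dest=Path("/tmp/contract.pdf"))
```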

#### `upload_if_changed`

```python
upload_if_changed(self, source: bytes | Path | BinaryIO, name: str | None = None) -> UploadResult
```

Upload ``source`` only if its SHA-1 differs from the server checksum.

Composes :meth:`matches_local_checksum` with :meth:`upload_from_path`
(or :meth:`upload_from_bytes`) and :meth:`save`. For unsaved nodes or
nodes that have no prior server-side file, the upload is always
performed — there is nothing to compare against.

Idempotency is content-only: when the local SHA-1 matches the server
checksum the upload is skipped even if ``name`` differs from the
server-side filename. Use a regular :meth:`upload_from_path` /
:meth:`save` round-trip if you need to rename without changing
content.

**Args:**

- `source`: Content to upload. ``bytes`` and ``BinaryIO`` sources
must supply ``name``; for a ``Path`` the filename is derived
from ``source.name`` when ``name`` is omitted.
- `name`: Filename to use on the server. Required for ``bytes`` /
``BinaryIO`` sources.

**Returns:**

- :class:`UploadResult` with ``was_uploaded=False`` (skipped) or
  ``was_uploaded=True`` (transfer occurred), and the resulting server
  checksum (``None`` only when no server checksum was available
  after the operation).

**Raises:**

- `FeatureNotSupportedError`: Node is not a ``CoreFileObject``.
- `ValueError`: ``source`` is ``bytes`` or ``BinaryIO`` and no
``name`` was supplied.
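
**Examples:**

A usage sketch for the sync variant, using an in-memory source (names are illustrative, not from the PR; ``bytes`` sources require ``name``):

```python
>>> result = contract.upload_if_changed(b"new content", name="contract.txt")
>>> result.was_uploaded
```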

#### `delete`

```python
@@ -369,6 +577,27 @@ extract(self, params: dict[str, str]) -> dict[str, Any]

Extract some data points defined in a flat notation.

### `UploadResult`

Outcome of an idempotent upload attempt.

Returned by :meth:`InfrahubNode.upload_if_changed` and its sync twin.
``was_uploaded`` tells the caller whether a network transfer actually
happened; ``checksum`` carries the SHA-1 of the content held on the
server after the operation — on skip paths that is the server's
pre-existing value, on upload paths it is the locally-computed SHA-1
used as a proxy (which matches what a standard CoreFileObject server
stores, since the server computes SHA-1 of received bytes). It is
``None`` only when no server checksum was available (either the node
was unsaved and nothing was transferred, or the save returned no
checksum value).

The comparison used by ``upload_if_changed`` reads the node's
``checksum`` attribute, which was populated when the node was
fetched via ``client.get(...)``. A server-side change to the file
between the fetch and the call will not be detected unless the
caller re-fetches the node first.

### `InfrahubNodeBase`

Base class for InfrahubNode and InfrahubNodeSync
48 changes: 48 additions & 0 deletions infrahub_sdk/file_handler.py
@@ -1,5 +1,6 @@
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from io import BytesIO
from pathlib import Path
@@ -13,6 +14,53 @@
if TYPE_CHECKING:
from .client import InfrahubClient, InfrahubClientSync

_SHA1_CHUNK_BYTES = 64 * 1024


def sha1_of_source(source: bytes | Path | BinaryIO) -> str:
"""Compute the SHA-1 hex digest of an upload/download source.

Accepts the same shapes as :meth:`FileHandlerBase.prepare_upload` so
callers can compare local content against a server-stored checksum
without materialising the full file in memory.

Args:
source: The content to hash. ``bytes`` are hashed in one shot.
A ``Path`` is read in 64 KiB chunks. A ``BinaryIO`` is read
from its current position, then rewound so downstream
callers can re-read it.

Returns:
Lowercase SHA-1 hex digest, matching the algorithm Infrahub
stores in ``CoreFileObject.checksum``.

Raises:
TypeError: If ``source`` is not one of the supported types.

"""
hasher = hashlib.sha1(usedforsecurity=False)

if isinstance(source, bytes):
hasher.update(source)
return hasher.hexdigest()

if isinstance(source, Path):
with source.open("rb") as fh:
while chunk := fh.read(_SHA1_CHUNK_BYTES):
hasher.update(chunk)
return hasher.hexdigest()

if hasattr(source, "read") and hasattr(source, "seek"):
start = source.tell()
try:
while chunk := source.read(_SHA1_CHUNK_BYTES):
hasher.update(chunk)
finally:
source.seek(start)
return hasher.hexdigest()

raise TypeError(f"sha1_of_source expects bytes, Path, or BinaryIO; got {type(source).__name__}")
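The ``BinaryIO`` branch above is the subtle part: it hashes from the stream's *current* position and rewinds to that same position afterwards, so a caller can hash and then still upload the same handle. The snippet below demonstrates that contract standalone, reimplementing the stream branch so it runs without the SDK installed (`sha1_of_stream` is this sketch's name, not the SDK's):

```python
import hashlib
from io import BytesIO

# 64 KiB chunks, matching the helper's _SHA1_CHUNK_BYTES convention
_CHUNK = 64 * 1024


def sha1_of_stream(fh) -> str:
    """Hash a binary stream from its current position, then rewind to it."""
    hasher = hashlib.sha1(usedforsecurity=False)
    start = fh.tell()
    try:
        while chunk := fh.read(_CHUNK):
            hasher.update(chunk)
    finally:
        fh.seek(start)  # restore position so downstream callers can re-read
    return hasher.hexdigest()


buf = BytesIO(b"headerpayload")
buf.seek(6)  # position past the 6-byte "header" prefix
digest = sha1_of_stream(buf)

# Only the bytes after the starting position are hashed...
assert digest == hashlib.sha1(b"payload").hexdigest()
# ...and the stream ends up where it started, not at EOF.
assert buf.tell() == 6
```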


@dataclass
class PreparedFile: