25 commits
be1c79b
Closes #865 - Add get_changed_items() utility for git-based selective…
vipulb91 Mar 19, 2026
27a75df
docs: add token_credential to FabricWorkspace example in optional_fea…
vipulb91 Apr 13, 2026
21e6179
docs: add append_feature_flag calls above workspace init in example
vipulb91 Apr 13, 2026
374900f
refactor: move get_changed_items logic to _common/_git_utils.py
vipulb91 Apr 13, 2026
d14ff49
security: add validate_git_compare_ref to prevent git flag injection
vipulb91 Apr 13, 2026
754457b
fix: catch subprocess.TimeoutExpired in git diff call
vipulb91 Apr 13, 2026
34ce0fa
fix: add -- separator to git diff command to prevent flag injection
vipulb91 Apr 13, 2026
e63a481
fix: add 30s timeout to subprocess.run() calls in git_utils
vipulb91 Apr 13, 2026
a498bb0
security: add _resolve_git_diff_path() for path validation in git dif…
vipulb91 Apr 13, 2026
1756bb5
Merge branch 'main' into feature/get-changed-items
vipulb91 Apr 13, 2026
99c8289
Merge branch 'main' into feature/get-changed-items
shirasassoon Apr 13, 2026
66b9fca
Merge branch 'main' into feature/get-changed-items
shirasassoon Apr 15, 2026
a3f8632
fix: add underscore to allowed chars in validate_git_compare_ref regex
vipulb91 Apr 16, 2026
fc263b5
fix: remove redundant -- separator from git diff as ref is already va…
vipulb91 Apr 16, 2026
e9b30b8
docs: add token_credential to FabricWorkspace example in get_changed_…
vipulb91 Apr 16, 2026
17fea6f
docs: clarify feature flag requirement for items_to_include in option…
vipulb91 Apr 16, 2026
ce0f687
refactor: rename _git_utils.py to _git_diff_utils.py for specificity
vipulb91 Apr 16, 2026
178573e
docs: add override note to get_items_to_publish docstring
vipulb91 Apr 16, 2026
1c17387
refactor: remove redundant get_changed_items import from publish.py
vipulb91 Apr 16, 2026
543bcfa
test: move get_changed_items tests to test_git_diff_utils.py and add …
vipulb91 Apr 16, 2026
17ac12d
test: add stronger test coverage for git diff utils and validate_git_…
vipulb91 Apr 16, 2026
41be77b
docs: add token_credential to FabricWorkspace example in publish_all_…
vipulb91 Apr 16, 2026
b8bdbda
fix: remove unused import and fix import ordering in test_git_diff_utils
vipulb91 Apr 16, 2026
b9459e5
Merge branch 'main' into feature/get-changed-items
shirasassoon Apr 16, 2026
5d2ea14
fix: assert exact git diff argv list in test_uses_custom_git_compare_ref
vipulb91 Apr 16, 2026
9 changes: 9 additions & 0 deletions .changes/unreleased/added-20260319-095618.yaml
@@ -0,0 +1,9 @@
kind: added
body: Add `get_changed_items()` utility function to detect Fabric items changed via git diff for use with selective deployment
time: 2026-03-19T09:56:18.0000000+00:00
custom:
  Author: vipulb91
  AuthorLink: https://github.com/vipulb91
  Issue: "865"
  IssueLink: https://github.com/microsoft/fabric-cicd/issues/865

37 changes: 37 additions & 0 deletions docs/how_to/optional_feature.md
@@ -99,6 +99,43 @@ Shortcuts are items associated with Lakehouse items and can be selectively publi

**Note:** This feature can be applied along with the other selective deployment features — please be cautious when using to avoid unexpected results.

## Git-Based Change Detection

`get_changed_items()` is a public utility function that uses `git diff` to detect which Fabric items have been added, modified, or renamed relative to a given git reference. It returns a list of strings in `"item_name.item_type"` format that can be passed directly to `items_to_include` in `publish_all_items()`.
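
As a rough illustration of the detection step, the sketch below parses `git diff --name-status` output into `(status, path)` pairs and keeps the new path for renames. The helper name `parse_name_status`, the sample diff output, and the folder names are hypothetical and chosen only for illustration; this is a simplified sketch, not the library's internal code.

```python
def parse_name_status(diff_output: str) -> list[tuple[str, str]]:
    """Return (status, path) pairs parsed from `git diff --name-status` output."""
    entries = []
    for line in diff_output.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 2:
            # Blank or malformed line: nothing to record
            continue
        status = parts[0]
        # Renames have three tab-separated fields (status, old path, new path); keep the new path
        path = parts[2] if status.startswith("R") and len(parts) >= 3 else parts[1]
        entries.append((status, path))
    return entries


# Made-up sample output: one modification, one rename, one deletion
sample = "\n".join(
    [
        "M\tHello.Notebook/notebook-content.py",
        "R100\tOld.Notebook/.platform\tNew.Notebook/.platform",
        "D\tGone.DataPipeline/.platform",
    ]
)

entries = parse_name_status(sample)
# Deleted paths are not candidates for publishing
changed_paths = [path for status, path in entries if status != "D"]
print(changed_paths)
```

Each surviving path would then still need to be mapped to the Fabric item that owns it before it can be expressed in `"item_name.item_type"` form.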

`get_changed_items()` itself requires no feature flags, but passing its output to `items_to_include` requires the `enable_experimental_features` and `enable_items_to_include` feature flags.

**Important:** If `get_changed_items()` returns an empty list (no changes detected), skip the call to `publish_all_items()`. Calling it with that empty list, or without `items_to_include` at all, applies no filter and results in a full deployment. Always guard against the empty-list case:

```python
from azure.identity import AzureCliCredential
from fabric_cicd import FabricWorkspace, publish_all_items, get_changed_items, append_feature_flag

append_feature_flag("enable_experimental_features")
append_feature_flag("enable_items_to_include")

# AzureCliCredential is one option; any other TokenCredential can be used instead
token_credential = AzureCliCredential()

workspace = FabricWorkspace(
    workspace_id="your-workspace-id",
    repository_directory="/path/to/repo",
    item_type_in_scope=["Notebook", "DataPipeline"],
    token_credential=token_credential,
)

changed = get_changed_items(workspace.repository_directory)

if changed:
    publish_all_items(workspace, items_to_include=changed)
else:
    print("No changed items detected; skipping deployment.")
```

To compare against a branch or a specific commit instead of the previous commit, pass a custom `git_compare_ref`:

```python
changed = get_changed_items(workspace.repository_directory, git_compare_ref="main")
```
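
The ref is validated before it is handed to git. The sketch below shows the kind of check involved, using the same rules as the `validate_git_compare_ref` function added in this PR (non-empty, no leading `-`, restricted character set). The name `is_safe_ref` is hypothetical, and unlike the library function it returns a boolean instead of raising an error.

```python
import re

# Character set taken from this PR's validate_git_compare_ref
_REF_PATTERN = re.compile(r"^[a-zA-Z0-9/_.\-~^@{}]+$")


def is_safe_ref(ref: str) -> bool:
    """Hypothetical helper: report whether a ref would pass the PR's validation rules."""
    if not ref.strip():
        return False  # empty or whitespace-only
    if ref.startswith("-"):
        return False  # would be interpreted by git as a flag
    return bool(_REF_PATTERN.match(ref))


for candidate in ["HEAD~1", "main", "origin/main", "--output=/tmp/x", "main; rm -rf /"]:
    print(f"{candidate!r}: {is_safe_ref(candidate)}")
```

Branch names, tags, and relative refs such as `HEAD~1` pass, while anything that looks like a command-line flag or contains shell metacharacters is rejected.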

**Note:** `get_changed_items()` returns only items that were **added, modified, or renamed** (i.e., candidates for publishing). It does not return deleted items. Passing `items_to_include` to `publish_all_items()` requires enabling the `enable_experimental_features` and `enable_items_to_include` feature flags.
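
For reference, the `items_to_include` filter in this PR matches case-insensitively on the combined `name.type` string and applies no filter when the list is empty. The sketch below mirrors that logic with plain dictionaries standing in for the workspace's repository items; `filter_items` and the sample data are hypothetical.

```python
def filter_items(all_items: dict[str, object], item_type: str, items_to_include: list[str]) -> dict[str, object]:
    """Hypothetical stand-in for the per-item-type filtering step."""
    if not items_to_include:
        # Empty or missing list: nothing is filtered out
        return all_items
    wanted = {entry.lower() for entry in items_to_include}
    return {
        name: item
        for name, item in all_items.items()
        if f"{name}.{item_type}".lower() in wanted
    }


notebooks = {"Hello World": "item-1", "Scratch": "item-2"}

selected = filter_items(notebooks, "Notebook", ["hello world.notebook"])
unfiltered = filter_items(notebooks, "Notebook", [])
print(selected)
print(unfiltered)
```

The second call shows why the empty-list guard matters: with nothing to include, every item of that type is returned.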

## Debugging

If an error arises, or you want full transparency to all calls being made outside the library, enable debugging. Enabling debugging will write all API calls to the terminal. The logs can also be found in the `fabric_cicd.error.log` file.
2 changes: 2 additions & 0 deletions src/fabric_cicd/__init__.py
@@ -9,6 +9,7 @@
import fabric_cicd.constants as constants
from fabric_cicd._common._check_utils import check_version
from fabric_cicd._common._deployment_result import DeploymentResult, DeploymentStatus
from fabric_cicd._common._git_diff_utils import get_changed_items
from fabric_cicd._common._logging import configure_logger, exception_handler, get_file_handler
from fabric_cicd.constants import FeatureFlag, ItemType
from fabric_cicd.fabric_workspace import FabricWorkspace
@@ -148,6 +149,7 @@ def disable_file_logging() -> None:
    "configure_external_file_logging",
    "deploy_with_config",
    "disable_file_logging",
    "get_changed_items",
    "publish_all_items",
    "unpublish_all_orphan_items",
]
235 changes: 235 additions & 0 deletions src/fabric_cicd/_common/_git_diff_utils.py
@@ -0,0 +1,235 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

"""Utility functions for detecting Fabric items changed via git diff."""

import json
import logging
import subprocess
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def _find_platform_item(file_path: Path, repo_root: Path) -> Optional[tuple[str, str]]:
    """
    Walk up from file_path towards repo_root looking for a .platform file.

    The .platform file marks the boundary of a Fabric item directory.
    Its JSON content contains ``metadata.type`` (item type) and
    ``metadata.displayName`` (item name).

    Returns:
        A ``(item_name, item_type)`` tuple, or ``None`` if not found.
    """
    current = file_path.parent
    while True:
        platform_file = current / ".platform"
        if platform_file.exists():
            try:
                data = json.loads(platform_file.read_text(encoding="utf-8"))
                metadata = data.get("metadata", {})
                item_type = metadata.get("type")
                item_name = metadata.get("displayName") or current.name
                if item_type:
                    return item_name, item_type
            except Exception as exc:
                logger.debug(f"Could not parse .platform file at '{platform_file}': {exc}")
        # Stop if we have reached the repository root or the filesystem root
        if current == repo_root or current == current.parent:
            break
        current = current.parent
    return None


def _resolve_git_diff_path(
    file_path_str: str,
    git_root: Path,
    repository_directory: Path,
) -> Optional[Path]:
    """
    Resolve and validate a file path from git diff output.

    Follows the same resolve → boundary-check → reject contract as
    ``_resolve_file_path`` in ``_parameter/_utils.py``, adapted for
    paths that are relative to a git root with containment checked
    against a (potentially different) repository subdirectory.

    Args:
        file_path_str: Relative path string from git diff output.
        git_root: Resolved absolute path of the git repository root.
        repository_directory: Resolved absolute path of the configured
            repository directory (may be a subdirectory of git_root).

    Returns:
        Resolved absolute Path if valid and within boundary, None otherwise.
    """
    raw_path = Path(file_path_str)

    # Reject absolute paths: git diff should only produce relative paths
    if raw_path.is_absolute():
        logger.debug(f"get_changed_items: skipping absolute path '{file_path_str}'")
        return None

    # Reject traversal sequences before resolution (mirrors _validate_wildcard_syntax)
    if ".." in raw_path.parts:
        logger.debug(f"get_changed_items: skipping path with traversal '{file_path_str}'")
        return None

    # Reject null bytes
    if "\x00" in file_path_str:
        logger.debug("get_changed_items: skipping path with null bytes")
        return None

    # Step 1: Resolve relative to git root (analogous to _resolve_file_path Step 1)
    resolved_path = (git_root / file_path_str).resolve()

    # Step 2: Boundary check against repository_directory (analogous to _resolve_file_path Step 2)
    try:
        resolved_path.relative_to(repository_directory)
    except ValueError:
        return None

    # Note: No Step 3 (existence check), because deleted files won't exist on disk
    return resolved_path


def get_changed_items(
    repository_directory: Path,
    git_compare_ref: str = "HEAD~1",
) -> list[str]:
    """
    Return the list of Fabric items that were added, modified, or renamed relative to ``git_compare_ref``.

    The returned list is in ``"item_name.item_type"`` format and can be passed directly
    to the ``items_to_include`` parameter of :func:`publish_all_items` to deploy only
    what has changed relative to ``git_compare_ref`` (the previous commit by default).

    Args:
        repository_directory: Path to the local git repository directory
            (e.g. ``FabricWorkspace.repository_directory``).
        git_compare_ref: Git ref to compare against. Defaults to ``"HEAD~1"``.

    Returns:
        List of strings in ``"item_name.item_type"`` format. Returns an empty list when
        no changes are detected, the git root cannot be found, or ``git diff`` fails or times out.

    Examples:
        Deploy only changed items
        >>> from azure.identity import AzureCliCredential
        >>> from fabric_cicd import FabricWorkspace, publish_all_items, get_changed_items
        >>> workspace = FabricWorkspace(
        ...     workspace_id="your-workspace-id",
        ...     repository_directory="/path/to/repo",
        ...     item_type_in_scope=["Notebook", "DataPipeline"],
        ...     token_credential=AzureCliCredential()
        ... )
        >>> changed = get_changed_items(workspace.repository_directory)
        >>> if changed:
        ...     publish_all_items(workspace, items_to_include=changed)

        With a custom git ref
        >>> changed = get_changed_items(workspace.repository_directory, git_compare_ref="main")
        >>> if changed:
        ...     publish_all_items(workspace, items_to_include=changed)
    """
    changed, _ = _resolve_changed_items(Path(repository_directory), git_compare_ref)
    return changed


def _resolve_changed_items(
    repository_directory: Path,
    git_compare_ref: str,
) -> tuple[list[str], list[str]]:
    """
    Use ``git diff --name-status`` to detect Fabric items that changed or were
    deleted relative to *git_compare_ref*.

    Args:
        repository_directory: Absolute path to the local repository directory
            (as stored on ``FabricWorkspace.repository_directory``).
        git_compare_ref: Git ref to diff against (e.g. ``"HEAD~1"``).

    Returns:
        A two-element tuple ``(changed_items, deleted_items)`` where each
        element is a list of strings in ``"item_name.item_type"`` format.
        Both lists are empty when the git root cannot be found or git fails.
    """
    from fabric_cicd._common._config_validator import _find_git_root
    from fabric_cicd._common._validate_input import validate_git_compare_ref

    validate_git_compare_ref(git_compare_ref)

    git_root = _find_git_root(repository_directory)
    if git_root is None:
        logger.warning("get_changed_items: could not locate a git repository root; returning empty list.")
        return [], []

    try:
        result = subprocess.run(
            ["git", "diff", "--name-status", git_compare_ref],
            cwd=str(git_root),
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except subprocess.CalledProcessError as exc:
        logger.warning(f"get_changed_items: 'git diff' failed ({exc.stderr.strip()}); returning empty list.")
        return [], []
    except subprocess.TimeoutExpired:
        logger.warning("get_changed_items: 'git diff' timed out; returning empty list.")
        return [], []

    changed_items: set[str] = set()
    deleted_items: set[str] = set()

    git_root_resolved = git_root.resolve()
    repo_dir_resolved = repository_directory.resolve()

    for line in result.stdout.splitlines():
        line = line.strip()
        if not line:
            continue

        parts = line.split("\t")
        status = parts[0].strip()

        # Renames produce three tab-separated fields: R<score>\told\tnew
        if status.startswith("R") and len(parts) >= 3:
            file_path_str = parts[2]
        elif len(parts) >= 2:
            file_path_str = parts[1]
        else:
            continue

        abs_path = _resolve_git_diff_path(file_path_str, git_root_resolved, repo_dir_resolved)
        if abs_path is None:
            continue

        if status == "D":
            if abs_path.name == ".platform":
                try:
                    show_result = subprocess.run(
                        ["git", "show", f"{git_compare_ref}:{file_path_str}"],
                        cwd=str(git_root_resolved),
                        capture_output=True,
                        text=True,
                        check=True,
                        timeout=30,
                    )
                    data = json.loads(show_result.stdout)
                    metadata = data.get("metadata", {})
                    item_type = metadata.get("type")
                    item_name = metadata.get("displayName") or abs_path.parent.name
                    if item_type and item_name:
                        deleted_items.add(f"{item_name}.{item_type}")
                except Exception as exc:
                    logger.debug(f"get_changed_items: could not read deleted .platform '{file_path_str}': {exc}")
        else:
            item_info = _find_platform_item(abs_path, repo_dir_resolved)
            if item_info:
                changed_items.add(f"{item_info[0]}.{item_info[1]}")

    return list(changed_items), list(deleted_items)
28 changes: 28 additions & 0 deletions src/fabric_cicd/_common/_validate_input.py
@@ -278,3 +278,31 @@ def validate_shortcut_exclude_regex(shortcut_exclude_regex: Optional[str]) -> No
        warning_message="Shortcut exclusion is enabled.",
        risk_warning="Using shortcut_exclude_regex will selectively exclude shortcuts from being deployed to lakehouses. Use with caution.",
    )


def validate_git_compare_ref(git_compare_ref: str) -> str:
    """
    Validate the git_compare_ref parameter to prevent git flag injection.

    Args:
        git_compare_ref: The git ref to compare against.

    Raises:
        InputError: If the ref is empty, starts with '-', or contains invalid characters.
    """
    validate_data_type("string", "git_compare_ref", git_compare_ref)

    if not git_compare_ref.strip():
        msg = "git_compare_ref must not be an empty string."
        raise InputError(msg, logger)

    if git_compare_ref.startswith("-"):
        msg = "git_compare_ref must not start with '-' to prevent git flag injection."
        raise InputError(msg, logger)

    # Allow only characters valid in git refs: alphanumeric, /, ., ~, ^, -, _, @, {, }
    if not re.match(r"^[a-zA-Z0-9/_.\-~^@{}]+$", git_compare_ref):
        msg = f"git_compare_ref '{git_compare_ref}' contains invalid characters."
        raise InputError(msg, logger)

    return git_compare_ref
21 changes: 18 additions & 3 deletions src/fabric_cicd/_items/_base_publisher.py
@@ -357,11 +357,26 @@ def get_items_to_publish(self) -> dict[str, "Item"]:
         Get the items to publish for this item type.

         Returns:
-            Dictionary mapping item names to Item objects.
+            Dictionary mapping item names to Item objects, pre-filtered by
+            items_to_include when set so that only relevant items are iterated.

         Subclasses can override to filter or transform the items.
-        """
-        return self.fabric_workspace_obj.repository_items.get(self.item_type, {})
+
+        Note:
+            The base implementation applies ``FabricWorkspace.items_to_include`` filtering.
+            To override this method and preserve this behavior, call ``super().get_items_to_publish()``
+            to keep ``items_to_include`` support, then apply any additional selection logic.
+        """
+        all_items = self.fabric_workspace_obj.repository_items.get(self.item_type, {})
+        items_to_include = self.fabric_workspace_obj.items_to_include
+        if not items_to_include:
+            return all_items
+        normalized_include_set = {i.lower() for i in items_to_include}
+        return {
+            name: item
+            for name, item in all_items.items()
+            if f"{name}.{self.item_type}".lower() in normalized_include_set
+        }

     def get_unpublish_order(self, items_to_unpublish: list[str]) -> list[str]:
         """
14 changes: 14 additions & 0 deletions src/fabric_cicd/publish.py
@@ -163,6 +163,19 @@ def publish_all_items(
        >>> # Access individual item response (dict with "header", "body", "status_code" keys)
        >>> notebook_response = workspace.responses["Notebook"]["Hello World"]
        >>> print(notebook_response["status_code"])  # e.g., 200

        With get_changed_items (deploy only git-changed items)
        >>> from fabric_cicd import FabricWorkspace, publish_all_items, get_changed_items
        >>> from azure.identity import AzureCliCredential
        >>> workspace = FabricWorkspace(
        ...     workspace_id="your-workspace-id",
        ...     repository_directory="/path/to/repo",
        ...     item_type_in_scope=["Notebook", "DataPipeline"],
        ...     token_credential=AzureCliCredential()  # or any other TokenCredential
        ... )
        >>> changed = get_changed_items(workspace.repository_directory)
        >>> if changed:
        ...     publish_all_items(workspace, items_to_include=changed)
    """
    fabric_workspace_obj = validate_fabric_workspace_obj(fabric_workspace_obj)
    responses_enabled = FeatureFlag.ENABLE_RESPONSE_COLLECTION.value in constants.FEATURE_FLAG
@@ -318,6 +331,7 @@ def unpublish_all_orphan_items(
        >>> print(notebook_response["status_code"])  # e.g., 200
    """
    fabric_workspace_obj = validate_fabric_workspace_obj(fabric_workspace_obj)

    validate_items_to_include(items_to_include, operation=constants.OperationType.UNPUBLISH)

    responses_enabled = FeatureFlag.ENABLE_RESPONSE_COLLECTION.value in constants.FEATURE_FLAG