Skip to content

Hf pull process-wide cancellation API with terminate handling.#4107

Open
rasapala wants to merge 53 commits intomainfrom
fix_pull_all_dl_failed
Open

Hf pull process-wide cancellation API with terminate handling.#4107
rasapala wants to merge 53 commits intomainfrom
fix_pull_all_dl_failed

Conversation

@rasapala
Copy link
Copy Markdown
Collaborator

@rasapala rasapala commented Apr 1, 2026

🛠 Summary

CVS-CVS-184757

Libgit2 Lfs-filter.c patch changes:

  1. Process-wide cancellation API. Added an exported atomic flag with git_lfs_cancel_set(int) / git_lfs_cancel_get(void) (declared GIT_LFS_CANCEL_EXPORT, backed by git_atomic32), plus a host-supplied probe git_lfs_shutdown_requested(void) that the filter calls to learn whether the embedding application has requested shutdown. This gives OVMS a reliable way to abort in-flight LFS operations from outside libgit2.
  2. Cancellation responsiveness across the LFS pipeline. Cancellation is now polled in the curl progress callbacks and around every long-running step: before/after each curl perform, between resume attempts, and inside the retry sleep loop (so a Ctrl+C during a back-off no longer waits the full interval).
  3. Cancellation coverage extended to both LFS HTTP requests. Both the batch metadata request and the object download request are wired to the progress callback and to explicit abort guards, so cancellation aborts the operation regardless of which phase is in flight.
  4. Error-focused logging with on-demand log file. Switched the patch to an error-focused logger; lfs_error.txt is now created/truncated only when an actual error is emitted, written under the repository workdir. Successful runs no longer leave a stale file behind, which OVMS relies on as a post-clone error signal.
  5. Configurable, interruptible resume. Resume behaviour is preserved but made safer: the number of attempts and the interval between them are configurable via GIT_LFS_RESUME_ATTEMPTS and GIT_LFS_RESUME_INTERVAL_SECONDS (initialized once via INIT_ONCE), and cancel checks are repeated around resume retries and the fallback path.

Ovms Libgit2.cpp Changes:

  1. End-to-end cancellation propagation. Server shutdown signaling is plumbed all the way into the clone and LFS download paths via libgit2::isCloneCancellationRequestedFromServer() and the exported git_lfs_shutdown_requested(void) C entry point, so Ctrl+C aborts both git_clone and ongoing LFS transfers (transfer/sideband/update-tips/checkout-notify callbacks all return -1 on cancel via the shared RETURN_IF_OVMS_CLONE_CANCELLED macro).
  2. Reliable resume after interruption. Resume logic was redesigned to re-trigger the LFS smudge filter for partially downloaded files: it repairs the index entry from the HEAD tree (an aborted clone may leave the index entry missing), removes the stale worktree pointer file, and forces git_checkout_head with GIT_CHECKOUT_FORCE | GIT_CHECKOUT_RECREATE_MISSING for the affected paths. A complementary path restores tracked non-LFS files that were missing after a crash, then normalizes the index against HEAD.
    .lfswip work-in-progress marker. A sibling marker file (.lfswip) is created before clone/resume and removed only on a clean finish. Its presence on a subsequent run indicates a prior interruption and switches the pull into resume mode (including a HEAD-tree scan for tracked files missing from the worktree).
    lfs_error.txt handling. After clone or resume, OVMS reads any lfs_error.txt left by the patch, logs its contents at error level, removes the file, and surfaces a dedicated status to the caller.
  3. Shared shutdown-state module (src/shutdown_state.{hpp,cpp}). New module exposes getShutdownRequestValue / setShutdownRequestValue, signal-safe setSignalShutdownRequested plus a main-thread processSignalShutdownRequest, decoupling pull-module cancellation checks from Server internals and satisfying Bazel dependency boundaries (the pull module no longer needs to depend on the server target).
  4. New status codes & tests.
    HF_GIT_CLONE_CANCELLED and HF_GIT_LIBGIT2_LFS_DOWNLOAD_FAILED added (alongside the existing HF_GIT_LIBGIT2_NOT_INITIALIZED).
  5. New tests in pull_hf_model_test.cpp cover resume after a server-shutdown request (HfPull.ResumeShutdown), resume after SIGKILL/TerminateProcess (HfPull.ResumeTerminate), shutdown-driven cancellation (HfPull.CloneCancelledByShutdownRequest), no-op re-pull, partial-resume, user-removed and user-edited file scenarios on a cached repository (HfPullCache.RePull/Resume/UserRemoved/UserEdited), and the LFS-resume option capture tests.
  6. New tests in libgit2_test.cpp document the .lfswip marker behavior (LibGit2LfsWipMarker.*).
    Third-party libgit2 integration updates. Bumped the upstream libgit2 commit (libgit2_engine.bzl) and significantly expanded lfs.patch.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.
    ``

Copilot AI review requested due to automatic review settings April 1, 2026 11:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a safeguard to detect Git LFS pointer files left behind after Hugging Face model downloads (clone or resume) and surfaces a dedicated status code when this occurs.

Changes:

  • Introduces a new StatusCode for LFS download failures.
  • Adds status message mapping for the new status code.
  • Extends HfDownloader::downloadModel() to scan for LFS pointer files after clone/resume and fail fast if any are found.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/status.hpp Adds a new status code for LFS download failure.
src/status.cpp Adds a human-readable message for the new status code.
src/pull_module/libgit2.cpp Detects remaining LFS pointer files after clone/resume and returns the new failure status.

Comment thread src/status.hpp Outdated
Comment thread src/pull_module/libgit2.cpp Outdated
Comment thread src/pull_module/libgit2.cpp Outdated
Comment thread src/pull_module/libgit2.cpp Outdated
@rasapala rasapala requested review from atobiszei and dtrawins April 1, 2026 12:54
Comment thread src/pull_module/libgit2.cpp
Comment thread src/pull_module/libgit2.cpp Outdated
Comment thread src/pull_module/libgit2.cpp Outdated
Comment thread src/pull_module/libgit2.cpp Outdated
Comment thread src/pull_module/libgit2.cpp Outdated
Comment thread src/pull_module/libgit2.cpp
Comment thread src/pull_module/libgit2.cpp
Comment thread src/pull_module/libgit2.cpp Outdated
@atobiszei atobiszei self-requested a review April 30, 2026 11:48
Comment thread src/test/pull_hf_model_test.cpp
Comment thread src/test/pull_hf_model_test.cpp
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

third_party/libgit2/lfs.patch:435

  • The new lfs_filter.c being introduced by this patch starts with a UTF-8 BOM and the license block lines are prefixed with +/ instead of the usual *. The BOM (the invisible character before /*) can break compilation on some toolchains or cause lint/style checks to fail. Please remove the BOM and normalize the comment to a standard C block comment format.
+/*
+/ Copyright 2025 Intel Corporation
+/
+/ Licensed under the Apache License, Version 2.0 (the "License");
+/ you may not use this file except in compliance with the License.

Comment thread src/test/pull_hf_model_test.cpp
Comment thread src/test/pull_hf_model_test.cpp Outdated
Comment thread src/server.cpp
@dkalinowski dkalinowski requested a review from atobiszei April 30, 2026 13:18
@rasapala rasapala changed the title Add check for lfs pointer files Hf pull process-wide cancellation API with terminate handling. May 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants