Skip to content

Add unit tests and fix bugs in upstream_tools.py#541

Open
opohorel wants to merge 8 commits into
packit:mainfrom
opohorel:upstream_tools_tests
Open

Add unit tests and fix bugs in upstream_tools.py#541
opohorel wants to merge 8 commits into
packit:mainfrom
opohorel:upstream_tools_tests

Conversation

@opohorel
Copy link
Copy Markdown
Collaborator

@opohorel opohorel commented May 29, 2026

Adds comprehensive unit tests for ExtractUpstreamRepositoryTool URL parsing and related tools, exposing and fixing several latent bugs:

Bug fixes

  • rstrip('.patch') corrupting target refsrstrip removes individual characters, not a suffix. A ref like v1.2.3 would lose the trailing 3. Replaced with removesuffix('.patch').
  • GitLab MR URLs with nested project paths — The regex only captured owner/repo, failing on paths like namespace/sub/project. Now captures the full project path. Also makes /-/ optional and strips .git suffixes to prevent double extensions.
  • GitLab compare URLs with nested project paths — Same nested path issue for /compare/ URLs. Also uses domain-based detection (github in netloc) instead of fragile /-/ path check to distinguish GitHub from GitLab.
  • cgit p= query param parsing — The regex required ? or & before p=, failing when p= was the first (and only) query parameter.
  • kernel.org cgit URL crash — cgit URLs with the repo in the path (e.g. /pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=...) crashed with an unbound variable because no p= parameter exists. Falls back to extracting the repo path from the URL path.
  • CherryPickContinueTool crash on clean working directorygit status --porcelain returns empty output for a clean working directory, but the tool treated empty stdout as an error, preventing the CHERRY_PICK_HEAD check from ever running.

Tests

  • 37 unit tests covering GitHub, GitLab, cgit, kernel.org, Pagure URL parsing
  • Tests for CherryPickContinueTool and ApplyDownstreamPatchesTool
  • Async tool execution with mocked subprocess and HTTP calls

opohorel added 2 commits May 29, 2026 13:23
Cover all 6 tool classes: ExtractUpstreamRepositoryTool (URL parsing for
GitHub/GitLab commits, PRs, MRs, compare URLs, cgit), CloneUpstreamRepositoryTool,
FindBaseCommitTool, ApplyDownstreamPatchesTool, CherryPickCommitTool, and
CherryPickContinueTool.

Includes tests that expose existing issues with rstrip('.patch'), nested
GitLab path regex, and cgit query param parsing.

Assisted-by: Cursor (Claude)
rstrip() removes characters from a set, not a literal suffix.
A target_ref like "some-branch-path" would be mangled to "some-branch-"
because 'p','a','t','c','h','.' are all in the character set.

Use removesuffix() which only strips the exact literal suffix.

Assisted-by: Cursor (Claude)
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive unit test suite for upstream tools and refactors URL parsing logic in upstream_tools.py to support nested project paths (such as in GitLab) and improve cgit-style URL extraction. Feedback on the changes suggests several robustness improvements: addressing a bug in CherryPickContinueTool where a clean git status incorrectly raises an error, making the /-/ separator optional in GitLab MR URLs, stripping .git suffixes from matched project paths to prevent double extensions, and anchoring the regular expression for cgit-style repository path matching.

Comment thread ymir/tools/unprivileged/tests/unit/test_upstream_tools.py Outdated
Comment thread ymir/tools/unprivileged/upstream_tools.py
Comment thread ymir/tools/unprivileged/upstream_tools.py
Comment thread ymir/tools/unprivileged/upstream_tools.py Outdated
Comment thread ymir/tools/unprivileged/upstream_tools.py Outdated
The regex captured only the last two path segments as owner/repo.
For deeply nested GitLab paths like "redhat/centos-stream/rpms/bind",
it extracted owner="rpms", repo="bind" which produced wrong API URLs.

Use a greedy path capture to get the full project path before /-/.

Assisted-by: Cursor (Claude)
@opohorel opohorel force-pushed the upstream_tools_tests branch from 0d4dd4d to a41f075 Compare May 29, 2026 11:38
@opohorel
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive unit test suite for upstream tools and improves URL parsing in upstream_tools.py to support deeply nested GitLab project paths and cgit-style repository paths. It also removes an unnecessary stdout check during cherry-pick continuation. Feedback suggests making the distinction between GitHub and GitLab compare URLs more robust by checking the domain name instead of relying on the absence of /-/ in the URL path.

Comment thread ymir/tools/unprivileged/upstream_tools.py Outdated
opohorel added 5 commits May 29, 2026 13:49
Same issue as the MR regex: the compare regex only captured the last
two path segments as owner/repo. For deeply nested GitLab paths it
would construct wrong API URLs.

Use a single greedy capture for the full project path.

Assisted-by: Cursor (Claude)
urlparse strips the leading ? from query strings, so the regex
[?&]p= never matched p= at the start. Use (?:^|[?&])p= to also
match at the beginning of the string.

Assisted-by: Cursor (Claude)
URLs like git.kernel.org/.../linux.git/commit/?id=<hash> have the repo
path in the URL path and the commit hash in ?id=. The query param
handler only looked for ?p= to find the repo, so these URLs crashed
with "Could not extract repository path".

Fall back to extracting the repo path from the URL path by matching
everything up to .git when ?p= is absent.

Assisted-by: Cursor (Claude)
Verify that the constructed API URLs contain the correctly
percent-encoded full project path (redhat%2Fcentos-stream%2Frpms%2Fbind)
rather than just checking the output repo_url.

Assisted-by: Cursor (Claude)
git status --porcelain returns empty output for a clean working
directory (exit_code=0, stdout=""). The tool incorrectly treated
None/empty stdout as an error, preventing the cherry-pick state
check from ever reaching the CHERRY_PICK_HEAD validation.

Assisted-by: Cursor (Claude)
@opohorel opohorel force-pushed the upstream_tools_tests branch from a41f075 to 822bb1b Compare May 29, 2026 11:50
Copy link
Copy Markdown
Member

@nforro nforro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -0,0 +1,868 @@
import subprocess
from unittest.mock import AsyncMock, MagicMock, patch
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to use flexmock? Combining different mocking solutions is never a good idea.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at that point only agents would be able to touch those tests; no sane human would be able to deal with multiple mocking mechanisms 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants