feat(repo): add scripts to synthesize and consume azl repodata #17139
reubeno wants to merge 1 commit into
Conversation
Contributor
This looks good Reuben, thanks! Can you please add a few workflow examples and use cases? Mapping each command-line option to what it does took some effort. That could be a me problem, but examples can help quite a bit.
Adds three new tools under `scripts/repo/` for working with Azure Linux package repositories:

* `synthesize-repodata.py` — given one or more upstream repo prefixes (Standard Azure Linux Repo Layout: per-channel main/debuginfo/srpms sub-repos) and/or explicit per-repo overrides, synthesize a fresh set of per-destination repodata trees that route each package to its intended channel based on azldev component metadata. Local repo overrides take precedence over upstream when NEVRAs collide (CLI order is preserved). The output is a static directory tree of standard `createrepo_c` repodata with absolute upstream URLs in package locations, so the synthesized repodata can be served from anywhere without needing to mirror the RPM content.
* `dnf-with-azl-repos` — thin wrapper around `dnf` that probes one or more URL prefixes for the Standard Azure Linux Repo Layout, enables every sub-repo it discovers (silently skipping ones that don't exist), and execs `dnf` with those repos added on the command line.
* `_repo_layout.py` — shared definition of the Standard Azure Linux Repo Layout (channels, sub-repo kinds, per-kind URL template) consumed by both scripts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
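To make the layout concrete, the channel × kind matrix that `_repo_layout.py` encodes can be sketched as below. The channel names, constant names, and URL template here are assumptions for illustration, not the actual module contents; only the "six sub-repos, three kinds per channel" shape comes from the PR.

```python
from dataclasses import dataclass

CHANNELS = ("base", "extended")  # assumed channel names
KIND_MAIN = "main"
KIND_DEBUGINFO = "debuginfo"
KIND_SRPMS = "srpms"
KINDS = (KIND_MAIN, KIND_DEBUGINFO, KIND_SRPMS)


@dataclass(frozen=True)
class SubRepo:
    """One cell of the channel x kind matrix."""
    channel: str
    kind: str

    def url(self, prefix: str, arch: str) -> str:
        # Assumed per-kind URL template: every sub-repo keyed by
        # channel/kind/arch under a common prefix.
        return f"{prefix.rstrip('/')}/{self.channel}/{self.kind}/{arch}"


# 2 channels x 3 kinds = the six standard sub-repos.
SUBREPOS = tuple(SubRepo(c, k) for c in CHANNELS for k in KINDS)
print(len(SUBREPOS))  # 6
```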
Contributor
Pull request overview
Adds three new scripts under scripts/repo/ for synthesizing Azure Linux per-channel/per-arch repodata trees from upstream RPM repositories and for invoking dnf against discovered Azure Linux repos. The scripts share a common layout definition (_repo_layout.py) that encodes the fixed channel × kind × arch matrix.
Changes:
- New `synthesize-repodata.py` that fetches upstream repodata, queries `azldev package list` to assign packages to channels (with sibling-rpm inheritance fallback), and emits routed `createrepo_c` repodata referencing the original upstream URLs.
- New `dnf-with-azl-repos` wrapper that probes URL prefixes for the standard layout (silently skipping 404s) and execs `dnf` with the discovered sub-repos enabled.
- New `_repo_layout.py` shared module defining the standard `CHANNELS`, `KIND_*` constants, and `SUBREPOS` table.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/repo/synthesize-repodata.py | Main synth tool: download repodata, build NEVRA universe, query azldev, decide routing per package, emit per-destination repodata + unpublished/fallback reports. |
| scripts/repo/dnf-with-azl-repos | Thin dnf wrapper: HEAD-probe sub-repos under each --repo-prefix, build --repofrompath/--enablerepo args, exec dnf. |
| scripts/repo/_repo_layout.py | Shared constants/dataclass describing the six standard sub-repos. |
Comment on lines +315 to +323

```python
    We pull primary/filelists/other for the package universe AND every
    auxiliary record (updateinfo, group, modules, ...) so phase 6 can
    copy non-package metadata through to routed destinations.

    Returns the path to the dir containing ``repodata/``, or None if
    the repo's ``repomd.xml`` returned 404 and *repo* was prefix-derived
    (silent skip). Other HTTP errors and explicit-origin 404s raise.
    """
```
Comment on lines +121 to +159

```python
def probe_repo(probe_url: str, *, timeout: float = PROBE_TIMEOUT) -> tuple[str, str | None]:
    """HEAD ``<probe_url>/repodata/repomd.xml``.

    Returns ``(_PROBE_OK, None)`` on 2xx (or successful non-HTTP
    responses such as ``file://``), ``(_PROBE_MISSING, None)`` on 404,
    and ``(_PROBE_FAIL, "...")`` on any other transport error or
    non-2xx HTTP status. The error string is suitable for inclusion in
    a fatal-error message so the user can see the underlying cause.
    """
    url = f"{probe_url.rstrip('/')}/repodata/repomd.xml"
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": USER_AGENT}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # ``status`` is the HTTP status code for http(s); for
            # ``file://`` and other non-HTTP schemes urllib's response
            # has no status attribute -- a successful urlopen there
            # already proved the file exists.
            status = getattr(resp, "status", None)
            if status is None or 200 <= status < 300:
                return _PROBE_OK, None
            return _PROBE_FAIL, f"HTTP {status}"
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return _PROBE_MISSING, None
        return _PROBE_FAIL, f"HTTP {e.code}"
    except urllib.error.URLError as e:
        # urllib wraps a `file://` ENOENT as URLError(FileNotFoundError);
        # treat that as MISSING so local fixtures behave like the HTTP 404
        # case.
        if isinstance(e.reason, FileNotFoundError):
            return _PROBE_MISSING, None
        return _PROBE_FAIL, f"URL error: {e.reason}"
    except TimeoutError:
        return _PROBE_FAIL, f"timed out after {timeout:.0f}s"
    except OSError as e:
        return _PROBE_FAIL, f"OS error: {e}"
```
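A stripped-down, self-contained restatement of the probe classification can be exercised against local `file://` fixtures, which (as the docstring notes) behave like the HTTP case: ENOENT maps to MISSING. The `probe` helper and its sentinel strings here are simplifications, not the PR's actual function.

```python
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


def probe(probe_url: str) -> str:
    """Classify <probe_url>/repodata/repomd.xml as ok / missing / fail."""
    url = f"{probe_url.rstrip('/')}/repodata/repomd.xml"
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req):
            return "ok"
    except urllib.error.HTTPError as e:
        return "missing" if e.code == 404 else "fail"
    except urllib.error.URLError as e:
        # file:// ENOENT arrives as URLError(FileNotFoundError).
        return "missing" if isinstance(e.reason, FileNotFoundError) else "fail"


# Build a local fixture with a repodata/repomd.xml and probe it.
root = Path(tempfile.mkdtemp())
(root / "repodata").mkdir()
(root / "repodata" / "repomd.xml").write_text("<repomd/>")
print(probe(root.as_uri()))              # ok
print(probe((root / "nope").as_uri()))   # missing
```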
Comment on lines +209 to +211

```python
        if found_here == 0 and not failures:
            log(f"{PROG}: warning: no repos discovered under {prefix_trim}")
        total_found += found_here
```
Comment on lines +199 to +216

```python
    else:
        # No $basearch: caller is asserting "this URL is for one specific
        # arch". We can't tell which from the URL alone, so we infer from the
        # last path component if it matches a known arch; otherwise refuse.
        # Strip query/fragment first so signed URLs (`...?sig=...`) don't
        # poison the inference.
        parts = urllib.parse.urlsplit(url)
        path = parts.path.rstrip("/")
        last = path.rsplit("/", 1)[-1] if path else ""
        if last in arches:
            out.append(InputRepo(kind, last, url.rstrip("/"), "explicit"))
        else:
            raise ValueError(
                f"--repo {spec!r}: URL has no `$basearch` and its final path "
                f"component {last!r} is not a known arch ({', '.join(arches)}); "
                f"cannot determine arch"
            )
    return out
```
Comment on lines +358 to +378

```python
        for record in repomd.records:
            # Only fetch the records we'll actually consume (primary,
            # filelists, other, plus their _db variants). See
            # PACKAGE_RECORD_TYPES above for why we skip aux records.
            if record.type not in PACKAGE_RECORD_TYPES:
                continue
            href = record.location_href or ""
            if not href:
                continue
            url = urllib.parse.urljoin(base, href)
            # Constrain the cache destination path so a hostile/malformed
            # repomd can't write outside cache_dir.
            safe_rel = href.lstrip("/")
            if ".." in Path(safe_rel).parts:
                raise RuntimeError(
                    f"refusing to write metadata record outside cache: {href!r}"
                )
            dest = cache_dir / safe_rel
            log(f"  fetching {url}")
            _http_get(url, dest, ssl_context)
        return cache_dir
```
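The traversal guard above can be exercised on its own. This standalone sketch (the helper name is hypothetical) shows why checking `Path(...).parts` for a `..` component is stricter than a substring test: a filename that merely contains dots passes, while a real `..` path component is rejected.

```python
from pathlib import Path


def safe_cache_path(cache_dir: Path, href: str) -> Path:
    """Map a repomd location_href to a path under cache_dir,
    rejecting any component-level `..` traversal."""
    rel = href.lstrip("/")
    if ".." in Path(rel).parts:
        raise RuntimeError(f"refusing to write outside cache: {href!r}")
    return cache_dir / rel


cache = Path("/tmp/cache")
print(safe_cache_path(cache, "repodata/primary.xml.gz"))
# A name merely containing dots is fine; only a `..` component trips it:
print(safe_cache_path(cache, "repodata/a..b.xml"))
try:
    safe_cache_path(cache, "../outside.xml")
except RuntimeError as e:
    print(e)
```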
Comment on lines +120 to +158 (quotes the same `probe_repo` excerpt as the comment on lines +121 to +159 above)
Comment on lines +1192 to +1193

```python
    routing = query_azldev(
        args.repo_root, src_map, output_dir, known_components
```
Comment on lines +1105 to +1106

```python
        f"Arch to expand `$basearch` into (default: "
        f"{', '.join(DEFAULT_ARCHES)}). Repeatable."
```
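The `$basearch` expansion this repeatable option controls can be sketched as follows; the `DEFAULT_ARCHES` value and the helper name are assumptions, not the PR's actual implementation.

```python
DEFAULT_ARCHES = ("x86_64", "aarch64")  # assumed default arch set


def expand_basearch(url: str, arches=DEFAULT_ARCHES) -> list[str]:
    """Expand a `$basearch` placeholder into one concrete URL per arch;
    URLs without the placeholder pass through unchanged."""
    if "$basearch" in url:
        return [url.replace("$basearch", a) for a in arches]
    return [url]


print(expand_basearch("https://example.invalid/prod/base/$basearch"))
# ['https://example.invalid/prod/base/x86_64', 'https://example.invalid/prod/base/aarch64']
```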