-
Notifications
You must be signed in to change notification settings - Fork 83
[3/n] [ls-apis] track sled-agent-rack-setup as a separate subcomponent #10459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sunshowers/spr/main.3n-ls-apis-track-sled-agent-rack-setup-as-a-separate-subcomponent
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -149,6 +149,62 @@ This tool basically works as follows: | |
| . Then, it loads and validates the developer-maintained metadata (`api-manifest.toml`). | ||
| . Then, it applies whatever filter has been selected and prints out whatever information was asked for. | ||
|
|
||
| === Deployment units | ||
|
|
||
| The upgrade DAG's nodes are _deployment units_: sets of Rust packages that are always shipped and updated together. For example, Sled Agent, Propolis, and the switch-zone components all update as one unit (the host OS image), so they form a single deployment unit. | ||
|
|
||
| Deployment units and their packages are declared in `api-manifest.toml`: | ||
|
|
||
| ```toml | ||
| [[deployment_units]] | ||
| name = "Host OS" | ||
| packages = [ | ||
| "omicron-sled-agent", | ||
| "propolis-server", | ||
| # ...switch zone components... | ||
| ] | ||
| ``` | ||
|
|
||
| Each entry in `packages` is a _top-level package_: a Rust binary shipped in that deployment unit. ls-apis walks each top-level package's Cargo dependencies to discover the APIs it produces and consumes. | ||
|
|
||
| ==== Subcomponents | ||
|
|
||
| Sometimes one binary contains a body of code whose API dependencies are best tracked separately from the rest of it. The motivating case is the Rack Setup Service (RSS): it lives in the `sled-agent-rack-setup` library crate, which is linked into the `omicron-sled-agent` binary, but it runs only once (at rack initialization) and has different semantics from the rest of the Sled Agent. | ||
|
|
||
| A _subcomponent_ is such a library, tracked as an API consumer in its own right: | ||
|
|
||
| ```toml | ||
| [[deployment_units]] | ||
| name = "Host OS" | ||
| packages = [ "omicron-sled-agent", ... ] | ||
| subcomponents = [ | ||
| { name = "sled-agent-rack-setup", inside = "omicron-sled-agent", lifecycle = "rack-init" }, | ||
| ] | ||
| ``` | ||
|
|
||
| `inside` names the top-level package (in the same deployment unit) that links the subcomponent. When ls-apis walks that package's dependencies, the subcomponent's own crate is skipped, so any API dependency reachable only through the subcomponent is attributed to the subcomponent instead of to the containing binary. (A dependency the binary also reaches by another path is attributed to both.) The subcomponent is then walked separately, as its own consumer. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Optional nit. What about |
||
|
|
||
| A subcomponent is assumed not to host any Dropshot APIs of its own, so subcomponents are not walked as API producers. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would happen if it did host a Dropshot API of its own?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we'd want to update the ls-apis tooling at that point. ls-apis is trying to match the complexity of the Oxide system over being too generic, so in general we've built features as we've needed them. |
||
|
|
||
| ==== Lifecycle | ||
|
|
||
| A component's `lifecycle` describes when its code runs: | ||
|
|
||
| [cols="1,3",options="header"] | ||
| |=== | ||
| |Lifecycle | ||
| |Meaning | ||
|
|
||
| |`steady-state` | ||
| |The component runs during normal operation. Its API dependencies are part of the upgrade DAG. This is the default. | ||
|
|
||
| |`rack-init` | ||
| |The component runs only at rack initialization, when the whole rack is at a single consistent version. It therefore cannot experience version skew during an online upgrade, so its API dependencies are excluded from the upgrade DAG and from cycle checks. | ||
|
|
||
| |=== | ||
|
|
||
| `lifecycle` may currently be set only on a subcomponent; every top-level package is treated as `steady-state`. | ||
|
|
||
| The filtering is a little complicated but very important! | ||
|
|
||
| === The purpose of filtering | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,6 +80,21 @@ packages = [ | |
| "tfportd", | ||
| "wicketd", | ||
| ] | ||
| # sled-agent-rack-setup is a library inside the omicron-sled-agent binary. It is | ||
| # listed here as a subcomponent so its API dependencies (which only matter | ||
| # during RSS) are reported separately from the steady-state dependencies of | ||
| # omicron-sled-agent. Setting `inside = "omicron-sled-agent"` means that when | ||
| # ls-apis walks omicron-sled-agent's Cargo dependencies, sled-agent-rack-setup | ||
| # itself is skipped, so API dependencies reachable only through it are | ||
| # attributed to the subcomponent rather than to omicron-sled-agent. | ||
| # | ||
| # The `rack-init` lifecycle marks this subcomponent as running only during rack | ||
| # initialization. Edges from `rack-init` components are excluded from the | ||
| # upgrade DAG, so that sled-agent-rack-setup doesn't impose any ordering | ||
| # constraints on the upgrade DAG. | ||
| subcomponents = [ | ||
| { name = "sled-agent-rack-setup", inside = "omicron-sled-agent", lifecycle = "rack-init" }, | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking for feedback on the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I really like this tbh. Having it explicitly set helped me understand what was going on. Coming to this PR with very little knowledge of how ls-apis worked, I can say that not having it would have made the functionality much harder to grasp.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I agree, making it explicit is much more approachable. ls-apis is necessarily quite complex and it's good to try and keep the complexity budget down. |
||
| ] | ||
|
|
||
| # Installinator gets packaged into its own host OS image. | ||
| [[deployment_units]] | ||
|
|
@@ -389,10 +404,12 @@ server_package_name = "nexus-lockstep-api" | |
| versioned_how = "client" | ||
| versioned_how_reason = "Circular dependencies between Nexus and other services" | ||
| # Exactly these consumers (Rust packages specified in one or more of the | ||
| # `packages` arrays in the deployment_units table above) are allowed to use this | ||
| # API. ls-apis check will error out if the list of consumers is incorrect. | ||
| # `packages` arrays in the deployment_units table above, or via `subcomponents` | ||
| # entries in the same table) are allowed to use this API. ls-apis check will | ||
| # error out if the list of consumers is incorrect. | ||
| restricted_to_consumers = [ | ||
| { name = "omicron-sled-agent", reason = "Only RSS and sled-agent-sim use this API, neither of which is part of upgrade" }, | ||
| { name = "omicron-sled-agent", reason = "Only sled-agent-sim uses this API, which is not part of upgrade" }, | ||
| { name = "sled-agent-rack-setup", reason = "Only RSS uses this API, which is not part of upgrade" }, | ||
| ] | ||
|
|
||
| [[apis]] | ||
|
|
@@ -548,24 +565,6 @@ Sled Agent uses a Sled Agent client for the `zone-bundle` binary, which is not | |
| deployed. | ||
| """ | ||
|
|
||
| [[dependency_filter_rules]] | ||
| ancestor = "sled-agent-rack-setup" | ||
| client = "cockroach-admin-client" | ||
| evaluation = "not-deployed" | ||
| note = """ | ||
| The Cockroach Admin client is only used during RSS, which we don't care about | ||
| for the purpose of upgrade. | ||
| """ | ||
|
|
||
| [[dependency_filter_rules]] | ||
| ancestor = "sled-agent-rack-setup" | ||
| client = "dns-service-client" | ||
| evaluation = "not-deployed" | ||
| note = """ | ||
| The DNS service client is only used during RSS, which we don't care about for | ||
| the purpose of upgrade. | ||
| """ | ||
|
|
||
| [[dependency_filter_rules]] | ||
| ancestor = "transient-dns-server" | ||
| client = "dns-service-client" | ||
|
|
@@ -799,20 +798,22 @@ note = "Sled Agent uses Client::localhost() to connect to ddm." | |
| permalinks = [ | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/ddm_reconciler.rs#L56", | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/bootstrap/bootstore_setup.rs#L92", | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/rack_setup/service.rs#L1086", | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/sled_agent.rs#L1378", | ||
| ] | ||
|
|
||
| [[intra_deployment_unit_only_edges]] | ||
| server = "omicron-sled-agent" | ||
| client = "mg-admin-client" | ||
| note = """ | ||
| Sled Agent only queries the switch zone on the same sled, | ||
| which is within the same deployment unit as the global | ||
| zone. | ||
| Sled Agent invokes EarlyNetworkSetup::init_switch_config | ||
| (services.rs:3431) to configure uplinks on the local switch zone. This in turn | ||
| constructs an MgdClient pointed at the local switch zone | ||
| (early-networking/lib.rs:442), which is within the same deployment unit as the | ||
| global zone. | ||
| """ | ||
| permalinks = [ | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/bootstrap/early_networking.rs#L483", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/src/services.rs#L3431", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/early-networking/src/lib.rs#L442", | ||
| ] | ||
|
|
||
| [[intra_deployment_unit_only_edges]] | ||
|
|
@@ -835,39 +836,47 @@ server = "omicron-sled-agent" | |
| client = "gateway-client" | ||
| note = """ | ||
| BUG: Sled Agent creates two MGS clients. One of them | ||
| (early_networking.rs:376) queries the switch zone on | ||
| the same sled, which is within the same deployment unit | ||
| as the global zone, so this is okay. | ||
| (early-networking/lib.rs:331, called from services.rs:3431) queries the | ||
| switch zone on the same sled, which is within the same deployment unit as the | ||
| global zone, so this is okay. | ||
|
|
||
| The other one (early_networking.rs:277) queries both | ||
| switch zones, so it crosses deployment units. This needs | ||
| to be fixed. | ||
| The other one (early-networking/lib.rs:259, called from services.rs:1095) | ||
| queries both switch zones, so it crosses deployment units. This needs to be | ||
| fixed. | ||
|
|
||
| Reference: https://github.com/oxidecomputer/omicron/issues/9708 | ||
| """ | ||
| permalinks = [ | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/bootstrap/early_networking.rs#L376", | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/bootstrap/early_networking.rs#L277-L280", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/src/services.rs#L3431", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/early-networking/src/lib.rs#L331", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/src/services.rs#L1095-L1102", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/early-networking/src/lib.rs#L259-L262", | ||
| ] | ||
|
|
||
| [[intra_deployment_unit_only_edges]] | ||
| server = "omicron-sled-agent" | ||
| client = "dpd-client" | ||
| note = """ | ||
| BUG: Sled Agent creates two DPD clients. One of them (early_networking.rs:419) | ||
| is always to the switch zone on the same sled, which is in the same deployment | ||
| unit. | ||
| BUG: Sled Agent creates two DPD clients. One of them | ||
| (early-networking/lib.rs:378, called from services.rs:3431) is always to the | ||
| switch zone on the same sled, which is in the same deployment unit. | ||
|
|
||
| The other one (services.rs:1090) queries both switch zones, so it crosses | ||
| deployment units. This needs to be fixed. | ||
| The other one (services.rs:1104, iterating addresses returned by | ||
| EarlyNetworkSetup::lookup_uplinked_switch_zone_underlay_addrs) queries both | ||
| switch zones, so it crosses deployment units. This needs to be fixed. | ||
|
|
||
| Reference: https://github.com/oxidecomputer/omicron/issues/9708 | ||
| """ | ||
| permalinks = [ | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/bootstrap/early_networking.rs#L419", | ||
| "https://github.com/oxidecomputer/omicron/blob/6cef874/sled-agent/src/services.rs#L1090", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/src/services.rs#L3431", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/early-networking/src/lib.rs#L378", | ||
| "https://github.com/oxidecomputer/omicron/blob/32de20c/sled-agent/src/services.rs#L1104-L1115", | ||
| ] | ||
|
|
||
| # Note: sled-agent-rack-setup (RSS) is marked with `lifecycle = "rack-init"` | ||
| # in its subcomponent declaration, so its API consumption is excluded from the | ||
| # upgrade DAG entirely. No IDU annotation is needed for edges from RSS. | ||
|
Comment on lines
+876
to
+878
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🎉 |
||
|
|
||
| [[intra_deployment_unit_only_edges]] | ||
| server = "tfportd" | ||
| client = "dpd-client" | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I think of the concept of a "subcomponent", what comes to mind is a piece of a unit, sort of what
packagesdoes. I don't really think of a piece of a unit that should be treated differently. This terminology caused a bit of confusion for me and made it harder to understand what was going on here. At first glance it gave me the impression you should list all API dependencies or something like that.Perhaps we could use another term? Something like
tracked_subcomponentswhere it's clear that they're something that should be tracked separately? Or perhapsauxiliary_subcomponents?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see what you mean. What about "library component" to indicate that this is a library that should be reasoned about separately? Or is that too general for you?