Store xDS resources as YAML instead of JSON#1321
Conversation
Motivation: - xDS resources were stored as JSON files. YAML is more human-readable and is the preferred format for configuration in the Kubernetes/Envoy ecosystem. Modifications: - `XdsResourceManager`: `fileName()` now returns `.yaml`; `push()` uses `Change.ofYamlUpsert()`; `updateOrDelete()` falls back to the legacy `.json` path when the `.yaml` file is absent; `update()` atomically removes the old `.json` and creates the new `.yaml` in a single commit. - `XdsResourceWatchingService`: changed `handleXdsResource()` signature from `String contentAsText` to `JsonNode content`; both call sites now pass `entry.content()` / `change.content()` directly instead of `contentAsText()`. Also accepts `EntryType.YAML` and handles `UPSERT_YAML`. Result: - Newly created xDS resources are stored as `.yaml` files. - Existing `.json` files remain readable; they are atomically migrated to `.yaml` on the first update.
📝 WalkthroughWalkthroughThis PR migrates XDS resource storage from ChangesXDS YAML Migration
Sequence Diagram(s)sequenceDiagram
participant Client
participant XdsService as XdsCluster/Endpoint/ListenerService
participant XdsResourceManager
participant Repository
Client->>XdsService: createResource(group, id, proto)
XdsService->>XdsResourceManager: push(group, name, .yaml path, proto)
XdsResourceManager->>Repository: find(*.{yaml,json}) glob
Repository-->>XdsResourceManager: absent / legacy .json found
alt absent → create new
XdsResourceManager->>Repository: upsert id.yaml
else legacy .json exists → conflict (409)
XdsResourceManager-->>Client: ALREADY_EXISTS
end
Client->>XdsService: updateResource(group, id, proto)
XdsService->>XdsResourceManager: update(group, name, proto)
XdsResourceManager->>Repository: find(*.{yaml,json}) glob via updateOrDelete
Repository-->>XdsResourceManager: foundFileName (.yaml or .json)
XdsResourceManager->>Repository: upsert id.yaml [+ remove legacy id.json]
Repository-->>XdsResourceManager: committed
XdsResourceManager-->>Client: OK
Note over XdsResourceManager,Repository: Watch pipeline receives diff
XdsResourceManager->>XdsResourceWatchingService: UPSERT_YAML / UPSERT_JSON diff entry
XdsResourceWatchingService->>ControlPlaneService: handleXdsResource(path, JsonNode)
ControlPlaneService->>ControlPlaneService: merge into Envoy proto builder via content.traverse()
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
xds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointUpdateScheduler.java (1)
180-198: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winMake the YAML-vs-JSON choice deterministic.
FIND_ONE_WITHOUT_CONTENToverbase.*can return the legacy.jsoneven when the.yamlcounterpart also exists, so this path may transform stale JSON and upsert it over the YAML content. Fetch both matches without content and prefer.yaml; only migrate.jsonwhen no YAML entry exists.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@xds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointUpdateScheduler.java` around lines 180 - 198, The selection logic in XdsEndpointUpdateScheduler.handle currently relies on FIND_ONE_WITHOUT_CONTENT over baseFileName + ".*", which can nondeterministically pick the legacy .json entry even when a .yaml counterpart exists. Update the lookup to fetch both matching entries without content, then explicitly prefer the .yaml path in the branch that calls executeTransform; only fall back to executeTransformWithMigration when no .yaml entry is present and the matched file is .json. Ensure the decision is based on the resolved entry name, not the arbitrary first result from the map.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointUpdateScheduler.java`:
- Around line 248-249: The migration path in XdsEndpointUpdateScheduler
currently lets a failure from BatchUpdateTransformer.apply() abort the returned
stage without closing the copied gRPC observers. Update the callback around
entry.revision() / entry.content() so any exception from apply is caught and all
pending observers are completed or failed before propagating the error, similar
to how Command.transform handles cleanup.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/internal/XdsResourceManager.java`:
- Around line 201-208: The resource lookup in XdsResourceManager is too broad
because baseFileName + ".*" can match unrelated sibling files and cause false
conflicts or wrong update/delete targets. Update the lookup in the
create/update/delete flow around the repository.find call to check only the
exact .yaml resource path and the legacy .json path, using the existing
fileName/baseFileName handling in XdsResourceManager so FIND_ONE_WITHOUT_CONTENT
only considers those two resource candidates.
- Around line 287-288: Retry the push when a legacy-file removal races with
migration: XdsResourceManager.updateOrDelete() can resolve a .json path that is
deleted and replaced by .yaml before executePush() runs, causing
Change.ofRemoval(legacyFileToRemove) to fail. Update the push flow around
executePush(), createLegacyChange, and the legacyFileToRemove handling so it
re-resolves the current resource path and retries with the YAML target instead
of surfacing the stale removal error.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/k8s/v1/XdsKubernetesEndpointFetchingService.java`:
- Around line 201-213: In XdsKubernetesEndpointFetchingService’s endpoint
cleanup flow, the current find/delete path only removes the first matching
endpoint file, so a leftover .json or .yaml counterpart can remain. Update the
repository.find(...).handle(...) logic to collect all matching endpoint paths
returned for the endpointBase prefix instead of using FIND_ONE_WITHOUT_CONTENT
and findFirst(), then call removeEndpointFile for each matched YAML/JSON path so
both legacy formats are cleaned up together.
- Around line 188-193: When an aggregator file is removed, the code in
XdsKubernetesEndpointFetchingService currently closes the corresponding
KubernetesEndpointsUpdater but leaves it in kubernetesEndpointsUpdaters. Update
the removal path to also delete the entry from the map after calling
updater.close(), using the existing aggregatorName key so stale updaters do not
accumulate.
---
Outside diff comments:
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointUpdateScheduler.java`:
- Around line 180-198: The selection logic in XdsEndpointUpdateScheduler.handle
currently relies on FIND_ONE_WITHOUT_CONTENT over baseFileName + ".*", which can
nondeterministically pick the legacy .json entry even when a .yaml counterpart
exists. Update the lookup to fetch both matching entries without content, then
explicitly prefer the .yaml path in the branch that calls executeTransform; only
fall back to executeTransformWithMigration when no .yaml entry is present and
the matched file is .json. Ensure the decision is based on the resolved entry
name, not the arbitrary first result from the map.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bc455d39-0640-4f89-8c2e-49525ea44751
📒 Files selected for processing (25)
it/xds-k8s-node-ip-extractor/src/test/java/com/linecorp/centraldogma/it/xds/k8s/XdsKubernetesNodeIpExtractorTest.javaserver/src/main/java/com/linecorp/centraldogma/server/command/ContentTransformer.javaserver/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/git/AbstractChangesApplier.javaserver/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/git/TransformingChangesApplier.javawebapp/src/dogma/features/xds/K8sAggregatorStatus.tsxwebapp/src/dogma/features/xds/XdsTypes.tswebapp/src/dogma/features/xds/xdsApiSlice.tsxds/src/main/java/com/linecorp/centraldogma/xds/cluster/v1/XdsClusterService.javaxds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointService.javaxds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointUpdateScheduler.javaxds/src/main/java/com/linecorp/centraldogma/xds/internal/CentralDogmaXdsResources.javaxds/src/main/java/com/linecorp/centraldogma/xds/internal/ControlPlaneService.javaxds/src/main/java/com/linecorp/centraldogma/xds/internal/XdsEndpointReadService.javaxds/src/main/java/com/linecorp/centraldogma/xds/internal/XdsResourceManager.javaxds/src/main/java/com/linecorp/centraldogma/xds/internal/XdsResourceWatchingService.javaxds/src/main/java/com/linecorp/centraldogma/xds/k8s/v1/XdsKubernetesEndpointFetchingService.javaxds/src/main/java/com/linecorp/centraldogma/xds/k8s/v1/XdsKubernetesService.javaxds/src/main/java/com/linecorp/centraldogma/xds/listener/v1/XdsListenerService.javaxds/src/main/java/com/linecorp/centraldogma/xds/route/v1/XdsRouteService.javaxds/src/test/java/com/linecorp/centraldogma/xds/internal/XdsEndpointReadPermissionTest.javaxds/src/test/java/com/linecorp/centraldogma/xds/internal/XdsLegacyJsonCompatibilityTest.javaxds/src/test/java/com/linecorp/centraldogma/xds/internal/XdsResourceWatchingServiceTest.javaxds/src/test/java/com/linecorp/centraldogma/xds/internal/XdsWritePermissionTest.javaxds/src/test/java/com/linecorp/centraldogma/xds/k8s/v1/AggregatingMultipleKubernetesTest.javaxds/src/test/java/com/linecorp/centraldogma/xds/k8s/v1/XdsKubernetesServiceTest.java
| final JsonNode newContent = new BatchUpdateTransformer(toRegister, toDeregister) | ||
| .apply(entry.revision(), entry.content()); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Complete pending observers if the local migration transform fails.
Unlike Command.transform, this migration path runs BatchUpdateTransformer.apply() inside the callback before installing a command handler. If it throws, the returned stage fails and the copied gRPC observers are left open.
Suggested fix
- final JsonNode newContent = new BatchUpdateTransformer(toRegister, toDeregister)
- .apply(entry.revision(), entry.content());
+ final JsonNode newContent;
+ try {
+ newContent = new BatchUpdateTransformer(toRegister, toDeregister)
+ .apply(entry.revision(), entry.content());
+ } catch (Throwable t) {
+ copied.forEach(p -> p.streamObserver.onError(t));
+ return null;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| final JsonNode newContent = new BatchUpdateTransformer(toRegister, toDeregister) | |
| .apply(entry.revision(), entry.content()); | |
| final JsonNode newContent; | |
| try { | |
| newContent = new BatchUpdateTransformer(toRegister, toDeregister) | |
| .apply(entry.revision(), entry.content()); | |
| } catch (Throwable t) { | |
| copied.forEach(p -> p.streamObserver.onError(t)); | |
| return null; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/endpoint/v1/XdsEndpointUpdateScheduler.java`
around lines 248 - 249, The migration path in XdsEndpointUpdateScheduler
currently lets a failure from BatchUpdateTransformer.apply() abort the returned
stage without closing the copied gRPC observers. Update the callback around
entry.revision() / entry.content() so any exception from apply is caught and all
pending observers are completed or failed before propagating the error, similar
to how Command.transform handles cleanup.
| final String baseFileName = fileName.substring(0, fileName.length() - 5); | ||
| repository.find(Revision.HEAD, baseFileName + ".*", FIND_ONE_WITHOUT_CONTENT) | ||
| .handle((entries, cause) -> { | ||
| if (cause != null) { | ||
| responseObserver.onError(cause); | ||
| return null; | ||
| } | ||
| if (!entries.isEmpty()) { |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Restrict the lookup to the exact .yaml and legacy .json paths.
baseFileName + ".*" can match unrelated siblings, so creates may falsely conflict and update/delete may operate on a non-resource file chosen by FIND_ONE_WITHOUT_CONTENT.
Proposed fix
- final String baseFileName = fileName.substring(0, fileName.length() - 5);
- repository.find(Revision.HEAD, baseFileName + ".*", FIND_ONE_WITHOUT_CONTENT)
+ final String legacyJsonFileName = toLegacyJsonFileName(fileName);
+ repository.find(Revision.HEAD, fileName + ',' + legacyJsonFileName, FIND_ONE_WITHOUT_CONTENT)
.handle((entries, cause) -> {- final String baseFileName = fileName.substring(0, fileName.length() - 5);
- repository.find(Revision.HEAD, baseFileName + ".*", FIND_ONE_WITHOUT_CONTENT)
+ final String legacyJsonFileName = toLegacyJsonFileName(fileName);
+ repository.find(Revision.HEAD, fileName + ',' + legacyJsonFileName, FIND_ONE_WITHOUT_CONTENT)
.handle((entries, cause) -> {Also applies to: 321-334
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/internal/XdsResourceManager.java`
around lines 201 - 208, The resource lookup in XdsResourceManager is too broad
because baseFileName + ".*" can match unrelated sibling files and cause false
conflicts or wrong update/delete targets. Update the lookup in the
create/update/delete flow around the repository.find call to check only the
exact .yaml resource path and the legacy .json path, using the existing
fileName/baseFileName handling in XdsResourceManager so FIND_ONE_WITHOUT_CONTENT
only considers those two resource candidates.
| executePush(responseObserver, group, resourceName, summary, author, resource, | ||
| change, false, legacyFileToRemove); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Retry the lookup when legacy removal races with migration.
After updateOrDelete() finds .json, another update can remove it and create .yaml before this push runs. That makes Change.ofRemoval(legacyFileToRemove) fail and currently surfaces an error instead of retrying against the now-current YAML path.
Suggested direction
- responseObserver.onError(cause);
+ if (!create && legacyFileToRemove != null &&
+ peeled instanceof ChangeConflictException) {
+ // Re-run the YAML/JSON lookup so a concurrent migration is handled as YAML,
+ // while a concurrent delete still reports NOT_FOUND.
+ // Implement this at the update/delete call site to avoid blindly recreating
+ // a resource that was deleted.
+ }
+ responseObserver.onError(cause);
return null;Also applies to: 299-302, 232-248
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/internal/XdsResourceManager.java`
around lines 287 - 288, Retry the push when a legacy-file removal races with
migration: XdsResourceManager.updateOrDelete() can resolve a .json path that is
deleted and replaced by .yaml before executePush() runs, causing
Change.ofRemoval(legacyFileToRemove) to fail. Update the push flow around
executePush(), createLegacyChange, and the legacyFileToRemove handling so it
re-resolves the current resource path and retries with the YAML target instead
of surfacing the stale removal error.
| if (updaters != null) { | ||
| final KubernetesEndpointsUpdater updater = updaters.get(aggregatorName); | ||
| if (updater != null) { | ||
| updater.close(); | ||
| } | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Remove the updater from the map when the aggregator file is removed.
This currently closes the updater but leaves a stale entry in kubernetesEndpointsUpdaters, which can leak one entry per deleted aggregator.
Suggested fix
if (updaters != null) {
- final KubernetesEndpointsUpdater updater = updaters.get(aggregatorName);
+ final KubernetesEndpointsUpdater updater = updaters.remove(aggregatorName);
if (updater != null) {
updater.close();
}
+ if (updaters.isEmpty()) {
+ kubernetesEndpointsUpdaters.remove(groupName);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (updaters != null) { | |
| final KubernetesEndpointsUpdater updater = updaters.get(aggregatorName); | |
| if (updater != null) { | |
| updater.close(); | |
| } | |
| } | |
| if (updaters != null) { | |
| final KubernetesEndpointsUpdater updater = updaters.remove(aggregatorName); | |
| if (updater != null) { | |
| updater.close(); | |
| } | |
| if (updaters.isEmpty()) { | |
| kubernetesEndpointsUpdaters.remove(groupName); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/k8s/v1/XdsKubernetesEndpointFetchingService.java`
around lines 188 - 193, When an aggregator file is removed, the code in
XdsKubernetesEndpointFetchingService currently closes the corresponding
KubernetesEndpointsUpdater but leaves it in kubernetesEndpointsUpdaters. Update
the removal path to also delete the entry from the map after calling
updater.close(), using the existing aggregatorName key so stale updaters do not
accumulate.
| repository.find(Revision.HEAD, endpointBase + ".*", FIND_ONE_WITHOUT_CONTENT) | ||
| .handle((entries, findCause) -> { | ||
| if (findCause != null) { | ||
| logger.warn("Failed to find endpoint file for {} in {}", endpointBase, groupName, findCause); | ||
| return null; | ||
| } | ||
| final String actualPath = entries.keySet().stream() | ||
| .filter(p -> p.endsWith(".yaml") || p.endsWith(".json")) | ||
| .findFirst().orElse(null); | ||
| if (actualPath == null) { | ||
| return null; // Already removed or never created. | ||
| } | ||
| removeEndpointFile(groupName, actualPath); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Remove both endpoint counterparts when both formats exist.
Using FIND_ONE_WITHOUT_CONTENT deletes only one matched endpoint path. If a legacy .json and .yaml counterpart both exist, deleting the aggregator can leave a stale endpoint resource behind. Fetch all matching YAML/JSON paths and remove them in the same cleanup command.
🧰 Tools
🪛 PMD (7.25.0)
[Low] 204-204: InvalidLogMessageFormat (Error Prone): Too many arguments, expected 2 arguments but found 3
(InvalidLogMessageFormat (Error Prone))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@xds/src/main/java/com/linecorp/centraldogma/xds/k8s/v1/XdsKubernetesEndpointFetchingService.java`
around lines 201 - 213, In XdsKubernetesEndpointFetchingService’s endpoint
cleanup flow, the current find/delete path only removes the first matching
endpoint file, so a leftover .json or .yaml counterpart can remain. Update the
repository.find(...).handle(...) logic to collect all matching endpoint paths
returned for the endpointBase prefix instead of using FIND_ONE_WITHOUT_CONTENT
and findFirst(), then call removeEndpointFile for each matched YAML/JSON path so
both legacy formats are cleaned up together.
Motivation:
Modifications:
XdsResourceManager:fileName()now returns.yaml;push()usesChange.ofYamlUpsert();updateOrDelete()falls back to the legacy.jsonpath when the.yamlfile is absent;update()atomically removes the old.jsonand creates the new.yamlin a single commit.XdsResourceWatchingService: changedhandleXdsResource()signature fromString contentAsTexttoJsonNode content; both call sites now passentry.content()/change.content()directly instead ofcontentAsText(). Also acceptsEntryType.YAMLand handlesUPSERT_YAML.Result:
.yamlfiles..jsonfiles remain readable; they are atomically migrated to.yamlon the first update.