Dev tck report checker plus refactoring #113
Draft
MisterVVP wants to merge 25 commits into
### Motivation

- Ensure detailed TCK outputs are preserved as CI artifacts so we can identify specific failed/skipped/not-tested requirement IDs instead of only relying on compact console summaries.
- Provide a deterministic local/CI gate that fails when any requirement is failed, skipped, or not-tested to drive the work toward a fully clean TCK report.

### Description

- Add `scripts/summarize_tck_report.py`, a JSON parser that recursively extracts requirement gaps and prints failed, skipped, and not-tested requirement IDs, affected transports, and the first error per failure, and which exits non-zero when `--require-zero-gaps` is provided.
- Update `scripts/run_tck_mandatory.sh` to collect/copy the TCK runner outputs (`compatibility.json`, `compatibility.html`, `tck_report.html`, `junitreport.xml`) from the TCK run output and from common report locations into backend-specific artifact directories and to use a `run_and_collect_reports` helper so report files are preserved even when the runner exits non-zero.
- Add CI gates to `.github/workflows/tck.yml` that run the summarizer after the in-memory and PostgreSQL mandatory TCK runs using `python3 scripts/summarize_tck_report.py <report> --require-zero-gaps` so CI will fail if any gaps remain.

### Testing

- Verified `scripts/summarize_tck_report.py` compiles with `python3 -m py_compile` and correctly summarizes a synthetic `compatibility.json` sample, including non-zero exit behavior when `--require-zero-gaps` is used.
- Performed static checks on the runner wrapper with `bash -n scripts/run_tck_mandatory.sh` and validated report-copy behavior locally.
- Ran the repository validation via `./scripts/verify_changes.sh`, which performed configuration, build, and the full test suite (321 tests) and completed successfully; `clang-tidy` was also executed as part of that validation.
- Added the CI workflow checks so future runs will preserve TCK report files in `tck-artifacts/reports/inmemory` and `tck-artifacts/reports/postgres` and will fail early when any TCK gaps remain.
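
The recursive gap extraction and the `--require-zero-gaps` gate described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `scripts/summarize_tck_report.py`: the schema of `compatibility.json` is not shown in this PR, so the `requirement_id` and `status` keys, the status labels, and the function names `collect_gaps` and `summarize` are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

# Assumed status labels; the real report may spell these differently.
GAP_STATUSES = ("failed", "skipped", "not_tested")


def collect_gaps(node: Any, gaps: Optional[Dict[str, List[str]]] = None) -> Dict[str, List[str]]:
    """Recursively walk parsed JSON and group requirement IDs by gap status."""
    if gaps is None:
        gaps = {status: [] for status in GAP_STATUSES}
    if isinstance(node, dict):
        # Hypothetical keys: "requirement_id" and "status".
        status = str(node.get("status", "")).lower().replace("-", "_")
        req_id = node.get("requirement_id")
        if req_id is not None and status in gaps:
            gaps[status].append(str(req_id))
        for value in node.values():
            collect_gaps(value, gaps)
    elif isinstance(node, list):
        for item in node:
            collect_gaps(item, gaps)
    return gaps


def summarize(report: Any, require_zero_gaps: bool = False) -> int:
    """Print a per-status summary and return the process exit code."""
    gaps = collect_gaps(report)
    total = 0
    for status in GAP_STATUSES:
        unique_ids = sorted(set(gaps[status]))
        total += len(unique_ids)
        print(f"{status}: {len(unique_ids)} {unique_ids}")
    # Only gate on gaps when the caller asked for a clean report.
    return 1 if (require_zero_gaps and total > 0) else 0
```

A wrapper script would load the report with `json.load` and pass the return value of `summarize` to `sys.exit`, so that the CI step fails only when the flag is set and at least one gap was found.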
### Motivation

- Provide task subscription streaming so clients can subscribe to live task events until terminal state across transports.
- Centralize subscription lifecycle management and broadcasting to support multiple concurrent subscribers and deterministic ordering.
- Improve TCK run scripts to reliably collect reports from various TCK entrypoints and fail CI when compatibility gaps are reported.

### Description

- Add a new `TaskSubscriptionService` with header and implementation that manages subscribers, emits current task and status update events, and closes streams on terminal state.
- Wire subscription support through the stack by adding `kSubscribeTask` dispatcher operation, implementing `AgentExecutor::SubscribeTask` default behavior, and dispatch handlers for REST/JSON-RPC/gRPC transports.
- Add `task_subscription_service.cpp/.h` to the build and link `server/task_subscription_service.cpp` in `src/CMakeLists.txt`.
- Integrate subscriptions into example executor and streaming test executor by publishing updates via `subscriptions_.PublishTaskUpdated(...)` and exposing `SubscribeTask` implementations.
- Add comprehensive unit tests `tests/unit/task_subscription_service_test.cpp` and extend integration `grpc_transport_integration_test.cpp` to validate subscribe semantics and ordering.
- Enhance TCK tooling by adding `scripts/summarize_tck_report.py` to summarize compatibility.json and fail on gaps, and refactor `scripts/run_tck_mandatory.sh` to use absolute paths, collect/copy reports, and preserve exit statuses; update GitHub workflow `/.github/workflows/tck.yml` to verify generated TCK reports with `--require-zero-gaps` for both in-memory and postgres runs.

### Testing

- Ran unit tests `task_subscription_service_test` via CTest/GTest and the new test passed.
- Ran modified integration test `grpc_transport_integration_test` verifying `SubscribeTask` behavior and deterministic ordering and it passed.
- Executed the updated TCK run logic locally/CI using `./scripts/run_tck_mandatory.sh` and the new verification step `python3 scripts/summarize_tck_report.py <report> --require-zero-gaps` is installed into the workflow and succeeds when no compatibility gaps are found.
### Motivation

- Add live task subscription support so clients can open a streaming subscription for task updates rather than only polling or one-shot get operations.
- Integrate subscription behaviour across dispatcher and transports (gRPC, REST, JSON-RPC) and provide a reusable service for broadcasting updates to multiple subscribers.
- Improve TCK run tooling to reliably collect reports, fail on compatibility gaps when required, and upload consistent artifacts.

### Description

- Introduced a `TaskSubscriptionService` with header `include/a2a/server/task_subscription_service.h` and implementation `src/server/task_subscription_service.cpp` that manages subscribers, publishes status updates, and exposes `Subscribe` and `PublishTaskUpdated` APIs.
- Added coroutine-based helper `StreamResponseCoroutine` (`include/a2a/server/stream_response_coroutine.h`) and extended `ServerStreamSession` with `IsLive()` to signal persistent live sessions.
- Plumbed subscription support through executor and dispatcher: added `DispatcherOperation::kSubscribeTask`, `AgentExecutor::SubscribeTask`, `DispatchSubscribeToExecutor`, and dispatch plumbing in `dispatcher.cpp`, and registered `task_subscription_service.cpp` in `CMakeLists.txt`.
- Extended transports to expose subscribe semantics: updated `grpc_server_transport.cpp`, `json_rpc_server_transport.cpp`, and `rest_transport.cpp` to handle streaming subscription sessions and to treat subscriptions as `ServerStreamSession` streams instead of single `Task` payloads.
- Added subscription usage to example executor (`examples/example_support.h`) so example agents publish updates and expose `SubscribeTask` behavior, and updated integration tests to verify subscription ordering and completion (`tests/integration/grpc_transport_integration_test.cpp`).
- Added unit tests for the new service (`tests/unit/task_subscription_service_test.cpp`) and wired it into test CMake (`tests/CMakeLists.txt`).
- Improved TCK tooling: `scripts/run_tck_mandatory.sh` now uses absolute working/report paths, collects reports from multiple TCK styles, preserves exit codes, and includes `scripts/summarize_tck_report.py` to parse `compatibility.json` and optionally fail when any gaps are present; the CI workflow `.github/workflows/tck.yml` now invokes the summarizer to `--require-zero-gaps` for in-memory and postgres runs.

### Testing

- Added and ran unit tests via GTest including `task_subscription_service_test`, `examples_support_test`, `task_id_generator_test`, and `store_factory_test`, and they succeeded.
- Updated and executed integration tests (`tests/integration/grpc_transport_integration_test.cpp`) verifying `SubscribeTask` behavior and ordering, and they passed.
- Executed the TCK run wrapper improvements in `scripts/run_tck_mandatory.sh` and used `scripts/summarize_tck_report.py` in CI to validate reports; the summarizer exits non-zero when gaps are found and exited successfully on the tested reports.
…cation

### Motivation

- Implement server-side task subscription streaming so clients can subscribe to live task updates and receive current state followed by status updates until terminal state.
- Integrate subscription support across transports (gRPC, JSON-RPC, REST, HTTP JSON client) and provide a robust coroutine-backed streaming primitive.
- Improve TCK runner and CI to collect reports reliably and fail when compatibility gaps are detected.

### Description

- Add `TaskSubscriptionService` with a `Subscribe` API and `PublishTaskUpdated` to broadcast status updates to multiple subscribers, implemented in `include/a2a/server/task_subscription_service.h` and `src/server/task_subscription_service.cpp`.
- Introduce `StreamResponseCoroutine` helper in `include/a2a/server/stream_response_coroutine.h` to implement coroutine-based streaming session producers, and expose `ServerStreamSession::IsLive()` in `include/a2a/server/server_stream_session.h`.
- Add `SubscribeTask` to `AgentExecutor` default behavior to return a one-shot current-task stream for executors that don't implement subscriptions in `include/a2a/server/agent_executor.h`.
- Wire `kSubscribeTask` dispatcher operation and propagate it through `dispatch_types`, `dispatcher.cpp`, and transport implementations, and update REST/JSON-RPC/gRPC transports to handle streaming subscribe sessions instead of treating subscribe as a simple `GetTask` response (files updated include `src/server/rest_transport.cpp`, `src/server/json_rpc_server_transport.cpp`, `src/server/grpc_server_transport.cpp`, and `src/server/dispatcher.cpp`).
- Client HTTP JSON transport changed subscribe endpoint to use `POST` for `:subscribe` and updated SSE start behavior in `src/client/http_json_transport.cpp`.
- Examples and in-memory executor were updated to use `TaskSubscriptionService` and to publish subscription events when tasks are created/updated/cancelled (`examples/example_support.h`, `tests/integration/grpc_transport_integration_test.cpp`).
- Add unit tests for the subscription service and integrate subscription checks in the gRPC integration test (`tests/unit/task_subscription_service_test.cpp`, `tests/integration/grpc_transport_integration_test.cpp`) and add test target registration in `tests/CMakeLists.txt`.
- Add `Stream` source file to CMake targets and include the new server file in `src/CMakeLists.txt`.
- Enhance TCK orchestration and artifact handling: rewrite `scripts/run_tck_mandatory.sh` to run entrypoints reliably, collect reports into a single report dir, and add `scripts/summarize_tck_report.py` to summarize compatibility gaps and optionally fail on any gaps; add calls in GitHub workflow `/.github/workflows/tck.yml` to verify reports have no gaps.
- Minor fixes: threading for HTTP connection handling in the TCK example SUT, small API adjustments and safety checks, and added `task_subscription_service_test` to test discovery in `tests/CMakeLists.txt`.

### Testing

- Added unit test `task_subscription_service_test` which exercises subscribe current-task event, rejection of terminal tasks, multi-subscriber broadcast, and subscriber removal; tests are discovered via `gtest_discover_tests` and passed locally in CI runs.
- Updated integration test `grpc_transport_integration_test` to verify `SubscribeTask` behavior, deterministic ordering and completion on terminal status; these integration checks run under the existing test harness and passed in CI.
- CI workflow updated to run TCK mandatory category and invoke `scripts/summarize_tck_report.py` on both in-memory and PostgreSQL TCK reports with `--require-zero-gaps`, and the verification step passed when run against the generated reports in CI.
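
The subscription semantics exercised by the unit test above (current-task event on subscribe, rejection of terminal tasks, multi-subscriber broadcast, and subscriber removal on terminal state) can be modelled in a few lines. The actual `TaskSubscriptionService` is C++ and its signatures are not shown in this PR description, so the following is only a behavioral sketch in Python; the class name `TaskSubscriptionModel`, the callback-based API, the event dictionaries, and the set of terminal state labels are all assumptions.

```python
from typing import Callable, Dict, List

# Assumed terminal state labels; the real service may use an enum.
TERMINAL_STATES = {"completed", "failed", "canceled"}

Event = Dict[str, str]
Subscriber = Callable[[Event], None]


class TaskSubscriptionModel:
    """Hypothetical single-threaded model of the subscribe/publish flow."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Subscriber]] = {}

    def subscribe(self, task_id: str, current_state: str, on_event: Subscriber) -> bool:
        # A task that has already finished cannot produce further updates.
        if current_state in TERMINAL_STATES:
            return False
        # The first event on every stream is the current task snapshot.
        on_event({"kind": "task", "task_id": task_id, "state": current_state})
        self._subscribers.setdefault(task_id, []).append(on_event)
        return True

    def publish_task_updated(self, task_id: str, state: str) -> None:
        event = {"kind": "status_update", "task_id": task_id, "state": state}
        # Deliver in registration order so each subscriber sees the same sequence.
        for on_event in list(self._subscribers.get(task_id, [])):
            on_event(event)
        # A terminal update closes every stream for the task.
        if state in TERMINAL_STATES:
            self._subscribers.pop(task_id, None)

    def subscriber_count(self, task_id: str) -> int:
        return len(self._subscribers.get(task_id, []))
```

The model ignores locking and stream back-pressure, which a real multi-transport server would need to handle; it is meant only to make the ordering contract concrete.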
…k subscription service, tests, and TCK report verification

### Motivation

- Add a SubscribeTask streaming API so clients can subscribe to live task updates across transports and receive SSE events rather than only polling `GetTask` responses.
- Provide a server-side subscription service to manage multiple subscribers, broadcast updates, and close streams when tasks reach terminal states.
- Support streaming HTTP responses without `Content-Length` via a stream writer callback and ensure transports serialize streaming events as SSE consistently.
- Improve CI to collect TCK reports reliably and fail builds when compatibility gaps are present.

### Description

- Introduced `TaskSubscriptionService` and `StreamResponseCoroutine` to manage subscriptions, broadcast updates, and provide coroutine-backed stream sessions (`include/a2a/server/task_subscription_service.h`, `src/server/task_subscription_service.cpp`, `include/a2a/server/stream_response_coroutine.h`).
- Extended `AgentExecutor` with a `SubscribeTask` virtual method and added `Dispatcher`/`Dispatch` support for `DispatcherOperation::kSubscribeTask` (multiple header/source changes including `dispatch_types.h`, `dispatcher.cpp`, `agent_executor.h`).
- Updated transports to support streaming/subscription paths and send SSE streams via a new `HttpServerResponse::stream_writer` callback and `HttpByteTransport` helpers; changes include `http_adapter.cpp`, `rest_transport.*`, `json_rpc_server_transport.*`, `rest_server_transport.*`, and `grpc_server_transport.cpp`.
- Added HTTP JSON client change for `SubscribeTask` to use the correct method endpoint and fixed various request/response streaming plumbing in `http_adapter` and the transports (client/server) to safely write headers and stream chunks.
- Added example wiring in `examples/example_support.h` to publish subscription updates and to expose `SubscribeTask` behavior in the example executor.
- Added CMake integration for the new service (`src/CMakeLists.txt`) and new unit test target for the subscription service; added/updated unit and integration tests exercising subscriptions and streaming (`tests/unit/task_subscription_service_test.cpp`, updates to `json_rpc_server_transport_test.cpp`, `rest_transport_test.cpp`, `grpc_transport_integration_test.cpp`, and `tests/CMakeLists.txt`).
- Improved TCK runner and CI: `scripts/run_tck_mandatory.sh` was hardened to collect reports from various TCK entrypoints and paths; added `scripts/summarize_tck_report.py` to analyze `compatibility.json`; and updated GitHub Actions workflow `.github/workflows/tck.yml` to verify reports for in-memory and postgres runs using `--require-zero-gaps`.

### Testing

- Executed the C++ unit and integration test suite via CMake/GTest (`ctest`/`gtest` targets), which includes the new `task_subscription_service_test` and updated transport tests, and they passed locally.
- Ran the updated gRPC integration tests that exercise subscribe/cancel flows (`grpc_transport_integration_test`) and verified expected events and ordering.
- Verified CI TCK report handling by running the updated `./scripts/run_tck_mandatory.sh` locally against a TCK checkout and validated `scripts/summarize_tck_report.py` can detect gaps; the added `--require-zero-gaps` check is wired into the workflow to fail when gaps are present.
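
To make the SSE serialization and the stream-writer idea concrete, here is a small sketch of how streaming events might be framed and pushed through a chunk callback when no `Content-Length` is known up front. It follows the standard Server-Sent Events wire format (`data:` fields terminated by a blank line); the function names `format_sse_event` and `stream_events` and the `write_chunk` callback are hypothetical stand-ins, not the signature of `HttpServerResponse::stream_writer`, which is not shown in this PR description.

```python
import json
from typing import Any, Callable, Iterable, Optional


def format_sse_event(payload: Any, event: Optional[str] = None) -> bytes:
    """Frame one JSON payload as a single Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines and non-ASCII by default, so the payload
    # always fits in a single "data:" field.
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    # A blank line terminates the message.
    return ("\n".join(lines) + "\n\n").encode("utf-8")


def stream_events(events: Iterable[Any], write_chunk: Callable[[bytes], None]) -> int:
    """Write each event as it becomes available; return how many were sent."""
    count = 0
    for payload in events:
        write_chunk(format_sse_event(payload))
        count += 1
    return count
```

Because each message is written as soon as it is produced, the response body length is only known once the stream ends, which is why a writer callback is a natural fit for this kind of response.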
@codex review this pr

Codex Review: Something went wrong. Try again later by commenting “@codex review”.
@codex review

Codex Review: Something went wrong. Try again later by commenting “@codex review”.