feat(spider-storage): Add ValidatedJobSubmission to hold a validated task graph and its task inputs.#320

Merged
sitaowang1998 merged 11 commits into y-scope:main from sitaowang1998:validate-job-submission
May 9, 2026

Conversation

@sitaowang1998
Collaborator

@sitaowang1998 sitaowang1998 commented May 8, 2026

Description

This PR adds a ValidatedJobSubmission wrapper type to represent a validated task graph and its inputs. ValidatedJobSubmission validates that:

  • The task graph is not empty.
  • The number of job inputs matches the number of task graph inputs.
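
The two checks above can be sketched as a small wrapper type. This is a minimal illustration and not the code merged in this PR: `TaskGraph` and `TaskInput` are simplified stand-ins, the fields `num_tasks` and `num_graph_inputs` and the error enum name `SubmissionError` are hypothetical, and the order of the checks is an assumption. The constructor is called `validate` here, following the review comments later in this thread.

```rust
/// Stand-in for the real task graph (assumption): only the two counts
/// that the checks need.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskGraph {
    pub num_tasks: usize,
    pub num_graph_inputs: usize,
}

/// Stand-in for a task input payload (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct TaskInput(pub Vec<u8>);

/// Hypothetical error type mirroring the two failure cases.
#[derive(Debug, PartialEq)]
pub enum SubmissionError {
    TaskGraphEmpty,
    TaskGraphInputSizeMismatch { expected: usize, actual: usize },
}

/// Owns a task graph and inputs that have already passed validation.
/// The fields are private, so `validate` is the only way to build one.
#[derive(Debug)]
pub struct ValidatedJobSubmission {
    task_graph: TaskGraph,
    inputs: Vec<TaskInput>,
}

impl ValidatedJobSubmission {
    /// Rejects an empty graph first, then an input-count mismatch.
    pub fn validate(
        task_graph: TaskGraph,
        inputs: Vec<TaskInput>,
    ) -> Result<Self, SubmissionError> {
        if task_graph.num_tasks == 0 {
            return Err(SubmissionError::TaskGraphEmpty);
        }
        if inputs.len() != task_graph.num_graph_inputs {
            return Err(SubmissionError::TaskGraphInputSizeMismatch {
                expected: task_graph.num_graph_inputs,
                actual: inputs.len(),
            });
        }
        Ok(Self { task_graph, inputs })
    }

    pub fn task_graph(&self) -> &TaskGraph {
        &self.task_graph
    }

    pub fn inputs(&self) -> &[TaskInput] {
        &self.inputs
    }

    /// Gives ownership of both parts back to the caller.
    pub fn into_parts(self) -> (TaskGraph, Vec<TaskInput>) {
        (self.task_graph, self.inputs)
    }
}
```

Because the fields are private, any function that receives a `ValidatedJobSubmission` can rely on both invariants without re-checking them.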

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Adds new unit tests for validation.
  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features

    • Job submissions now validate that task graphs contain at least one task and that provided inputs exactly match expected counts.
    • Error messages now distinguish empty graphs and input-count mismatches for clearer feedback.
  • Refactor

    • Job-submission handling and registration APIs consolidated across storage components for more consistent processing.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner May 8, 2026 01:32
@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack

Walkthrough

This PR adds a ValidatedJobSubmission type that owns a TaskGraph and inputs, enforces non-empty graph and input-count invariants, and propagates the validated submission through TaskGraph creation, SharedJobControlBlock creation, ExternalJobOrchestration::register, and all affected tests.

Changes

ValidatedJobSubmission Wrapper & Integration

| Layer | File(s) | Summary |
|---|---|---|
| Data Types and Error Contracts | components/spider-core/src/types/io.rs, components/spider-storage/src/cache.rs, components/spider-storage/src/cache/error.rs | TaskInput gains Clone derive. InternalError adds TaskGraphEmpty and replaces the positional inputs-mismatch variant with TaskGraphInputSizeMismatch { expected, actual }. job_submission module is exported. |
| ValidatedJobSubmission Definition & Validation | components/spider-storage/src/cache/job_submission.rs | Adds ValidatedJobSubmission owning TaskGraph and Vec<TaskInput>, create(...) constructor enforcing non-empty graph and matching input count, accessors, into_parts(), and unit tests for success/empty/mismatch cases. |
| TaskGraph and SharedJobControlBlock Updates | components/spider-storage/src/cache/task.rs, components/spider-storage/src/cache/job.rs | TaskGraph::create now consumes ValidatedJobSubmission. SharedJobControlBlock::create signature updated to accept ValidatedJobSubmission. Cache construction and test helpers updated to use validated submissions. |
| Database Protocol and Storage Connector Updates | components/spider-storage/src/db/protocol.rs, components/spider-storage/src/db/mariadb.rs | ExternalJobOrchestration::register signature changed to accept &ValidatedJobSubmission; MariaDB implementation extracts task graph and inputs internally. Imports adjusted accordingly. |
| Test Infrastructure and Factory Pattern | components/spider-storage/tests/scheduling_infra.rs | DbConnectorFactory trait and run_workload now accept ValidatedJobSubmission; noop_db_connector_factory and mariadb_db_connector_factory closures updated to the new signature. |
| Job Cache and Task Pool Test Updates | components/spider-storage/src/state/job_cache.rs, components/spider-storage/src/task_instance_pool.rs | Test helpers updated to build ValidatedJobSubmission before creating JCBs and TaskGraph in tests; imports updated. |
| JCB and MariaDB Integration Tests | components/spider-storage/tests/jcb_test.rs, components/spider-storage/tests/mariadb_test.rs | All affected tests updated to construct ValidatedJobSubmission from graphs and inputs and pass it to run_workload/storage.register. |
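
The error-contract row above mentions replacing a positional mismatch variant with named fields. The following sketch illustrates why named fields help. It is an assumption-laden illustration: only the variant names `TaskGraphEmpty` and `TaskGraphInputSizeMismatch { expected, actual }` come from the summary, while the `usize` field types, the hand-written `Display` impl, and the message wording are hypothetical (the real crate may derive its messages differently).

```rust
use std::fmt;

/// Reduced version of the error enum: only the two variants named in the
/// summary. Field types are assumed to be `usize`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InternalError {
    TaskGraphEmpty,
    /// Named fields make it hard to swap the two counts by accident,
    /// which a positional `(usize, usize)` variant would allow.
    TaskGraphInputSizeMismatch { expected: usize, actual: usize },
}

impl fmt::Display for InternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Message wording is illustrative, not taken from the PR.
            Self::TaskGraphEmpty => write!(f, "task graph is empty"),
            Self::TaskGraphInputSizeMismatch { expected, actual } => write!(
                f,
                "task graph expects {expected} inputs, but {actual} were provided"
            ),
        }
    }
}

impl std::error::Error for InternalError {}
```

With separate variants, callers can match on the failure kind instead of parsing a message string.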

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • y-scope/spider#292: Modifies ExternalJobOrchestration::register and related DB-layer APIs in a similar area.

Suggested reviewers

  • LinZhihao-723
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title accurately and concisely describes the main change: introducing the ValidatedJobSubmission type to hold and validate task graphs with their inputs. |

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/spider-storage/tests/scheduling_infra.rs (1)

443-445: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale doc comment — update to reflect the new ValidatedJobSubmission parameter.

The phrase "submitted task graph and job inputs" describes the old separate-argument API. A contributor writing a new DB-connector factory would use this function as a template and be confused by the mismatch between the doc and the actual closure parameter.

✏️ Suggested fix
-/// The returned closure receives the submitted task graph and job inputs from [`run_workload`],
-/// registers the job via [`ExternalJobOrchestration::register`], and returns the connector along
-/// with the resulting [`JobId`] and [`ResourceGroupId`].
+/// The returned closure receives the validated job submission from [`run_workload`],
+/// registers the job via [`ExternalJobOrchestration::register`], and returns the connector along
+/// with the resulting [`JobId`] and [`ResourceGroupId`].

The same applies to the module-level comment (line 71) which still describes the factory as AsyncFnOnce() -> DbConnectorType, omitting the &ValidatedJobSubmission argument.
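
To make the closure shape described in this comment concrete, here is a simplified, synchronous sketch. It is not the code in scheduling_infra.rs: the real factory is described as an `AsyncFnOnce` and also returns a `ResourceGroupId`, whereas this version uses a plain `FnOnce` and returns only a connector and a job ID. `RecordingConnector`, `recording_connector_factory`, and the fields on the stand-in `ValidatedJobSubmission` are hypothetical names introduced for illustration.

```rust
/// Opaque stand-in for the validated wrapper; the fields are illustrative.
pub struct ValidatedJobSubmission {
    pub num_tasks: usize,
    pub num_inputs: usize,
}

/// Assumed to be a plain integer here; the real `JobId` type may differ.
pub type JobId = u64;

/// Hypothetical connector that records what it was asked to register.
#[derive(Debug, Default)]
pub struct RecordingConnector {
    pub registered: Vec<(usize, usize)>,
}

impl RecordingConnector {
    /// Borrows the submission, mirroring `register(&ValidatedJobSubmission)`.
    pub fn register(&mut self, submission: &ValidatedJobSubmission) -> JobId {
        self.registered
            .push((submission.num_tasks, submission.num_inputs));
        self.registered.len() as JobId
    }
}

/// Returns a factory closure that receives the validated job submission,
/// registers it, and hands back the connector with the resulting job ID.
pub fn recording_connector_factory(
) -> impl FnOnce(&ValidatedJobSubmission) -> (RecordingConnector, JobId) {
    |submission: &ValidatedJobSubmission| {
        let mut connector = RecordingConnector::default();
        let job_id = connector.register(submission);
        (connector, job_id)
    }
}

/// Synchronous analogue of the workload runner: it owns the submission
/// and lends it to the factory by reference.
pub fn run_workload<F>(
    submission: ValidatedJobSubmission,
    factory: F,
) -> (RecordingConnector, JobId)
where
    F: FnOnce(&ValidatedJobSubmission) -> (RecordingConnector, JobId),
{
    factory(&submission)
}
```

A doc comment on such a factory would describe one parameter, the validated job submission, rather than a separate task graph and job inputs.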

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-storage/tests/scheduling_infra.rs` around lines 443 - 445,
Update the stale doc comments to reflect that the returned closure now receives
a &ValidatedJobSubmission (not separate "submitted task graph and job inputs");
specifically edit the function-level comment that mentions run_workload and
ExternalJobOrchestration::register to state the closure signature accepts
&ValidatedJobSubmission and that it registers the job via
ExternalJobOrchestration::register and returns the connector, and also update
the module-level comment (which currently describes AsyncFnOnce() ->
DbConnectorType) to include the &ValidatedJobSubmission parameter so both
comments match the actual closure parameter.
🧹 Nitpick comments (2)
components/spider-storage/tests/mariadb_test.rs (1)

42-44: ⚡ Quick win

Consider extracting a single_task_job_submission() helper to eliminate repeated boilerplate.

The two-liner pattern:

let (graph, inputs) = single_task_graph();
let job_submission = ValidatedJobSubmission::validate(graph, inputs).expect("job submission should be valid");

appears verbatim in 15+ tests. Extracting it halves the per-test setup noise and makes the intent (register a job from a standard single-task graph) explicit in one call.

♻️ Proposed helper
+/// Builds a [`ValidatedJobSubmission`] from the standard single-task graph used by DB-layer tests.
+fn single_task_job_submission() -> ValidatedJobSubmission {
+    let (graph, inputs) = single_task_graph();
+    ValidatedJobSubmission::validate(graph, inputs)
+        .expect("single-task job submission should be valid")
+}

Each test then becomes:

-    let (graph, inputs) = single_task_graph();
-    let job_submission =
-        ValidatedJobSubmission::validate(graph, inputs).expect("job submission should be valid");
+    let job_submission = single_task_job_submission();

Also applies to: 64-68

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-storage/tests/mariadb_test.rs` around lines 42 - 44,
Extract a helper function (e.g., single_task_job_submission()) that encapsulates
the two-line boilerplate currently repeated: call single_task_graph() to get
(graph, inputs) and then call ValidatedJobSubmission::validate(graph, inputs).
The helper should return the ValidatedJobSubmission (or panic with the same
expect message) so tests can replace the two-liner with a single call to
single_task_job_submission(); update tests that currently use single_task_graph
and ValidatedJobSubmission::validate to call this new helper instead.
components/spider-storage/src/cache/task.rs (1)

1093-1131: 💤 Low value

Dummy task in build_termination_tcb is the correct workaround.

ValidatedJobSubmission::validate requires at least one regular task (TaskGraphEmpty invariant), so the minimal dummy task (zero inputs, zero outputs, input_sources: None) is added solely to satisfy that constraint. The dummy has no effect on the commit-task-focused assertions that follow. A brief inline comment explaining why it exists would prevent future confusion.

💬 Suggested comment
+        // A non-empty regular task is required to pass `ValidatedJobSubmission::validate`
+        // (TaskGraphEmpty check). This dummy task has no bearing on the commit-task behaviour
+        // being tested.
         submitted
             .insert_task(TaskDescriptor {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-storage/src/cache/task.rs` around lines 1093 - 1131, Add a
brief inline comment in build_termination_tcb next to the inserted dummy
TaskDescriptor (the TaskDescriptor with package "test_pkg", task_func
"dummy_fn", zero inputs/outputs and input_sources: None) explaining that this
dummy task is intentionally added to satisfy ValidatedJobSubmission::validate's
requirement that a TaskGraph contain at least one regular task (the
TaskGraphEmpty invariant) and has no effect on the commit-task-focused
assertions; reference SubmittedTaskGraph and ValidatedJobSubmission::validate in
the comment so future readers understand why the dummy exists.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 360476b5-b445-404d-8ded-e82562297e3e

📥 Commits

Reviewing files that changed from the base of the PR and between af1df89 and b53714d.

📒 Files selected for processing (13)
  • components/spider-core/src/types/io.rs
  • components/spider-storage/src/cache.rs
  • components/spider-storage/src/cache/error.rs
  • components/spider-storage/src/cache/job.rs
  • components/spider-storage/src/cache/job_submission.rs
  • components/spider-storage/src/cache/task.rs
  • components/spider-storage/src/db/mariadb.rs
  • components/spider-storage/src/db/protocol.rs
  • components/spider-storage/src/state/job_cache.rs
  • components/spider-storage/src/task_instance_pool.rs
  • components/spider-storage/tests/jcb_test.rs
  • components/spider-storage/tests/mariadb_test.rs
  • components/spider-storage/tests/scheduling_infra.rs

Member

@LinZhihao-723 LinZhihao-723 left a comment

One nit catch on the variable name otherwise lgtm.

Comment thread components/spider-storage/src/cache/job_submission.rs Outdated
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Member

@LinZhihao-723 LinZhihao-723 left a comment

Got one missing catch.

Comment thread components/spider-storage/src/cache/job_submission.rs Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
components/spider-storage/src/cache/job_submission.rs (1)

79-91: 💤 Low value

Optional: drop the SubmittedTaskGraph alias — it resolves nothing in this scope.

super::* only re-exports pub items; TaskGraph (a non-pub use at line 1) is not among them. Within the test module, spider_core::task::TaskGraph can be imported under its original name without any shadowing conflict.

♻️ Proposed refactor
-        TaskGraph as SubmittedTaskGraph,
+        TaskGraph,

Then update all usages from SubmittedTaskGraph to TaskGraph in the test helper and test bodies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/spider-storage/src/cache/job_submission.rs` around lines 79 - 91,
The import alias SubmittedTaskGraph is unnecessary and unused due to scope
shadowing; remove the alias from the use list and import
spider_core::task::TaskGraph directly in the test module, then replace all
occurrences of SubmittedTaskGraph with TaskGraph in the test helper and test
bodies (references: the use line that currently declares SubmittedTaskGraph and
all usages of SubmittedTaskGraph within job_submission.rs tests). Ensure
super::* remains for other re-exports but does not try to rely on
SubmittedTaskGraph.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 18a10b9a-44fb-408a-a9ef-5257224986b7

📥 Commits

Reviewing files that changed from the base of the PR and between f9e1163 and 935d8a5.

📒 Files selected for processing (6)
  • components/spider-storage/src/cache/job_submission.rs
  • components/spider-storage/src/cache/task.rs
  • components/spider-storage/src/state/job_cache.rs
  • components/spider-storage/src/task_instance_pool.rs
  • components/spider-storage/tests/jcb_test.rs
  • components/spider-storage/tests/mariadb_test.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • components/spider-storage/src/task_instance_pool.rs
  • components/spider-storage/tests/jcb_test.rs
  • components/spider-storage/src/state/job_cache.rs
  • components/spider-storage/src/cache/task.rs
  • components/spider-storage/tests/mariadb_test.rs

Member

@LinZhihao-723 LinZhihao-723 left a comment

For the PR title, how about:

feat(spider-storage): Add `ValidatedJobSubmission`  to hold a pre-validated task graph and its task inputs.

@sitaowang1998 sitaowang1998 changed the title from "feat(spider-storage): Add ValidatedJobSubmission for task graph & inputs validation." to "feat(spider-storage): Add ValidatedJobSubmission to hold a validated task graph and its task inputs." on May 9, 2026
@sitaowang1998 sitaowang1998 merged commit aadb9eb into y-scope:main May 9, 2026
13 checks passed
@sitaowang1998 sitaowang1998 deleted the validate-job-submission branch May 9, 2026 00:34
Development

Successfully merging this pull request may close these issues.

feat(storage): Introduce a validated job submission wrapper type for task graph + task inputs

2 participants