feat(spider-storage): Add ServiceState and job level functions.#319
feat(spider-storage): Add ServiceState and job level functions.#319sitaowang1998 wants to merge 29 commits intoy-scope:storage-service-devfrom
ServiceState and job level functions.#319Conversation
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/spider-storage/src/state/job_cache.rs (1)
240-247:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
resultis moved bymatches!then used again inif let— compile error
matches!(result, Err(...))expands tomatch result { ... }, which moves the non-CopyResult<(), StorageServerError>(theStopping(String)andBadRequest(String)variants alone preventCopy). The subsequentif let … = resulttherefore references a moved value and will not compile.🐛 Proposed fix — borrow in `matches!`
- assert!( - matches!(result, Err(StorageServerError::JobAlreadyExists(_))), - "insert should return JobAlreadyExists error for duplicate key" - ); - if let Err(StorageServerError::JobAlreadyExists(id)) = result { - assert_eq!(id, job_id, "error should contain the duplicate job ID"); - } + assert!( + matches!(&result, Err(StorageServerError::JobAlreadyExists(_))), + "insert should return JobAlreadyExists error for duplicate key" + ); + if let Err(StorageServerError::JobAlreadyExists(id)) = result { + assert_eq!(id, job_id, "error should contain the duplicate job ID"); + }Passing
&resultborrows the value; Rust's match ergonomics automatically adapt the inner pattern, leavingresultavailable for the subsequentif let.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/spider-storage/src/state/job_cache.rs` around lines 240 - 247, The test currently moves `result` by using `matches!(result, Err(StorageServerError::JobAlreadyExists(_)))` then reuses `result` in the `if let`, causing a compile error; change the `matches!` to borrow the value (e.g., `matches!(&result, Err(StorageServerError::JobAlreadyExists(_)))`) so `result` from `cache.insert(...)` remains available for the subsequent `if let Err(StorageServerError::JobAlreadyExists(id)) = result` check and assertion.
🧹 Nitpick comments (2)
components/spider-storage/src/state/test_mocks.rs (1)
119-145: ⚡ Quick win
commit_outputsdiscards outputs andfaildiscards the error messageBoth methods update
self.statescorrectly but silently drop their data payloads (_job_outputs,_error_message). Tests that assert on those values work around this by inserting directly intodb.outputs/db.errors, so nothing is broken today. However, as more tests are added, this silent discard can cause confusingDbError::JobNotFoundresults fromget_outputs/get_erroreven after a "successful" transition, making failures hard to diagnose.♻️ Proposed fix — persist data in the mock
async fn commit_outputs( &self, job_id: JobId, - _job_outputs: Vec<TaskOutput>, + job_outputs: Vec<TaskOutput>, _has_commit_task: bool, ) -> Result<(), DbError> { self.states.insert(job_id, JobState::Succeeded); + self.outputs.insert(job_id, job_outputs); Ok(()) } - async fn fail(&self, job_id: JobId, _error_message: String) -> Result<(), DbError> { + async fn fail(&self, job_id: JobId, error_message: String) -> Result<(), DbError> { self.states.insert(job_id, JobState::Failed); + self.errors.insert(job_id, error_message); Ok(()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/spider-storage/src/state/test_mocks.rs` around lines 119 - 145, The mock methods commit_outputs and fail currently ignore their payloads; update commit_outputs(JobId, Vec<TaskOutput>, ...) to store the provided job_outputs in the mock's outputs map (use the real parameter name instead of _job_outputs) when inserting JobState::Succeeded, and update fail(JobId, String) to store the provided error message in the mock's errors map (use the real parameter name instead of _error_message) when inserting JobState::Failed so get_outputs/get_error return the persisted data in tests.components/spider-storage/src/state/service.rs (1)
386-417: 💤 Low value
create_test_jcbhelper is duplicated across test modulesThe
create_test_jcbfunction here is functionally identical to the one injob_cache.rs's test module (Lines 152–183). Sincetest_mocks.rsalready centralises shared mock types, this helper could live there as well, eliminating the duplication.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/spider-storage/src/state/service.rs` around lines 386 - 417, The create_test_jcb helper is duplicated; move the function into the shared test module (test_mocks.rs) and have tests import it instead of defining their own. Specifically, extract the create_test_jcb that constructs a SubmittedTaskGraph and calls SharedJobControlBlock::create (using MockReadyQueueSender, MockDbConnector::default(), MockTaskInstancePoolConnector) into test_mocks.rs, remove the duplicate in this file and in job_cache.rs, and update the tests to use the centralized create_test_jcb helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@components/spider-storage/src/state/job_cache.rs`:
- Around line 240-247: The test currently moves `result` by using
`matches!(result, Err(StorageServerError::JobAlreadyExists(_)))` then reuses
`result` in the `if let`, causing a compile error; change the `matches!` to
borrow the value (e.g., `matches!(&result,
Err(StorageServerError::JobAlreadyExists(_)))`) so `result` from
`cache.insert(...)` remains available for the subsequent `if let
Err(StorageServerError::JobAlreadyExists(id)) = result` check and assertion.
---
Nitpick comments:
In `@components/spider-storage/src/state/service.rs`:
- Around line 386-417: The create_test_jcb helper is duplicated; move the
function into the shared test module (test_mocks.rs) and have tests import it
instead of defining their own. Specifically, extract the create_test_jcb that
constructs a SubmittedTaskGraph and calls SharedJobControlBlock::create (using
MockReadyQueueSender, MockDbConnector::default(), MockTaskInstancePoolConnector)
into test_mocks.rs, remove the duplicate in this file and in job_cache.rs, and
update the tests to use the centralized create_test_jcb helper.
In `@components/spider-storage/src/state/test_mocks.rs`:
- Around line 119-145: The mock methods commit_outputs and fail currently ignore
their payloads; update commit_outputs(JobId, Vec<TaskOutput>, ...) to store the
provided job_outputs in the mock's outputs map (use the real parameter name
instead of _job_outputs) when inserting JobState::Succeeded, and update
fail(JobId, String) to store the provided error message in the mock's errors map
(use the real parameter name instead of _error_message) when inserting
JobState::Failed so get_outputs/get_error return the persisted data in tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e5c094e3-844e-4933-8d0c-8f32c049df24
📒 Files selected for processing (5)
components/spider-storage/src/state.rscomponents/spider-storage/src/state/error.rscomponents/spider-storage/src/state/job_cache.rscomponents/spider-storage/src/state/service.rscomponents/spider-storage/src/state/test_mocks.rs
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
LinZhihao-723
left a comment
There was a problem hiding this comment.
Reviewed service implementation and brought up some design concerns. Will go for a detailed review in the next round.
| pub async fn register_job( | ||
| &self, | ||
| resource_group_id: ResourceGroupId, | ||
| task_graph: &spider_core::task::TaskGraph, |
There was a problem hiding this comment.
Since this will be the serviceable layer, shouldn't the input be a serialized task graph? This call should first deserialize the task graph to make sure it's valid.
| ) -> Result<JobId, StorageServerError> { | ||
| let job_id = self | ||
| .db | ||
| .register(resource_group_id, task_graph, &job_inputs) |
There was a problem hiding this comment.
I realized we forgot to address #284 so that the task graph vs. inputs are not validated before registration... we should probably fix that first before we implement this registration method.
| &self, | ||
| job_id: JobId, | ||
| ) -> Result<Vec<TaskOutput>, StorageServerError> { | ||
| Ok(self.db.get_outputs(job_id).await?) |
There was a problem hiding this comment.
Please refer to https://github.com/y-scope/spider/pull/319/changes#r3204181094.
There was a problem hiding this comment.
Job control block does not directly store the outputs/error of the job. Should we add support for that in JCB or keep it as it is now?
Description
This PR:
ServiceStateobject that holds all storage services.SerivceState.test_mocks.rs.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Release Notes
Refactor
Tests
Note: This release contains internal refactoring and infrastructure improvements with no direct impact on user-facing features or functionality.