App 16856: Implement UploadDataFromPath using builtin data manager logic#6113
Open
angelapredolac wants to merge 4 commits into
Open
App 16856: Implement UploadDataFromPath using builtin data manager logic#6113angelapredolac wants to merge 4 commits into
angelapredolac wants to merge 4 commits into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Jira Ticket 16856: Implement UploadDataFromPath using builtin data manager logic
Summary
This implements the core logic behind the UploadDataFromPath API. The previous ticket (wiring PR — base of this branch) plumbed the RPC through RobotService → localRobot → the data manager as a stub that returned "data manager does not support UploadDataFromPath." This PR makes the builtin data manager actually perform the upload: given a file or directory path, it uploads each file to the cloud, deletes the source on success, and returns aggregate counts plus the cloud-assigned binary data ids.
Per discussion, UploadDataFromPath is only expected to be used for arbitrary (3rd-party) files, so every file is uploaded via the existing arbitrary-file FileUpload path (no special-casing of data-capture files). If a .capture file is ever passed, it's uploaded as a plain file rather than parsed into the capture pipeline.
This depends on the api proto PR and is stacked on the wiring branch (base: APP-16855).
Changes
services/datamanager/builtin/builtin.go: added the UploadDataFromPath method on the builtin service, which locks and delegates to the sync layer. This is the method the wiring ticket's interface assertion looks for, so the call now reaches the data manager instead of erroring.
services/datamanager/builtin/sync/sync.go: implemented Sync.UploadDataFromPath. It guards on the cloud connection, os.Stats the path, and either walks a directory or handles a single file, uploading each via the arbitrary-file path and aggregating files_uploaded / files_failed / bytes_uploaded / bytes_total / ids. For directories, a dropped connection stops the walk mid-way, so partial success is reflected in the counts. Refactored syncArbitraryFile to take a context and return the uploaded file's id (and to accept a per-call byte counter so ad-hoc uploads don't pollute the scheduled-sync stats).
services/datamanager/builtin/sync/upload_arbitrary_file.go: uploadArbitraryFile now returns the cloud-assigned BinaryDataId from the FileUploadResponse (previously discarded), which is what populates the ids field.
services/datamanager/builtin/sync/upload_data_from_path_test.go: new unit tests — cloud-not-ready guard, single file, directory aggregation, partial failure, and id population.
robot/impl/local_robot_test.go: updated the wiring ticket's guard test, which now expects "not connected to the cloud" (since the builtin implements the method).
Testing
Unit Tests:
sync package (TestUploadDataFromPath): asserts the not-connected guard, a single file (counts/bytes/id correct and source deleted), a directory aggregating counts/bytes/ids, and a partial failure where bytes_total counts everything discovered while only successful files contribute to files_uploaded/bytes_uploaded. The mock cloud client returns a known BinaryDataId so id population is verified.
Confirmed the existing builtin and sync tests still pass against the uploadArbitraryFile/syncArbitraryFile signature changes (go test ./services/datamanager/... ./robot/... -count=1).
Manual Testing:
Tested UploadDataFromPath end-to-end against a cloud-managed machine running a from-source build of this branch, using a local generic module whose DoCommand dials the machine's RobotClient and calls the new API. The happy path works for both a single file and a 50-file directory — all files upload, the returned ids match what lands in the DATA tab, and source files are deleted after a successful upload. I then exercised the failure modes: a non-existent path returns a clean error with nothing uploaded; an empty directory returns zero uploaded/zero failed with no error; an unreadable file (chmod 000) is counted as files_failed: 1 without a fatal error and isn't deleted; a directory mixing one readable and one unreadable file returns a correct partial result (files_uploaded: 1, files_failed: 1, with bytes_uploaded reflecting only the good file and bytes_total counting both); cutting wifi mid-upload causes the data manager sync to enter TRANSIENT_FAILURE and wait-for-ready/retry rather than crash or lose data; and killing the server mid-upload triggers a graceful shutdown that closes the module and data manager cleanly without deleting the interrupted source. Finally, for a stress/concurrency test, I ran a fake sensor capturing at 10 Hz with scheduled sync enabled while triggering UploadDataFromPath on a 1.5 GB directory (300 × 5 MB files) — the upload completed fully (files_uploaded: 300, files_failed: 0, all 1,572,864,000 bytes), sensor capture stayed continuous at 10 Hz with no gaps throughout, both streams reached the cloud, and the logs stayed clean, confirming the new API coexists with normal data manager capture/sync under load without interference.