CLDSRV-898: handle checksums in CompleteMultipartUpload #6170
leif-scality wants to merge 10 commits into
leif-scality commented on May 15, 2026
- Calculate the final object checksum and compare it with the one sent in the headers
- Check that all parts have the correct checksum and checksum type
- Store the final checksum when the type is FULL_OBJECT (COMPOSITE checksums are going to be stored by https://scality.atlassian.net/browse/S3C-10399)
LGTM
LGTM
const parts = jsonList.Part || [];
for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
Suggested change:
- const parts = jsonList.Part || [];
- for (let i = 0; i < parts.length; i++) {
-     const part = parts[i];
+ for (const part of (jsonList.Part || [])) {
if (tag !== expectedTag) {
    const algoLabel = tag.replace(/^Checksum/, '').toLowerCase();
    return errorInstances.BadDigest.customizeDescription(
        `The ${algoLabel} you specified for part ${partNumber} ` + 'did not match what we received.',
Suggested change:
-         `The ${algoLabel} you specified for part ${partNumber} ` + 'did not match what we received.',
+         `The ${algoLabel} you specified for part ${partNumber} did not match what we received.`,
return errorInstances.InvalidRequest.customizeDescription(
    `The upload was created using a ${mpuAlgo} checksum. ` +
        'The complete request must include the checksum for each ' +
        `part. It was missing for part ${partNumber} in the request.`,
Suggested change:
-         `part. It was missing for part ${partNumber} in the request.`,
+         `part, it is missing for part ${partNumber} in the request.`,
The current error message is identical to the one sent by AWS:
<Error>
<Code>InvalidRequest</Code>
<Message>The upload was created using a sha256 checksum. The complete request must include the checksum for each part. It was missing for part 1 in the request.</Message>
<RequestId>2CMRHS7YQJ418XRD</RequestId><HostId>Z+nL7Rh7aglZj/H2D2aesp/j38QjuvN/wpgur86e7/0P6nohBelRVXaXY+mWSkQdGaPIKHTmmg0iPUkaWjIWFTHmMJdNZolP</HostId>
</Error>
There's already a test file, multipartUpload.js; wouldn't it be better to add the tests there? Or, if a refactor into multiple files is beneficial, maybe port some related tests from there to this new file, to avoid scattering the complete MPU tests across multiple files.
Moved the tests to multipartUpload.js. This also triggered the new prettier linter on that file; I added the resulting changes as a separate commit.
const DummyRequest = require('../DummyRequest');
const { cleanup, DummyRequestLogger, makeAuthInfo } = require('../helpers');

const SPLITTER = '..|..';
Suggested change:
- const SPLITTER = '..|..';
+ const SPLITTER = constants.splitter;
// XML element name AWS uses for each algorithm in CompleteMultipartUpload's
// per-part body.
const TAG_BY_ALGO = {
Shouldn't this be in constants too?
The algorithms object already contains the tag of each algorithm; this object only exists so the tests can verify that the tags in the algorithms object have not been changed.
assert.strictEqual(
    err.description,
    'One or more of the specified parts could not be ' +
        'found. The part may not have been uploaded, or ' +
Is the double space intentional? Also, I'm not very fond of matching against such a long error message that may vary, but I think that's okay.
AWS returns two spaces. For all the final API/XML errors, I return exactly what AWS returns:
<Error><Code>InvalidPart</Code>
<Message>One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.</Message>
<UploadId>CTFNKyL2hI6n6irH0zbVWDzdPZ4n2ueJceRh1juCeuL2X5HOjrvCmXQMEqaoAatEWTEa3pWWxC7t9lOStMzjo0nJb4pv8Ct6oTHv2n8mggVXRQ8RxiXyVyt3.3zpY98HVsZd.ozihhJ1HdUjLCkJtwJMQdNBd4fSdG9drS80vdg-</UploadId>
<PartNumber>1</PartNumber>
<ETag>0ebf9257a12e808d107b2ed1a826c122</ETag>
<RequestId>DAQAPVMCMSY4PDPV</RequestId>
<HostId>XS/2re3ieUQRTKdANLtZv14qyB2h3LjVHnmVrvjP0cj1PazPO16KkArQMtBLBy8S4mmLzQkuXZc=</HostId></Error>
// `x-amz-checksum-type` and `x-amz-checksum-algorithm` are configuration
// headers (MPU completeness mode / SDK algorithm hint), not value
// headers. They must not count toward the "value header" tally.
const valueHeaders = Object.keys(headers).filter(
    h => h.startsWith('x-amz-checksum-') && h !== 'x-amz-checksum-type' && h !== 'x-amz-checksum-algorithm',
);
If the list of supported headers is known and the list of supported algorithms is short, it may be cleaner to extract each possible header directly, or otherwise to filter with something like [list-of-supported-headers].includes(h).
That would change the behavior (probably in a good way) when the client sends one or more unsupported checksums along with multiple valid ones: in that case we probably want to return AlgoNotSupported in priority rather than MultipleChecksumTypes. That's more of a nitpick, though; either way should work.
AWS checks MultipleChecksumTypes before AlgoNotSupported, so the current behavior is the same as AWS's.
AWS also ignores only x-amz-checksum-type and x-amz-checksum-algorithm: they do not count toward the checksum count, and they never trigger AlgoNotSupported. If we send x-amz-checksum-BAD, on the other hand, we get AlgoNotSupported. If we send x-amz-checksum-BAD + x-amz-checksum-ZZZ, we get MultipleChecksumTypes.
So the order is:
- check for MultipleChecksumTypes
- check for AlgoNotSupported
- check the actual checksum value, returning BadDigest on mismatch
I added a commit to ignore x-amz-checksum-type and x-amz-checksum-algorithm in the other functions; I didn't know this behavior existed.
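The check order described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `checkChecksumHeaders`, `SUPPORTED_ALGOS`, `CONFIG_HEADERS` and the `computedByAlgo` map are hypothetical names, the algorithm list is an assumed placeholder, and header names are assumed to be lowercase already (as Node's http module delivers them).

```javascript
// Hypothetical allow-list; the real set of algorithms lives in the project.
const SUPPORTED_ALGOS = new Set(['crc32', 'crc32c', 'crc64nvme', 'sha1', 'sha256']);
// Configuration headers: ignored when counting value headers.
const CONFIG_HEADERS = new Set(['x-amz-checksum-type', 'x-amz-checksum-algorithm']);
const PREFIX = 'x-amz-checksum-';

// `headers` is assumed to have lowercase keys.
// `computedByAlgo` is a hypothetical map of algorithm -> digest computed server-side.
// Returns an error name, or null when the request passes all three checks.
function checkChecksumHeaders(headers, computedByAlgo) {
    const valueHeaders = Object.keys(headers).filter(
        h => h.startsWith(PREFIX) && !CONFIG_HEADERS.has(h),
    );
    if (valueHeaders.length === 0) {
        return null; // no checksum value header, nothing to validate
    }
    // 1. More than one value header, whether the algorithms are supported or not.
    if (valueHeaders.length > 1) {
        return 'MultipleChecksumTypes';
    }
    // 2. A single value header naming an unknown algorithm.
    const algo = valueHeaders[0].slice(PREFIX.length);
    if (!SUPPORTED_ALGOS.has(algo)) {
        return 'AlgoNotSupported';
    }
    // 3. Known algorithm: compare the client's value with the computed one.
    if (headers[valueHeaders[0]] !== computedByAlgo[algo]) {
        return 'BadDigest';
    }
    return null;
}
```

With this ordering, two unknown headers yield MultipleChecksumTypes rather than AlgoNotSupported, which matches the AWS behavior reported in the comment above.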
        partInputs.map(p => p.value),
    );
} else if (type === 'FULL_OBJECT') {
    result = computeFullObjectMPUChecksum(algorithm, partInputs);
I'm wondering what the worst-case run time of computeFullObjectMPUChecksum is, to know whether it should be an async function so that it can yield to the event loop regularly. Assuming the worst case is 10K parts and it's just combining CRCs, it should not take more than one or a few ms, so it should be fine, but better to check if that hasn't been done yet.
A test was added in the previous PR to cover the worst case; it takes a couple of milliseconds, so it should be OK.
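For reference, a worst-case timing check of this kind could look like the sketch below. It is not the test from the previous PR: `combineStandIn` is a hypothetical placeholder that only XORs integers (it is not a CRC combine), and `timeSync` is an illustrative helper. Only the measurement pattern, built on Node's `process.hrtime.bigint()`, is the point.

```javascript
// Stand-in for the real checksum combination. It only folds 32-bit integers
// with XOR and is NOT a CRC combine; swap in the real function to measure it.
function combineStandIn(partValues) {
    let acc = 0;
    for (const v of partValues) {
        acc = (acc ^ v) >>> 0;
    }
    return acc;
}

// Runs fn synchronously and returns its result with the elapsed time in ms.
function timeSync(fn) {
    const start = process.hrtime.bigint();
    const result = fn();
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    return { result, elapsedMs };
}

// 10,000 parts is the worst case assumed in the discussion above.
const parts = Array.from({ length: 10000 }, (_, i) => i + 1);
const { result, elapsedMs } = timeSync(() => combineStandIn(parts));
console.log(`combined ${parts.length} parts in ${elapsedMs.toFixed(3)} ms`);
```

If the measured time for the real function stayed in the low milliseconds, keeping it synchronous would be consistent with the conclusion reached in the thread.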
let nextCalled = false;
const callNext = (...args) => {
    if (nextCalled) {
        log.error('processParts: swallowed late callNext after next already invoked', {
I'm not a big fan of this style of defensive programming. For me, either we know that next may be called multiple times by design and accept it as part of the normal behavior (e.g. with jsutil.once wrapping the inner next callback where we know it may occur), or we should raise an exception (as async.* typically does). This type of error will probably be lost in the logs without anyone noticing, and execution will finish as if jsutil.once had been used in the first place, potentially hiding real issues.
The original code didn't have such error handling, so I think it should be fine to assume it will not happen if the logic introduced ensures a single callback call.
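The two alternatives named in this comment can be sketched as small wrappers. Both helpers below are hypothetical illustrations written for this thread; they are not the actual jsutil.once implementation, whose exact behavior should be checked in its own source.

```javascript
// Option 1: tolerate repeated calls by design. Only the first call runs fn;
// later calls are silently ignored and return undefined.
function once(fn) {
    let called = false;
    return function onceWrapper(...args) {
        if (called) {
            return undefined;
        }
        called = true;
        return fn.apply(this, args);
    };
}

// Option 2: treat a repeated call as a programming error and fail loudly.
function onceStrict(fn) {
    let called = false;
    return (...args) => {
        if (called) {
            throw new Error('callback was already called');
        }
        called = true;
        return fn(...args);
    };
}
```

The first form suits a callback that is known to fire more than once; the second surfaces an unexpected double call immediately instead of leaving it in the logs.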
LGTM
    return next(typeErr, destBucket);
}
const mpuType = storedMetadata.checksumType;
if (!mpuType || headerTypeUpper !== mpuType.toUpperCase()) {
When mpuType is falsy (a legacy MPU created before the checksum feature, where storedMetadata.checksumType is undefined), this error message reads "The upload was created using the undefined checksum mode.", which leaks an internal implementation detail to the client. Consider splitting the !mpuType and mismatch cases into separate error messages.
- Claude Code
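One possible shape for that split is sketched below. The function name, the plain `{ code, description }` return value, the InvalidRequest code and the wording of the legacy-case message are all assumptions made for illustration; the PR itself builds its errors through errorInstances, and the final text should follow what AWS returns.

```javascript
// Hypothetical helper: returns null when the header matches the stored mode,
// otherwise a plain { code, description } object standing in for a real error.
function checkChecksumMode(headerType, storedMetadata) {
    const mpuType = storedMetadata.checksumType;
    if (!mpuType) {
        // Legacy MPU: no mode was recorded, so never interpolate `undefined`.
        return {
            code: 'InvalidRequest',
            // Placeholder wording, to be aligned with the AWS response.
            description: 'The upload was not created with a checksum mode.',
        };
    }
    if (headerType.toUpperCase() !== mpuType.toUpperCase()) {
        return {
            code: 'InvalidRequest',
            description: `The upload was created using the ${mpuType} checksum mode.`,
        };
    }
    return null;
}
```

Handling the falsy case first keeps the mismatch message free of the literal string "undefined".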
 */
// Validate a Content-MD5 header against the buffered body. Returns null on
// success, an error object otherwise.
function validateContentMd5(headers, body) {
The JSDoc block above (for validateChecksumsNoChunking, on lines 332-337) now documents validateContentMd5 instead, since the new function was inserted between the comment and its intended target at line 361. Move the JSDoc down to line 361 or add a proper JSDoc here for validateContentMd5.
- Claude Code
LGTM: solid implementation with thorough test coverage.