test: make tests independent of zip encoding#3239
test: make tests independent of zip encoding#3239supervacuus wants to merge 12 commits intogetsentry:masterfrom
Conversation
loewenheim
left a comment
There was a problem hiding this comment.
As remarked under #3237, I'm in favor of this change, but would like @szokeasaurusrex's opinion as well.
|
I'm a bit more hesitant with this change. I think it is good if we know when the encoding changes, as it is important that we use a zip encoding which the Sentry backend (including older self-hosted versions) properly support. If the encoding ever changes, we should validate the server still recognizes the files properly, then update the snapshots. What do you all think @supervacuus and @loewenheim? |
I'm not sure it's possible for the zip encoding to become genuinely incompatible. As far as I understand the format is fixed. But that doesn't mean that a change in the encoder can't lead to a file being compressed in a different (but equally valid) way. |
|
As far as I am aware, the ZIP format can use other compression algorithms (e.g. ZSTD to name one in particular) which are not universally supported. I'd be concerned about missing a change in the compression algorithm used, which could break compatibility with Sentry SaaS or self-hosted |
That is exactly the premise of my proposal. The test suite on So, while it is true that a break here would signal a change in output, it does not indicate that any version of the backend would be negatively affected (because it does not tell us what changed). It is most likely a false positive. It would also have to be raised back upstream, which means early warnings would have to exist there, too. Even if you use the bit-level break as a signal for potential breakage, you'd still have to check the range of backend versions for compatibility (without blocking a zip dependency bump indefinitely). I think a compatibility check that goes beyond bit-equality can be achieved either by introducing minimal end-to-end tests vs a particular set of pinned backend versions, or, likely more practical, by assuming a set of compression parameters inside the ZIP container that are part of the contract across backend versions. Something like let mut archive =
zip::ZipArchive::new(std::fs::File::open(bundle.path()).unwrap()).unwrap();
assert!(
!archive.is_empty(),
"bundle should contain at least one entry"
);
for i in 0..archive.len() {
let entry = archive.by_index(i).unwrap();
let method = entry.compression();
assert!(
matches!(
method,
zip::CompressionMethod::Stored | zip::CompressionMethod::Deflated
),
"only assume store-as-is and deflate as compression methods",
);
}This allows you to create very tight, concrete bounds on the bundle ZIP container that you can tune (there are additional metadata items that you bound via tests, like "version made by" and extra fields, which can be central to compatibility, but I think just a tight allowlist is a good start), which should also be easier to sync with backends, while keeping you isolated from trivial decoder output changes. |
|
Fair enough. Given that, I am okay with this approach generally 👍 |
szokeasaurusrex
left a comment
There was a problem hiding this comment.
Mostly seems reasonable, please check the one thing I commented on (and, if anything similar was done elsewhere, please also address that!)
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e117f9a. Configure here.
… + chunk_upload_multiple_files_only_some
Great catch!
Re-uploading the server-resident chunk (or duplicating either missing one) now fails the assertion. I applied the same multiset pattern to the sibling On the broader compatibility concern from the earlier thread: added a |
|
Excellent, thanks for all the work! |
| fn build_compression_method_canary() { | ||
| let bundle = make_test_bundle(); | ||
| let mut archive = | ||
| zip::ZipArchive::new(std::fs::File::open(bundle.path()).unwrap()).unwrap(); | ||
|
|
||
| // Two source files plus the manifest.json that the source bundle | ||
| // writer adds automatically. | ||
| assert_eq!(archive.len(), 3, "unexpected bundle entry count"); | ||
| for i in 0..archive.len() { | ||
| let entry = archive.by_index(i).unwrap(); | ||
| let method = entry.compression(); | ||
| assert!( | ||
| matches!( | ||
| method, | ||
| zip::CompressionMethod::Stored | zip::CompressionMethod::Deflated | ||
| ), | ||
| "entry {:?} uses {method:?}, outside the current allowlist: \ | ||
| verify backend compatibility before widening", | ||
| entry.name(), | ||
| ); | ||
| } |
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
m: Why do we have a mod tests block here? This file is already inside an integration test, so the #[cfg(test)] on the module should have no effect AFAIK.
There was a problem hiding this comment.
No particular reason. I am not a Rust native, and, as such, I assumed this was not a test module but a test utility, so I created a test sub-module for the tests 😅.
Flattened here: 9614ef2

Description
Extracted from #3237
This change was triggered by bumping
symbolicto12.17.3to include getsentry/symbolic#960.symbolicupdated itszipdependency (2.4.2to7.2.0) since the last bump. This changed encoding internals and invalidated a bunch of test assertions.The PR addresses the issue in the following way:
I rewrote the affected tests so that they decode the incoming chunks and compare their contents against the fixtures, rather than snapshotting, which could break with every change to the encoder. I also simplified the usage of
split_chunk_body()a bit: now it returns aVec, and callers decide how they want to view the data: the simple tests collect into aHashSetfor set comparison, while the small-chunk tests use aSHA1digest multimap to verify exact chunk identity and multiplicity.If you actually wanted an early warning whether the encoding was stable, then we can drop this PR.
Specifically, for
build_deterministic()insrc/utils/source_bundle.rs, it was not entirely clear whether you want two runs of the same version to have predictably the same output (which I changed it to) or whether it should actually be stable across future versions (which azipcrate bump upstream would break).Issues
Raised as part of the fix proposals for getsentry/sentry#104738
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.