
test: make tests independent of zip encoding#3239

Open
supervacuus wants to merge 12 commits into getsentry:master from supervacuus:test/chunk-zip-stable

Conversation

@supervacuus
Contributor

Description

Extracted from #3237

This change was triggered by bumping symbolic to 12.17.3 to include getsentry/symbolic#960.

symbolic updated its zip dependency (2.4.2 to 7.2.0) since the last bump. This changed encoding internals and invalidated a bunch of test assertions.

The PR addresses the issue in the following way:

I rewrote the affected tests so that they decode the incoming chunks and compare their contents against the fixtures, rather than snapshotting the encoded bytes, which could break with every change to the encoder. I also simplified the usage of split_chunk_body(): it now returns a Vec, and callers decide how they want to view the data. The simple tests collect into a HashSet for set comparison, while the small-chunk tests use a SHA1 digest multimap to verify exact chunk identity and multiplicity.

If you actually want an early warning whenever the encoding changes, then we can drop this PR.

Specifically, for build_deterministic() in src/utils/source_bundle.rs, it was not entirely clear whether you want two runs of the same version to produce predictably identical output (which is what I changed it to) or whether the output should be stable across future versions (which an upstream zip crate bump would break).

Issues

Raised as part of the fix proposals for getsentry/sentry#104738

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@supervacuus supervacuus requested review from a team and szokeasaurusrex as code owners March 24, 2026 18:51
Contributor

@loewenheim loewenheim left a comment


As remarked under #3237, I'm in favor of this change, but would like @szokeasaurusrex's opinion as well.

@szokeasaurusrex
Member

I'm a bit more hesitant with this change. I think it is good if we know when the encoding changes, as it is important that we use a zip encoding which the Sentry backend (including older self-hosted versions) properly support. If the encoding ever changes, we should validate the server still recognizes the files properly, then update the snapshots.

What do you all think @supervacuus and @loewenheim?

@loewenheim
Contributor

> I'm a bit more hesitant with this change. I think it is good if we know when the encoding changes, as it is important that we use a zip encoding which the Sentry backend (including older self-hosted versions) properly support. If the encoding ever changes, we should validate the server still recognizes the files properly, then update the snapshots.
>
> What do you all think @supervacuus and @loewenheim?

I'm not sure it's possible for the zip encoding to become genuinely incompatible. As far as I understand the format is fixed. But that doesn't mean that a change in the encoder can't lead to a file being compressed in a different (but equally valid) way.

Member

As far as I am aware, the ZIP format can use other compression algorithms (e.g. ZSTD, to name one in particular) which are not universally supported. I'd be concerned about missing a change in the compression algorithm used, which could break compatibility with Sentry SaaS or self-hosted.

@supervacuus
Contributor Author

supervacuus commented Apr 13, 2026

> I'm not sure it's possible for the zip encoding to become genuinely incompatible. As far as I understand the format is fixed. But that doesn't mean that a change in the encoder can't lead to a file being compressed in a different (but equally valid) way.

That is exactly the premise of my proposal. The test suite on master currently tests bit-equality not compatibility. The encoder can be changed to produce a different binary output for a particular byte sequence, yet the output remains decodable by any decoder in the wild.

So, while it is true that a break here would signal a change in output, it does not indicate that any version of the backend would be negatively affected (because it does not tell us what changed). It is most likely a false positive. It would also have to be raised back upstream, which means early warnings would have to exist there, too. Even if you use the bit-level break as a signal for potential breakage, you'd still have to check the range of backend versions for compatibility (without blocking a zip dependency bump indefinitely).

I think a compatibility check that goes beyond bit-equality can be achieved either by introducing minimal end-to-end tests vs a particular set of pinned backend versions, or, likely more practical, by assuming a set of compression parameters inside the ZIP container that are part of the contract across backend versions. Something like

```rust
let mut archive =
    zip::ZipArchive::new(std::fs::File::open(bundle.path()).unwrap()).unwrap();
assert!(
    !archive.is_empty(),
    "bundle should contain at least one entry"
);
for i in 0..archive.len() {
    let entry = archive.by_index(i).unwrap();
    let method = entry.compression();
    assert!(
        matches!(
            method,
            zip::CompressionMethod::Stored | zip::CompressionMethod::Deflated
        ),
        "only assume store-as-is and deflate as compression methods",
    );
}
```

This allows you to set very tight, concrete bounds on the bundle's ZIP container that you can tune over time (there are additional metadata items you could bound via tests, like "version made by" and extra fields, which can be central to compatibility, but I think just a tight allowlist is a good start). Such bounds should also be easier to sync with backends, while keeping you isolated from trivial encoder output changes.

Member

Fair enough. Given that, I am okay with this approach generally 👍

Will plan to take a look at this in more depth tomorrow

Member

@szokeasaurusrex szokeasaurusrex left a comment


Mostly seems reasonable, please check the one thing I commented on (and, if anything similar was done elsewhere, please also address that!)

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit e117f9a.

@supervacuus
Contributor Author

> m: If possible, we should check that the two chunks that get uploaded are the two we expect; otherwise, it is possible that we reuploaded the chunk that was already on the server

Great catch!

chunk_upload_multiple_files_only_some() now asserts the exact expected chunk set. It derives the expectation by hashing the three fixtures, filtering out the one whose SHA1 matches the "already on the server" chunk from the dif/assemble mock, and then compares it against the decompressed request body using chunk_upload::chunk_digest_counts() (a BTreeMap<String, usize>).

Re-uploading the server-resident chunk (or duplicating either missing one) now fails the assertion. I applied the same multiset pattern to the sibling chunk_upload_multiple_files() test for consistency; all four chunk_upload_multiple_files* variants now share the same shape.

On the broader compatibility concern from the earlier thread: I added a build_compression_method_canary test in src/utils/source_bundle.rs that opens a freshly built bundle and asserts that each zip entry's compression method is in an allowlist of {Stored, Deflated}. It's a starting bound, not a verified server contract, of course, but it will fail loudly on a silent upstream switch to Zstd/Bzip2/etc., so we can consciously re-validate before accepting the bump.

Member

Excellent, thanks for all the work!

Member

@szokeasaurusrex szokeasaurusrex left a comment


small question

Comment on lines +269 to +289
```rust
fn build_compression_method_canary() {
    let bundle = make_test_bundle();
    let mut archive =
        zip::ZipArchive::new(std::fs::File::open(bundle.path()).unwrap()).unwrap();

    // Two source files plus the manifest.json that the source bundle
    // writer adds automatically.
    assert_eq!(archive.len(), 3, "unexpected bundle entry count");
    for i in 0..archive.len() {
        let entry = archive.by_index(i).unwrap();
        let method = entry.compression();
        assert!(
            matches!(
                method,
                zip::CompressionMethod::Stored | zip::CompressionMethod::Deflated
            ),
            "entry {:?} uses {method:?}, outside the current allowlist: \
             verify backend compatibility before widening",
            entry.name(),
        );
    }
```
Member


Nice 🚀

Comment on lines +155 to +157
```rust
#[cfg(test)]
mod tests {
```
Member


m: Why do we have a mod tests block here? This file is already inside an integration test, so the #[cfg(test)] on the module should have no effect AFAIK.

Contributor Author


No particular reason. I am not a Rust native, so I assumed this was not a test module but a test utility, and created a test sub-module for the tests 😅.

Flattened here: 9614ef2
