Better edit validation and don't apply invalid edits in the latest event #6454

Merged
poljar merged 7 commits into main from poljar/better-edit-validation
Apr 17, 2026
Conversation

@poljar
Contributor

@poljar poljar commented Apr 15, 2026

This PR fixes an issue where we would not adhere to the spec when it comes to edit validation in the latest event logic.

Since the event cache isn't the one applying edits this validation logic needs to happen in two places. The first place is the timeline aggregation and the second one the latest event module.

I created a common method which operates on the raw JSON which is now used in the two relevant places.
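
As a rough illustration of what such a shared check covers, here is a minimal sketch of the replacement-event validity rules from the Matrix spec. It is not the code from this PR: the PR's method operates on the raw JSON, while this sketch assumes the relevant fields have already been extracted into a hypothetical `EventFields` struct. The names `EventFields`, `EditValidityError` and `validate_edit` are made up for the example.

```rust
/// Hypothetical, simplified view of the fields the spec's validity rules
/// look at. For encrypted events these are assumed to be the decrypted values.
#[derive(Debug, Clone, PartialEq)]
pub struct EventFields {
    pub event_type: String,
    pub room_id: String,
    pub sender: String,
    /// `Some(..)` if the event is a state event.
    pub state_key: Option<String>,
    /// The `rel_type` of `m.relates_to`, if the event has a relation.
    pub rel_type: Option<String>,
    /// Whether the content carries an `m.new_content` property.
    pub has_new_content: bool,
}

/// Reasons why a replacement must not be applied to its original event.
#[derive(Debug, PartialEq, Eq)]
pub enum EditValidityError {
    TypeMismatch,
    StateEvent,
    OriginalIsEdit,
    RoomMismatch,
    SenderMismatch,
    MissingNewContent,
}

/// Checks `replacement` against `original`. The caller is assumed to have
/// already established that `replacement` is an `m.replace` relation
/// pointing at `original`.
pub fn validate_edit(
    original: &EventFields,
    replacement: &EventFields,
) -> Result<(), EditValidityError> {
    // An edit cannot change the type of the event it replaces.
    if original.event_type != replacement.event_type {
        return Err(EditValidityError::TypeMismatch);
    }
    // Neither event may be a state event.
    if original.state_key.is_some() || replacement.state_key.is_some() {
        return Err(EditValidityError::StateEvent);
    }
    // An edit cannot itself be edited.
    if original.rel_type.as_deref() == Some("m.replace") {
        return Err(EditValidityError::OriginalIsEdit);
    }
    // Both events must live in the same room and come from the same sender.
    if original.room_id != replacement.room_id {
        return Err(EditValidityError::RoomMismatch);
    }
    if original.sender != replacement.sender {
        return Err(EditValidityError::SenderMismatch);
    }
    // The replacement must carry the new content.
    if !replacement.has_new_content {
        return Err(EditValidityError::MissingNewContent);
    }
    Ok(())
}
```

Returning a distinct error per rule is just a choice made for this sketch; it makes it easy for each call site (timeline aggregation, latest event) to log why an edit was ignored.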

A review commit by commit would be the easiest.

  • I've documented the public API Changes in the appropriate CHANGELOG.md files.
  • This PR was made with the help of AI.

@poljar poljar requested a review from a team as a code owner April 15, 2026 12:39
@poljar poljar requested review from bnjbvr and removed request for a team April 15, 2026 12:39
@poljar poljar force-pushed the poljar/better-edit-validation branch from 5267ab9 to cec40bf Compare April 15, 2026 12:41
@codspeed-hq

codspeed-hq bot commented Apr 15, 2026

Merging this PR will not alter performance

✅ 50 untouched benchmarks


Comparing poljar/better-edit-validation (9651630) with main (b066260)

@poljar poljar force-pushed the poljar/better-edit-validation branch from cec40bf to dca2c4f Compare April 15, 2026 13:15
Member

@bnjbvr bnjbvr left a comment

Hmm, I'm afraid that modifying every observer of the event cache to do the manual check will lead to proliferation of very similar code. And, it makes it a bit too easy to forget about other places where such a check would be required:

  • in the thread latest event (maybe_update_thread_summary predates the LatestEvent system, and the LatestEvent integration with threads is NYI)
  • in TimelineItemContent::from_event (which reconstructs an EventTimelineItem for out-of-timeline events, for usage in multiple FFI APIs)

These are the two cases I can think of off the top of my head, so there might be more.

Instead, what we'd do ideally would be to not save an invalid edit event in the first place in the event cache. This complicates things a bit, because federation implies that the edit event can be observed before the original event, so we might want to keep track of edit events not matched against their target event, and discard them if they don't validate.

With such an implementation, you wouldn't have to worry about other subsystems creating observable invalid edits, I think, because the events wouldn't be emitted, or they would be removed beforehand even.

How does that sound? If you prefer keeping an a posteriori validation, then I think the other two functions I've mentioned need a fix and tests too 🫠
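
To make the proposal above concrete, here is a minimal sketch of the bookkeeping it would need: edits that arrive before their target are parked, and they are only emitted once the target is known and the edit validates. Everything here (`Event`, `EditGate`, `receive`, the `is_valid` callback) is hypothetical and does not correspond to the event cache's actual types or APIs.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down event: just enough to model edits.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub event_id: String,
    pub sender: String,
    /// `Some(target_id)` if this event is an edit of `target_id`.
    pub edits: Option<String>,
}

/// Holds back edits until they can be validated against their target.
#[derive(Default)]
pub struct EditGate {
    /// Non-edit events seen so far, by event ID.
    known: HashMap<String, Event>,
    /// Edits whose target has not been seen yet, keyed by target event ID.
    pending: HashMap<String, Vec<Event>>,
}

impl EditGate {
    /// Feeds one event in and returns the events that may be shown to
    /// observers as a result. `is_valid(original, edit)` stands in for the
    /// spec's validity check.
    pub fn receive(
        &mut self,
        event: Event,
        is_valid: impl Fn(&Event, &Event) -> bool,
    ) -> Vec<Event> {
        let Some(target_id) = event.edits.clone() else {
            // A regular event: release any parked edits that validate
            // against it and silently drop the ones that do not.
            let waiting = self.pending.remove(&event.event_id).unwrap_or_default();
            let mut out = vec![event.clone()];
            out.extend(waiting.into_iter().filter(|edit| is_valid(&event, edit)));
            self.known.insert(event.event_id.clone(), event);
            return out;
        };

        if let Some(target) = self.known.get(&target_id) {
            // The target is already known: validate right away.
            return if is_valid(target, &event) { vec![event] } else { Vec::new() };
        }

        // The target has not been observed yet (e.g. because of federation
        // delays): park the edit instead of emitting it.
        self.pending.entry(target_id).or_default().push(event);
        Vec::new()
    }
}
```

A real implementation would also have to decide what happens to parked edits whose target never shows up, and how this interacts with persistent storage; the sketch ignores both.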

Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs
Comment thread crates/matrix-sdk-common/src/edit_validation.rs
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-ui/src/timeline/controller/aggregations.rs
@poljar
Contributor Author

poljar commented Apr 15, 2026

> Instead, what we'd do ideally would be to not save an invalid edit event in the first place in the event cache. This complicates things a bit, because federation implies that the edit event can be observed before the original event, so we might want to keep track of edit events not matched against their target event, and discard them if they don't validate.
>
> With such an implementation, you wouldn't have to worry about other subsystems creating observable invalid edits, I think, because the events wouldn't be emitted, or they would be removed beforehand even.
>
> How does that sound? If you prefer keeping an a posteriori validation, then I think the other two functions I've mentioned need a fix and tests too 🫠

Well that sounds good, and I would say that we should go even further. The whole aggregations logic which lives inside of the timeline should be moved to the event cache.

Almost each of those subsystems will be interested in more than just edits and will want to have consistent handling.

That being said, I don't think I'll want to tackle such a big refactoring right now, it's more important to get the fix for this merged.

@poljar
Contributor Author

poljar commented Apr 16, 2026

Alright, I handled the thread summary in 6d1c566 as well.

As for TimelineItemContent::from_event, as far as I understood that one isn't problematic. It's used in the timeline and the latest event logic.

In both cases we eventually check if this contains a valid edit, well at least after this PR gets merged.

@poljar poljar requested a review from bnjbvr April 16, 2026 15:09
@bnjbvr
Member

bnjbvr commented Apr 17, 2026

> The whole aggregations logic which lives inside of the timeline should be moved to the event cache.

This is relevant to a fundamental design question around the event cache: should the event cache contain raw events (like now) or some kind of aggregation of events (like the timeline EventTimelineItem)? And we went with the raw events for symmetry with all the CS APIs, which give us individual events; then, consumers can handle the individual events, whether they came from the event cache or other CS APIs.

Unless you meant something by this sentence (?), I think it's still better to keep raw, individual, non-aggregated events in the event cache, and let the consumers reaggregate them as they wish.

One API change that would be interesting to explore, would be some kind of output transformer for the event cache events, which split them into "rendered" and "aggregations" categories; each rendered item would get a list of all its aggregations at the same time. Might not be that efficient to represent, but would be definitely more ergonomic for multiple consumers (timeline and latest event, at the minimum), and this would give us a single place where to apply the edit validation logic, among other things.
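
A minimal sketch of what such an output transformer could look like, under heavy simplifying assumptions: every type and function here (`RawEvent`, `RenderedItem`, `group_aggregations`) is hypothetical, and `relates_to` stands only for aggregating relations such as edits and reactions, not for replies or threads.

```rust
use std::collections::HashMap;

/// Hypothetical raw event as it would come out of the event cache.
#[derive(Debug, Clone, PartialEq)]
pub struct RawEvent {
    pub event_id: String,
    /// ID of the event this one aggregates onto, if any.
    pub relates_to: Option<String>,
    pub body: String,
}

/// A "rendered" event together with all the aggregations targeting it.
#[derive(Debug, PartialEq)]
pub struct RenderedItem {
    pub event: RawEvent,
    pub aggregations: Vec<RawEvent>,
}

/// Splits raw events into rendered items and their aggregations.
/// Rendered items keep their input order; aggregations keep their input
/// order within each item. Aggregations whose target is not part of the
/// input are dropped in this sketch.
pub fn group_aggregations(events: Vec<RawEvent>) -> Vec<RenderedItem> {
    let mut by_target: HashMap<String, Vec<RawEvent>> = HashMap::new();
    let mut rendered = Vec::new();

    // First pass: separate the two categories.
    for event in events {
        match event.relates_to.clone() {
            Some(target) => by_target.entry(target).or_default().push(event),
            None => rendered.push(event),
        }
    }

    // Second pass: attach each aggregation list to its rendered event.
    // This would be the single place to run the edit validation logic.
    rendered
        .into_iter()
        .map(|event| {
            let aggregations = by_target.remove(&event.event_id).unwrap_or_default();
            RenderedItem { event, aggregations }
        })
        .collect()
}
```

Consumers such as the timeline and the latest event would then read `RenderedItem`s instead of re-deriving the relation graph themselves, while the raw events stay available for anyone who needs them.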

Member

@bnjbvr bnjbvr left a comment

Thanks!

Comment thread crates/matrix-sdk/src/event_cache/caches/room/state.rs Outdated
Comment thread crates/matrix-sdk/tests/integration/event_cache/threads.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
Comment thread crates/matrix-sdk-common/src/edit_validation.rs Outdated
@poljar
Contributor Author

poljar commented Apr 17, 2026

> Unless you meant something by this sentence (?), I think it's still better to keep raw, individual, non-aggregated events in the event cache, and let the consumers reaggregate them as they wish.

Why is that better? Do we expect different consumers would reaggregate things differently?

> One API change that would be interesting to explore, would be some kind of output transformer for the event cache events, which split them into "rendered" and "aggregations" categories;

If I understand this correctly, sure it doesn't need to be exactly in the event cache but might be a layer on top.

What shouldn't be the goal is to have multiple places where aggregation logic needs to happen.

@bnjbvr
Member

bnjbvr commented Apr 17, 2026

> Why is that better? Do we expect different consumers would reaggregate things differently?

No, this is rather that, if we want to be able to accommodate all the use cases (first example that comes to mind being show individual events, for debugging purposes), we need to provide access to individual raw events anyways, i.e. the least common denominator. If we provided only a high-level, aggregated view of events, then we couldn't cover the whole variety of use cases we have now and can't really anticipate in all consumers of the SDK.

> What shouldn't be the goal is to have multiple places where aggregation logic needs to happen.

Definitely agree here that having multiple places that apply aggregation would be an anti-goal!

@poljar
Contributor Author

poljar commented Apr 17, 2026

> No, this is rather that, if we want to be able to accommodate all the use cases (first example that comes to mind being show individual events, for debugging purposes), we need to provide access to individual raw events anyways, i.e. the least common denominator. If we provided only a high-level, aggregated view of events, then we couldn't cover the whole variety of use cases we have now and can't really anticipate in all consumers of the SDK.

I think we already can't represent the raw view of events like the CS API does, as we're not storing the m.room.encrypted variant of an event if we managed to decrypt it.

As we're already applying transformations on the raw events, i.e. we decrypt them, I'm not sure I see a massive difference between transforming events by decrypting them or by applying an edit.

But alas, this is becoming a bit off-topic for this PR.

@poljar poljar force-pushed the poljar/better-edit-validation branch 2 times, most recently from 9faa761 to 9651630 Compare April 17, 2026 10:05
@codecov

codecov bot commented Apr 17, 2026

Codecov Report

❌ Patch coverage is 96.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.87%. Comparing base (b9b40c0) to head (9651630).
⚠️ Report is 14 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rix-sdk-ui/src/timeline/controller/aggregations.rs | 86.20% | 4 Missing ⚠️ |
| crates/matrix-sdk-common/src/edit_validation.rs | 92.10% | 1 Missing and 2 partials ⚠️ |
| ...es/matrix-sdk/src/event_cache/caches/room/state.rs | 83.33% | 0 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6454      +/-   ##
==========================================
- Coverage   89.90%   89.87%   -0.04%     
==========================================
  Files         378      379       +1     
  Lines      104559   104689     +130     
  Branches   104559   104689     +130     
==========================================
+ Hits        94002    94087      +85     
- Misses       6963     6994      +31     
- Partials     3594     3608      +14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@poljar poljar merged commit 1e74c48 into main Apr 17, 2026
52 checks passed
@poljar poljar deleted the poljar/better-edit-validation branch April 17, 2026 10:34