-
Notifications
You must be signed in to change notification settings - Fork 12
Blog post: Redis Query Engine Triemap Port #2850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 4 commits
eccb09e
781bfee
e7e049c
815c1ea
827b841
5e6dfdc
41baf6f
52b5a4d
6515fe3
12e22ea
de7e368
3e85c60
abd2129
7c83e92
60a30ac
c2c2ffc
97689d7
44515cf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,273 @@ | ||
| --- | ||
|
|
||
| --- | ||
|
|
||
| Redis wants to migrate the Redis Query Engine from C to Rust, and Mainmatter has been working with them on this project. They've rewritten some of their code to rust already, but this kind of work requires good strategy for both choosing candidates for rewrites and for matching the complex behavior and performance characteristics of the original C code. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| ## Case Study: Porting the Trie Map | ||
|
|
||
| A Trie Map is a key-value data structure that has a similar API as a Hash Map / Binary Tree Map, but optimized for compressing keys by their shared prefixes. | ||
|
|
||
| <!-- Diagram --> | ||
|
|
||
| We can imagine how to do this easily in Rust, we just need to store a data structure in the form: | ||
|
|
||
| ```rust | ||
| struct TrieMapNode<T> { | ||
| label: Vec<u8>, | ||
| children: Vec<TrieMapNode<T>, | ||
| // The data that the label "maps" to. | ||
| // This is optional, because not all nodes have complete keys with associated data. | ||
| data: Option<T>, | ||
| } | ||
| ``` | ||
|
|
||
| Very simple stuff! We can "cascade" down nodes from the root to complete the labels of leaves. The data field is optional as labels can branch without having an appropriate piece of data at the branch point: a map with only the key-value pairs `"Scarborough": 42` and `"Scaffolding": 451` doesn't have associated data for their shared parent `"Sca"`. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| Let's take a look at the C implementation: | ||
|
|
||
| ```c | ||
| #pragma pack(1) | ||
| typedef struct { | ||
| uint16_t labelLen; | ||
| uint16_t numChildren : 9; | ||
| uint8_t flags : 7; | ||
| void *data; | ||
| char label[]; | ||
| //... here come the first letters of each child | ||
| //... now come the children pointers | ||
| } TrieMapNode; | ||
|
|
||
| #pragma pack() | ||
| ``` | ||
|
|
||
| The core property of this type is that the fields, label, and the array of pointers to children take up a **single heap allocation**. This is really important for cache locality and minimizing pointer dereferences, but it also means the size of the type and some of the offsets of the fields of that type are not known at compile time. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it'd be good to state explicitly that the actual layout is not fully specified with this struct definition, as this is very much a dynamically sized type that is allocated dynamically on the heap. The comments add some info, but what needs to made more explicit is that:
This is important as it shows that there is a lot more to it than the type definition shows, and that will need to be handled with care during implementation
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| TODO: Diagram of the C memory layout, or pointing to the video. | ||
|
|
||
| This is a complex type, and translating it to Rust is difficult. Translating it to safe Rust is impossible, but that doesn't mean we can't translate it _safely_. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| ## Porting Strategy: Establish a Baseline | ||
|
|
||
| The first component of our strategy was establishing a baseline implementation of the `TrieMap`, exposing the public API we want to exist at the end of the project and building out an extensive benchmark + test suite. | ||
|
|
||
| ```rust | ||
| #[repr (C)] | ||
| struct Node<T> { | ||
| // LowMemoryThinVec uses u16 for capacity, not usize. | ||
| // This is still separate heap allocated data though. | ||
| label: LowMemoryThinVec<u8>, | ||
| children: LowMemoryThinVec<ChildRef<T>>, | ||
| data: Option<T>, | ||
| } | ||
|
|
||
| #[repr (C)] | ||
| struct ChildRef<T> { | ||
| first_byte: u8, | ||
| node: Node<T>, | ||
| } | ||
| ``` | ||
|
|
||
| This implementation makes no attempt to match the performance characteristics of the C original. We have extra heap allocations for labels and children. | ||
|
|
||
| What we get from this baseline is code that works correctly, knowledge of performance characteristics, and a team that has a deeper understanding of the original implementation. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| This working, tested, naive version and refresh of what the original was doing lowers the cognitive load when it comes to the team implementing a later version. | ||
|
|
||
| This first step in the porting strategy left us with 1. Total test coverage of what we were working on and 2. Knowledge of how subpar the performance of this naive TrieMap implementation. | ||
|
|
||
| TODO: Diagram of performance, violin chart. | ||
|
|
||
| Our implementation was twice as slow as the original C implementation, but about half as slow as an off-the-shelf crate from crates.io. Our implementation was also far less consistent in its speed, whereas the C implementation had very little variance. All the `Vec`s we used also meant the memory usage of the naive implementation was double that of the original. | ||
|
|
||
| ## Porting Strategy: Iterating for Performance | ||
|
|
||
| Having learnt as much as we could from the naive implementation, we can now start working to match the original. | ||
|
|
||
| We established that the core characteristic of the original TrieMap implementation is how each node occupies _a single heap allocation_. The label, array of pointers to children, and field data occupy the same contiguous space in memory. | ||
|
|
||
| We want to get as close to this original implementation as possible, but if we can make it simpler to maintain then we should. To this end, we decided to abandon the `pack` pragma to avoid dealing with unaligned accesses. CPUs are optimized for aligned access, so this should mean a negligible trade-off in memory footprint in exchange for speed. | ||
|
|
||
| We want our implementation to have the following properties: | ||
|
|
||
| 1. Each node takes up a single heap allocation. | ||
| 2. The layout is padded, unlike the packed original. | ||
|
|
||
| We designed out layout to be as follows: | ||
|
|
||
| TODO: Show the Rust Layout | ||
|
|
||
| There are some differences between our Rust layout and the original C layout: | ||
|
|
||
| 1. Our optional leaf data sits at the end of the struct, rather than before the label and children. | ||
|
|
||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TODO?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, may even bring it into a short paragraph. |
||
| ### DIY Dynamically Sized Types | ||
|
|
||
| We want nodes to be a Dynamically Sized Type, like `&str` or `&[u8]`. Something whose size is not known at compile-time. | ||
|
|
||
| Our ideal rust implementation would look something like this: | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| ```rust | ||
| #[repr(C)] | ||
| struct IdealNode<T> { | ||
| label_len: u16, | ||
| n_children: u8, | ||
| label: [u8; self.label_len], | ||
| children_first_bytes: [u8; self.n_children], | ||
| children: [*Node<T>; self.n_children], | ||
| data: Option<T> | ||
| } | ||
| ``` | ||
|
|
||
| But we cannot express this in rust's type system. So instead, we split up the implementation into a stack type and a heap type. The stack type `Node<T>` stores the pointer, and the heap type `NodeHeader` stores our data as if it were `IdealNode<T>`. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| ```rust | ||
| // Stack | ||
| pub(crate) struct Node<T> { | ||
| ptr: NonNull<NodeHeader>, | ||
| _phantom: PhantomData<T>, | ||
| } | ||
|
|
||
| // Heap | ||
| #[repr(C)] | ||
| #[derive(Clone, Copy, Debug)] | ||
| pub(super) struct NodeHeader { | ||
| pub label_len: u16, | ||
| pub n_children: u8, | ||
| // ...where are the other fields? | ||
| } | ||
| ``` | ||
|
|
||
| Note that in `NodeHeader` we can't express everything we want to in the type system. Instead, we avoid defining them directly and make sure to manually allocate them ourselves. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just like in the C implementation |
||
|
|
||
| ### Allocating a Dynamically Sized Type | ||
|
|
||
| The `alloc` function has the signature: | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| ```rust | ||
| pub unsafe fn alloc(layout: Layout) -> *mut u8 | ||
| ``` | ||
|
|
||
| Rather than C's `malloc` which takes a length in bytes, `alloc` takes a `Layout` type that represents the size and alignment of an allocation. | ||
|
|
||
| `Layout` exposes a fairly simple API, and we can define a layout to suit our complex needs at runtime. | ||
|
LukeMathWalker marked this conversation as resolved.
Outdated
|
||
|
|
||
| 1. `Layout::new` produces a layout for a sized type. | ||
| 2. `Layout::array` produces a layout for an array of a sized type, with a runtime-or-const `n` elements. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
| 2. `Layout::extend` allows us to compose layouts together. | ||
|
|
||
| This composable API of `Layout` means it's fairly easy to build the `Layout` we need for our custom Dynamically Sized Type. | ||
|
|
||
| ## Porting Strategy: Managing Necessary Unsafe | ||
|
|
||
| Once we have our `Layout`, we can call `alloc` to get our pointer. This memory is uninitialized, but we can initialize it manually. Managing this memory manually only gets gnarly once we want to split or merge nodes. | ||
|
|
||
| In the final codebase we were finally left with **128** unsafe blocks and **21** unsafe methods. | ||
|
|
||
| Managing this unsafe code requires care and understanding for how developers will interact with it. Preferably a developer won't interact with it at all unless they're a maintainer. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The developer/maintainer distinction doesn't apply here. The project isn't really "open contributions", so all developers working on it are maintainers. |
||
|
|
||
| On top of all this, the client is moving to rust to minimize issues related to memory safety so it is important to build an API that can be trusted and can't be misused. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| To deal with these constraints, we have a core axiom of **no `unsafe` in the public API**, on top of trying to rely on tooling as much as possible. | ||
|
LukeMathWalker marked this conversation as resolved.
Outdated
|
||
|
|
||
| ### Tooling & Patterns | ||
|
|
||
| Our management of the unsafe code is a mix of good documentation practices for critical areas & automated tooling. We can't fully automate this process, otherwise it wouldn't be `unsafe`, but we can do our best to build and maintain certainty. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| 1. **Miri** | ||
|
|
||
| Miri runs your rust code in an interpreter against different memory models, checking for undefined behavior as it comes up. A great help when we're writing lots of unsafe code. | ||
|
|
||
| This only works on Rust code, so we can't apply this in other parts of Rust Redis where we cross FFI boundaries into C libraries, but we can use it here and anywhere else where all dependencies are rust. | ||
|
LukeMathWalker marked this conversation as resolved.
Outdated
|
||
|
|
||
| 2. **Debug assertions** | ||
|
|
||
| Debug assertions are checks that only run in debug builds, release builds do not pay the performance cost of them. Debug assertions let us check the invariants we want to build our API around at runtime and give good errors when those assertions fail. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. They also further specify the invariants in code |
||
|
|
||
| Heavy use of debug assertions ties nicely with our use of Miri, whose errors can be overly esoteric or academic. Rust assertions are much easier to contextualize. | ||
|
|
||
| 3. **Clippy Lints** | ||
|
|
||
| Following [Jack Wrenn's guidance](https://jack.wrenn.fyi/blog/safety-hygiene/) we're using clippy lints to enforce two soft invariants: | ||
|
LukeMathWalker marked this conversation as resolved.
Outdated
|
||
|
|
||
| - 3.1: `undocumented_unsafe_blocks`: If an unsafe block does not have comments surrounding it, there will be a warning. | ||
|
|
||
| - 3.2: `multiple_unsafe_ops_per_block`: If an unsafe block contains more than one unsafe operation, there will be a warning. | ||
|
LukeMathWalker marked this conversation as resolved.
Outdated
|
||
|
|
||
| This pushes developers in the direction of properly documenting the unsafe code they do right, operation by operation, to assure all invariants are documented. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And of course, we're denying warnings in CI
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't quite follow, could you elaborate?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're saying
Which is great and all, but warnings can be ignored. That's why we configured CI such that it will fail if there are any warnings |
||
|
|
||
| ### Safety Comments | ||
|
LukeMathWalker marked this conversation as resolved.
|
||
|
|
||
| Following more of Wrenn's guidance, we're also making sure that each safety comment mentions invariants as a list for easy comparison. | ||
|
|
||
| Enumerated safety comments differ between call sites and declaration sites. Declaration sites need to show the assumptions made (invariants) about inputs and context. Call sites need to explain how those invariants are met, one by one. | ||
|
|
||
| ```rust | ||
| // SAFETY: | ||
| // - The source range is all contained within a single allocation. | ||
| // - The destination range is all contained within a single allocation. | ||
| // - We have exclusive access to the destination buffer, since it was allocated earlier in this function. | ||
| // - No one else is mutating the source buffer, since this function owns it. | ||
| // - Both source and destination pointers are well aligned, see [PtrMetadata::children_ptr] | ||
| // - The two buffers don't overlap. The destination buffer was freshly allocated earlier in this function. | ||
| unsafe { | ||
| child_ptr.children().ptr().copy_from_nonoverlapping( | ||
| old_ptr.children().ptr(), | ||
| child_header.n_children as usize, | ||
| ) | ||
| }; | ||
| ``` | ||
|
|
||
| Note how long this safety comment is, with 128 unsafe blocks in the codebase this could become an overwhelming amount of information very quickly. This is where the next component of our unsafe management strategy comes in. | ||
|
|
||
| ### Abstractions that are Hard to Misuse | ||
|
|
||
| Unsafe doesn't let users or maintainers do "anything", but it does let them get away with misusing the tools they're given in ways that make software systems unpredictable. | ||
|
|
||
| To strengthen systems built on unsafe foundations, we start by **Layering Abstractions** in a way that gradually reduces the concerns of a maintainer. | ||
|
|
||
| The end goal is to have a outwardly safe API, and a internal implementation that restricts what mistakes a maintainer to specific areas of the code. Core, most-unsafe operations are built upon by still-unsafe abstractions that limit the operations the maintainer can perform, and those abstractions are built upon by more-safe abstractions. | ||
|
LukeMathWalker marked this conversation as resolved.
Outdated
|
||
|
|
||
| **Bottom**: `alloc`, `Layout`, and pointers. | ||
|
|
||
| This is the information we get from naive use of `alloc`. Invariants need to be manually upheld for all operations at this level. At this level, many different | ||
|
|
||
| **Near-Bottom**: `PtrMetadata<T>` + `PtrWithMetadata<T>` | ||
|
|
||
| At this level, we have the metadata needed to construct a pointer (`PtrMetadata`, which holds a `Layout` and the offsets of the fields not tracked by the ), as well as a version of the constructed pointer that is paired with its metadata. This pairing means a proof of the relationship between the pointer and the metadata of that pointer only needs to happen once, when `PtrWithMetadata` is constructed using the unsafe methods that construct it. | ||
|
|
||
| **Mid-Level**: Unsafe methods and functions. | ||
|
|
||
| At this layer we're developing methods and functions that perform unsafe operations but have API surfaces which are much more specialized for the tasks we're using them for. | ||
|
|
||
| **Top**: The Safe API | ||
|
|
||
| Finally, we have the safe API surrounding the data structure that we want end users to actually use. | ||
|
|
||
| ## Conclusion | ||
|
|
||
| This implementation is now in production in the Redis Query Engine! We can note & measure some key changes in the codebase from this transition. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| || C | Rust | | ||
| |-|-|-| | ||
| |Lines of Code|~1k|~1.9k| | ||
| |Unsafe expressions|-|128| | ||
| |Coverage|82.8%|95.6%| | ||
| |Microbenchmarks|❌|✅| | ||
| |Performance|(baseline)|25% to 100% improvement in core operations| | ||
|
|
||
| The codebase did double in size, but with a move to a Rust codebase the unsafe operations have been confined to an area of the code a fraction of the size of the C codebase, where unsafe operations could have been happening anywhere. | ||
|
|
||
| Test coverage was greatly improved, the last percentage points in coverage are in unreachable areas. | ||
|
|
||
| Performance was also greatly improved, in part from changing the layout from Packed to Unpacked. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| Real-world codebases on the level of complexity of the Redis Query Engine require looking carefully into how complex components can be replaced with alternative implementations that better fit the needs of maintainers and users. | ||
|
|
||
| What we get from a rewrite is more than just a codebase translated to a new language. A rewrite is an opportunity for an engineering team to relearn and re-contextualize the core problems that existing code exists to solve, to learn what's important about the building blocks of the software they're developing. | ||
|
|
||
| A rewrite is also an opportunity to reassess assumptions. In the original implementation we had a packed layout, we questioned that assumption and went with a different direction, as well as making other small changes in the memory layout of the data structure. Challenging these assumptions paid off, as our new implementation is faster and has a negligible increase in memory use. | ||
|
|
||
| We could have stopped at our naive implementation, failing the client on performance. We could have wrapped large areas of the codebase in unsafe, failing the client on maintainability. We could have come in with a new data structure entirely, failing the client on the performance assumptions made in the rest of their codebase. | ||
|
tall-vase marked this conversation as resolved.
Outdated
|
||
|
|
||
| The success of this rewrite in terms of speed, performance, and future maintainability is a consequence of taking the client's needs seriously and keeping an open mind on digesting the problems current code solved. | ||
Uh oh!
There was an error while loading. Please reload this page.