Genesis-Embodied-AI · hughperkins · May 16, 2026
diff --git a/docs/source/user_guide/compound_types.md b/docs/source/user_guide/compound_types.md
@@ -169,6 +169,31 @@ sim.step()
 
 `@qd.data_oriented` objects can also be passed as `qd.Template` parameters to kernels defined outside the class, and they support nesting (one `@qd.data_oriented` struct containing another).
 
+### stable_members
+
+**Recommended for any `@qd.data_oriented` class whose ndarray members are allocated once (typically in `__init__`) and not subsequently rebound — the common case.** Decorate with `stable_members=True`:
+
+```python
+@qd.data_oriented(stable_members=True)
+class Simulation:
+    def __init__(self, n):
+        self.x = qd.ndarray(qd.f32, shape=(n,))
+        self.v = qd.ndarray(qd.f32, shape=(n,))
+        # ... more ndarray / field / primitive members
+```
+
+This skips a per-call walk that Quadrants otherwise runs to detect ndarray member rebinding between kernel launches. The walk is O(number of ndarray members) per kernel call, so the savings scale with the container's size.
+
+Microbenchmark on an RTX PRO 6000 Blackwell with a container holding 30 `qd.ndarray` members across two nesting levels, calling a trivial kernel that takes the container as a `qd.Template` argument:
+
+| Setting | Per-launch Python overhead |
+|---|---|
+| `stable_members=False` (default) | 18.5 µs/call |
+| `stable_members=True` | 13.5 µs/call |
+| Difference | **−5 µs/call (−28%)** |
+
+**Trade-off:** with `stable_members=True`, reassigning an ndarray member on an instance is undefined behavior — the previously compiled kernel will be reused even if the new ndarray has a different `dtype`, `ndim`, or layout, silently bit-reinterpreting the new array's storage. Set it only on classes whose ndarray members are allocated once (typically in `__init__`) and never rebound. See [Reassigning ndarray members](#reassigning-ndarray-members) below for the supported alternative.
+
 ### Primitive members
 
 Primitive members on `self` (e.g. `int`, `float`, `bool`, `enum.Enum`) are supported, but they are treated as **template values**: each distinct primitive value across instances triggers a new kernel compilation, with the value baked into the kernel IR.
@@ -318,6 +343,8 @@ Practical consequence:
 
 For `@qd.data_oriented` containers passed via `qd.Template`, reassigning an ndarray member between kernel launches is supported, including changes to `dtype`, `ndim`, or layout. A new specialised kernel is compiled and cached for the new shape; subsequent launches with the original shape continue to use the original cached kernel. (For `@dataclasses.dataclass` containers — passed via the dataclass-type annotation — the member binding follows the standard dataclass mutability rules: frozen dataclasses can't rebind, non-frozen ones can, and a rebind triggers a fresh kernel arg setup on the next launch.)
 
+This support applies only to `@qd.data_oriented` classes *without* the [`stable_members=True`](#stable_members) opt-in. Setting `stable_members=True` is a promise that ndarray members on instances of the class are never reassigned; if you break that promise, the previously compiled kernel is silently reused against the new ndarray.
+
 ### Restrictions
 
 - **`@qd.dataclass` cannot contain `qd.ndarray` or `qd.field` members.** See the [`@qd.dataclass`](#qddataclass-qdtypesstruct) section above for the full list of allowed member types. (The function-form factory `qd.types.struct(...)` has the same restrictions.)
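
The `stable_members` trade-off described in the hunks above can be illustrated without Quadrants. The following is a minimal pure-Python sketch, not Quadrants' implementation: `FakeArray`, `ToyLauncher`, `Sim`, and `run` are hypothetical names invented for this illustration, and the "compiled kernel" is just a string label. It assumes the per-call walk amounts to recomputing a `(name, dtype, ndim)` signature over array members, which is a simplification of whatever the real launcher does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeArray:
    """Stand-in for an ndarray: only the metadata a compiled kernel depends on."""
    dtype: str
    ndim: int


class ToyLauncher:
    """Hypothetical launcher that caches one 'compiled kernel' per member signature."""

    def __init__(self, stable_members: bool):
        self.stable_members = stable_members
        self.walks = 0          # number of member walks performed so far
        self._stable_sig = {}   # id(container) -> signature, used only when stable
        self.compiled = {}      # signature -> label of the "compiled kernel"

    def _signature(self, container):
        # O(number of array members): this is the walk that the flag skips.
        self.walks += 1
        return tuple(
            (name, value.dtype, value.ndim)
            for name, value in sorted(vars(container).items())
            if isinstance(value, FakeArray)
        )

    def launch(self, container):
        if self.stable_members:
            # Walk once per container, then trust the promise of no rebinding.
            key = id(container)
            if key not in self._stable_sig:
                self._stable_sig[key] = self._signature(container)
            sig = self._stable_sig[key]
        else:
            # Default behaviour in this toy model: re-walk on every launch.
            sig = self._signature(container)
        return self.compiled.setdefault(sig, f"kernel#{len(self.compiled)}")


class Sim:
    def __init__(self):
        self.x = FakeArray("f32", 1)
        self.v = FakeArray("f32", 1)


def run(stable_members: bool):
    launcher = ToyLauncher(stable_members)
    sim = Sim()
    first = launcher.launch(sim)
    launcher.launch(sim)
    sim.x = FakeArray("i32", 2)  # rebind a member to an incompatible array
    after_rebind = launcher.launch(sim)
    return launcher.walks, first == after_rebind


default_walks, default_reused = run(stable_members=False)
stable_walks, stable_reused = run(stable_members=True)
```

In this toy model the default mode walks on every launch and selects a new kernel after the rebind, while the stable mode walks once and keeps returning the stale kernel. That is the behaviour the docs warn about, shown here only in miniature.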

diff --git a/docs/source/user_guide/fastcache.md b/docs/source/user_guide/fastcache.md
@@ -95,14 +95,17 @@ Fastcache supports the following parameter types:
 | `qd.types.NDArray` (scalar, vector, matrix) | Yes | dtype, ndim, layout |
 | `torch.Tensor` | Yes | dtype, ndim |
 | `numpy.ndarray` | Yes | dtype, ndim |
-| `dataclasses.dataclass` | Yes | member types recursively; member values if annotated with `FIELD_METADATA_CACHE_VALUE` (see [Appendix — compound-type cache keying](#compound-type-cache-keying)) |
-| `@qd.data_oriented` objects | Yes | member types recursively; primitive member types and values baked into kernel (see [Appendix — compound-type cache keying](#compound-type-cache-keying)) |
+| `dataclasses.dataclass` | Yes | member types recursively (narrowed to members the kernel reads or writes); member values if annotated with `FIELD_METADATA_CACHE_VALUE` (see [Advanced — compound-type cache keying](#compound-type-cache-keying)) |
+| `@qd.data_oriented` objects | Yes | member types recursively (narrowed to members the kernel reads or writes); primitive member types and values baked into kernel (see [Advanced — compound-type cache keying](#compound-type-cache-keying)) |
 | `qd.Template` primitives (int, float, bool) | Yes | type and value (baked into kernel) |
 | Non-template primitives (int, float, bool) | Yes | type only |
 | `enum.Enum` | Yes | name and value |
-| `qd.field` / `ScalarField` / `MatrixField` | **No** | — |
+| `qd.field` / `ScalarField` / `MatrixField` at a kernel-read path | **No** | — |
+| Anything else at a kernel-read path | **No** | — |
 
-If any parameter is of an unsupported type, fastcache is disabled for that call and the kernel falls back to normal compilation. For `qd.field` / `ScalarField` / `MatrixField` arriving through a `qd.Tensor`-annotated parameter, this is silent — no warning is emitted. For other unsupported types, a warning is logged at the `warn` level identifying the offending parameter.
+If any kernel-used parameter is of an unsupported type, fastcache is disabled for that call and the kernel falls back to normal compilation. For `qd.field` / `ScalarField` / `MatrixField` arriving through a `qd.Tensor`-annotated parameter, this is silent — no warning is emitted. For other unsupported types, a warning is logged at the `warn` level identifying the offending parameter.
+
+Kernel-unused members of any type — including unrecognised ones — do **not** disable fastcache. Fastcache skips them entirely, so opaque metadata (UUIDs, Pydantic configs, parent back-pointers) attached to a `@qd.data_oriented` or `dataclasses.dataclass` instance is harmless as long as the kernel doesn't read it.
 
 ### 3. Source code must be available
 
@@ -120,6 +123,12 @@ Each compiled artifact is stored under a key derived from all of the following:
 
 When any of these change, the resulting key is different, so a new compilation occurs and a new entry is stored. Previous entries remain on disk — multiple cached versions coexist. You do not need to manually clear the cache when making code changes — the hash mismatch causes a transparent recompilation.
 
+### Two strict invariants
+
+1. **If the kernel does not read or write a variable, it is entirely ignored by fastcache.** Such a variable never disables fastcache and never produces a warning or an error.
+
+2. **Unrecognised types at variables the kernel reads or writes must not be silently dropped or hashed by type-name.** If the value of such a variable has a type fastcache doesn't explicitly handle (Pydantic models, UUIDs, third-party tensor wrappers, …), fastcache is disabled for the call with a one-shot `[FASTCACHE][UNKNOWN_TYPE]` warning identifying the offending type plus an `[INVALID_FUNC]` log line confirming the cache is off.
+
 ## Advanced
 
 ### Diagnostics
@@ -143,32 +152,25 @@ print(obs.cache_stored)         # True if the compiled kernel was stored to cach
 
 On the first run you'll see `cache_stored=True` but `cache_loaded=False`. On the second run (after `qd.init`), `cache_loaded=True`.
 
-## Appendix
-
 ### Compound-type cache keying
 
-The args hasher walks compound-type kernel parameters recursively. For each leaf member it decides what (if anything) contributes to the cache key. The headline rules:
+For `@qd.data_oriented` and `dataclasses.dataclass` kernel parameters, fastcache walks members recursively. A member is skipped during the walk when the kernel neither reads nor writes it and it contains no nested member that the kernel reads or writes (per the [strict invariants](#two-strict-invariants) above). Member-by-member behavior:
 
-**`@qd.data_oriented`:** the walker descends into `vars(obj)`. For each child:
+- **`qd.ndarray` member** — `(dtype, ndim, layout)` is included in the cache key. Element values are not.
+- **Primitive (`int` / `float` / `bool` / `enum.Enum`) member.** The handling depends on the enclosing container:
+  - In a `@qd.data_oriented` instance — value is baked into the kernel, same as a `qd.Template` primitive. Two instances of the same class with different primitive member values get different cache entries.
+  - In a `dataclasses.dataclass` instance — only the type is included by default. To include the value too, annotate the field with `FIELD_METADATA_CACHE_VALUE`:
 
-- `qd.ndarray` member — `(dtype, ndim, layout)` is included in the cache key. Element values are not.
-- Primitive (`int` / `float` / `bool` / `enum.Enum`) member — value is baked into the kernel (same semantics as a `qd.Template` primitive). Two instances of the same class with different primitive member values get different cache entries.
-- Nested `@qd.data_oriented` member — recurses.
-- Nested `dataclasses.dataclass` member — recurses (with the dataclass rules below).
-- `qd.field` member — fastcache is disabled for the entire kernel call. The kernel still runs via normal compilation; a warn-level log line is emitted.
-
-**`dataclasses.dataclass`:** the walker descends into the declared members. For each member, only the *type* is included in the cache key by default — **not** the value. To include a member's value, annotate it:
-
-```python
-import dataclasses
-from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE
-
-@dataclasses.dataclass
-class SimConfig:
-    num_layers: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
-    dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
-```
+    ```python
+    import dataclasses
+    from quadrants.lang._fast_caching import FIELD_METADATA_CACHE_VALUE
 
-This is necessary whenever the compiled kernel depends on the member's *value* rather than just its type (for example, when the value is used as a loop bound that the compiler bakes into the generated code). Without the annotation, two `SimConfig` instances with different `num_layers` values would share a fastcache key, and the second instance would silently load a kernel compiled for the wrong value.
+    @dataclasses.dataclass
+    class SimConfig:
+        num_layers: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
+        dt: float = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
+    ```
 
-Note the asymmetry: `@qd.data_oriented` primitive members are baked into the kernel automatically (same semantics as `qd.Template`); `dataclasses.dataclass` members contribute only their *type* to the cache key unless you opt in per-member.
+    Annotate any member whose *value* (not just its type) affects the compiled kernel. In practice this is mainly any member used inside [`qd.static`](static.md).
+- **Nested `@qd.data_oriented` or `dataclasses.dataclass` member** — recurses with the same rules (so an `int` inside a nested `@qd.data_oriented` is still baked into the kernel; an `int` inside a nested `dataclasses.dataclass` still needs `FIELD_METADATA_CACHE_VALUE` to bake its value).
+- **`qd.field` member** — fastcache is disabled for the entire kernel call. The kernel still runs via normal compilation; a warn-level log line is emitted.
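
The keying rules for `dataclasses.dataclass` members and the two strict invariants can be sketched in plain Python. This is an illustrative model only, not the fastcache args hasher: `cache_key`, `UnknownTypeAtUsedPath`, `PRIMITIVES`, and the `used_paths` set of dotted member names are all hypothetical, and `FIELD_METADATA_CACHE_VALUE` is redefined locally as a stand-in for the constant the docs import from `quadrants.lang._fast_caching`. Only primitives and nested dataclasses are modelled; ndarray and `@qd.data_oriented` handling is omitted.

```python
import dataclasses
import enum

# Local stand-in for the real metadata key (assumption for this sketch).
FIELD_METADATA_CACHE_VALUE = "cache_value"

PRIMITIVES = (bool, int, float, enum.Enum)


class UnknownTypeAtUsedPath(Exception):
    """Signals that caching would have to be disabled for this call."""


def cache_key(obj, used_paths, prefix=""):
    """Build a toy cache key from the dataclass members the kernel uses.

    used_paths is a set of dotted member paths, e.g. {"num_layers", "inner.dt"}.
    """
    parts = []
    for f in dataclasses.fields(obj):
        path = f"{prefix}{f.name}"
        value = getattr(obj, f.name)
        # Invariant 1: skip members that are neither used nor contain used members.
        if not any(p == path or p.startswith(path + ".") for p in used_paths):
            continue
        if dataclasses.is_dataclass(value):
            # Nested dataclass: recurse with the same rules.
            parts.extend(cache_key(value, used_paths, prefix=path + "."))
        elif isinstance(value, PRIMITIVES):
            # Type only by default; the value is added when the field opts in.
            entry = (path, type(value).__name__)
            if f.metadata.get(FIELD_METADATA_CACHE_VALUE):
                entry += (value,)
            parts.append(entry)
        else:
            # Invariant 2: never drop or type-name-hash an unknown used member.
            raise UnknownTypeAtUsedPath(f"{path}: {type(value).__name__}")
    return tuple(parts)


@dataclasses.dataclass
class SimConfig:
    num_layers: int = dataclasses.field(metadata={FIELD_METADATA_CACHE_VALUE: True})
    dt: float = 0.01
    run_id: object = None  # opaque metadata that the kernel never reads


a = SimConfig(num_layers=2, dt=0.01, run_id=object())
b = SimConfig(num_layers=4, dt=0.5, run_id=object())

key_a = cache_key(a, {"num_layers", "dt"})
key_b = cache_key(b, {"num_layers", "dt"})
dt_only_a = cache_key(a, {"dt"})
dt_only_b = cache_key(b, {"dt"})

try:
    cache_key(a, {"run_id"})
    unknown_rejected = False
except UnknownTypeAtUsedPath:
    unknown_rejected = True
```

Here the two configs get different keys only because `num_layers` opts into value keying. With `dt` alone in `used_paths` the keys match despite different values, and `run_id` is harmless until a kernel actually uses it.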