diff --git a/docs/internals/.gitignore b/docs/internals/.gitignore new file mode 100644 index 00000000..dfe00413 --- /dev/null +++ b/docs/internals/.gitignore @@ -0,0 +1,15 @@ +## Project Specific ## +book + +## IDEs ## +**/.vscode +**/.idea +**/.obsidian +**/.smart-env + +## Rust ## +**/target +**/Cargo.lock + +## macOS ## +**/.DS_Store diff --git a/docs/internals/Readme.md b/docs/internals/Readme.md new file mode 100644 index 00000000..46c81ddb --- /dev/null +++ b/docs/internals/Readme.md @@ -0,0 +1,35 @@ +# The Ixa Book + +## Prerequisites + +You need mdBook and the `mdbook-callouts` and `mdbook-inline-highlighting` +plugins. + +```bash +cargo install mdbook mdbook-callouts mdbook-inline-highlighting +``` + +## Building + +To build without opening it: + +```bash +mdbook build +``` + +...or to build and then open the rendered book in your browser: + +```bash +mdbook build --open +``` + +For authoring, use `serve` instead: + +```bash +mdbook serve --open +``` + +> The `serve` command watches the book’s `src` directory for changes, rebuilding +> the book and refreshing clients for each change; this includes re-creating +> deleted files still mentioned in `SUMMARY.md`! A websocket connection is used +> to trigger the client-side refresh. diff --git a/docs/internals/book.toml b/docs/internals/book.toml new file mode 100644 index 00000000..5462ec92 --- /dev/null +++ b/docs/internals/book.toml @@ -0,0 +1,22 @@ +[book] +authors = ["The Ixa Developers"] +language = "en" +src = "src" +title = "Notes On Ixa Internals" + +[build] +create-missing = false + +[output.html] +site-url = "/book/" + +[preprocessor.callouts] +# Callouts are the same as GitHub's admonitions. + +[output.html.playground] +# None of our code examples are runnable, and the playground interferes with inline highlighting. 
+runnable = false + +[preprocessor.inline-highlighting] +after = ["links"] +default-language = "rust" diff --git a/docs/internals/src/SUMMARY.md b/docs/internals/src/SUMMARY.md new file mode 100644 index 00000000..43b42fd6 --- /dev/null +++ b/docs/internals/src/SUMMARY.md @@ -0,0 +1,6 @@ +# Summary + +- [Benchmarking](benchmarking.md) +- [Entity System Design Notes](entities-design-notes.md) +- [EntitySet](entity_set.md) +- [Conditional Plan Execution Analysis](conditional-plan-execution-analysis.md) diff --git a/docs/internals/src/benchmarking.md b/docs/internals/src/benchmarking.md new file mode 100644 index 00000000..8270dda7 --- /dev/null +++ b/docs/internals/src/benchmarking.md @@ -0,0 +1,121 @@ +# Benchmarking + +Ixa's benchmark harness lives in the `ixa-bench` package. The current benchmark +tasks are defined in the top-level `mise.toml`; prefer those tasks over direct +`cargo` commands when running routine benchmarks. + +## Setup + +Install and activate mise, then trust the repository configuration: + +```sh +curl https://mise.run | sh +cd ixa +mise trust mise.toml +``` + +You can list the available tasks with: + +```sh +mise tasks +``` + +## Running Benchmarks + +Run all benchmark suites: + +```sh +mise run bench +``` + +This runs the Hyperfine suite first and then the Criterion suite. + +## Hyperfine Benchmarks + +Hyperfine benchmarks are registered in `ixa-bench/src` with the +`hyperfine_group!` macro. 
The current reference SIR comparison is the +`large_sir` group, which compares: + +- `baseline`: static reference implementation without Ixa +- `entities`: equivalent Ixa implementation with queries enabled + +Run all Hyperfine groups: + +```sh +mise run bench:hyperfine +``` + +Run only the `large_sir` group: + +```sh +mise run bench:hyperfine large_sir +``` + +Run a quick one-pass smoke test of the Hyperfine harness: + +```sh +mise run test:hyperfine +``` + +The `bench:hyperfine` task builds the benchmark binaries first through the +`build:hyperfine` task, then runs `target/release/hyperfine`. + +If you need to run one Hyperfine benchmark directly, use the `run_bench` binary: + +```sh +cargo run --bin run_bench -p ixa-bench --release -- --group large_sir --bench baseline +cargo run --bin run_bench -p ixa-bench --release -- --group large_sir --bench entities +``` + +## Criterion Benchmarks + +Run all Criterion benchmarks: + +```sh +mise run bench:criterion +``` + +Run a specific Criterion benchmark target: + +```sh +mise run bench:criterion sample_entity_scaling +``` + +The current Criterion benchmark targets are defined in `ixa-bench/Cargo.toml`. +Examples include `examples`, `large_dataset`, `algorithms`, `sampling`, +`indexing`, `counts`, `sample_entity_scaling`, `set_property`, and +`property_semantics`. + +The `sample_entity_scaling` target prints a scaling summary for +`sample_entity` cases, including whole-population sampling, indexed +single-property sampling, indexed multi-property sampling, and unindexed +single-property sampling. 
+ +## Criterion Baselines + +Create a Criterion baseline: + +```sh +mise run bench:create --baseline main +``` + +Create a baseline for one benchmark target: + +```sh +mise run bench:create sample_entity_scaling --baseline main +``` + +Compare against a saved baseline: + +```sh +mise run bench:compare --baseline main +``` + +Compare one benchmark target against a saved baseline: + +```sh +mise run bench:compare sample_entity_scaling --baseline main +``` + +The `bench:compare` task runs `cargo bench` with the selected baseline and then +runs the `check_criterion_regressions` utility. diff --git a/docs/internals/src/conditional-plan-execution-analysis.md b/docs/internals/src/conditional-plan-execution-analysis.md new file mode 100644 index 00000000..ce19491b --- /dev/null +++ b/docs/internals/src/conditional-plan-execution-analysis.md @@ -0,0 +1,140 @@ +# Conditional plan execution: three possible approaches + +Here is my analysis of three different strategies for dealing with "canceling" in-flight plans associated with people who die. To understand *why* the pros and cons are what they are, it helps to first explain how plans are stored internally in Ixa today. + +## How plans work internally + +A scheduled plan is currently split into two parts. + +- The plan timing information - the `PlanId`, the execution time, and the execution phase - is stored in a priority queue called `queue`, implemented as a binary heap. +- The plan payload - at the moment, this is just the callback to run - is stored separately in a hash map called `data_map`, keyed by `PlanId`: `data_map: HashMap`. + +This design matters because cancellation is already done lazily in a sense: When a plan is cancelled, Ixa removes the payload from `data_map`, but it does *not* remove the corresponding entry from `queue`. Later, when the scheduler looks for the next plan to execute, it may pop entries from `queue` that no longer have a payload in `data_map`, and simply discard them. 
The rationale for this design decision is: + +- The removal operation for the `queue` involves a search (*O(n)* for an arbitrary entry in a binary heap) followed by an *O(log n)* binary heap "fix up", and that operation is actually less efficient than just throwing out already cancelled plans that are popped from the top of the `queue` while fetching the next plan to execute. +- Said another way, removal from the `queue` is just amortized over the life of the simulation rather than done eagerly. +- The assumption is that the memory cost of not eagerly removing items from `queue` is negligible, because the number of cancelled plans that are still "in flight" is expected to be small. + +## 1. Bookkeeping in a separate "plan index", then bulk cancellation when a person dies + +In this approach, client code keeps an additional index from each person to the plans associated with that person in a hash map `HashMap>`. When the person dies, the model looks up all of those plans and cancels them. At first this sounds appealing because it gives a direct way to say "cancel all future plans for this person." But in practice the advantages don't really materialize. + +The first problem is bookkeeping. Plans are added over time, so the index has to keep track of every plan that was ever associated with a person, for every single person. Removing plans from the index when they execute is awkward, so the simplest version is to leave executed plans in the index and only clear them out when the person dies. That means the index may hold a great deal of stale information over the course of a long simulation. + +The second problem is that bulk cancellation is less valuable than it may sound. Cancelling a plan removes its payload from `data_map`, but it still does not remove the corresponding entry from `queue`. So even after bulk cancellation, the timing entries remain in the queue until they eventually reach the top and are discarded. 
In other words, this approach does not truly "remove all future plans for this person" from the scheduler. It only removes their payloads early. + +That means the memory savings are limited. What we save is only the *payload* storage for plans that belong to already-dead people and have not yet been reached in simulated time. If that number is small, then the savings are small. Against that, we must pay the ongoing cost of maintaining a separate plan index for all people, across the whole simulation. On memory grounds, this is clearly a poor trade. + +There is one real possible advantage: if plans are cancelled in advance, we do not have to do an "is this person still alive?" check later when those plans come due. That might save some runtime. But it is not obvious that these savings are large enough to justify the extra bookkeeping, especially since the alive check itself is simple (just an index into a vector plus a small method-call overhead). + +Overall, this approach seems unattractive. It adds substantial bookkeeping complexity, stores extra information for the whole simulation, and gives only limited benefit because cancelled plans still remain in the queue. + +Other advantages: + +- Implementable in client code; no modifications to ixa core necessary + +## 2. Add an optional `RunCondition` to the plan payload + +In this approach, a plan may optionally carry a `RunCondition` along with its callback. When the scheduler is choosing the next plan to execute, it checks the condition. If the condition does not hold, the plan is skipped. + +This matches the current internal design much better than the plan-index approach. Ixa already uses lazy cancellation: plans can remain in the queue even after their payload has effectively been removed. A `RunCondition` is similar in spirit. Instead of eagerly trying to remove plans from the scheduler, we wait until a plan is about to run and decide then whether it is still valid. + +This has several advantages. 
+ +First, it avoids the extra bookkeeping cost of a separate per-person plan index. We do not need to remember all plans associated with all people for the whole simulation on top of the plan storage subsystem that already exists. + +Second, it is general. The condition does not have to be "person is alive." It could be any rule that can be checked from the current simulation state. That makes it useful for other cases too. + +Third, it makes conditional execution a built-in feature of the scheduler itself. That means Ixa could later do more with it if desired, for example recording how many plans were skipped or supporting better debugging tools around skipped plans. We could have first-class support for a *semantics* of plan execution built into ixa core. + +The main disadvantage is that this adds some complexity to the core plan system. Plans are no longer just callbacks; they may carry an additional condition. The scheduler also has to check that condition when deciding what to run next. This is not a huge conceptual change, but it is more machinery inside Ixa itself than the lightweight wrapper approach described below. + +A second possible disadvantage is runtime cost. Every gated plan now requires checking its condition when it is reached. If most plans are gated, and if the condition is not trivial, that cost could matter. On the other hand, for the simple "person is alive" case, the check is likely small, and this cost may be entirely acceptable in practice. + +Overall, this is a good fit if we think conditional plan execution is something Ixa itself should support as a first-class feature, not just a one-off convenience for a single model. Really its main selling point is that it provides a clear path forward for future enrichment of plan execution semantics. + +But we can get the same functionality with far less infrastructure by using strategy 3 of the next section. 
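
The scheduler-side check described in this section can be sketched with a toy scheduler. This is only an illustration under stated assumptions, not Ixa's implementation: `World`, `Payload`, `Scheduler`, and `add_gated_plan` are hypothetical names, time is an integer so it can live in a `BinaryHeap` without extra ordering code, and the condition is a plain closure rather than a `RunCondition` trait.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

type PlanId = u64;
type Callback = Box<dyn FnOnce(&mut World)>;
type Condition = Box<dyn Fn(&World) -> bool>;

/// Toy simulation state (hypothetical): one "alive" flag per person plus a log.
struct World {
    alive: Vec<bool>,
    log: Vec<&'static str>,
}

/// Hypothetical payload: the callback plus an optional run condition.
struct Payload {
    callback: Callback,
    run_condition: Option<Condition>,
}

#[derive(Default)]
struct Scheduler {
    /// Timing entries only, earliest first.
    queue: BinaryHeap<Reverse<(u64, PlanId)>>,
    /// Payloads keyed by plan id; cancellation removes entries only from here.
    data_map: HashMap<PlanId, Payload>,
    next_id: PlanId,
    skipped: usize,
}

impl Scheduler {
    fn insert(&mut self, time: u64, payload: Payload) -> PlanId {
        let id = self.next_id;
        self.next_id += 1;
        self.queue.push(Reverse((time, id)));
        self.data_map.insert(id, payload);
        id
    }

    fn add_plan(&mut self, time: u64, callback: impl FnOnce(&mut World) + 'static) -> PlanId {
        self.insert(time, Payload { callback: Box::new(callback), run_condition: None })
    }

    fn add_gated_plan(
        &mut self,
        time: u64,
        run_condition: impl Fn(&World) -> bool + 'static,
        callback: impl FnOnce(&mut World) + 'static,
    ) -> PlanId {
        let payload = Payload {
            callback: Box::new(callback),
            run_condition: Some(Box::new(run_condition)),
        };
        self.insert(time, payload)
    }

    /// Lazy cancellation: drop the payload, leave the queue entry behind.
    fn cancel_plan(&mut self, id: PlanId) {
        self.data_map.remove(&id);
    }

    /// Runs all remaining plans in time order.
    fn run(&mut self, world: &mut World) {
        while let Some(Reverse((_time, id))) = self.queue.pop() {
            // A queue entry without a payload was cancelled earlier: discard it.
            let Some(payload) = self.data_map.remove(&id) else { continue };
            // A gated plan is skipped when its condition no longer holds.
            if let Some(condition) = &payload.run_condition {
                if !condition(&*world) {
                    self.skipped += 1;
                    continue;
                }
            }
            (payload.callback)(&mut *world);
        }
    }
}

fn demo() -> (Vec<&'static str>, usize) {
    let mut world = World { alive: vec![true, true], log: Vec::new() };
    let mut scheduler = Scheduler::default();
    scheduler.add_plan(1, |w| w.alive[1] = false); // person 1 dies at t = 1
    scheduler.add_gated_plan(2, |w| w.alive[1], |w| w.log.push("person 1 acts"));
    scheduler.add_gated_plan(3, |w| w.alive[0], |w| w.log.push("person 0 acts"));
    let cancelled = scheduler.add_plan(4, |w| w.log.push("cancelled plan ran"));
    scheduler.cancel_plan(cancelled);
    scheduler.run(&mut world);
    (world.log, scheduler.skipped)
}
```

Because the scheduler itself sees the condition, it can count skipped plans (`skipped` here), which is the kind of first-class bookkeeping the third advantage above refers to.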
+ +Advantages: + +- first-class support for "execution semantics" of plans +- simple to understand and reason about +- provides a more generic feature ("here is a condition to determine if a plan should execute") that can be used in other use cases. + +Disadvantages: + +- Requires support in ixa core, not just implementation in client code. +- We'd still probably want a convenience method in client code of the form `add_plan_for_person` that is implemented in terms of `RunCondition` anyway. But this is easy to do. + +## 3. A lightweight wrapper in client code: `add_plan_for_person` + +The simplest approach is to keep Ixa unchanged and handle the issue in model code. A helper like `add_plan_for_person` can wrap the user's callback in another callback that first checks whether the person is alive, and only then runs the original handler. + +```rust +/// Adds a plan for the given person that executes only if that person is +/// still alive when the plan comes due. +fn add_plan_for_person( + &mut self, + person_id: PersonId, + time: f64, + callback: impl FnOnce(&mut Context) + 'static, +) -> PlanId { + self.add_plan( + time, + // `move` so the closure owns `person_id` and `callback` and is `'static`. + move |context| { + // Only execute callback if the person is still alive. + let Alive(is_alive) = context.get_property(person_id); + if is_alive { + callback(context) + } + } + ) +} +``` + +This is attractive because it is so simple. It does not change Ixa internals, does not require new bookkeeping, and is easy for model authors to understand. For the concrete case of person-associated plans, it expresses exactly what we want: "run this only if the person is still alive." + +Client code can still schedule plans unconditionally by using the existing `Context::add_plan` API. In other words, client code uses `add_plan_for_person` *if and only if* client code wants the plan execution gated on an "is alive" check. + +This approach can also be generalized to arbitrary conditions. 
Instead of hard-coding an alive check, the helper can take a `RunCondition` argument and apply it inside the wrapper callback. That makes it much closer in spirit to the built-in `RunCondition` approach *but without needing any support in ixa core.* + +```rust +/// Adds a plan for the given entity to be executed only if the `RunCondition` holds. +fn add_plan_for_person( + &mut self, + person_id: PersonId, + time: f64, + callback: impl FnOnce(&mut Context) + 'static, + run_condition: impl RunCondition + 'static, +) -> PlanId { + self.add_plan( + time, + // `move` so the closure owns `callback` and `run_condition`. + move |context| { + // Only execute callback if the run condition holds + if run_condition.should_run(context) { + callback(context) + } + } + ) +} +``` + +Compared with the plan-index approach, the wrapper is clearly simpler and likely more memory-efficient. There is no extra global index to maintain, and no need to keep track of every plan ever associated with every person. The cost is just the condition check at execution time. This cost isn't zero, but it's likely small. We need to measure it. + +Compared with the built-in `RunCondition` approach, the lightweight wrapper contributes less to a future execution-semantics framework, but it remains forward-compatible with one. The scheduler itself does not know that a plan is conditional. From Ixa's point of view, it is just an ordinary callback that sometimes returns immediately without doing anything. + +One consequence is that, in the generalized wrapper design, the condition cannot naturally receive the `PlanId` unless the underlying scheduling API changes, whereas if we had full built-in support in ixa for `RunCondition`, we could include both `context: &Context` and `plan_id: PlanId` parameters to the `RunCondition::should_run` method. Still, for the immediate use case, that difference may not matter much. If all we need is "do nothing if the person is no longer alive," then the wrapper behaves almost the same as a built-in condition check. 
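
To make the wrapper idea concrete outside of Ixa, here is a minimal self-contained sketch. Everything in it is a toy stand-in: this `Context`, its `alive` vector, the `execute` loop, and the `PersonPlansExt` trait are assumptions made for illustration, and the toy `execute` does not handle plans scheduled from inside callbacks. The point is only that the gating lives entirely in the helper, while `add_plan` stays unaware of it.

```rust
type PersonId = usize;

/// Toy stand-in for a simulation context (not Ixa's `Context`).
struct Context {
    alive: Vec<bool>,
    plans: Vec<(f64, Box<dyn FnOnce(&mut Context)>)>,
    executed: Vec<PersonId>,
}

impl Context {
    /// The "core" API: knows nothing about people or conditions.
    fn add_plan(&mut self, time: f64, callback: impl FnOnce(&mut Context) + 'static) {
        self.plans.push((time, Box::new(callback)));
    }

    /// Runs the plans that are currently scheduled, in time order.
    fn execute(&mut self) {
        let mut due = std::mem::take(&mut self.plans);
        // Stable sort: plans with equal times keep their insertion order.
        due.sort_by(|a, b| a.0.total_cmp(&b.0));
        for (_time, callback) in due {
            callback(&mut *self);
        }
    }
}

/// Client-side helper layered on top of `add_plan`.
trait PersonPlansExt {
    fn add_plan_for_person(
        &mut self,
        person_id: PersonId,
        time: f64,
        callback: impl FnOnce(&mut Context) + 'static,
    );
}

impl PersonPlansExt for Context {
    fn add_plan_for_person(
        &mut self,
        person_id: PersonId,
        time: f64,
        callback: impl FnOnce(&mut Context) + 'static,
    ) {
        self.add_plan(time, move |context| {
            // The wrapper returns without doing anything if the person has died.
            if context.alive[person_id] {
                callback(context)
            }
        });
    }
}

fn demo() -> Vec<PersonId> {
    let mut context = Context {
        alive: vec![true, true],
        plans: Vec::new(),
        executed: Vec::new(),
    };
    context.add_plan(1.0, |c| c.alive[0] = false); // person 0 dies at t = 1.0
    context.add_plan_for_person(0, 2.0, |c| c.executed.push(0));
    context.add_plan_for_person(1, 2.0, |c| c.executed.push(1));
    context.execute();
    context.executed
}
```

In this sketch only person 1's plan has any effect; person 0's plan still runs as an ordinary callback, but it returns immediately.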
+ +## Overall comparison + +The plan-index strategy is the weakest of the three. It adds the most bookkeeping, stores extra information for the whole simulation, and gets less benefit than one might expect because cancelled plans still remain in the queue. + +The built-in `RunCondition` strategy is the most powerful and the cleanest if we want conditional execution to be a first-class concept within Ixa. It fits naturally with the current lazy approach to plan cancellation, avoids the cost of a separate index, and provides scaffolding for richer plan execution semantics, introspection on execution conditions, statistics collection, and so forth, that we might want to conceptually attach to conditional execution. + +The lightweight wrapper strategy is the simplest. It solves the immediate problem with very little machinery, keeps the core plan system unchanged, and can be generalized enough to cover many practical cases. Its main limitation is that it provides no built-in support for inspecting or analyzing skipped plans. + +My personal choice: lightweight wrappers give us the biggest payoff for the lowest cost and are pretty low-stakes. + +## Shared API over all three strategies + +Under all three strategies you would have an `add_plan_for_person` helper method as the primary access point to the functionality for client code. In the jargon of software engineering, the `add_plan_for_person` helper provides an "abstraction boundary" that prevents us from having to change every single call site in client code in the event, for example, that we do decide to have first-class support for `RunCondition` execution gates in ixa core's plan execution subsystem. We would only have to change the implementation of `add_plan_for_person`. This reduces [coupling](https://en.wikipedia.org/wiki/Coupling_(computer_programming)) between client code and the implementation of ixa core. 
diff --git a/docs/internals/src/entities-design-notes.md b/docs/internals/src/entities-design-notes.md new file mode 100644 index 00000000..022fc626 --- /dev/null +++ b/docs/internals/src/entities-design-notes.md @@ -0,0 +1,334 @@ +# Entity System Design Notes + +This chapter describes the current internal architecture of Ixa's entity system. +It is not intended to duplicate the user-facing entity and property guides in +the main Ixa Book. For user-facing syntax and examples, see the migration guide, +the indexing chapter, and the forthcoming properties chapter in the Ixa Book. + +Historical notes are kept here only when they explain the current design or a +still-open design question. + +## Entity Model + +An `Entity` is a type-level marker for a collection of related properties. In +model code, entities are usually declared with `define_entity!`, and existing +types can implement the trait with `impl_entity!`. + +The `define_entity!` macro also creates the conventional entity ID alias. For +example, `define_entity!(Person)` creates `PersonId = EntityId`. + +`EntityId` is intentionally typed by entity. A `PersonId` cannot be passed +where a `SchoolId` is expected, even though both are represented internally as a +row index. The row index itself is opaque outside `ixa`; only Ixa internals can +construct new `EntityId` values. + +Entity counts do not live on the entity marker type. They live in the +per-context `EntityStore`, because entity marker types are defined by client +code and should not be able to create IDs or modify population counts directly. + +`PopulationIterator` iterates over the valid `EntityId` values for an +entity type. It captures the population size when the iterator is created, so +entities added later are not included in that iterator. + +## Registration and Store Ownership + +`Context` owns an `EntityStore` directly. The `EntityStore` contains one +`EntityRecord` for each registered entity type. 
Each record tracks the current +entity count and lazily initializes the entity's `PropertyStore`. + +Registration uses macro-generated `ctor`s. At startup, entity and property +metadata is collected into global registries. The metadata is frozen on first +read, and late registration after that point is treated as an internal error. + +There are two kinds of IDs in play: + +- `TypeId` is used where the question is type identity, such as validating that + an initialization list contains the required property types. +- Numeric entity and property IDs are used for fast lookup in stores. + +Property IDs are scoped to an entity type. It is possible for `Property` and +`Property` to have the same numeric property ID when `E1 != E2`. Internal +metadata that needs a stable property key therefore uses `(E::id(), P::id())`. + +Each `PropertyStore` contains one type-erased +`PropertyValueStoreCore` for each registered property of `E`. This is a +change from older notes that described lazily initializing individual property +stores from a `Vec>>`. Today, the `PropertyStore` +itself is lazy, but once it exists its property value stores are constructed +from the frozen property metadata. + +## Property Model + +The Rust value type is also the property type. The trait implementation +`impl Property for Age` is what makes `Age` a property of `Person`. + +All property values satisfy the internal `AnyProperty` bounds: + +```rust +Copy + Debug + PartialEq + Eq + Hash + 'static +``` + +Those bounds matter because property values and canonical values can be used as +index keys, query keys, event payloads, and stored column values. + +Every property has one of three initialization kinds: + +- `Explicit`: the value must be supplied when the entity is created. +- `Constant`: a constant default is used unless creation supplies a value. +- `Derived`: the value is computed from dependencies and cannot be set directly. 
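
The three initialization kinds above can be summarized as a small decision table. The sketch below is a hypothetical model, not Ixa's types: `InitializationKind`, `InitError`, and `initial_value` are illustrative names, and carrying the default inside `Constant` is an assumption made to keep the example self-contained.

```rust
/// Hypothetical mirror of the three initialization kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InitializationKind<T> {
    /// The value must be supplied at creation.
    Explicit,
    /// A constant default applies unless creation supplies a value.
    Constant(T),
    /// Computed from dependencies; never set directly.
    Derived,
}

#[derive(Debug, PartialEq, Eq)]
enum InitError {
    MissingExplicitValue,
    DerivedCannotBeSet,
}

/// Returns the value an entity starts with for one property.
/// `Ok(None)` means nothing is set at creation (the derived case).
fn initial_value<T>(
    kind: InitializationKind<T>,
    supplied: Option<T>,
) -> Result<Option<T>, InitError> {
    match (kind, supplied) {
        (InitializationKind::Explicit, Some(value)) => Ok(Some(value)),
        (InitializationKind::Explicit, None) => Err(InitError::MissingExplicitValue),
        (InitializationKind::Constant(_), Some(value)) => Ok(Some(value)),
        (InitializationKind::Constant(default), None) => Ok(Some(default)),
        (InitializationKind::Derived, None) => Ok(None),
        (InitializationKind::Derived, Some(_)) => Err(InitError::DerivedCannotBeSet),
    }
}
```

For example, `initial_value(InitializationKind::Constant(0u8), None)` yields `Ok(Some(0))`, while omitting a value for an `Explicit` property is an error.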
+ +The old "required versus explicit" question is resolved in the implementation: +a non-derived property without `default_const` is explicit, and explicit +properties are required during `add_entity`. + +Macros remain the intended way to implement the `Entity` and `Property` traits +correctly. In particular: + +- `define_property!` defines a public property type and delegates to + `impl_property!`. +- `impl_property!` attaches an existing type to an entity. +- `define_derived_property!` records dependency information used to update + derived-property indexes and events. +- `define_multi_property!` uses canonical values and shared `index_id()` + machinery to support multi-property indexes. + +`define_property!` emits public generated types and public generated fields for +struct properties. When model code needs custom visibility, attributes, +additional derives, or more complex Rust syntax, it should define the type +itself and use `impl_property!`. + +### Property IDs and Index IDs + +Most properties use their own property ID as their index ID. Multi-properties +are the important exception. Equivalent multi-properties, such as +`(Age, InfectionStatus)` and `(InfectionStatus, Age)`, share a single +underlying index. For that reason `Property::id()` answers "which property is +this?" while `Property::index_id()` answers "which property value store owns the +index this property should use?" + +### Canonical Values + +`Property::CanonicalValue` is the internal representation used by indexes and +indexed query lookup. For ordinary properties this is usually the property type +itself. For multi-properties, the canonical value is the component tuple in a +stable order, allowing equivalent multi-properties to share one index. + +The Ixa Book's properties chapter covers user-facing property syntax, custom +display behavior, `Option` properties, floating-point equality and hashing, +and canonical values in more detail. 
This internals chapter only depends on the +fact that indexes use canonical values as keys. + +## Property Storage + +`PropertyValueStoreCore` owns the storage and index state for one +property: + +- `data: Vec<P>` stores non-derived property values. +- `index: PropertyIndex` stores the property's current index, if any. +- value-change counters are stored alongside the property. + +Derived properties have no backing value vector. `Context::get_property` +computes them from the current context and entity ID. + +Constant-default properties use a storage optimization: trailing default values +do not have to be materialized in `data`. If a constant property has not stored +a value for an entity ID, `get` can return `P::default_const()`. + +Explicit properties do not have that fallback. The entity creation path enforces +that every explicit property has a value before the new entity can be created. + +## Add Entity Flow + +`Context::add_entity` currently returns `Result, IxaError>`. + +The flow is: + +1. Validate that the supplied property list contains distinct property types. +2. Check that all required properties are present. +3. Create a new typed `EntityId`. +4. Write initial property values into the `PropertyStore`. +5. Catch up enabled indexes for newly added entities. +6. Emit `EntityCreatedEvent`. +7. Return the new ID. + +Initial property writes during entity creation do not emit property-change +events. Entity creation has its own `EntityCreatedEvent`. + +Public entity initialization uses either the entity marker type for all-default +initialization: + +```rust +context.add_entity(Person) +``` + +or the `with!` macro for explicit values: + +```rust +context.add_entity(with!(Person, Age(42), InfectionStatus::Susceptible)) +``` + +Public initialization APIs no longer accept naked tuples such as +`(Age(42),)`. + +### Open Question: Fallible or Panicking `add_entity` + +The old design question about `add_entity` is still live. The current API +returns `Result`, but the tradeoff remains: + +- Returning `Result` makes sense for cases where entity creation might be + driven by external input, a debugging interface, or a web/API layer. 
+- Panicking can be more ergonomic for ordinary model code, where an invalid + initialization list is a programmer error and recovery is unlikely. +- A possible future API split would be `add_entity` for the common panicking + path and `try_add_entity` for fallible callers. + +For now, the implementation remains fallible. + +## Set Property and Derived Dependents + +`Context::set_property` can only set non-derived properties. Derived properties +are recomputed from their dependencies. + +The current algorithm is: + +1. Snapshot previous values for the property being set and any dependent + derived properties that need change processing. +2. Write the new non-derived property value. +3. Emit the partial change events. Each partial event recomputes the current + value, updates value-change counters when the value actually changed, + updates indexes by removing the previous value and inserting the current + value, and emits a `PropertyChangeEvent`. + +`set_property` intentionally emits property-change events even when +`current == previous`. This treats the event as a report of a write transaction, +not just a report that the instantaneous state changed. Code that only cares +about real value changes can compare `event.current` and `event.previous`. + +Value-change counters are stricter: they update only when +`current != previous`. + +The old allocation concern around partial property-change events has been +partly addressed. Partial events use `SmallBox`, and the dependent-event list +inside `set_property` uses `SmallVec`. + +### Why Index Catch-Up Uses Narrow `unsafe` + +Index storage used to rely on interior mutability. The current design removed +`RefCell` from `PropertyIndex`, so index reads return plain references and +index writes require mutable references. This makes query and iterator +reference types simpler and avoids carrying runtime borrow guards through +`EntitySet` and `EntitySetIterator`. 
+ +The cost is a narrow use of `unsafe` in the index catch-up paths used by +`ContextEntitiesExt::add_entity` and `ContextEntitiesExt::index_property`. + +The core issue is that index catch-up mutates a `PropertyStore`, while +indexing derived properties may need a shared `&Context` to compute +`P::compute_derived(context, entity_id)`. Rust can express partial borrows of +different fields in local code, but this pattern crosses method boundaries +through `Context`. The implementation therefore uses a raw context pointer to +create a shared context reference while mutating the relevant property store. + +The intent is not arbitrary aliasing. The mutable access is limited to index +internals, and the shared context reference is used for read-only property +access needed to compute derived values. + +## Query Model + +Public query APIs use either: + +- `with!(Entity, prop1, prop2, ...)` for property filters, or +- the entity marker itself, such as `Person`, for a whole-population query. + +The unit type `()` still exists internally as an empty query, but it is not the +preferred public API. Query tuples are wrapped in `EntityPropertyTuple` so +the query carries the entity type explicitly. + +The main public query methods are: + +- `query(query) -> EntitySet` +- `query_result_iterator(query) -> EntitySetIterator` +- `with_query_results(query, callback)` for scoped `EntitySet` access +- `query_entity_count(query)` +- `sample_entity`, `count_and_sample_entity`, and `sample_entities` +- `get_entity_iterator::()` for whole-population iteration + +Use `query_result_iterator` for ordinary iteration. Use `query_entity_count` +for counts. `with_query_results` exists for code that needs direct access to an +`EntitySet`, especially when an indexed query can be represented by borrowing +an indexed source without constructing an intermediate vector. 
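
The scoped-access pattern behind `with_query_results` can be illustrated with a toy context. This is a sketch under assumptions, not Ixa's signature: the `Context` below, its `index` map keyed by a `u8` age, and the slice passed to the callback are all hypothetical stand-ins for a real `EntitySet` borrowed from an index bucket.

```rust
use std::collections::HashMap;

type EntityId = usize;

/// Toy context (hypothetical): an index from a property value to a bucket of ids.
struct Context {
    index: HashMap<u8, Vec<EntityId>>,
    visits: usize,
}

impl Context {
    /// Scoped access: the callback sees the bucket by reference, with no
    /// intermediate vector; the shared borrow ends when the callback returns.
    fn with_query_results<R>(&self, age: u8, callback: impl FnOnce(&[EntityId]) -> R) -> R {
        let bucket: &[EntityId] = self.index.get(&age).map(Vec::as_slice).unwrap_or(&[]);
        callback(bucket)
    }
}

fn demo() -> (usize, usize) {
    let mut context = Context {
        index: HashMap::from([(30, vec![0, 2]), (41, vec![1])]),
        visits: 0,
    };
    let count = context.with_query_results(30, |ids| ids.len());
    // The borrow of the index is over, so the context can be mutated again.
    context.visits += count;
    (count, context.visits)
}
```

The callback returns an owned value (`usize` here), so nothing borrowed from the index outlives the call.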
+ +The scoped callback matters because an `EntitySet` may hold immutable +references into `Context`, such as a reference to an index bucket. While that +set is live, the context cannot be mutably borrowed. `with_query_results` +contains that borrow inside the callback. + +The details of `EntitySet` and `EntitySetIterator` live in the +[EntitySet](entity_set.md) chapter. + +## Indexing and Multi-Properties + +Indexes are per-context. Enabling an index affects only that `Context`. + +There are two index modes: + +- `context.index_property::()` enables a full index. +- `context.index_property_counts::()` enables a value-count index unless + a full index already exists. + +Full indexes support both query result sets and counts. Value-count indexes +support counts but not entity-set lookup. + +Enabling an index catches it up to the current population. After that, indexes +are maintained during entity creation and property changes. + +`is_property_indexed` exists only under `#[cfg(test)]`. It is useful for tests, +but client code does not need to ask whether an index is already enabled. +Calling an indexing method more than once is not an error. + +### Multi-Properties + +A multi-property is a derived tuple property created with +`define_multi_property!`: + +```rust +define_multi_property!((Age, InfectionStatus), Person); +``` + +The main use is joint indexing: a query over `Age` and `InfectionStatus` can +use the shared multi-property index instead of intersecting separate component +sources. + +Equivalent multi-properties with reordered components share one `index_id`. +For example, `(Age, InfectionStatus)` and `(InfectionStatus, Age)` point to the +same underlying index. Component values are canonicalized into a stable order +before indexed lookup. + +Events remain type-specific through the normal `PropertyChangeEvent` +machinery. A multi-property is still a property type, even when its index is +shared with an equivalent ordering. 
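
One way to picture how reordered multi-properties can share an index is to canonicalize the component order before touching the index. The sketch below is an assumption-laden toy, not Ixa's mechanism: components are modeled as hypothetical `(name, value)` pairs and the stable order is simply sorting by property name.

```rust
use std::collections::HashMap;

/// Hypothetical component: (property name, encoded value).
type Component = (&'static str, u32);

/// Puts components into a stable order. Tuples sort by name first, and
/// names are distinct within one multi-property, so this orders by name.
fn canonical_value(mut components: Vec<Component>) -> Vec<Component> {
    components.sort();
    components
}

/// One index shared by every ordering of the same components.
#[derive(Default)]
struct SharedIndex {
    buckets: HashMap<Vec<Component>, Vec<usize>>,
}

impl SharedIndex {
    fn insert(&mut self, components: Vec<Component>, entity: usize) {
        self.buckets
            .entry(canonical_value(components))
            .or_default()
            .push(entity);
    }

    fn lookup(&self, components: Vec<Component>) -> &[usize] {
        let key = canonical_value(components);
        self.buckets.get(&key).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn demo() -> (Vec<usize>, usize) {
    let mut index = SharedIndex::default();
    // Two orderings of the same components land in the same bucket.
    index.insert(vec![("Age", 42), ("InfectionStatus", 1)], 7);
    index.insert(vec![("InfectionStatus", 1), ("Age", 42)], 9);
    let found = index.lookup(vec![("InfectionStatus", 1), ("Age", 42)]).to_vec();
    (found, index.buckets.len())
}
```

Both insertions and the lookup resolve to a single bucket, which is the behavior the shared `index_id` is meant to provide.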
+
+Multi-properties are not indexed automatically. The current behavior is
+explicit indexing:
+
+```rust
+context.index_property::();
+```
+
+The old auto-indexing question remains worth preserving. Multi-properties have
+uses besides joint indexing: they can serve as event types, derived tuple
+properties, and value-combination counting keys. Automatic indexing would impose
+memory and maintenance costs even when a multi-property is being used for one
+of those non-index purposes.
+
+## Events
+
+The entity subsystem emits two core event types:
+
+- `EntityCreatedEvent` after successful entity creation.
+- `PropertyChangeEvent` after property writes and derived dependent
+  updates.
+
+Property-change events are part of the `set_property` flow. A write to one
+non-derived property can emit events for that property and for derived
+properties whose values may have changed because of the write.
diff --git a/docs/internals/src/entity_set.md b/docs/internals/src/entity_set.md
new file mode 100644
index 00000000..037a13b0
--- /dev/null
+++ b/docs/internals/src/entity_set.md
@@ -0,0 +1,86 @@
+# EntitySet
+
+`EntitySet` and `EntitySetIterator` are the chosen public
+representations for entity query results. They are not just implementation
+details. Their purpose is to give callers a stable interface for working with
+query results while hiding how those results are represented internally.
+
+The query API itself is covered in the
+[Entity System Design Notes](entities-design-notes.md) chapter. This chapter
+focuses on the result types returned by that API.
+
+## Role in Queries
+
+The public query APIs accept either the entity marker type for a whole-population
+query or a `with!(Entity, ...)` value for property filters. Query results are
+then represented as one of two forms:
+
+- `EntitySet` when code needs a reusable set expression.
+- `EntitySetIterator` when code can stream matching entity IDs directly.
+
+`Context::query` returns an `EntitySet`.
+`Context::query_result_iterator`
+returns an `EntitySetIterator`. `Context::with_query_results` gives scoped
+access to an `EntitySet` through a callback.
+
+Other APIs, such as `query_entity_count` and sampling methods, use these
+representations as appropriate.
+
+## EntitySet
+
+`EntitySet` represents a set expression over entity IDs. The current
+implementation can represent:
+
+- a source set, such as the whole population or an index bucket;
+- intersections;
+- unions;
+- differences;
+- the empty set.
+
+This matters because an indexed query can often be represented by borrowing an
+existing index bucket instead of constructing a new vector of entity IDs. For
+unindexed queries, an `EntitySet` can represent the query as a composition of
+sources and filters.
+
+## Scoped Access and Borrowing
+
+An `EntitySet` may borrow from the `Context`. For example, an indexed query
+can represent its result by holding an immutable reference to an index bucket.
+That is the point of the abstraction: callers can work with query results
+without knowing whether those results are backed by an index, a population
+range, or a composed set expression.
+
+The consequence is that the `Context` cannot be mutably borrowed while such an
+`EntitySet` is live. This is ordinary Rust borrowing behavior, but it matters
+for API design because many model operations mutate the context.
+
+`Context::with_query_results` exists to provide scoped access to an
+`EntitySet`. The callback receives the set, uses it, and then the borrow ends
+when the callback returns. This makes the borrowing boundary explicit and keeps
+later context mutation straightforward.
+
+## EntitySetIterator
+
+`EntitySetIterator` is the streaming form. It yields `EntityId` values and
+has optimized paths for common cases such as whole-population iteration and
+indexed source iteration.
+
+Query code sometimes constructs an iterator directly instead of first building
+an `EntitySet`.
+This is a performance choice: in tight loops, avoiding an
+intermediate set expression can reduce overhead.
+
+## Whole-Population and Empty Queries
+
+Whole-population queries have special paths. Passing the entity marker type,
+such as `Person`, means "all entities of this type." Internally, this can use a
+`PopulationIterator` over the entity IDs from `0..entity_count`.
+
+Empty result sets also have explicit representations so query code can avoid
+unnecessary iteration when a lookup proves there are no matching entities.
+
+## Relationship to Indexes
+
+Full indexes can provide the entity IDs for a property value directly, so they
+can back both `EntitySet` and `EntitySetIterator` results.
+
+Value-count indexes only store counts. They can speed up `query_entity_count`,
+but they cannot provide entity IDs for `EntitySet` or `EntitySetIterator`.
diff --git a/mise.toml b/mise.toml
index 608dc37f..1e369705 100644
--- a/mise.toml
+++ b/mise.toml
@@ -49,6 +49,10 @@ run = '''
 mdbook build docs/book -d ../../website/book
 '''
 
+[tasks.'docs:internals']
+description = "Build the internal documentation book"
+run = "mdbook build docs/internals"
+
 [tasks.test-all]
 description = "Run all tests"
 depends = ["test:unit", "test:wasm"]
@@ -185,6 +189,7 @@ run = '''
     cargo clean
     rm -rf pkg integration-tests/ixa-wasm-tests/{pkg,node_modules}
     rm -rf website/{doc,debug,book}
+    rm -rf docs/internals/book
     rm -f website/{.rustc_info.json,.rustdoc_fingerprint.json}
     rm -rf examples/**/output/*.csv
 '''