Skip to content

Make task-hiffy less iffy#2475

Merged
mkeeter merged 6 commits intomasterfrom
mkeeter/hiffy-soundness
Apr 21, 2026
Merged

Make task-hiffy less iffy#2475
mkeeter merged 6 commits intomasterfrom
mkeeter/hiffy-soundness

Conversation

@mkeeter
Copy link
Copy Markdown
Collaborator

@mkeeter mkeeter commented Apr 10, 2026

task-hiffy has bad vibes: it uses static mut arrays, and accesses them without an obvious SAFETY comment.

The task is necessarily in a weird no man's land, because the debugger writes to those arrays to load HIF text and data; how can Rust know that they have changed? A fully robust option would be to use read_volatile everywhere; unfortunately, downstream libraries (hif and eventually postcard) expect Rust slices, and we don't want to hack all of them (or double our memory footprint by copying data around).

This PR is an attempt to improve the vibes in three ways:

  • Instead of accessing the globals directly, convert them to a pointer and then back into a slice with a bounded lifetime. Previously, we were passing slices with 'static lifetimes into HIF, which could theoretically stash them
  • Add an assembly block which may invalidate any memory, so Rust must assume the arrays may have changed
  • Add a SAFETY comment explaining the above

@mkeeter mkeeter requested review from cbiffle, hawkw and labbott April 10, 2026 20:30
@mkeeter mkeeter force-pushed the mkeeter/hiffy-soundness branch from 99930c3 to f792078 Compare April 10, 2026 20:35
@jamesmunns
Copy link
Copy Markdown
Contributor

jamesmunns commented Apr 13, 2026

I'm not personally entirely convinced that inline-asm is useful here. I'll give my take, and then maybe propose what I would like to see. For reference, I had a very similiar conversation with the opsem team for expressing "memory persistent across reboots" (e.g. rtc-mem, or just non-initialized sections with a warm reboot).

IMO, the main point here is that we are doing something spooky:

  1. We have a block of memory, INSIDE the abstract machine (e.g. it is a variable known to the compiler as a static)
  2. We have somewhat arbitrated access:
  3. The contents are "garbage" until we receive a request to do something. This is arbitrated by the HIFFY_KICK variable being non-zero
  4. When the variable IS zero, the contents are garbage
  5. When the variable IS NOT zero, we assume it will only change once will not change as long as it is non-zero.
  6. We therefore can model the contents of HIFFY_TEXT, HIFFY_DATA, and HIFFY_RSTACK as being contained within a mutex. When KICK is zero, it is locked by "someone else", and we cannot read or write the contents. When KICK is nonzero, the hiffy task has exclusive (ish, others might peek, but not in a way that is meaningful for us) ownership of these regions
  7. One ambiguity I'm not sure about, technically we've made things a bit more complicated by immediately subtracting kick, so actually the mutex area becomes "released" when we either increment HIFFY_REQUESTS or HIFFY_ERRORS?
  8. I think this PR's addition of non-'static lifetime bounds to the slice is probably a net-good: it prevents downstream callers from storing-off the static reference, stashing it somewhere.

In general to avoid the optimizer ruining our day, and to make it "aware" of spooky action at a distance, we need to make the optimizer realize that things can change outside of typical AM interactions. There are two ways to do this with a static:

  • Make it a non-mut static UnsafeCell. UnsafeCell's contract says that while you are not holding a reference derived from a call to .get(), the contents are allowed to change in any spooky way, but while you are holding that reference, it MUST NOT change in spooky ways, e.g. you must ONLY interact with that data through the reference you created. One papercut is that UnsafeCell is !Sync so you can't put it in a non-mut static, but that's why I wrote the grounded crate, which does the unsafe impl for you.
  • Make it a mut static, AND ensure that is defined that the symbol of the static is "exported", e.g. "linker visible", therefore the compiler/optimizer must assume that usage of this region can "escape" the visibility here, e.g. external code has a chance to poke it. The three main ways to achieve this are:
    • give it an explicit link_section name
    • make the static pub
    • make the static no_mangle

Re: this comment:

A fully robust option would be to use read_volatile everywhere

I somewhat disagree: I think this would likely work, but in general volatile is not intended for objects that are "inbounds" of the Rust Abstract Machine: we've given the static a name and a place, which means it is not just some MMIO or external linker section region outside the knowledge of the compiler. Even the linux kernel agrees volatile is not to be used for synchronization.

The "right" way to do this is generally with fences and ordering guarantees, and firming up our contract of what it means for the region to be "ours", and "not ours". This means likely continuing to use the atomics as we do (AtomicU32s are just UnsafeCell<u32>s!), and better define what our synchronization rules are, then treat it like a more boring manual mutex implementation. Mara's book is an EXTREMELY good reference for exactly this kind of work.

My general suggestion for what we should be doing (not very far from what we do now):

  1. Declare the regions in a way that makes them escaped, e.g. UnsafeCell/static mut+.
  2. Optional: Collapse many of the atomic vars, AT LEAST any required for coordination, like READY/KICK, into a single atomic var that uses CAS operations (or critical sections) to guarantee atomicity
  3. Determine when the region has data and can be "locked"
  4. Do the locking, with appropriate "acquire" fence semantics, and create the scope-bounded mut refs necessary
  5. Update any outputs, then drop the references, and mark the operation as complete with a "release" fence, ensuring that all writes are not reordered across the release.

There are also a couple things I don't love to see with atomics here:

  1. The use of SeqCst everywhere, see the section "Myth: Sequentially consistent memory ordering is a great default and is always correct." in the link to Mara's book above
  2. There are many separate atomic vars for the "coordination", e.g. both HIFFY_KICK and HIFFY_REQUESTS/HIFFY_ERRORS (or maybe HIFFY_READY?)
  3. There are some non-CAS operations being done outside of a critical section on one variable, e.g. a separated HIFFY_KICK.load and HIFFY_KICK.fetch_sub, rather than a CAS loop that decrements atomically if non-zero.
  4. Some uses of fetch_add and fetch_sub without checking the output at all, and in cases where store(1) or store(0) would be more appropriate. There's no actual guard against under/overflow here: atomic ops are always defined to wrap, which means we could underflow or get out of the "expected" band pretty easily.

I don't know how easy it would be to change some of the coordination rules, and update Humility at the same time. But I think just writing down EXACTLY what the rules are for mutual exclusion, we could nudge hiffy into a much sounder place, and then maybe consider if there are potential defects in how we do coordination with multiple atomic variables.

@mkeeter
Copy link
Copy Markdown
Collaborator Author

mkeeter commented Apr 13, 2026

Very helpful thoughts, thanks @jamesmunns !

My goal for this PR is to make changes which don't require a Humility bump, which limits us somewhat (can't change types of statics, can't mess with the atomic dance).

Here's a quick, somewhat-simplified diagram of how the pieces interact:

Humility

flowchart LR
    Idle --READY == 1--> Kicked
    Kicked[Write KICK = 1]
    Kicked --REQUEST or ERRORS changed--> Done
    Done[Read results]
    Done --> Idle
Loading

Hubris

flowchart LR
    Startup --> Ready
    Ready[Set READY = 1]
    Ready --Delay-->Unready
    Unready[Set READY = 0]
    Unready --KICK == 1--> Running[Clear KICK\nRun script]
    Unready --KICK == 0-->Ready
    Running --Success--> Success[Increment REQUESTS]
    Running --Failure--> Fail[Increment ERRORS]
    Success --> Ready
    Fail --> Ready
Loading

Data is owned by Hubris when HIFFY_KICK == 0 and HIFFY_REQUESTS/ERRORS are unchanged. Hubris increments the request/error count when finished, so Humility has loaded the previous value and is checking for a changed value (again, kinda weird design, but let's not break compatibility now).


Follow-up changes based on feedback:

  • Use HIFFY_KICK.compare_exchange_weak(1, 0, Ordering::Acquire, Ordering::Relaxed).is_ok() to check whether Humility has kicked us, instead of separate load + subtract operations. In practice, Humility won't be adversarial, but let's do the right thing.
  • Use Ordering::Release when updating HIFFY_REQUESTS/ERRORS
    • Move the HIFFY_FAILURE write to be before the write to HIFFY_ERRORS; this was wrong beforehand!
  • Make HIFFY_READY writes Relaxed: it's a single atomic variable (so the writes are still ordered w.r.t. each other), and it isn't actually used as a memory barrier (just as a flag to Humility)
  • Add #[used] annotations to the shared arrays (TEXT / DATA / RSTACK)
    • Do you have strong opinions on #[no_mangle] vs #[used] for making the symbol exported? It looks like they both export the symbol, Humility hasn't had any issues finding the variables by name, and #[used] doesn't require unsafe 😅

@mkeeter
Copy link
Copy Markdown
Collaborator Author

mkeeter commented Apr 13, 2026

Uh oh, some of our CPUs don't actually support compare_exchange 🙃

I'm going to polyfill it in AtomicU32Ext (only for the AcqRel, Acquire ordering, because I don't want to copy this whole table over). This is a little sketchy, but no worse than what we're doing elsewhere in Hiffy, and properly conveys our intent.

@jamesmunns
Copy link
Copy Markdown
Contributor

@mkeeter imo, at least in the embedded rust world at large, there are two main ways to abstract over "atomic or not":

  • Use the portable-atomic crate, ideally with the critical-section backend, to take critical sections to do CAS ops on non-CAS targets. This works best when there is a SINGLE atomic item, like an atomicbool/atomicu32 that controls things.
  • Be generic over a coordination trait, like I do in bbqueue, that exposes what actions should be done "atomically", and have a CAS and Critical Section impl for each, gated by cpu capabilities.

I actually don't know if we allow tasks to take "real" critical sections, and also I don't know how well this model works with something like a debugger which is... somewhat different than single/multi core modeling. If we don't let tasks disable atomics, there's always kernel user helpers like what linux offers for old archs.

Do you have strong opinions on #[no_mangle] vs #[used] for making the symbol exported?

I'm not certain #[used] lifts the symbol to public. We should at least make it pub. IMO what we're doing is unsafe, we shouldn't necessarily steer away from #[unsafe(no_mangle)]. I'd have to check with the opsem folks about which attrs are guaranteed to "lift" a static's visibility.

Re: reviewing atomic ordering, I don't have my best thinking cap on right now, I'll see if my head is a little less cloudy tomorrow to see if all of your assertions wrt ordering make sense to me. I would also deeply trust @hawkw's opinions on the matter. If READY is part of the synchronization between the debugger and task (we can model the debugger's interactions as if it was a separate thread, running on the "debug" core), I'm a little sus at making it Relaxed.

@hawkw
Copy link
Copy Markdown
Member

hawkw commented Apr 13, 2026

I don't really think we should be attempting to "abstract over atomic or not" here. Typically, it is only safe to allow something that must be atomic on multi-threaded platforms to be non-atomic on other platforms because the platforms where it is not atomic are inherently always single-threaded. That doesn't really apply here: all our targets are single-threaded, but in this code, there is always the possibility that we will be interrupted at any point by:

  • an interrupt context switching us into the kernel, or,
  • (perhaps more importantly), the debug probe stopping the world

I don't think it makes sense to think of the debugger as "another thread", exactly, since when the debugger is operating on us, we are halted. This makes it somewhat more similar to an interrupt than a concurrent thread of execution: it may halt us at any time, potentially in the midst of a non-atomic, multi-instruction operation, but we are not concurrently mutating or reading things when it has halted us.

Given that, I don't think we can safely rely on compare-and-swap operations in hiffy if they may not be atomic on some targets. We cannot, AFAIK, rely on a crate like critical-section as @bitshiftmask suggests, because critical-section will only prevent hardware interrupts by temporarily disabling them for the duration of the critical section. But, disabling interrupts does not prevent a SWD probe from halting us (and I dont immediately know of a way we can refer a SWD halt, though perhaps there's some way to do so...). Instead, we might need to consider something like a sequence lock, which does not rely on critical-section, on systems which don't have CAS operations.

I haven't had a chance to actually review the code here yet, but I intend to; this is mostly a response to James' last comment.

@mkeeter
Copy link
Copy Markdown
Collaborator Author

mkeeter commented Apr 13, 2026

I'm not certain #[used] lifts the symbol to public. We should at least make it pub. IMO what we're doing is unsafe, we shouldn't necessarily steer away from #[unsafe(no_mangle)]. I'd have to check with the opsem folks about which attrs are guaranteed to "lift" a static's visibility.

This was my interpretation of the Rust reference, which says the following about #[used]:

The used attribute can only be applied to static items. This attribute forces the compiler to keep the variable in the output object file (.o, .rlib, etc. excluding final binaries) even if the variable is not used, or referenced, by any other item in the crate. However, the linker is still free to remove such an item.

#[no_mangle] says the following:

The no_mangle attribute may be used on any item to disable standard symbol name mangling. The symbol for the item will be the identifier of the item’s name.

Additionally, the item will be publicly exported from the produced library or object file, similar to the used attribute.

My reading of this is that #[used] does the part which we care about (public export), since #[no_mangle] defines this behavior in terms of #[used] (although it uses the phrase "similar to", which is annoyingly imprecise).


For the atomics wrangling, the module-level docs for armv6m-atomic-hack provide some additional context:

//! Provides fake atomic read-modify-write operations for situations where you
//! _really_ know what you're doing.
//!
//! Pulling this trait in can cause code written for ARMv7-M and later machines,
//! which have atomic read-modify-write operations, to compile on ARMv6-M. This
//! is, in general, not safe: the program wanted an atomic read-modify-write and
//! you're faking it with a non-atomic sequence. However, in our _specific_ case
//! on Hubris, we can do this safely because
//!
//! 1. Tasks are isolated.
//! 2. Tasks are single-threaded.
//! 3. ISRs do not access memory shared with tasks.
//!
//! If any of those three points is wrong in your case, do not use these, it
//! will go badly for you.
//!
//! Everything in this crate is conditional on the `armv6m` config, which means
//! (1) it will only work inside the Hubris build system and (2) accidentally
//! including it on armv7m or later won't pull in the bogus implementations.
//!
//! # Why not just disable interrupts
//!
//! Because (1) we can't from unprivileged mode and (2) we don't out of
//! principle anyway.

It's also worth noting that in our existing hiffy, the fetch_add and fetch_sub operations are already implemented using this polyfill.

Comment thread task/hiffy/src/main.rs
Comment on lines +161 to +163
HIFFY_READY.store(1, Ordering::Relaxed);
hl::sleep_for(sleep_ms);
HIFFY_READY.fetch_sub(1, Ordering::SeqCst);
HIFFY_READY.store(0, Ordering::Relaxed);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that these needn't be SeqCst. However, I think we may actually be concerned about the second store to HIFFY_READY being reordered around the compare_exchange, which may be multiple instructions, to HIFFY_KICK.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten the compare_exchange as an explicit load/store pair, so I think we're now good on reordering, even with these marked as Relaxed (the load comes first and uses Ordering::Acquire).

Comment thread task/hiffy/src/main.rs Outdated

if HIFFY_KICK.load(Ordering::SeqCst) == 0 {
// Humility writes `1` to `HIFFY_KICK`
if HIFFY_KICK.compare_exchange_acqrel(1, 0).is_ok() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unconvinced by making this a CAS especially given that it may not actually be a CAS on some supported hardware. I think that in the previous code, and in the case where we are on a non-CAS MCU, the debugger could observe the state where we have been kicked, but have not yet decremented HIFFY_KICK, but...I don't actually think that matters, since it's not going to kick us again, right? And, the tradeoff is that the compare_exchange makes it less obvious that this may or may not be possible by tricking the reader into believing it's actually a LDREX/STREX...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way – the polyfill expresses intent more clearly, but explicit load-stores are less misleading. However, if we don't use the polyfill for CAS, we should also stop using it for fetch_add elsewhere in this code. Deal?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced the polyfilled compare-and-swap with an explicit load/store pair, and done the same for the fetch_add operations below!

Comment thread task/hiffy/src/main.rs Outdated
Comment on lines +201 to +204
// Use an inline assembly instruction without `nomem`, so the
// compiler must assume that any memory can be invalidated (in
// this case, by the debugger).
core::arch::asm!("", options(nostack, preserves_flags));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I trust you about this...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little hand-wavey, but does seem to accomplish something:
https://godbolt.org/z/odEExhxYz

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is a roundabout way to achieve what fence/compiler_fence is designed for: https://godbolt.org/z/o9x1G47fd

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm now coming around to the idea that the inline assembly is not necessary – but that a compiler fence also isn't necessary, because of the Acquire load of HIFFY_KICK above!

We know that Humility only writes HIFFY_KICK after writing values to the data arrays in RAM. Doing an Acquire load of an atomic both forces the value to be loaded from RAM and adds a dmb, so it's already accomplishing the goals of compiler_fence / fence: https://godbolt.org/z/6ba7WhTP6

@jamesmunns
Copy link
Copy Markdown
Contributor

@mkeeter:

My reading of this is that #[used] does the part which we care about (public export), since #[no_mangle] defines this behavior in terms of #[used] (although it uses the phrase "similar to", which is annoyingly imprecise).

I'm less certain (but we should probably go ask T-Opsem), what I'm worried about is:

  • used ensures the item is not optimized away, and therefore is visible to the program and debugger, BUT
  • I am not sure if it makes the compiler aware that other software components can poke at it. I think we need the compiler to be convinced that other software components have a chance to peek and poke at the data, NOT just the debugger, as I'm not sure if the compiler really considers the role of a debugger in that aspect

I'm going off of #t-opsem > Persistent memory across reboots @ 💬, from Amanieu:

Screenshot 2026-04-14 at 3 19 47 PM

used was not explicitly mentioned there, so I'm less sure.

@mkeeter mkeeter force-pushed the mkeeter/hiffy-soundness branch from 8139560 to 958ce5d Compare April 14, 2026 15:22
@mkeeter mkeeter changed the title Improve the vibes of task-hiffy Make task-hiffy less iffy Apr 14, 2026
@jamesmunns
Copy link
Copy Markdown
Contributor

jamesmunns commented Apr 20, 2026

@mkeeter drafted some diagrams for you (let me know if you have notes/want the monodraw file):

┌─────────────────────────────────────────────────────────────────────────────────┐
│                                                                                 │
│                                ┌────────────────┐                ┌────────┐     │
│          ┌────────┐ READY == 1 │ Read REQUEST   │   REQUEST or   │ Read   │     │
│        ┌▷│  Idle  │───────────▷│ Read ERRORS    │───────────────▷│ RESULT │     │
│        │ └────────┘            │ Write KICK = 1 │ ERRORS changed │        │     │
│        │                       └────────────────┘                └────────┘     │
│        │                                                              │         │
│        └──────────────────────────────────────────────────────────────┘         │
│ ┌──────────┐                                                                    │
└─┤ Humility ├────────────────────────────────────────────────────────────────────┘
  └──────────┘                                                                     
                                                                                   
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                                                                 │
│                                        KICK == 0                                │
│                    ┌────────────────────────────────────────────────┐           │
│                    │                                                │           │
│                    │                                       ┌─────────────────┐  │
│  ┌─────────┐       │  ┌─────────────────┐     ┌───────┐    │ Write READY = 0 │  │
│  │ Startup │──┬────┴─▷│ Write READY = 1 │────▷│ Sleep │───▷│ Read KICK       │  │
│  └─────────┘  │       └─────────────────┘     └───────┘    │                 │  │
│               │                                            └─────────────────┘  │
│               │                                                     │           │
│               │  ┌───────────────────────────────┐                  ▽           │
│               │  │ Read REQUESTS                 │ Success ┌─────────────────┐  │
│               ├──│ Write REQUESTS = REQUESTS + 1 │◁────┐   │ Write KICK = 0  │  │
│               │  └───────────────────────────────┘     ├───│ Execute script  │  │
│               │  ┌───────────────────────────────┐     │   │                 │  │
│               │  │ Read ERRORS                   │     │   └─────────────────┘  │
│               └──│ Write ERRORS = ERRORS + 1     │◁────┘                        │
│                  └───────────────────────────────┘ Failure                      │
│ ┌────────────┐                                                                  │
└─┤ Hiffy Task ├──────────────────────────────────────────────────────────────────┘
  └────────────┘                                                                                                                                                      

Copy link
Copy Markdown
Contributor

@jamesmunns jamesmunns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall happy with this, modulo the addition of the state diagrams, and the one main comment below.

Comment thread task/hiffy/src/main.rs Outdated
// - [`HIFFY_READY`] => Variable that will be non-zero iff the HIF
// execution engine is waiting to be kicked
//
#[used]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkeeter somewhat insisting on no_mangle! I think pub + used might be enough, but I don't think there is any particular downside to picking pub + no_mangle instead, so let's be as "loud" as possible to the compiler about it. Also wrote up some justification/explanation here that I think is worth having for future reader's context.

The pub+no_mangle should apply to all three of these, as well as the four tracking variables and HIFFY_FAILURE.

Suggested change
#[used]
// We are making the following items "no mangle" and "pub" to hint to the compiler
// that they are "exposed", and may be written (spookily) outside the scope
// of Rust itself. The aim is to prevent the optimizer from *assuming* the
// contents of these buffers will remain unchanged between accesses, as they
// will be written directly by the debugger.
//
// Below, we use atomic ordering (e.g. Acquire and Release) to inhibit
// compile- and run-time re-ordering around the explicit sequencing performed
// by the HIFFY_READY, HIFFY_KICK, HIFFY_REQUESTS, and HIFFY_ERRORS that are
// used to arbitrate shared access between the debugger and this software task.
#[unsafe(no_mangle)]

Comment thread task/hiffy/src/main.rs
Comment thread task/hiffy/src/main.rs
@jamesmunns
Copy link
Copy Markdown
Contributor

Oh, and overall: I think this change is just spiffy.

@mkeeter
Copy link
Copy Markdown
Collaborator Author

mkeeter commented Apr 20, 2026

@jamesmunns I've added the diagrams, no_mangle block comment, and switched from #[used] to #[unsafe(no_mangle)] for the relevant variables (and made them pub).

Interestingly, I need to keep #[used] on the HIFFY_VERSION_* variables for them to be present in the output binary; they're not touched by Hubris code, so that's presumably fine.

I also switched the HIFFY_FUNCTIONS to #[unsafe(no_mangle)]; it was previously a mix of #[unsafe(no_mangle)] and #[used(compiler)]. These aren't used in our code, but must be present so that Humility can extract types from the DWARF info 🫠

@mkeeter mkeeter force-pushed the mkeeter/hiffy-soundness branch from 57d702e to 0d251d9 Compare April 20, 2026 19:19
@mkeeter mkeeter force-pushed the mkeeter/hiffy-soundness branch from 0d251d9 to 0604f3d Compare April 21, 2026 19:45
@mkeeter mkeeter merged commit 2d3a962 into master Apr 21, 2026
183 checks passed
@mkeeter mkeeter deleted the mkeeter/hiffy-soundness branch April 21, 2026 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants