*(force-pushed 99930c3 to f792078)*
I'm not personally entirely convinced that inline-asm is useful here. I'll give my take, and then maybe propose what I would like to see. For reference, I had a very similar conversation with the opsem team for expressing "memory persistent across reboots" (e.g. rtc-mem, or just non-initialized sections with a warm reboot). IMO, the main point here is that we are doing something spooky:
In general to avoid the optimizer ruining our day, and to make it "aware" of spooky action at a distance, we need to make the optimizer realize that things can change outside of typical AM interactions. There are two ways to do this with a static:
Re: this comment:
I somewhat disagree: I think this would likely work, but in general volatile is not intended for objects that are "inbounds" of the Rust Abstract Machine: we've given the static a name and a place, which means it is not just some MMIO or external linker-section region outside the knowledge of the compiler. Even the Linux kernel agrees volatile is not to be used for synchronization. The "right" way to do this is generally with fences and ordering guarantees, and firming up our contract of what it means for the region to be "ours" and "not ours". This means likely continuing to use the atomics as we do (`AtomicU32`s are just […]). My general suggestion for what we should be doing (not very far from what we do now):
There are also a couple of things I don't love to see with the atomics here:
I don't know how easy it would be to change some of the coordination rules and update Humility at the same time. But I think that by writing down EXACTLY what the rules are for mutual exclusion, we could nudge hiffy into a much sounder place, and then maybe consider whether there are potential defects in how we do coordination with multiple atomic variables.
Very helpful thoughts, thanks @jamesmunns ! My goal for this PR is to make changes which don't require a Humility bump, which limits us somewhat (can't change the types of statics, can't mess with the atomic dance). Here's a quick, somewhat-simplified diagram of how the pieces interact:

Humility:
```mermaid
flowchart LR
    Idle --READY == 1--> Kicked
    Kicked[Write KICK = 1]
    Kicked --REQUESTS or ERRORS changed--> Done
    Done[Read results]
    Done --> Idle
```
Hubris:
```mermaid
flowchart LR
    Startup --> Ready
    Ready[Set READY = 1]
    Ready --Delay--> Unready
    Unready[Set READY = 0]
    Unready --KICK == 1--> Running[Clear KICK\nRun script]
    Unready --KICK == 0--> Ready
    Running --Success--> Success[Increment REQUESTS]
    Running --Failure--> Fail[Increment ERRORS]
    Success --> Ready
    Fail --> Ready
```
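The Hubris-side loop in the diagram above can be sketched roughly as follows. This is a host-runnable approximation, not the actual task code: the real task sleeps between the READY stores and runs a HIF script where `run_script` is called here, and the exact orderings are my reading of the discussion below rather than the PR's final code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Names mirror the statics in the diagrams; the real ones live in task-hiffy.
static HIFFY_READY: AtomicU32 = AtomicU32::new(0);
static HIFFY_KICK: AtomicU32 = AtomicU32::new(0);
static HIFFY_REQUESTS: AtomicU32 = AtomicU32::new(0);
static HIFFY_ERRORS: AtomicU32 = AtomicU32::new(0);

/// One trip around the Hubris state machine.
fn poll_once(run_script: impl FnOnce() -> Result<(), ()>) {
    // Ready: advertise that we may be kicked (the real task sleeps here).
    HIFFY_READY.store(1, Ordering::Relaxed);
    // Unready: stop advertising before checking for a kick.
    HIFFY_READY.store(0, Ordering::Relaxed);

    // Acquire: if the debugger wrote KICK = 1, its earlier writes to the
    // data arrays are visible to everything after this load.
    if HIFFY_KICK.load(Ordering::Acquire) == 1 {
        HIFFY_KICK.store(0, Ordering::Relaxed);
        let counter = match run_script() {
            Ok(()) => &HIFFY_REQUESTS,
            Err(()) => &HIFFY_ERRORS,
        };
        // Release: publish results before the counter changes, since the
        // debugger treats a counter change as "results are ready".
        counter.store(counter.load(Ordering::Relaxed) + 1, Ordering::Release);
    }
}
```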
Data is owned by Hubris when […].

Follow-up changes based on feedback:
Uh oh, some of our CPUs don't actually support atomic compare-and-swap. I'm going to polyfill it in `armv6m-atomic-hack`.
@mkeeter imo, at least in the embedded Rust world at large, there are two main ways to abstract over "atomic or not":
I actually don't know if we allow tasks to take "real" critical sections, and I also don't know how well this model works with something like a debugger, which is... somewhat different from single/multi-core modeling. If we don't let tasks disable interrupts, there are always kernel user helpers like what Linux offers for old archs.
I'm not certain […]. Re: reviewing atomic ordering, I don't have my best thinking cap on right now; I'll see if my head is a little less cloudy tomorrow and whether all of your assertions wrt ordering make sense to me. I would also deeply trust @hawkw's opinions on the matter. If READY is part of the synchronization between the debugger and task (we can model the debugger's interactions as if it were a separate thread, running on the "debug" core), I'm a little sus at making it Relaxed.
I don't really think we should be attempting to "abstract over atomic or not" here. Typically, it is only safe to allow something that must be atomic on multi-threaded platforms to be non-atomic on other platforms because the platforms where it is not atomic are inherently always single-threaded. That doesn't really apply here: all our targets are single-threaded, but in this code, there is always the possibility that we will be interrupted at any point by:
I don't think it makes sense to think of the debugger as "another thread", exactly, since when the debugger is operating on us, we are halted. This makes it somewhat more similar to an interrupt than a concurrent thread of execution: it may halt us at any time, potentially in the midst of a non-atomic, multi-instruction operation, but we are not concurrently mutating or reading things when it has halted us. Given that, I don't think we can safely rely on compare-and-swap operations in […].

I haven't had a chance to actually review the code here yet, but I intend to; this is mostly a response to James' last comment.
This was my interpretation of the Rust reference, which says the following about […]:
My reading of this is that […]. For the atomics wrangling, see the module-level docs for hubris/lib/armv6m-atomic-hack/src/lib.rs (lines 5 to 28 at 2d57a80). It's also worth noting that in our existing hiffy, the […]
```diff
  HIFFY_READY.store(1, Ordering::Relaxed);
  hl::sleep_for(sleep_ms);
- HIFFY_READY.fetch_sub(1, Ordering::SeqCst);
+ HIFFY_READY.store(0, Ordering::Relaxed);
```
I agree that these needn't be `SeqCst`. However, I think we may actually be concerned about the second store to `HIFFY_READY` being reordered around the `compare_exchange` on `HIFFY_KICK`, which may be multiple instructions.
I've rewritten the `compare_exchange` as an explicit load/store pair, so I think we're now good on reordering, even with these marked as `Relaxed` (the load comes first and uses `Ordering::Acquire`).
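Concretely, that load/store pair might look something like the sketch below. This is my reconstruction from the comments rather than the PR's exact code; the `take_kick` helper name is hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static HIFFY_KICK: AtomicU32 = AtomicU32::new(0);

/// Returns true if the debugger kicked us. The Acquire load orders all of
/// the debugger's prior writes (the HIF text/data arrays) before any of our
/// subsequent reads; the store back to 0 needs no ordering because this
/// task is the only writer that clears KICK.
fn take_kick() -> bool {
    if HIFFY_KICK.load(Ordering::Acquire) != 0 {
        HIFFY_KICK.store(0, Ordering::Relaxed);
        true
    } else {
        false
    }
}
```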
```diff
- if HIFFY_KICK.load(Ordering::SeqCst) == 0 {
+ // Humility writes `1` to `HIFFY_KICK`
+ if HIFFY_KICK.compare_exchange_acqrel(1, 0).is_ok() {
```
I'm unconvinced by making this a CAS, especially given that it may not actually be a CAS on some supported hardware. I think that in the previous code, and in the case where we are on a non-CAS MCU, the debugger could observe the state where we have been kicked but have not yet decremented `HIFFY_KICK`, but... I don't actually think that matters, since it's not going to kick us again, right? And the tradeoff is that the `compare_exchange` makes it less obvious that this may or may not be possible, by tricking the reader into believing it's actually an `LDREX`/`STREX`...
I'm fine either way – the polyfill expresses intent more clearly, but explicit load/stores are less misleading. However, if we don't use the polyfill for CAS, we should also stop using it for `fetch_add` elsewhere in this code. Deal?
I've replaced the polyfilled compare-and-swap with an explicit load/store pair, and done the same for the fetch_add operations below!
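A `fetch_add` replaced by a load/store pair might look like this sketch (the helper name is hypothetical). Note the important caveat in the comment: this is only sound under the single-writer rule the thread has been discussing, since a plain load/store increment is not atomic.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static HIFFY_REQUESTS: AtomicU32 = AtomicU32::new(0);

/// Increment without fetch_add. This is only correct because this task is
/// the sole writer of the counter; the debugger merely reads it. The
/// Release store makes the script's results visible in RAM before the
/// counter change that signals "done" to the debugger.
fn bump_requests() {
    let n = HIFFY_REQUESTS.load(Ordering::Relaxed);
    HIFFY_REQUESTS.store(n.wrapping_add(1), Ordering::Release);
}
```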
```rust
// Use an inline assembly instruction without `nomem`, so the
// compiler must assume that any memory can be invalidated (in
// this case, by the debugger).
core::arch::asm!("", options(nostack, preserves_flags));
```
Hmm, I think I trust you about this...
It's a little hand-wavey, but does seem to accomplish something:
https://godbolt.org/z/odEExhxYz
I still think this is a roundabout way to achieve what `fence`/`compiler_fence` is designed for: https://godbolt.org/z/o9x1G47fd
Okay, I'm now coming around to the idea that the inline assembly is not necessary – but that a compiler fence also isn't necessary, because of the `Acquire` load of `HIFFY_KICK` above!
We know that Humility only writes `HIFFY_KICK` after writing values to the data arrays in RAM. Doing an `Acquire` load of an atomic both forces the value to be loaded from RAM and adds a `dmb`, so it's already accomplishing the goals of `compiler_fence` / `fence`: https://godbolt.org/z/6ba7WhTP6
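The Release/Acquire pairing relied on here is the standard message-passing pattern. A minimal host-side demonstration, with std threads standing in for Humility and the Hubris task (the data arrays are modeled as a single atomic for simplicity; the names are illustrative only):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

static DATA: AtomicU32 = AtomicU32::new(0); // stands in for the HIF data arrays
static KICK: AtomicU32 = AtomicU32::new(0);

fn demo() -> u32 {
    let writer = thread::spawn(|| {
        // "Humility": write the payload first, then publish with Release.
        DATA.store(42, Ordering::Relaxed);
        KICK.store(1, Ordering::Release);
    });
    // "Hubris": once the Acquire load observes KICK = 1, the payload
    // written before the Release store is guaranteed to be visible.
    while KICK.load(Ordering::Acquire) == 0 {
        std::hint::spin_loop();
    }
    let value = DATA.load(Ordering::Relaxed);
    writer.join().unwrap();
    value
}
```

Release/Acquire guarantees `demo` observes the payload; no fence or inline asm is needed beyond the atomic pair itself.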
I'm less certain about […] (but we should probably go ask T-Opsem); what I'm worried about is:
I'm going off of the Zulip thread "#t-opsem > Persistent memory across reboots", from Amanieu:
*(force-pushed 8139560 to 958ce5d)*
@mkeeter drafted some diagrams for you (let me know if you have notes/want the Monodraw file):
jamesmunns left a comment
Overall happy with this, modulo the addition of the state diagrams, and the one main comment below.
```rust
// - [`HIFFY_READY`] => Variable that will be non-zero iff the HIF
//   execution engine is waiting to be kicked
//
#[used]
```
@mkeeter somewhat insisting on `no_mangle`! I think `pub` + `used` might be enough, but I don't think there is any particular downside to picking `pub` + `no_mangle` instead, so let's be as "loud" as possible to the compiler about it. I also wrote up some justification/explanation here that I think is worth having for future readers' context.
The `pub` + `no_mangle` should apply to all three of these, as well as the four tracking variables and `HIFFY_FAILURE`.
```diff
- #[used]
+ // We are making the following items "no mangle" and "pub" to hint to the compiler
+ // that they are "exposed", and may be written (spookily) outside the scope
+ // of Rust itself. The aim is to prevent the optimizer from *assuming* the
+ // contents of these buffers will remain unchanged between accesses, as they
+ // will be written directly by the debugger.
+ //
+ // Below, we use atomic ordering (e.g. Acquire and Release) to inhibit
+ // compile- and run-time re-ordering around the explicit sequencing performed
+ // by the HIFFY_READY, HIFFY_KICK, HIFFY_REQUESTS, and HIFFY_ERRORS that are
+ // used to arbitrate shared access between the debugger and this software task.
+ #[unsafe(no_mangle)]
```
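A compilable sketch of one such declaration, assuming the suggestion above (`#[unsafe(no_mangle)]` is the Rust 2024 spelling; plain `#[no_mangle]` is used here so the sketch builds on earlier editions too, and `std`'s `AtomicU32` stands in for `core`'s on the target):

```rust
use std::sync::atomic::AtomicU32;

// `pub` + `no_mangle` tell the compiler this symbol is externally visible
// and may be touched from outside the program (here, by the debugger), so
// it must not assume the value is unchanged between accesses.
#[no_mangle]
pub static HIFFY_READY: AtomicU32 = AtomicU32::new(0);
```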
Oh, and overall: I think this change is just spiffy.
@jamesmunns I've added the diagrams. Interestingly, I need to keep […]. I also switched the […].
*(force-pushed 57d702e to 0d251d9)*
*(force-pushed 0d251d9 to 0604f3d)*

`task-hiffy` has bad vibes: it uses `static mut` arrays, and accesses them without an obvious `SAFETY` comment. The task is necessarily in a weird no-man's-land, because the debugger writes to those arrays to load HIF text and data; how can Rust know that they have changed? A fully robust option would be to use `read_volatile` everywhere; unfortunately, downstream libraries (`hif` and eventually `postcard`) expect Rust slices, and we don't want to hack all of them (or double our memory footprint by copying data around).

This PR is an attempt to improve the vibes in three ways:
- […] `'static` lifetimes into HIF, which could theoretically stash them