Conversation
jamesmunns
left a comment
Some notes, but I also admit to not fully understanding what makes this change somewhat spicy :)
```rust
let mut mem = start..end;
for ndx in 1..=usize::MAX {
    // This is Accidentally Quadratic; see the note in `dump_task`
    let Some(region) = kipc::get_task_dump_region(task, ndx) else {
```
nit/out of scope refactor suggestion: It feels like we probably could stand to have a helper for "iterate over dump regions", to avoid manual calls like this. It would be neat to be able to do:
```rust
for region in kipc::iter_task_dump_regions(task).skip(1) {
    // ...
}
```

```rust
// If we are beyond the start of our `mem` region, then there
// are no more overlaps and we can bail out immediately.
break;
}
```
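For illustration, the iterator helper suggested above could look roughly like this. This is a standalone sketch, not the actual userlib code: the `TaskDumpRegion` type and the `kipc` module are stubbed out here so it runs on its own, and the helper's name and signature are assumptions taken from this thread.

```rust
// Stand-in type; the real `TaskDumpRegion` lives in userlib.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TaskDumpRegion {
    base: u32,
    size: u32,
}

// Stubbed kipc layer so the sketch is self-contained; the real call is a
// kernel IPC that returns `None` once the index runs past the last region.
mod kipc {
    use super::TaskDumpRegion;

    pub fn get_task_dump_region(_task: usize, ndx: usize) -> Option<TaskDumpRegion> {
        const REGIONS: [(u32, u32); 3] =
            [(0x2000_0000, 0x80), (0x1000, 0x1000), (0x3000, 0x1000)];
        REGIONS.get(ndx).map(|&(base, size)| TaskDumpRegion { base, size })
    }
}

// The suggested helper: wrap the index-based KIPC call in an iterator so
// callers never hand-roll `for ndx in 1..=usize::MAX` loops.
fn iter_task_dump_regions(task: usize) -> impl Iterator<Item = TaskDumpRegion> {
    (0..).map_while(move |ndx| kipc::get_task_dump_region(task, ndx))
}

fn main() {
    // `.skip(1)` drops region 0, which has special (descriptor) behavior.
    let bases: Vec<u32> =
        iter_task_dump_regions(0).skip(1).map(|r| r.base).collect();
    assert_eq!(bases, vec![0x1000, 0x3000]);
}
```

The `map_while` adapter ends the iteration at the first `None`, which matches the KIPC contract of returning `None` past the last region.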
Misc perf nit: it might be worth tracking whether we've STARTED matching on items, and bailing here if `started && region.start < mem.start`, because then we've found non-contiguous overlaps.
Assuming:

- region(1): `0x1000..0x2000`
- region(2): `0x3000..0x4000`

And the user asked to dump `0x1000..0x3800`: currently, I think they would never get `okay = true`, because we'd take `0x1000..0x2000` (leaving `0x2000..0x3800`), but then `0x3000..0x4000` doesn't contain `0x2000`, and we'd keep iterating over all remaining regions anyway.
This could probably also be:

```rust
} else if (region.start != start) && region.start < mem.start {
    break;
}
```
Seems reasonable! N is small here (which is why we don't mind quadratic behavior), but also has a context switch into the kernel, so I support cheap early-exit optimizations.
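To make the non-contiguous scenario above concrete, here is a standalone model of a coverage check over sorted regions with an early exit on a gap. This is hypothetical illustration code, not the actual `jefe` implementation: the function name is invented, and the gap test is written here as `region.start > mem.start` (the next sorted region begins past the still-uncovered start), which expresses the same "stop once continuity breaks" idea the review describes.

```rust
use std::ops::Range;

// Model: is `mem` fully covered by a contiguous run of `regions`?
// Regions are assumed sorted by base address, as the PR documents.
fn covered(regions: &[Range<u32>], mut mem: Range<u32>) -> bool {
    let mut started = false;
    for region in regions {
        if region.contains(&mem.start) {
            started = true;
            if region.end >= mem.end {
                return true; // request fully covered
            }
            // Consume the overlapping prefix and keep going.
            mem.start = region.end;
        } else if started && region.start > mem.start {
            // Non-contiguous gap: since regions are sorted, no later
            // region can cover `mem.start`, so bail out early instead
            // of iterating over all remaining regions.
            return false;
        }
    }
    false
}

fn main() {
    let regions = [0x1000..0x2000, 0x3000..0x4000];
    // The scenario from the review: 0x1000..0x3800 spans a gap.
    assert!(!covered(&regions, 0x1000..0x3800));
    // A request inside the first region is covered.
    assert!(covered(&regions, 0x1000..0x1800));
}
```

The early `return false` is exactly the cheap exit being discussed: once continuity breaks, every further iteration (and kernel context switch) is wasted work.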
```rust
// Note: we also implicitly trust that kipc gives us regions which are
// in sorted order by base address.
let mut mem = start..end;
for ndx in 1..=usize::MAX {
```
Just checking: we DO allow dump requests that span multiple task ranges, but NOT one that spans the kernel and one or more task ranges? If this isn't for a specific reason, why do we need the first `if` block at all, and couldn't this just be covered by `for ndx in 0..=usize::MAX`?
it's got special behavior for the 0th region, which may not be sorted*
Oh, I missed that in the PR description, let me see again if I missed that in the docs, otherwise this might be worth putting there.
Oh, my eyes glossed over the two other places you mentioned this. It might be worth mentioning this explicitly here again (for folks like myself that are oblivious outside local context, apparently), but you've definitely covered your bases. :D
I decided to make this more obvious by splitting the userlib functions into:

- `fn get_task_desc_region(task_id: usize) -> TaskDumpRegion` (infallible, no index)
- `fn get_task_dump_region(task_id: usize, region_id: usize) -> Option<TaskDumpRegion>` (fallible, 0-indexed, sorted)

This means that users don't have to remember that index 0 is special at the raw KIPC layer; the raw KIPC wrapper is renamed to `get_task_dump_region_raw`.
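As a rough sketch of that split (hypothetical: the stub data, field names, and exact bodies here are illustrative, with only the index convention taken from this thread), the two userlib functions would just remap indices over the raw wrapper:

```rust
// Stand-in type, just enough for the sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TaskDumpRegion {
    base: u32,
    size: u32,
}

// Stubbed raw KIPC wrapper: index 0 is the task descriptor (not sorted
// with the rest), indices 1.. are the dump regions, sorted by base.
// The real call crosses into the kernel; this stub only models the
// index convention.
fn get_task_dump_region_raw(_task_id: usize, index: usize) -> Option<TaskDumpRegion> {
    const RAW: [(u32, u32); 3] = [
        (0x2000_0000, 0x80), // descriptor in kernel RAM (raw index 0)
        (0x1000, 0x1000),    // sorted dump regions start at raw index 1
        (0x3000, 0x1000),
    ];
    RAW.get(index).map(|&(base, size)| TaskDumpRegion { base, size })
}

// Infallible, no index: raw index 0 always exists for a valid task.
fn get_task_desc_region(task_id: usize) -> TaskDumpRegion {
    get_task_dump_region_raw(task_id, 0).unwrap()
}

// Fallible, 0-indexed, sorted: shifts past the special descriptor slot.
fn get_task_dump_region(task_id: usize, region_id: usize) -> Option<TaskDumpRegion> {
    get_task_dump_region_raw(task_id, region_id + 1)
}

fn main() {
    assert_eq!(get_task_desc_region(0).base, 0x2000_0000);
    assert_eq!(get_task_dump_region(0, 0).map(|r| r.base), Some(0x1000));
    assert_eq!(get_task_dump_region(0, 2), None);
}
```

The off-by-one lives in exactly one place (the `region_id + 1`), so callers only ever see a 0-indexed, sorted view.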
It feels confusing and error-prone to have the two `get_task_dump_region` and `get_task_dump_region_raw` functions that are both used by jefe but have region index args that are off by 1. Some alternate ideas:

- Have the only two userlib functions be `get_task_dump_region` (which does `GetTaskDumpRegionRequest(region + 1)`) and `get_task_desc_region` (which does `GetTaskDumpRegionRequest(0)`). Then userspace has one consistent definition for the region index, though it's still off-by-one from the kernel's definition.
- Split them up into two syscalls: `GetTaskDumpRegionRequest` and `GetTaskDescriptorRegionRequest`. Then both userspace and the kernel would agree that dump region indices start at 0 and descriptor regions are something separate.
Sorry if we're doing the thing where a new person gets confused by code and requests silly refactors, even though it would be perfectly clear to someone who had worked here for more than a month.
587f2b1 to 77600fb
evan-oxide
left a comment
Some questions before I keep reviewing
```
==== Notes

For the specified task index, this will return the dump region specified by
the dump region index. If the dump region index is equal to or greater
```
What is a "dump region" and how does it relate to an MPU region? How/where are a task's dump regions defined? What's a "dump area"?
"MPU region" is any memory region configured by the MPU for a task, which includes RAM, flash, peripherals, etc.
"Dump region" is vaguely defined as "a region that can / should be included in a RAM dump for that task". In practice, this is (1) the task's descriptor in kernel RAM and (2) writable, non-device memory regions assigned to that task.
```
than the number of dump regions for the specified task, `None` will
be returned.

Passing a dump region of 0 returns the task's descriptor in kernel memory, and
```
I'm confused. It sounds like this is a new feature, and now you need to pass different region indices to `GetTaskDumpRegionRequest` than before. But I don't see any change to the kernel's implementation of `get_task_dump_region`...
You're correct that nothing has changed; this is just documenting existing behavior to make it explicit!
While testing unrelated code, I noticed that reading a particular variable over the network failed. I switched to `humility readmem` and ran into the same issue. Suspiciously, the buffer in question spans multiple MPU regions.
It turns out that this is the userland equivalent of #1674: `jefe` checks to see if a read was contained within a single MPU region, but would reject reads which span multiple regions (even if they're contiguous and owned by the same task).

In this PR, I rely on the fact that region descriptors are sorted by base address to incrementally check for overlaps. I also add some documentation about `get_task_dump_region`'s behavior and guarantees: it's got special behavior for the 0th region, which may not be sorted*

*usually, kernel memory is below task memory, but now that we can put tasks in `dtcm`, that's not always true!