Fix multi-region reads in dumps #2485

Open
mkeeter wants to merge 2 commits into master from mkeeter/multi-region-dump

Conversation

Collaborator

@mkeeter mkeeter commented Apr 22, 2026

While testing unrelated code, I noticed that reading a particular variable over the network failed. I switched to humility readmem, and ran into the same issue:

➜  hubris jj:(zuuk) h readmem 0x24021f70
humility: connecting to fe80::c1d:8cff:fec0:e207%28
humility readmem failed: 0x24021f70 can't be read via the archive or over the network

Caused by:
    dump agent failed: invalid response: Err(BadSegmentAdd)

Suspiciously, the buffer in question spans multiple MPU regions:

0x080057a8 0x24021a00 - 0x24021bff     512 rw---- 5  packrat
0x080057bc 0x24021c00 - 0x24021fff    1KiB rw---- 5  packrat
0x080057d0 0x24022000 - 0x24023fff    8KiB rw---- 5  packrat

It turns out that this is the userland equivalent of #1674: jefe checks whether a read is contained within a single MPU region, and rejects reads which span multiple regions (even if they're contiguous and owned by the same task).

In this PR, I rely on the fact that region descriptors are sorted by base address to incrementally check for overlaps. I also add some documentation about get_task_dump_region's behavior and guarantees: it's got special behavior for the 0th region, which may not be sorted*

*Usually, kernel memory is below task memory, but now that we can put tasks in DTCM, that's not always true!
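The incremental check can be sketched as follows. This is a standalone illustration, not the actual jefe code: `TaskDumpRegion` and the region list are mocked here, whereas the real implementation fetches regions one at a time via kipc.

```rust
#[derive(Clone, Copy)]
struct TaskDumpRegion {
    base: u32,
    size: u32,
}

/// Returns true if `start..end` is fully covered by a contiguous run of
/// regions. Relies on `regions` being sorted by base address, so coverage
/// can be checked incrementally: each region either trims the front of the
/// remaining range or is irrelevant.
fn covered(regions: &[TaskDumpRegion], start: u32, end: u32) -> bool {
    let mut mem = start..end;
    for r in regions {
        let r_end = r.base + r.size;
        if r.base <= mem.start && mem.start < r_end {
            // This region covers the front of the remaining range; trim it.
            mem.start = r_end;
            if mem.start >= mem.end {
                return true; // fully covered
            }
        } else if r.base > mem.start {
            // Regions are sorted, so no later region can cover `mem.start`.
            break;
        }
    }
    false
}

fn main() {
    // Three contiguous regions, mirroring the packrat example above.
    let regions = [
        TaskDumpRegion { base: 0x24021a00, size: 0x200 },
        TaskDumpRegion { base: 0x24021c00, size: 0x400 },
        TaskDumpRegion { base: 0x24022000, size: 0x2000 },
    ];
    // A read spanning multiple contiguous regions is now accepted...
    assert!(covered(&regions, 0x24021f70, 0x24022010));
    // ...but a read past the end of mapped memory is still rejected.
    assert!(!covered(&regions, 0x24023f00, 0x24024100));
}
```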

Contributor

@jamesmunns jamesmunns left a comment


Some notes, but I also admit to not fully understanding what makes this change somewhat spicy :)

Comment thread task/jefe/src/dump.rs
let mut mem = start..end;
for ndx in 1..=usize::MAX {
// This is Accidentally Quadratic; see the note in `dump_task`
let Some(region) = kipc::get_task_dump_region(task, ndx) else {
Contributor


nit/out of scope refactor suggestion: It feels like we probably could stand to have a helper for "iterate over dump regions", to avoid manual calls like this. It would be neat to be able to do:

for region in kipc::iter_task_dump_regions(task).skip(1) {
  // ...
}
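A minimal sketch of such a helper, assuming the existing `Option`-returning `kipc::get_task_dump_region(task, ndx)` signature (mocked below with a static table so the example stands alone):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct TaskDumpRegion {
    base: u32,
    size: u32,
}

// Stand-in for `kipc::get_task_dump_region`; the real call is a kernel IPC.
// The region values here are hypothetical, for illustration only.
fn get_task_dump_region(_task: usize, ndx: usize) -> Option<TaskDumpRegion> {
    const REGIONS: [TaskDumpRegion; 3] = [
        TaskDumpRegion { base: 0x2000_0000, size: 0x100 }, // 0th: descriptor
        TaskDumpRegion { base: 0x2400_0000, size: 0x400 },
        TaskDumpRegion { base: 0x2400_0400, size: 0x400 },
    ];
    REGIONS.get(ndx).copied()
}

/// Iterate over a task's dump regions, hiding the manual index loop:
/// `map_while` stops at the first index for which the lookup returns `None`.
fn iter_task_dump_regions(
    task: usize,
) -> impl Iterator<Item = TaskDumpRegion> {
    (0usize..).map_while(move |ndx| get_task_dump_region(task, ndx))
}

fn main() {
    // `.skip(1)` skips the special 0th (task descriptor) region.
    let count = iter_task_dump_regions(0).skip(1).count();
    assert_eq!(count, 2);
}
```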

Comment thread task/jefe/src/dump.rs
// If we are beyond the start of our `mem` region, then there
// are no more overlaps and we can bail out immediately.
break;
}
Contributor

@jamesmunns jamesmunns Apr 23, 2026


Misc perf nit: it might be worth tracking if we've STARTED matching on items, and bail here if started && region.start < mem.start, because then we've found non-contiguous overlaps

Assuming:

  • region(1): 0x1000..0x2000
  • region(2): 0x3000..0x4000

And the user asked to dump 0x1000..0x3800, currently I think they would never get okay = true because we'd take 0x1000..0x2000, leaving 0x2000..0x3800, but then 0x3000..0x4000 doesn't contain 0x2000, but then we keep iterating over all remaining regions.

This could probably also be:

} else if (region.start != start) && region.start < mem.start {
    break;
}

Collaborator Author


Seems reasonable! N is small here (which is why we don't mind quadratic behavior), but each lookup also incurs a context switch into the kernel, so I support cheap early-exit optimizations.

Comment thread task/jefe/src/dump.rs Outdated
// Note: we also implicitly trust that kipc gives us regions which are
// in sorted order by base address.
let mut mem = start..end;
for ndx in 1..=usize::MAX {
Contributor


Just checking, we DO allow dump requests that span multiple task ranges, but NOT one that spans the kernel and one or more task ranges? If this isn't for a specific reason: why do we need the first if block at all, and could this just be covered by for ndx in 0..=usize::MAX?

Contributor


it's got special behavior for the 0th region, which may not be sorted*

Oh, I missed that in the PR description, let me see again if I missed that in the docs, otherwise this might be worth putting there.

Contributor


Oh, my eyes glossed over the two other places you mentioned this. It might be worth mentioning this explicitly here again (for folks like myself that are oblivious outside local context, apparently), but you've definitely covered your bases. :D

Collaborator Author


I decided to make this more obvious by splitting the userlib functions into

  • fn get_task_desc_region(task_id: usize) -> TaskDumpRegion (infallible, no index)
  • fn get_task_dump_region(task_id: usize, region_id: usize) -> Option<TaskDumpRegion> (fallible, 0-indexed, sorted)

This means that users don't have to remember that index 0 is special at the raw KIPC layer; the raw KIPC wrapper is renamed to get_task_dump_region_raw.
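A sketch of how those two wrappers might sit on top of the raw call, using the signatures from the list above (the raw KIPC lookup and its region values are mocked so the example stands alone):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct TaskDumpRegion {
    base: u32,
    size: u32,
}

// Stand-in for the renamed `get_task_dump_region_raw`, where index 0 is the
// task descriptor and indices 1.. are the sorted memory regions.
// Hypothetical values, for illustration only.
fn get_task_dump_region_raw(
    _task_id: usize,
    index: usize,
) -> Option<TaskDumpRegion> {
    const RAW: [TaskDumpRegion; 3] = [
        TaskDumpRegion { base: 0x2000_1000, size: 0x80 }, // descriptor
        TaskDumpRegion { base: 0x2400_0000, size: 0x400 },
        TaskDumpRegion { base: 0x2400_0400, size: 0x400 },
    ];
    RAW.get(index).copied()
}

/// Infallible: every task has a descriptor, at raw index 0.
fn get_task_desc_region(task_id: usize) -> TaskDumpRegion {
    get_task_dump_region_raw(task_id, 0).unwrap()
}

/// Fallible, 0-indexed over the *sorted* memory regions only; the off-by-one
/// against the raw index lives here, in exactly one place.
fn get_task_dump_region(
    task_id: usize,
    region_id: usize,
) -> Option<TaskDumpRegion> {
    get_task_dump_region_raw(task_id, region_id + 1)
}

fn main() {
    assert_eq!(get_task_desc_region(0).base, 0x2000_1000);
    assert_eq!(get_task_dump_region(0, 0).unwrap().base, 0x2400_0000);
    assert_eq!(get_task_dump_region(0, 2), None);
}
```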

Contributor


It feels confusing and error-prone to have the two get_task_dump_region and get_task_dump_region_raw functions that are both used by jefe but have region index args that are off by 1. Some alternate ideas:

  • Have the only two userlib functions be get_task_dump_region (which does GetTaskDumpRegionRequest(region+1)) and get_task_desc_region (which does GetTaskDumpRegionRequest(0)). Then userspace has one consistent definition for the region index, though it's still off-by-one from the kernel's definition.
  • Split them up into two syscalls: GetTaskDumpRegionRequest and GetTaskDescriptorRegionRequest. Then both userspace and the kernel would agree that dump region indices start at 0 and descriptor regions are something separate.

Sorry if we're doing the thing where a new person gets confused by code and requests silly refactors, even though it would be perfectly clear to someone who had worked here for more than a month.

@mkeeter mkeeter force-pushed the mkeeter/multi-region-dump branch from 587f2b1 to 77600fb on April 23, 2026 at 14:28
Contributor

@evan-oxide evan-oxide left a comment


Some questions before I keep reviewing

Comment thread doc/kipc.adoc
==== Notes

For the specified task index, this will return the dump region specified by
the dump region index. If the dump region index is equal to or greater
Contributor


What is a "dump region" and how does it relate to an MPU region? How/where are a task's dump regions defined? What's a "dump area"?

Collaborator Author


"MPU region" is any memory region configured by the MPU for a task, which includes RAM, flash, peripherals, etc.

"Dump region" is vaguely defined as "a region that can / should be included in a RAM dump for that task". In practice, this is (1) the task's descriptor in kernel RAM and (2) writable, non-device memory regions assigned to that task.

Comment thread doc/kipc.adoc
than the number of dump regions for the specified task, `None` will
be returned.

Passing a dump region of 0 returns the task's descriptor in kernel memory, and
Contributor


I'm confused. It sounds like this is a new feature, and now you need to pass different region indices to GetTaskDumpRegionRequest than before. But I don't see any change to the kernel's implementation of get_task_dump_region...

Collaborator Author


You're correct that nothing has changed; this is just documenting existing behavior to make it explicit!
