
x64: Prevent modified rdi in PageTableArchX64.zero_page() #89

Merged
makubacki merged 1 commit into main from personal/mikuback/use_inout_zero_page
Jun 30, 2025

Conversation

@makubacki
Collaborator

@makubacki makubacki commented Jun 30, 2025

Description

Per Rust inline assembly docs (https://doc.rust-lang.org/reference/inline-assembly.html#operand-type),
`in` states:

  • The allocated register will contain the value of `<expr>` at the
    start of the assembly code.
  • The allocated register must contain the same value at the end of
    the assembly code (except if a lateout is allocated to the same register).

This code:

```rust
    unsafe fn zero_page(base: VirtualAddress) {
        let _page: u64 = base.into();
        #[cfg(all(not(test), target_arch = "x86_64"))]
        unsafe {
            asm!(
            "cld",              // Clear the direction flag so that we increment rdi with each store
            "rep stosq",        // Repeat the store of qword in rax to [rdi] rcx times
            in("rcx") 0x200,    // we write 512 qwords (4096 bytes)
            in("rdi") _page,    // start at the page
            in("rax") 0,        // store 0
            options(nostack, preserves_flags)
            );
        }
    }
```

Will capture the `_page` address at the start of the assembly code
into `rdi`. However, it uses `rep stosq`, which is going to:

  • Store the `rax` value (`0`) in `[rdi]`
  • Repeat the qword store `rcx` times (`0x200`)
  • For each store, increment `rdi` by `8` (qword)
    • Counting upward due to `cld`
  • Decrement `rcx` until it reaches zero

At the end, `rdi` is now `_page` + (`0x200` * 8) = `_page` + `0x1000`, violating the `in` contract above.

I think it then makes sense to manually restore the `rdi` value
since the code is inlined. This uses `r8` with `out` to let the
compiler know the register value is modified in the assembly block.

Note: The inline(never) attribute in the aarch64 implementation is
removed as well as that was mentioned as being added in response to
the x64 issue as opposed to a specific aarch64 issue.

  • Impacts functionality?
  • Impacts security?
  • Breaking change?
  • Includes tests?
  • Includes documentation?

How This Was Tested

  • Compilation
  • Boot release profile image to EFI shell on QEMU Q35

Integration Instructions

  • N/A

@makubacki makubacki self-assigned this Jun 30, 2025
@github-actions github-actions Bot added the impact:non-functional Does not have a functional impact label Jun 30, 2025
@makubacki
Collaborator Author

I didn't have time to comment on #87. But this seemed like an issue to me, and the boot is working on Q35 with it. Would like to get additional feedback.

@makubacki makubacki requested review from Javagedes, joschock and os-d June 30, 2025 14:31
Contributor

@os-d os-d left a comment


@makubacki I was chatting with Wesley Wiser on this one, a couple things:

  • He also pointed out that `in` should not be used here, but he said that if you use `inout`, that should prevent the compiler from clobbering rdi, so the move in and out of r8 should be handled by the compiler rather than manually.
  • However, as I noted in my PR, I want to eliminate this asm entirely. I believe we can find a path where it is not needed, per the conversation with Wesley, so my PR was to fix the immediate issue and then handle the longer-term one. I'm fine updating to this (although if we do, let's remove the never-inline on aarch64 also), but as I said, I'm planning on removing this in the short term.

@makubacki
Collaborator Author

makubacki commented Jun 30, 2025

> @makubacki I was chatting with Wesley Wiser on this one, a couple things:
>
>   • He also pointed out that `in` should not be used here, but he said that if you use `inout`, that should prevent the compiler from clobbering rdi, so the move in and out of r8 should be handled by the compiler rather than manually.
>   • However, as I noted in my PR, I want to eliminate this asm entirely. I believe we can find a path where it is not needed, per the conversation with Wesley, so my PR was to fix the immediate issue and then handle the longer-term one. I'm fine updating to this (although if we do, let's remove the never-inline on aarch64 also), but as I said, I'm planning on removing this in the short term.

I also tried inout before this change and that functionally worked. But I still think the original value should be explicitly preserved given this is being inlined (and it should be able to be inlined).

I understand we're dealing with temporary changes, but this still allows inlining and properly declares the registers modified. Not inlining puts the code into its own stack frame where it can technically modify rdi, but the assembly is still not declaring its modified resources correctly to the compiler.

@makubacki
Collaborator Author

> let's remove the never inline on aarch64 also

Was that just added because of the x86 issue?

@os-d
Contributor

os-d commented Jun 30, 2025

> > @makubacki I was chatting with Wesley Wiser on this one, a couple things:
> >
> >   • He also pointed out that `in` should not be used here, but he said that if you use `inout`, that should prevent the compiler from clobbering rdi, so the move in and out of r8 should be handled by the compiler rather than manually.
> >   • However, as I noted in my PR, I want to eliminate this asm entirely. I believe we can find a path where it is not needed, per the conversation with Wesley, so my PR was to fix the immediate issue and then handle the longer-term one. I'm fine updating to this (although if we do, let's remove the never-inline on aarch64 also), but as I said, I'm planning on removing this in the short term.
>
> I also tried inout before this change, and that functionally worked. But I still think the original value should be explicitly preserved given this is being inlined (and it should be able to be inlined).
>
> I understand we're dealing with temporary changes, but this still allows inlining and properly declares the registers modified. Not inlining puts the code into its own stack frame where it can technically modify rdi, but the assembly is still not declaring its modified resources correctly to the compiler.

Sure, I think the idea is that the compiler, when inlining, will see the inout and make sure it doesn't clobber rdi. Or at least that's what I understood from the conversation. That doesn't quite make sense to me, but I guess it's "compiler smarts." But that was going to be one of my questions: when this is inout, if you look at the disassembly, does it show rdi being preserved?

Regardless, after further conversation with the compiler folks, I am putting up a PR now to move these functions back to Rust. I am no longer concerned the compiler may optimize away the seemingly unused memory writes.

So, fine for this to go in, but also I'll put up the PR soon to remove these functions.

> > let's remove the never inline on aarch64 also
>
> Was that just added because of the x86 issue?

Yes, just to keep parity.

@makubacki
Collaborator Author

> Sure, I think the idea is that the compiler, when inlining, will see the inout and make sure it doesn't clobber rdi. Or at least that's what I understood from the conversation. That doesn't quite make sense to me, but I guess it's "compiler smarts." But that was going to be one of my questions: when this is inout, if you look at the disassembly, does it show rdi being preserved?

That's my understanding too. Because this is inlined it informs the compiler how to deal with the fact the register is being modified in a particular section of code. I just used inout("rdi") _page => _ to inform it that the register is being clobbered (and save rdi to _page at the beginning). This change was more to recognize the fact that the compiler probably cares about rdi on a more frequent basis than r8, so move the "modified value" to a less used register.

> Regardless, after further conversation with the compiler folks, I am putting up a PR now to move these functions back to Rust. I am no longer concerned the compiler may optimize away the seemingly unused memory writes.

Sounds good. As discussed offline, we'll get this in as a fallback in case that doesn't go as planned for some reason.

> So, fine for this to go in, but also I'll put up the PR soon to remove these functions.

> > > let's remove the never inline on aarch64 also
> >
> > Was that just added because of the x86 issue?
>
> Yes, just to keep parity.

In that case, I'll update to remove the never inline on it here too.

@makubacki
Collaborator Author

> > > > let's remove the never inline on aarch64 also
> > >
> > > Was that just added because of the x86 issue?
> >
> > Yes, just to keep parity.
>
> In that case, I'll update to remove the never inline on it here too.

Removed it in the latest update.

@makubacki makubacki force-pushed the personal/mikuback/use_inout_zero_page branch from da7d2d3 to 27086ff Compare June 30, 2025 16:09
Per Rust inline assembly docs (https://doc.rust-lang.org/reference/inline-assembly.html#operand-type),
`in` states:

  - The allocated register will contain the value of <expr> at the
    start of the assembly code.
  - The allocated register must contain the same value at the end of
    the assembly code (except if a lateout is allocated to the same register).

This code:

```rust
    unsafe fn zero_page(base: VirtualAddress) {
        let _page: u64 = base.into();
        #[cfg(all(not(test), target_arch = "x86_64"))]
        unsafe {
            asm!(
            "cld",              // Clear the direction flag so that we increment rdi with each store
            "rep stosq",        // Repeat the store of qword in rax to [rdi] rcx times
            in("rcx") 0x200,    // we write 512 qwords (4096 bytes)
            in("rdi") _page,    // start at the page
            in("rax") 0,        // store 0
            options(nostack, preserves_flags)
            );
        }
    }
```

Will capture the `_page` address at the start of the assembly code
into `rdi`. However, it uses `rep stosq` which is going to:

- Store the `rax` value (`0`) in `[rdi]`
- Repeat the qword copy `rcx` times (`0x200`)
- For each operation, `rdi` will be incremented by `8` (qword)
  - Going forward due to `cld`
- Decrement `rcx` until it is zero

At the end, `rdi` is now `_page` + (`0x200` * 8) = `_page` + `0x1000`

I think it then makes sense to manually restore the `rdi` value
since the code is inlined. This uses `r8` with `out` to let the
compiler know the register value is modified in the assembly block.

Note: The `inline(never)` attribute in the aarch64 implementation is
removed as well as that was mentioned as being added in response to
the x64 issue as opposed to a specific aarch64 issue.

Signed-off-by: Michael Kubacki <michael.kubacki@microsoft.com>
@makubacki makubacki force-pushed the personal/mikuback/use_inout_zero_page branch from 27086ff to e65bd94 Compare June 30, 2025 16:10
@makubacki makubacki merged commit 84b6952 into main Jun 30, 2025
5 checks passed
@makubacki makubacki deleted the personal/mikuback/use_inout_zero_page branch June 30, 2025 16:35
Comment thread src/x64.rs