Skip to content

Wrap eip#1561

Open
donno2048 wants to merge 17 commits into
copy:masterfrom
donno2048:eip
Open

Wrap eip#1561
donno2048 wants to merge 17 commits into
copy:masterfrom
donno2048:eip

Conversation

@donno2048
Copy link
Copy Markdown
Contributor

@donno2048 donno2048 commented May 15, 2026

solves #1253

TODO:

@copy hope this is good :)

@donno2048
Copy link
Copy Markdown
Contributor Author

Went over pretty much the entire codebase to make sure there's nothing left "untreated", the only thing that bugs me is this:

pub unsafe fn instr32_E8(imm32s: i32) {
// call
return_on_pagefault!(push32(get_real_eip()));
*instruction_pointer = *instruction_pointer + imm32s;
dbg_assert!(*is_32 || get_real_eip() < 0x10000);
}
pub unsafe fn instr16_E9(imm16: i32) {
// jmp
jmp_rel16(imm16);
}
pub unsafe fn instr32_E9(imm32s: i32) {
// jmp
*instruction_pointer = *instruction_pointer + imm32s;
dbg_assert!(*is_32 || get_real_eip() < 0x10000);
}

I mean, yes, a 32-bit jump would wrap in 16-bit mode but perhaps it's safe to assume that any sane program won't deliberately encode it, maybe we can add debug info here or something...

@copy
Copy link
Copy Markdown
Owner

copy commented May 19, 2026

There's two problems with this change (both sort-of pre-existing, but they need to be discussed):

  • Correctness-wise, this bakes cs_offset into the compiled code (this is already done in some other places), however compiled blocks are not invalidated when called with a different cs_offset. Doing so naively would probably cause v86 to run extremely slow in certain 16-bit OSes where the same code is ran with different code segments (this needs to be measured). A better fix would compile code without cs_offset wrapping, and dynamically check at the beginning of each basic block whether the wrap-around could happen, and bail out to interpreted mode.
  • Performance-wise, how does this affect interpreted mode? (read_imm* is called many times, and often inlined). If this negatively impacts performance, how can it be implemented more efficiently? In particular, the change is currently added to the fast path of read_imm*, maybe a smart check would allow moving it into the slow path. The main goal here is not to negatively affect performance of 32-bit code, as 16-bit code already runs fast enough in v86 (in my experience).

And for later PRs or documentation: As we're emulating read hardware more precisely, how does should edge-cases be handled, for example:

  • 16-bit read at 0xFFFF, etc.
  • 32-bit absolute or relative jumps with 16-bit CS (in real mode, protected mode, vm86 modem, etc.)

@donno2048
Copy link
Copy Markdown
Contributor Author

donno2048 commented May 19, 2026

A better fix would compile code without cs_offset wrapping, and dynamically check at the beginning of each basic block whether the wrap-around could happen, and bail out to interpreted mode.

The problem is that I don't think it's possible, I mean, what's stopping someone from doing

mov ax, cs
dec ax
mov cs, ax

thus making eip increment more than the block's size, they're infinitely many ways for something like that to happen, so you can't really tell ahead if a wrap could happen...

I mean, you could theoretically check explicitly for jmp call ret mov cs, pop cs and such, but ultimately because we're dealing with a von-neuman architecture and the instructions themselves are prune to changing you can't really check it beforehand (unless you can solve the halting problem)

this bakes cs_offset into the compiled code

Wont just using get_seg_cs directly in the eip incremation function instead of taking a cs argument solves this?

Just to make sure I'm not missing something more structural... 😅

how does this affect interpreted mode?

I don't know how much we can trust it as a metric (as it might fluctuate) but if you look at the running time for the CI tests in, for example #1411 (which I intentionally chose because it's not merged - so the CI logs aren't deleted, and because the changes there aren't perf related) yoy can see both PR have roughly the same run-time, so I don't think it'll impact anything.

However, I'll try to remember checking it next week.

Edit: managed to do it today, ran time make tests for master and my eip branch (don't worry, I made sure in both the same build status will be present) and the results are:

On master:

make tests  728.23s user 27.27s system 435% cpu 2:53.32 total

and on eip:

make tests  450.08s user 16.81s system 461% cpu 1:41.25 total

Now, I'm not claiming the PR is performance positive (which makes the tests a bit weird actually), but imho means its effects aren't very dramatic, just to make sure I ran the master test again (same build status) but this time I made sure not to do anything in parallel while the tests are running and got:

make tests  439.01s user 16.71s system 479% cpu 1:34.97 total

which is still not a very dramatic change over the eip test - 6% faster, but again this isn't very accurate so it could very well be by chance, and if I run again on eip (without doing anything in parallel) we might get better results, again, by chance.

Overall, that is not very surprising imo because in 32-bit the only thing that's added is 3 function calls (increment_instruction_pointer, is_asize_32, get_seg_cs) and one conditional check (if !is_asize_32) but basically the logic is exactly the same: one wrapping_add operation and that's it.

16-bit read at 0xFFFF, etc.

read from 0xFFFF then from 0x0000, nothing special

32-bit absolute or relative jumps with 16-bit CS (in real mode, protected mode, vm86 modem, etc.)

iirc in real mode it will just treat the lowest 16-bits, in protected mode however, because eip exists it actual will change eip to the specified address but the CPU will continue fetching instructions from eip & 0xFFFF, in practice this operation is basically the same as in real mode unless you do something like

%assign x 8 ; for example
call(32) far _pop + x * 0x10000
_pop: pop ax
pop ax

@copy
Copy link
Copy Markdown
Owner

copy commented May 21, 2026

thus making eip increment more than the block's size, they're infinitely many ways for something like that to happen, so you can't really tell ahead if a wrap could happen...

All instructions that can change cs (far jumps/calls/returns, interrupts) already bail out to interpreted mode

I mean, you could theoretically check explicitly for jmp call ret mov cs, pop cs and such, but ultimately because we're dealing with a von-neuman architecture and the instructions themselves are prune to changing you can't really check it beforehand (unless you can solve the halting problem)

v86 already detects self-modifying code. It currently only logs, and it's not really used by OSes we run, but it could be implemented properly:

"SMC: bits={} eip={:x} writeaddr={:x} value={:x}",

Wont just using get_seg_cs directly in the eip incremation function instead of taking a cs argument solves this?

The code we're talking about (cpu_context.rs) reads the instructions that are going to be compiled once, and then code is generated for them. Later changes to the CS offset can't affect the already generated code.

@donno2048
Copy link
Copy Markdown
Contributor Author

donno2048 commented May 21, 2026

All instructions that can change cs (far jumps/calls/returns, interrupts) already bail out to interpreted mode

What about

pop cs
mov cs, ax ; for example
loadall

?

EDIT: Actually, it doesn't matter, pop cs is not supported on i386 (and iirc since the 186, and I don't think V86 is looking to specifically emulate the 8086), loadall is not supported (I think) on V86 and as for mov cs, r16:

{ opcode: 0x8E, block_boundary: 1, e: 1, skip: 1 }, // mov sreg

v86 already detects self-modifying code

It only detects self-modifying in the same page, should ideally mark any written block and if it's being jumped into (or just "entered") bail to interpreted mode.

self-modifying code is not the only problem, you also need to deal with misaligned jumps , etc

Can solve this by adding something like instruction_start_markers to the JitContext and mark each instruction start, then if a jump address is not aligned to a start marker bail to interpreted mode...

Edit: Actually, if jumps and calls bail to interpreted mode we don't need to handle it, just need to handle self-modifying the block or entering a modified block (without jumping).

But okay, when I'll have time I'll make a naive block escaping mechanism

P.S. There's no need to check for 32-bit jumps in 16-bit:

v86/src/rust/cpu/cpu.rs

Lines 3086 to 3089 in e37189a

let opcode = *memory::mem8.offset(phys_addr as isize) as i32;
*instruction_pointer += 1;
dbg_assert!(*prefixes == 0);
run_instruction(opcode | (*is_32 as i32) << 8);

pub unsafe fn instr_66() {
// Operand-size override prefix
*prefixes |= prefix::PREFIX_MASK_OPSIZE;
run_prefix_instruction();
*prefixes = 0;
}
pub unsafe fn instr_67() {
// Address-size override prefix
*prefixes |= prefix::PREFIX_MASK_ADDRSIZE;
run_prefix_instruction();
*prefixes = 0;
}

@donno2048
Copy link
Copy Markdown
Contributor Author

donno2048 commented Jun 4, 2026

Since the self-modifying code problem is separate from the eip wrapping problem I'll open a separate issue for it.

Edit: #1575

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants