Quit Application on any RenderErrors by default #24131

Open
kfc35 wants to merge 7 commits into bevyengine:main from kfc35:23183_exit_when_render_error

Conversation

@kfc35
Contributor

@kfc35 kfc35 commented May 5, 2026

Objective

  • Fixes Rendering OOM creates flashes on Vulkan due to bug in render recovery #23183
  • The issue describes strobing that happens on OOMs (which used to just crash the application). I’ve also seen strobing on validation errors while some examples were in a broken state very recently. I believe it’s because OOMs and validation errors are ignored, so rendering resumes and hits the same errors again. That loop is somehow causing the flashing. To protect against this strobing, I think the default render error policy needs to change.

Solution

  • Instead of ignoring any errors and continuing to render, this PR changes the default error policy to quit the application upon any error. Strobing induced by OOM and validation errors will not happen because the app just quits. I’m not aware of strobing that could happen on DeviceLost / Internal errors, but I’m leaning towards safety here. Correct me (within reason) if there are errors we should obviously handle with a different error policy, but I think this is a better starting point than Ignore for all.

  • Note: At times, even with this code, I’ve seen the app flash to a magenta/pink color before exiting (not a strobing pink that would happen if the app ignored validation errors, just a single abrupt switch before app exit), so I’m wondering if it’s better to throw a panic! rather than try to quit gracefully.

Testing

  • Fortunately, the pccm and light_probe_blending examples are still broken on main.
  • cargo r --example light_probe_blending --features free_camera,https correctly closes the app upon a validation error.
  • cargo run --example pccm --features="free_camera https" does the same.
Example console logs
cargo r --example light_probe_blending --features free_camera,https
   Compiling bevy v0.19.0-dev (/Users/kchen/CodingProjects/bevy)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 11.24s
     Running `target/debug/examples/light_probe_blending`
2026-05-05T01:04:33.028208Z  INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "macOS 26.4.1", kernel: "25.4.0", cpu: "Apple M4", core_count: "10", memory: "16.0 GiB" }
2026-05-05T01:04:33.031189Z  WARN bevy_asset::io::web: WebAssetPlugin is potentially insecure! Make sure to verify asset URLs are safe to load before loading them. If you promise you know what you're doing, you can silence this warning by setting silence_startup_warning: true in the WebAssetPlugin construction.
2026-05-05T01:04:33.448175Z  INFO bevy_render::renderer: AdapterInfo { name: "Apple M4", vendor: 0, device: 0, device_type: IntegratedGpu, device_pci_bus_id: "", driver: "", driver_info: "", backend: Metal, subgroup_min_size: 4, subgroup_max_size: 64, transient_saves_memory: true }
2026-05-05T01:04:34.315115Z  INFO bevy_pbr::cluster: GPU clustering is supported on this device.
2026-05-05T01:04:34.315226Z  INFO bevy_render::batching::gpu_preprocessing: GPU preprocessing is fully supported on this device.
2026-05-05T01:04:34.611394Z  INFO bevy_winit::system: Creating new window Bevy Light Probe Blending Example (65v0)
2026-05-05T01:04:37.312262Z ERROR bevy_render::error_handler: Caught rendering error: Validation Error

Caused by:
  In Device::create_bind_group, label = 'mesh_view_bind_group_binding_array'
    Number of bindings in bind group descriptor (3) does not match the number of bindings defined in the bind group layout (0)

2026-05-05T01:04:37.900677Z ERROR bevy_render::error_handler: Quitting the application due to Validation RenderError
2026-05-05T01:04:37.964583Z  WARN bevy_ecs::world::command_queue: CommandQueue has un-applied commands being dropped. Did you forget to call SystemState::apply?
2026-05-05T01:04:37.964628Z  WARN bevy_ecs::world::command_queue: CommandQueue has un-applied commands being dropped. Did you forget to call SystemState::apply?
...

There is some WARN bevy_ecs::world::command_queue: CommandQueue has un-applied commands being dropped. Did you forget to call SystemState::apply? spam (at least 20 lines of it) after quitting the application; if anyone knows whether that can be prevented, I’m keen to learn.

@kfc35 kfc35 added A-Rendering Drawing game state to the screen D-Straightforward Simple bug fixes and API improvements, docs, test and examples S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels May 5, 2026
@github-project-automation github-project-automation Bot moved this to Needs SME Triage in Rendering May 5, 2026
@kfc35 kfc35 added this to the 0.19 milestone May 5, 2026
@Zeophlite Zeophlite requested a review from atlv24 May 5, 2026 03:24
@alice-i-cecile alice-i-cecile added the X-Contentious There are nontrivial implications that should be thought through label May 5, 2026
@alice-i-cecile alice-i-cecile requested a review from tychedelia May 5, 2026 04:50
@alice-i-cecile alice-i-cecile added the A-Accessibility A problem that prevents users with disabilities from using Bevy label May 5, 2026
Comment thread crates/bevy_render/src/error_handler.rs
Comment thread crates/bevy_render/src/error_handler.rs Outdated
Comment thread crates/bevy_render/src/error_handler.rs Outdated
// do nothing
}
RenderErrorPolicy::QuitApplication => {
main_world.write_message(AppExit::error());
Member

Ugh, we only get a u8 to embed error information.
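
To make the u8 constraint concrete, here is a minimal sketch of squeezing an error category into a non-zero exit code. `RenderErrorKind`, `exit_code`, and the numeric values are hypothetical and chosen only for illustration; they are not part of this PR or of Bevy. The sketch assumes the exit payload is a non-zero `u8`, as the comment above implies.

```rust
use std::num::NonZeroU8;

/// Hypothetical error categories, named after the kinds mentioned in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RenderErrorKind {
    OutOfMemory,
    Validation,
    DeviceLost,
    Internal,
}

/// Map each kind to a distinct non-zero exit code.
/// The concrete values are arbitrary and only need to differ from each other.
pub fn exit_code(kind: RenderErrorKind) -> NonZeroU8 {
    let raw = match kind {
        RenderErrorKind::OutOfMemory => 10,
        RenderErrorKind::Validation => 11,
        RenderErrorKind::DeviceLost => 12,
        RenderErrorKind::Internal => 13,
    };
    NonZeroU8::new(raw).expect("all codes above are non-zero")
}
```

A single byte can distinguish the category of the error this way, but the message text still has to go to the log.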

Member

@alice-i-cecile alice-i-cecile left a comment

I think this is the right compromise, but the docs absolutely need to be clearer.

Strong, strong preference to sending AppExit over panicking: this is something you can 100% attempt an auto-save on.

@bjorn3
Contributor

bjorn3 commented May 5, 2026

Would it be an option to by default only exit when there are multiple render errors within a small timespan? Say 3 errors within 5 min or 10 within an hour.

@kfc35
Contributor Author

kfc35 commented May 5, 2026

Would it be an option to by default only exit when there are multiple render errors within a small timespan? Say 3 errors within 5 min or 10 within an hour.

I think this is a good idea, thank you! But I think this can be a follow-up: optimizing how the default decides when to exit can come after the decision to exit on any error by default (this PR).

(I think there is still the potential for flashing if you accept x number of errors within some minutes, if the errors come in rapid succession. In that case, you’d probably want to pause the renderer before attempting to render again, and that just seems like a more complicated render error policy that I do not feel like figuring out at the current moment.)

IIUC other apps will be able to implement that policy independently and replace the default handler, so this won’t block them from doing so.
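
For anyone who wants to try the rate-limited idea in their own handler, a rough sketch of the counting part follows. `ErrorWindow` and its methods are hypothetical names, not Bevy or wgpu API, and the sketch does not address pausing the renderer between errors.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical helper: counts errors inside a sliding time window.
pub struct ErrorWindow {
    window: Duration,
    max_errors: usize,
    seen: VecDeque<Instant>,
}

impl ErrorWindow {
    pub fn new(max_errors: usize, window: Duration) -> Self {
        Self { window, max_errors, seen: VecDeque::new() }
    }

    /// Record an error observed at `now`. Returns true once `max_errors`
    /// errors have landed within the window, i.e. the app should exit.
    pub fn record(&mut self, now: Instant) -> bool {
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = self.seen.front() {
            if now.duration_since(oldest) > self.window {
                self.seen.pop_front();
            } else {
                break;
            }
        }
        self.seen.push_back(now);
        self.seen.len() >= self.max_errors
    }
}
```

With `ErrorWindow::new(3, Duration::from_secs(300))`, the third error within five minutes returns true, which matches the "3 errors within 5 min" suggestion. Passing the timestamp in, rather than calling `Instant::now()` inside, keeps the helper easy to test.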

@kfc35 kfc35 requested a review from alice-i-cecile May 5, 2026 14:39
@atlv24
Contributor

atlv24 commented May 5, 2026

Note that bevy (and wgpu) have historically had tons of validation errors all the time. For me, there hasn't been a release that runs 3d_scene without spamming validation errors to the console on vulkan since 0.11 or so. This very likely would regress example UX. I investigated the render recovery paths a while ago, around when the strobe issue was first reported, and found no logic errors; the only divergence seems to be having device error handlers at all. This may be a wgpu bug with how it behaves when a handler is present. I'll investigate once I have more time on my hands.

Member

@tychedelia tychedelia left a comment

I'm a bit confused about the behavior here. We should definitely not exit on regular vulkan validation errors.

@kfc35
Contributor Author

kfc35 commented May 5, 2026

I'm a bit confused about the behavior here. We should definitely not exit on regular vulkan validation errors.

I can put an if-block there if we can trust that vulkan validation errors specifically won’t cause strobing. I’ll do so in a bit.

My reasoning behind making all validation errors exit is that I’ve personally seen them cause strobing on main very recently. If you git checkout 6fcf9a6dcc5777c1eee253d82ccef6542f443110 (commit 6fcf9a6 from a few days ago, which introduces creating the mesh view bind group layout on demand) and run the volumetric_fog example, there is strobing that occurs due to ignored validation errors (I’m on Mac/Metal). Granted, the strobing ultimately happened because of programmer error, but if ignored validation errors can cause strobing, they are on my list of errors to stop the app for.

I’m not experienced enough in rendering to know which validation errors could or could not cause strobing, but if Vulkan validation ones are not going to strobe, I can put that exception in.

@kfc35
Contributor Author

kfc35 commented May 6, 2026

Context: On Mac/Metal, I’m able to see a strobing pink screen when I check out commit 6fcf9a6 and run the volumetric_fog example:

git checkout 6fcf9a6dcc5777c1eee253d82ccef6542f443110
cargo run --example volumetric_fog

At @atlv24’s suggestion, I commented out the internals of the wgpu handlers defined in error_handler.rs’s DeviceErrorHandler, like so:

            device.set_device_lost_callback(move |reason, str| {
                // bevy_log::error!("Caught DeviceLost error: {reason:?} {str}");
                // assert!(device_lost.lock().unwrap().replace((reason, str)).is_none());
            });
            device.on_uncaptured_error(Arc::new(move |e| {
                // bevy_log::error!("Caught rendering error: {e}");
                // uncaptured
                //    .lock()
                //    .unwrap()
                //    .get_or_insert(WgpuWrapper::new(e));
            }));

The app still strobes. If I comment even the setting of the handlers out, it panics (which is what I would expect from my previous comment regarding wgpu behavior).

It was suggested that this means the bug is in wgpu. However, I think the app strobes because the app has not changed RenderState at all, and therefore will still attempt to render. The error is successfully caught, but DeviceErrorHandler.poll() returns None since neither the device_lost nor the uncaptured Arc<Mutex> contains the error. That means DeviceErrorHandler.update_state() never changes the RenderState into an Error state; it continues in the Ready state as normal. I added a println!("{state:?}"); statement after the state resource is removed in update_state to verify that it stays in Ready. IIUC, this is basically what RenderErrorPolicy::Ignore would do, except that the state just stays in Ready as opposed to transitioning from Error back to Ready.
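
The reasoning above can be reduced to a small model. This is a sketch with hypothetical stand-in types (`State`, `Handler`, a `String` error), not the actual bevy_render code: when the callback body is a no-op, nothing is ever stored in the shared slot, so polling yields `None` and the state never leaves Ready.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the render state described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Ready,
    Errored,
}

/// Hypothetical stand-in for the device error handler.
pub struct Handler {
    uncaptured: Arc<Mutex<Option<String>>>,
}

impl Handler {
    pub fn new() -> Self {
        Self { uncaptured: Arc::new(Mutex::new(None)) }
    }

    /// Build the error callback. With `store = false` it models the
    /// commented-out handler body: the error is swallowed, not stored.
    pub fn callback(&self, store: bool) -> impl Fn(String) {
        let slot = Arc::clone(&self.uncaptured);
        move |e| {
            if store {
                // Keep only the first error, as in the original snippet.
                let _ = slot.lock().unwrap().get_or_insert(e);
            }
        }
    }

    /// Take the stored error, if any.
    pub fn poll(&self) -> Option<String> {
        self.uncaptured.lock().unwrap().take()
    }

    /// Only a stored error moves the state out of `Ready`.
    pub fn update_state(&self, state: State) -> State {
        match self.poll() {
            Some(_) => State::Errored,
            None => state,
        }
    }
}
```

Calling the no-op callback and then `update_state(State::Ready)` returns `State::Ready`, which is the behavior observed with the handler bodies commented out; the storing callback moves the state to `State::Errored`.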

For what it's worth, I’m unable to reproduce the pink screen flashing in the render_recovery example by manually triggering validation errors, not even a blip; I can only do it via the examples that the offending commit broke, e.g. volumetric_fog, atmosphere, meshlet. It can vary per invocation: at times it is a constant pink, sometimes there’s no pink but there’s still blinking, but I often get a rapidly blinking pink screen.

Ultimately, I remain convinced this PR should be considered. However, since strobing could not be reproduced on Vulkan (tested in Discord), and the pink screen effect seems to be Mac-specific (#20318), I can agree to ignore validation errors for non-Metal adapters only at this time (I assume this is just a matter of checking render_adapter_info.backend != Backend::Metal). If anyone who can tolerate flashing lights could additionally test the potential strobing behavior of the two commands at the top of this comment a few times on different operating systems, just to make sure we aren’t putting people in danger and to confirm I am not the only one on Mac/Metal who experiences the pink screen strobing, I’d appreciate it.
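
A rough sketch of the backend-dependent default proposed in this comment is below. The enums are hypothetical mirrors written for illustration; the real backend and policy types live in wgpu and bevy_render, and this is not the code in the PR.

```rust
/// Hypothetical mirror of the graphics backend enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Vulkan,
    Metal,
    Dx12,
    Gl,
}

/// Hypothetical mirror of the render error policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    Ignore,
    QuitApplication,
}

/// Hypothetical error categories, named after the kinds mentioned in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    OutOfMemory,
    Validation,
    DeviceLost,
    Internal,
}

/// Quit on everything, except validation errors on non-Metal backends.
pub fn default_policy(kind: ErrorKind, backend: Backend) -> Policy {
    match (kind, backend) {
        (ErrorKind::Validation, Backend::Metal) => Policy::QuitApplication,
        (ErrorKind::Validation, _) => Policy::Ignore,
        _ => Policy::QuitApplication,
    }
}
```

Keeping the decision in one pure function means the exception for non-Metal backends can be removed later without touching the handler itself.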

Comment thread crates/bevy_render/src/error_handler.rs Outdated
Comment thread crates/bevy_render/src/error_handler.rs Outdated
Comment thread crates/bevy_render/src/error_handler.rs Outdated
@atlv24
Contributor

atlv24 commented May 6, 2026

This change is correct in spirit: the slip-up earlier was the belief that a wgpu validation error == a vulkan validation error. Vulkan validation errors come from a different layer outside wgpu, but look similar enough to be easy to confuse. I think char and I were both operating under the assumption that your PR would make vulkan validation errors stop the renderer. This is not the case: the handler is only for wgpu validation errors.

Just do it in the default handler

Co-authored-by: atlv <email@atlasdostal.com>
@kfc35 kfc35 force-pushed the 23183_exit_when_render_error branch from 809d8cd to 2c0706c Compare May 6, 2026 02:10
@github-actions
Contributor

github-actions Bot commented May 6, 2026

The generated examples/README.md is out of sync with the example metadata in Cargo.toml or the example readme template. Please run cargo run -p build-templated-pages -- update examples to update it, and commit the file change.

@kfc35 kfc35 requested a review from atlv24 May 6, 2026 02:34

Labels

A-Accessibility: A problem that prevents users with disabilities from using Bevy
A-Rendering: Drawing game state to the screen
D-Straightforward: Simple bug fixes and API improvements, docs, test and examples
S-Needs-Review: Needs reviewer attention (from anyone!) to move forward
X-Contentious: There are nontrivial implications that should be thought through

Projects

Status: Needs SME Triage

Development

Successfully merging this pull request may close these issues.

Rendering OOM creates flashes on Vulkan due to bug in render recovery

5 participants