
[metal] arm64_32 (ILP32/watchOS) support #9411

Open
matthargett wants to merge 6 commits into gfx-rs:trunk from rebeckerspecialties:fix/metal-arm64_32-ilp32


@matthargett

Connections

#9406

Description

Three fixes to make the Metal backend work on arm64_32 (Apple Watch watchOS, ILP32 ABI where sizeof(long) == sizeof(void*) == 4).
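To make the data-model difference concrete, here is a small sketch that reports the C `long` and pointer widths for whatever target it is compiled for. It uses only `std::os::raw::c_long` and `std::mem::size_of`; the helper names `data_model_sizes` and `data_model` are illustrative. On arm64_32 both sizes should be 4 bytes, while on LP64 targets such as arm64 macOS/iOS both are 8.

```rust
use std::mem::size_of;
use std::os::raw::c_long;

/// Returns (sizeof(long), sizeof(void*)) for the current compilation target.
fn data_model_sizes() -> (usize, usize) {
    (size_of::<c_long>(), size_of::<*const u8>())
}

/// Classifies the target's C data model from the sizes above.
/// Only the two models discussed here are distinguished; anything else
/// (e.g. LLP64 on 64-bit Windows, where long is 4 bytes) is "other".
fn data_model() -> &'static str {
    match data_model_sizes() {
        (4, 4) => "ILP32",
        (8, 8) => "LP64",
        _ => "other",
    }
}

fn main() {
    let (long_size, ptr_size) = data_model_sizes();
    println!(
        "sizeof(long) = {long_size}, sizeof(void*) = {ptr_size} -> {}",
        data_model()
    );
}
```

Because the sizes are fixed at compile time, the same source prints "ILP32" when built for arm64_32-apple-watchos and "LP64" when built for aarch64-apple-watchos.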

1. Use MTLStorageMode::Shared for textures on arm64_32

The AGXMetalS4 driver (A13/S6 GPU) crashes with KERN_INVALID_ADDRESS at 0x50 during copyFromTexture:toBuffer: on MTLStorageMode::Private textures on ILP32. The native Swift Metal implementation that works on the same hardware uses Shared storage for render textures. Apple's unified memory architecture makes Shared equally performant for GPU access.

Gated behind cfg!(target_pointer_width = "32") — zero effect on 64-bit.
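A minimal sketch of this gate, under stated assumptions: `StorageMode` is a hypothetical stand-in for Metal's `MTLStorageMode` (so the example compiles without the `metal` crate), `texture_storage_mode` is a hypothetical helper, and the 64-bit path is assumed to keep Private storage, as the crash description above implies. The actual wgpu-hal code may be structured differently.

```rust
/// Hypothetical stand-in for Metal's `MTLStorageMode`
/// (the real enum has more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StorageMode {
    Shared,
    Private,
}

/// Hypothetical helper: picks the storage mode for a new texture.
fn texture_storage_mode() -> StorageMode {
    if cfg!(target_pointer_width = "32") {
        // arm64_32 (watchOS ILP32): avoid the AGXMetalS4 crash in
        // copyFromTexture:toBuffer: on Private textures by using Shared.
        StorageMode::Shared
    } else {
        // 64-bit targets keep Private storage (assumed existing behavior).
        StorageMode::Private
    }
}

fn main() {
    println!("texture storage mode: {:?}", texture_storage_mode());
}
```

`cfg!` expands to a constant `true`/`false`, so both branches are still type-checked on every target, but the untaken one is trivially optimized away.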

2. Disable buffer mutability hints on arm64_32

The AGXMetalS4 driver exhibits instability when MTLMutability hints are combined with Shared storage mode, so the hints are conservatively disabled on 32-bit targets. They can be re-enabled per device once broader test coverage is available.
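A hedged sketch of how such a gate might look. `PrivateCapabilities`, `detect_capabilities`, `Mutability` and `buffer_mutability_hint` are hypothetical names (only the `supports_mutability` flag is mentioned in this PR), and the `os_supports_mutability` parameter stands in for whatever OS-version check the backend already performs. The hint corresponds to the mutability setting on pipeline buffer descriptors.

```rust
/// Hypothetical capability struct; the field mirrors the PR's flag name.
#[derive(Debug)]
struct PrivateCapabilities {
    supports_mutability: bool,
}

/// Hypothetical stand-in for Metal's `MTLMutability`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mutability {
    Immutable,
}

/// `os_supports_mutability` is an assumed, pre-existing OS-version check.
fn detect_capabilities(os_supports_mutability: bool) -> PrivateCapabilities {
    PrivateCapabilities {
        // Conservatively off on 32-bit (arm64_32) targets, where the
        // AGXMetalS4 driver is unstable with mutability hints.
        supports_mutability: os_supports_mutability && !cfg!(target_pointer_width = "32"),
    }
}

/// Returns the hint to put on a pipeline buffer descriptor, or `None`
/// to leave the descriptor at its default when the capability is off.
fn buffer_mutability_hint(caps: &PrivateCapabilities) -> Option<Mutability> {
    caps.supports_mutability.then_some(Mutability::Immutable)
}

fn main() {
    let caps = detect_capabilities(true);
    println!("{caps:?} -> hint {:?}", buffer_mutability_hint(&caps));
}
```

Folding the target check into the capability flag keeps every call site unchanged: code that sets hints already consults `supports_mutability`, so re-enabling the hints later for specific devices only touches capability detection.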

3. Fix CGFloat type in surface.rs

CGFloat is f64 on LP64 but f32 on ILP32. The arguments to CGSize::new() were hardcoded with `as f64`, which fails to compile on arm64_32. They now use `as _` so the compiler infers the correct type.
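A self-contained sketch of why `as _` works. Here `CGFloat` and `CGSize` are local definitions that mirror Apple's (double on LP64, float on ILP32) rather than the types surface.rs actually imports, and `drawable_size` is a hypothetical helper. The cast target is inferred from `CGSize::new`'s parameter type, so the same source compiles on both data models.

```rust
// Local mirror of the platform definition: CGFloat is a double on LP64
// and a float on ILP32 (arm64_32).
#[cfg(target_pointer_width = "64")]
type CGFloat = f64;
#[cfg(target_pointer_width = "32")]
type CGFloat = f32;

#[derive(Debug, Clone, Copy, PartialEq)]
struct CGSize {
    width: CGFloat,
    height: CGFloat,
}

impl CGSize {
    fn new(width: CGFloat, height: CGFloat) -> Self {
        CGSize { width, height }
    }
}

/// Hypothetical helper converting a surface extent in pixels to a CGSize.
fn drawable_size(width: u32, height: u32) -> CGSize {
    // `width as f64` would only type-check where CGFloat == f64;
    // `as _` lets the compiler pick f64 or f32 from CGSize::new's signature.
    CGSize::new(width as _, height as _)
}

fn main() {
    let size = drawable_size(1920, 1080);
    println!("{size:?} ({} bytes per component)", std::mem::size_of::<CGFloat>());
}
```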

Testing

Validated on physical hardware with wgpu-native v29:

  • Apple Watch SE2 (arm64_32, S6/A13, watchOS 11.6): full pipeline passes — WGSL→MSL via naga, compute dispatch, render pipeline, 8 indirect draws, texture-to-buffer readback ✅
  • Apple Watch Series 10 (arm64, S9/A15, watchOS 26.4): no regressions ✅

The AGXMetalS4 driver (A13/S6 GPU, used in Apple Watch Series 6-9
and SE2) crashes with KERN_INVALID_ADDRESS at offset 0x50 during
copyFromTexture:toBuffer: on MTLStorageMode::Private textures when
called via objc_msgSend on the ILP32 (arm64_32) ABI.

The native Swift Metal implementation that works on the same hardware
uses MTLStorageMode::Shared for render textures. Apple's unified
memory architecture makes Shared equally performant for GPU access
while enabling the blit DMA path that the driver expects on ILP32.

This change is gated behind cfg!(target_pointer_width = "32") and
has zero effect on 64-bit platforms.

Tested on:
- Apple Watch SE2 (arm64_32, S6/A13, watchOS 11.6)
- Apple Watch Series 10 (arm64, S9/A15, watchOS 26.4), no regression

After the Shared texture storage mode fix, the AGXMetalS4 driver
(A13/S6 GPU on watchOS arm64_32) still exhibits instability when
MTLMutability hints are set on pipeline buffer descriptors.

Conservatively disable supports_mutability on 32-bit targets.
Can be re-enabled per-device once broader watchOS test coverage
confirms stability.

Gated behind cfg!(target_pointer_width = "32") — no effect on
64-bit platforms.

CGFloat is f64 on LP64 but f32 on ILP32 (arm64_32, used by watchOS).
CGSize::new() expects CGFloat, so use `as _` to let the compiler infer
the correct type instead of hardcoding `as f64`.
@matthargett
Author

The CI failure looks unrelated:
error: failed to load source for dependency libtest-mimic

Caused by:
Unable to update https://github.com/cwfitzgerald/libtest-mimic.git?rev=9979b3c

Caused by:
revspec '9979b3c' not found; class=Reference (4); code=NotFound (-3)

inner-daemons self-requested a review on April 11, 2026 03:17
@inner-daemons
Collaborator

Related article since I was interested in this:
https://www.phoronix.com/news/GCC-May-Deprecate-ARM64-ILP32

At the end of that it mentions that GCC later deprecated this.

Also, it looks like this is an architecture used exclusively for Apple Watches, where the registers are 64-bit but the pointers are 32-bit. I also think that newer Apple Watches moved away from this, since there is an aarch64 watchOS target triple, and that target is tier 2 whereas arm64_32 is tier 3.

@matthargett
Author

> Related article since I was interested in this: https://www.phoronix.com/news/GCC-May-Deprecate-ARM64-ILP32
>
> At the end of that it mentions that GCC later deprecated this.

I can understand why: the Apple ecosystem almost exclusively uses LLVM/Clang/Swift, so it makes sense for GCC to drop the maintenance overhead.

> Also, it looks like this is an architecture used exclusively for Apple Watches, where the registers are 64-bit but the pointers are 32-bit. I also think that newer Apple Watches moved away from this, since there is an aarch64 watchOS target triple, and that target is tier 2 whereas arm64_32 is tier 3.

Yes, Apple Watch Series 9 (along with Ultra 2 and SE 3) and newer are now pure arm64, while Apple Watch Series 6/7/8 and SE2 use the arm64_32 ABI and are still supported in watchOS 26. As I mentioned in the issue text, I tested on an Apple Watch SE2 (arm64_32 / A12) and an Apple Watch Series 10 (arm64 / A15). That's still ~100 million devices, and some of the issues I've supplied patches for should affect any 32-bit Apple platform. I personally want my WebGPU-oriented app to reach as many of the ~170M active devices as possible, including users with hand-me-down devices.

FWIW, the A12-derived GPU (and its Metal implementation) in the arm64_32 watches running watchOS 26 is surprisingly capable. Happy to include a demo video if that's helpful.

@inner-daemons
Collaborator

Oh, I'm not at all trying to argue that this is bad. Just throwing this out there. And yeah, I figured people were using clang anyway.

@inner-daemons (Collaborator) left a comment


LGTM, nothing too controversial here.

Comment on lines +525 to +531
// On arm64_32 (watchOS ILP32), the AGXMetalS4 driver (A13/S6 GPU)
// crashes in copyFromTexture:toBuffer: on Private textures — null
// deref at offset 0x50 in the driver's internal texture state. Use
// Shared storage which works correctly on Apple's unified memory
// architecture and matches what native Swift Metal code uses on
// these devices.
MTLStorageMode::Shared
Collaborator


This is an interesting bug, is it documented anywhere?

Author


Not that I'm aware of. There are a couple of reasons the surprising capabilities of these Apple Watch devices remain obscure:

  • The feature/capability flags report that a feature isn't available, but the MSL actually just works. Before I ported wgpu, I tested Metal exhaustively to figure out what actually works. This was kicked off by seeing the Memoji app on my child's Apple Watch 9 and realizing the functionality it implied.
  • If you try to use some of these MSL features in the simulator, you'll get a hard abort or they just won't work. I'm guessing most people (very reasonably) give up.
  • Some features, like ASTC HDR textures, work intermittently (sometimes rendering magenta) on Apple Watch 6/SE2 but work fine on Apple Watch 9 and later. I haven't pinned down why, but again I assume this would ward off most mildly curious app/3D developers.

This would probably make a good conference/meetup talk to spread this hard-won knowledge; happy to apply if anyone has suggestions for a good venue!
