[metal] arm64_32 (ILP32/watchOS) support #9411
matthargett wants to merge 6 commits into gfx-rs:trunk from
The AGXMetalS4 driver (A13/S6 GPU, used in Apple Watch Series 6-9 and SE2) crashes with `KERN_INVALID_ADDRESS` at offset 0x50 during `copyFromTexture:toBuffer:` on `MTLStorageMode::Private` textures when called via `objc_msgSend` on the ILP32 (arm64_32) ABI. The native Swift Metal implementation that works on the same hardware uses `MTLStorageMode::Shared` for render textures. Apple's unified memory architecture makes Shared equally performant for GPU access while enabling the blit DMA path that the driver expects on ILP32.

This change is gated behind `cfg!(target_pointer_width = "32")` and has zero effect on 64-bit platforms.

Tested on:
- Apple Watch SE2 (arm64_32, S6/A13, watchOS 11.6)
- Apple Watch Series 10 (arm64, S9/A15, watchOS 26.4): no regression
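A minimal sketch of the compile-time gating pattern described above. `StorageMode` and `texture_storage_mode` are hypothetical local stand-ins, not wgpu-hal or `metal` crate APIs, and the sketch assumes the 64-bit path keeps `Private` storage:

```rust
/// Hypothetical local mirror of the two `MTLStorageMode` variants involved.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageMode {
    Shared,
    Private,
}

/// Picks the storage mode for render textures.
/// `cfg!` expands to a compile-time `true`/`false`, so on 64-bit targets
/// the `Shared` branch is dead code and behavior is unchanged.
pub fn texture_storage_mode() -> StorageMode {
    if cfg!(target_pointer_width = "32") {
        // arm64_32 (watchOS ILP32): avoid Private textures, which crash the
        // AGXMetalS4 driver in copyFromTexture:toBuffer:.
        StorageMode::Shared
    } else {
        // Assumed existing default on 64-bit targets.
        StorageMode::Private
    }
}

fn main() {
    println!("texture storage mode: {:?}", texture_storage_mode());
}
```

Using `cfg!` rather than `#[cfg]` keeps both branches type-checked on every target, which helps catch breakage on the 32-bit path even when CI only builds 64-bit.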
After the Shared texture storage mode fix, the AGXMetalS4 driver (A13/S6 GPU on watchOS arm64_32) still exhibits instability when `MTLMutability` hints are set on pipeline buffer descriptors. Conservatively disable `supports_mutability` on 32-bit targets; it can be re-enabled per device once broader watchOS test coverage confirms stability. Gated behind `cfg!(target_pointer_width = "32")`, so there is no effect on 64-bit platforms.
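A minimal sketch of how such a capability flag can be forced off on 32-bit targets. Only the `supports_mutability` name comes from the change description; the `Capabilities` struct and the `os_supports_mutability` input are hypothetical stand-ins for whatever detection the backend already does:

```rust
/// Hypothetical subset of a backend capability struct.
#[derive(Debug, Clone, Copy)]
pub struct Capabilities {
    pub supports_mutability: bool,
}

impl Capabilities {
    /// `os_supports_mutability` stands in for the existing OS/GPU-family
    /// check (assumed; not shown in this PR).
    pub fn new(os_supports_mutability: bool) -> Self {
        Capabilities {
            // On 32-bit (arm64_32) targets, never emit MTLMutability hints;
            // 64-bit targets keep the existing detection result.
            supports_mutability: os_supports_mutability
                && !cfg!(target_pointer_width = "32"),
        }
    }
}

fn main() {
    let caps = Capabilities::new(true);
    println!("supports_mutability = {}", caps.supports_mutability);
}
```

Re-enabling per device, as the commit suggests, would mean replacing the `cfg!` term with a runtime device check once it is known which devices are stable.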
`CGFloat` is `f64` on LP64 but `f32` on ILP32 (arm64_32, used by watchOS). `CGSize::new()` expects `CGFloat`, so use `as _` to let the compiler infer the correct type instead of hardcoding `as f64`.
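A self-contained sketch of why `as _` compiles on both ABIs. `CGFloat`, `CGSize`, and `drawable_size` here are hypothetical local stand-ins that mirror the pointer-width-dependent definition described above, not the real core-graphics crate types:

```rust
// CGFloat follows the target's pointer width, as on Apple platforms.
#[cfg(target_pointer_width = "64")]
pub type CGFloat = f64;
#[cfg(target_pointer_width = "32")]
pub type CGFloat = f32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CGSize {
    pub width: CGFloat,
    pub height: CGFloat,
}

impl CGSize {
    pub fn new(width: CGFloat, height: CGFloat) -> Self {
        CGSize { width, height }
    }
}

/// Converts a surface extent (hypothetical helper) into a CGSize.
/// `as _` lets the compiler infer the cast target (f64 or f32) from
/// `CGSize::new`'s parameter types; a hardcoded `as f64` would be a
/// type mismatch on ILP32, where CGFloat is f32.
pub fn drawable_size(width: u32, height: u32) -> CGSize {
    CGSize::new(width as _, height as _)
}

fn main() {
    let size = drawable_size(800, 600);
    println!("{} x {}", size.width, size.height);
}
```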
The CI failure looks unrelated:
Caused by:
Caused by:
Related article, since I was interested in this: At the end, it mentions that GCC later deprecated this. Also, it looks like this is an architecture used exclusively for Apple Watches, where the registers are 64-bit but the pointers are 32-bit. I also think newer Apple Watches moved away from it, since there is an aarch64 watchOS target triple, and that target is tier 2 whereas arm64_32 is tier 3.
I can understand why: the Apple ecosystem almost exclusively uses LLVM/clang/Swift, so it makes sense for GCC to drop the maintenance overhead.
Yes, Apple Watch 9 (Ultra 2, SE 3) and newer are now pure arm64, while Apple Watch 6/7/8/SE2 use the arm64_32 ABI and are still supported in watchOS 26. As I mentioned in the issue text, I was testing on Apple Watch SE2 (arm64_32 / A12) and Apple Watch 10 (arm64 / A15). That's still ~100 million devices. Also, some of the patches I've supplied address issues that should affect any 32-bit Apple platform. I personally want my WebGPU-oriented app to reach as many of the ~170M active devices as possible, including users with hand-me-down devices. FWIW, the A12-derived GPU (and its Metal support) in the arm64_32 watches running watchOS 26 is surprisingly capable. Happy to include a demo video if that's helpful.
Oh, I'm not at all trying to argue that this is bad. Just throwing this out there. And yeah, I figured people were using clang anyway.
inner-daemons left a comment:
LGTM, nothing too controversial here.
```rust
// On arm64_32 (watchOS ILP32), the AGXMetalS4 driver (A13/S6 GPU)
// crashes in copyFromTexture:toBuffer: on Private textures (null
// deref at offset 0x50 in the driver's internal texture state). Use
// Shared storage, which works correctly on Apple's unified memory
// architecture and matches what native Swift Metal code uses on
// these devices.
MTLStorageMode::Shared
```
This is an interesting bug. Is it documented anywhere?
Not that I'm aware of. There are a couple of reasons the amazing capabilities of these Apple Watch devices remain obscure:
- the feature/capability flags will say a feature isn't available, but the MSL will actually just work. Before I ported wgpu, I tested Metal exhaustively to figure out what actually works. This was kicked off by seeing the Memoji app on my child's Apple Watch 9 and realizing the functionality it implied.
- if you try some of these MSL features in the simulator, you'll get a hard abort or they just won't work. I'm guessing most people (very reasonably) give up.
- some features, like ASTC HDR textures, work intermittently (sometimes rendering magenta) on Apple Watch 6/SE2 but work fine on Apple Watch 9 and later. I haven't pinned down why, but again I'm assuming this would ward off most mildly curious app/3D developers.
This is probably a good conference/meetup talk to commoditize this hard-won knowledge, happy to apply if anyone has suggestions on a good venue!
Connections: #9406
Description
Three fixes to make the Metal backend work on arm64_32 (Apple Watch watchOS, ILP32 ABI where `sizeof(long) == sizeof(void*) == 4`).

1. Use `MTLStorageMode::Shared` for textures on arm64_32

The AGXMetalS4 driver (A13/S6 GPU) crashes with `KERN_INVALID_ADDRESS` at 0x50 during `copyFromTexture:toBuffer:` on `MTLStorageMode::Private` textures on ILP32. The native Swift Metal implementation that works on the same hardware uses Shared storage for render textures. Apple's unified memory architecture makes Shared equally performant for GPU access.

Gated behind `cfg!(target_pointer_width = "32")`, so there is zero effect on 64-bit.

2. Disable buffer mutability hints on arm64_32

The AGXMetalS4 driver exhibits instability when `MTLMutability` hints are combined with Shared storage mode. Conservatively disabled on 32-bit targets; can be re-enabled per device with broader test coverage.

3. Fix `CGFloat` type in `surface.rs`

`CGFloat` is `f64` on LP64 but `f32` on ILP32. `CGSize::new()` was hardcoded to `as f64`, which fails to compile on arm64_32. Changed to `as _` to infer the correct type.

Testing
Validated on physical hardware with wgpu-native v29: