
ArrayBuffer Lazy-assignment for GPU Context #5076

Draft
SamSJackson wants to merge 24 commits into connorjward/pyop3 from SamSJackson/pyop3-outer

Conversation

@SamSJackson

Description

This PR introduces a context manager that allows dynamic assignment of PyOP3 arrays to a given device.
Devices are defined in the new device.py module, which also contains the internal array management.

Key modifications:

  • Introduce a device.py module that represents offloading devices and provides an offloading context manager
  • Change buffer.py to lazily evaluate data with respect to the current context (i.e. host or offloading device)
    • _lazy_data is now a dictionary mapping Device objects to their respective arrays.
    • The _data property lazily evaluates to the appropriate data for the current context. If that data is not up to date, as per the state property, it is copied.
  • All data is copied lazily: entering and exiting the context window will not automatically transfer data between devices.
  • Buffers are maintained between context windows: exiting a context window will not release the memory on the device.
  • All device-specific array management is kept within device.py so buffer.py can remain device/GPU-agnostic, apart from some type hinting (e.g. cp.ndarray).
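To make the design above concrete, here is a minimal, runnable sketch of a lazy per-device buffer. The names Device, offloading, ArrayBuffer, _lazy_data, and state mirror the PR description, but their bodies are illustrative assumptions rather than the actual pyop3 code, and NumPy stands in for both host and device arrays (no CuPy required):

```python
import numpy as np


class Device:
    """An offloading target (e.g. the host CPU or a GPU)."""
    def __init__(self, name):
        self.name = name


HOST = Device("host")
_current_device = HOST


class offloading:
    """Context manager that switches the current device (sketch)."""
    def __init__(self, device):
        self.device = device

    def __enter__(self):
        global _current_device
        self._previous = _current_device
        _current_device = self.device
        return self.device

    def __exit__(self, *exc):
        global _current_device
        _current_device = self._previous


class ArrayBuffer:
    """Buffer whose data is materialised lazily on the current device."""
    def __init__(self, data):
        # _lazy_data maps Device objects to their per-device arrays
        self._lazy_data = {HOST: np.asarray(data)}
        # state counters track which copy is most up to date
        self.state = {HOST: 0}

    @property
    def _data(self):
        device = _current_device
        if device not in self._lazy_data or self.state[device] < max(self.state.values()):
            # Copy lazily, on first access, from the most up-to-date device
            freshest = max(self.state, key=self.state.get)
            self._lazy_data[device] = self._lazy_data[freshest].copy()
            self.state[device] = self.state[freshest]
        return self._lazy_data[device]
```

Note that entering or exiting offloading transfers nothing by itself; the copy happens only when _data is first accessed inside the context, and the per-device entry in _lazy_data persists after the context exits.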

Notable issues:

  • CuPy does not support the writeable flag: the flag is discarded when converting from NumPy arrays, and is not restored when converting back.
  • A defaultdict is used for our state dictionary. Connor and I have previously discussed that neither of us likes this, but I cannot think of another approach.
    • Main issue: when an array is assigned, it has no knowledge of devices beyond the one in its current context, so no state counter is assigned for the others. This makes it difficult for the user to check the buffer on other devices, even if they are initialised. An example can be seen in ./pyop3_gpu_demo.py, in the asserts before entering the context manager.
    • Potential approaches:
      • A dictionary wrapper so we can create a stricter defaultdict (a bit extreme and needless maintenance, but it should work).
      • Do not allow users to check the state of an array on a device whose context it has not entered.
    • Open to any advice or solutions for this.
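As a sketch of the first potential approach above, a small dict subclass can behave like the registered-device part of a defaultdict while failing loudly for unknown devices. The class name StateDict and the device strings are hypothetical, chosen only for illustration:

```python
class StateDict(dict):
    """Per-device state counters: devices registered up front default to -1
    (not yet initialised); unknown devices raise instead of silently
    materialising a default entry."""

    def __init__(self, known_devices):
        # Every known device starts at state -1, matching the
        # defaultdict(lambda: -1) behaviour for registered devices
        super().__init__((d, -1) for d in known_devices)

    def __missing__(self, device):
        raise KeyError(
            f"Buffer has no state on {device!r}; it has not entered an "
            "offloading context for that device"
        )
```

This keeps the convenient "-1 means uninitialised" semantics where they are meaningful, while turning accidental lookups of never-seen devices into errors rather than fabricated state.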

@SamSJackson SamSJackson requested a review from connorjward May 4, 2026 10:38
Contributor

@connorjward left a comment


I've been strict, but in general this is fantastic. Thank you!

Comment thread pyop3/buffer.py Outdated
Comment thread pyop3/buffer.py Outdated
Comment thread pyop3/buffer.py Outdated
Comment thread pyop3/buffer.py
Comment thread pyop3/buffer.py Outdated
Comment thread pyop3/device.py Outdated
Comment thread pyop3/device.py Outdated
Comment thread pyop3/device.py Outdated
Comment thread pyop3/device.py Outdated
Comment thread pyop3/buffer.py Outdated
SamSJackson added 6 commits May 4, 2026 14:04
- defaultdict gives -1 as the default value if the device object does not exist
- the constant property was lost between cupy/numpy conversions
- fixed by passing a kwarg that is disregarded by cupy but used by numpy
- use @property for last_updated_device; it is known from state and does not
  need to be a variable
- the duplicate method only copies the most up-to-date copy, and a non-copy
  duplicate only copies for the current device
- initialisation now accepts None as an optional data input
- v3.24.5 was failing to compile petsc4py because PETSc does not support
  PCPatchSetComputeFunctionExteriorFacets
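The duplicate behaviour mentioned in the commits above can be sketched as a standalone helper. This is not the pyop3 API, only an illustration of the two modes under my reading of the commit messages: a copying duplicate keeps a deep copy of just the most up-to-date per-device array, while a non-copy duplicate shares just the current device's array:

```python
import numpy as np


def duplicate(lazy_data, state, copy=True, current_device="host"):
    """Illustrative sketch (not the pyop3 API) of duplicating a buffer.

    copy=True  -> keep only the most up-to-date array, deep-copied.
    copy=False -> keep only the current device's array, shared (no data copy).

    Returns the duplicated (lazy_data, state) pair.
    """
    if copy:
        # Find the device whose state counter is highest (freshest data)
        freshest = max(state, key=state.get)
        return (
            {freshest: lazy_data[freshest].copy()},
            {freshest: state[freshest]},
        )
    # Non-copy duplicate: share the current device's array without copying
    return (
        {current_device: lazy_data[current_device]},
        {current_device: state[current_device]},
    )
```

Dropping stale per-device copies on duplication avoids cloning arrays that would be overwritten by the next lazy copy anyway.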
@connorjward
Copy link
Copy Markdown
Contributor

For any trivial changes, please go ahead and resolve them. It saves me cross-checking things.
