[Feature][QDP] Add MPI-ready distributed amplitude execution scaffolding #1296

Open

viiccwen wants to merge 8 commits into apache:main from viiccwen:feature/qdp-multigpu-plan

Conversation

Contributor

viiccwen commented May 4, 2026

Related Issues

Closes #1295

Changes

  • add distributed amplitude planning, layout, runtime, and state scaffolding for QDP multi-GPU execution
  • add DistributedExecutionContext so distributed execution is driven by a bundled mesh-plus-collectives object rather than ad hoc device and collective parameters
  • add a CollectiveCommunicator seam and an in-process LocalCollectiveCommunicator implementation for the current single-process path (see the communicator sketch after this list)
  • ensure distributed execution resolves CUDA handles from planned device_id values so shard metadata, device handles, and active device context stay aligned
  • add a distributed q34 probe plus runtime / planner / topology / communicator coverage for reordered execution paths
  • update CUDA arch targeting so the probe builds on the GPUs available on this host
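
For concreteness, a minimal sketch of what the communicator seam could look like. The CollectiveCommunicator and LocalCollectiveCommunicator names come from this PR; the all_reduce_sum signature and the slice-of-partials shape are illustrative assumptions, not the actual QDP API:

```rust
/// The seam a future MPI-backed collective would slot into
/// (hypothetical signature; the real trait may differ).
pub trait CollectiveCommunicator {
    /// Combine one partial value per shard (e.g. local squared-norm
    /// contributions) into the global sum every shard observes.
    fn all_reduce_sum(&self, shard_partials: &[f64]) -> f64;
}

/// Single-process implementation: every shard lives in this process,
/// so the "collective" degenerates to a plain in-memory sum.
pub struct LocalCollectiveCommunicator;

impl CollectiveCommunicator for LocalCollectiveCommunicator {
    fn all_reduce_sum(&self, shard_partials: &[f64]) -> f64 {
        shard_partials.iter().sum()
    }
}

fn main() {
    // Squared-norm contributions from two device shards.
    let comm = LocalCollectiveCommunicator;
    let global = comm.all_reduce_sum(&[0.49, 0.51]);
    assert!((global - 1.0).abs() < 1e-12);
    println!("global squared norm: {global}");
}
```

The point of the seam is that an MPI-backed implementation can later replace the local one without touching the runtime that calls it.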

Why

  • establish a QDP-native distributed amplitude foundation that can scale beyond a single GPU without depending on Lightning-specific assumptions
  • make the current single-process path extensible toward future MPI-backed collectives
  • verify that the implementation can successfully materialize and encode a 34-qubit float32 distributed state on this host

How

  • separate distributed planning, execution context, runtime, and state responsibilities
  • drive shard execution from planned placement metadata instead of raw mesh iteration order (see the placement sketch after this list)
  • keep the current collectives local and in-process, while shaping the execution boundary for future MPI-backed implementations
  • validate the path with distributed tests and the distributed_multigpu_q34_probe example
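
As mentioned in the placement bullet above, here is an illustrative sketch of plan-driven sharding. PlacementPlanner and device_id appear in this PR; the ShardPlacement struct, its field names, and the even-split policy are assumptions invented for this example:

```rust
/// Hypothetical shape of a placement plan entry: the planner assigns each
/// shard an amplitude range and a concrete device, and execution follows
/// this metadata rather than whatever order the mesh enumerates devices in.
#[derive(Debug, Clone)]
struct ShardPlacement {
    shard_index: usize,
    device_id: usize,                // planned CUDA device for this shard
    amp_range: std::ops::Range<u64>, // half-open range of global amplitude indices
}

/// Evenly split the 2^n amplitude index space across the planned devices
/// (assumes the device count divides the amplitude space, as in a
/// two-GPU q34 run).
fn plan_equal_shards(num_qubits: u32, device_ids: &[usize]) -> Vec<ShardPlacement> {
    let total: u64 = 1 << num_qubits;
    let per_shard = total / device_ids.len() as u64;
    device_ids
        .iter()
        .enumerate()
        .map(|(i, &device_id)| ShardPlacement {
            shard_index: i,
            device_id,
            amp_range: (i as u64 * per_shard)..((i as u64 + 1) * per_shard),
        })
        .collect()
}

fn main() {
    // Deliberately reordered device list: execution must still follow the
    // planned device_id, not positional mesh order, so shard metadata,
    // device handles, and active device context stay aligned.
    let plan = plan_equal_shards(4, &[1, 0]);
    for shard in &plan {
        // The runtime would resolve the CUDA handle for shard.device_id here.
        println!(
            "shard {} -> device {} owns amplitudes {:?}",
            shard.shard_index, shard.device_id, shard.amp_range
        );
    }
}
```
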
```mermaid
sequenceDiagram
    participant Caller as QdpEngine caller
    participant Engine as QdpEngine
    participant Mesh as DeviceMesh
    participant Planner as PlacementPlanner
    participant Ctx as DistributedExecutionContext
    participant Runtime as distributed runtime
    participant Comm as LocalCollectiveCommunicator

    Caller->>Engine: encode distributed amplitude request
    Engine->>Engine: validate input and resolve request
    Engine->>Mesh: build distributed mesh
    Engine->>Ctx: construct execution context
    Engine->>Planner: build placement plan
    Planner-->>Engine: placement + shard ranges
    Engine->>Runtime: execute distributed encode
    Runtime->>Runtime: bind planned device handles
    Runtime->>Comm: reduce local norm contributions
    Comm-->>Runtime: global norm
    Runtime-->>Engine: DistributedStateVector
    Engine-->>Caller: sharded distributed state
```
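
To tie the diagram's reduction steps to code: a schematic of what "reduce local norm contributions → global norm" could mean, with Vec<f32> standing in for GPU shard buffers and a closure standing in for the communicator. None of this is the runtime's actual code:

```rust
/// Schematic of the normalization step in the diagram: each shard
/// contributes a partial squared norm, the collective produces the
/// global norm, and every shard is scaled by the same factor.
fn normalize_shards(shards: &mut [Vec<f32>], all_reduce_sum: impl Fn(&[f64]) -> f64) {
    let partials: Vec<f64> = shards
        .iter()
        .map(|s| s.iter().map(|&a| (a as f64).powi(2)).sum())
        .collect();
    // In-process today; an MPI all-reduce would slot in here unchanged.
    let global_norm = all_reduce_sum(&partials).sqrt();
    for shard in shards.iter_mut() {
        for amp in shard.iter_mut() {
            *amp = (*amp as f64 / global_norm) as f32;
        }
    }
}

fn main() {
    // Two shards of an un-normalized 4-amplitude state split across devices.
    let mut shards = vec![vec![1.0_f32, 1.0], vec![1.0, 1.0]];
    normalize_shards(&mut shards, |xs| xs.iter().sum());
    assert!((shards[0][0] - 0.5).abs() < 1e-6); // 1 / sqrt(4)
    println!("{shards:?}");
}
```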

Member

400Ping commented May 4, 2026

Nice one, will probably take a look on Thursday.

Member

ryankert01 left a comment

Hi, nice initiative. Curious whether this can be plugged into our current lightning.gpu (PennyLane) workflow?

mahout.qdp -> lightning.gpu (zero copy)

viiccwen force-pushed the feature/qdp-multigpu-plan branch from 4145efa to baa6a90 on May 5, 2026, 12:25
viiccwen changed the title from "Feat(GPU): add single-node distributed amplitude scaffolding" to "[Feature][QDP] Add MPI-ready distributed amplitude execution scaffolding" on May 5, 2026
Contributor Author

viiccwen commented May 6, 2026

@ryankert01, after some research, I think the main gap is at the lightning.gpu integration boundary.

From what I can tell, the missing piece is not “can Mahout produce a GPU-resident state?” but “can lightning.gpu ingest an external GPU state buffer through a stable public interface without copying?”.

So my current estimate would be:

  1. expose or formalize an external-state ingest path on the PennyLane / lightning.gpu side
  2. add a Mahout-side bridge/adapter for mahout.qdp -> lightning.gpu
  3. add end-to-end tests and examples for the workflow

So yes: for a proper zero-copy integration, I would expect roughly 3 PRs for a narrow MVP.
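
Purely to illustrate step 2, a hypothetical Mahout-side export view a bridge could build on. ExternalStateView and all of its fields are invented for this sketch; nothing here is an existing Mahout or lightning.gpu interface, and the ingest path on the lightning.gpu side (step 1) is exactly the piece that doesn't exist yet:

```rust
/// Hypothetical export view for a zero-copy bridge (illustration only).
/// The idea: Mahout keeps ownership of the device buffer and exposes just
/// enough metadata for a consumer to wrap it without copying, e.g. through
/// a DLPack-style capsule at the Python binding layer.
#[derive(Debug, Clone, Copy)]
pub enum Dtype {
    Complex64,  // interleaved float32 re/im, as in the q34 probe state
    Complex128, // interleaved float64 re/im
}

#[derive(Debug)]
pub struct ExternalStateView {
    pub device_ptr: usize,   // raw CUDA device pointer, borrowed not owned
    pub num_amplitudes: u64, // shard length in amplitudes, not bytes
    pub dtype: Dtype,
    pub device_id: usize,    // CUDA device the shard buffer lives on
}

fn main() {
    // Dummy values; a real bridge would read these from a live
    // DistributedStateVector shard on the target device.
    let view = ExternalStateView {
        device_ptr: 0,
        num_amplitudes: 1u64 << 33, // one of two shards of a 34-qubit state
        dtype: Dtype::Complex64,
        device_id: 0,
    };
    println!("{view:?}");
}
```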

Member

ryankert01 commented May 7, 2026

@viiccwen Since it's 3 PRs away and fairly large, can we postpone it to the next release? It will be more mature by then, and we can work out the details.

Contributor Author

viiccwen commented May 7, 2026

> @viiccwen Since it's 3 PRs away and fairly large, can we postpone it to the next release? It will be more mature by then, and we can work out the details.

Sure, that's fine.

