VE2 drm scheduler enable by saifuddin-xilinx · Pull Request #1276 · amd/xdna-driver

saifuddin-xilinx · 2026-04-26T23:40:42Z

No description provided.

Signed-off-by: Saif <saifuddin.kaijar@amd.com>

Copilot

Pull request overview

This PR enables the DRM GPU scheduler for the VE2 (amdxdna) driver path by replacing the prior custom FIFO/workqueue-based scheduling with drm_gpu_scheduler + per-context drm_sched_entity usage to manage partition context switching and job execution.

Changes:

Add DRM scheduler state to VE2 hwctx (drm_sched_entity) and VE2 mgmtctx (drm_gpu_scheduler).
Implement VE2 DRM scheduler backend callbacks (run_job, free_job, timedout_job) and initialize/finalize the scheduler per partition.
Route VE2 command submission through drm_sched_job_init/arm + drm_sched_entity_push_job, and destroy entities on hwctx teardown.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
`src/driver/amdxdna/ve2_of.h`	Adds DRM scheduler fields to hwctx/mgmtctx structs and exports `ve2_job_put()`.
`src/driver/amdxdna/ve2_mgmt.c`	Adds DRM scheduler backend ops, scheduler init/fini, and removes the old FIFO/workqueue scheduler logic.
`src/driver/amdxdna/ve2_hwctx.c`	Integrates job submission with DRM scheduler jobs/entities and adds entity init/destroy in hwctx lifecycle.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-26T23:47:15Z

+                if (!mgmtctx)
+                        goto cleanup_xrs;
+


mgmtctx is computed as the address of an array element and can’t be NULL, so this check is dead code. It also jumps to cleanup_xrs without setting ret, which could mask the real error path. Remove the check (or replace it with a bounds/initialization check that can actually fail and sets an error code).

Suggested change

if (!mgmtctx)

goto cleanup_xrs;

Copilot · 2026-04-26T23:47:16Z

+	/* DRM scheduler components */
+	struct drm_sched_entity		entity;		/* Scheduling entity for this hwctx */
+	struct drm_gpu_scheduler	*sched;		/* Pointer to mgmtctx scheduler */
+	u64				seq;		/* Job sequence counter */


struct amdxdna_ctx_priv now embeds struct drm_sched_entity/struct drm_gpu_scheduler, but this header doesn’t include <drm/gpu_scheduler.h>. This can break compilation for translation units that include ve2_of.h without first including amdxdna_ctx.h (e.g., ve2_of.c). Include the required DRM scheduler header (or add local forward declarations and move the full types out of the header).

Copilot · 2026-04-26T23:47:16Z

+		ret = ve2_request_context_switch(xdna, mgmtctx);
+		if (ret) {
+			XDNA_ERR(xdna, "Context switch request failed: ret=%d", ret);
+			dma_fence_set_error(fence, ret);


On context-switch request failure, the fence error is set but the fence is never signaled. That leaves the DRM scheduler waiting until timeout even though submission already failed. Consider signaling the fence (and completing the sched job) immediately on unrecoverable errors so the scheduler can retire the job and propagate the error.

Suggested change

dma_fence_set_error(fence, ret);

dma_fence_set_error(fence, ret);

dma_fence_signal(fence);

Copilot · 2026-04-26T23:47:16Z

+	/* Notify firmware that command is ready */
+	ret = notify_fw_cmd_ready(hwctx);
+	if (ret < 0)
+		XDNA_ERR(xdna, "notify_fw_cmd_ready failed: ret=%d", ret);
+
+unlock:
+	mutex_unlock(&mgmtctx->ctx_lock);
+	return fence;


notify_fw_cmd_ready() failure is logged but the job is still treated as successfully submitted (a fence is returned with no error/signaling). If notifying firmware fails, the scheduler will likely hang until timeout. Set the fence error and complete the job immediately (or return an error fence) so the scheduler doesn’t block indefinitely.

Copilot · 2026-04-26T23:47:16Z

+		u64 read_idx = 0, write_idx = 0, reserved_wr = 0;
+
+		/* Read the actual values from the queue header */
+		read_idx = queue->hsa_queue_p->hq_header.read_index;
+		write_idx = queue->hsa_queue_p->hq_header.write_index;
+		reserved_wr = queue->reserved_write_index;


read_idx, write_idx, and reserved_wr are assigned but never used. This will trigger -Wunused-but-set-variable warnings in many builds. Either log these values (since the comment says they’re for inspection) or remove the variables/reads.

Suggested change

u64 read_idx = 0, write_idx = 0, reserved_wr = 0;

/* Read the actual values from the queue header */

read_idx = queue->hsa_queue_p->hq_header.read_index;

write_idx = queue->hsa_queue_p->hq_header.write_index;

reserved_wr = queue->reserved_write_index;

u64 read_idx = queue->hsa_queue_p->hq_header.read_index;

u64 write_idx = queue->hsa_queue_p->hq_header.write_index;

u64 reserved_wr = queue->reserved_write_index;

XDNA_ERR(xdna,

"[TS] !!!TIMEOUT!!! HSA queue=%p read_idx=%llu write_idx=%llu reserved_wr=%llu",

queue->hsa_queue_p,

(unsigned long long)read_idx,

(unsigned long long)write_idx,

(unsigned long long)reserved_wr);

Copilot · 2026-04-26T23:47:17Z

 	// Reset the pending list
 	priv_ctx->pending[get_job_idx(job->seq)] = NULL;
-	ve2_job_put(job);
+
+	/* Signal DRM scheduler fence */
+	if (job->fence)
+		dma_fence_signal(job->fence);
+


The job completion fence is being signaled here, but this helper is only called from the synchronous wait path (ve2_cmd_wait) and cleanup paths—not from the IRQ handler. With DRM scheduler enabled, this means jobs may never complete from the scheduler’s perspective unless userspace calls cmd_wait, leading to scheduler stalls/timeouts. Fence signaling (and any job-done notification required by drm_sched) should happen from the actual completion event path (IRQ / polling work) independent of userspace waiting.

Copilot · 2026-04-26T23:47:17Z

+	/* DRM scheduler - initialize and push job */
+	ret = drm_sched_job_init(&job->base, &hwctx->priv->entity, 1, NULL);
+	if (ret) {
+		return ret;
+	}


drm_sched_job_init() is called after the command has already been written/committed to the HSA queue and the job is added to the pending list. If drm_sched_job_init() fails, this function returns an error even though the job may still execute in hardware, and the driver has no rollback/cleanup. Initialize/arm the sched job before making the hardware-visible submission (or add a rollback path that invalidates the queued packet and removes the pending entry).

Copilot · 2026-04-26T23:47:17Z

@@ -1411,19 +1429,15 @@ int ve2_hwctx_init(struct amdxdna_ctx *hwctx)

 	hwctx->priv = priv;

-	/* Allocate unique ID using XArray (thread-safe, supports ID recycling) */
-	{
-		u32 hwctx_id;
-
-		ret = xa_alloc_cyclic(&xdna->dev_handle->hwctx_ids, &hwctx_id, priv,
-				      XA_LIMIT(1, U32_MAX),
-				      &xdna->dev_handle->next_hwctx_id, GFP_KERNEL);
-		if (ret < 0) {
-			XDNA_ERR(xdna, "Failed to allocate hwctx ID, ret=%d", ret);
-			goto cleanup_priv;
-		}
-		priv->id = hwctx_id;
-	}
+        /* Allocate unique ID using XArray (thread-safe, supports ID recycling) */
+        ret = xa_alloc_cyclic(&xdna->dev_handle->hwctx_ids, &hwctx_id, priv,
+                        XA_LIMIT(1, U32_MAX),
+                        &xdna->dev_handle->next_hwctx_id, GFP_KERNEL);
+        if (ret < 0) {
+                XDNA_ERR(xdna, "Failed to allocate hwctx ID, ret=%d", ret);
+                goto cleanup_priv;
+        }
+        priv->id = hwctx_id;


Kernel style: these new lines use spaces instead of tabs and have inconsistent indentation, which will fail checkpatch.pl and makes the block hard to read. Please align indentation with surrounding code (tabs, consistent continuation indentation).

drm scheduler is working

95f3054

Signed-off-by: Saif <saifuddin.kaijar@amd.com>

Copilot AI review requested due to automatic review settings April 26, 2026 23:40

Copilot started reviewing on behalf of saifuddin-xilinx April 26, 2026 23:42 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VE2 drm scheduler enable#1276

VE2 drm scheduler enable#1276
saifuddin-xilinx wants to merge 1 commit intoamd:mainfrom
saifuddin-xilinx:ve2_drm_sched

saifuddin-xilinx commented Apr 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	dma_fence_set_error(fence, ret);
	dma_fence_set_error(fence, ret);
	dma_fence_signal(fence);

-		u64 read_idx = 0, write_idx = 0, reserved_wr = 0;
-		/* Read the actual values from the queue header */
-		read_idx = queue->hsa_queue_p->hq_header.read_index;
-		write_idx = queue->hsa_queue_p->hq_header.write_index;
-		reserved_wr = queue->reserved_write_index;
+		u64 read_idx = queue->hsa_queue_p->hq_header.read_index;
+		u64 write_idx = queue->hsa_queue_p->hq_header.write_index;
+		u64 reserved_wr = queue->reserved_write_index;
+		XDNA_ERR(xdna,
+			 "[TS] !!!TIMEOUT!!! HSA queue=%p read_idx=%llu write_idx=%llu reserved_wr=%llu",
+			 queue->hsa_queue_p,
+			 (unsigned long long)read_idx,
+			 (unsigned long long)write_idx,
+			 (unsigned long long)reserved_wr);

Conversation

saifuddin-xilinx commented Apr 26, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants