Skip to content

os/kernel/task: Fix for multiple exit scenario via task_exithook()#7288

Open
rish-sg wants to merge 1 commit intoSamsung:masterfrom
rish-sg:task_manager_stop_issue
Open

os/kernel/task: Fix for multiple exit scenario via task_exithook()#7288
rish-sg wants to merge 1 commit intoSamsung:masterfrom
rish-sg:task_manager_stop_issue

Conversation

@rish-sg
Copy link
Copy Markdown
Contributor

@rish-sg rish-sg commented Apr 17, 2026

Description:
This patch prevents multiple exit of task via task_exithook(), in SMP case.


AS-IS:
TCB_FLAG_EXIT_PROCESSING is bit set in tcb->flag to prevent multiple exit. But the checker and setter code are not protected and are being called at different places.
Checker: at the start of the function to return early if the flag is set. Setter: flag is set at the end of task_exithook(). This difference in exection of checker and setter opens a window of race where one process sets it, but another process has already passed the checker point.
In this race, both the processes may attempt to kill the same task, which leads to a multiple-exit scenario.


LOGS:
log_csi_config: enable: 0
wifi_csi_mq_close: MQ cltask_sigchild: task: csifw_task_Main, tg_nchildren = 1 task_sigchild: task: task_manager, tg_nchildren = 0

security level: 0
=========================================================== Assertion details
=========================================================== Assertion failed CPU0 at file: task/task_exithook.c line 339 task: task_manager pid: 17 print_assert_detail: Assert location (PC) : 0x0e024eff check_assert_location: Code asserted in normal thread! =========================================================== Asserted task's stack details
=========================================================== check_sp_corruption: Current SP is User Thread SP: 60154490


Fix: We move the setter code to the beginning of task_exithook() (just after the checker) and protect them within critical section to ensure atomicity.


Comment thread os/kernel/task/task_exithook.c Outdated
* that bit is set, then just exit doing nothing more..
*/

#ifdef CONFIG_SMP
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SMP conditional is necessary? What about context switching here in non smp?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got you. Need to make sure pre-emption is disabled in non-SMP case too. Condition removed.

@seokhun-eom24
Copy link
Copy Markdown
Contributor

seokhun-eom24 commented Apr 17, 2026

Automated nightly Codex PR review
PR: #7288 — os/kernel/task: Fix for multiple exit scenario via task_exithook()
PR URL: #7288
GitHub updatedAt: 2026-04-24T00:58:36Z
Refreshed (UTC): 2026-04-24 17:36:31 UTC
Refreshed (KST): 2026-04-25 02:36:31 KST
Comment model: single evolving Codex-authored PR comment

PR #7288 — os/kernel/task: Fix for multiple exit scenario via task_exithook()

This review done by codex. AI reviews can be inaccurate.
UTC: 2026-04-24 17:34 UTC
KST (UTC+9): 2026-04-25 02:34 KST


Repository: Samsung/TizenRT

Base → Head: masterpr/7288

HEAD Commit: 871eb2a627c86669d0e564bfb9ad4d9cc3b7be14

Scope: Review of the task_exithook() / unloaded-task exit-path changes in os/kernel/task/task_exithook.c and os/kernel/task/task_terminate_unloaded.c, with focus on SMP re-entry handling and RTOS teardown safety.


🔎 Review Summary

Category Status
Build / Compile Status ⚠️ Not run
Functional Correctness ❌ Blocking issues found
Validation Coverage ⚠️ Static review only

Final Verdict: ❗ Request Changes


🚨 Must-Fix Issues

1. The new guard still does not atomically check and set TCB_FLAG_EXIT_PROCESSING
Item Details
Location os/kernel/task/task_exithook.c:558
Severity High
Type Functional correctness

Problem
The patch moves the early check into a critical section, but it still releases that critical section before the flag is set. The first guard runs at task_exithook.c:558-566, while the setter is still deferred until task_exithook.c:616-623 after dbg_save_termination_info(), optional task_atexit() / task_onexit(), and task_recover(). That leaves the same race window the PR description is trying to close: a second CPU can pass the early guard before the first CPU sets TCB_FLAG_EXIT_PROCESSING.

Impact

  • Two concurrent exit paths can still both execute the teardown body for the same TCB.
  • Duplicate execution here is exactly the class of failure the PR is trying to prevent: repeated task_signalparent(), task_exitwakeup(), group_leave(), and sig_cleanup() on one task.
  • Because task_terminate.c:230-236 already documents that task_exithook() may be called twice, leaving this race open means the asserted double-exit path is still credible after the patch.

Evidence

  • os/kernel/task/task_exithook.c:558 — enters the critical section only to test the flag.
  • os/kernel/task/task_exithook.c:566 — drops the critical section before any state is changed.
  • os/kernel/task/task_exithook.c:623 — sets TCB_FLAG_EXIT_PROCESSING much later, after non-trivial cleanup work has already run.
  • os/kernel/task/task_terminate.c:236task_exithook() is a shared termination path that can be reached more than once for the same task.

Required Fix
Files to edit:

  • os/kernel/task/task_exithook.ctask_exithook

Change outline:

  • Keep the first enter_critical_section() / leave_critical_section() pair.
  • Perform the TCB_FLAG_EXIT_PROCESSING test and set in that same critical section before returning to normal execution.
  • Keep the later critical section only for the smaller region that truly needs it, instead of using it as the place where the flag is first published.

Example patch:

diff --git a/os/kernel/task/task_exithook.c b/os/kernel/task/task_exithook.c
@@
 	flags = enter_critical_section();
 	if ((tcb->flags & TCB_FLAG_EXIT_PROCESSING) != 0) {
 		leave_critical_section(flags);
 		return;
 	}
+	tcb->flags |= TCB_FLAG_EXIT_PROCESSING;
 	leave_critical_section(flags);
@@
-	flags = enter_critical_section();
-	tcb->flags |= TCB_FLAG_EXIT_PROCESSING;
+	flags = enter_critical_section();

Inference:

  • The exact placement of the later signal-handling critical section may still need minor adjustment after runtime validation, but the flag itself needs to be published in the first guarded block, not at line 623.

Validation Method

  • Reproduce the original SMP stress case that drives concurrent exit/terminate handling for the same task.
  • Add temporary tracing around the early-return path and the main cleanup body, then confirm only one CPU reaches the teardown body once the flag is set early.
  • Verify the original assert does not reappear under repeated SMP stress.
2. The patch now holds the exit path inside enter_critical_section() across heavyweight teardown on all builds
Item Details
Location os/kernel/task/task_exithook.c:616
Severity High
Type Stability / Safety

Problem
The new enter_critical_section() at task_exithook.c:616 is unconditional and is not released until task_exithook.c:667. In this tree, enter_critical_section() maps to irqsave() when CONFIG_IRQCOUNT is off (os/include/tinyara/irq.h:142-145), so this change widens the IRQ-disabled region on non-SMP builds too. The widened region now includes task_flushstreams(), binary_manager_remove_binlist(), group_leave(), and sig_cleanup(). group_leave() explicitly documents that no special precautions are required (os/kernel/group/group_leave.c:356-358) and can cascade into group_release() and binfmt_exit() (os/kernel/group/group_leave.c:404-412), which is far heavier than the original SMP-only signal/wakeup protection.

Impact

  • Exit now runs with interrupts disabled across stream flushing and task-group release paths that were previously outside the critical section on non-SMP builds.
  • This increases latency and watchdog risk in a hot kernel teardown path.
  • It also changes locking expectations around teardown helpers that were not designed to run inside a global IRQ-disabled window, creating new regression risk unrelated to the original race.

Evidence

  • os/kernel/task/task_exithook.c:616 — unconditional enter_critical_section().
  • os/kernel/task/task_exithook.c:645 / 650 / 658 / 664 — stream flushing, binary-list removal, group teardown, and signal cleanup all now run before interrupts are restored.
  • os/include/tinyara/irq.h:144 — on common builds this is just irqsave(), not an SMP-only helper.
  • os/kernel/group/group_leave.c:404 and :412 — the same IRQ-disabled region can reach group_release() and binfmt_exit().

Required Fix
Files to edit:

  • os/kernel/task/task_exithook.ctask_exithook

Change outline:

  • Restore the earlier behavior where the wider teardown path is not covered by a global critical section on non-SMP builds.
  • Keep only the minimum IRQ-off region needed for the atomic exit-processing flag update and the specific signal/wakeup state that truly requires critical protection.
  • Move task_flushstreams(), binary_manager_remove_binlist(), and group_leave() back outside the IRQ-disabled region.

Example patch:

diff --git a/os/kernel/task/task_exithook.c b/os/kernel/task/task_exithook.c
@@
-	flags = enter_critical_section();
-	task_signalparent(tcb, status);
-	task_exitwakeup(tcb, status);
-	if (!nonblocking) {
-		task_flushstreams(tcb);
-	}
-#if defined(CONFIG_BINARY_MANAGER) && defined(CONFIG_APP_BINARY_SEPARATION)
-	binary_manager_remove_binlist(tcb);
-#endif
-#ifdef HAVE_TASK_GROUP
-	group_leave(tcb);
-#endif
-#ifndef CONFIG_DISABLE_SIGNALS
-	sig_cleanup(tcb);
-#endif
-	leave_critical_section(flags);
+	flags = enter_critical_section();
+	task_signalparent(tcb, status);
+	task_exitwakeup(tcb, status);
+#ifndef CONFIG_DISABLE_SIGNALS
+	sig_cleanup(tcb);
+#endif
+	leave_critical_section(flags);
+
+	if (!nonblocking) {
+		task_flushstreams(tcb);
+	}
+#if defined(CONFIG_BINARY_MANAGER) && defined(CONFIG_APP_BINARY_SEPARATION)
+	binary_manager_remove_binlist(tcb);
+#endif
+#ifdef HAVE_TASK_GROUP
+	group_leave(tcb);
+#endif

Inference:

  • The exact minimum protected set may need a quick re-check against the signal/waitpid contract, but the current all-the-way-to-group_leave() region is too broad for the problem being solved.

Validation Method

  • Build at least one UP configuration and one SMP configuration after narrowing the critical section.
  • Re-run the reported exit stress scenario and confirm both the original double-exit assert and any new watchdog / latency regressions stay absent.
  • Check that exit-path logging shows interrupts are restored before group_leave() / binary teardown runs.

🟡 Nice-to-Have Improvements

1. Add a targeted regression test or stress harness for the duplicate-exit race
Item Details
Location os/kernel/task/task_exithook.c:549
Severity Medium
Type Validation

Problem
This PR fixes a timing-sensitive SMP race, but the change does not add any deterministic regression coverage for the failure it describes.

Impact

  • Future edits can reopen the same window without an obvious functional diff.
  • Static review can catch obvious ordering mistakes, but it is weak against scheduler-sensitive regressions.

Recommended Action

  • Add an SMP-focused stress test or kernel self-test that repeatedly forces concurrent exit/terminate handling for the same task.
  • Capture a simple pass/fail signal such as "cleanup body entered once" or "no duplicate SIGCHLD / waitpid wakeup observed".

Expected Output

  • A repeatable stress artifact or test log showing the race scenario can be exercised many times without the original assert or duplicate teardown symptoms.

✅ Notable Improvements

Notable Improvements

✔ The PR is addressing a real kernel-level race instead of a cosmetic issue

  • The problem statement matches the code structure in the merge base: the old check was at the start of task_exithook() and the flag was only set at the very end.
  • Focusing on the TCB_FLAG_EXIT_PROCESSING contract is the right place to harden this path.

✔ The unloaded-binary termination path is now at least aligned with the same exit-processing flag

  • os/kernel/task/task_terminate_unloaded.c:148-152 now marks TCB_FLAG_EXIT_PROCESSING, which is directionally consistent with the normal exit path.
  • That keeps the dedicated unload path closer to the same "single teardown" invariant as the regular task termination flow.

🧾 Final Assessment

Must-Fix Summary

  • The main race is still present because the flag check and the flag set are still separated by a long unprotected window.
  • The patch also broadens the IRQ-disabled region around exit teardown in a way that can create new regressions outside the original bug.

Nice-to-Have Summary

  • Add a targeted SMP regression/stress test once the ordering fix is corrected.

Residual Risk / Test Gap

  • Static review only; I did not run a kernel build or board-level stress workload in this review pass.
  • Any conclusion about timing behavior is based on the checked-out code paths and surrounding TizenRT teardown contracts, not on live runtime traces.

📌 Final Verdict

❗ Request Changes

The PR identifies the right race, but the current patch does not actually make the exit-processing guard atomic and it widens the IRQ-disabled teardown window in the process. I would not merge this as-is; the flag publication needs to move into the first guarded block, and the broad critical section needs to be narrowed before re-review.

@rish-sg rish-sg force-pushed the task_manager_stop_issue branch from 120574f to 660b442 Compare April 20, 2026 06:48
@rish-sg
Copy link
Copy Markdown
Contributor Author

rish-sg commented Apr 20, 2026

Yes, we have checked this, there should be a single task termination handler.

Similar migration was done on Nuttx and we have found the commit:
Related Nuttx commit: apache/nuttx@eae57cb
But this seems to be part of bigger code changes and simple patch may not be enough.

We need to evaluate all the related code changes and then take a decision which will take time to validate.
Currently only going with this change.

@seokhun-eom24
Copy link
Copy Markdown
Contributor

seokhun-eom24 commented Apr 21, 2026

Yes, we have checked this, there should be a single task termination handler.

Similar migration was done on Nuttx and we have found the commit: Related Nuttx commit: apache/nuttx@eae57cb But this seems to be part of bigger code changes and simple patch may not be enough.

We need to evaluate all the related code changes and then take a decision which will take time to validate. Currently only going with this change.

Thank you for checking.
Please review and correct this part too in new PR soon.

I think we need to check task_terminate, task_exithook, task_delete, pthread_exit, and exit.

Comment thread os/kernel/task/task_exithook.c Outdated
* processing.
*/

tcb->flags |= TCB_FLAG_EXIT_PROCESSING;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if Setting it here, We must ensure that the exit hook sequent is performed.

but I think we should consider the case below.

  1. the thread is entering terminate itself and pass the line.
  2. the context switching is occured.
  3. other thread is call task_delete or pthread_cancel.
    -> TCB_FLAG_EXIT_PROCESSING is setted, skip the task_exithook.
    -> pull it out of readytorun queue.
    => the task_exithook sequence will not perfored.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your point. For now, we have moved the setting of TCB_FLAG_EXIT_PROCESSING to the lower critical_section() of task_exithook(), this will ensure that the exithook sequence is performed at least once.

But it just reduces the race window and not completely closes it. We have tested it for the issue observed, currently it is working fine.

@rish-sg rish-sg force-pushed the task_manager_stop_issue branch from 660b442 to 301b6d9 Compare April 22, 2026 11:33
Description:
This patch reduces the window for multiple exit of task via
task_exithook(), in SMP case.

----------------------------------------------------------------------

AS-IS:
TCB_FLAG_EXIT_PROCESSING is bit set in tcb->flag to prevent multiple exit.
But the checker and setter code are not protected and are being called at
different places.
Checker: at the start of the function to return early if the flag is set.
Setter: flag is set at the end of task_exithook().
This difference in exection of checker and setter opens a window of race
where one process sets it, but another process has already passed the
checker point.
In this race, both the processes may attempt to kill the same task,
which leads to a multiple-exit scenario.

----------------------------------------------------------------------

LOGS:
log_csi_config: enable: 0
wifi_csi_mq_close: MQ cltask_sigchild: task: csifw_task_Main, tg_nchildren = 1
task_sigchild: task: task_manager, tg_nchildren = 0

security level: 0
===========================================================
Assertion details
===========================================================
Assertion failed CPU0 at file: task/task_exithook.c line 339 task: task_manager pid: 17
print_assert_detail: Assert location (PC) : 0x0e024eff
check_assert_location: Code asserted in normal thread!
===========================================================
Asserted task's stack details
===========================================================
check_sp_corruption: Current SP is User Thread SP: 60154490

----------------------------------------------------------------------

Fix: We move the setter code to the beginning of critical section of
task_exithook() and protect the checker within critical section to
ensure atomicity.

----------------------------------------------------------------------

Signed-off-by: Rishabh Singh <ris.singh@samsung.com>
@rish-sg rish-sg force-pushed the task_manager_stop_issue branch from 301b6d9 to 871eb2a Compare April 22, 2026 11:37
@ewoodev
Copy link
Copy Markdown
Contributor

ewoodev commented Apr 24, 2026

Yes, we have checked this, there should be a single task termination handler.

Similar migration was done on Nuttx and we have found the commit: Related Nuttx commit: apache/nuttx@eae57cb But this seems to be part of bigger code changes and simple patch may not be enough.

We need to evaluate all the related code changes and then take a decision which will take time to validate. Currently only going with this change.

The nuttx changes is including task_recover into enter_critical_section. Could you please check we also need to change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants