Skip to content

fix the nested predicate bug#2713

Open
minansys wants to merge 1 commit intoEnzymeAD:mainfrom
minansys:mixu/fix-nested-predicate-taping
Open

fix the nested predicate bug#2713
minansys wants to merge 1 commit intoEnzymeAD:mainfrom
minansys:mixu/fix-nested-predicate-taping

Conversation

@minansys
Copy link
Copy Markdown
Collaborator

@minansys minansys commented Feb 24, 2026

Fix nested inactive-outer / active-inner predicate taping bug (#2629)

Summary

This PR fixes an incorrect reverse CFG reconstruction in Enzyme for nested branch patterns where an inner active predicate is only computed under an outer inactive guard (e.g., if (fan) { ... if (radius > 0) ... } with fan passed as enzyme_const). In such cases, Enzyme could materialize / read the inner predicate unconditionally during reverse CFG reconstruction, leading to undef/poison reads, wrong control flow, and incorrect gradients (especially at -O0/-O1).

The fix ensures the inner predicate is only materialized under the outer guard by using a staging block in reverse CFG reconstruction and properly registering that synthetic block in reverseBlockToPrimal. As an additional safety net, i1 predicate caches are zero-initialized to represent the correct “not executed” default.

Motivation / Problem

fix the issue #2629

Enzyme’s 3-target merge handling in GradientUtils::branchToCorrespondingTarget() included a fast-path that:

  • computed the first split predicate (cond1) and the second split predicate (cond2) eagerly, and
  • then used control-flow or PHI rewriting to represent the 3-way selection.

In the nested-guard scenario from #2629, cond2 is only defined/taped when the outer guard is true. Eagerly materializing cond2 on paths where the outer guard is false can cause reads of uninitialized cached predicates and lead to incorrect reverse control flow.

Approach

Key ideas

  1. Delay materialization of the second predicate (cond2)
    • Create a synthetic staging block and only compute cond2 inside staging, which is reached only when cond1 holds.
  2. Register synthetic reverse blocks
    • lookupM() expects reverse-only blocks to be present in reverseBlockToPrimal.
    • Map staging to the appropriate primal/original block to satisfy invariants and avoid assertions.
  3. Avoid unsafe 3-target fast-path in replacePHIs mode
    • The replacePHIs != nullptr path cannot introduce control flow. Attempting to express a 3-way split purely via PHI rewriting can still force eager cond2 materialization.
    • For this case, fall back to the generic safe reconstruction path.
  4. Zero-init i1 predicate caches
    • When a predicate cache is not written on some paths, the correct default for “not executed” is false, preventing reverse from taking the inner path spuriously.

Changes

Functional changes

  • GradientUtils::branchToCorrespondingTarget

    • Rework 3-target merge reconstruction to:
      • branch on cond1 to a staging block when needed,
      • compute cond2 inside staging, and
      • branch to the final target blocks.
    • Register staging in reverseBlockToPrimal to avoid originalForReverseBlock assertions when lookupM() is called in that block.
    • Disable the 3-target fast-path when replacePHIs != nullptr, falling back to the safe path.
  • CacheUtility::createCacheForScope

    • For non-loop predicate caches (i1), force zero initialization so that missing stores do not leave the cache uninitialized.

Tests

  • Add a regression test covering the nested guard pattern:
    • outer guard is inactive (enzyme_const),
    • inner predicate is active and only computed under the outer guard,
    • reverse must not branch on an uninitialized taped predicate.

Reproduction (before)

A minimal reproducer is a kernel with:

  • if (fan) { ... radius = sqrt(...); if (radius > 0) ... }
  • fan passed as enzyme_const
  • output remains differentiable w.r.t. fvec regardless of fan

Observed behavior:

  • wrong gradients at -O0/-O1 and/or crashes/assertions during Enzyme transformation due to reverse CFG reconstruction reading an uninitialized predicate.

Result (after)

  • Correct gradients for the reproducer at -O0 and -O1.
  • No assertion failures in originalForReverseBlock when building reverse CFG.

Performance / Risk Assessment

  • The behavioral change is highly localized:
    • affects only the 3-target merge reconstruction path, and
    • further restricts the bypass only when replacePHIs != nullptr.
  • The removed PHI-based optimization was unsafe for nested inactive-outer patterns; falling back to the generic path may slightly increase IR size/compile-time in that niche case, but improves correctness.
  • Zero-initializing i1 caches is low risk and aligns with the correct semantic default (false) when a predicate was never computed.

Checklist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enzyme miscompiles nested branch with outer: uninitialized taped predicate causes inactive path to execute

1 participant