Skip to content

JIT: Strip shift amount masking via VN to prevent CSE interference#126801

Draft
Copilot wants to merge 5 commits intomainfrom
copilot/jit-strip-masking-shiftamount
Draft

JIT: Strip shift amount masking via VN to prevent CSE interference#126801
Copilot wants to merge 5 commits intomainfrom
copilot/jit-strip-masking-shiftamount

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 11, 2026

Description

Roslyn emits shl(x, and(y, 31)) for shift operations. On architectures where hardware already masks the shift amount, the AND is redundant. Currently LowerShift strips it, but that runs after CSE. When multiple shifts share the same variable shift amount, CSE hoists (shift & 31) into a temp, and LowerShift no longer recognizes the pattern:

uint ShiftAndCSE(uint foo, int shift)
{
    uint res = (foo >> shift);
    res <<= shift;
    return res;
}
; Before: CSE'd mask not removed
and      r8d, 31
shrx     eax, edx, r8d
shlx     eax, eax, r8d

; After: VN sees through the mask, no AND emitted
shrx     eax, edi, esi
shlx     eax, eax, esi

Changes

  • src/coreclr/jit/valuenum.cpp: In fgValueNumberTree, when computing the VN for shift operations (GT_LSH, GT_RSH, GT_RSZ), if the shift amount is AND(y, mask) where the mask preserves all relevant shift bits (0x1F for 32-bit, 0x3F for 64-bit), use y's VN instead of the AND's VN. This lets CSE see through the redundant mask. Guarded by the same target set as LowerShift (TARGET_XARCH || TARGET_ARM64 || TARGET_LOONGARCH64 || TARGET_RISCV64). Mirrors the existing LowerShift mask check logic.

  • src/tests/JIT/opt/Shifts/ShiftMaskCSE.cs: New test covering the CSE+shift interaction pattern — multiple shifts sharing a variable shift amount across 32-bit, 64-bit, signed, and unsigned variants.

The existing LowerShift optimization is intentionally kept as a safety net for AND masks introduced by later phases. Rotation recognition in morph (fgRecognizeAndMorphBitwiseRotation) is unaffected — it handles both masked and unmasked shift amounts.

Copilot AI requested review from Copilot and removed request for Copilot April 11, 2026 22:49
Copilot AI linked an issue Apr 11, 2026 that may be closed by this pull request
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 11, 2026
When Roslyn emits `x << y`, it generates `shl(x, and(y, 31))`. On
architectures where the hardware already masks the shift amount
(x86, x64, ARM64, LoongArch64, RISC-V64), this AND is redundant.

Previously, the AND was only stripped late in LowerShift, after CSE
had already run. If two shifts share the same variable shift amount,
CSE would hoist the `(shift & 31)` expression, preventing LowerShift
from recognizing the pattern.

Now we strip the mask early during import, before CSE runs. This
prevents the AND from being CSE'd and avoids generating unnecessary
masking instructions.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/0d8e29d4-bfec-465e-bd94-77a38f880f35

Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 11, 2026 23:36
Copilot AI requested review from Copilot and removed request for Copilot April 11, 2026 23:38
Copilot AI changed the title [WIP] Optimize shiftAmount masking in JIT import process JIT: Strip shift amount masking in importer to prevent CSE interference Apr 11, 2026
Copilot AI requested a review from EgorBo April 11, 2026 23:42
@EgorBo
Copy link
Copy Markdown
Member

EgorBo commented Apr 12, 2026

@copilot please adjust RangeOps::ShiftRight and ShiftLeft as well.

Since the importer now strips the AND mask from shift amounts,
the range checker may see shift amounts with wider ranges than
[0..31]. Add ClampShiftAmount to apply the implicit hardware
masking so range analysis is not regressed.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/1c1b5e88-5f5e-4f48-a357-0bf2704a210b

Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 12, 2026 15:55
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 12, 2026

@copilot please adjust RangeOps::ShiftRight and ShiftLeft as well.

Done in 2772485. Added ClampShiftAmount helper that applies the implicit hardware masking (& 31) to the shift amount range, and wired it into both ShiftRight and ShiftLeft. This prevents range analysis from regressing when the importer strips the explicit AND mask.

@EgorBo
Copy link
Copy Markdown
Member

EgorBo commented Apr 12, 2026

@copilot revert all changes and only make VN changes - if shift's arg is AND that we can ignore - use its op1 VN effectively ignoring AND.

Comment on lines +7536 to +7560
#if defined(TARGET_XARCH) || defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
// Shift instructions on these targets mask the shift amount,
// so strip the redundant AND mask (e.g., & 31 for 32-bit or & 63 for 64-bit)
// that Roslyn emits. Doing this early prevents CSE from hoisting
// the mask and hiding it from later optimizations.
{
size_t mask = 0x1f;
#ifdef TARGET_64BIT
if (varTypeIsLong(type))
{
mask = 0x3f;
}
#endif
while (op2->OperIs(GT_AND) && op2->gtGetOp2()->IsCnsIntOrI())
{
if ((static_cast<size_t>(op2->gtGetOp2()->AsIntCon()->IconValue()) & mask) != mask)
{
break;
}
op2 = op2->gtGetOp1();
}
}
#endif

op1 = gtNewOperNode(oper, type, op1, op2);
Copy link
Copy Markdown
Member

@tannergooding tannergooding Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo, this rather feels like something we should just handle gtFoldExpr. It should make the logic simpler and would kick in anywhere such a pattern is found.

Basically just look at the rhs of the shift operation to see if its GT_AND and see if it is x & cns where cns is one of the well known masks. In such a case, just fold away the and

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regardless of where I put it, it currently produces regressions, I assume some of them might be range check analysis related - not sure, needs an investigation. I want to check if we can simply ignore its existence on the VN level

Revert the importer.cpp shift mask stripping and rangecheck.h
ClampShiftAmount changes per reviewer feedback.

Instead, strip redundant AND masks from shift amounts at VN level:
in fgValueNumberTree, when computing VN for shift operations, if
the shift amount is AND(y, mask) where the mask preserves all
relevant shift bits, use y's VN instead. This lets CSE see through
the redundant mask without modifying the tree structure.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/c731e39e-f338-4d79-b29a-a13762165d21

Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 12, 2026 18:55
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 12, 2026

@copilot revert all changes and only make VN changes - if shift's arg is AND that we can ignore - use its op1 VN effectively ignoring AND.

Done in 8d83b8a. Reverted all importer.cpp and rangecheck.h changes. The optimization is now VN-only: in fgValueNumberTree, when computing VN for shift operations (GT_LSH/GT_RSH/GT_RSZ), if the shift amount is AND(y, mask) where the mask preserves all relevant shift bits, the shift's VN uses y's VN instead of the AND's VN.

Copilot AI changed the title JIT: Strip shift amount masking in importer to prevent CSE interference JIT: Strip shift amount masking via VN to prevent CSE interference Apr 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JIT: Strip out masking of shiftAmount way earlier when possible

3 participants