JIT: Strip shift amount masking via VN to prevent CSE interference #126801
Conversation
When Roslyn emits `x << y`, it generates `shl(x, and(y, 31))`. On architectures where the hardware already masks the shift amount (x86, x64, ARM64, LoongArch64, RISC-V64), this AND is redundant. Previously, the AND was only stripped late in LowerShift, after CSE had already run. If two shifts share the same variable shift amount, CSE would hoist the `(shift & 31)` expression, preventing LowerShift from recognizing the pattern. Now we strip the mask early during import, before CSE runs. This prevents the AND from being CSE'd and avoids generating unnecessary masking instructions.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/0d8e29d4-bfec-465e-bd94-77a38f880f35
Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
@copilot please adjust RangeOps::ShiftRight and ShiftLeft as well.
Since the importer now strips the AND mask from shift amounts, the range checker may see shift amounts with wider ranges than [0..31]. Add ClampShiftAmount to apply the implicit hardware masking so range analysis is not regressed.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/1c1b5e88-5f5e-4f48-a357-0bf2704a210b
Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
Done in 2772485. Added `ClampShiftAmount`.
@copilot revert all changes and only make VN changes - if the shift's arg is an AND that we can ignore, use its op1 VN, effectively ignoring the AND.
src/coreclr/jit/importer.cpp
Outdated
```cpp
#if defined(TARGET_XARCH) || defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) || defined(TARGET_RISCV64)
    // Shift instructions on these targets mask the shift amount,
    // so strip the redundant AND mask (e.g., & 31 for 32-bit or & 63 for 64-bit)
    // that Roslyn emits. Doing this early prevents CSE from hoisting
    // the mask and hiding it from later optimizations.
    {
        size_t mask = 0x1f;
#ifdef TARGET_64BIT
        if (varTypeIsLong(type))
        {
            mask = 0x3f;
        }
#endif
        while (op2->OperIs(GT_AND) && op2->gtGetOp2()->IsCnsIntOrI())
        {
            if ((static_cast<size_t>(op2->gtGetOp2()->AsIntCon()->IconValue()) & mask) != mask)
            {
                break;
            }
            op2 = op2->gtGetOp1();
        }
    }
#endif

    op1 = gtNewOperNode(oper, type, op1, op2);
```
@EgorBo, this rather feels like something we should just handle in gtFoldExpr. It should make the logic simpler and would kick in anywhere such a pattern is found.
Basically just look at the rhs of the shift operation to see if it's GT_AND, and check whether it is `x & cns` where `cns` is one of the well-known masks. In such a case, just fold away the AND.
regardless of where I put it, it currently produces regressions, I assume some of them might be range check analysis related - not sure, needs an investigation. I want to check if we can simply ignore its existence on the VN level
Revert the importer.cpp shift mask stripping and rangecheck.h ClampShiftAmount changes per reviewer feedback. Instead, strip redundant AND masks from shift amounts at the VN level: in fgValueNumberTree, when computing the VN for shift operations, if the shift amount is AND(y, mask) where the mask preserves all relevant shift bits, use y's VN instead. This lets CSE see through the redundant mask without modifying the tree structure.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/c731e39e-f338-4d79-b29a-a13762165d21
Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
Done in 8d83b8a. Reverted all importer.cpp and rangecheck.h changes. The optimization is now VN-only, implemented in `fgValueNumberTree`.
Description
Roslyn emits `shl(x, and(y, 31))` for shift operations. On architectures where the hardware already masks the shift amount, the AND is redundant. Currently `LowerShift` strips it, but that runs after CSE. When multiple shifts share the same variable shift amount, CSE hoists `(shift & 31)` into a temp, and `LowerShift` no longer recognizes the pattern.

Changes
- `src/coreclr/jit/valuenum.cpp`: In `fgValueNumberTree`, when computing the VN for shift operations (`GT_LSH`, `GT_RSH`, `GT_RSZ`), if the shift amount is `AND(y, mask)` where the mask preserves all relevant shift bits (0x1F for 32-bit, 0x3F for 64-bit), use `y`'s VN instead of the AND's VN. This lets CSE see through the redundant mask. Guarded by the same target set as `LowerShift` (`TARGET_XARCH || TARGET_ARM64 || TARGET_LOONGARCH64 || TARGET_RISCV64`). Mirrors the existing `LowerShift` mask check logic.
- `src/tests/JIT/opt/Shifts/ShiftMaskCSE.cs`: New test covering the CSE+shift interaction pattern: multiple shifts sharing a variable shift amount across 32-bit, 64-bit, signed, and unsigned variants.

The existing `LowerShift` optimization is intentionally kept as a safety net for AND masks introduced by later phases. Rotation recognition in morph (`fgRecognizeAndMorphBitwiseRotation`) is unaffected; it handles both masked and unmasked shift amounts.