Draft
78 commits
9e89b7c
Specialized x86 implementation of interleave_vectors
abadams Jan 26, 2026
188bee0
Update test to be more exhaustive
abadams Jan 27, 2026
2ba8dde
Fix comment.
abadams Jan 27, 2026
d102f7b
Comment fix
abadams Jan 27, 2026
46d41dd
clang-tidy fixes
abadams Jan 27, 2026
27f1220
Make variable names more consistent
abadams Jan 27, 2026
5576f46
Simplify code with helper lambda
abadams Jan 27, 2026
107aaa5
Comment tweaks
abadams Jan 27, 2026
0bc1b9f
Don't do half-width unpcks
abadams Jan 28, 2026
cdc1de2
Use optimization fences in the base class too
abadams Jan 30, 2026
23b79ba
Merge branch 'main' into abadams/fix_x86_transpose
mcourteaux Feb 1, 2026
3eef5db
Use Catanzaro's algorithm for non-power-of-two interleaves
abadams Feb 12, 2026
678a353
Support more interleave and deinterleave patterns
abadams Feb 18, 2026
a0b7d66
Merge remote-tracking branch 'origin/main' into abadams/fix_x86_trans…
abadams Feb 18, 2026
4c1adf7
clang-tidy fix
abadams Feb 19, 2026
1c940e8
Handle multiple let injections at same site
abadams Feb 19, 2026
c39b1a0
better simplification and better handling of composite factors
abadams Feb 20, 2026
794df0b
Fix innermost_containing_node
abadams Feb 20, 2026
486addd
Fix some simd op check failures
abadams Feb 21, 2026
a1ecca9
Fix infinite recursion issue and missed case in interleave codegen
abadams Feb 23, 2026
f66d5ea
Adjust expectations in stage_strided_loads test
abadams Feb 23, 2026
c25142f
Allow reversed suffix or not in sve test
abadams Feb 23, 2026
bae3e02
Don't use optimization fences on hexagon
abadams Feb 23, 2026
b7defbd
Fix infinite simplifier loop
abadams Feb 23, 2026
23944a0
Don't hoist transposes on hexagon
abadams Feb 23, 2026
0d110d2
Make distinct strided load nodes in the IR distinct in memory too
abadams Feb 23, 2026
53ae7e4
Merge remote-tracking branch 'origin/main' into abadams/fix_x86_trans…
abadams Feb 24, 2026
84f10b1
arm-32 has no vst2 for 64-bit elements
abadams Feb 24, 2026
8d93c3c
Windows bad filename fix in simd op check
abadams Feb 24, 2026
36565ce
Temporary dumping of cpu info to debug github actions issue
abadams Feb 24, 2026
3f45c47
dump cpuinfo in makefile testing workflow
abadams Feb 24, 2026
223dd7f
Merge remote-tracking branch 'origin/main' into abadams/fix_x86_trans…
abadams Mar 2, 2026
2695151
Address review comments
abadams Mar 6, 2026
31f180a
Merge remote-tracking branch 'origin/main' into abadams/fix_x86_trans…
abadams Mar 10, 2026
2962ea1
Remove duplicate function body
abadams Mar 10, 2026
fa2fcb7
Use slice of predicate
abadams Mar 11, 2026
dcdfb90
clang-format
abadams Mar 11, 2026
70afc58
SVE fixes
abadams Mar 11, 2026
cd04fb2
Merge branch 'main' into abadams/fix_x86_transpose
alexreinking Mar 16, 2026
5d2b524
Move optimization_fence back
alexreinking Mar 16, 2026
bccf4b7
Merge remote-tracking branch 'origin/main' into abadams/fix_x86_trans…
abadams Mar 30, 2026
60cd341
Try to thread the needle with webassembly nonsense
abadams Apr 3, 2026
9d7b904
Fix msvc warning
abadams Apr 6, 2026
9dd04eb
Skip simd_op_check_sve2 on old llvms
abadams Apr 6, 2026
11126e7
Merge remote-tracking branch 'origin/main' into abadams/fix_x86_trans…
abadams Apr 6, 2026
e5e6b66
Skip test on sve2 with llvm 21
abadams Apr 8, 2026
2470267
Skip block transpose performance test for sve2 on llvm 21
abadams Apr 9, 2026
479afa8
Skip sub-test that triggers llvm bug
abadams Apr 10, 2026
b46cb04
Test should hopefully now work with llvm main
abadams Apr 15, 2026
a753620
Introduce MultiRamp, a multi-dimensional ramp abstraction
abadams Apr 22, 2026
3f81633
Clarify MultiRamp API and simplify the atomic-store reduction path
abadams Apr 22, 2026
21bbcac
Add for_each_coordinate helper and use it
abadams Apr 22, 2026
fea1d7c
Fix three bugs found by the randomized vectorized-reduction test
abadams Apr 22, 2026
a4604f4
Expand MultiRamp and vectorized-reduction tests
abadams Apr 22, 2026
d468195
Cut down on number of transposed_vector_reduce test cases
abadams Apr 22, 2026
38f965f
Clarify comments flagged by a weak-model review
abadams Apr 23, 2026
23838fb
Merge remote-tracking branch 'origin/main' into abadams/multiramp
abadams Apr 29, 2026
98ff7c3
Remove holdover from transpose branch
abadams Apr 29, 2026
7a1fe5e
Apply pre-commit auto-fixes
halide-ci[bot] Apr 29, 2026
2f8e976
Fixes for the pre-commit auto fixes
abadams Apr 29, 2026
fcf9646
clang-tidy fix
abadams Apr 29, 2026
3a97c3b
Skip test under SVE
abadams May 1, 2026
198c3e3
This level of staging now unnecessary
abadams May 1, 2026
5f5c756
Simplify new simplifier rules
abadams May 3, 2026
92f1e4a
Rewrite ExtractTileOperations to use MultiRamp
abadams May 5, 2026
c96b6dd
typo fix
abadams May 5, 2026
e311029
Move and dedup success prints
abadams May 6, 2026
7968ce5
Remove bad assert (e may be undefined)
abadams May 6, 2026
01aecc2
Fully relower intrinsics to help multiramp matching
abadams May 6, 2026
1a74e32
Merge remote-tracking branch 'origin/main' into abadams/fix_amx
abadams May 6, 2026
8a8f5e9
Support outer tiling
abadams May 6, 2026
ebb55fe
Better error handling
abadams May 7, 2026
466e063
Add more test cases and fix bugs found
abadams May 7, 2026
9d7b737
Merge remote-tracking branch 'origin/main' into abadams/fix_amx
abadams May 8, 2026
3defc4b
clang-tidy fix
abadams May 8, 2026
0559b03
Revert unintended changes
abadams May 8, 2026
065b79c
Better error message if no update def.
abadams May 8, 2026
61388de
Add error-path test suite for ExtractTileOperations
abadams May 8, 2026
20 changes: 20 additions & 0 deletions src/Deinterleave.cpp
@@ -610,7 +610,20 @@
return expr;
}

Scope<MemoryType> allocation_scope;
Stmt visit(const Allocate *op) override {
ScopedBinding<MemoryType> bind(allocation_scope, op->name, op->memory_type);
return IRMutator::visit(op);
}

Stmt visit(const Store *op) override {
// Don't mess with matrix multiply ops, which use natively-supported 2D
// loads and stores.
if (auto *alloc = allocation_scope.find(op->name);
alloc && (*alloc) == MemoryType::AMXTile) {
return op;
}

Check failure on line 622 in src/Deinterleave.cpp (GitHub Actions / Check clang-tidy):
readability-qualified-auto,-warnings-as-errors
'auto *alloc' can be declared as 'const auto *alloc'

bool old_should_deinterleave = should_deinterleave;
int old_num_lanes = num_lanes;

@@ -657,6 +670,13 @@
return Stmt();
}

// Don't mess with matrix multiply ops, which use natively-supported 2D
// loads and stores.
if (auto *alloc = allocation_scope.find(store->name);
alloc && (*alloc) == MemoryType::AMXTile) {
return Stmt();
}

Check failure on line 675 in src/Deinterleave.cpp (GitHub Actions / Check clang-tidy):
readability-qualified-auto,-warnings-as-errors
'auto *alloc' can be declared as 'const auto *alloc'

const Ramp *r0 = store->index.as<Ramp>();

// It's not a store of a ramp index.
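
The two hunks above follow one pattern: record each allocation's memory type while the visitor is inside that allocation, then have the store handling bail out early when the target buffer is an AMX tile. The standalone sketch below illustrates that pattern only. `Scope`, `ScopedBinding`, `MemoryType`, `should_skip_store`, and `demo` here are simplified, hypothetical stand-ins written for this example, not Halide's real classes, and the buffer names are made up. The lookup is declared as `const auto *alloc`, which is the form the clang-tidy annotations ask for. It assumes a C++17 compiler for the `if` statement with an initializer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration; not Halide's real enum.
enum class MemoryType { Auto, Heap, Stack, AMXTile };

// Minimal name -> stack-of-values scope, so inner bindings shadow outer ones.
class Scope {
    std::map<std::string, std::vector<MemoryType>> table;

public:
    void push(const std::string &name, MemoryType t) {
        table[name].push_back(t);
    }
    void pop(const std::string &name) {
        auto it = table.find(name);
        assert(it != table.end() && !it->second.empty());
        it->second.pop_back();
        if (it->second.empty()) {
            table.erase(it);
        }
    }
    // Returns nullptr when the name is not currently bound.
    const MemoryType *find(const std::string &name) const {
        auto it = table.find(name);
        if (it == table.end() || it->second.empty()) {
            return nullptr;
        }
        return &it->second.back();
    }
};

// RAII guard: binds on construction, unbinds on destruction.
class ScopedBinding {
    Scope &scope;
    std::string name;

public:
    ScopedBinding(Scope &s, const std::string &n, MemoryType t)
        : scope(s), name(n) {
        scope.push(name, t);
    }
    ~ScopedBinding() {
        scope.pop(name);
    }
    ScopedBinding(const ScopedBinding &) = delete;
    ScopedBinding &operator=(const ScopedBinding &) = delete;
};

// True when a store to `name` should be left untouched by the pass.
bool should_skip_store(const Scope &scope, const std::string &name) {
    if (const auto *alloc = scope.find(name);
        alloc && *alloc == MemoryType::AMXTile) {
        return true;
    }
    return false;
}

// Enter an AMX tile allocation, query two stores, then leave the allocation.
bool demo() {
    Scope scope;
    bool inside = false, other = true;
    {
        ScopedBinding bind(scope, "acc", MemoryType::AMXTile);
        inside = should_skip_store(scope, "acc");  // bound as AMXTile
        other = should_skip_store(scope, "out");   // never bound
    }
    bool after = should_skip_store(scope, "acc");  // binding has been popped
    return inside && !other && !after;
}
```

Because the binding is tied to an object's lifetime, the entry disappears as soon as the visitor returns from the allocation, so a store to a buffer of the same name outside that allocation is not skipped by mistake.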