Description
When looking at #57 in O0, I found out that this is a more generic issue.
Here are two examples with the problem (64-bit rotation & 64-bit count leading zeros): https://dpu.dev/z/zwJoIP
The important, and faulty, part in both cases is that the result from the arithmetic + jump instruction is not correctly spilled:
- after `lsl r0, r1, r0, sh32, .LBB0_2`, `r0` is directly erased
- after `clz.u d4, r0, nmax, .LBB0_2`, `d4` is not used and finally erased
Here is my hypothesis: in O0, LLVM always spills registers just before the last instruction of an MBB. This is only correct if that last instruction does not modify any register, and it seems that LLVM does not handle the other case correctly (it inserts the spill before the last instruction anyway, so the stack slot never receives the value that instruction produces).
It seems to me that the only safe solution for us is to never use arithmetic + jump instructions in O0.
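To make the hypothesis concrete, here is a toy model of it. This is not LLVM code: `Instr`, `run`, `spillBeforeLast` and the values 7 and 42 are all made up for illustration, and the model simply assumes that the spill is always placed before the last instruction of the block.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only (hypothetical types, not LLVM's MachineInstr).
enum class Kind { Define, Jump, DefineAndJump, Spill };

struct Instr {
    Kind kind;
    std::string reg; // register written (Define*) or saved (Spill)
    long value;      // value written by Define / DefineAndJump
};

using Block = std::vector<Instr>;

// Executes the block and returns the stack slots as seen at block exit.
std::map<std::string, long> run(const Block& block,
                                std::map<std::string, long> regs) {
    std::map<std::string, long> slots;
    for (const Instr& i : block) {
        if (i.kind == Kind::Define || i.kind == Kind::DefineAndJump)
            regs[i.reg] = i.value;
        else if (i.kind == Kind::Spill)
            slots[i.reg] = regs[i.reg];
        // Kind::Jump touches no register.
    }
    return slots;
}

// Assumed O0 behaviour: the spill goes right before the last instruction.
Block spillBeforeLast(Block block, const std::string& reg) {
    assert(!block.empty());
    block.insert(block.end() - 1, Instr{Kind::Spill, reg, 0});
    return block;
}

// Fused arithmetic + jump: the spill lands before r0 is computed.
long spilledValueFused() {
    Block b = {Instr{Kind::DefineAndJump, "r0", 42}};
    return run(spillBeforeLast(b, "r0"), {{"r0", 7}}).at("r0");
}

// Separate arithmetic and jump: the spill lands after r0 is computed.
long spilledValueSplit() {
    Block b = {Instr{Kind::Define, "r0", 42}, Instr{Kind::Jump, "", 0}};
    return run(spillBeforeLast(b, "r0"), {{"r0", 7}}).at("r0");
}
```

In this model `spilledValueFused()` saves the stale value 7, while `spilledValueSplit()` saves the freshly computed 42, which is why keeping the jump as a separate instruction avoids the problem.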
Most uses can be found in the functions called by DPUTargetLowering::EmitInstrWithCustomInserter (in llvm/lib/Target/DPU/DPUTargetLowering.cpp). The 16-bit multiplication case seems to be already handled (cf PerformMULCombine in llvm/lib/Target/DPU/DPUTargetLowering.cpp).
The other important case is when we automatically try to merge two instructions (cf DPUMacroFusion.cpp and mostly DPUMergeComboInstrPass.cpp). I could not find an example to make it fail (llvm adds a lot of load/store between the operations before this pass, so that it may not be possible to merge anything), however I think it would be safer to also disable this pass in O0.
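A minimal sketch of the gating I have in mind, as a standalone model rather than the actual DPU backend code. `OptLevel`, `fusionSteps`, `isEnabled` and the step names are hypothetical stand-ins; in the real backend the check would presumably be made against `TargetPassConfig::getOptLevel()` where the passes are registered.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the codegen optimization level.
enum class OptLevel { O0, O1, O2, O3 };

// Returns the names of the optional fusion steps enabled at a given level.
// The names are illustrative labels, not real pass identifiers.
std::vector<std::string> fusionSteps(OptLevel level) {
    std::vector<std::string> steps;
    if (level == OptLevel::O0)
        return steps; // never create arithmetic + jump instructions in O0
    steps.push_back("macro-fusion");
    steps.push_back("merge-combo-instr");
    return steps;
}

// Convenience check: is a given step scheduled at this level?
bool isEnabled(OptLevel level, const std::string& step) {
    const std::vector<std::string> steps = fusionSteps(level);
    return std::find(steps.begin(), steps.end(), step) != steps.end();
}
```

With this shape, O0 builds never schedule either step, and every other level is unchanged.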