Thread: llvmjit - improve code generated in O0


llvmjit - improve code generated in O0

From
Pierre Ducroquet
Date:
Hi

After spending some time looking at the assembly code generated by llvmjit for a simple query (SELECT * FROM demo WHERE a = 42), digging in the IR showed that by simply tweaking the IR one could push llvm into generating better code, kind of "for free", without having to spend time in the LLVM optimizer.

I noticed the following patterns:
  • in EEOP_QUAL, the test is followed by a conditional jump taken when it succeeds, then by a second, unconditional jump for the failure case:
```
0x...123 test $0x1, %al
0x...125 jne 0x....129
0x...127 jmp 0x...147
```
This can be fixed by inverting the jump:
```
0x...125 je 0x.....147
```
One fewer jump, "for free".
  • in tuple deforming, the end blocks are generated before the attribute blocks. In O0 their assembly code ends up before the attribute code, so there are jumps around them for no reason. This unnatural code layout can be prevented by simply creating the end blocks after the attribute blocks.
  • in tuple deforming, some blocks are created even when they are empty, and instead of jumping to the next non-empty block we jump to the next block. This creates the weird jump pattern I mentioned in my previous patch about always enabling the simplifycfg pass. By rearranging how the attribute blocks are created, it is possible to remove most of these jumps without relying on the optimizer (note that I do not withdraw the suggestion of adding the simplifycfg pass; it can do more than I do in this patch).
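
To make the first two patterns concrete, here is a minimal sketch in plain C. It is a toy model, not LLVM's instruction selector and not the code from the patch: it assumes an emitter that lays blocks out in creation order and drops a jump only when its target is the very next block. All names (`Block`, `count_jumps`, `qual_jumps`, `deform_jumps`) are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical toy model of a block terminator; targets are layout indexes. */
typedef enum { TERM_RET, TERM_JUMP, TERM_COND } TermKind;

typedef struct {
    TermKind kind;
    int on_true;   /* TERM_JUMP: sole target; TERM_COND: target when true */
    int on_false;  /* TERM_COND only: target when false */
} Block;

/*
 * Count jump instructions for blocks emitted in array order, assuming the
 * emitter drops a jump only when its target is the very next block.
 */
static int count_jumps(const Block *blocks, size_t nblocks)
{
    int jumps = 0;

    for (size_t i = 0; i < nblocks; i++) {
        int next = (int) i + 1;   /* index of the fallthrough block */

        if (blocks[i].kind == TERM_JUMP) {
            if (blocks[i].on_true != next)
                jumps++;
        } else if (blocks[i].kind == TERM_COND) {
            jumps++;              /* the conditional jump is always emitted */
            if (blocks[i].on_false != next)
                jumps++;          /* extra unconditional jump */
        }
    }
    return jumps;
}

/* Pattern 1: a qual check, then the next step (1) and the failure path (2). */
int qual_jumps(int inverted)
{
    Block blocks[3] = {
        { TERM_COND, 1, 2 },   /* "if ok goto next, else goto fail" */
        { TERM_RET, -1, -1 },  /* next step */
        { TERM_RET, -1, -1 },  /* failure path */
    };

    if (inverted) {
        /* "if not ok goto fail", falling through to the next step */
        blocks[0].on_true = 2;
        blocks[0].on_false = 1;
    }
    return count_jumps(blocks, 3);
}

/* Pattern 2: one entry block, two attribute blocks, one end block. */
int deform_jumps(int end_block_last)
{
    /* end block created before the attribute blocks */
    Block end_first[4] = {
        { TERM_JUMP, 2, -1 },  /* entry -> attr0 */
        { TERM_RET, -1, -1 },  /* end */
        { TERM_JUMP, 3, -1 },  /* attr0 -> attr1 */
        { TERM_JUMP, 1, -1 },  /* attr1 -> end */
    };
    /* end block created after the attribute blocks */
    Block end_last[4] = {
        { TERM_JUMP, 1, -1 },  /* entry -> attr0 */
        { TERM_JUMP, 2, -1 },  /* attr0 -> attr1 */
        { TERM_JUMP, 3, -1 },  /* attr1 -> end */
        { TERM_RET, -1, -1 },  /* end */
    };

    return count_jumps(end_block_last ? end_last : end_first, 4);
}
```

In this model `qual_jumps(0)` counts two jumps and `qual_jumps(1)` counts one, while `deform_jumps(0)` counts two and `deform_jumps(1)` counts none. The empty-block pattern from the third bullet is not modelled here.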

This may make the C code a bit harder to read, but the end result is quite positive.
On my Zen2 desktop system, running a very basic query with jit on and below the optimization threshold (so the code is generated in O0), I get the following:
no-patch:
AVG: 31.0667 run; 1.2143 jit
MIN: 31.003 run; 1.191 jit
MAX: 31.096 run; 1.235 jit
STDEV: 0.030408514889382194 run; 0.01458347618977652 jit
patch:
AVG: 26.159 run; 1.1922 jit
MIN: 26.069 run; 1.165 jit
MAX: 26.235 run; 1.215 jit
STDEV: 0.05207473262320014 run; 0.01559772063119765 jit

As you can see, the average run time drops from about 31.07 to 26.16 while the JIT time stays essentially flat, so the boost comes at no extra CPU cost.
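
For reference, the relative gain can be worked out from the averages reported above. The helper below is only an illustrative sketch; `percent_reduction` is a hypothetical name, not something from the patch or from PostgreSQL.

```c
/* Percentage by which `after` is lower than `before`. */
double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the reported averages, `percent_reduction(31.0667, 26.159)` comes out at roughly 15.8 for the run time, and `percent_reduction(1.2143, 1.1922)` at roughly 1.8 for the JIT time.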

Attachments

Re: llvmjit - improve code generated in O0

From
Andres Freund
Date:
Hi,

On 2026-02-10 17:39:40 +0000, Pierre Ducroquet wrote:
> After spending some time looking at the assembly code generated by llvmjit
> for a simple query (SELECT * FROM demo WHERE a = 42), digging in the IR
> showed that by simply tweaking the IR one could push llvm into generating
> better code, kind of "for free", without having to spend time in the LLVM
> optimizer.

Yea, if that's simple enough to do, there's no reason to not do that.

I do think we eventually need a somewhat better "cheap" optimization
pipeline. But even if we had that, there's no reason to not just immediately
generate better code if it's cheap.

Do these changes still make a difference after adding simplifycfg as you
propose?

Greetings,

Andres Freund



Re: llvmjit - improve code generated in O0

From
Pierre Ducroquet
Date:
On Tuesday, 10 February 2026 at 9:43 PM, Andres Freund <andres@anarazel.de> wrote:

> Hi,
>
> On 2026-02-10 17:39:40 +0000, Pierre Ducroquet wrote:
> > After spending some time looking at the assembly code generated by llvmjit
> > for a simple query (SELECT * FROM demo WHERE a = 42), digging in the IR
> > showed that by simply tweaking the IR one could push llvm into generating
> > better code, kind of "for free", without having to spend time in the LLVM
> > optimizer.
>
> Yea, if that's simple enough to do, there's no reason to not do that.
>
> I do think we eventually need a somewhat better "cheap" optimization
> pipeline. But even if we had that, there's no reason to not just immediately
> generate better code if it's cheap.
>
> Do these changes still make a difference after adding simplifycfg as you propose?

Nope, simplifycfg does all of that, so it makes these changes useless, and it does a better job than what I managed to
do. But I have not looked at the impact on simplifycfg's runtime: having less work to do may make it a tiny bit faster. I
also still have some edge cases I want to try regarding the impact of simplifycfg.