Re: JIT compiling with LLVM v9.0

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: JIT compiling with LLVM v9.0
Дата
Msg-id 20180127212037.qrsq2hhfymhpsmjg@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: JIT compiling with LLVM v9.0  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: JIT compiling with LLVM v9.0
Список pgsql-hackers
Hi,

On 2018-01-26 22:52:35 -0800, Jeff Davis wrote:
> The version of LLVM that I tried this against had a linker option
> called "InternalizeLinkedSymbols" that would prevent the visibility
> problem you mention (assuming I understand you correctly).

I don't think they're fully solvable - you can't really internalize a
reference to a mutable static variable in another translation
unit. Unless you modify that translation unit, which doesn't work when
postgres running.


> That option is no longer there so I will have to figure out how to do
> it with the current LLVM API.

Look at the llvmjit_wrap.c code invoking FunctionImporter - that pretty
much does that.  I'll push a cleaned up version of that code sometime
this weekend (it'll then live in llvmjit_inline.cpp).


> > Afaict that's effectively what I've already implemented. We could export
> > more input as constants to the generated program, but other than that...
>
> I brought this up in the context of slot_compile_deform(). In your
> patch, you have code like:
>
> +               if (!att->attnotnull)
> +               {
> ...
> +                       v_nullbyte = LLVMBuildLoad(
> +                               builder,
> +                               LLVMBuildGEP(builder, v_bits,
> +                                                        &v_nullbyteno, 1, ""),
> +                               "attnullbyte");
> +
> +                       v_nullbit = LLVMBuildICmp(
> +                               builder,
> +                               LLVMIntEQ,
> +                               LLVMBuildAnd(builder, v_nullbyte,
> v_nullbytemask, ""),
> +                               LLVMConstInt(LLVMInt8Type(), 0, false),
> +                               "attisnull");
> ...
>
> So it looks like you are reimplementing the generic code, but with
> conditional code gen. If the generic code changes, someone will need
> to read, understand, and change this code, too, right?

Right. Not that that's code that has changed that much...


> With my approach, then it would initially do *un*conditional code gen,
> and be less efficient and less specialized than the code generated by
> your current patch. But then it would link in the constant tupledesc,
> and optimize, and the optimizer will realize that they are constants
> (hopefully) and then cut out a lot of the dead code and specialize it
> to the given tupledesc.

Right.


> This places a lot of faith in the optimizer and I realize it may not
> happen as nicely with real code as it did with my earlier experiments.
> Maybe you already tried and you are saying that's a dead end? I'll
> give it a shot, though.

I did that, yes. There's two major downsides:

a) The code isn't as efficient as the handrolled code. The handrolled
   code e.g. can take into account that it doesn't need to access the
   NULL bitmap for a NOT NULL column and we don't need to check the
   tuple's number of attributes if there's a following NOT NULL
   attribute. Those safe a good number of cycles.

b) The optimizations to take advantage of the constants and make the
   code faster with the constant tupledesc is fairly slow (you pretty
   much need at least an -O2 equivalent), whereas the handrolled tuple
   deforming is faster than the slot_getsomeattrs with just a single,
   pretty cheap, mem2reg pass.  We're talking about ~1ms vs 70-100ms in
   a lot of cases.  The optimizer often will not actually unroll the
   loop with many attributes despite that being beneficial.

I think in most cases using the approach you advocate makes sense, to
avoid duplication, but tuple deforming is such a major bottleneck that I
think it's clearly worth doing it manually. Being able to use llvm with
just a always-inline and a mem2reg pass makes it so much more widely
applicable than doing the full inlining and optimization work.


> >> I experimented a bit before and it works for basic cases, but I'm not
> >> sure if it's as good as your hand-generated LLVM.
> >
> > For deforming it doesn't even remotely get as good in my experiments.
>
> I'd like some more information here -- what didn't work? It didn't
> recognize constants? Or did recognize them, but didn't optimize as
> well as you did by hand?

It didn't optimize as well as I did by hand, without significantly
complicating (and slowing) the originating the code. It sometimes
decided not to unroll the loop, and it takes a *lot* longer than the
direct emission of the code.

I'm hoping to work on making more of the executor JITed, and there I do
think it's largely going to be what you're proposing, due to the sheer
mass of code.

Greetings,

Andres Freund


В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?