Re: Lazy JIT IR code generation to increase JIT speed with partitions

From: David Geier
Subject: Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date:
Msg-id: 254288e2-159c-dd85-b2ce-f9d331663e43@gmail.com
In reply to: Re: Lazy JIT IR code generation to increase JIT speed with partitions (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Can you elaborate a bit more on how you conclude that?

Looking at the numbers I measured in one of my previous e-mails, it looks to me like the overhead of using multiple modules is fairly low and only measurable in queries with dozens of modules. Given that JIT is most useful in queries that process a large number of rows, spending marginally more time on creating the JIT program while being able to apply JIT in a much more fine-grained way seems desirable. For example, the time lost handling more modules is recovered right away because not the whole plan gets JIT-compiled.

It is a trade-off between optimizing for the best case, where everything in the plan truly benefits from jitting and hence a single module containing everything is best, and the worst case, where almost nothing truly profits from jitting and hence only a small fraction of the plan should actually be jitted. The penalty for the best case seems low, though, because (1) the overhead is low in absolute terms, and (2) if the entire plan truly benefits from jitting, spending sub-ms more per node seems negligible because significant time is going to be spent anyway.
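The trade-off described in the two paragraphs above can be made concrete with a toy cost model. This is a minimal sketch under stated assumptions, not a description of the patch or of PostgreSQL's JIT: the per-module overhead, per-node compile time and per-row saving are made-up numbers (not the measurements mentioned earlier), and the per-node variant assumes an oracle that knows each node's real row count, which a lazy, usage-triggered scheme would only approximate at run time.

```python
from dataclasses import dataclass

# Hypothetical fixed cost (ms) of creating, optimizing and emitting one extra
# module; an illustrative number, not a measurement.
MODULE_OVERHEAD_MS = 0.3


@dataclass
class Node:
    name: str
    rows: int                 # rows the node actually processes
    compile_ms: float         # time to generate and compile this node's code
    saving_ms_per_row: float  # time saved per row when the node runs jitted

    def benefit_ms(self) -> float:
        return self.rows * self.saving_ms_per_row


def single_module_cost(nodes: list[Node]) -> float:
    """Net time (ms) of jitting every node in one module; negative means time saved."""
    return MODULE_OVERHEAD_MS + sum(n.compile_ms - n.benefit_ms() for n in nodes)


def per_node_cost(nodes: list[Node]) -> float:
    """Net time (ms) with one module per node, jitting only nodes that pay off.

    Assumes perfect knowledge of each node's real row count (an oracle).
    """
    total = 0.0
    for n in nodes:
        net = MODULE_OVERHEAD_MS + n.compile_ms - n.benefit_ms()
        if net < 0:  # jit this node only if it also recovers its own module overhead
            total += net
    return total


# Best case: every node is hot, so one module wins, but only by the extra overheads.
best = [Node(f"hot{i}", 1_000_000, 5.0, 0.0001) for i in range(3)]

# Worst case: one hot node plus many nodes that barely run.
worst = [Node("hot", 1_000_000, 5.0, 0.0001)] + [
    Node(f"cold{i}", 10, 5.0, 0.0001) for i in range(9)
]

for label, plan in (("best", best), ("worst", worst)):
    print(f"{label}: single={single_module_cost(plan):.1f} ms, "
          f"per-node={per_node_cost(plan):.1f} ms")
```

With these invented numbers the best case favours the single module by only 0.6 ms (two extra module overheads), while in the worst case the per-node variant comes out roughly 45 ms ahead because the nine cold nodes are never compiled.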

--
David Geier
(ServiceNow)

On 7/4/22 22:23, Andres Freund wrote:
> Hi,
>
> On 2022-07-04 06:43:00 +0000, Luc Vlaming Hummel wrote:
>> Thanks for reviewing this and the interesting examples!
>>
>> Wanted to give a bit of extra insight as to why I'd love to have a system that can lazily emit JIT code and hence creates roughly a module per function:
>> In the end I'm hoping that we can migrate to a system where we only JIT after a configurable cost has been exceeded for this node, as well as a configurable number of rows has actually been processed.
>> The reason is that this would safeguard against some problematic planning issues
>> wrt JIT (node not being executed, row count being massively off).
> I still don't see how it's viable to move to always doing function-by-function
> emission, overhead-wise.
>
> I also want to move to doing JIT in the background, triggered by actual
> usage. But to me it seems a dead end to require moving to the
> one-function-per-module model for that.
>
>> If this means we have to invest more in making it cheap(er) to emit modules,
>> I'm all for that.
> I think that's just inherently more expensive and thus a no-go.
>
>> @Andres if there are any other things we ought to fix to make this cheap
>> (enough) compared to the previous code, I'd love to know your thoughts.
> I'm not seeing it.
>
> Greetings,
>
> Andres Freund
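Luc's proposed trigger (JIT a node only once it exceeds a configurable cost and has actually processed a configurable number of rows) can be sketched as a per-node check. This is a minimal illustration under assumptions: the threshold names are hypothetical (PostgreSQL's existing jit_above_cost setting is compared against the whole query's estimated cost, not per node), "cost" is taken to mean the planner's estimate for the node, and the compile queue merely stands in for handing the node to a background compiler.

```python
from dataclasses import dataclass

# Hypothetical per-node knobs; these are not existing PostgreSQL settings.
JIT_NODE_COST_THRESHOLD = 1000.0  # planner cost estimate a node must exceed
JIT_NODE_ROW_THRESHOLD = 10_000   # rows a node must actually have processed


@dataclass
class PlanNode:
    name: str
    estimated_cost: float
    rows_processed: int = 0
    jitted: bool = False

    def should_jit(self) -> bool:
        # Both conditions must hold: the plan says the node is expensive, and
        # execution shows it really runs (guards against nodes that are never
        # executed or whose row estimate was far too high).
        return (self.estimated_cost > JIT_NODE_COST_THRESHOLD
                and self.rows_processed >= JIT_NODE_ROW_THRESHOLD)

    def process_row(self, compile_queue: list[str]) -> None:
        """Count one processed row, then request compilation once the trigger fires."""
        self.rows_processed += 1
        if not self.jitted and self.should_jit():
            self.jitted = True               # request compilation only once
            compile_queue.append(self.name)  # stand-in for a background compile


queue: list[str] = []
join = PlanNode("hash_join", estimated_cost=5000.0)
unused = PlanNode("never_run_scan", estimated_cost=8000.0)  # never executed
for _ in range(20_000):
    join.process_row(queue)
print(queue)  # ['hash_join']; 'never_run_scan' is never compiled
```

In this sketch the interpreted path stays in use until the row threshold is reached, so a node that never runs pays no compile cost at all; whether emitting each triggered node as its own module is affordable is exactly the overhead question debated in this thread.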
