Re: Lazy JIT IR code generation to increase JIT speed with partitions


From	David Geier
Subject	Re: Lazy JIT IR code generation to increase JIT speed with partitions
Date	18 July 2022, 09:00:08
Msg-id	254288e2-159c-dd85-b2ce-f9d331663e43@gmail.com
In reply to	Re: Lazy JIT IR code generation to increase JIT speed with partitions (Andres Freund <andres@anarazel.de>)
List	pgsql-hackers


Can you elaborate a bit more on how you conclude that?

Looking at the numbers I measured in one of my previous e-mails, it looks to me like the overhead of using multiple modules is fairly low and only measurable in queries with dozens of modules. Given that JIT is most useful in queries that process a fair number of rows, spending marginally more time on creating the JIT program in exchange for being able to apply JIT at a much finer granularity seems desirable. For example, the time lost handling more modules is recovered right away because the whole plan no longer has to be JIT compiled.

It is a trade-off between optimizing for the best case, where everything in the plan can truly benefit from jitting and hence a single module that has it all is best, and the worst case, where almost nothing truly profits from jitting and hence only a small fraction of the plan should actually be jitted. The penalty for the best case seems low though, because (1) the overhead is low in absolute terms, and (2) if the entire plan truly benefits from jitting, spending sub-ms more per node seems negligible because significant time is going to be spent anyway.
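
To make the trade-off concrete, here is a toy cost model in Python. It is only a sketch: the function names are made up for illustration, and the per-node compile time and per-module overhead are hypothetical placeholders, not the numbers measured earlier in this thread.

```python
# Toy model of the trade-off. All names and numbers are hypothetical
# placeholders for illustration; none of them are measurements.

def jit_cost_single_module(node_compile_ms):
    """One module holding everything: the whole plan is compiled up front."""
    return sum(node_compile_ms)


def jit_cost_lazy_modules(node_compile_ms, node_is_hot, per_module_overhead_ms):
    """One module per node, emitted only for nodes that are worth jitting.

    Each emitted module pays a fixed extra overhead on top of its compile time.
    """
    return sum(
        cost + per_module_overhead_ms
        for cost, hot in zip(node_compile_ms, node_is_hot)
        if hot
    )


if __name__ == "__main__":
    compile_ms = [5.0] * 10   # assumed: 10 nodes, 5 ms of compilation each
    overhead_ms = 0.5         # assumed: sub-ms extra cost per module

    # Best case for a single module: every node benefits from jitting.
    all_hot = [True] * 10
    # Worst case for a single module: only one node benefits.
    one_hot = [True] + [False] * 9

    print(jit_cost_single_module(compile_ms))                       # 50.0
    print(jit_cost_lazy_modules(compile_ms, all_hot, overhead_ms))  # 55.0
    print(jit_cost_lazy_modules(compile_ms, one_hot, overhead_ms))  # 5.5
```

Under these assumed numbers the per-module approach loses a small, bounded amount when everything is hot, and saves most of the compile time when almost nothing is.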

--

David Geier
 (ServiceNow)

On 7/4/22 22:23, Andres Freund wrote:

Hi,

On 2022-07-04 06:43:00 +0000, Luc Vlaming Hummel wrote:

Thanks for reviewing this and the interesting examples!

Wanted to give a bit of extra insight as to why I'd love to have a system that can lazily emit JIT code and hence creates roughly a module per function:
In the end I'm hoping that we can migrate to a system where we only JIT after a configurable cost has been exceeded for this node and a configurable number of rows has actually been processed.
Reason is that this would safeguard against some problematic planning issues
wrt JIT (node not being executed, row count being massively off).
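
The per-node trigger described above could be sketched roughly as follows. This is an illustrative Python model only: `NodeJitState`, `should_jit_now`, `process_row` and both thresholds are hypothetical names, not existing PostgreSQL structures or settings.

```python
from dataclasses import dataclass


@dataclass
class NodeJitState:
    """Hypothetical per-plan-node bookkeeping; not an actual PostgreSQL struct."""
    estimated_cost: float      # planner's cost estimate for this node
    rows_processed: int = 0    # rows the node has actually processed so far
    jitted: bool = False       # whether code was already emitted for this node


def should_jit_now(node, cost_threshold, rows_threshold):
    """True only once both thresholds are met, and at most once per node."""
    if node.jitted:
        return False
    return (node.estimated_cost > cost_threshold
            and node.rows_processed >= rows_threshold)


def process_row(node, cost_threshold, rows_threshold):
    """Count one row and switch the node to JIT the first time it qualifies."""
    node.rows_processed += 1
    if should_jit_now(node, cost_threshold, rows_threshold):
        node.jitted = True   # stand-in for lazily emitting this node's module
    return node.jitted
```

A node that is never executed never reaches `process_row`, so it is never compiled, and a node whose row estimate was far too high only gets compiled if the rows actually show up.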

I still don't see how it's viable to move to always doing function-by-function
emission overhead wise.

I also want to move to doing JIT in the background, triggered by actual
usage. But to me it seems a dead end to require moving to a
one-function-per-module model for that.

If this means we have to invest more in making it cheap(er) to emit modules,
I'm all for that.

I think that's just inherently more expensive and thus a no-go.

@Andres if there are any other things we ought to fix to make this cheap
(enough) compared to the previous code I'd love to know your thoughts.

I'm not seeing it.

Greetings,

Andres Freund
