Discussion: Default JIT setting in V12


Default JIT setting in V12

From: Jeff Janes
Date:
Since JIT is on by default in v12, I wanted to revisit the issue raised in https://www.postgresql.org/message-id/CAMkU=1zVhQ5k5d=YyHNyrigLUNTkOj4=YB17s9--3ts8H-SO=Q@mail.gmail.com

When the total estimated cost is between jit_above_cost and jit_optimize_above_cost, I get a substantial regression in the attached.  Note that I did not devise this case specifically to cause this problem; I just stumbled upon it.

JIT, no optimization: 10.5s
JIT, optimization: 3.8s
no JIT:  4.1s

It seems like the unoptimized JIT code is much worse than the general purpose code.
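
For reference, the "gap" comes from the planner checking the plan's total cost against two separate thresholds.  Below is a standalone paraphrase of that decision with the v12 default values baked in; it is a simplified sketch for illustration (flag names as in src/include/jit/jit.h, values here illustrative), not the actual standard_planner() code.

/*
 * Simplified, standalone paraphrase of the v12 JIT flag selection.
 * The real logic lives in standard_planner(); the defaults shown are the
 * shipped ones: jit_above_cost = 100000, jit_optimize_above_cost = 500000.
 */
#include <stdbool.h>
#include <stdio.h>

#define PGJIT_NONE      0
#define PGJIT_PERFORM   (1 << 0)        /* JIT compilation requested */
#define PGJIT_OPT3      (1 << 1)        /* run the expensive -O3-style pipeline */

static const bool   jit_enabled = true;                 /* on by default in v12 */
static const double jit_above_cost = 100000.0;
static const double jit_optimize_above_cost = 500000.0;

static int
jit_flags_for_cost(double total_cost)
{
    int         flags = PGJIT_NONE;

    if (jit_enabled && total_cost > jit_above_cost)
    {
        flags |= PGJIT_PERFORM;
        /* Only plans above the second threshold get optimized code. */
        if (total_cost > jit_optimize_above_cost)
            flags |= PGJIT_OPT3;
    }
    return flags;
}

int
main(void)
{
    /*
     * A plan costed between the two thresholds is compiled but never
     * optimized - exactly the configuration that regresses here.
     */
    double      cost = 250000.0;
    int         flags = jit_flags_for_cost(cost);

    printf("perform=%d opt3=%d\n",
           (flags & PGJIT_PERFORM) != 0, (flags & PGJIT_OPT3) != 0);
    return 0;
}

With the v12 defaults, that is any plan costed between 100000 and 500000, which is where the regression above shows up.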

This is on AWS c4.large, Ubuntu 18.04, installed from the PGDG apt repository.  No config changes were made, other than the local ones included in the script.  (Previously there were questions about how LLVM was configured, which I couldn't really answer well, but here there should be no question, as I didn't compile or configure it at all.)
 
There were some proposed mitigations in sister threads, but none have been adopted in v12.

I think it is intuitive, and supported by empirical evidence, that we do not want to JIT compile at all unless we are going to optimize the compiled code.

Is there a rationale, or are there other examples to show, that it makes sense for the default value of jit_optimize_above_cost to be 5-fold higher than the default setting of jit_above_cost?

I think these defaults are setting a trap for our users who aren't really interested in JIT and are just upgrading to stay on the most current version.  I would propose lowering the default jit_optimize_above_cost to be the same as jit_above_cost, or setting it to 0 so that jit_above_cost is always in control and always optimizes.

Cheers,

Jeff

Attachments

Re: Default JIT setting in V12

From: Andres Freund
Date:
Hi,

On 2019-09-04 09:56:28 -0400, Jeff Janes wrote:
> I think it is intuitive, and supported by empirical evidence, that we do not
> want to JIT compile at all unless we are going to optimize the compiled code.

There's pretty clear counter-evidence however as well :(

I think it's probably more sensible to use some cheap minimal
optimization for the "unoptimized" mode - because there are some
non-linear cost algorithms with full optimizations enabled.

How does your example look with something like:

diff --git i/src/backend/jit/llvm/llvmjit.c w/src/backend/jit/llvm/llvmjit.c
index 82c4afb7011..85ddae2ea2b 100644
--- i/src/backend/jit/llvm/llvmjit.c
+++ w/src/backend/jit/llvm/llvmjit.c
@@ -428,7 +428,7 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
     if (context->base.flags & PGJIT_OPT3)
         compile_optlevel = 3;
     else
-        compile_optlevel = 0;
+        compile_optlevel = 1;
 
     /*
      * Have to create a new pass manager builder every pass through, as the

which I think - but I'd have to check - doesn't include any of the
non-linear cost optimizations.
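
For anyone reading along: here is roughly the kind of pipeline that value feeds into.  This is a simplified sketch in the LLVM C API of what llvm_optimize_module() does with compile_optlevel, not the actual PostgreSQL source:

/*
 * Simplified sketch: the opt level handed to LLVM's PassManagerBuilder
 * decides which IR passes get scheduled (0 = almost nothing, 1 = the cheap
 * passes, 3 = the full, sometimes super-linear, pipeline).
 */
#include <llvm-c/Core.h>
#include <llvm-c/Transforms/PassManagerBuilder.h>

static void
optimize_module_sketch(LLVMModuleRef mod, unsigned compile_optlevel)
{
    LLVMPassManagerBuilderRef pmb = LLVMPassManagerBuilderCreate();
    LLVMPassManagerRef module_pm = LLVMCreatePassManager();
    LLVMPassManagerRef func_pm = LLVMCreateFunctionPassManagerForModule(mod);

    /* This is the knob the one-line patch above flips from 0 to 1. */
    LLVMPassManagerBuilderSetOptLevel(pmb, compile_optlevel);

    /* Let LLVM schedule the per-function and whole-module passes for that level. */
    LLVMPassManagerBuilderPopulateFunctionPassManager(pmb, func_pm);
    LLVMPassManagerBuilderPopulateModulePassManager(pmb, module_pm);

    /* Run the per-function passes ... */
    LLVMInitializeFunctionPassManager(func_pm);
    for (LLVMValueRef fn = LLVMGetFirstFunction(mod);
         fn != NULL;
         fn = LLVMGetNextFunction(fn))
        LLVMRunFunctionPassManager(func_pm, fn);
    LLVMFinalizeFunctionPassManager(func_pm);

    /* ... then the module-level ones. */
    LLVMRunPassManager(module_pm, mod);

    LLVMDisposePassManager(func_pm);
    LLVMDisposePassManager(module_pm);
    LLVMPassManagerBuilderDispose(pmb);
}

At level 0 the builder schedules almost nothing, which matches the "unoptimized JIT is worse than the general-purpose code" observation upthread; level 1 pulls in cheap per-function cleanups without the expensive passes.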


> Is there a rationale, or are there other examples to show, that it makes
> sense for the default value of jit_optimize_above_cost to be 5-fold higher
> than the default setting of jit_above_cost?

Yes. IIRC even TPC-H or something shows that with small scale one does
get noticeable - but not crazy - speedups with unoptimized code, but that
it's a loss with optimized code.

Greetings,

Andres Freund



Re: Default JIT setting in V12

From: Andres Freund
Date:
Hi,

On 2019-09-04 07:51:16 -0700, Andres Freund wrote:
> On 2019-09-04 09:56:28 -0400, Jeff Janes wrote:
> > I think it is intuitive, and supported by empirical evidence, that we do not
> > want to JIT compile at all unless we are going to optimize the compiled code.
> 
> There's pretty clear counter-evidence however as well :(
> 
> I think it's probably more sensible to use some cheap minimal
> optimization for the "unoptimized" mode - because there are some
> non-linear cost algorithms with full optimizations enabled.
> 
> How does your example look with something like:
> 
> diff --git i/src/backend/jit/llvm/llvmjit.c w/src/backend/jit/llvm/llvmjit.c
> index 82c4afb7011..85ddae2ea2b 100644
> --- i/src/backend/jit/llvm/llvmjit.c
> +++ w/src/backend/jit/llvm/llvmjit.c
> @@ -428,7 +428,7 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
>      if (context->base.flags & PGJIT_OPT3)
>          compile_optlevel = 3;
>      else
> -        compile_optlevel = 0;
> +        compile_optlevel = 1;
>  
>      /*
>       * Have to create a new pass manager builder every pass through, as the
> 
> which I think - but I'd have to check - doesn't include any of the
> non-linear cost optimizations.

Or better, something slightly more complete, like the attached (which
affects both the code-gen time optimizations (which are more like peephole
ones) and the function/global ones that are cheap).
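
The attached patch itself is not reproduced here.  Purely as an illustration of the two knobs described above (an assumed shape, not the attached patch): the code-gen-time optimization level chosen when the TargetMachine is created, and the cheap IR-level function/global passes scheduled through the PassManagerBuilder.

#include <llvm-c/Core.h>
#include <llvm-c/TargetMachine.h>
#include <llvm-c/Transforms/PassManagerBuilder.h>

/*
 * Code-gen-time knob: pick a cheaper-but-not-zero codegen optimization
 * level for the target machine used by the "unoptimized" mode.
 */
static LLVMTargetMachineRef
create_cheap_targetmachine(LLVMTargetRef target, const char *triple,
                           const char *cpu, const char *features)
{
    return LLVMCreateTargetMachine(target, triple, cpu, features,
                                   LLVMCodeGenLevelLess,    /* instead of None */
                                   LLVMRelocDefault,
                                   LLVMCodeModelJITDefault);
}

/*
 * IR-level knob: let the PassManagerBuilder schedule only the cheap
 * function/global passes (opt level 1), avoiding the super-linear ones.
 */
static void
populate_cheap_ir_passes(LLVMPassManagerRef func_pm,
                         LLVMPassManagerRef module_pm)
{
    LLVMPassManagerBuilderRef pmb = LLVMPassManagerBuilderCreate();

    LLVMPassManagerBuilderSetOptLevel(pmb, 1);
    LLVMPassManagerBuilderPopulateFunctionPassManager(pmb, func_pm);
    LLVMPassManagerBuilderPopulateModulePassManager(pmb, module_pm);
    LLVMPassManagerBuilderDispose(pmb);
}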

Greetings,

Andres Freund

Attachments

Re: Default JIT setting in V12

From: Jeff Janes
Date:
On Wed, Sep 4, 2019 at 11:24 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-09-04 07:51:16 -0700, Andres Freund wrote:
 
Or better, something slightly more complete, like the attached (which
affects both the code-gen time optimizations (which are more like peephole
ones) and the function/global ones that are cheap).

Yes, that does completely solve the issue I raised.  It makes JIT either better or at least harmless, even when falling into the gap between jit_above_cost and jit_optimize_above_cost.

What TPC-H implementation do you use/recommend?  This one https://wiki.postgresql.org/wiki/DBT-3?

Cheers,

Jeff

Re: Default JIT setting in V12

From: Soumyadeep Chakraborty
Date:
Hello,

Based on this thread, Alexandra and I decided to investigate whether we could borrow
some passes from -O1 and add them on top of the default optimization of -O0 plus mem2reg.
To determine which passes would make the most sense, we ran ICW with jit_above_cost
set to 0, dumped all the backends, and then analyzed them with 'opt'.  The dumped stats
showed that the instcombine pass and sroa had the most scope for optimization.  We have
attached the stats we dumped.

Then, we investigated whether mixing in sroa and instcombine gave us a better
run time.  We used TPC-H Q1 (the TPC-H repo we used:
https://github.com/dimitri/tpch-citus) at scales of 1, 5 and 50.  We found that
there was no significant difference in query runtime over the default of -O0
with mem2reg.
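
As an aside, one way a "-O0 + mem2reg + sroa + instcombine" configuration can be wired up with the LLVM C API is to skip the PassManagerBuilder and add the individual passes by hand.  The sketch below is only an illustration of the configurations named above, not the code used for the measurements in this message:

#include <llvm-c/Core.h>
#include <llvm-c/Transforms/InstCombine.h>
#include <llvm-c/Transforms/Scalar.h>

/*
 * Hand-picked pass list approximating "-O0 + mem2reg + sroa + instcombine":
 * run just those three function passes over every function in the module.
 */
static void
run_handpicked_passes(LLVMModuleRef mod)
{
    LLVMPassManagerRef func_pm = LLVMCreateFunctionPassManagerForModule(mod);

    LLVMAddPromoteMemoryToRegisterPass(func_pm);   /* mem2reg */
    LLVMAddScalarReplAggregatesPass(func_pm);      /* sroa */
    LLVMAddInstructionCombiningPass(func_pm);      /* instcombine */

    LLVMInitializeFunctionPassManager(func_pm);
    for (LLVMValueRef fn = LLVMGetFirstFunction(mod);
         fn != NULL;
         fn = LLVMGetNextFunction(fn))
        LLVMRunFunctionPassManager(func_pm, fn);
    LLVMFinalizeFunctionPassManager(func_pm);

    LLVMDisposePassManager(func_pm);
}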

We also performed the same experiment with -O1 as the default
optimization level, as Andres had suggested on this thread.  We found
that the results were much more promising (refer to the results for scale
= 5 and 50 below).  At the lower scale of 1, we had to force optimization,
as the query cost did not meet the threshold.  There was no adverse impact
from increased query optimization time due to the ramp-up to -O1 at this
lower scale.


Results summary (eyeball-averaged over 5 runs, excluding the first run after
restart.  For each configuration we flushed the OS cache and restarted the
database):

settings: max_parallel_workers_per_gather = 0

scale = 50:
-O3                                : 77s
-O0 + mem2reg                      : 107s
-O0 + mem2reg + instcombine        : 107s
-O0 + mem2reg + sroa               : 107s
-O0 + mem2reg + sroa + instcombine : 107s
-O1                                : 84s

scale = 5:
-O3                                : 8s
-O0 + mem2reg                      : 10s
-O0 + mem2reg + instcombine        : 10s
-O0 + mem2reg + sroa               : 10s
-O0 + mem2reg + sroa + instcombine : 10s
-O1                                : 8s

scale = 1:
-O3                                : 1.7s
-O0 + mem2reg                      : 1.7s
-O0 + mem2reg + instcombine        : 1.7s
-O0 + mem2reg + sroa               : 1.7s
-O0 + mem2reg + sroa + instcombine : 1.7s
-O1                                : 1.7s

Based on the evidence above, maybe it is worth considering ramping up the
default optimization level to -O1.

Regards,

Soumyadeep and Alexandra
Attachments