[HACKERS] JIT compiling expressions/deform + inlining prototype v2.0

From: Andres Freund
Subject: [HACKERS] JIT compiling expressions/deform + inlining prototype v2.0
Msg-id: 20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
Responses: Re: [HACKERS] JIT & function naming  (Andres Freund <andres@anarazel.de>)
Re: [HACKERS] JIT compiling expressions/deform + inlining prototypev2.0  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Re: [HACKERS] JIT compiling - v4.0  (Andres Freund <andres@anarazel.de>)
fixed tuple descs (was JIT compiling expressions/deform)  (Andres Freund <andres@anarazel.de>)
JIT compiling with LLVM v9.0  (Andres Freund <andres@anarazel.de>)
Re: JIT compiling with LLVM v10.0  (Andres Freund <andres@anarazel.de>)
JIT compiling with LLVM v11  (Andres Freund <andres@anarazel.de>)
JIT compiling with LLVM v12  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

I previously posted an early prototype [1] of JITing expression
evaluation and tuple deforming.  I've worked a lot on this since then.

Here's an initial, not really pretty but functional, submission. This
supports all types of expressions and tuples, and allows, albeit with
some drawbacks, inlining of builtin functions.  Between the version at
[1] and this one I did some work in C++, because that allowed me to
experiment more with LLVM, but I've now translated everything back.
I had to re-implement some features due to limitations of the C API.

As a teaser:
tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql

┌──────────────┬──────────────┬───────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┬────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │  sum_qty  │  sum_base_price  │  sum_disc_price  │    sum_charge    │     avg_qty      │    avg_price     │      avg_disc      │ count_order │
├──────────────┼──────────────┼───────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────────────┼─────────────┤
│ A            │ F            │ 188818373 │ 283107483036.109 │ 268952035589.054 │  279714361804.23 │ 25.5025937044707 │ 38237.6725307617 │ 0.0499976863510723 │     7403889 │
│ N            │ F            │   4913382 │ 7364213967.94998 │  6995782725.6633 │ 7275821143.98952 │ 25.5321530459003 │ 38267.7833908406 │ 0.0500308669240696 │      192439 │
│ N            │ O            │ 375088356 │ 562442339707.852 │ 534321895537.884 │ 555701690243.972 │ 25.4978961033505 │ 38233.9150565265 │ 0.0499956453049625 │    14710561 │
│ R            │ F            │ 188960009 │ 283310887148.206 │ 269147687267.211 │ 279912972474.866 │ 25.5132328961366 │ 38252.4148049933 │ 0.0499958481590264 │     7406353 │
└──────────────┴──────────────┴───────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┴────────────────────┴─────────────┘
(4 rows)

Time: 4367.486 ms (00:04.367)
tpch_5[9586][1]=# set jit_expressions=1;set jit_tuple_deforming=1;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)

Time: 3158.575 ms (00:03.159)

tpch_5[9586][1]=# set jit_expressions=0;set jit_tuple_deforming=0;
tpch_5[9586][1]=# \i ~/tmp/tpch/pg-tpch/queries/q01.sql
<repeat>
(4 rows)
Time: 4383.562 ms (00:04.384)
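
To put the timings above in relative terms, here is a small sketch of
the arithmetic. The helper names are made up for illustration and are
not part of the patch series.

```c
#include <stdio.h>

/* Ratio of the non-JIT runtime to the JIT runtime (higher is better). */
double speedup(double baseline_ms, double jit_ms)
{
    return baseline_ms / jit_ms;
}

/* Fraction of the baseline runtime that disappears with JIT enabled. */
double time_saved(double baseline_ms, double jit_ms)
{
    return (baseline_ms - jit_ms) / baseline_ms;
}

/* Print both figures for one pair of measurements. */
void report(double baseline_ms, double jit_ms)
{
    printf("speedup: %.2fx, time saved: %.1f%%\n",
           speedup(baseline_ms, jit_ms),
           100.0 * time_saved(baseline_ms, jit_ms));
}
```

Calling report(4367.486, 3158.575) with the first two runs above gives
a speedup of roughly 1.38x, i.e. about 28% less runtime for Q01.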

The potential wins of the JITing itself are considerably larger than the
already significant gains demonstrated above - this version here doesn't
exactly generate the nicest native code around.  After these patches the
bottlenecks for TPC-H's Q01 are largely inside the float* functions and
the non-expressified execGrouping.c code.  The latter needs to be
expressified to benefit from JIT - that shouldn't be very hard.

The code generation can be improved by moving more of the variable data
into LLVM-allocated stack data; that also has other benefits.
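
As a rough analogy in plain C (not the patch's LLVM code), the
difference is similar to accumulating through a pointer into shared
state versus accumulating in a local variable. ToyState and both
function names are hypothetical. In the first variant the compiler
generally has to assume the store may alias the input array, so it
tends to write the result back to memory on every iteration. In the
second the accumulator can live in a register.

```c
#include <stddef.h>

/* Hypothetical stand-in for per-expression state kept in heap memory. */
typedef struct ToyState
{
    double result;
} ToyState;

/* Accumulates directly in memory reachable through a pointer. */
void sum_via_state(ToyState *state, const double *vals, size_t n)
{
    state->result = 0.0;
    for (size_t i = 0; i < n; i++)
        state->result += vals[i];
}

/* Accumulates in a local (stack) variable and stores once at the end. */
void sum_via_local(ToyState *state, const double *vals, size_t n)
{
    double acc = 0.0;

    for (size_t i = 0; i < n; i++)
        acc += vals[i];
    state->result = acc;
}
```

Both functions compute the same sum; only the freedom given to the
optimizer differs.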

The patch series currently consists of the following:

0001-Rely-on-executor-utils-to-build-targetlist-for-DML-R.patch
- boring prep work

0002-WIP-Allow-tupleslots-to-have-a-fixed-tupledesc-use-i.patch
- for JITed deforming we need to know whether a slot's tupledesc will
  change

0003-WIP-Add-configure-infrastructure-to-enable-LLVM.patch
- boring

0004-WIP-Beginning-of-a-LLVM-JIT-infrastructure.patch
- infrastructure for llvm, including memory lifetime management, and
  bulk emission of functions.

0005-Perform-slot-validity-checks-in-a-separate-pass-over.patch
- boring, prep work for expression jiting

0006-WIP-deduplicate-int-float-overflow-handling-code.patch
- boring

0007-Pass-through-PlanState-parent-to-expression-instanti.patch
- boring

0008-WIP-JIT-compile-expression.patch
- that's the biggest patch, actually adding JITing
- code needs to be better documented, tested, and deduplicated

0009-Simplify-aggregate-code-a-bit.patch
0010-More-efficient-AggState-pertrans-iteration.patch
0011-Avoid-dereferencing-tts_values-nulls-repeatedly.patch
0012-Centralize-slot-deforming-logic-a-bit.patch
- boring, mostly to make the comparison between JITed and non-JITed a
  bit fairer and to remove unnecessary other bottlenecks.

0013-WIP-Make-scan-desc-available-for-all-PlanStates.patch
- this isn't clean enough.

0014-WIP-JITed-tuple-deforming.patch
- do JITing of deforming, but only when called from within an
  expression, because there we know which columns we want to be
  deformed etc.

- Not clear what'd be a good way to also JIT other deforming without
  additional infrastructure - doing a separate function emission for
  every slot_deform_tuple() is unattractive both performance-wise and
  memory-lifetime-wise. I did have that at first.

0015-WIP-Expression-based-agg-transition.patch
- allows JITing of aggregate transition invocation, but also speeds up
  aggregates without JIT.

0016-Hacky-Preliminary-inlining-implementation.patch
- allows inlining of functions by using bitcode. That bitcode can be
  loaded from a list of directories - as long as it is compatibly
  configured, the bitcode doesn't have to be generated by the same
  compiler as the postgres binary, i.e. gcc postgres + clang bitcode
  works.
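
To illustrate why 0002 and 0014 go together, here is a toy sketch of
what specializing deforming for a fixed descriptor buys. None of this
is PostgreSQL's real tuple format or API: ToyAttr, the function names
and the packed layout (no NULLs, no alignment padding, no varlena) are
assumptions made purely for illustration. The generic routine looks up
each column's width at run time. The specialized one shows the shape
of code that could be emitted once the descriptor is known to be
(int4, int8, int4) and the expression only needs the first two columns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified column descriptor. */
typedef struct ToyAttr
{
    size_t len;             /* fixed width in bytes: 4 or 8 */
} ToyAttr;

/* Generic path: widths and offsets are determined per column at run time. */
void toy_deform_generic(const unsigned char *data, const ToyAttr *attrs,
                        int natts, int64_t *values)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++)
    {
        if (attrs[i].len == 4)
        {
            int32_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        else
        {
            int64_t v;

            memcpy(&v, data + off, sizeof(v));
            values[i] = v;
        }
        off += attrs[i].len;
    }
}

/*
 * Specialized path for a fixed (int4, int8, int4) descriptor where only
 * the first two columns are needed: no loop, no width checks, constant
 * offsets.
 */
void toy_deform_int4_int8(const unsigned char *data, int64_t *values)
{
    int32_t c0;
    int64_t c1;

    memcpy(&c0, data, sizeof(c0));
    memcpy(&c1, data + 4, sizeof(c1));
    values[0] = c0;
    values[1] = c1;
}
```

The specialized variant is only valid while the slot's descriptor
cannot change, which is the guarantee 0002 is meant to provide.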

I've whacked this around quite heavily today, so this likely has some
new bugs, sorry for that :(


I plan to spend some considerable time over the next weeks to clean this
up and address some of the areas where the performance isn't yet as good
as desirable.


Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20161206034955.bh33paeralxbtluv%40alap3.anarazel.de

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
