fixed tuple descs (was JIT compiling expressions/deform)

From: Andres Freund
Subject: fixed tuple descs (was JIT compiling expressions/deform)
Msg-id: 20171206093717.vqdxe5icqttpxs3p@alap3.anarazel.de
In reply to: [HACKERS] JIT compiling expressions/deform + inlining prototype v2.0 (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

One part of the work to make JITing worth its while is JITing tuple
deforming. Deforming is currently often the biggest consumer of time,
and when it isn't, it is usually among the top entries.

My experimentation shows that JITing tuple deforming is primarily
beneficial when it happens as *part* of JIT compiling expressions. I'd
originally tried to JIT compile deforming inside heaptuple.c, and cache
the deforming program inside the tuple slot. That turns out not to work
very well, because a lot of tuple descriptors are very short-lived,
computed during ExecInitNode(). Even if that were not the case,
compiling deforming code on demand has significant downsides:
- it requires emitting code in smaller increments (whenever something
  new is deformed)
- because the generated code has to be generic for all potential
  callers, the number of branches needed to handle them is
  significant. If the deforming code is instead generated for a
  specific callsite, no branches for the number of to-be-deformed
  columns have to be generated. The primary remaining branches are
  then the ones checking for NULLs and the number of attributes in the
  tuple, and those can often be optimized away if NOT NULL columns are
  present.
- the call overhead is still noticeable
- the memory / function lifetime management is awkward.
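To make the branch argument above concrete, here is a minimal sketch in C contrasting a generic deformer with what code generated for one callsite might reduce to. Everything in it is hypothetical: `ToyTuple` is a made-up layout (attribute count, null bitmap, packed int32 values), not PostgreSQL's HeapTupleHeader, and `toy_deform_generic` / `toy_deform_cols012_notnull` are illustrative names, not functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy tuple layout, NOT PostgreSQL's HeapTupleHeader:
 * number of physically present attributes, a null bitmap (bit i set =>
 * attribute i has a value), then the int32 values of the non-NULL
 * attributes packed in order. */
typedef struct ToyTuple
{
    int      natts;
    uint32_t nullbits;
    int32_t  data[8];
} ToyTuple;

/* Generic deformer: it has to serve every caller, so it loops over the
 * number of requested columns and branches on the number of attributes
 * physically present and on NULLs. Assumes nwanted <= 8. */
static void
toy_deform_generic(const ToyTuple *tup, int nwanted,
                   int32_t *values, bool *isnull)
{
    int off = 0;    /* index of the next stored value in data[] */

    for (int i = 0; i < nwanted; i++)
    {
        bool present = i < tup->natts && (tup->nullbits & (1u << i));

        values[i] = present ? tup->data[off++] : 0;
        isnull[i] = !present;
    }
}

/* What code emitted for ONE callsite might look like, assuming that
 * callsite needs exactly columns 0..2 and (in this toy model) all three
 * are NOT NULL and thus always stored: no loop, no natts or NULL checks. */
static void
toy_deform_cols012_notnull(const ToyTuple *tup, int32_t *values, bool *isnull)
{
    values[0] = tup->data[0]; isnull[0] = false;
    values[1] = tup->data[1]; isnull[1] = false;
    values[2] = tup->data[2]; isnull[2] = false;
}
```

The specialized variant is only correct under its stated assumptions; the point is that a per-callsite code generator knows those facts and can drop the loop and the checks, while a generic routine cannot.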

If deforming is instead JIT compiled as part of expression compilation,
we can emit all the necessary code for the whole plan tree during
executor startup, in one go. More importantly, LLVM's optimizer is free
to inline the deforming code into the expression code, often yielding
noticeable speedups (although that could still be improved).
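The benefit of inlining can be sketched by hand: below, the qual `col0 + col2 > 100` is evaluated once by deforming into `values`/`isnull` arrays and then reading them back, and once in the "fused" form an optimizer could plausibly reach after inlining. This is a hypothetical C illustration on the same made-up `ToyTuple` layout; `toy_qual_unfused` and `toy_qual_fused_notnull` are invented names, and the fused version assumes columns 0..2 are NOT NULL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy layout: attribute count, null bitmap, packed int32s. */
typedef struct ToyTuple { int natts; uint32_t nullbits; int32_t data[8]; } ToyTuple;

/* Unfused: deform columns 0..2 into arrays (an out-of-line call in a
 * real system), then evaluate "col0 + col2 > 100" from those arrays. */
static bool
toy_qual_unfused(const ToyTuple *tup)
{
    int32_t values[3];
    bool    isnull[3];
    int     off = 0;

    for (int i = 0; i < 3; i++)
    {
        bool present = i < tup->natts && (tup->nullbits & (1u << i));

        values[i] = present ? tup->data[off++] : 0;
        isnull[i] = !present;
    }
    if (isnull[0] || isnull[2])
        return false;   /* NULL result: treated as "not satisfied", as in WHERE */
    return values[0] + values[2] > 100;
}

/* Fused: deforming inlined into the expression, assuming columns 0..2
 * are NOT NULL. The arrays disappear; only the two needed values load. */
static bool
toy_qual_fused_notnull(const ToyTuple *tup)
{
    return tup->data[0] + tup->data[2] > 100;
}
```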

To allow JITing at ExecReadyExpr() time, we need to know the tuple
descriptor an EEOP_{INNER,OUTER,SCAN}_FETCHSOME step refers to. There
are currently three major impediments to that.

1) At a lot of ExecInitExpr() callsites the tupledescs for inner, outer
   and scan aren't known yet. Therefore that code needs to be reordered
   so that we (if applicable):
   a) initialize subsidiary nodes, thereby determining the left/right
      (inner/outer) tupledescs
   b) initialize the scan tuple desc, which often refers to a)
   c) determine the result tuple desc, required to build the projection
   d) build projections
   e) build expressions

   Attached is a patch doing so. Currently it only applies on top of a
   few preliminary patches, but that could easily be reordered.

   The patch is relatively large, as I decided to try to make the
   different ExecInitNode functions look a bit more similar. There are
   some judgement calls involved, but I think the result looks a good
   bit better, regardless of the later need.

   I'm not really happy with the preexisting split of functions between
   execScan.c, execTuples.c and execUtils.c. I wonder whether most of
   them, except the low-level slot functions, should be moved to
   execUtils.c; I think that would be clearer. There seems to be no
   justification for execScan.c to contain
   ExecAssignScanProjectionInfo[WithVarno].

2) TupleSlots need to describe whether they'll have a fixed tupledesc
   for their entire lifetime, or whether they can change their nature.
   Most places never need to change a slot's identity, but in a few
   places it's quite convenient.

   I've introduced the notion that a slot can be marked as having a
   "fixed" tupledesc, by passing the tupledesc at slot creation. That
   also gains a bit of efficiency (less memory management overhead,
   higher cache hit ratio) because the slot, tts_values and tts_isnull
   can be allocated in one chunk.

3) At expression initialization time we need to figure out which slots
   (or just descs) INNER/OUTER/SCAN refer to. I've solved that by
   looking up inner/outer/scan via the provided parent node, which
   required adding a new field to store the scan slot.

   Currently no expression initialized with a parent node has an
   INNER/OUTER/SCAN slot + desc that doesn't refer to the relevant node,
   but I'm not sure I like that as a requirement.
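Points 1) to 3) can be tied together in one small model: nodes are initialized in the order a) to e), a slot created with a desc is "fixed" and gets its values/isnull arrays in the same allocation, and an expression resolves INNER/OUTER/SCAN descs through its parent node. This is a hypothetical C sketch; `ToyDesc`, `ToySlot`, `ToyPlanState`, `toy_make_slot`, `toy_fetch_desc` and the init functions are invented stand-ins, not PostgreSQL's TupleDesc/TupleTableSlot/PlanState API or the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, NOT PostgreSQL's real structs. */
typedef struct ToyDesc { int natts; } ToyDesc;

typedef struct ToySlot
{
    const ToyDesc *desc;
    bool     fixed_desc;    /* true => desc never changes */
    int64_t *values;
    bool    *isnull;
    int64_t  storage[];     /* values[] then isnull[] for fixed slots */
} ToySlot;

/* Passing a desc at creation marks the slot fixed, so the header,
 * values[] and isnull[] can share a single allocation. */
static ToySlot *
toy_make_slot(const ToyDesc *desc)
{
    size_t   n = desc ? (size_t) desc->natts : 0;
    ToySlot *slot = malloc(sizeof(ToySlot) + n * (sizeof(int64_t) + sizeof(bool)));

    if (slot == NULL)
        return NULL;
    slot->desc = desc;
    slot->fixed_desc = (desc != NULL);
    /* Non-fixed slots would get arrays once a desc is assigned (omitted). */
    slot->values = desc ? slot->storage : NULL;
    slot->isnull = desc ? (bool *) (slot->storage + n) : NULL;
    return slot;
}

typedef struct ToyPlanState
{
    struct ToyPlanState *lefttree;   /* outer child */
    struct ToyPlanState *righttree;  /* inner child */
    ToySlot *scanslot;               /* the extra field for the scan slot */
    ToySlot *resultslot;
} ToyPlanState;

typedef enum { TOY_INNER, TOY_OUTER, TOY_SCAN } ToyVarSide;

/* Resolve the desc a FETCHSOME-like step would need via the parent node.
 * NULL means it isn't known yet (e.g. wrong initialization order). */
static const ToyDesc *
toy_fetch_desc(const ToyPlanState *parent, ToyVarSide side)
{
    const ToyPlanState *child = NULL;
    const ToySlot      *slot = NULL;

    switch (side)
    {
        case TOY_OUTER: child = parent->lefttree; break;
        case TOY_INNER: child = parent->righttree; break;
        case TOY_SCAN:  slot = parent->scanslot; break;
    }
    if (child != NULL)
        slot = child->resultslot;
    return slot ? slot->desc : NULL;
}

static void
toy_init_scan(ToyPlanState *node, const ToyDesc *reldesc)
{
    node->lefttree = node->righttree = NULL;    /* a) no children */
    node->scanslot = toy_make_slot(reldesc);    /* b) scan desc */
    node->resultslot = toy_make_slot(reldesc);  /* c) result = scan desc */
}

static void
toy_init_join(ToyPlanState *node, ToyPlanState *outer, ToyPlanState *inner,
              const ToyDesc *resultdesc)
{
    node->lefttree = outer;                     /* a) children, already set up */
    node->righttree = inner;
    node->scanslot = NULL;                      /* b) no scan tuple for a join */
    node->resultslot = toy_make_slot(resultdesc);   /* c) result desc */
    /* d) + e): projections and quals can now resolve their descs */
    assert(toy_fetch_desc(node, TOY_OUTER) != NULL);
    assert(toy_fetch_desc(node, TOY_INNER) != NULL);
}
```

In this model, building an expression before step a) would make `toy_fetch_desc` return NULL, which is exactly the situation the reordering in 1) is meant to rule out.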


Attached is a patch that implements 1 + 2. I'd welcome a quick look
through it. It currently only applies on top of a few other recently
submitted patches, but it'd only be an hour's work or so to reorder
that.

Comments about either the outline above or the patch?

Regards,

Andres
