Avoiding tuple construction/deconstruction during joining

Поиск
Список
Период
Сортировка
=?ISO-8859-2?Q?Miroslav_=A9ulc?= <miroslav.sulc@startnet.cz> writes:
> [ concerning a deeply nested LEFT JOIN to get data from a star schema ]

> So I have some results. I have tested the query on both PostgreSQL 8.0.1
> and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL
> result is 11,667.916 ms, MySQL result is 448.4 ms.

That's a fairly impressive discrepancy :-(, and even the slot_getattr()
patch that Atsushi Ogawa provided isn't going to close the gap.
(I got about a 4x speedup on Miroslav's example in my testing, which
leaves us still maybe 6x slower than MySQL.)

Looking at the post-patch profile for the test case, there is still
quite a lot of cycles going into tuple assembly and disassembly:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 24.47      4.49     4.49                             _mcount
  8.01      5.96     1.47  9143692     0.00     0.00  ExecEvalVar
  6.92      7.23     1.27  6614373     0.00     0.00  slot_deformtuple
  6.54      8.43     1.20  9143692     0.00     0.00  slot_getattr
  6.21      9.57     1.14   103737     0.01     0.03  ExecTargetList
  5.56     10.59     1.02   103775     0.01     0.01  DataFill
  3.22     11.18     0.59   103775     0.01     0.01  ComputeDataSize
  2.83     11.70     0.52                             ExecEvalVar
  2.72     12.20     0.50  9094122     0.00     0.00  memcpy
  2.51     12.66     0.46                             encore
  2.40     13.10     0.44   427448     0.00     0.00  nocachegetattr
  2.13     13.49     0.39   103775     0.00     0.02  heap_formtuple
  2.07     13.87     0.38                             noshlibs
  1.20     14.09     0.22   225329     0.00     0.00  _doprnt
  1.20     14.31     0.22                             msquadloop
  1.14     14.52     0.21                             chunks
  0.98     14.70     0.18   871885     0.00     0.00  AllocSetAlloc
  0.98     14.88     0.18                             $$dyncall
  0.76     15.02     0.14   594242     0.00     0.00  FunctionCall3
  0.71     15.15     0.13   213312     0.00     0.00  comparetup_heap
  0.65     15.27     0.12     6364     0.02     0.13  printtup
  0.60     15.38     0.11   790702     0.00     0.00  pfree

(_mcount is profiling overhead, ignore it.)  It looks to me like just
about everything in the top dozen functions is there as a result of the
fact that join steps form new tuples that are the merge of their input
tuples.  Even our favorite villains, palloc and pfree, are down in the
sub-percent range.

I am guessing that the reason MySQL wins on this is that they avoid
doing any data copying during a join step.  I wonder whether we could
accomplish the same by taking Ogawa's patch to the next level: allow
a TupleTableSlot to contain either a "materialized" tuple as now,
or a "virtual" tuple that is simply an array of Datums and null flags.
(It's virtual in the sense that any pass-by-reference Datums would have
to be pointing to data at the next level down.)  This would essentially
turn the formtuple and deformtuple operations into no-ops, and get rid
of a lot of the associated overhead such as ComputeDataSize and
DataFill.  The only operations that would have to forcibly materialize
a tuple would be ones that need to keep the tuple till after they fetch
their next input tuple --- hashing and sorting are examples, but very
many plan node types don't ever need to do that.

I haven't worked out the details, but it seems likely that this could be
a relatively nonintrusive patch.  The main thing that would be an issue
would be that direct reference to slot->val would become verboten (since
you could no longer be sure there was a materialized tuple there).
I think this would possibly affect some contrib stuff, which is a strong
hint that it'd break some existing user-written code out there.

Thoughts?

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Miroslav Šulc
Дата:
Сообщение: Re: How to read query plan
Следующее
От: Robert Treat
Дата:
Сообщение: Re: column name is "LIMIT"