Re: asynchronous and vectorized execution

From: Robert Haas
Subject: Re: asynchronous and vectorized execution
Msg-id: CA+TgmobwSEndyr669qpyN_u4XhkPo_2C1BAs2ydb22T4niv_aQ@mail.gmail.com
In reply to: Re: asynchronous and vectorized execution (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Tue, May 10, 2016 at 8:23 PM, Andres Freund <andres@anarazel.de> wrote:
>> c. Modify some nodes (perhaps start with nodeAgg.c) to allow them to
>> process a batch TupleTableSlot. This will require some tight loop to
>> aggregate the entire TupleTableSlot at once before returning.
>> d. Add function in execAmi.c which returns true or false depending on
>> if the node supports batch TupleTableSlots or not.
>> e. At executor startup determine if the entire plan tree supports
>> batch TupleTableSlots, if so enable batch scan mode.
>
> It doesn't really need to be the entire tree. Even if you have a subtree
> (say a parametrized index nested loop join) which doesn't support batch
> mode, you'll likely still see performance benefits by building a batch
> one layer above the non-batch-supporting node.

+1.
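
To make (d) and (e) a bit more concrete -- and taking the point that the
decision can be made per subtree rather than for the whole plan -- I'm
imagining something roughly like the sketch below.  None of this exists
today; ExecSupportsBatchSlots() and ps_batch_mode are made-up names.

static bool
ExecSupportsBatchSlots(Plan *plan)
{
    if (plan == NULL)
        return true;

    switch (nodeTag(plan))
    {
            /* nodes that have been taught to produce/consume batches */
        case T_SeqScan:
        case T_Agg:
            return ExecSupportsBatchSlots(plan->lefttree) &&
                ExecSupportsBatchSlots(plan->righttree);

        default:
            /* everything else stays one-tuple-at-a-time */
            return false;
    }
}

Then at executor startup each node could do something like

    planstate->ps_batch_mode = ExecSupportsBatchSlots(planstate->plan);

so that a batch-capable subtree still runs in batch mode even when some
ancestor -- say a parametrized index nestloop -- can't.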

I've also wondered about building a new executor node that is sort of
a combination of Nested Loop and Hash Join, but capable of performing
multiple joins in a single operation. (Merge Join is different,
because it's actually matching up the two sides, not just doing
probing once per outer tuple.) So the plan tree would look something
like this:

Multiway Join
-> Seq Scan on driving_table
-> Index Scan on something
-> Index Scan on something_else
-> Hash
   -> Seq Scan on other_thing
-> Hash
   -> Seq Scan on other_thing_2
-> Index Scan on another_one

With the current structure, every level of the plan tree has its own
TupleTableSlot and we have to project into each new slot.  Every level
has to go through ExecProcNode.  So it seems to me that this sort of
structure might save quite a few cycles on deep join nests.  I haven't
tried it, though.
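
Very roughly, I'd expect the guts of such a node to be one loop over the
join steps instead of a stack of ExecProcNode() calls.  Sketch only --
the MultiwayJoin types and the probe callback are invented, and I'm
hand-waving over ExecProject()'s exact signature:

typedef struct MultiwayJoinStep
{
    /* probes a hash table or an index, depending on the step */
    TupleTableSlot *(*probe) (struct MultiwayJoinStep *step,
                              TupleTableSlot *driving);
    TupleTableSlot *match;
} MultiwayJoinStep;

typedef struct MultiwayJoinState
{
    PlanState   ps;             /* standard executor node header */
    int         nsteps;
    MultiwayJoinStep *steps;
} MultiwayJoinState;

static TupleTableSlot *
MultiwayJoinOne(MultiwayJoinState *mjstate, TupleTableSlot *driving)
{
    int     i;

    for (i = 0; i < mjstate->nsteps; i++)
    {
        MultiwayJoinStep *step = &mjstate->steps[i];

        step->match = step->probe(step, driving);

        /* semi-join style: no match means no output for this tuple */
        if (step->match == NULL)
            return NULL;
    }

    /* project once at the top, not once per join level */
    return ExecProject(mjstate->ps.ps_ProjInfo);
}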

Batching would make this sort of node even more attractive.
Assuming the joins are all basically semi-joins, either because they
were written that way or because they are probing unique indexes or
whatever, you can fetch a batch of tuples from the driving table, do
the first join for each tuple to create a matching batch of tuples,
and repeat for each join step.  Then at the end you project.
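
In batched form the loop inverts: instead of pushing one driving tuple
through every step, you push the whole batch through one step at a time,
so each probe runs in a tight loop.  Same caveat as above -- all names
are invented, and I'm ignoring the inner-side columns you'd need to
carry along for the final projection:

static int
MultiwayJoinBatch(MultiwayJoinState *mjstate,
                  TupleTableSlot **batch, int ntuples)
{
    int     i;

    for (i = 0; i < mjstate->nsteps; i++)
    {
        MultiwayJoinStep *step = &mjstate->steps[i];
        int     nkept = 0;
        int     j;

        /* tight loop: probe this one step for the whole batch */
        for (j = 0; j < ntuples; j++)
        {
            if (step->probe(step, batch[j]) != NULL)
                batch[nkept++] = batch[j];      /* survivor */
        }

        ntuples = nkept;
        if (ntuples == 0)
            break;              /* whole batch eliminated early */
    }

    /* caller projects the survivors once, at the very end */
    return ntuples;
}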

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


