Support Parallel Query Execution in Executor

From: Qingqing Zhou
Subject: Support Parallel Query Execution in Executor
Date:
Msg-id: e12qms$2s6q$1@news.hub.org
Replies: Re: Support Parallel Query Execution in Executor (Martijn van Oosterhout <kleptog@svana.org>)
         Re: Support Parallel Query Execution in Executor ("Jonah H. Harris" <jonah.harris@gmail.com>)
List: pgsql-hackers

I have written some experimental code for a master-slave seqscan in
PostgreSQL. While doing so, I came to feel that we already have enough
infrastructure to support parallel query execution.

What I did was add a new node, PARA, and plug it in above the node that we
want to execute in parallel. At this stage, a PARA node is just a SeqScan
node:

typedef struct Para
{
    /* TODO: add a union to put all nodes supporting parallelism here */
    SeqScan     scan;

    /* Split / Merge / Redistribute */
    ParaType    type;

    /* TODO: other possible parameters */
} Para;
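The post leaves ParaType undefined. Below is a minimal sketch, assuming it is a plain enum of the three modes named in the comment. SeqScan here is a one-field stand-in rather than PostgreSQL's real plan node, and the enum constants, the per-mode comments (our guesses at intent) and the para_type_name() helper are all hypothetical.

```c
/* Stand-in for PostgreSQL's SeqScan plan node (the real one has many
 * more fields); used only so this sketch compiles on its own. */
typedef struct SeqScan
{
    unsigned int scanrelid;     /* range-table index of the scanned relation */
} SeqScan;

/* Hypothetical: the three modes listed in the Para comment. The
 * descriptions are guesses, not taken from the original code. */
typedef enum ParaType
{
    PARA_SPLIT,                 /* hand one input to slave(s) */
    PARA_MERGE,                 /* combine results coming back from slaves */
    PARA_REDISTRIBUTE           /* reshuffle tuples between slaves */
} ParaType;

typedef struct Para
{
    SeqScan     scan;           /* for now, a Para node is just a SeqScan */
    ParaType    type;
} Para;

/* Hypothetical helper: label an EXPLAIN-style printer might show. */
static const char *
para_type_name(ParaType type)
{
    switch (type)
    {
        case PARA_SPLIT:        return "Split";
        case PARA_MERGE:        return "Merge";
        case PARA_REDISTRIBUTE: return "Redistribute";
    }
    return "Unknown";
}
```

Keeping the mode as a separate field, rather than as distinct node types, would let one PARA executor module serve all three cases; the TODO union would later widen `scan` to other node kinds.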

At execution time, the master (the process that receives the query) wakes
up a slave process (an idle ordinary backend), and the slave passes the
scan results to the master via a shared-memory communication buffer. In
detail, execution works like this:
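To make the communication-buffer idea concrete, here is a minimal sketch of a bounded producer/consumer queue that carries tuples followed by a "done" marker. It rests on stated assumptions: POSIX threads stand in for the master and slave backends, and a pthread mutex/condition pair stands in for shared memory plus PostgreSQL's own locking. CommBuffer, CommItem, comm_put(), comm_get() and slave_main() are hypothetical names.

```c
#include <pthread.h>

#define COMM_BUF_SLOTS 8

/* Hypothetical kinds of item the slave can send to the master. */
typedef enum { ITEM_TUPLE, ITEM_DONE, ITEM_ERROR } ItemKind;

typedef struct
{
    ItemKind kind;
    int      value;             /* stands in for a serialized tuple */
} CommItem;

/* Bounded ring buffer. In a backend this would live in shared memory
 * and be guarded by PostgreSQL locks, not pthread objects. */
typedef struct
{
    CommItem        items[COMM_BUF_SLOTS];
    int             head;       /* next slot to read */
    int             count;      /* number of filled slots */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} CommBuffer;

static void
comm_init(CommBuffer *buf)
{
    buf->head = 0;
    buf->count = 0;
    pthread_mutex_init(&buf->lock, NULL);
    pthread_cond_init(&buf->not_empty, NULL);
    pthread_cond_init(&buf->not_full, NULL);
}

/* Slave side: wait while the buffer is full, then append one item. */
static void
comm_put(CommBuffer *buf, CommItem item)
{
    pthread_mutex_lock(&buf->lock);
    while (buf->count == COMM_BUF_SLOTS)
        pthread_cond_wait(&buf->not_full, &buf->lock);
    buf->items[(buf->head + buf->count) % COMM_BUF_SLOTS] = item;
    buf->count++;
    pthread_cond_signal(&buf->not_empty);
    pthread_mutex_unlock(&buf->lock);
}

/* Master side: wait while the buffer is empty, then take one item. */
static CommItem
comm_get(CommBuffer *buf)
{
    pthread_mutex_lock(&buf->lock);
    while (buf->count == 0)
        pthread_cond_wait(&buf->not_empty, &buf->lock);
    CommItem item = buf->items[buf->head];
    buf->head = (buf->head + 1) % COMM_BUF_SLOTS;
    buf->count--;
    pthread_cond_signal(&buf->not_full);
    pthread_mutex_unlock(&buf->lock);
    return item;
}

/* Slave stand-in: emit tuples 1..n, then the "execution done" marker. */
typedef struct { CommBuffer *buf; int n; } SlaveArgs;

static void *
slave_main(void *arg)
{
    SlaveArgs *a = arg;
    for (int i = 1; i <= a->n; i++)
        comm_put(a->buf, (CommItem){ ITEM_TUPLE, i });
    comm_put(a->buf, (CommItem){ ITEM_DONE, 0 });
    return NULL;
}
```

The master thread calls comm_get() in a loop until it sees ITEM_DONE or ITEM_ERROR. Because the buffer is bounded, a slave that runs ahead simply blocks instead of using unbounded memory. Build with `-pthread` where the platform requires it.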

Master process:
1. PARA init: wake up a slave and pass it the queryTree and
   outerPlan(planTree), serialized with nodeToString();
2. PARA exec:
       get an item from the communication-buffer;
       if item is a valid tuple
           return item;
       else
           handle other types of item;    /* execution done/error */
3. PARA end: do some cleanup.
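The PARA exec step can be sketched as a function that pulls one item per call and returns either a tuple or NULL, similar in spirit to how an executor node signals the end of its output. This is a single-threaded sketch under stated assumptions: the buffer is a pre-filled array, tuples are plain ints, and ParaState, para_exec() and the item kinds are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ITEM_TUPLE, ITEM_DONE, ITEM_ERROR } ItemKind;

typedef struct
{
    ItemKind    kind;
    int         tuple;          /* payload for ITEM_TUPLE */
    const char *message;        /* slave's error text for ITEM_ERROR */
} CommItem;

/* Stand-in for the shared communication buffer: a pre-filled array. */
typedef struct
{
    const CommItem *items;
    size_t          nitems;
    size_t          next;
} CommBuffer;

/* Hypothetical per-node execution state for PARA. */
typedef struct
{
    CommBuffer *buf;
    bool        finished;
    const char *error;          /* set if the slave reported an error */
    int         current;        /* storage for the tuple handed back */
} ParaState;

static CommItem
comm_get(CommBuffer *buf)
{
    if (buf->next < buf->nitems)
        return buf->items[buf->next++];
    return (CommItem){ ITEM_DONE, 0, NULL };   /* exhausted: treat as done */
}

/* PARA exec: next tuple, or NULL once execution is done or failed. */
static const int *
para_exec(ParaState *st)
{
    if (st->finished)
        return NULL;

    CommItem item = comm_get(st->buf);
    switch (item.kind)
    {
        case ITEM_TUPLE:
            st->current = item.tuple;
            return &st->current;
        case ITEM_ERROR:
            st->error = item.message;   /* a backend would ereport() here */
            /* fall through */
        case ITEM_DONE:
            st->finished = true;
            return NULL;
    }
    return NULL;
}
```

Once `finished` is set, later calls return NULL without touching the buffer again, so the parent node can keep calling PARA exec safely after the end of its input.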

As the PARA init stage shows, even with the simplest PARA node it is easy
to support inter-node parallelism.

Slave process (using code similar to that of the autovacuum process):
1. Get the queryTree and planTree;
2. Redirect the destReceiver to the communication-buffer;
3. Wrap them in an executor and run it.
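Step 2 works because the executor only talks to an abstract destination. Here is a sketch of that idea, using a simplified callback table loosely modeled on PostgreSQL's DestReceiver (the real struct in tcop/dest.h takes TupleTableSlot and TupleDesc arguments). Receiver, CommReceiver, run_executor() and the fixed-size buffer are hypothetical.

```c
/* Simplified destination interface, loosely modeled on DestReceiver. */
typedef struct Receiver Receiver;
struct Receiver
{
    void (*startup)(Receiver *self);
    void (*receive)(Receiver *self, int tuple);
    void (*shutdown)(Receiver *self);
};

#define COMM_BUF_SLOTS 16

/* Hypothetical receiver that writes tuples into a communication buffer. */
typedef struct
{
    Receiver pub;               /* first member, so Receiver * casts work */
    int      items[COMM_BUF_SLOTS];
    int      count;
    int      done;              /* set on shutdown: "execution done" */
} CommReceiver;

static void
comm_startup(Receiver *self)
{
    CommReceiver *r = (CommReceiver *) self;
    r->count = 0;
    r->done = 0;
}

static void
comm_receive(Receiver *self, int tuple)
{
    CommReceiver *r = (CommReceiver *) self;
    /* This fixed-size sketch drops tuples once full; a real buffer
     * would block until the master drains it. */
    if (r->count < COMM_BUF_SLOTS)
        r->items[r->count++] = tuple;
}

static void
comm_shutdown(Receiver *self)
{
    ((CommReceiver *) self)->done = 1;
}

static void
comm_receiver_init(CommReceiver *r)
{
    r->pub.startup = comm_startup;
    r->pub.receive = comm_receive;
    r->pub.shutdown = comm_shutdown;
    r->count = 0;
    r->done = 0;
}

/* Toy executor: scans rows and hands each one to whatever destination
 * it is given, without knowing where the tuples end up. */
static void
run_executor(const int *rows, int nrows, Receiver *dest)
{
    dest->startup(dest);
    for (int i = 0; i < nrows; i++)
        dest->receive(dest, rows[i]);
    dest->shutdown(dest);
}
```

In this model, redirecting the slave's output is just a matter of passing `&r.pub` instead of the client-facing receiver; the executor code itself is untouched.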

The query plan is like this:
TEST=# explain select max(a), max(b) from t;
                              QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=7269.01..7269.02 rows=1 width=53)
   ->  Para [Split = 1]  (cost=10.00..5879.00 rows=278000 width=53)
         ->  Seq Scan on T  (cost=0.00..5879.00 rows=278000 width=53)
(3 rows)

There are some problems I haven't addressed yet. The most difficult one for
me is xid assignment: the master and slaves must see an identical view of
the data, and the key to that is the xid. I am not sure what the correct
solution is. We could use the same xid for the master and slaves, or give
them a contiguous range of xids. Other problems, such as logins (the master
and slaves should act as the same user) and elog message passing, are also
important, but I think we can handle them without real difficulty.
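One way to see why the xid matters: if visibility is decided from the backend's own xid plus a snapshot (xmin, xmax and the list of in-progress xids), a slave given identical copies of both makes identical decisions, while a slave running under its own xid cannot see the master's uncommitted rows. The sketch assumes every finished transaction committed (no clog lookup) and ignores command ids and xid wraparound. Snapshot and xid_visible() are simplified, hypothetical versions, not PostgreSQL's HeapTupleSatisfiesMVCC().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

#define MAX_IN_PROGRESS 8

/* Simplified snapshot: xids below xmin are finished, xids at or above
 * xmax are invisible, and xip[] lists xids still running when taken. */
typedef struct
{
    Xid myxid;                  /* the backend's own transaction */
    Xid xmin;
    Xid xmax;
    int nxip;
    Xid xip[MAX_IN_PROGRESS];
} Snapshot;

/* Is a row inserted by `xid` visible under `snap`? */
static bool
xid_visible(const Snapshot *snap, Xid xid)
{
    if (xid == snap->myxid)
        return true;            /* our own changes are visible to us */
    if (xid < snap->xmin)
        return true;            /* finished before the snapshot */
    if (xid >= snap->xmax)
        return false;           /* started after the snapshot */
    for (int i = 0; i < snap->nxip; i++)
        if (snap->xip[i] == xid)
            return false;       /* still running when snapshot was taken */
    return true;
}
```

Under this model, "use the same xid" amounts to shipping the master's myxid and snapshot to the slave along with the plan; a contiguous xid range would additionally require every backend to treat all xids in the range as its own.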

I haven't touched the most difficult part yet, the parallel query optimizer.
But thanks to the two-phase parallel optimization technique, this part can
be treated like the geqo optimizer: without enough evidence, we simply don't
enable parallel query execution.
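Here is a sketch of the "no evidence, no parallelism" rule, assuming a two-phase scheme in which phase 1 returns the best serial plan's cost and phase 2 costs a parallel version of it. The knobs in ParallelConfig play a role similar to geqo and geqo_threshold, but they are hypothetical. So is the naive cost model, which splits the work evenly across the slaves and adds a fixed startup cost per slave.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical settings, in the spirit of geqo / geqo_threshold. */
typedef struct
{
    bool   enable_parallel;     /* master switch */
    double min_serial_cost;     /* don't consider cheaper plans at all */
    double slave_startup_cost;  /* overhead of waking one slave */
    double min_speedup;         /* required serial/parallel cost ratio */
} ParallelConfig;

/* Phase 2: decide whether to parallelize the best serial plan found in
 * phase 1. Stores the estimated parallel cost when it is computed. */
static bool
choose_parallel(const ParallelConfig *cfg, double serial_cost, int nslaves,
                double *parallel_cost_out)
{
    /* Not enough evidence of benefit: stay serial, much as GEQO is only
     * used once the number of FROM items reaches geqo_threshold. */
    if (!cfg->enable_parallel || nslaves < 1 ||
        serial_cost < cfg->min_serial_cost)
        return false;

    /* Naive model: slaves split the work evenly; each costs a fixed startup. */
    double parallel_cost = serial_cost / nslaves
                         + cfg->slave_startup_cost * nslaves;
    if (parallel_cost_out != NULL)
        *parallel_cost_out = parallel_cost;

    return serial_cost / parallel_cost >= cfg->min_speedup;
}
```

With a single slave this model never wins (the scan just moves to another process and pays the startup cost), which matches the intuition that one slave only proves the plumbing works.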

Are there any show-stopper reasons not to do this?

Regards,
Qingqing