Multi CPU Queries - Feedback and/or suggestions wanted!

From: Julius Stroffek
Msg-id: 48FCD67D.4070006@sun.com
Replies:
Re: Multi CPU Queries - Feedback and/or suggestions wanted!  ("Jeffrey Baker" <jwbaker@gmail.com>)
Re: Multi CPU Queries - Feedback and/or suggestions wanted!  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Multi CPU Queries - Feedback and/or suggestions wanted!  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
Hi All,

We would like to start some work on improving the performance of
PostgreSQL in a multi-CPU environment. Dano Vojtek is a student at the
Faculty of Mathematics and Physics of Charles University in Prague
(http://www.mff.cuni.cz), and he is going to cover this topic in his
master's thesis. He will investigate the available methods, write up
the possibilities, and then implement some of them for PostgreSQL.

We want to come up with a serious proposal for this work after
collecting feedback and opinions and doing a more thorough investigation.

Topics that seem to be of interest, most of which were already
discussed at the developers' meeting in Ottawa, are:
1.) parallel sorts
2.) parallel query execution
3.) asynchronous I/O
4.) parallel COPY
5.) parallel pg_dump
6.) using threads for parallel processing

Scaling with an increasing number of CPUs in 1.) and 2.) will hit the
I/O bottleneck at some point, and the benefit gained there should be
nearly the same as for 3.): the OS or disk could do a better job of
scheduling multiple reads from the disk for the same query at the same time.
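A crude model makes the I/O ceiling concrete. Assuming (a simplification, not measured PostgreSQL behavior) that CPU work splits evenly across P workers, that the disk delivers data at a fixed rate, and that CPU work and I/O overlap completely, query time is roughly max(cpu / P, io), so the speedup stops growing once cpu / P drops below io. The function names are hypothetical.

```python
def estimated_query_time(cpu_seconds: float, io_seconds: float, workers: int) -> float:
    """Crude model (an assumption, not a measurement): CPU work splits evenly
    across workers and overlaps fully with I/O, while the disk delivers data
    at a fixed rate, so the query can never finish faster than its I/O."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return max(cpu_seconds / workers, io_seconds)


def speedup(cpu_seconds: float, io_seconds: float, workers: int) -> float:
    """Speedup relative to a single worker under the same model."""
    return (estimated_query_time(cpu_seconds, io_seconds, 1)
            / estimated_query_time(cpu_seconds, io_seconds, workers))


if __name__ == "__main__":
    # Hypothetical query: 8 s of CPU work, 2 s of unavoidable I/O.
    for p in (1, 2, 4, 8, 16):
        print(p, speedup(8.0, 2.0, p))
```

With 8 s of CPU work and 2 s of I/O, the speedup is 1, 2, 4, 4 and 4 for 1, 2, 4, 8 and 16 workers: beyond four workers the disk is the bottleneck.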

1.)
More merges could be executed on different CPUs. However, one N-way
merge on one CPU is probably better than two N/2-way merges on two CPUs
that have to share the work_mem limit between them. This work is specific
to, and separate from, 2.) and 3.); if something is implemented here, it
could probably share only the parallel infrastructure code.
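The trade-off can be sketched as follows, assuming the input has already been split into runs. Sorting the individual runs parallelizes trivially; for merging, two N/2-way merges in parallel still need a final 2-way merge, so every tuple is moved twice instead of once (and, in a real sort, both half-merges would split one work_mem budget). All function names are hypothetical, and Python threads only stand in for backend worker threads: under CPython's GIL they show the structure, not a real CPU speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Run = List[int]


def sort_runs_in_parallel(runs: List[Run], workers: int) -> List[Run]:
    """Each run is sorted independently, so runs can go to different workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sorted, runs))


def single_nway_merge(runs: List[Run]) -> Tuple[Run, int]:
    """One N-way merge: every tuple passes through the merge exactly once.
    Returns the merged output and the number of tuple moves."""
    merged = list(heapq.merge(*runs))
    return merged, len(merged)


def two_half_merges(runs: List[Run], workers: int = 2) -> Tuple[Run, int]:
    """Two N/2-way merges run concurrently, then a final 2-way merge.
    Returns the merged output and the number of tuple moves (each tuple
    is moved once in a half-merge and once more in the final merge)."""
    half = len(runs) // 2
    with ThreadPoolExecutor(max_workers=workers) as pool:
        left_f = pool.submit(lambda: list(heapq.merge(*runs[:half])))
        right_f = pool.submit(lambda: list(heapq.merge(*runs[half:])))
        left, right = left_f.result(), right_f.result()
    merged = list(heapq.merge(left, right))
    return merged, len(left) + len(right) + len(merged)
```

Both merge strategies produce the same sorted output; for ten tuples in four runs, the single N-way merge reports 10 tuple moves and the split version reports 20.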
========

2.)
Different subtrees (or nodes) of the plan could be executed in parallel
on different CPUs, and the results of these subtrees could be requested
either synchronously or asynchronously.
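A minimal sketch of the two ways a parent node could consume child subtrees started on other workers. `PlanNode` and `execute_children` are hypothetical, heavily simplified stand-ins for executor structures, and a thread pool stands in for whatever worker mechanism would actually be used. In the synchronous variant the parent waits for children in plan order; in the asynchronous one it takes whichever child finishes first.

```python
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlanNode:
    """Hypothetical plan node: `run` produces all tuples of its subtree."""
    name: str
    run: Callable[[], List[tuple]]


def execute_children(children: List[PlanNode], pool: ThreadPoolExecutor,
                     asynchronous: bool) -> List[tuple]:
    """Start every child subtree on the pool, then collect the results.

    synchronous:  request child results in plan order, blocking on each.
    asynchronous: consume whichever child finishes first, so a slow
                  subtree does not hold back the output of fast ones.
    """
    futures: List[Future] = [pool.submit(child.run) for child in children]
    out: List[tuple] = []
    if asynchronous:
        for fut in as_completed(futures):
            out.extend(fut.result())
    else:
        for fut in futures:
            out.extend(fut.result())
    return out
```

The synchronous mode preserves the children's order (as an Append-like node would need for ordered output); the asynchronous mode trades that order for lower latency.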
========

3.)
The simplest possible way is to change the scan nodes so that they send
out asynchronous I/O requests for the next blocks before they run out
of tuples in the block they are currently processing. A more advanced
way would follow naturally from implementing 2.), which would lead to
different scan nodes being executed on different CPUs at the same time.
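The simple read-ahead variant can be sketched as a scan that keeps a fixed window of block reads in flight ahead of the block whose tuples it is returning. `read_block` is a hypothetical callback standing in for the storage layer, and the thread pool only simulates asynchronous I/O; on POSIX systems a real implementation might instead issue hints such as `posix_fadvise` with `POSIX_FADV_WILLNEED`.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterator, List

Block = List[tuple]


def prefetching_scan(read_block: Callable[[int], Block], nblocks: int,
                     prefetch_distance: int = 4) -> Iterator[tuple]:
    """Sequential scan keeping up to `prefetch_distance` block reads in
    flight ahead of the block currently being returned (hypothetical API)."""
    if prefetch_distance < 1:
        raise ValueError("prefetch_distance must be >= 1")
    with ThreadPoolExecutor(max_workers=prefetch_distance) as io_pool:
        pending: Deque[Future] = deque()
        next_to_request = 0
        # Fill the read-ahead window before returning the first tuple.
        while next_to_request < nblocks and len(pending) < prefetch_distance:
            pending.append(io_pool.submit(read_block, next_to_request))
            next_to_request += 1
        while pending:
            # Blocks only if the "disk" is slower than tuple processing.
            block = pending.popleft().result()
            # Request the next block before processing this one's tuples.
            if next_to_request < nblocks:
                pending.append(io_pool.submit(read_block, next_to_request))
                next_to_request += 1
            yield from block
```

Because requests are issued and consumed in FIFO order, the scan returns tuples in exactly the same order as a plain sequential scan; only the waiting is overlapped.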
========

4.) and 5.)
We do not want to focus on these, since there are already ongoing projects.
========

6.)
Currently, threads are not used in PostgreSQL (except in some cases on
Windows). In general, using them would bring some problems:

a) different thread implementations on different OSes
b) a crash of the whole process if a problem happens in one thread.
Backends are isolated, and a problem in one backend leads to a
graceful shutdown of the other backends.
c) synchronization problems

* a) seems to be mostly an implementation issue. Is there any problem
with executing multiple threads on any supported OS? For example, a
scheduling issue where all the threads of the same process end up
scheduled on the same CPU, or something similar?

* b) is acceptable when multiple threads process the same query in the
same backend: if one thread crashes, the other backends could still do
the graceful shutdown.

* c) does not have to be solved in general, because the work of all the
threads will be synchronized and we can predict fairly well which data
are accessed by which thread. Memory allocation has to be made
thread-safe without hurting performance (is a separate memory context
for each thread sufficient?). Other common code might need some changes
as well. Possibly, the synchronization/critical-section exclusion could
be done in the executor, and only where needed.
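The "separate memory context per thread" idea can be sketched like this: each worker thread lazily gets its own private context, so ordinary allocations need no lock, and only updates to state shared between workers go through an explicit critical section. `MemoryContext`, `current_context` and `worker` are hypothetical toy names loosely modeled on backend memory contexts, not PostgreSQL APIs.

```python
import threading
from typing import Dict, List


class MemoryContext:
    """Toy stand-in for a backend memory context that only tracks chunks.
    Not thread-safe by design: each instance is used by one thread only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: List[bytearray] = []

    def alloc(self, size: int) -> bytearray:
        chunk = bytearray(size)
        self.chunks.append(chunk)
        return chunk

    def reset(self) -> None:
        self.chunks.clear()


_local = threading.local()


def current_context() -> MemoryContext:
    """Return this thread's private context, creating it on first use."""
    ctx = getattr(_local, "ctx", None)
    if ctx is None:
        ctx = MemoryContext(f"worker-{threading.get_ident()}")
        _local.ctx = ctx
    return ctx


# State shared by all workers: the only place needing mutual exclusion.
_results_lock = threading.Lock()
results: Dict[str, int] = {}


def worker(label: str, nallocs: int) -> None:
    ctx = current_context()
    for _ in range(nallocs):
        ctx.alloc(64)              # private context: no locking needed
    with _results_lock:            # shared state: explicit critical section
        results[label] = len(ctx.chunks)
    ctx.reset()
```

Each worker sees only its own allocations, which is what lets the allocator itself stay lock-free in this design.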

* Using processes instead of threads makes other things more complex:
  - sharing objects between processes might need much more coding
  - more overhead during execution and synchronization
========

It seems that it makes sense to start working on 2.) and 3.), and we
would like to consider using multiple threads to process the same query
within one backend.

We appreciate feedback, comments and/or suggestions.

Cheers

Julo


