Multi CPU Queries - Feedback and/or suggestions wanted!
From | Julius Stroffek |
---|---|
Subject | Multi CPU Queries - Feedback and/or suggestions wanted! |
Date | |
Msg-id | 48FCD67D.4070006@sun.com |
Responses | Re: Multi CPU Queries - Feedback and/or suggestions wanted! ("Jeffrey Baker" <jwbaker@gmail.com>); Re: Multi CPU Queries - Feedback and/or suggestions wanted! (Simon Riggs <simon@2ndQuadrant.com>); Re: Multi CPU Queries - Feedback and/or suggestions wanted! (Bruce Momjian <bruce@momjian.us>) |
List | pgsql-hackers |
Hi All,

we would like to start some work on improving the performance of PostgreSQL in a multi-CPU environment. Dano Vojtek is a student at the Faculty of Mathematics and Physics of Charles University in Prague (http://www.mff.cuni.cz) and he is going to cover this topic in his master's thesis. He is going to investigate the methods, write down the possibilities, and then implement some of them for PostgreSQL.

We want to come up with a serious proposal for this work after collecting feedback and opinions and doing a more thorough investigation.

The topics that seem to be of interest, most of which were already discussed at the developers meeting in Ottawa, are

1.) parallel sorts
2.) parallel query execution
3.) asynchronous I/O
4.) parallel COPY
5.) parallel pg_dump
6.) using threads for parallel processing

Scaling with an increasing number of CPUs in 1.) and 2.) will hit the I/O bottleneck at some point, and the benefit gained here should be nearly the same as for 3.): the OS or the disk could do a better job when scheduling multiple reads from the disk for the same query at the same time.

1.)
More merges could be executed on different CPUs. However, one N-way merge on one CPU is probably better than two N/2-way merges on 2 CPUs that have to share the work_mem limit between them. This is specific and separate from 2.) or 3.), and if something is implemented here it could probably share just the parallel infrastructure code.

========
2.)
Different subtrees (or nodes) of the plan could be executed in parallel on different CPUs, and the results of these subtrees could be requested either synchronously or asynchronously.

========
3.)
The simplest possible way is to change the scan nodes so that they send out asynchronous I/O requests for the next blocks before they run out of tuples in the block they are currently going through. The more advanced way would arise just by implementing 2.), which would then lead to different scan nodes being executed on different CPUs at the same time.

========
4.) and 5.)
We do not want to focus on these, since there are ongoing projects already.

========
6.)
Currently, threads are not used in PostgreSQL (except in some cases on Windows). Generally, using them would bring some problems:

a) different thread implementations on different OSes
b) a crash of the whole process if a problem happens in one thread. Backends are isolated, and a problem in one backend leads to the graceful shutdown of the other backends.
c) synchronization problems

* a) seems to be just more work for the implementation. Is there any problem with the execution of more threads on any supported OS? Like some scheduling issue where all the threads of the same process end up scheduled on the same CPU? Or something similar?

* b) is fine when using more threads for processing the same query in the same backend: if one crashes, the others could do the graceful shutdown.

* c) does not have to be solved in general, because the work of all the threads will be synchronized and we can predict pretty well which data are being accessed by which thread. The memory allocation has to be adjusted to be thread safe, and this should not affect the performance (is a different memory context for each thread sufficient?). Other common code might need some changes as well. Possibly, the synchronization/critical section exclusion could be done in the executor and only if needed.

* Using processes instead of threads makes other things more complex:
  - sharing objects between processes might need much more coding
  - more overhead during execution and synchronization

========
It seems to us that it makes sense to start working on 2.) and 3.), and we would like to think about using more threads for processing the same query within one backend.

We appreciate feedback, comments and/or suggestions.

Cheers

Julo