Re: Parallel query execution
From | Gavin Flower |
---|---|
Subject | Re: Parallel query execution |
Date | |
Msg-id | 50F5E056.1060502@archidevsys.co.nz |
In reply to | Parallel query execution (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Parallel query execution (Bruce Momjian <bruce@momjian.us>); Re: Parallel query execution (Stephen Frost <sfrost@snowman.net>); Re: Parallel query execution (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-hackers |
On 16/01/13 11:14, Bruce Momjian wrote:

> I mentioned last year that I wanted to start working on parallelism:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days.
>
> I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options.
>
> Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant.
>
> I have summarized my ideas by updating our Parallel Query Execution wiki page:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Please consider updating the page yourself or posting your ideas to this thread. Thanks.

Hmm...

How about being aware of multiple spindles, so that if the requested data covers multiple spindles, the data could be extracted in parallel? This may, or may not, involve multiple I/O channels.

On large multiprocessor machines, different blocks of memory might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used?

Once a selection of rows has been made, if there is a lot of reformatting going on, could this be done in parallel? I can think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure that in reality there are more subtleties, and aspects of both strategies will be used in a hybrid fashion along with other approaches.

I expect that before any parallel algorithm is invoked, some sort of threshold needs to be exceeded to make it worthwhile. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables, each over 10 million rows, using joins may benefit from more extreme parallelism.

I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing.

Cheers,
Gavin
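
As an illustration of strategy (B) combined with the threshold idea above, here is a minimal sketch in Python. It is not PostgreSQL backend code: `PARALLEL_ROW_THRESHOLD`, `format_row`, `format_chunk`, `split_into_chunks`, and `format_rows` are all hypothetical names, and the threshold value is only borrowed from the 10,000-record figure in the message. Threads are assumed here purely to keep the sketch short; real CPU-bound work would more likely be handed to separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical threshold: below this many rows, stay serial.
PARALLEL_ROW_THRESHOLD = 10_000


def format_row(row):
    """Stand-in for the per-row reformatting work."""
    return ", ".join(str(value) for value in row)


def format_chunk(rows):
    """Format one set of rows; this is the unit of work given to a worker."""
    return [format_row(row) for row in rows]


def split_into_chunks(rows, n_chunks):
    """Split rows into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def format_rows(rows, workers=4, threshold=PARALLEL_ROW_THRESHOLD):
    """Format serially below the threshold, otherwise farm chunks out to workers."""
    if len(rows) < threshold or workers < 2:
        return format_chunk(rows)
    chunks = split_into_chunks(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so row order is preserved.
        formatted = pool.map(format_chunk, chunks)
        return [line for chunk in formatted for line in chunk]
```

The serial branch is what keeps a 10-row query from paying the cost of starting workers; a real planner would presumably estimate that cost rather than use a fixed row count.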