Re: Parallel query execution

From Bruce Momjian
Subject Re: Parallel query execution
Date
Msg-id 20130115230847.GB32658@momjian.us
In reply to Re: Parallel query execution  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
List pgsql-hackers
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
> On 16/01/13 11:14, Bruce Momjian wrote:
> 
>     I mentioned last year that I wanted to start working on parallelism:
> 
>             https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> 
>     Years ago I added thread-safety to libpq.  Recently I added two parallel
>     execution paths to pg_upgrade.  The first parallel path allows execution
>     of external binaries pg_dump and psql (to restore).  The second parallel
>     path does copy/link by calling fork/thread-safe C functions.  I was able
>     to do each in 2-3 days.
> 
>     I believe it is time to start adding parallel execution to the backend.
>     We already have some parallelism in the backend:
>     effective_io_concurrency and helper processes.  I think it is time we
>     start to consider additional options.
> 
>     Parallelism isn't going to help all queries, in fact it might be just a
>     small subset, but it will be the larger queries.  The pg_upgrade
>     parallelism only helps clusters with multiple databases or tablespaces,
>     but the improvements are significant.
> 
>     I have summarized my ideas by updating our Parallel Query Execution wiki
>     page:
> 
>             https://wiki.postgresql.org/wiki/Parallel_Query_Execution
> 
>     Please consider updating the page yourself or posting your ideas to this
>     thread.  Thanks.
> 
> 
> Hmm...
> 
> How about being aware of multiple spindles - so if the requested data covers
> multiple spindles, then data could be extracted in parallel.  This may, or may
> not, involve multiple I/O channels?

Well, we usually label these as tablespaces.  I don't know if
spindle-level is a reasonable level to add.

> On large multiple processor machines, there are different blocks of memory that
> might be accessed at different speeds depending on the processor.  Possibly a
> mechanism could be used to split a transaction over multiple processors to
> ensure the fastest memory is used?

That seems too far-out for an initial approach.

> Once a selection of rows has been made, then if there is a lot of reformatting
> going on, then could this be done in parallel?  I can think of 2 very
> simplistic strategies: (A) use a different processor core for each column, or
> (B) farm out sets of rows to different cores.  I am sure in reality, there are
> more subtleties and aspects of both the strategies will be used in a hybrid
> fashion along with other approaches.

Probably (B), but that is going to require making some of the modules
thread/fork-safe, and that is going to be tricky.
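
As a rough illustration of strategy (B) only, and not of anything in the
backend: the sketch below splits a row set into chunks and hands each chunk
to a worker.  The names format_row, chunked and format_rows_parallel are
made up for this example, and the per-row "reformatting" is a stand-in.  It
uses Python threads, so it shows the partitioning and reassembly logic
rather than a real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def format_row(row):
    # Hypothetical per-row "reformatting" step: render every column as text.
    return "|".join(str(col) for col in row)


def chunked(rows, chunk_size):
    # Split the row list into consecutive chunks of at most chunk_size rows.
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def format_chunk(chunk):
    # Work done by one worker: format every row in its chunk.
    return [format_row(row) for row in chunk]


def format_rows_parallel(rows, workers=4, chunk_size=1000):
    # Executor.map yields results in input order, so row order is preserved
    # even though the chunks are processed by different workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        formatted_chunks = pool.map(format_chunk, chunked(rows, chunk_size))
        return [line for chunk in formatted_chunks for line in chunk]
```

The point of the sketch is that each worker touches only its own chunk, so
the only shared step is stitching the results back together in order.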

> I expect that before any parallel algorithm is invoked, then some sort of
> threshold needs to be exceeded to make it worthwhile.  Different aspects of
> the parallel algorithm may have their own thresholds.  It may not be worth
> applying a parallel algorithm for 10 rows from a simple table, but selecting
> 10,000 records from multiple tables each over 10 million rows using joins may
> benefit from more extreme parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.
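
One possible shape for such a control, sketched under invented assumptions:
compare an estimated serial cost against an estimated parallel cost that
includes worker startup and the cost of passing rows back.  The function
name, the cost formula and the default constants are all hypothetical and
would need real measurements behind them.

```python
def parallel_is_worthwhile(estimated_rows, per_row_cost, workers,
                           worker_startup_cost=1000.0,
                           per_row_transfer_cost=0.1):
    # With fewer than two workers there is nothing to parallelize.
    if workers < 2:
        return False
    # Cost of doing all the work in a single process.
    serial_cost = estimated_rows * per_row_cost
    # Assumed parallel cost: start every worker, split the work evenly,
    # then pay a per-row price to ship results back to the leader.
    parallel_cost = (workers * worker_startup_cost
                     + serial_cost / workers
                     + estimated_rows * per_row_transfer_cost)
    return parallel_cost < serial_cost
```

With these made-up constants, 10 rows at unit cost stay serial because
startup overhead dominates, while 10 million rows clear the threshold.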

> I expect that UNIONs, as well as the processing of partitioned tables, may be
> amenable to parallel processing.

Interesting idea on UNION.
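
A minimal sketch of why UNION looks amenable, assuming each branch can be
evaluated independently: run the branches concurrently, then combine.  The
name parallel_union is invented here, and each "branch" is just a Python
callable returning a list of row tuples, not a real query plan.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_union(branches, distinct=True):
    # Evaluate every branch concurrently; branches share no state.
    with ThreadPoolExecutor(max_workers=max(1, len(branches))) as pool:
        futures = [pool.submit(branch) for branch in branches]
        results = [future.result() for future in futures]
    # Concatenate branch outputs in branch order (UNION ALL semantics).
    combined = [row for rows in results for row in rows]
    if not distinct:
        return combined
    # UNION semantics: drop duplicate rows, keeping first occurrences.
    seen = set()
    unique = []
    for row in combined:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```

Note that only the branch evaluation runs in parallel here; the duplicate
removal for plain UNION is still a single serial pass over the combined rows.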

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


