Re: Parallel Sort

From Michael Paquier
Subject Re: Parallel Sort
Date
Msg-id CAB7nPqQMEOSXkVK75C=Z-kWbrWbtamA-BSQ7c=9cSV4AgTU7Sg@mail.gmail.com
In response to Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Sort
List pgsql-hackers
On Tue, May 14, 2013 at 11:59 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, May 14, 2013 at 01:51:42PM +0900, Michael Paquier wrote:
> > On Mon, May 13, 2013 at 11:28 PM, Noah Misch <noah@leadboat.com> wrote:
> >
> > > * Identifying Parallel-Compatible Functions
> > >
> > > Not all functions can reasonably run on a worker backend.  We should not
> > > presume that a VOLATILE function can tolerate the unstable execution order
> > > imposed by parallelism, though a function like clock_timestamp() is
> > > perfectly
> > > reasonable to run that way.  STABLE does not have that problem, but neither
> > > does it constitute a promise that the function implementation is compatible
> > > with parallel execution.  Consider xid_age(), which would need code
> > > changes to
> > > operate correctly in parallel.  IMMUTABLE almost guarantees enough; there
> > > may
> > > come a day when all IMMUTABLE functions can be presumed parallel-safe.  For
> > > now, an IMMUTABLE function could cause trouble by starting a (read-only)
> > > subtransaction.  The bottom line is that parallel-compatibility needs to be
> > > separate from volatility classes for the time being.
> > >
> > I am not sure that this problem is only limited to functions, but to all
> > the expressions
> > and clauses of queries that could be shipped and evaluated on the worker
> > backends when
> > fetching tuples that could be used to accelerate a parallel sort. Let's
> > imagine for example
> > the case of a LIMIT clause that can be used by worker backends to limit the
> > number of tuples
> > to sort as final result.
>
> It's true that the same considerations apply to other plan tree constructs;
> however, every such construct is known at build time, so we can study each one
> and decide how it fits with parallelism.

The concept of clause parallelism for worker backends is close to the concept of clause shippability introduced in Postgres-XC. In XC, the equivalent of the master backend is a backend on a node called the Coordinator, which merges and organizes results fetched in parallel from the remote nodes where data scans occur (called Datanodes). The backends that scan tuples on the Datanodes share the same data visibility, because they use the same snapshot and transaction ID as the backend on the Coordinator. This differs from the parallelism discussed here in that XC has no notion of importing a snapshot into worker backends.

However, the code in the XC planner that evaluates clause shippability is definitely worth looking at, given how much it has in common with deciding whether a given clause can be executed on a worker backend. It would be a waste to implement the same thing twice if code is already available.
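
To make the idea concrete, here is a minimal sketch of what a shippability-style check could look like: walk the expression tree and accept a clause only if every function in it is explicitly marked as safe to run on a worker. This is an illustration only, not the XC planner code; the Expr type, the PARALLEL_SAFE_FUNCS set and is_shippable() are hypothetical names, and the set of safe functions is an assumption chosen to match the examples in this thread.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical whitelist: functions assumed safe to evaluate on a worker.
# clock_timestamp() is listed because the thread calls it reasonable to run
# in parallel; xid_age() is deliberately absent.
PARALLEL_SAFE_FUNCS = {"lower", "abs", "clock_timestamp"}


@dataclass
class Expr:
    """Toy expression node (hypothetical, not a PostgreSQL structure)."""
    kind: str                      # "const", "column" or "func"
    name: str = ""
    args: List["Expr"] = field(default_factory=list)


def is_shippable(expr: Expr) -> bool:
    """Return True if the whole expression may be evaluated on a worker."""
    if expr.kind in ("const", "column"):
        return True
    if expr.kind == "func":
        # The function itself must be marked safe, and so must every argument.
        return expr.name in PARALLEL_SAFE_FUNCS and all(
            is_shippable(arg) for arg in expr.args
        )
    # Unknown node type: be conservative and keep it on the master backend.
    return False


safe_clause = Expr("func", "lower", [Expr("column", "name")])
unsafe_clause = Expr("func", "abs", [Expr("func", "xid_age", [Expr("column", "xmin")])])
```

With these definitions, is_shippable(safe_clause) is true, while is_shippable(unsafe_clause) is false because of the nested xid_age() call. The check depends on an explicit per-function marker rather than on the volatility class.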
 
> Since functions are user-definable, it's preferable to reason about classes of functions.

Yes. You are right.
--
Michael
