Re: a funnel by any other name

From: Simon Riggs
Subject: Re: a funnel by any other name
Msg-id: CANP8+jK6SLnND6tGwNdpkw=h_SyoCt8Nd5521AOyA50M9NrNsg@mail.gmail.com
In reply to: Re: a funnel by any other name  (Nicolas Barbier <nicolas.barbier@gmail.com>)
Responses: Re: a funnel by any other name  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On 17 September 2015 at 05:07, Nicolas Barbier <nicolas.barbier@gmail.com> wrote:
> 2015-09-17 Robert Haas <robertmhaas@gmail.com>:
>
>> 1. Exchange Bushy
>> 2. Exchange Inter-Operator (this is what's currently implemented)
>> 3. Exchange Replicate
>> 4. Exchange Merge
>> 5. Interchange
>>
>> 1. ?
>> 2. Gather
>> 3. Broadcast (sorta)
>> 4. Gather Merge
>> 5. Redistribute
>>
>> 1. Parallel Child
>> 2. Parallel Gather
>> 3. Parallel Replicate
>> 4. Parallel Merge
>> 5. Parallel Redistribute
>
> FYI, SQL Server has these in its execution plans:
>
> * Distribute Streams: read from one thread, write to multiple threads
> * Repartition Streams: both read and write from/to multiple threads
> * Gather Streams: read from multiple threads, write to one thread

Robert, thanks for asking. We'll be stuck with these words for some time, and they are user-visible via EXPLAIN, so this is important.

In general we should stick to words already used in similar situations elsewhere, which could include other DBMSs and parallel ETL tools, of which there are many more than mentioned here.

I would be against using any of these words: Funnel, Motion, Bushy. I don't find them very descriptive (I think of spiders, bowels and shrubs respectively, sorry).

These words are liable to confusion with other concepts: Replicate, Duplicate, Distribute, Partition, Repartition, MERGE.

I've seen this concept called Fan-In/Fan-Out and Scatter/Gather.

The main operations are the 3 mentioned by Nicolas:
1. Send data from many to one - which has subtypes for Unsorted, Sorted and Evenly balanced (but unsorted)
2. Send data from one process to many
3. Send data from many to many
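
To make the subtypes of operation 1 concrete, here is a rough Python sketch of the three many-to-one variants over in-memory lists. It is only an illustration, not how the executor does it: the function names are hypothetical, and reading "Evenly balanced" as taking one row from each input in turn is my assumption.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # marks an input that has already run dry


def gather_unsorted(streams):
    """Many-to-one, no ordering guarantee: drain each input in turn."""
    return list(chain.from_iterable(streams))


def gather_sorted(streams):
    """Many-to-one, order-preserving: each input must already be sorted."""
    return list(heapq.merge(*streams))


def gather_balanced(streams):
    """Many-to-one, taking one row from each input in turn (assumed meaning)."""
    rows = []
    for batch in zip_longest(*streams, fillvalue=_MISSING):
        rows.extend(r for r in batch if r is not _MISSING)
    return rows


workers = [[1, 5, 9], [2, 3], [4, 6, 7, 8]]  # three sorted worker outputs
print(gather_unsorted(workers))  # [1, 5, 9, 2, 3, 4, 6, 7, 8]
print(gather_sorted(workers))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(gather_balanced(workers))  # [1, 2, 4, 5, 3, 6, 9, 7, 8]
```

Only the sorted variant promises anything about output order, which is why it seems to deserve its own name in EXPLAIN.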

My preferences for this would be:
1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
2. Scatter (since Broadcast only makes sense in the context of a distributed query, it sounds weird for an intra-node query)
3. Redistribution - which implies that the description of how we spread data across nodes is "Distribution" (or DISTRIBUTED BY)
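
As a rough illustration of how items 2 and 3 differ, here is a minimal Python sketch over in-memory lists. The function names and the `key` parameter are hypothetical, dealing rows out round-robin is just one assumed way to scatter, and Python's `hash()` merely stands in for whatever hash a real distribution key would use.

```python
def scatter(rows, n_workers):
    """One-to-many: deal rows from a single producer out round-robin."""
    outputs = [[] for _ in range(n_workers)]
    for i, row in enumerate(rows):
        outputs[i % n_workers].append(row)
    return outputs


def redistribute(inputs, n_workers, key):
    """Many-to-many: route each row to the worker that owns its key."""
    outputs = [[] for _ in range(n_workers)]
    for stream in inputs:
        for row in stream:
            # hash() is a stand-in for the real distribution function
            outputs[hash(key(row)) % n_workers].append(row)
    return outputs


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

parts = scatter(rows, 2)
print(parts)
# [[(1, 'a'), (3, 'c'), (5, 'e')], [(2, 'b'), (4, 'd')]]

# Re-spread the two partitions across three workers by the first column.
print(redistribute(parts, 3, key=lambda r: r[0]))
# [[(3, 'c')], [(1, 'a'), (4, 'd')], [(5, 'e'), (2, 'b')]]
```

Scatter places rows without looking at them, whereas Redistribute places them according to a key, which is where the "Distribution" wording comes from.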

For 3 we should definitely use Redistribute, since this is what Teradata has been calling it for 30 years, which is where Greenplum got it from.
For 1, Gather makes most sense.

For 2, it could be either Scatter or Distribute. The former works well with Gather, the latter works well with Redistribute.

Sorry for my absence from further review of parallel ops.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
