Re: parallel distinct union and aggregate support patch

From	Dilip Kumar
Subject	Re: parallel distinct union and aggregate support patch
Date	22 October 2020 12:08:03
Msg-id	CAFiTN-s85CsefWxZnm=X7bh+unMdUng4XBOx7Zgpd1HFGd2fXA@mail.gmail.com
In reply to	parallel distinct union and aggregate support patch ("bucoo@sohu.com" <bucoo@sohu.com>)
Replies	Re: parallel distinct union and aggregate support patch (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Mon, Oct 19, 2020 at 8:19 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
>
> Hi hackers,
> I wrote a patch to support parallel distinct, union and aggregate using batch sort.
> Steps:
>  1. Generate a hash value from the group clause values, and save each tuple to the batch selected by the hash value modulo the number of batches.
>  2. At the end of the outer plan, wait for all other workers to finish writing to the batches.
>  3. Each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting that batch.
>  4. Return the rows for this batch.
>  5. If not all batches are finished, go to step 3.
>
> The BatchSort plan makes sure that tuples with the same group clause values are returned in the same range, so a Unique (or GroupAggregate) plan can work on top of it.
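The five quoted steps can be sketched as a small Python simulation. This is a minimal illustration under stated assumptions, not the patch itself: Python threads stand in for parallel workers, in-memory lists stand in for the shared batch files, `threading.Barrier` plays the role of "wait for all other workers", `sorted()` followed by `itertools.groupby` stands in for tuplesort_performsort() followed by a Unique node, and the function name `parallel_batch_distinct` is hypothetical.

```python
import threading
from itertools import groupby


def parallel_batch_distinct(worker_inputs, nbatches):
    """Hypothetical simulation of BatchSort + Unique (not the patch's code).

    worker_inputs: one list of tuples per simulated worker.
    Returns the distinct tuples, emitted batch by batch.
    """
    nworkers = len(worker_inputs)
    batches = [[] for _ in range(nbatches)]  # stand-in for shared batch files
    lock = threading.Lock()
    barrier = threading.Barrier(nworkers)
    next_batch = [0]  # shared counter that hands out batch numbers
    outputs = [[] for _ in range(nworkers)]

    def worker(wid):
        # Step 1: route each tuple to a batch by hash of its group key.
        for row in worker_inputs[wid]:
            with lock:
                batches[hash(row) % nbatches].append(row)
        # Step 2: wait until every worker has finished writing.
        barrier.wait()
        while True:
            # Step 3: claim a unique batch number.
            with lock:
                b = next_batch[0]
                next_batch[0] += 1
            if b >= nbatches:
                break  # Step 5: no batches left to process
            # Steps 3-4: sort the batch, then emit one row per group (Unique).
            rows = sorted(batches[b])
            outputs[wid].extend(key for key, _ in groupby(rows))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [row for out in outputs for row in out]
```

Because equal tuples always hash to the same batch, deduplicating each batch independently is enough to deduplicate the whole input; no worker ever needs to see another worker's batch.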

Interesting idea.  So IIUC, whenever a worker scans a tuple it will
directly put it into the respective batch (shared tuple store), based
on the hash of the grouping column, and once all the workers have
finished preparing the batches, each worker will pick those batches
one by one, perform the sort and finish the aggregation.  I think there
is scope for improvement: instead of directly putting each tuple into
a batch, what if the worker first does a partial aggregation and then
places the partially aggregated rows in the shared tuple store based
on the hash value, after which the workers pick up the batches one by
one?  This way we can avoid doing large sorts.  This approach can also
be used with hash aggregate; I mean the partially aggregated data
produced by the hash aggregate can be put into the respective batch.
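The suggested variant (partial aggregation in each worker before batching) can be illustrated with a sequential Python sketch for SUM(value) and COUNT(*) grouped by key. This is an assumption-laden toy, not PostgreSQL code: the phases run one after another instead of in parallel, a dict stands in for each worker's local hash aggregate, and the name `partial_then_batch_sum` is hypothetical.

```python
from collections import defaultdict


def partial_then_batch_sum(worker_inputs, nbatches):
    """Hypothetical sequential simulation: partial aggregation, then batching.

    worker_inputs: one list of (key, value) rows per simulated worker.
    Returns {key: (sum_of_values, row_count)}.
    """
    batches = [[] for _ in range(nbatches)]
    for rows in worker_inputs:
        # Partial aggregation inside one worker (a local hash aggregate).
        partial = defaultdict(lambda: [0, 0])  # key -> [sum, count]
        for key, value in rows:
            state = partial[key]
            state[0] += value
            state[1] += 1
        # Only the partially aggregated rows go into the shared batches,
        # at most one row per key per worker.
        for key, (s, c) in partial.items():
            batches[hash(key) % nbatches].append((key, s, c))
    # Finalize: each batch is processed independently, combining the
    # partial states of every key that hashed to it.
    result = {}
    for batch in batches:
        combined = defaultdict(lambda: [0, 0])
        for key, s, c in batch:
            combined[key][0] += s
            combined[key][1] += c
        for key, (s, c) in combined.items():
            result[key] = (s, c)
    return result
```

When groups repeat within a worker's input, the batches hold far fewer rows than the raw input, which is where the saving on sorting comes from. The finalize step here combines states with a per-batch hash table; sorting each batch and combining adjacent equal keys would work just as well.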

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com