Re: parallel distinct union and aggregate support patch

From: Dilip Kumar
Subject: Re: parallel distinct union and aggregate support patch
Date:
Msg-id: CAFiTN-s85CsefWxZnm=X7bh+unMdUng4XBOx7Zgpd1HFGd2fXA@mail.gmail.com
In reply to: parallel distinct union and aggregate support patch  ("bucoo@sohu.com" <bucoo@sohu.com>)
Responses: Re: parallel distinct union and aggregate support patch  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Mon, Oct 19, 2020 at 8:19 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
>
> Hi hackers,
> I wrote a patch to support parallel distinct, union and aggregate using batch sort.
> steps:
>  1. generate a hash value from the group clause values, and use the hash value modulo the batch count to pick the batch the tuple is saved to
>  2. at the end of the outer plan, wait for all other workers to finish writing to the batches
>  3. each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting that batch
>  4. return the rows for this batch
>  5. if not at the end of all batches, go to step 3
>
> The BatchSort plan makes sure that identical tuples (by group clause) are returned in the same range, so a Unique (or GroupAggregate) plan can work.
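
To check my reading of the steps above, here is a rough single-process
sketch in Python. It is not the patch's code: every name in it
(N_BATCHES, batch_of, fill_batches, distinct_via_batch_sort) is made up
for illustration, plain Python lists stand in for the shared batch
storage, sorted() stands in for tuplesort_performsort(), and the
"workers" run one after another rather than in parallel.

```python
from collections import defaultdict

N_BATCHES = 4  # hypothetical batch count, not taken from the patch


def batch_of(key, n_batches=N_BATCHES):
    # Step 1: hash the grouping key and take it modulo the batch count.
    return hash(key) % n_batches


def fill_batches(worker_inputs, n_batches=N_BATCHES):
    # Steps 1-2: every worker writes each of its rows into the batch chosen
    # by the hash. Returning from this function plays the role of "wait
    # until all workers have finished writing".
    batches = defaultdict(list)
    for rows in worker_inputs:
        for key in rows:
            batches[batch_of(key, n_batches)].append(key)
    return batches


def distinct_via_batch_sort(worker_inputs, n_batches=N_BATCHES):
    batches = fill_batches(worker_inputs, n_batches)
    result = []
    # Steps 3-5: take the batches one at a time, sort each one, and return
    # its rows. Equal keys always land in the same batch, so a Unique-style
    # pass over the sorted batch is enough to drop duplicates.
    for batch_no in range(n_batches):
        previous = object()  # sentinel that compares unequal to any key
        for key in sorted(batches.get(batch_no, [])):
            if key != previous:
                result.append(key)
            previous = key
    return result
```

Calling distinct_via_batch_sort([[1, 2, 2, 3], [3, 4, 1, 5]]) returns
each of the five distinct keys exactly once. The output is ordered
within a batch but not across batches, which is all that Unique or
GroupAggregate need.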

Interesting idea.  So IIUC, whenever a worker scans a tuple it puts it
directly into the respective batch (a shared tuple store), chosen by
the hash of the grouping columns.  Once all the workers have finished
preparing the batches, each worker picks up the batches one by one,
sorts each, and finishes the aggregation.  I think there is scope for
improvement here: instead of putting every tuple directly into a
batch, what if each worker did partial aggregation first and then
placed the partially aggregated rows into the shared tuple store based
on the hash value, after which the workers pick up the batches one by
one as before?  That way we can avoid doing large sorts.  The same
approach could also be used with hash aggregate, I mean the partially
aggregated data produced by the hash aggregate can be put into the
respective batches.
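
A rough sketch of what I have in mind, in Python and for a count(*)
aggregate only. This is just an illustration of the idea, not proposed
code: the names (N_BATCHES, partial_aggregate, scatter_partials,
finalize_batch, parallel_count) are made up, a Counter stands in for
the per-worker partial aggregate state, lists stand in for the shared
tuple store, and the workers are simulated sequentially.

```python
from collections import Counter, defaultdict

N_BATCHES = 4  # hypothetical batch count


def partial_aggregate(rows):
    # Each worker first aggregates its own rows locally (here: a count per key).
    return Counter(rows)


def scatter_partials(partials, n_batches=N_BATCHES):
    # Only the partially aggregated rows go to the shared batches, chosen by
    # the hash of the grouping key, so each worker contributes at most one
    # row per distinct key it saw.
    batches = defaultdict(list)
    for partial in partials:
        for key, count in partial.items():
            batches[hash(key) % n_batches].append((key, count))
    return batches


def finalize_batch(entries):
    # Sort the partial rows of one batch and combine the states of equal keys.
    final = {}
    for key, count in sorted(entries):
        final[key] = final.get(key, 0) + count
    return final


def parallel_count(worker_inputs, n_batches=N_BATCHES):
    partials = [partial_aggregate(rows) for rows in worker_inputs]
    batches = scatter_partials(partials, n_batches)
    result = {}
    for batch_no in range(n_batches):
        result.update(finalize_batch(batches.get(batch_no, [])))
    return result
```

With two workers holding ["a", "a", "b"] and ["b", "c", "a"], the
batches receive five partial rows instead of six input rows, and the
final counts are a=3, b=2, c=1. The saving grows with the number of
duplicate keys each worker sees.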

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


