Re: Re: parallel distinct union and aggregate support patch
From: bucoo@sohu.com
Subject: Re: Re: parallel distinct union and aggregate support patch
Msg-id: 202010280958515947628@sohu.com
In reply to: parallel distinct union and aggregate support patch ("bucoo@sohu.com" <bucoo@sohu.com>)
List: pgsql-hackers
> On Tue, Oct 27, 2020 at 3:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Oct 23, 2020 at 11:58 AM bucoo@sohu.com <bucoo@sohu.com> wrote:
> > >
> > > > Interesting idea. So IIUC, whenever a worker is scanning the tuple it
> > > > will directly put it into the respective batch(shared tuple store),
> > > > based on the hash on grouping column and once all the workers are
> > > > doing preparing the batch then each worker will pick those baches one
> > > > by one, perform sort and finish the aggregation. I think there is a
> > > > scope of improvement that instead of directly putting the tuple to the
> > > > batch what if the worker does the partial aggregations and then it
> > > > places the partially aggregated rows in the shared tuple store based
> > > > on the hash value and then the worker can pick the batch by batch. By
> > > > doing this way, we can avoid doing large sorts. And then this
> > > > approach can also be used with the hash aggregate, I mean the
> > > > partially aggregated data by the hash aggregate can be put into the
> > > > respective batch.
> > >
> > > Good idea. Batch sort is suitable when the aggregate produces many result rows;
> > > with a large number of result groups, partial aggregation may run out of memory,
> > > and it requires every aggregate function to support partial aggregation
> > > (with batch sort this is unnecessary).
> > >
> > > Actually, I have written a batch hash store for hash aggregate (for PG11) along these lines,
> > > but it does not write partial aggregates to the shared tuple store; it writes the original tuple
> > > and its hash value instead. However, it does not support parallel grouping sets.
> > > I am trying to write parallel hash aggregate support using a batched shared tuple store for PG14,
> > > which also needs to support parallel grouping sets hash aggregate.
> >
> > I was trying to look into this patch to understand the logic in more
> > detail. Actually, there are no comments at all so it's really hard to
> > understand what the code is trying to do.
> >
> > I was reading the below function, which is the main entry point for
> > the batch sort.
> >
> > +static TupleTableSlot *ExecBatchSortPrepare(PlanState *pstate)
> > +{
> > ...
> > + for (;;)
> > + {
> > ...
> > + tuplesort_puttupleslot(state->batches[hash%node->numBatches], slot);
> > + }
> > +
> > + for (i=node->numBatches;i>0;)
> > + tuplesort_performsort(state->batches[--i]);
> > +build_already_done_:
> > + if (parallel)
> > + {
> > + for (i=node->numBatches;i>0;)
> > + {
> > + --i;
> > + if (state->batches[i])
> > + {
> > + tuplesort_end(state->batches[i]);
> > + state->batches[i] = NULL;
> > + }
> > + }
> >
> > I did not understand this part: once each worker has performed
> > its local batch-wise sort, why are we clearing the batches? I mean,
> > individual workers have their own batches, so eventually they are
> > supposed to get merged. Can you explain this part? It would also be
> > better if you could add comments.
>
> I think I got it. IIUC, each worker initializes the shared
> sort, performs the batch-wise sorting, and then waits on a
> barrier so that all the workers can finish their sorting. Once
> that is done, the workers coordinate to pick the batches one
> by one and perform the final merge for each batch.
Yes, exactly. Each worker opens the shared sort as a "worker" (nodeBatchSort.c:134);
once all workers have finished performing their sorts, each one picks a batch and opens it as the "leader" (nodeBatchSort.c:54).