Re: Parallel Full Hash Join

From Melanie Plageman
Subject Re: Parallel Full Hash Join
Date
Msg-id CAAKRu_YX8PdxHbqV4wEk6v-ivMTwN+tcYG8AtTe3HT3gz=ewmA@mail.gmail.com
In response to Re: Parallel Full Hash Join (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Parallel Full Hash Join (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Fri, Nov 26, 2021 at 3:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sun, Nov 21, 2021 at 4:48 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Wed, Nov 17, 2021 at 01:45:06PM -0500, Melanie Plageman wrote:
> > > Yes, this looks like that issue.
> > >
> > > I've attached a v8 set with the fix I suggested in [1] included.
> > > (I added it to 0001).
> >
> > This is still crashing :(
> > https://cirrus-ci.com/task/6738329224871936
> > https://cirrus-ci.com/task/4895130286030848
>
> I added a core file backtrace to cfbot's CI recipe a few days ago, so
> now we have:
>
> https://cirrus-ci.com/task/5676480098205696
>
> #3 0x00000000009cf57e in ExceptionalCondition (conditionName=0x29cae8
> "BarrierParticipants(&accessor->shared->batch_barrier) == 1",
> errorType=<optimized out>, fileName=0x2ae561 "nodeHash.c",
> lineNumber=lineNumber@entry=2224) at assert.c:69
> No locals.
> #4 0x000000000071575e in ExecParallelScanHashTableForUnmatched
> (hjstate=hjstate@entry=0x80a60a3c8,
> econtext=econtext@entry=0x80a60ae98) at nodeHash.c:2224

I believe this assert can be safely removed.

It is possible for a worker to attach to the batch barrier after the
"last" worker was elected to scan and emit unmatched inner tuples. This
is safe because the batch barrier is already in phase PHJ_BATCH_SCAN
and this newly attached worker will simply detach from the batch
barrier and look for a new batch to work on.

The order of events would be as follows:

W1: advances batch to PHJ_BATCH_SCAN
W2: attaches to batch barrier in ExecParallelHashJoinNewBatch()
W1: calls ExecParallelScanHashTableForUnmatched() (2 workers attached to
barrier at this point)
W2: detaches from the batch barrier
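
As a rough illustration of that ordering (not code from the patch), the
four steps can be replayed in a single-threaded toy model. The names
toy_barrier, toy_attach(), toy_detach() and replay_late_attach() are
hypothetical stand-ins, not the Barrier API in barrier.c or anything in
nodeHash.c, and the real interleaving of course happens across
processes rather than in one function.

```c
#include <assert.h>

/* Toy stand-ins for illustration only; NOT PostgreSQL's Barrier API. */
enum toy_phase { TOY_BATCH_PROBE, TOY_BATCH_SCAN };

struct toy_barrier {
    enum toy_phase phase;   /* current phase of the batch */
    int participants;       /* number of workers currently attached */
};

/* Attach a worker and report the phase it observed when attaching. */
static enum toy_phase toy_attach(struct toy_barrier *b)
{
    b->participants++;
    return b->phase;
}

/* Detach a worker from the barrier. */
static void toy_detach(struct toy_barrier *b)
{
    assert(b->participants > 0);
    b->participants--;
}

/*
 * Replay the W1/W2 ordering described above and return the number of
 * participants W1 would see at the moment it starts the unmatched scan.
 */
static int replay_late_attach(struct toy_barrier *b)
{
    enum toy_phase w2_phase;
    int seen_by_w1;

    /* Starting point (assumed): W1 is the only worker attached. */
    b->phase = TOY_BATCH_PROBE;
    b->participants = 1;

    /* W1: advances the batch to the scan phase. */
    b->phase = TOY_BATCH_SCAN;

    /* W2: attaches late, after W1 was elected to do the scan. */
    w2_phase = toy_attach(b);

    /* W1: starts the unmatched scan; two workers are attached now. */
    seen_by_w1 = b->participants;

    /* W2: sees the scan phase, has nothing to do, and detaches. */
    if (w2_phase == TOY_BATCH_SCAN)
        toy_detach(b);

    return seen_by_w1;
}
```

In this model replay_late_attach() returns 2, so a check equivalent to
"participants == 1" at the start of the scan would fail even though W2
never touches the batch.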

The attached v10 patch removes this assert and updates the comment in
ExecParallelScanHashTableForUnmatched().

I'm not sure if I should add more detail about this scenario in
ExecParallelHashJoinNewBatch() under PHJ_BATCH_SCAN or if the detail in
ExecParallelScanHashTableForUnmatched() is sufficient.

- Melanie
