Re: Avoiding hash join batch explosions with extreme skew and weird stats

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Avoiding hash join batch explosions with extreme skew and weird stats
Дата
Msg-id CAAKRu_Zn91PvgyzD111i99bkZ==7zcBKP-+VvLD38aiT5kaSNg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
Ответы Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
Список pgsql-hackers

On Tue, Apr 28, 2020 at 7:03 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

There is a deadlock hazard in parallel hashjoin (pointed out by Thomas
Munro in the past). Workers attached to the stripe_barrier emit tuples
and then wait on that barrier.
I believe that that can be addressed starting with this
relatively unoptimized solution:
- after probing a stripe in a batch, a worker sets the status of that
  batch to "tentatively done" and saves the stripe_barrier phase
- if that worker is not the only worker attached to that batch, it
  detaches from both stripe and batch barriers and moves on to other
  batches
- if that worker is the only worker attached to the batch, it will
  proceed to load the next stripe of that batch, and, once it has
  finished loading, it will set the status of the batch back to "not
  done" for itself
- when the other worker encounters that batch again, if the
  stripe_barrier phase has not moved forward, it will mark that batch as
  done for itself. if the stripe_barrier phase has moved forward, it can
  join in in probing this batch for the current stripe.

 
Just to follow-up on the stripe barrier deadlock, I've implemented a
solution and attached it.

There are three solutions I've thought about so far:

1)  leaders don't participate in fallback batches
2)  serial after stripe 0
      no worker can join a batch after any worker has left and only one
      worker can work on stripes after stripe 0
3)  provisionally complete batches
      After the end of stripe 0, all workers except the last worker
      detach from the stripe barrier, mark the batch as provisionally
      done, save the stripe barrier phase, and move on to another batch.
      Later, when one of these workers returns to the batch, if it is
      not already done, the worker checks to see if the phase of the
      stripe barrier has advanced. If the phase has advanced, it means
      that no one is waiting for that worker. The worker can join that
      batch. If the phase hasn't advanced, the worker won't risk
      deadlock and will simply mark the batch as done. The last worker
      executes the normal path -- participating in each stripe.

I've attached a patch to implement solution 3
v7-0002-Provisionally-detach-unless-last-worker.patch

This isn't a very optimized version of this solution. It detaches from
the stripe barrier and closes the outer match status bitmap upon
provisional completion by a worker. However, I ran into some problems
keeping outer match status bitmaps open for multiple batches at a time.

I've also attached the original adaptive hashjoin patch with a couple
small tweaks (not quite meriting a patch version bump, but that seemed
like the easiest way).

--
Melanie Plageman
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: stat() on Windows might cause error if target file is largerthan 4GB
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: Back-patch is necessary? Re: Don't try fetching future segment ofa TLI.