Re: Inefficiency in parallel pg_restore with many tables

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Inefficiency in parallel pg_restore with many tables
Дата	16 июля 2023 г. 13:45:54
Msg-id	3787685.1689515154@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Re: Inefficiency in parallel pg_restore with many tables (Andrew Dunstan <andrew@dunslane.net>)
Ответы	Re: Inefficiency in parallel pg_restore with many tables
Список	pgsql-hackers

Дерево обсуждения

Andrew Dunstan <andrew@dunslane.net> writes:
> On 2023-07-15 Sa 13:47, Tom Lane wrote:
>> I wonder if we could replace the sorted ready-list with a priority heap,
>> although that might be complicated by the fact that pop_next_work_item
>> has to be capable of popping something that's not necessarily the
>> largest remaining job.  Another idea could be to be a little less eager
>> to sort the list every time; I think in practice scheduling wouldn't
>> get much worse if we only re-sorted every so often.

> Yeah, I think that last idea is reasonable. Something like if the number 
> added since the last sort is more than min(50, list_length/4) then sort. 
> That shouldn't be too invasive.

Actually, as long as we're talking about approximately-correct behavior:
let's make the ready_list be a priority heap, and then just make
pop_next_work_item scan forward from the array start until it finds an
item that's runnable per the lock heuristic.  If the heap root is
blocked, the next things we'll examine will be its two children.
We might pick the lower-priority of those two, but it's still known to
be higher priority than at least 50% of the remaining heap entries, so
it shouldn't be too awful as a choice.  The argument gets weaker the
further you go into the heap, but we're not expecting that having most
of the top entries blocked will be a common case.  (Besides which, the
priorities are pretty crude to begin with.)  Once selected, pulling out
an entry that is not the heap root is no problem: you just start the
sift-down process from there.

The main advantage of this over the only-sort-sometimes idea is that
we can guarantee that the largest ready item will always be dispatched
as soon as it can be (because it will be the heap root).  So cases
involving one big table (with big indexes) and a lot of little ones
should get scheduled sanely, which is the main thing we want this
algorithm to ensure.  With the other approach we can't really promise
much at all.

            regards, tom lane

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Inefficiency in parallel pg_restore with many tables