Re: Recovery performance of standby for multiple concurrent truncates on large tables
From | Robert Haas
---|---
Subject | Re: Recovery performance of standby for multiple concurrent truncates on large tables
Date |
Msg-id | CA+TgmoaP5sPbTocvRdRsqx+rcXt_Bs5GVWJ1tKyRARh67J1G3w@mail.gmail.com
In reply to | RE: Recovery performance of standby for multiple concurrent truncates on large tables ("Jamison, Kirk" <k.jamison@jp.fujitsu.com>)
Responses | Re: Recovery performance of standby for multiple concurrent truncates on large tables; RE: Recovery performance of standby for multiple concurrent truncates on large tables
List | pgsql-hackers
On Mon, Jul 30, 2018 at 1:22 AM, Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
> 1. Because the multiple scans of the whole shared buffer per concurrent truncate/drop table were the cause of the time-consuming behavior, DURING the failover process while WAL is being applied, we temporarily delay the scanning and invalidating of shared buffers. At the same time, we remember the relations/relfilenodes (of dropped/truncated tables) by adding them to a hash table called a "skip list".
> 2. After WAL is applied, the checkpointer (or bgwriter) scans the shared buffer only ONCE, compares the pages against the skip list, and invalidates the relevant pages. After the relevant pages are deleted from shared memory, they will not be written back to disk.
>
> Assuming the theory works, this design will only affect the behavior of the checkpointer (or maybe the bgwriter) during the recovery/failover process. Any feedback, thoughts?

How would this work if a relfilenode number that belonged to an old relation got recycled for a new relation?

I think something like this could be made to work -- both on the master and the standby, and not just while waiting for a failover -- if we did something like this:

(1) Limit the number of deferred drops to a reasonably small number (one cache line? 1kB?).

(2) The background writer regularly does scans to clean out everything from the deferred-drop list.

(3) If a foreground process finds the deferred-drop list full when it needs to add to it, it forces a clean-out of the list contents, plus whatever new stuff it has, in a single pass.

(4) If we are about to generate a relfilenode that's in the list, we either force a clean-out or generate a different relfilenode instead.

(5) Every buffer-cleaning operation checks the deferred-drop list and invalidates without write-out if the buffer is found.

It's not clear to me whether it would be worth the overhead of doing something like this. Making relation drops faster at the cost of making buffer cleaning slower could be a loser.
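A minimal sketch of points (1), (3), and (5) above, with the buffer pool modeled as a list of relfilenode tags. All names (`defer_drop`, `clean_out`, `must_invalidate`, the pool representation) are hypothetical illustrations, not PostgreSQL code:

```python
DEFERRED_DROP_MAX = 8           # point (1): keep the list small

deferred_drops: list[int] = []  # relfilenodes whose buffers await invalidation

def clean_out(buffer_tags: list[int]) -> None:
    """Points (2)/(3): one pass over the buffer pool invalidates every
    buffer belonging to a listed relfilenode, then empties the list."""
    drops = set(deferred_drops)
    for i, tag in enumerate(buffer_tags):
        if tag in drops:
            buffer_tags[i] = 0  # invalidate without write-out
    deferred_drops.clear()

def defer_drop(rnode: int, buffer_tags: list[int]) -> None:
    """Point (3): a foreground process forces a clean-out (old entries plus
    the new one, one pass) when the list is full."""
    if len(deferred_drops) >= DEFERRED_DROP_MAX:
        clean_out(buffer_tags)
    deferred_drops.append(rnode)

def must_invalidate(tag: int) -> bool:
    """Point (5): buffer cleaning checks the list before writing a page out."""
    return tag in deferred_drops

# Tiny worked example: four buffers, two of them for relfilenode 101.
pool = [101, 102, 101, 103]
defer_drop(101, pool)
assert must_invalidate(101) and not must_invalidate(102)
clean_out(pool)
assert pool == [0, 102, 0, 103]   # 101's buffers invalidated in one pass
assert not must_invalidate(101)   # list emptied after the scan
```

Point (4) would sit where relfilenodes are generated: before handing out a candidate, check `must_invalidate(candidate)` and either force `clean_out` or pick a different value, which is what prevents a recycled relfilenode from matching stale entries.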
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company