Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept

From: Alexander Korotkov
Subject: Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept
Msg-id: CAPpHfdt+ugWjBj-1GxHQpSsWGug2dLqQRpKm282xs=z0tu_Y2w@mail.gmail.com
In response to: Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept  (Dmitry Dolgov <9erthalion6@gmail.com>)
List: pgsql-hackers

On Thu, Nov 29, 2018 at 3:44 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Fri, Aug 24, 2018 at 5:53 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> >
> > Given I've no feedback on this idea yet, I'll try to implement a PoC
> > patch for that.  It doesn't look to be difficult.  And we'll see how
> > does it work.
>
> Unfortunately, current version of the patch doesn't pass the tests and fails on
> initdb. But maybe you already have this PoC you were talking about, that will
> also incorporate the feedback from this thread? For now I'll move it to the
> next CF.

Finally, I managed to write a PoC.

If you look at the list of problems I enumerated in [1], this PoC is
aimed at 1 and 3.

> 1) Data corruption on file truncation error (explained in [1]).
> 2) Expensive scanning of the whole shared buffers before file truncation.
> 3) Cancel of read-only queries on standby even if hot_standby_feedback
> is on, caused by replication of AccessExclusiveLock.

Problem 2 is fairly independent and could be addressed later.

Basically, this patch does the following:
1. Introduces a new flag, BM_DIRTY_BARRIER, which prevents a dirty
buffer from being written out.
2. Implements two-phase truncation of node buffers.  The first phase
runs before file truncation and marks dirty buffers past the
truncation point as BM_DIRTY_BARRIER.  The second phase runs after
file truncation and actually wipes out the buffers past the
truncation point.
3. If an exception happens during file truncation, the
BM_DIRTY_BARRIER flag is cleared from the buffers.  Thus, no data
corruption should happen here.  If the file truncation was partially
completed, the file might be extended again by the write of a dirty
buffer.  I'm not sure how likely that is, but the extension could
lead to errors again.  Still, this shouldn't cause data corruption.
4. Having too many buffers marked as BM_DIRTY_BARRIER would paralyze
the buffer manager.  This is why we keep no more than NBuffers/2
buffers marked as BM_DIRTY_BARRIER.  If the limit is exceeded, then
dirty buffers are just written out during the first phase.
5. lazy_truncate_heap() now takes ExclusiveLock instead of
AccessExclusiveLock.  This part is not really complete.  At the
least, we need to ensure that reads past the truncation point, caused
by read-only queries concurrent with the truncation, don't lead to
real errors.
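
To make points 1-4 easier to discuss, here is a rough standalone
sketch of the control flow.  It is a toy model, not code from the
patch: every Toy*/toy_* name is made up for illustration, the pool
has only 8 buffers, and "once the limit is reached, flush the rest"
is my simplified reading of point 4.  Locking, WAL and the real
buffer headers are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not PostgreSQL's bufmgr API. */
#define TOY_NBUFFERS 8

enum {
    TOY_VALID         = 1 << 0,
    TOY_DIRTY         = 1 << 1,
    TOY_DIRTY_BARRIER = 1 << 2   /* stands in for BM_DIRTY_BARRIER */
};

typedef struct ToyBuffer {
    unsigned block;              /* block number within the relation */
    unsigned flags;
} ToyBuffer;

typedef struct ToyPool {
    ToyBuffer bufs[TOY_NBUFFERS];
    int       written;           /* buffers flushed to "disk" so far */
} ToyPool;

/* Write out a dirty buffer unless a barrier forbids it. */
bool toy_flush(ToyPool *pool, ToyBuffer *buf)
{
    if (!(buf->flags & TOY_DIRTY) || (buf->flags & TOY_DIRTY_BARRIER))
        return false;
    buf->flags &= ~TOY_DIRTY;
    pool->written++;
    return true;
}

/* Phase 1, before file truncation: put a barrier on dirty buffers past
 * the truncation point; once TOY_NBUFFERS/2 barriers are set, flush
 * the others instead.  Returns the number of barriers set. */
int toy_truncate_phase1(ToyPool *pool, unsigned new_nblocks)
{
    int marked = 0;

    for (size_t i = 0; i < TOY_NBUFFERS; i++) {
        ToyBuffer *buf = &pool->bufs[i];

        if (!(buf->flags & TOY_VALID) || !(buf->flags & TOY_DIRTY) ||
            buf->block < new_nblocks)
            continue;
        if (marked < TOY_NBUFFERS / 2) {
            buf->flags |= TOY_DIRTY_BARRIER;
            marked++;
        } else {
            toy_flush(pool, buf);
        }
    }
    assert(marked <= TOY_NBUFFERS / 2);
    return marked;
}

/* Phase 2, after successful file truncation: drop the buffers past
 * the truncation point. */
void toy_truncate_phase2(ToyPool *pool, unsigned new_nblocks)
{
    for (size_t i = 0; i < TOY_NBUFFERS; i++)
        if ((pool->bufs[i].flags & TOY_VALID) &&
            pool->bufs[i].block >= new_nblocks)
            pool->bufs[i].flags = 0;
}

/* Error path: release the barriers; the buffers stay dirty and can be
 * written out normally later, so their contents are not lost. */
void toy_truncate_abort(ToyPool *pool, unsigned new_nblocks)
{
    for (size_t i = 0; i < TOY_NBUFFERS; i++)
        if (pool->bufs[i].block >= new_nblocks)
            pool->bufs[i].flags &= ~TOY_DIRTY_BARRIER;
}

/* Fill the pool with dirty blocks 0..7, truncate to 2 blocks, and
 * return how many buffers past the truncation point are still dirty. */
int toy_demo(bool truncate_fails)
{
    ToyPool        pool = { .written = 0 };
    const unsigned new_nblocks = 2;
    int            dirty_past = 0;

    for (size_t i = 0; i < TOY_NBUFFERS; i++) {
        pool.bufs[i].block = (unsigned) i;
        pool.bufs[i].flags = TOY_VALID | TOY_DIRTY;
    }

    toy_truncate_phase1(&pool, new_nblocks);
    if (truncate_fails)
        toy_truncate_abort(&pool, new_nblocks);
    else
        toy_truncate_phase2(&pool, new_nblocks);

    for (size_t i = 0; i < TOY_NBUFFERS; i++)
        if ((pool.bufs[i].flags & TOY_VALID) &&
            (pool.bufs[i].flags & TOY_DIRTY) &&
            pool.bufs[i].block >= new_nblocks)
            dirty_past++;
    return dirty_past;
}
```

In this model, a failed truncation leaves the barrier-protected
buffers dirty and writable again, while a successful one leaves
nothing cached past the truncation point.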

Any thoughts?

1. https://www.postgresql.org/message-id/CAPpHfdtD3U2DpGZQJNe21s9s1s-Va7NRNcP1isvdCuJxzYypcg%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
