Re: archive status ".ready" files may be created too early

From Bossart, Nathan
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id EFF40306-8E8A-4259-B181-C84F3F06636C@amazon.com
In reply to Re: archive status ".ready" files may be created too early  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses Re: archive status ".ready" files may be created too early  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
Apologies for the long delay.

I've spent a good amount of time thinking about this bug and trying
out a few different approaches for fixing it.  I've attached a work-
in-progress patch for my latest attempt.

On 10/13/20, 5:07 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
>           F0        F1
>         AAAAA  F  BBBBB
> |---------|---------|---------|
>    seg X    seg X+1   seg X+2
>
> Matsumura-san has a concern about the case where there are two (or
> more) partially-flushed segment-spanning records at the same time.
>
> This patch remembers only the last cross-segment record. If we were
> going to flush up to F0 after Record-B had been written, we would fail
> to hold off archiving seg-X. This patch is based on an assumption that
> that case cannot happen because we don't leave a pending page at the
> time of segment switch and no record spans three or more
> segments.

I wonder if these are safe assumptions to make.  For your example, if
we've written record B to the WAL buffers, but neither record A nor
record B has been written to disk or flushed, aren't we still in
trouble?  Also, is there actually any limit on WAL record length that
makes it impossible for a record to span three or more segments?
Perhaps these assumptions are true, but it doesn't seem obvious to me
that they are, and they might be pretty fragile.
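
To make the second question concrete, here is a minimal sketch of the
arithmetic, not code from PostgreSQL or from the attached patch.  The
helper segments_touched() is hypothetical, positions are plain byte
offsets into the WAL stream, and the space taken by WAL page headers is
ignored.  It only shows that a record slightly longer than one segment
can touch three segments if it starts near the end of a segment; it
says nothing about whether records that long can actually be produced.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of WAL segments touched by a record that
 * occupies bytes [start, start + len) of the WAL stream.
 * Simplification: page headers inside the WAL are ignored.
 */
static uint64_t
segments_touched(uint64_t start, uint64_t len, uint64_t seg_size)
{
    assert(len > 0 && seg_size > 0);

    uint64_t first = start / seg_size;             /* segment holding the first byte */
    uint64_t last = (start + len - 1) / seg_size;  /* segment holding the last byte */

    return last - first + 1;
}
```

With a 16MB segment size, a 64-byte record starting 8 bytes before a
segment boundary touches two segments, while a record of 16MB + 16
bytes starting at the same position touches three.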

The attached patch doesn't make use of these assumptions.  Instead, we
track the positions of the records that cross segment boundaries in a
small hash map, and we use that to determine when it is safe to mark a
segment as ready for archival.  I think this approach resembles
Matsumura-san's patch from June.
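
The idea in the paragraph above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in the
attached patch: BoundaryMap, register_record() and segment_ready() are
hypothetical names, the table is a fixed-size open-addressing map with
no deletion, the segment size is fixed at 16MB, and shared memory and
locking are left out entirely.  Each segment boundary that a record
crosses is mapped to the end position of that record, and a segment is
treated as ready only once the flush position has reached both the end
of the segment and the end of any record crossing out of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_SIZE  (UINT64_C(16) * 1024 * 1024)  /* assumed segment size */
#define MAP_SLOTS 64                            /* must be a power of two */

typedef struct
{
    bool     used;
    uint64_t segno;       /* segment whose end boundary is crossed */
    uint64_t record_end;  /* first byte after the crossing record */
} BoundarySlot;

typedef struct
{
    BoundarySlot slots[MAP_SLOTS];
} BoundaryMap;

/* Linear-probing lookup; optionally claims an empty slot for segno. */
static BoundarySlot *
map_find(BoundaryMap *map, uint64_t segno, bool insert)
{
    for (uint64_t i = 0; i < MAP_SLOTS; i++)
    {
        BoundarySlot *slot = &map->slots[(segno + i) & (MAP_SLOTS - 1)];

        if (slot->used && slot->segno == segno)
            return slot;
        if (!slot->used)
        {
            if (!insert)
                return NULL;
            slot->used = true;
            slot->segno = segno;
            slot->record_end = 0;
            return slot;
        }
    }
    return NULL;                /* table is full */
}

/* Remember every segment boundary crossed by the record [start, end). */
static bool
register_record(BoundaryMap *map, uint64_t start, uint64_t end)
{
    assert(end > start);

    uint64_t first = start / SEG_SIZE;
    uint64_t last = (end - 1) / SEG_SIZE;

    for (uint64_t segno = first; segno < last; segno++)
    {
        BoundarySlot *slot = map_find(map, segno, true);

        if (slot == NULL)
            return false;
        slot->record_end = end;
    }
    return true;
}

/* A segment is ready once it and any record crossing out of it are flushed. */
static bool
segment_ready(BoundaryMap *map, uint64_t segno, uint64_t flushed)
{
    if (flushed < (segno + 1) * SEG_SIZE)
        return false;

    BoundarySlot *slot = map_find(map, segno, false);

    return slot == NULL || flushed >= slot->record_end;
}
```

Because every crossed boundary gets its own entry, this sketch does not
depend on there being only one pending cross-segment record, nor on
records spanning at most two segments.  A real implementation would
also need to remove entries once a segment has been marked, which is
omitted here.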

As before, I'm not yet handling replication, archive_timeout, or
persisting the latest-marked-ready segment through crashes.  For
persisting the latest-marked-ready segment through crashes, I was
thinking of using a new file that stores the segment number.

Nathan

