Re: BRIN summarization vs. WAL logging

From Robert Haas
Subject Re: BRIN summarization vs. WAL logging
Msg-id CA+TgmoYMeKNK6JkfE9Z3T3bsB5A+7i0eFWPVN7U9xPA-y-JR1A@mail.gmail.com
In response to BRIN summarization vs. WAL logging  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: BRIN summarization vs. WAL logging  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
On Tue, Jan 25, 2022 at 10:12 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> 1) brin_desummarize_range()
>
> But if the cluster/VM/... crashes right after you ran the function (and
> it completed just fine, possibly even in an explicit transaction), that
> change will get lost. Not really a serious data corruption/loss, and you
> can simply run it again, but IMHO rather surprising.

I don't have a big problem with inserting an XLogFlush() here, but I
also don't think there's a hard and fast rule that maintenance
operations have to have full transactional behavior. That rule is
already violated in various ways by various DDL commands.
CREATE INDEX CONCURRENTLY can leave detritus around if it fails. At
least one variant of ALTER TABLE does a non-MVCC-aware rewrite of the
table contents. COPY FREEZE violates MVCC. Concurrent partition attach
and detach aren't fully serializable with concurrent transactions.
Years ago TRUNCATE didn't have MVCC semantics. We often make small
compromises in these areas for implementation simplicity or other
benefits.
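
To make the XLogFlush() option for brin_desummarize_range() concrete, here is a toy Python model of buffered versus flushed WAL. It is only a sketch and not PostgreSQL code: the ToyWal class, its methods, and the "desummarize range" record string are hypothetical names invented for illustration. The one thing it borrows from the real system is the idea that a record is durable only once the log has been flushed up to its position.

```python
class ToyWal:
    """Hypothetical in-memory model of a write-ahead log (not PostgreSQL code)."""

    def __init__(self):
        self.buffered = []  # records written but not yet flushed
        self.durable = []   # records that would survive a crash

    def insert(self, record):
        """Append a record and return its LSN-like position (1-based)."""
        self.buffered.append(record)
        return len(self.durable) + len(self.buffered)

    def flush(self, upto):
        """Make every record up to position `upto` durable."""
        while self.buffered and len(self.durable) < upto:
            self.durable.append(self.buffered.pop(0))

    def crash(self):
        """Drop unflushed records and return what recovery would replay."""
        self.buffered.clear()
        return list(self.durable)


def run(flush_before_return):
    """Model a maintenance function that logs one record, then a crash."""
    wal = ToyWal()
    lsn = wal.insert("desummarize range")  # illustrative record name
    if flush_before_return:
        wal.flush(lsn)
    return wal.crash()


if __name__ == "__main__":
    print(run(flush_before_return=False))
    print(run(flush_before_return=True))
```

In this model the function "completes just fine" in both cases, but only the flushing variant leaves anything for recovery to replay.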

> 2) brin_summarize_range()
>
> Now, the issue I think is more serious, more likely to happen, and
> harder to fix. When summarizing a range, we write two WAL records:
>
> INSERT heapBlk 2 pagesPerRange 2 offnum 2, blkref #0: rel 1663/63 ...
> SAMEPAGE_UPDATE offnum 2, blkref #0: rel 1663/63341/73957 blk 2
>
> So, what happens if we lose the second WAL record, e.g. due to a crash?

Ouch. As you say, XLogFlush() won't fix that. The WAL logging scheme
needs to be redesigned.
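
A toy Python model of the two-record case may help show why flushing alone does not address it. This is a sketch under stated assumptions, not PostgreSQL code: it assumes the first record inserts an initial tuple at an offset and the second overwrites that tuple in place, and the WalRecord type, the replay() function, and the payload strings are all hypothetical. A crash can leave any prefix of the log durable, so recovery may replay the INSERT without the SAMEPAGE_UPDATE.

```python
from dataclasses import dataclass


@dataclass
class WalRecord:
    kind: str     # "INSERT" or "SAMEPAGE_UPDATE" (names taken from the quoted output)
    offnum: int   # item offset on the toy index page
    payload: str  # illustrative tuple contents


def replay(records):
    """Rebuild a toy index page (offnum -> tuple) from a list of WAL records."""
    page = {}
    for rec in records:
        if rec.kind == "INSERT":
            page[rec.offnum] = rec.payload
        elif rec.kind == "SAMEPAGE_UPDATE":
            if rec.offnum not in page:
                raise KeyError(f"no tuple at offnum {rec.offnum}")
            page[rec.offnum] = rec.payload
        else:
            raise ValueError(f"unknown record kind: {rec.kind}")
    return page


# Summarizing one range emits two records in this model.
wal = [
    WalRecord("INSERT", 2, "initial tuple"),
    WalRecord("SAMEPAGE_UPDATE", 2, "final summary"),
]

if __name__ == "__main__":
    # A crash can leave any prefix of the log on disk.
    for n in range(len(wal) + 1):
        print(n, replay(wal[:n]))
```

With only the first record replayed, the page holds the initial tuple rather than the final summary. Flushing after the second record does not change what a crash between the two records leaves behind.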

-- 
Robert Haas
EDB: http://www.enterprisedb.com


