Re: pgsql: Validate page level checksums in base backups

From Tomas Vondra
Subject Re: pgsql: Validate page level checksums in base backups
Date
Msg-id f23e92ec-118d-e6ea-0c81-876ad62a588a@2ndquadrant.com
In response to Re: pgsql: Validate page level checksums in base backups  (Magnus Hagander <magnus@hagander.net>)
Responses Re: pgsql: Validate page level checksums in base backups  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: pgsql: Validate page level checksums in base backups  (David Steele <david@pgmasters.net>)
List pgsql-hackers
Hi,

I think there's a bug in sendFile(). We do check checksums on all pages
that pass this LSN check:

    /*
     * Only check pages which have not been modified since the
     * start of the base backup. Otherwise, they might have been
     * written only halfway and the checksum would not be valid.
     * However, replaying WAL would reinstate the correct page in
     * this case.
     */
    if (PageGetLSN(page) < startptr)
    {
        ...
    }

Now, imagine the page is new, i.e. all-zeroes. That means the LSN is 0/0
too, and we'll try to verify the checksum - but we actually do not set
checksums on empty pages.

So I think it should be something like this:

    if ((!PageIsNew(page)) && (PageGetLSN(page) < startptr))
    {
        ...
    }
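
To make the difference between the two conditions concrete, here is a
minimal standalone sketch. It is not the sendFile() code: the struct is a
simplified stand-in for the leading fields of the page header, and every
name prefixed with sketch_ or Sketch is hypothetical. It assumes the
default 8 kB block size and that a "new" page is recognized by
pd_upper == 0, which is what PageIsNew() tests.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumption: default block size */

/* Simplified stand-in for the first fields of a page header (hypothetical). */
typedef struct SketchPageHeader
{
    uint32_t    lsn_hi;         /* high 32 bits of the page LSN */
    uint32_t    lsn_lo;         /* low 32 bits of the page LSN */
    uint16_t    checksum;       /* page checksum, 0 on an all-zeroes page */
    uint16_t    flags;
    uint16_t    lower;
    uint16_t    upper;          /* 0 on a page that was never initialized */
} SketchPageHeader;

/* Read the 64-bit LSN out of the page header. */
static uint64_t
sketch_page_lsn(const char *page)
{
    SketchPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return ((uint64_t) hdr.lsn_hi << 32) | hdr.lsn_lo;
}

/* Same test PageIsNew() performs: an uninitialized page has upper == 0. */
static bool
sketch_page_is_new(const char *page)
{
    SketchPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr.upper == 0;
}

/* Current condition: only the LSN check. An all-zeroes page passes it. */
static bool
sketch_should_verify_old(const char *page, uint64_t startptr)
{
    return sketch_page_lsn(page) < startptr;
}

/* Proposed condition: skip new pages, then apply the LSN check. */
static bool
sketch_should_verify_new(const char *page, uint64_t startptr)
{
    return !sketch_page_is_new(page) && sketch_page_lsn(page) < startptr;
}
```

With an all-zeroes buffer and any non-zero startptr, the old condition
returns true (so a checksum that was never set would get verified), while
the proposed one returns false.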

It might be worth verifying that the page is actually all-zeroes (and
not just one with a corrupted pd_upper value), but I'm not sure it's
worth it.
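
For reference, the stricter all-zeroes check could look roughly like the
sketch below. The function name is hypothetical and this is a plain
byte-wise scan for illustration only; a real patch would presumably want
something cheaper than a byte loop.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true only if every byte of the page is zero. This distinguishes
 * a genuinely new page from one whose header merely has pd_upper == 0
 * because of corruption. (Hypothetical helper, not an existing function.)
 */
static bool
sketch_page_is_all_zeroes(const char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}
```

A page that passes the pd_upper == 0 test but fails this scan would be a
candidate for reporting as corrupted rather than silently skipped.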

I've found this by fairly trivial stress testing - running pgbench and
pg_basebackup in a loop. It was failing pretty reliably (~75% of runs).
With the proposed change I see no further failures.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

