Re: Online enabling of checksums

From Magnus Hagander
Subject Re: Online enabling of checksums
Msg-id CABUevEzhkmkNCHzQ_MuuqmmXXNLbEu1P08URuoG3uCrnBg6MgA@mail.gmail.com
In response to Re: Online enabling of checksums  (Andres Freund <andres@anarazel.de>)
Responses Re: Online enabling of checksums  (Michael Banck <michael.banck@credativ.de>)
Re: Online enabling of checksums  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers


On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> +     /*
> +      * Create a database list.  We don't need to concern ourselves with
> +      * rebuilding this list during runtime since any database created after
> +      * this process started will be running with checksums turned on from the
> +      * start.
> +      */
>
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set?  Afaict the second time we go through this list
> that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0
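
The race described above can be modelled outside PostgreSQL. What follows is a minimal sketch in Python, not the patch's C code, and every name in it (`process_all`, `list_databases`, `enable_checksums`, `copy_of_template`) is hypothetical. Instead of building the database list once, the loop re-reads the list until a pass finds nothing unprocessed, so a database created mid-run is still picked up. Whether a rescan alone would be sufficient in the real launcher is an assumption, not something this thread establishes.

```python
def process_all(list_databases, enable_checksums):
    """Process databases until a full rescan finds no unprocessed one.

    list_databases:   callable returning the current set of database names
    enable_checksums: callable invoked once per database
    Both are hypothetical stand-ins, not PostgreSQL APIs.
    """
    done = []
    while True:
        # Re-read the list on every pass instead of trusting a snapshot.
        pending = [db for db in sorted(list_databases()) if db not in done]
        if not pending:
            return done
        for db in pending:
            enable_checksums(db)
            done.append(db)


# Simulated cluster: a CREATE DATABASE happens while "postgres" is processed.
cluster = {"template1", "postgres"}


def fake_enable(db):
    if db == "postgres":
        cluster.add("copy_of_template")  # simulated concurrent CREATE DATABASE


processed = process_all(lambda: set(cluster), fake_enable)
print(processed)
```

With a list built only once, `copy_of_template` would never be visited; with the rescan it is processed in the second pass.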



further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.


The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set.  If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.
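
The two-step wait described above could look roughly like this. It is a sketch only: `checksums_on` and `worker_count` are hypothetical probe callables that a test harness would have to supply (for example by querying the GUC and the process list), and neither name comes from the patch.

```python
import time


def wait_until_settled(checksums_on, worker_count, timeout=30.0, interval=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Return True once checksums report on AND no workers remain.

    Returns False if both conditions do not hold before the timeout.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        # Both conditions must hold in the same poll; the GUC alone is not enough.
        if checksums_on() and worker_count() == 0:
            return True
        sleep(interval)
    return False


# Fake probes: the GUC flips on at poll 2, the worker exits at poll 4.
polls = {"n": 0}


def fake_on():
    polls["n"] += 1
    return polls["n"] >= 2


def fake_workers():
    return 0 if polls["n"] >= 4 else 1


ok = wait_until_settled(fake_on, fake_workers, timeout=5.0, interval=0.0)
print(ok, polls["n"])
```

A test that stops at the first condition would proceed at poll 2 here, while a worker is still running.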


If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

While I do think that it's still definitely fixable in time for 11, I won't argue for it. Will revert.

Note, however, that I'm sans-laptop until Sunday, so I will revert it then or possibly Monday.
