Re: Does PostgreSQL check database integrity at startup?

Поиск
Список
Период
Сортировка
От Edson Carlos Ericksson Richter
Тема Re: Does PostgreSQL check database integrity at startup?
Дата
Msg-id 2bdeab7c-dc99-eb60-84d7-2d47f3c9f180@simkorp.com.br
обсуждение исходный текст
Ответ на Re: Does PostgreSQL check database integrity at startup?  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-general
Em 28/12/2017 16:26, Stephen Frost escreveu:
> Greetings Brent,
>
> * Brent Wood (pcreso@yahoo.com) wrote:
>> A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of
duplicationthan simply checking file size - like differently vacuumed tables in each copy could have the same content
butbe different file sizes.
 
> PG has support for checksums and there are tools out there to validate
> that the checksum is correct for all pages which have one, but that
> wouldn't help in this case because the file is zero'd out (and a zero'd
> out file is actually a valid file in a PG data directory).
>
> Also, the files on the primary and the replica actually can be different
> when looked at with a complete-file checksum due to hint bits being set
> differently (at least, possibly other ways too).  That doesn't make them
> invalid or incorrect though.
>
> Rather than trying to compare a running primary to a running replica,
> such a check to verify that the files backed up during a backup have the
> same checksum as the files being restored from that backup can be done,
> and that *is* done in at least some of the PG backup tools already
> (pgBackRest has an independent manifest that it stores for each backup
> which contains the checksum of each file as-backed-up, and it verifies
> that checksum when performing a restore to make sure that the backed up
> file wasn't corrupted in place, other tools hopefully have similar).
>
>> I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) event
thismay be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work?
 
> The order of records in a *file* should be the same in the heap on the
> primary as they are on the replica, but that doesn't mean the contents
> of those files will be exactly the same (as mentioned above, hint bits
> can differ).  We used to have cases where the indexes could also be
> different, but I believe that was changed so they should match.
>
> I've used the approach of doing a checksum across the results of an
> ordered query to compare between systems and that generally does work,
> but it's a bit tricky if you're trying to compare a table that's heavily
> modified- you need to determine the point in the WAL stream that you're
> at on the primary when you run the query and then replay the replica to
> that point in the WAL and then run the query on the replica, otherwise
> you could end up with differences that are just because of the ongoing
> transactions being run to update the table.
>
> Thanks!
>
> Stephen

Stephen, thanks for you detailed reasoning on why this would not be so 
simple.

I'll take all of that in consideration. I hope I find something that 
gives a bit more confidence that my replicas are enought reliable. At 
this moment, I just don't trust them.

Regards,

Edson


В списке pgsql-general по дате отправления:

Предыдущее
От: Edson Carlos Ericksson Richter
Дата:
Сообщение: Re: Does PostgreSQL check database integrity at startup?
Следующее
От: Paul A Jungwirth
Дата:
Сообщение: Temporal Foreign Keys