Re: Does PostgreSQL check database integrity at startup?
От | Edson Carlos Ericksson Richter |
---|---|
Тема | Re: Does PostgreSQL check database integrity at startup? |
Дата | |
Msg-id | 2bdeab7c-dc99-eb60-84d7-2d47f3c9f180@simkorp.com.br обсуждение исходный текст |
Ответ на | Re: Does PostgreSQL check database integrity at startup? (Stephen Frost <sfrost@snowman.net>) |
Список | pgsql-general |
Em 28/12/2017 16:26, Stephen Frost escreveu: > Greetings Brent, > > * Brent Wood (pcreso@yahoo.com) wrote: >> A tool to calculate a checksum of sorts based on the table (file) content would provide a better surety of duplicationthan simply checking file size - like differently vacuumed tables in each copy could have the same content butbe different file sizes. > PG has support for checksums and there are tools out there to validate > that the checksum is correct for all pages which have one, but that > wouldn't help in this case because the file is zero'd out (and a zero'd > out file is actually a valid file in a PG data directory). > > Also, the files on the primary and the replica actually can be different > when looked at with a complete-file checksum due to hint bits being set > differently (at least, possibly other ways too). That doesn't make them > invalid or incorrect though. > > Rather than trying to compare a running primary to a running replica, > such a check to verify that the files backed up during a backup have the > same checksum as the files being restored from that backup can be done, > and that *is* done in at least some of the PG backup tools already > (pgBackRest has an independent manifest that it stores for each backup > which contains the checksum of each file as-backed-up, and it verifies > that checksum when performing a restore to make sure that the backed up > file wasn't corrupted in place, other tools hopefully have similar). > >> I do wonder though - given the order of records in a table (file) is not necessarily identical (or is it?) event thismay be problematic. Perhaps a checksum based on the result of a query output ordered by primary key could work? > The order of records in a *file* should be the same in the heap on the > primary as they are on the replica, but that doesn't mean the contents > of those files will be exactly the same (as mentioned above, hint bits > can differ). We used to have cases where the indexes could also be > different, but I believe that was changed so they should match. > > I've used the approach of doing a checksum across the results of an > ordered query to compare between systems and that generally does work, > but it's a bit tricky if you're trying to compare a table that's heavily > modified- you need to determine the point in the WAL stream that you're > at on the primary when you run the query and then replay the replica to > that point in the WAL and then run the query on the replica, otherwise > you could end up with differences that are just because of the ongoing > transactions being run to update the table. > > Thanks! > > Stephen Stephen, thanks for you detailed reasoning on why this would not be so simple. I'll take all of that in consideration. I hope I find something that gives a bit more confidence that my replicas are enought reliable. At this moment, I just don't trust them. Regards, Edson
В списке pgsql-general по дате отправления:
Предыдущее
От: Edson Carlos Ericksson RichterДата:
Сообщение: Re: Does PostgreSQL check database integrity at startup?