Re: Online enabling of checksums

From: Magnus Hagander
Subject: Re: Online enabling of checksums
Msg-id: CABUevExbt+fSHJHtZnTHOf5EiqQa+qDnNtiOdOq8pucyLT1K1A@mail.gmail.com
In reply to: Online enabling of checksums (Magnus Hagander <magnus@hagander.net>)
List: pgsql-hackers
Re-sending this one with proper formatting. Apologies for the horrible gmail-screws-up-the-text-part of the last one!


No change to patch or text, just the formatting.

//Magnus



Once more, here is an attempt to solve the problem of on-line enabling of checksums that Daniel and I have been hacking on for a bit. See for example https://www.postgresql.org/message-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com and https://www.postgresql.org/message-id/flat/FF393672-5608-46D6-9224-6620EC532693%40endpoint.com#FF393672-5608-46D6-9224-6620EC532693@endpoint.com for some previous discussions.


Base design:

Change the checksum flag from a boolean (on/off) to an enum: off/inprogress/on. When checksums are off or on, they work like today. When checksums are in progress, checksums are *written* but not verified. State can go from “off” to “inprogress”, from “inprogress” to either “on” or “off”, or from “on” to “off”.
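The state machine described above can be sketched as a small transition table. This is only an illustrative Python model, not code from the patch (which is C); the names ChecksumState, ALLOWED, can_transition, writes_checksums and verifies_checksums are made up for the example.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"
    INPROGRESS = "inprogress"
    ON = "on"


# Allowed transitions, as listed in the text (hypothetical table name).
ALLOWED = {
    ChecksumState.OFF: {ChecksumState.INPROGRESS},
    ChecksumState.INPROGRESS: {ChecksumState.ON, ChecksumState.OFF},
    ChecksumState.ON: {ChecksumState.OFF},
}


def can_transition(old, new):
    """Return True if the design permits moving from old to new."""
    return new in ALLOWED[old]


def writes_checksums(state):
    """Checksums are computed on write while in progress and when on."""
    return state in (ChecksumState.INPROGRESS, ChecksumState.ON)


def verifies_checksums(state):
    """Checksums are only verified on read once the state is on."""
    return state is ChecksumState.ON
```

Note that there is no direct path from “off” to “on”: the only way to reach “on” is through “inprogress”.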


Two new functions are added, pg_enable_data_checksums() and pg_disable_data_checksums(). The disable one is easy -- it just changes the state to “off”. The enable one will change the state to “inprogress”, and then start a background worker (the “checksumhelper launcher”). This launcher in turn will start one sub-worker (“checksumhelper worker”) in each database (currently all done sequentially). Each sub-worker will enumerate all tables/indexes/etc in its database and validate their checksums. If there is no checksum, or the checksum is incorrect, it will compute a new checksum and write it out. When all databases have been processed, the checksum state changes to “on” and the launcher shuts down. At this point, the cluster has checksums enabled as if it had been initdb’d with checksums turned on.
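The launcher/worker flow can be modelled roughly as below. This is a toy simulation under loud assumptions: the cluster is a plain dict, pages are dicts, and toy_checksum uses CRC32 as a stand-in (PostgreSQL's real page checksum is a different algorithm). The function names are hypothetical and do not come from the patch.

```python
import zlib


def toy_checksum(payload):
    # Stand-in only: not PostgreSQL's page checksum algorithm.
    return zlib.crc32(payload) & 0xFFFF


def process_database(relations):
    """Worker: visit every page, rewrite those with a missing or wrong checksum.

    relations maps a relation name to a list of pages; each page is a dict
    with 'payload' (bytes) and 'checksum' (int or None).
    Returns the number of pages rewritten.
    """
    rewritten = 0
    for pages in relations.values():
        for page in pages:
            expected = toy_checksum(page["payload"])
            if page["checksum"] != expected:
                page["checksum"] = expected
                rewritten += 1
    return rewritten


def enable_data_checksums(cluster):
    """Launcher: set inprogress, process databases one by one, then set on."""
    cluster["state"] = "inprogress"
    total = 0
    for relations in cluster["databases"].values():
        total += process_database(relations)
    cluster["state"] = "on"
    return total
```

The state only flips to “on” after the loop over all databases has finished, which mirrors the rule that a cluster interrupted part-way stays in “inprogress”.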


If the cluster shuts down while “inprogress”, the DBA will have to manually either restart the worker (by calling pg_enable_data_checksums()) or turn checksums off again. Leaving checksums “inprogress” only carries a cost and no benefit.


The change of the checksum state is WAL-logged with a new xlog record. Full page writes are forced for all buffers written by the background worker, to make sure the checksum is fully updated on the standby even if no actual contents of the buffer changed.


We’ve also included a small command-line tool, bin/pg_verify_checksums, that can be run against an offline cluster to validate all checksums. Future improvements include being able to use the background worker/launcher to perform an online check as well. Being able to run more parallel workers in the checksumhelper might also be of interest.
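To give an idea of what an offline verification pass looks like, here is a rough Python sketch. It is not how pg_verify_checksums is implemented: it assumes 8192-byte blocks (PostgreSQL's default block size) with a 16-bit checksum field at byte offset 8 of each page, and it swaps in a CRC32-based toy checksum for the real algorithm. All function names are hypothetical.

```python
import struct
import zlib

BLOCK_SIZE = 8192        # PostgreSQL's default block size
CHECKSUM_OFFSET = 8      # assumed position of the 16-bit checksum field


def toy_page_checksum(page):
    # Stand-in only, not PostgreSQL's algorithm: CRC32 over the page with
    # the checksum field zeroed, truncated to 16 bits.
    zeroed = page[:CHECKSUM_OFFSET] + b"\x00\x00" + page[CHECKSUM_OFFSET + 2:]
    return zlib.crc32(zeroed) & 0xFFFF


def stamp(page):
    """Return a copy of page with its toy checksum stored in the header."""
    csum = struct.pack("<H", toy_page_checksum(page))
    return page[:CHECKSUM_OFFSET] + csum + page[CHECKSUM_OFFSET + 2:]


def find_bad_blocks(data):
    """Return the block numbers whose stored checksum does not match."""
    bad = []
    for blkno in range(len(data) // BLOCK_SIZE):
        page = data[blkno * BLOCK_SIZE:(blkno + 1) * BLOCK_SIZE]
        (stored,) = struct.unpack_from("<H", page, CHECKSUM_OFFSET)
        if stored != toy_page_checksum(page):
            bad.append(blkno)
    return bad
```

A caller would read each relation file of the stopped cluster as bytes and pass it to find_bad_blocks(); an empty list means every block verified.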


The patch includes two sets of tests. The first is an isolation test that turns on checksums while one session is writing to the cluster and another is continuously reading, to simulate turning on checksums in a production database. The second is a TAP test that enables checksums with streaming replication turned on, to test the new xlog record. The isolation test ran into the 1024 character limit of the isolation test lexer; there is a separate patch and discussion for that at https://www.postgresql.org/message-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se