Discussion: Online enabling of checksums


Online enabling of checksums

From
Magnus Hagander
Date:

Once more, here is an attempt to solve the problem of on-line enabling of checksums that Daniel and I have been hacking on for a while. See for example https://www.postgresql.org/message-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com and https://www.postgresql.org/message-id/flat/FF393672-5608-46D6-9224-6620EC532693%40endpoint.com#FF393672-5608-46D6-9224-6620EC532693@endpoint.com for some previous discussions.

Base design:

Change the checksum flag from a boolean (on/off) to an enum: off/inprogress/on. When checksums are off or on, they work as they do today. When checksums are in progress, checksums are *written* but not verified. The state can go from “off” to “inprogress”, from “inprogress” to either “on” or “off”, or from “on” to “off”.
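The allowed transitions form a small state machine; here is a minimal sketch (the names are illustrative, not the patch's actual identifiers):

```python
# Hypothetical sketch of the checksum state machine described above.
# State names match the design; the code itself is illustrative only.
ALLOWED = {
    ("off", "inprogress"),
    ("inprogress", "on"),
    ("inprogress", "off"),
    ("on", "off"),
}

def transition(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"cannot go from {current!r} to {target!r}")
    return target
```

Note in particular that “off” can never go directly to “on”: every enable has to pass through “inprogress” while existing pages are rewritten.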

Two new functions are added, pg_enable_data_checksums() and pg_disable_data_checksums(). The disable one is easy -- it just changes the state to off. The enable one will change the state to inprogress, and then start a background worker (the “checksumhelper launcher”). This worker in turn will start one sub-worker (“checksumhelper worker”) in each database (currently all done sequentially). This worker will enumerate all tables/indexes/etc in the database and validate their checksums. If there is no checksum, or the checksum is incorrect, it will compute a new checksum and write the page back out. When all databases have been processed, the checksum state changes to “on” and the launcher shuts down. At this point, the cluster has checksums enabled as if it had been initdb’d with checksums turned on.
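The enable flow described above can be sketched roughly as follows -- a toy model, with plain functions standing in for the background workers and the actual page I/O, not the patch's C code:

```python
# Illustrative sketch of the checksumhelper launcher/worker flow.
# `verify` and `rewrite` are stand-ins for real checksum validation
# and page rewriting; databases/relations/pages are plain dicts/lists.

def checksumhelper_worker(db, verify, rewrite):
    """Validate every page of every relation in one database,
    rewriting pages whose checksum is missing or wrong."""
    for rel in db["relations"]:
        for blkno, page in enumerate(rel["pages"]):
            if not verify(page):
                rel["pages"][blkno] = rewrite(page)

def checksumhelper_launcher(databases, set_state, verify, rewrite):
    """Walk all databases sequentially, then flip the state to on."""
    set_state("inprogress")
    for db in databases:  # currently all done sequentially
        checksumhelper_worker(db, verify, rewrite)
    set_state("on")
```

The key property the sketch captures: the “on” state is only reached after every page in every database has passed (or been given) a valid checksum.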

If the cluster shuts down while “inprogress”, the DBA will have to manually either restart the worker (by calling pg_enable_data_checksums()) or turn checksums off again. Checksums “in progress” carry a cost but no benefit.

The change of the checksum state is WAL-logged with a new xlog record. All buffers written by the background worker have full-page writes forcibly enabled, to make sure the checksum is fully updated on the standby even if the actual contents of the buffer did not change.

We’ve also included a small command-line tool, bin/pg_verify_checksums, that can be run against an offline cluster to validate all checksums. Future improvements include being able to use the background worker/launcher to perform an online check as well. Being able to run more parallel workers in the checksumhelper might also be of interest.
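Conceptually, an offline verifier like this walks each relation file in fixed-size blocks and checks each block's checksum. A hedged sketch, using an 8 kB block size (PostgreSQL's default BLCKSZ) and CRC32 as a stand-in for PostgreSQL's actual FNV-based page checksum algorithm:

```python
# Conceptual sketch of offline block-by-block checksum verification.
# CRC32 is a stand-in checksum, not PostgreSQL's real page checksum.
import zlib

BLCKSZ = 8192  # PostgreSQL's default block size

def find_bad_blocks(data, expected):
    """Scan `data` (the raw bytes of a relation file) in BLCKSZ-sized
    blocks; return the block numbers whose CRC32 does not match the
    expected checksum for that block."""
    bad = []
    for blkno in range(len(data) // BLCKSZ):
        block = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if zlib.crc32(block) != expected.get(blkno):
            bad.append(blkno)
    return bad
```

The real tool differs in detail (it reads segment files from disk, and the checksum lives inside the page header and is seeded with the block number), but the scan structure is the same.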

The patch includes two sets of tests: an isolation test that turns on checksums while one session is writing to the cluster and another is continuously reading, to simulate turning on checksums in a production database, and a TAP test which enables checksums with streaming replication turned on to test the new xlog record. The isolation test ran into the 1024-character limit of the isolation test lexer, with a separate patch and discussion at https://www.postgresql.org/message-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se


--
Attachments

Re: Online enabling of checksums

From
Magnus Hagander
Date:
Re-sending this one with proper formatting. Apologies for the horrible gmail-screws-up-the-text-part of the last one!


No change to patch or text, just the formatting.

//Magnus



Attachments

Re: Online enabling of checksums

From
Peter Eisentraut
Date:
On 2/21/18 15:53, Magnus Hagander wrote:
> *Two new functions are added, pg_enable_data_checksums() and
> pg_disable_data_checksums(). The disable one is easy -- it just changes
> to disable. The enable one will change the state to inprogress, and then
> start a background worker (the “checksumhelper launcher”). This worker
> in turn will start one sub-worker (“checksumhelper worker”) in each
> database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers.  I wonder whether we can do something to make this
easier and less repetitive.  Not in this patch, of course.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will help to fix some mistakes made during initdb a long time ago...

On 22 Feb 2018, at 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2/21/18 15:53, Magnus Hagander wrote:
*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers.  I wonder whether we can do something to make this
easier and less repetitive.  Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review this patch, and some code comparison could be handy....

So far I've found only this [0,1] (without code) and threads mentioned by Magnus [2,3]

Or do you mean extracting "worker+launcher" for reuse for other purposes?

Best regards, Andrey Borodin.


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 4:47 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will help to fix some mistakes made during initdb a long time ago...

On 22 Feb 2018, at 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2/21/18 15:53, Magnus Hagander wrote:
*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers.  I wonder whether we can do something to make this
easier and less repetitive.  Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review this patch, and some code comparison could be handy....

So far I've found only this [0,1] (without code) and threads mentioned by Magnus [2,3]

Or do you mean extracting "worker+launcher" for reuse for other purposes?


I'm pretty sure Peter means the second. Which could be interesting, but as he says, not the topic for this patch. 

I'm not entirely sure which the other ones are. Autovacuum obviously is one, which doesn't use the worker infrastructure. But I'm not sure what the others are referring to? 

--

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:
> On 2/21/18 15:53, Magnus Hagander wrote:
> > *Two new functions are added, pg_enable_data_checksums() and
> > pg_disable_data_checksums(). The disable one is easy -- it just changes
> > to disable. The enable one will change the state to inprogress, and then
> > start a background worker (the “checksumhelper launcher”). This worker
> > in turn will start one sub-worker (“checksumhelper worker”) in each
> > database (currently all done sequentially).*
> 
> This is at least the fourth version of the pattern launcher plus worker
> background workers.  I wonder whether we can do something to make this
> easier and less repetitive.  Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:
> On 2/21/18 15:53, Magnus Hagander wrote:
> > *Two new functions are added, pg_enable_data_checksums() and
> > pg_disable_data_checksums(). The disable one is easy -- it just changes
> > to disable. The enable one will change the state to inprogress, and then
> > start a background worker (the “checksumhelper launcher”). This worker
> > in turn will start one sub-worker (“checksumhelper worker”) in each
> > database (currently all done sequentially).*
>
> This is at least the fourth version of the pattern launcher plus worker
> background workers.  I wonder whether we can do something to make this
> easier and less repetitive.  Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Are you saying this should be done *in general*, or specifically for background workers? I'm assuming you mean the general case? That would be very useful, but is probably a fairly non-trivial task (TM). 


--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:
> On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:
> > I suspect I'm going to get some grief for this, but I think the time has
> > come to bite the bullet and support changing databases in the same
> > process...
> >
> 
> Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P


> Are you saying this should be done *in general*, or specifically for
> background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.


> That would be very useful, but is probably a fairly non-trivial task
> (TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:
> On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:
> > I suspect I'm going to get some grief for this, but I think the time has
> > come to bite the bullet and support changing databases in the same
> > process...
> >
>
> Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P

Hah! Are you handing out binoculars? :)

 
> Are you saying this should be done *in general*, or specifically for
> background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.


Yeah, it'd probably be easier. I don't know exactly what it'd involve but clearly less.

In this particular case that would, at least in phase 1, simplify it because we'd only need one process instead of worker/launcher. However, if we'd ever want to parallelize it -- or any other process of this style, like autovacuum -- you'd still need a launcher+worker combo. So making that particular scenario simpler might be worthwhile on its own.


 
> That would be very useful, but is probably a fairly non-trivial task
> (TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

I'll take your word for it :) I haven't dug into that part.
 
--

Re: Online enabling of checksums

From
Andres Freund
Date:

On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net> wrote:
>On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
>wrote:
>In this particular case that would, at least in phase 1, simplify it
>because we'd only need one process instead of worker/launcher. However,
>if we'd ever want to parallelize it -- or any other process of this
>style, like autovacuum -- you'd still need a launcher+worker combo. So
>making that particular scenario simpler might be worthwhile on its own.

Why is that needed? You can just start two bgworkers and process a list of items stored in shared memory. Or even just check, I assume there'd be a catalog flag somewhere, whether a database / table / object of granularity has already been processed and use locking to prevent concurrent access.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Online enabling of checksums

From
Peter Eisentraut
Date:
On 2/22/18 12:38, Magnus Hagander wrote:
> I'm not entirely sure which the other ones are. Autovacuum obviously
> is one, which doesn't use the worker infrastructure. But I'm not sure
> what the others are referring to? 

autovacuum, subscription workers, auto prewarm

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 8:52 PM, Andres Freund <andres@anarazel.de> wrote:


On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net> wrote:
>On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
>wrote:
>In this particular case that would, at least in phase 1, simplify it
>because we'd only need one process instead of worker/launcher. However,
>if we'd ever want to parallelize it -- or any other process of this
>style, like autovacuum -- you'd still need a launcher+worker combo. So
>making that particular scenario simpler might be worthwhile on its own.

Why is that needed? You can just start two bgworkers and process a list of items stored in shared memory. Or even just check, I assume there'd be a catalog flag somewhere, whether a database / table / object of granularity has already been processed and use locking to prevent concurrent access.

You could do that, but then you're moving the complexity to managing that list in shared memory instead. I'm not sure that's any easier... And certainly adding a catalog flag for a use case like this one is not making it easier.

--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:
> You could do that, but then you're moving the complexity to managing that
> list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.
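The kind of coordination described above can be sketched with threads and a shared queue standing in for bgworkers and shared memory -- purely illustrative, not the patch's design:

```python
# Toy model of parallel per-relation processing: several workers pull
# relation identifiers from one shared work list. Threads and a Queue
# stand in for bgworkers and a shmem-resident list with locking.
import queue
import threading

def process_relations(relations, n_workers, process):
    """Distribute per-relation work across n_workers threads pulling
    from one shared queue; return the collected results."""
    work = queue.Queue()
    for rel in relations:
        work.put(rel)

    done, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                rel = work.get_nowait()
            except queue.Empty:
                return  # no work left; worker exits
            result = process(rel)
            with lock:
                done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Each relation is handed out exactly once regardless of how many workers run, which is the synchronization property the shared-memory piece has to provide.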


> I'm not sure that's any easier... And
> certainly adding a catalog flag for a usecase like this one is not making
> it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

Regards,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Feb 22, 2018 at 9:23 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:
> You could do that, but then you're moving the complexity to managing that
> list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.

Yes, you probably need something like that if you want to be able to parallelize on things inside each database. If you are OK parallelizing things on a per-database level, you don't need it.

 
> I'm not sure that's any easier... And
> certainly adding a catalog flag for a usecase like this one is not making
> it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

I would prefer that, yes. But having to re-read 9TB is still significantly better than not being able to turn on checksums at all (the state today). And adding a catalog column for it will carry the cost of the migration *forever*, both for clusters that never enable checksums and those that have had them from the beginning.

Accepting that the process will start over (but only read, not re-write, the blocks that have already been processed) in case of a crash does significantly simplify the process, and reduces the long-term cost of it in the form of entries in the catalogs. Since this is a one-time operation (or for many people, a zero-time operation), paying that cost once is probably better than paying a much smaller cost constantly.

--

Re: Online enabling of checksums

From
Michael Paquier
Date:
On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
> I suspect I'm going to get some grief for this, but I think the time has
> come to bite the bullet and support changing databases in the same
> process...

I'd like to see that.  Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup.  It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.
--
Michael

Attachments

Re: Online enabling of checksums

From
Andres Freund
Date:
How does:

On 2018-02-23 11:48:16 +0900, Michael Paquier wrote:
> On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
> > I suspect I'm going to get some grief for this, but I think the time has
> > come to bite the bullet and support changing databases in the same
> > process...
> 
> I'd like to see that.  Last time this has been discussed, and Robert
> complained to me immediately when I suggested it, is that this is not
> worth it with the many complications around syscache handling and
> resource cleanup.

relate to:

> It is in the long term more stable to use a model
> where a parent process handles a set of children and decides to which
> database each child should spawn, which is what autovacuum does.

?


Re: Online enabling of checksums

From
Robert Haas
Date:
On Thu, Feb 22, 2018 at 9:48 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
>> I suspect I'm going to get some grief for this, but I think the time has
>> come to bite the bullet and support changing databases in the same
>> process...
>
> I'd like to see that.  Last time this has been discussed, and Robert
> complained to me immediately when I suggested it, is that this is not
> worth it with the many complications around syscache handling and
> resource cleanup.  It is in the long term more stable to use a model
> where a parent process handles a set of children and decides to which
> database each child should spawn, which is what autovacuum does.

My position is that allowing processes to change databases is a good
idea but (1) it will probably take some work to get correct and (2) it
probably won't be super-fast due to the need to flush absolutely every
bit of state in sight that might've been influenced by the choice of
database.

I also agree with Andres that this email is not very easy to
understand, although my complaint is not so much that I don't see how
the parts relate as that you seem to be contradicting yourself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 9:09 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2/22/18 12:38, Magnus Hagander wrote:
> I'm not entirely sure which the other ones are. Autovacuum obviously
> is one, which doesn't use the worker infrastructure. But I'm not sure
> what the others are referring to? 

autovacuum, subscription workers, auto prewarm


Oh, for some reason I thought you were thinking of pending patches. Yeah, for those it makes sense -- though autovacuum isn't (currently) using background workers for what it does, the rest certainly make sense to do something with.

But as you say, that's a separate patch :) 


--

Re: Online enabling of checksums

From
Robert Haas
Date:
On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I would prefer that yes. But having to re-read 9TB is still significantly
> better than not being able to turn on checksums at all (state today). And
> adding a catalog column for it will carry the cost of the migration
> *forever*, both for clusters that never have checksums and those that had it
> from the beginning.
>
> Accepting that the process will start over (but only read, not re-write, the
> blocks that have already been processed) in case of a crash does
> significantly simplify the process, and reduce the long-term cost of it in
> the form of entries in the catalogs. Since this is a one-time operation (or
> for many people, a zero-time operation), paying that cost that one time is
> probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way.  I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature.  The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation.  You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 01:34 AM, Robert Haas wrote:
> On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> I would prefer that yes. But having to re-read 9TB is still significantly
>> better than not being able to turn on checksums at all (state today). And
>> adding a catalog column for it will carry the cost of the migration
>> *forever*, both for clusters that never have checksums and those that had it
>> from the beginning.
>>
>> Accepting that the process will start over (but only read, not re-write, the
>> blocks that have already been processed) in case of a crash does
>> significantly simplify the process, and reduce the long-term cost of it in
>> the form of entries in the catalogs. Since this is a one-time operation (or
>> for many people, a zero-time operation), paying that cost that one time is
>> probably better than paying a much smaller cost but constantly.
> 
> That's not totally illogical, but to be honest I'm kinda surprised
> that you're approaching it that way.  I would have thought that
> relchecksums and datchecksums columns would have been a sort of
> automatic design choice for this feature.  The thing to keep in mind
> is that nobody's going to notice the overhead of adding those columns
> in practice, but someone will surely notice the pain that comes from
> having to restart the whole operation.  You're talking about trading
> an effectively invisible overhead for a very noticeable operational
> problem.
> 

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is that large databases often store most of the data (>80%) in one or two
central tables (think fact tables in a star schema, etc.). So if you
crash, it's likely half-way through processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> I agree having to restart the whole operation after a crash is not
> ideal, but I don't see how adding a flag actually solves it. The problem
> is that large databases often store most of the data (>80%) in one or two
> central tables (think fact tables in a star schema, etc.). So if you
> crash, it's likely half-way through processing this table, so the whole
> table would still have relchecksums=false and would have to be processed
> from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

> But perhaps you meant something like "position" instead of just a simple
> true/false flag?

I think that'd incur a much larger complexity cost.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 03:11 AM, Andres Freund wrote:
> Hi,
> 
> On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
>> I agree having to restart the whole operation after a crash is not
>> ideal, but I don't see how adding a flag actually solves it. The problem
>> is that large databases often store most of the data (>80%) in one or two
>> central tables (think fact tables in a star schema, etc.). So if you
>> crash, it's likely half-way through processing this table, so the whole
>> table would still have relchecksums=false and would have to be processed
>> from scratch.
> 
> I don't think it's quite as large a problem as you make it out to
> be. Even in those cases you'll usually have indexes, toast tables and so
> forth.
> 

Hmmm, right. I've been focused on tables and kinda forgot that the other
objects need to be transformed too ... :-/

>> But perhaps you meant something like "position" instead of just a simple
>> true/false flag?
> 
> I think that'd incur a much larger complexity cost.
> 

Yep, that was part of the point that I was getting to - that actually
addressing the issue would be more expensive than simple flags. But as
you pointed out, that was not quite ... well thought through.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 02/24/2018 03:11 AM, Andres Freund wrote:
> > On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> >> I agree having to restart the whole operation after a crash is not
> >> ideal, but I don't see how adding a flag actually solves it. The problem
> >> is that large databases often store most of the data (>80%) in one or two
> >> central tables (think fact tables in a star schema, etc.). So if you
> >> crash, it's likely half-way through processing this table, so the whole
> >> table would still have relchecksums=false and would have to be processed
> >> from scratch.
> >
> > I don't think it's quite as large a problem as you make it out to
> > be. Even in those cases you'll usually have indexes, toast tables and so
> > forth.
>
> Hmmm, right. I've been focused on tables and kinda forgot that the other
> objects need to be transformed too ... :-/

There's also something of a difference between just scanning a table or
index, where you don't have to do much in the way of actual writes
because most of the table already has valid checksums, and having to
actually write out all the changes.

> >> But perhaps you meant something like "position" instead of just a simple
> >> true/false flag?
> >
> > I think that'd incur a much larger complexity cost.
>
> Yep, that was part of the point that I was getting to - that actually
> addressing the issue would be more expensive than simple flags. But as
> you pointed out, that was not quite ... well thought through.

No, but it's also not entirely wrong.  Huge tables aren't uncommon.

That said, I'm not entirely convinced that these new flags would be as
unnoticed as is being suggested here, but rather than focus on either
side of that, I'm thinking about what we want to have *next*- we know
that enabling/disabling checksums is an issue that needs to be solved,
and this patch is making progress towards that, but the next question is
what does one do when a page has been detected as corrupted?  Are there
flag fields which would be useful to have at a per-relation level to
support some kind of corrective action, or a setting that says "don't care
about checksums on this table, even though the entire database is
supposed to have valid checksums, but instead do X with failed pages", or
similar?

Beyond dealing with corruption-recovery cases, are there other use cases
for having a given table not have checksums?

Would it make sense to introduce a flag or field which indicates that an
entire table's pages have some set of attributes, of which 'checksums' is
just one attribute?  Perhaps a page version, which potentially allows us
to have a way to change page layouts in the future?

I'm happy to be told that we simply don't have enough information at
this point to make anything larger than a relchecksums field-level
decision, but perhaps these thoughts will spark an idea about how we
could define something a bit broader with clear downstream usefulness
that happens to also cover the "does this relation have checksums?"
question.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:
> We’ve also included a small commandline tool, bin/pg_verify_checksums,
> that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive across the whole relation but the tool starts
over from 0 in each segment:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned:  943
Blocks scanned: 155925
Bad checksums:  76
 

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I would prefer that yes. But having to re-read 9TB is still significantly
> better than not being able to turn on checksums at all (state today). And
> adding a catalog column for it will carry the cost of the migration
> *forever*, both for clusters that never have checksums and those that had it
> from the beginning.
>
> Accepting that the process will start over (but only read, not re-write, the
> blocks that have already been processed) in case of a crash does
> significantly simplify the process, and reduce the long-term cost of it in
> the form of entries in the catalogs. Since this is a one-time operation (or
> for many people, a zero-time operation), paying that cost once is
> probably better than paying a much smaller cost constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way.  I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature.  The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation.  You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

Is it really that invisible? Given how much we argue over adding single counters to the stats system, I'm not sure it's quite that low.

We did consider doing it on a per-table basis as well. But that is an overhead that has to be paid forever, whereas having to read the database files more than once (only read, not write, on the second pass) is a one-off cost. And all those that initialized with checksums in the first place don't have to pay any overhead at all in the current design.

I very strongly doubt it's a "very noticeable operational problem". People don't restart their databases very often... Let's say it takes 2-3 weeks to complete a run in a fairly large database. How many such large databases actually restart that frequently? I'm not sure I know of any. And the only effect of it is that you have to start the process over (but read-only for the part you have already done). It's certainly not ideal, but I don't agree it's in any form a "very noticeable problem".

The other point is that this keeps the code a lot simpler. That is good both for having a chance to finish it at all and get it into 11 (it can then be improved upon to add for example incremental support in 12, or something like that). And of course, simple code means less overhead in the form of maintenance and effects on other parts of the system down the road.

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.


Ah yes. I actually have it on my TODO to work on that, but I forgot to put that in the email I sent out. Apologies for that, and thanks for pointing it out!

Right now you have to set the limit in the configuration file. That's of course not the way we want to have it long term (but as long as it is that way it should at least be documented). My plan is either to pick it up from the current session that calls pg_enable_data_checksums(), or to simply pass it down as parameters to the function directly. I'm thinking the second one (passing a cost_delay and a cost_limit as optional parameters to the function) is the best, because as you say overloading it on the user-visible GUCs seems a bit ugly. Once there, I think the easiest is to just pass it down to the workers through the shared memory segment.

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 6:14 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:
> We’ve also included a small commandline tool, bin/pg_verify_checksums,
> that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive across the whole relation but the tool starts
over from 0 in each segment:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned:  943
Blocks scanned: 155925
Bad checksums:  76

Yikes. I could've sworn I tested that, but it's pretty obvious I didn't, at least not in this version. Thanks for the note, will fix and post a new version! 

--

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?


> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And for all
> those that have initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?


> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?

It's probably at least partially unrelated, you are right. I may have misread our reluctance to add more values there as a general reluctance to add more values to central columns.

 
> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And for all
> those that have initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that big an overhead given that there are already plenty of columns there. But the point is that you can never remove that column, and it will be there for users who never even considered running without checksums. It's certainly not a large overhead, but it's also not zero.


> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By far. 

--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
> > > We did consider doing it at a per-table basis as well. But this is also
> > an
> > > overhead that has to be paid forever, whereas the risk of having to read
> > > the database files more than once (because it'd only have to read them on
> > > the second pass, not write anything) is a one-off operation. And for all
> > > those that have initialized with checksums in the first place don't have
> > to
> > > pay any overhead at all in the current design.
> >
> > Why does it have to be paid forever?
> >
> 
> The size of the pg_class row would be there forever. Granted, it's not that
> big an overhead given that there are already plenty of columns there. But
> the point being you can never remove that column, and it will be there for
> users who never even considered running without checksums. It's certainly
> not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound on compatibility for catalog layout.

FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns are a problem (they don't matter much in comparison
to the rest of the data, particularly relname), we can compress them
into a bitfield.

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog size doesn't seem
like a valid part of the argument.


> > I very strongly doubt it's a "very noticeable operational problem". People
> > > don't restart their databases very often... Let's say it takes 2-3 weeks
> > to
> > > complete a run in a fairly large database. How many such large databases
> > > actually restart that frequently? I'm not sure I know of any. And the
> > only
> > > effect of it is you have to start the process over (but read-only for the
> > > part you have already done). It's certainly not ideal, but I don't agree
> > > it's in any form a "very noticeable problem".
> >
> > I definitely know large databases that fail over more frequently than
> > that.
> >
> 
> I would argue that they have bigger issues than enabling checksums... By
> far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Greg Stark
Date:
> The change of the checksum state is WAL logged with a new xlog record. All
> the buffers written by the background worker are forcibly enabled full page
> writes to make sure the checksum is fully updated on the standby even if no
> actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sun, Feb 25, 2018 at 1:21 AM, Greg Stark <stark@mit.edu> wrote:
> The change of the checksum state is WAL logged with a new xlog record. All the buffers written by the background worker are forcibly enabled full page writes to make sure the checksum is fully updated on the standby even if no actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.

What would be the actual benefit though? We'd have to invent a noop WAL record, and just have some other part of the system do the full page write? So why not just send the full page in the first place?

Also if that wasn't clear -- we only do the full page write if there isn't already a correct checksum on the page.

(We do trigger a checkpoint at the end, and wait for it to complete)

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Feb 24, 2018 at 11:06 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
> > > We did consider doing it at a per-table basis as well. But this is also
> > an
> > > overhead that has to be paid forever, whereas the risk of having to read
> > > the database files more than once (because it'd only have to read them on
> > > the second pass, not write anything) is a one-off operation. And for all
> > > those that have initialized with checksums in the first place don't have
> > to
> > > pay any overhead at all in the current design.
> >
> > Why does it have to be paid forever?
> >
>
> The size of the pg_class row would be there forever. Granted, it's not that
> big an overhead given that there are already plenty of columns there. But
> the point being you can never remove that column, and it will be there for
> users who never even considered running without checksums. It's certainly
> not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound on compatibility for catalog layout.

Sure.

But we can also *add* it in the next major version, if we decide it's a good idea?


FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns are a problem (they don't matter much in comparison
to the rest of the data, particularly relname), we can compress them
into a bitfield.

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog size doesn't seem
like a valid part of the argument.

Fair enough.
 


> > I very strongly doubt it's a "very noticeable operational problem". People
> > > don't restart their databases very often... Let's say it takes 2-3 weeks
> > to
> > > complete a run in a fairly large database. How many such large databases
> > > actually restart that frequently? I'm not sure I know of any. And the
> > only
> > > effect of it is you have to start the process over (but read-only for the
> > > part you have already done). It's certainly not ideal, but I don't agree
> > > it's in any form a "very noticeable problem".
> >
> > I definitely know large databases that fail over more frequently than
> > that.
> >
>
> I would argue that they have bigger issues than enabling checksums... By
> far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

That I can understand. But in a scenario like that, you can also stop doing that for the period of time when you're rebuilding checksums, if re-reading the database over and over again is an issue.

Note, I'm not saying it wouldn't be nice to have the incremental functionality. I'm just saying it's not needed in a first version.

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Feb 24, 2018 at 10:48 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.


Ah yes. I actually have it on my TODO to work on that, but I forgot to put that in the email I sent out. Apologies for that, and thanks for pointing it out!

Right now you have to set the limit in the configuration file. That's of course not the way we want to have it long term (but as long as it is that way it should at least be documented). My plan is either to pick it up from the current session that calls pg_enable_data_checksums(), or to simply pass it down as parameters to the function directly. I'm thinking the second one (passing a cost_delay and a cost_limit as optional parameters to the function) is the best, because as you say overloading it on the user-visible GUCs seems a bit ugly. Once there, I think the easiest is to just pass it down to the workers through the shared memory segment.


PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck. 

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 03:51 AM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 02/24/2018 03:11 AM, Andres Freund wrote:
>>> On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
>>>> I agree having to restart the whole operation after a crash is not
>>>> ideal, but I don't see how adding a flag actually solves it. The problem
>>>> is the large databases often store most of the data (>80%) in one or two
>>>> central tables (think fact tables in star schema, etc.). So if you
>>>> crash, it's likely half-way while processing this table, so the whole
>>>> table would still have relchecksums=false and would have to be processed
>>>> from scratch.
>>>
>>> I don't think it's quite as large a problem as you make it out to
>>> be. Even in those cases you'll usually have indexes, toast tables and so
>>> forth.
>>
>> Hmmm, right. I've been focused on tables and kinda forgot that the other
>> objects need to be transformed too ... :-/
> 
> There's also something of a difference between just scanning a table or
> index, where you don't have to do much in the way of actual writes
> because most of the table already has valid checksums, and having to
> actually write out all the changes.
> 
>>>> But perhaps you meant something like "position" instead of just a simple
>>>> true/false flag?
>>>
>>> I think that'd incur a much larger complexity cost.
>>
>> Yep, that was part of the point that I was getting to - that actually
>> addressing the issue would be more expensive than simple flags. But as
>> you pointed out, that was not quite ... well thought through.
> 
> No, but it's also not entirely wrong. Huge tables aren't uncommon.
> 
> That said, I'm not entirely convinced that these new flags would be
> as unnoticed as is being suggested here, but rather than focus on
> either side of that, I'm thinking about what we want to have *next*-
> we know that enabling/disabling checksums is an issue that needs to
> be solved, and this patch is making progress towards that, but the
> next question is what does one do when a page has been detected as
> corrupted? Are there flag fields which would be useful to have at a
> per-relation level to support some kind of corrective action or
> setting that says "don't care about checksums on this table, even
> though the entire database is supposed to have valid checksums, but
> instead do X with failed pages" or similar.
> 

Those questions are definitely worth asking, and I agree our ability to
respond to data corruption (incorrect checksums) needs improvement. But
I don't really see how a single per-relation flag would make any of that
easier.

Perhaps there are other flags/fields that might help, like for example
the maximum number of checksum errors per relation (although I don't
consider that very useful in practice), but that seems rather unrelated
to this patch.

> Beyond dealing with corruption-recovery cases, are there other use
> cases for having a given table not have checksums?
> 

Well, I see checksums as a way to detect data corruption caused by
storage, so if you have tablespaces backed by different storage systems,
you could disable checksums for objects on the storage you trust 100%.
That would limit the overhead of computing checksums.

But then again, this seems entirely unrelated to the patch discussed
here. That would obviously require flags in catalogs, and if the patch
eventually adds flags those would need to be separate.

> Would it make sense to introduce a flag or field which indicates that
> an entire table's pages have some set of attributes, of which
> 'checksums' is just one attribute? Perhaps a page version, which
> potentially allows us to have a way to change page layouts in the
> future?
> 
> I'm happy to be told that we simply don't have enough information at 
> this point to make anything larger than a relchecksums field-level 
> decision, but perhaps these thoughts will spark an idea about how we 
> could define something a bit broader with clear downstream
> usefulness that happens to also cover the "does this relation have
> checksums?" question.
> 

I don't know. But I think we need to stop moving the goalposts further
and further away, otherwise we won't get anything until PostgreSQL 73.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 02/25/2018 03:57 PM, Magnus Hagander wrote:
> 
> ...
> 
>> > I very strongly doubt it's a "very noticeable operational problem". People
>> > > don't restart their databases very often... Let's say it takes 2-3 weeks
>> > to
>> > > complete a run in a fairly large database. How many such large databases
>> > > actually restart that frequently? I'm not sure I know of any. And the
>> > only
>> > > effect of it is you have to start the process over (but read-only for the
>> > > part you have already done). It's certainly not ideal, but I don't agree
>> > > it's in any form a "very noticeable problem".
>> >
>> > I definitely know large databases that fail over more frequently than
>> > that.
>> >
>>
>> I would argue that they have bigger issues than enabling checksums... By
>> far.
> 
> In one case it's intentional, to make sure the overall system copes. Not
> that insane.
> 
> That I can understand. But in a scenario like that, you can also stop
> doing that for the period of time when you're rebuilding checksums, if
> re-reading the database over and over again is an issue.
> 
> Note, I'm not saying it wouldn't be nice to have the incremental
> functionality. I'm just saying it's not needed in a first version.
> 

I agree with this sentiment. I don't think we can make each patch
perfect for everyone - certainly not in v1 :-/

Sure, it would be great to allow resume after a restart, but if that
means we won't get anything in PG 11 then I think it's not a good
service to our users. OTOH if the patch without a resume addresses the
issue for 99% of users, and we can improve it in PG 12, why not? That
seems exactly like the incremental thing we do for many other features.

So +1 to not make the "incremental resume" mandatory. If we can support
it, great! But I think the patch may seem less complex than it actually
is, and figuring out how the resume should work will take some time.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 10:45 PM, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
>     On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander
>     <magnus@hagander.net> wrote:
>     > I would prefer that yes. But having to re-read 9TB is still significantly
>     > better than not being able to turn on checksums at all (state today). And
>     > adding a catalog column for it will carry the cost of the migration
>     > *forever*, both for clusters that never have checksums and those that had it
>     > from the beginning.
>     >
>     > Accepting that the process will start over (but only read, not re-write, the
>     > blocks that have already been processed) in case of a crash does
>     > significantly simplify the process, and reduce the long-term cost of it in
>     > the form of entries in the catalogs. Since this is a one-time operation (or
>     > for many people, a zero-time operation), paying that cost once is
>     > probably better than paying a much smaller cost constantly.
> 
>     That's not totally illogical, but to be honest I'm kinda surprised
>     that you're approaching it that way.  I would have thought that
>     relchecksums and datchecksums columns would have been a sort of
>     automatic design choice for this feature.  The thing to keep in mind
>     is that nobody's going to notice the overhead of adding those columns
>     in practice, but someone will surely notice the pain that comes from
>     having to restart the whole operation.  You're talking about trading
>     an effectively invisible overhead for a very noticeable operational
>     problem.
> 
> 
> Is it really that invisible? Given how much we argue over adding
> single counters to the stats system, I'm not sure it's quite that
> low.

I'm a bit unsure where the flags would be stored - I initially assumed
pg_database/pg_class, but now I see mentions of the stats system.

But I wonder why this should be stored in a catalog at all? The info is
only needed by the bgworker(s), so they could easily flush the current
status to a file every now and then and fsync it. Then after restart, if
you find a valid file, use it to resume from the last OK position. If
not, start from scratch.

FWIW this is pretty much what the stats collector does.
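To make the idea concrete, here is a minimal sketch of that flush-and-fsync approach. Everything in it (the file name, the record layout, the magic number) is made up for illustration; the real bgworker would go through PostgreSQL's own file-access machinery rather than plain stdio.

```c
/* Sketch of "flush progress to a file and fsync it", not actual
 * PostgreSQL code.  Record layout and magic value are invented. */
#include <stdio.h>
#include <unistd.h>

#define PROGRESS_MAGIC 0xC4EC0001u

typedef struct ChecksumProgress
{
    unsigned int magic;     /* identifies a valid progress file */
    unsigned int db_oid;    /* last database fully processed */
    unsigned int filenode;  /* last relation fully processed */
    unsigned int blockno;   /* last block verified/rewritten */
} ChecksumProgress;

/* Persist the current position; fsync so a crash cannot leave us
 * believing in progress that never reached disk.  Returns 0 on ok. */
static int
save_progress(const char *path, const ChecksumProgress *p)
{
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    if (fwrite(p, sizeof(*p), 1, f) != 1 ||
        fflush(f) != 0 ||
        fsync(fileno(f)) != 0)
    {
        fclose(f);
        return -1;
    }
    return fclose(f);
}

/* Returns 1 and fills *p if a valid progress file exists, else 0,
 * which means: start from scratch. */
static int
load_progress(const char *path, ChecksumProgress *p)
{
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    if (fread(p, sizeof(*p), 1, f) != 1 || p->magic != PROGRESS_MAGIC)
    {
        fclose(f);
        return 0;
    }
    fclose(f);
    return 1;
}
```

On restart the launcher would call load_progress() and either resume from the stored position or, if the file is missing or invalid, fall back to a full pass, exactly as described above.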

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Daniel Gustafsson
Дата:
> On 26 Feb 2018, at 05:48, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 02/24/2018 10:45 PM, Magnus Hagander wrote:
>> Is it really that invisible? Given how much we argue over adding
>> single counters to the stats system, I'm not sure it's quite that
>> low.
>
> I'm a bit unsure where would the flags be stored - I initially assumed
> pg_database/pg_class, but now I see mentions of the stats system.
>
> But I wonder why should this be stored in a catalog at all? The info is
> only needed by the bgworker(s), so they could easily flush the current
> status to a file every now and then and fsync it. Then after restart, if
> you find a valid file, use it to resume from the last OK position. If
> not, start from scratch.

Since this allows checksums to be turned off as well, storing a flag in the
catalog would mean issuing a fairly wide update in that case to switch it to
false, which might not be ideal.  Not that I expect (hope) turning checksums
off on a cluster will be a common operation, but that's a fairly large side
effect of doing so.

cheers ./daniel

Re: Online enabling of checksums

От
Andrey Borodin
Дата:
Hi, Magnus!

> On 25 Feb 2018, at 21:17, Magnus Hagander <magnus@hagander.net> wrote:
>
> PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck.

I had to ALLOW_CONNECTIONS to template0 to make it work
postgres=# 2018-02-27 21:29:05.993 +05 [57259] ERROR:  Database template0 does not allow connections.
Is it a problem with my installation or some regression in the patch?

2018-02-27 21:40:47.132 +05 [57402] HINT:  either disable or enable checksums by calling the
pg_data_checksums_enable()/disable() functions
Function names are wrong in this hint: pg_enable_data_checksums()

The code is nice and clear. One minor spot in the comment
>This option can only _be_ enabled when data checksums are enabled.

Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implementing a utility for
offline checksum enabling, without WAL-logging, surely.


Thanks for the patch!

Best regards, Andrey Borodin.

Re: Online enabling of checksums

От
Daniel Gustafsson
Дата:
> On 28 Feb 2018, at 00:51, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>> On 25 Feb 2018, at 21:17, Magnus Hagander <magnus@hagander.net> wrote:
>>
>> PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck.
>
> I had to ALLOW_CONNECTIONS to template0 to make it work
> postgres=# 2018-02-27 21:29:05.993 +05 [57259] ERROR:  Database template0 does not allow connections.
> Is it a problem with my installation or some regression in the patch?

This is due to a limitation that applies to bgworkers: in order to add checksums,
the bgworker must connect to the database, and template0 does not allow
connections by default.  There is a discussion, and patch, for lifting this
restriction but until this makes it in (if it does), the user will have to
allow connections manually.  For reference, the thread for allowing bypassing
allowconn in bgworker is here:

https://www.postgresql.org/message-id/CABUevEwWT9ZmonBMRFF0owneoN3DAPgOVzwHAN0bUkxaqY3eNQ@mail.gmail.com

> 2018-02-27 21:40:47.132 +05 [57402] HINT:  either disable or enable checksums by calling the
> pg_data_checksums_enable()/disable() functions
> Function names are wrong in this hint: pg_enable_data_checksums()

Fixed.  The format of this hint (and errmsg) is actually also incorrect, which
I fixed while in there.

> The code is nice and clear. One minor spot in the comment
>> This option can only _be_ enabled when data checksums are enabled.

Fixed.

> Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implementing a utility for
> offline checksum enabling, without WAL-logging, surely.

While outside the scope of the patch in question (since it deals with enabling
checksums online), such a utility should be perfectly possible to write.

Thanks for reviewing!

cheers ./daniel


Вложения

Re: Online enabling of checksums

От
Robert Haas
Дата:
On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Also if that wasn't clear -- we only do the full page write if there isn't
> already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect.  Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

От
Alvaro Herrera
Дата:
I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name).  I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Tomas Vondra
Дата:
On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
> I noticed that pg_verify_checksum takes an "-o oid" argument to only
> check the relation with that OID; but that's wrong, because the number
> is a relfilenode, not an OID (since it's compared to the on-disk file
> name).  I would suggest changing everything to clarify that it's a
> pg_class.relfilenode value, otherwise it's going to be very confusing.
> Maybe use "-f filenode" if -f is available?
> 

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

  <varlistentry>
   <term><option>-o <replaceable>relfilenode</replaceable></option></term>
   <listitem>
    <para>
     Only validate checksums in the relation with specified relfilenode.
    </para>
   </listitem>
  </varlistentry>


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Daniel Gustafsson
Дата:
> On 01 Mar 2018, at 05:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
>> I noticed that pg_verify_checksum takes an "-o oid" argument to only
>> check the relation with that OID; but that's wrong, because the number
>> is a relfilenode, not an OID (since it's compared to the on-disk file
>> name).  I would suggest changing everything to clarify that it's a
>> pg_class.relfilenode value, otherwise it's going to be very confusing.
>> Maybe use "-f filenode" if -f is available?
>
> I'd argue this is merely a mistake in the --help text.

Agreed, the --help text isn’t really clear in this case and should be updated
to say something along the lines of:

printf(_("  -o relfilenode  check only relation with specified relfilenode\n"));

cheers ./daniel

Re: Online enabling of checksums

От
Craig Ringer
Дата:
On 1 March 2018 at 03:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name).  I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?


I see this mistake/misunderstanding enough that I'd quite like to change how we generate relfilenode IDs, making them totally independent of the oid space.

Unsure how practical it is, but it'd be so nice to get rid of that trap.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online enabling of checksums

От
Andrey Borodin
Дата:

> On 28 Feb 2018, at 22:06, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> Also if that wasn't clear -- we only do the full page write if there isn't
>> already a checksum on the page and that checksum is correct.
>
> Hmm.
>
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, like as to hint bits, and there
> the checksum is incorrect.  Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
>
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.

This seems like a 100% valid concern. If pages can be binary-different (on master and standby), we have to log the
act of checksumming a page.
And WAL replay would have to verify that its checksum is OK.
What is unclear to me is what the standby should do if it sees an incorrect checksum? Change its page? Request the
page from the master? Shut down?

Or should we just WAL-log page whenever it is checksummed by worker? Even if the checksum was correct?

Best regards, Andrey Borodin.

Re: Online enabling of checksums

От
Andrey Borodin
Дата:

> On 28 Feb 2018, at 6:22, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implementing a utility for
>> offline checksum enabling, without WAL-logging, surely.
>
> While outside the scope of the patch in question (since it deals with enabling
> checksums online), such a utility should be perfectly possible to write.

I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), the
patch works fine.
Do we provide backporting for such features?

Best regards, Andrey Borodin.

Re: Online enabling of checksums

От
Michael Paquier
Дата:
On Thu, Mar 01, 2018 at 12:56:35PM +0500, Andrey Borodin wrote:
> I've tried to rebase this patch to 10 and, despite minor rebase issues
> (oids, bgw_type, changes to specscanner), patch works fine.   Do we
> provide backporting for such features?

New features are not backported upstream.  Project policy is to fix only
bugs in the already-released branches, to keep them as stable as possible.
--
Michael

Вложения

Re: Online enabling of checksums

От
Andres Freund
Дата:
On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner),
> the patch works fine.
 
> Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

- Andres


Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), patch works fine.
> Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

Yeah. And definitely not something that both changes the format of pg_control (by adding new possible values to the checksum field) *and* adds a new WAL record type... 

--

Re: Online enabling of checksums

От
Andres Freund
Дата:
On 2018-03-01 16:18:48 +0100, Magnus Hagander wrote:
> On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> > > I've tried to rebase this patch to 10 and, despite minor rebase issues
> > (oids, bgw_type, changes to specscanner), patch works fine.
> > > Do we provide backporting for such features?
> >
> > Definitely not. With very rare exceptions (OS compatibility and the
> > like), features aren't backported.
> >
> 
> Yeah. And definitely not something that both changes the format of
> pg_control (by adding new possible values to the checksum field) *and* adds
> a new WAL record type...

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11.  That's pretty late.  Now, Magnus is a committer, but
nevertheless...

Greetings,

Andres Freund


Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Fri, Mar 2, 2018 at 8:44 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-03-01 16:18:48 +0100, Magnus Hagander wrote:
> On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
>
> > On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> > > I've tried to rebase this patch to 10 and, despite minor rebase issues
> > (oids, bgw_type, changes to specscanner), patch works fine.
> > > Do we provide backporting for such features?
> >
> > Definitely not. With very rare exceptions (OS compatibility and the
> > like), features aren't backported.
> >
>
> Yeah. And definitely not something that both changes the format of
> pg_control (by adding new possible values to the checksum field) *and* adds
> a new WAL record type...

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11.  That's pretty late.  Now, Magnus is a committer, but
nevertheless...

See, this is why I'm trying my hardest to avoid scope-creep in it at least :P  

--

Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
> I noticed that pg_verify_checksum takes an "-o oid" argument to only
> check the relation with that OID; but that's wrong, because the number
> is a relfilenode, not an OID (since it's compared to the on-disk file
> name).  I would suggest changing everything to clarify that it's a
> pg_class.relfilenode value, otherwise it's going to be very confusing.
> Maybe use "-f filenode" if -f is available?
>

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

  <varlistentry>
   <term><option>-o <replaceable>relfilenode</replaceable></option></term>
   <listitem>
    <para>
     Only validate checksums in the relation with specified relfilenode.
    </para>
   </listitem>
  </varlistentry>


Yeah, that one is my fault. It used to say oid all over but I noticed and fixed it. Except I clearly missed the --help. 

--

Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Also if that wasn't clear -- we only do the full page write if there isn't
> already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect.  Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.


Do we ever make hintbit changes on the standby for example? If so, it would definitely cause problems. I didn't realize we did, actually...

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does *not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full page write.
* Worker completes and flips checksums on which replicates. At this point, if the replica reads the page, boom.

I guess we have to remove that optimisation. It's definitely a bummer, but I don't think it's an absolute dealbreaker.
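The sequence above can be modeled with a toy page and a made-up checksum; everything here (page layout, checksum function, helper names) is invented for illustration, not PostgreSQL's real code, but it reproduces the failure mechanically:

```c
/* Toy model of the failure sequence above.  Not PostgreSQL code. */
typedef struct Page
{
    unsigned char data[64];
    unsigned int  checksum;
} Page;

/* Stand-in checksum; the real one is an FNV-based page checksum. */
static unsigned int
toy_checksum(const Page *p)
{
    unsigned int sum = 17;

    for (int i = 0; i < 64; i++)
        sum = sum * 31 + p->data[i];
    return sum;
}

/* The optimisation under discussion: only emit a full-page image
 * (modeled as overwriting the standby's copy) when the master's
 * checksum is missing or wrong. */
static void
enable_worker_visit(Page *master, Page *standby, int skip_if_valid)
{
    if (skip_if_valid && master->checksum == toy_checksum(master))
        return;                 /* assume the standby is fine too -- wrong! */
    master->checksum = toy_checksum(master);
    *standby = *master;         /* full-page write replicates */
}

static int
standby_checksum_ok(const Page *p)
{
    return p->checksum == toy_checksum(p);
}

/* Replays: checksums on -> off -> changes -> re-enabled. */
static int
scenario(int skip_if_valid)
{
    Page m = {{0}, 0}, s;

    m.checksum = toy_checksum(&m);  /* checksums on: both copies valid */
    s = m;

    /* checksums off: a WAL-logged change replays on both sides, but
     * neither side recomputes the checksum, so both go stale */
    m.data[1] = 7;
    s.data[1] = 7;

    /* checksums inprogress again: a hint-bit change on the master is
     * written with a fresh checksum, but is not WAL-logged */
    m.data[0] |= 1;
    m.checksum = toy_checksum(&m);

    enable_worker_visit(&m, &s, skip_if_valid);
    return standby_checksum_ok(&s);
}
```

With the skip-if-valid optimisation, scenario() leaves the standby holding a page whose checksum no longer matches its contents; with the unconditional full-page write it stays consistent.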

We could say that we keep the optimisation if wal_level=minimal for example, because then we know there is no replica. But I doubt that's worth it?

-- 

Re: Online enabling of checksums

От
Alvaro Herrera
Дата:
Magnus Hagander wrote:
> On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> > I'd argue this is merely a mistake in the --help text. Firstly,
> > relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
> > the SGML docs actually say:

> Yeah, that one is my fault. It used to say oid all over but I noticed and
> fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented.  But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter.  There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Fri, Mar 2, 2018 at 3:17 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Magnus Hagander wrote:
> On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> > I'd argue this is merely a mistake in the --help text. Firstly,
> > relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
> > the SGML docs actually say:

> Yeah, that one is my fault. It used to say oid all over but I noticed and
> fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented.  But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter.  There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

I have no problem with changing it to -r. -f seems a bit wrong to me, as it might read as a file. And in the future we might want to implement the ability to take full filename (with path), in which case it would make sense to use -f for that. 

--

Re: Online enabling of checksums

От
Tomas Vondra
Дата:

On 03/02/2018 03:22 PM, Magnus Hagander wrote:
> 
> 
> On Fri, Mar 2, 2018 at 3:17 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
>     Magnus Hagander wrote:
>     > On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
> 
>     > > I'd argue this is merely a mistake in the --help text. Firstly,
>     > > relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
>     > > the SGML docs actually say:
> 
>     > Yeah, that one is my fault. It used to say oid all over but I noticed and
>     > fixed it. Except I clearly missed the --help.
> 
>     Obviously option names are completely arbitrary -- you could say
>     "-P relfilenode" and it'd still be 'correct', since it works as
>     documented.  But we try to make these options mnemotechnic when we can,
>     and I don't see any relation between "-o" and "relfilenode", so I
>     suggest picking some other letter.  There's a whole alphabet out there.
> 
>     Either "-r" or "-f" works better for me than "-o".
> 
> 
> I have no problem with changing it to -r. -f seems a bit wrong to me, as
> it might read as a file. And in the future we might want to implement
> the ability to take full filename (with path), in which case it would
> make sense to use -f for that. 
> 

+1 to -r

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Robert Haas
Дата:
On Fri, Mar 2, 2018 at 8:35 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Do we ever make hintbit changes on the standby for example? If so, it would
> definitely cause problems. I didn't realize we did, actually...

We do not.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checksums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does *not*
> replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full page
> write.
> * Worker completes and flips checksums on which replicates. At this point,
> if the replica reads the page, boom.

Exactly.

> I guess we have to remove that optimisation. It's definitely a bummer, but I
> don't think it's an absolute dealbreaker.

I don't disagree.

> We could say that we keep the optimisation if wal_level=minimal for example,
> because then we know there is no replica. But I doubt that's worth it?

I don't have a strong feeling about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

От
Robert Haas
Дата:
On Fri, Mar 2, 2018 at 2:44 AM, Andres Freund <andres@anarazel.de> wrote:
> And even more so, I'm not even sure it makes sense to try to get this
> into v11. This is a medium-large complicated feature, submitted to the
> last CF for v11.  That's pretty late.  Now, Magnus is a committer, but
> nevertheless...

Yeah, I would also favor bumping this one out to a later release.  I
think there is a significant risk either that the design is flawed --
and as evidence, I offer that I found a flaw in it which I noticed
only because of a passing remark in an email, not because I opened the
patch -- or that the design boxes us into a corner such that it will
be hard to improve this later.  I think there is a good
chance that there are other serious problems with this patch, and to be
honest I don't really want to go try to find them right this minute; I
want to work on other patches that were submitted earlier and have
been waiting for a long time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

От
Tomas Vondra
Дата:

On 03/02/2018 02:35 PM, Magnus Hagander wrote:
> 
> 
> On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
>     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     <magnus@hagander.net> wrote:
>     > Also if that wasn't clear -- we only do the full page write if there isn't
>     > already a checksum on the page and that checksum is correct.
> 
>     Hmm.
> 
>     Suppose that on the master there is a checksum on the page and that
>     checksum is correct, but on the standby the page contents differ in
>     some way that we don't always WAL-log, like as to hint bits, and there
>     the checksum is incorrect.  Then you'll enable checksums when the
>     standby still has some pages without valid checksums, and disaster
>     will ensue.
> 
>     I think this could be hard to prevent if checksums are turned on and
>     off multiple times.
> 
> 
> Do we ever make hintbit changes on the standby for example? If so, it
> would definitely cause problems. I didn't realize we did, actually...
> 

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checksums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does
> *not* replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full
> page write.
> * Worker completes and flips checksums on which replicates. At this
> point, if the replica reads the page, boom.
> 

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

> I guess we have to remove that optimisation. It's definitely a
> bummer, but I don't think it's an absolute dealbreaker.
> 

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

    RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
                      CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not simply wait for a regular spread checkpoint, the way
pg_basebackup does it?
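For what it's worth, the distinction being argued here is a single flag bit in the request. A stub illustration (the flag values and the helper below are mine, mirroring the general idea rather than the actual definitions in xlog.h):

```c
/* Stub of the checkpoint-request flags under discussion; values and
 * behaviour are illustrative, not PostgreSQL's implementation. */
#define CHECKPOINT_IMMEDIATE 0x0001 /* run at full speed, heavy I/O spike */
#define CHECKPOINT_FORCE     0x0002 /* force one even if no WAL since last */
#define CHECKPOINT_WAIT      0x0004 /* block the caller until it completes */

/* Returns 1 if a request with these flags would be throttled, i.e.
 * spread out over checkpoint_completion_target, 0 if it would run
 * flat-out and hammer production I/O. */
static int
is_spread_checkpoint(int flags)
{
    return (flags & CHECKPOINT_IMMEDIATE) == 0;
}
```

Dropping CHECKPOINT_IMMEDIATE from the launcher's request, while keeping FORCE and WAIT, is all it takes to get the spread behaviour being asked for.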

> We could say that we keep the optimisation if wal_level=minimal for 
> example, because then we know there is no replica. But I doubt
> that's worth it?
> 

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Magnus Hagander
Дата:
On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 03/02/2018 02:35 PM, Magnus Hagander wrote:
>
>
> On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     <magnus@hagander.net> wrote:
>     > Also if that wasn't clear -- we only do the full page write if there isn't
>     > already a checksum on the page and that checksum is correct.
>
>     Hmm.
>
>     Suppose that on the master there is a checksum on the page and that
>     checksum is correct, but on the standby the page contents differ in
>     some way that we don't always WAL-log, like as to hint bits, and there
>     the checksum is incorrect.  Then you'll enable checksums when the
>     standby still has some pages without valid checksums, and disaster
>     will ensue.
>
>     I think this could be hard to prevent if checksums are turned on and
>     off multiple times.
>
>
> Do we ever make hintbit changes on the standby for example? If so, it
> would definitely cause problems. I didn't realize we did, actually...
>

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checksums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does
> *not* replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full
> page write.
> * Worker completes and flips checksums on which replicates. At this
> point, if the replica reads the page, boom.
>

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

No, in that case the master will issue full page writes for *all* pages, since they didn't have a checksum. The current patch only avoids doing that if the checksum on the master is correct, which it isn't when you start from checksums=off.  So this particular problem only shows up if you iterate between off/on/off multiple times.


 
> I guess we have to remove that optimisation. It's definitely a
> bummer, but I don't think it's an absolute dealbreaker.
>

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

Yes.

 
The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

This is very close to what it does now, except it does not wait for a checkpoint in #2. Why does it need that?


BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

    RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
                      CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not simply wait for a regular spread checkpoint, the way
pg_basebackup does it?

Actually, that was my original idea. I changed it for testing, and should go change it back.



> We could say that we keep the optimisation if wal_level=minimal for
> example, because then we know there is no replica. But I doubt
> that's worth it?
>

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.


Yeah, I doubt there are a lot of people using "minimal" these days, not since we changed the default.

//Magnus 

Re: Online enabling of checksums

От
Tomas Vondra
Дата:
On 03/02/2018 11:01 PM, Magnus Hagander wrote:
> On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> 
> 
> 
>     On 03/02/2018 02:35 PM, Magnus Hagander wrote:
>     >
>     >
>     > On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>
>     > <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>> wrote:
>     >
>     >     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     >     <magnus@hagander.net <mailto:magnus@hagander.net>
>     <mailto:magnus@hagander.net <mailto:magnus@hagander.net>>> wrote:
>     >     > Also if that wasn't clear -- we only do the full page write if there isn't
>     >     > already a checksum on the page and that checksum is correct.
>     >
>     >     Hmm.
>     >
>     >     Suppose that on the master there is a checksum on the page and that
>     >     checksum is correct, but on the standby the page contents differ in
>     >     some way that we don't always WAL-log, like as to hint bits, and there
>     >     the checksum is incorrect.  Then you'll enable checksums when the
>     >     standby still has some pages without valid checksums, and disaster
>     >     will ensue.
>     >
>     >     I think this could be hard to prevent if checksums are turned on and
>     >     off multiple times.
>     >
>     >
>     > Do we ever make hintbit changes on the standby for example? If so, it
>     > would definitely cause problems. I didn't realize we did, actually...
>     >
> 
>     I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
>     AFAIK that can't be true on a standby.
> 
>     > I guess we could get there even if we don't by:
>     > * All checksums are correct
>     > * Checkums are disabled (which replicates)
>     > * Non-WAL logged change on the master, which updates checksum but does
>     > *not* replicate
>     > * Checksums re-enabled
>     > * Worker sees the checksum as correct, and thus does not force a full
>     > page write.
>     > * Worker completes and flips checksums on which replicates. At this
>     > point, if the replica reads the page, boom.
>     >
> 
>     Maybe.
> 
>     My understanding of Robert's example is that you can start with an
>     instance that has wal_log_hints=off, and so pages on master/standby may
>     not be 100% identical. Then we do the online checksum thing, and the
>     standby may get pages with incorrect checksums.
> 
> 
> No, in that case the master will issue full page writes for *all* pages,
> since they didn't have a checksum. The current patch only avoids doing
> that if the checksum on the master is correct, which it isn't when you
> start from checksums=off.  So this particular problem only shows up if
> you iterate between off/on/off multiple times.
> 

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.

> 
>     > I guess we have to remove that optimisation. It's definitely a
>     > bummer, but I don't think it's an absolute dealbreaker.
>     >
> 
>     I agree it's not a deal-breaker. Or at least I don't see why it should
>     be - any other maintenance activity on the database (freezing etc.) will
>     also generate full-page writes.
> 
> 
> Yes.
> 
>  
> 
>     The good thing is the throttling also limits the amount of WAL, so it's
>     possible to prevent generating too many checkpoints etc.
> 
>     I suggest we simply:
> 
>     1) set the checksums to in-progress
>     2) wait for a checkpoint
>     3) use the regular logic for full-pages (i.e. first change after
>     checkpoint does a FPW)
> 
> 
> This is very close to what it does now, except it does not wait for a
> checkpoint in #2. Why does it need that?
> 

To guarantee that the page has a FPW with all the hint bits, before we
start messing with the checksums (or that setting the checksum itself
triggers a FPW).

> 
>     BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does
> 
>         RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
>                           CHECKPOINT_IMMEDIATE);
> 
>     I'm rather unhappy about that - immediate checkpoints have massive
>     impact on production systems, so we try not doing them (That's one of
>     the reasons why CREATE DATABASE is somewhat painful). It usually
>     requires a bit of thinking about when to do such commands. But in this
>     case it's unpredictable when exactly the checksumming completes, so it
>     may easily be in the middle of peak activity.
> 
>     Why not simply wait for a regular spread checkpoint, the way
>     pg_basebackup does it?
> 
> 
> Actually, that was my original idea. I changed it for testing, and should
> go change it back.
> 

OK

> 
> 
>     > We could say that we keep the optimisation if wal_level=minimal for
>     > example, because then we know there is no replica. But I doubt
>     > that's worth it?
>     >
> 
>     If it doesn't require a lot of code, why not? But I don't really see
>     much point in doing that.
> 
> 
> Yeah, I doubt there are a lot of people using "minimal" these days, not
> since we changed the default.
> 

Yeah. Although as I said, it depends on how much code would be needed to
enable that optimization (I guess not much). If someone is running with
wal_level=minimal intentionally, why not help them.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Robert Haas
Date:
On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hmmm, OK. So we need to have a valid checksum on a page, disable
> checksums, set some hint bits on the page (which won't be WAL-logged),
> enable checksums again and still get a valid checksum even with the new
> hint bits? That's possible, albeit unlikely.

No, the problem is if - as is much more likely - the checksum is not
still valid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Robert Haas
Date:
On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>> checksums, set some hint bits on the page (which won't be WAL-logged),
>> enable checksums again and still get a valid checksum even with the new
>> hint bits? That's possible, albeit unlikely.
>
> No, the problem is if - as is much more likely - the checksum is not
> still valid.

Hmm, on second thought ... maybe I didn't think this through carefully
enough.  If the checksum matches on the master by chance, and the page
is the same on the standby, then we're fine, right?  It's a weird
accident, but nothing is actually broken.  The failure scenario is
where the standby has a version of the page with a bad checksum, but
the master has a good checksum.  So for example: checksums disabled,
master modifies the page (which is replicated), master sets some hint
bits (coincidentally making the checksum match), now we try to turn
checksums on and don't re-replicate the page because the checksum
already looks correct.



Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 03/03/2018 01:38 PM, Robert Haas wrote:
> On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Hmmm, OK. So we need to have a valid checksum on a page, disable 
>>> checksums, set some hint bits on the page (which won't be
>>> WAL-logged), enable checksums again and still get a valid
>>> checksum even with the new hint bits? That's possible, albeit
>>> unlikely.
>>
>> No, the problem is if - as is much more likely - the checksum is
>> not still valid.
> 
> Hmm, on second thought ... maybe I didn't think this through
> carefully enough. If the checksum matches on the master by chance,
> and the page is the same on the standby, then we're fine, right? It's
> a weird accident, but nothing is actually broken. The failure
> scenario is where the standby has a version of the page with a bad
> checksum, but the master has a good checksum. So for example:
> checksums disabled, master modifies the page (which is replicated),
> master sets some hint bits (coincidentally making the checksum
> match), now we try to turn checksums on and don't re-replicate the
> page because the checksum already looks correct.
> 

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

  /*
   * If checksum was not set or was invalid, mark the buffer as dirty
   * and force a full page write. If the checksum was already valid, we
   * can leave it since we know that any other process writing the
   * buffer will update the checksum.
   */
  if (checksum != pagehdr->pd_checksum)
  {
      START_CRIT_SECTION();
      MarkBufferDirty(buf);
      log_newpage_buffer(buf, false);
      END_CRIT_SECTION();
  }

That would mean this optimization - only doing the write when the
checksum does not match - is broken.

If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.

regards



Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/03/2018 01:38 PM, Robert Haas wrote:
> On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>>> checksums, set some hint bits on the page (which won't be
>>> WAL-logged), enable checksums again and still get a valid
>>> checksum even with the new hint bits? That's possible, albeit
>>> unlikely.
>>
>> No, the problem is if - as is much more likely - the checksum is
>> not still valid.
>
> Hmm, on second thought ... maybe I didn't think this through
> carefully enough. If the checksum matches on the master by chance,
> and the page is the same on the standby, then we're fine, right? It's
> a weird accident, but nothing is actually broken. The failure
> scenario is where the standby has a version of the page with a bad
> checksum, but the master has a good checksum. So for example:
> checksums disabled, master modifies the page (which is replicated),
> master sets some hint bits (coincidentally making the checksum
> match), now we try to turn checksums on and don't re-replicate the
> page because the checksum already looks correct.
>

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

  /*
   * If checksum was not set or was invalid, mark the buffer as dirty
   * and force a full page write. If the checksum was already valid, we
   * can leave it since we know that any other process writing the
   * buffer will update the checksum.
   */
  if (checksum != pagehdr->pd_checksum)
  {
      START_CRIT_SECTION();
      MarkBufferDirty(buf);
      log_newpage_buffer(buf, false);
      END_CRIT_SECTION();
  }

That would mean this optimization - only doing the write when the
checksum does not match - is broken.



If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.


Yes, it definitely does. It's not a dealbreaker, but it's certainly a bit painful not to be able to resume as cheaply.

 
--

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 03/03/2018 05:08 PM, Magnus Hagander wrote:
> 
> 
> On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     On 03/03/2018 01:38 PM, Robert Haas wrote:
>     > On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
>     >> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>     >> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
>     >>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>     >>> checksums, set some hint bits on the page (which won't be
>     >>> WAL-logged), enable checksums again and still get a valid
>     >>> checksum even with the new hint bits? That's possible, albeit
>     >>> unlikely.
>     >>
>     >> No, the problem is if - as is much more likely - the checksum is
>     >> not still valid.
>     >
>     > Hmm, on second thought ... maybe I didn't think this through
>     > carefully enough. If the checksum matches on the master by chance,
>     > and the page is the same on the standby, then we're fine, right? It's
>     > a weird accident, but nothing is actually broken. The failure
>     > scenario is where the standby has a version of the page with a bad
>     > checksum, but the master has a good checksum. So for example:
>     > checksums disabled, master modifies the page (which is replicated),
>     > master sets some hint bits (coincidentally making the checksum
>     > match), now we try to turn checksums on and don't re-replicate the
>     > page because the checksum already looks correct.
>     >
> 
>     Yeah. Doesn't that pretty much mean we can't skip any pages that have
>     correct checksum, because we can't rely on standby having the same page
>     data? That is, this block in ProcessSingleRelationFork:
> 
>       /*
>        * If checksum was not set or was invalid, mark the buffer as dirty
>        * and force a full page write. If the checksum was already valid, we
>        * can leave it since we know that any other process writing the
>        * buffer will update the checksum.
>        */
>       if (checksum != pagehdr->pd_checksum)
>       {
>           START_CRIT_SECTION();
>           MarkBufferDirty(buf);
>           log_newpage_buffer(buf, false);
>           END_CRIT_SECTION();
>       }
> 
>     That would mean this optimization - only doing the write when the
>     checksum does not match - is broken.
> 
> 
> Yes. I think that was the conclusion of this, as posted
> in https://www.postgresql.org/message-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu%3DaDpGBT8Sk0RQE_g%40mail.gmail.com
> :)
> 

Oh, right. I did have a "deja vu" feeling, when writing that. Good that
I came to the same conclusion, though.

> 
>     If that's the case, it probably makes restarts/resume more expensive,
>     because this optimization was why after restart the already processed
>     data was only read (and the checksums verified) but not written.
> 
> 
> Yes, it definitely does. It's not a dealbreaker, but it's certainly
> a bit painful not to be able to resume as cheap.
> 

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.

regards



Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Mar 3, 2018 at 5:17 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 03/03/2018 05:08 PM, Magnus Hagander wrote:
>
>
> On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
>     On 03/03/2018 01:38 PM, Robert Haas wrote:
>     > On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
>     >> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>     >> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
>     >>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>     >>> checksums, set some hint bits on the page (which won't be
>     >>> WAL-logged), enable checksums again and still get a valid
>     >>> checksum even with the new hint bits? That's possible, albeit
>     >>> unlikely.
>     >>
>     >> No, the problem is if - as is much more likely - the checksum is
>     >> not still valid.
>     >
>     > Hmm, on second thought ... maybe I didn't think this through
>     > carefully enough. If the checksum matches on the master by chance,
>     > and the page is the same on the standby, then we're fine, right? It's
>     > a weird accident, but nothing is actually broken. The failure
>     > scenario is where the standby has a version of the page with a bad
>     > checksum, but the master has a good checksum. So for example:
>     > checksums disabled, master modifies the page (which is replicated),
>     > master sets some hint bits (coincidentally making the checksum
>     > match), now we try to turn checksums on and don't re-replicate the
>     > page because the checksum already looks correct.
>     >
>
>     Yeah. Doesn't that pretty much mean we can't skip any pages that have
>     correct checksum, because we can't rely on standby having the same page
>     data? That is, this block in ProcessSingleRelationFork:
>
>       /*
>        * If checksum was not set or was invalid, mark the buffer as dirty
>        * and force a full page write. If the checksum was already valid, we
>        * can leave it since we know that any other process writing the
>        * buffer will update the checksum.
>        */
>       if (checksum != pagehdr->pd_checksum)
>       {
>           START_CRIT_SECTION();
>           MarkBufferDirty(buf);
>           log_newpage_buffer(buf, false);
>           END_CRIT_SECTION();
>       }
>
>     That would mean this optimization - only doing the write when the
>     checksum does not match - is broken.
>
>
> Yes. I think that was the conclusion of this, as posted
> in https://www.postgresql.org/message-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu%3DaDpGBT8Sk0RQE_g%40mail.gmail.com
> :)
>

Oh, right. I did have a "deja vu" feeling, when writing that. Good that
I came to the same conclusion, though.

>
>     If that's the case, it probably makes restarts/resume more expensive,
>     because this optimization was why after restart the already processed
>     data was only read (and the checksums verified) but not written.
>
>
> Yes, it definitely does. It's not a dealbreaker, but it's certainly
> a bit painful not to be able to resume as cheap.
>

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.


Attached is a rebased patch which removes this optimization, updates the pg_proc entry for the new format, and changes pg_verify_checksums to use -r instead of -o for relfilenode.

--
Attachments

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote:
> diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
> new file mode 100644
> index 0000000000..a4bfe7284d
> --- /dev/null
> +++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
> @@ -0,0 +1,308 @@
[...]

> +/*
> + * pg_verify_checksums
> + *
> + * Verifies page level checksums in an offline cluster
> + *
> + *    Copyright (c) 2010-2018, PostgreSQL Global Development Group

Weird copyright statement for a new file; did you base it off another
one, or just copy-paste the boilerplate?

[...]

> +        csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
> +        if (csum != header->pd_checksum)
> +        {
> +            if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
> +                fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
> +                        progname, fn, blockno, header->pd_checksum, csum);

The error message sounds a bit strange to me; I would expect the
filename after "in file [...]", but you print the expected checksum.
Also, 'invalid' sounds a bit like we found something malformed rather
than a mismatched checksum, so maybe "checksum mismatch in file,
expected %X, found %X" or something?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 04 Mar 2018, at 15:24, Michael Banck <michael.banck@credativ.de> wrote:

>> +        csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
>> +        if (csum != header->pd_checksum)
>> +        {
>> +            if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
>> +                fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
>> +                        progname, fn, blockno, header->pd_checksum, csum);
>
> The error message sounds a bit strange to me, I would expect the
> filename after "in file [...]", but you print the expected checksum.
> Also, 'invalid' sounds  a bit like we found something which is malformed
> checksum (no hex), so maybe "checksum mismatch in file, expected %X,
> found %X" or something?

Agreed.  Looking at our current error messages, “in file” is conventionally
followed by the filename.  I do however think “calculated” is better than
“expected” since it conveys clearly that the compared checksum is calculated by
pg_verify_checksum and not read from somewhere.

How about something like this?

_("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
    progname, fn, blockno, csum, header->pd_checksum);

cheers ./daniel

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:
> Agreed.  Looking at our current error messages, “in file” is conventionally
> followed by the filename.  I do however think “calculated” is better than
> “expected” since it conveys clearly that the compared checksum is calculated by
> pg_verify_checksum and not read from somewhere.
> 
> How about something like this?
> 
> _("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
>     progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" one (from the page contents) against it.

I had the same conversation with an external tool author, by the way:

https://github.com/uptimejp/postgres-toolkit/issues/48



Michael



Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:
> Agreed.  Looking at our current error messages, “in file” is conventionally
> followed by the filename.  I do however think “calculated” is better than
> “expected” since it conveys clearly that the compared checksum is calculated by
> pg_verify_checksum and not read from somewhere.
>
> How about something like this?
>
> _("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
>       progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" (in the page itself) against it.

I had the same conversation with an external tool author, by the way:

Maybe we should just say "on disk" for the one that's on disk; would that clear up the confusion? So "calculated %X, found %X on disk"? 

--

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Mon, Mar 05, 2018 at 11:09:02AM +0100, Magnus Hagander wrote:
> On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck <michael.banck@credativ.de>
> wrote:
> > I still find that confusing, but maybe it's just me. I thought the one
> > in the pageheader is the "expected" checksum, and we compare the "found"
> > or "computed/calculated" (in the page itself) against it.
> >
> > I had the same conversation with an external tool author, by the way:
> 
> Maybe we should just say "on disk" for the one that's on disk, would that
> break the confusion? So "calculated %X, found %X on disk"?

I found that there is a precedent in bufpage.c:

|        ereport(WARNING,
|                (ERRCODE_DATA_CORRUPTED,
|                 errmsg("page verification failed, calculated checksum %u but expected %u",
|                        checksum, p->pd_checksum)));

apart from the fact that it doesn't print out the hex value (which I
find strange), it sounds like a sensible message to me. But "found %X on
disk" would work as well I guess.


Michael



Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

I had a closer look at v3 of the patch now.

On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote:
> Attached is a rebased patch which removes this optimization, updates the
> pg_proc entry for the new format, and changes pg_verify_checksums to use -r
> instead of -o for relfilenode.
 
The patch applies fine with minimal fuzz and compiles with no warnings;
make check and the added isolation tests, as well as the added checksum
tests pass.

If I blindly run "SELECT pg_enable_data_checksums();" on new cluster, I
get:

|postgres=# SELECT pg_enable_data_checksums();
| pg_enable_data_checksums 
|--------------------------
| t
|(1 row)
|
|postgres=# SHOW data_checksums ;
| data_checksums 
|----------------
| inprogress
|(1 row)


However, inspecting the log one sees:

|2018-03-10 14:15:57.702 CET [3313] ERROR:  Database template0 does not allow connections.
|2018-03-10 14:15:57.702 CET [3313] HINT:  Allow connections using ALTER DATABASE and try again.
|2018-03-10 14:15:57.702 CET [3152] LOG:  background worker "checksum helper launcher" (PID 3313) exited with exit code 1

and the background worker is no longer running without any obvious hint
to the client.

I am aware that this is discussed already, but as-is the user experience
is pretty bad; I think pg_enable_data_checksums() should either bail out
earlier if it cannot connect to all databases, or this behaviour should
be better documented.

Otherwise, if I allow connections to template0, the patch works as
expected, I have not managed to break it so far.

Some further review comments:

> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
> index f4bc2d4161..89afecb341 100644
> --- a/doc/src/sgml/wal.sgml
> +++ b/doc/src/sgml/wal.sgml
> @@ -230,6 +230,73 @@
[...]
> +   <para>
> +    Checksums can be enabled or disabled online, by calling the appropriate
> +    <link linkend="functions-admin-checksum">functions</link>.
> +    Disabling of checksums takes effect immediately when the function is called.
> +   </para>
> +
> +   <para>
> +    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
> +    During this time, checksums will be written but not verified. In addition to
> +    this, a background worker process is started that enables checksums on all
> +    existing data in the cluster. Once this worker has completed processing all
> +    databases in the cluster, the checksum mode will automatically switch to
> +    <literal>on</literal>.
> +   </para>
> +
> +   <para>
> +    If the cluster is stopped while in <literal>inprogress</literal> mode, for
> +    any reason, then this process must be restarted manually. To do this,
> +    re-execute the function <function>pg_enable_data_checksums()</function>
> +    once the cluster has been restarted.
> +   </para>

Maybe document the above issue here, unless it is clear that the
template0-needs-to-allow-connections issue will go away before the patch
is pushed.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 47a6c4d895..56aaa88de1 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c

[...]

> +void
> +SetDataChecksumsOn(void)
> +{
> +    if (!DataChecksumsEnabledOrInProgress())
> +        elog(ERROR, "Checksums not enabled or in progress");
> +
> +    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +
> +    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
> +    {
> +        LWLockRelease(ControlFileLock);
> +        elog(ERROR, "Checksums not in in_progress mode");

The string used in "SHOW data_checksums" is "inprogress", not
"in_progress".

[...]

> @@ -7769,6 +7839,16 @@ StartupXLOG(void)
>      CompleteCommitTsInitialization();
>  
>      /*
> +     * If we reach this point with checksums in inprogress state, we notify
> +     * the user that they need to manually restart the process to enable
> +     * checksums.
> +     */
> +    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
> +        ereport(WARNING,
> +                (errmsg("checksum state is \"in progress\" with no worker"),
> +                 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or
pg_enable_data_checksums()functions.")));
 

Again, string is "inprogress", not "in progress", not sure if that
matters.

> diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
> new file mode 100644
> index 0000000000..6aa71bcf30
> --- /dev/null
> +++ b/src/backend/postmaster/checksumhelper.c
> @@ -0,0 +1,631 @@
> +/*-------------------------------------------------------------------------
> + *
> + * checksumhelper.c
> + *      Backend worker to walk the database and write checksums to pages

Backend or Background?

[...]

> +typedef struct ChecksumHelperShmemStruct
> +{
> +    pg_atomic_flag launcher_started;
> +    bool        success;
> +    bool        process_shared_catalogs;
> +    /* Parameter values  set on start */

double space.

[...]

> +static bool
> +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
> +{
> +    BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
> +    BlockNumber b;
> +
> +    for (b = 0; b < numblocks; b++)
> +    {
> +        Buffer        buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
> +        Page        page;
> +        PageHeader    pagehdr;
> +        uint16        checksum;
> +
> +        /* Need to get an exclusive lock before we can flag as dirty */
> +        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
> +
> +        /* Do we already have a valid checksum? */
> +        page = BufferGetPage(buf);
> +        pagehdr = (PageHeader) page;
> +        checksum = pg_checksum_page((char *) page, b);

This looks like it does not take the segment number into account;
however it is also unclear to me what the purpose of this is, as
checksum is never validated against the pagehdr, and nothing is done
with it. Indeed, I even get a compiler warning about pagehdr and
checksum:

git/postgresql/build/../src/backend/postmaster/checksumhelper.c:
In function ‘ProcessSingleRelationFork’:
git/postgresql/build/../src/backend/postmaster/checksumhelper.c:155:11:
warning: variable ‘checksum’ set but not used
[-Wunused-but-set-variable]
   uint16  checksum;
           ^~~~~~~~
git/postgresql/build/../src/backend/postmaster/checksumhelper.c:154:14:
warning: variable ‘pagehdr’ set but not used [-Wunused-but-set-variable]
   PageHeader pagehdr;
              ^~~~~~~

I guess the above block running pg_checksum_page() is a leftover from
previous versions of the patch and should be removed... 

> +        /*
> +         * Mark the buffer as dirty and force a full page write.
> +         * We have to re-write the page to wal even if the checksum hasn't
> +         * changed, because if there is a replica it might have a slightly
> +         * different version of the page with an invalid checksum, caused
> +         * by unlogged changes (e.g. hintbits) on the master happening while
> +         * checksums were off. This can happen if there was a valid checksum
> +         * on the page at one point in the past, so only when checksums
> +         * are first on, then off, and then turned on again.
> +         */
> +        START_CRIT_SECTION();
> +        MarkBufferDirty(buf);
> +        log_newpage_buffer(buf, false);
> +        END_CRIT_SECTION();

... seeing how MarkBufferDirty(buf) is now run unconditionally.

[...]

> +    /*
> +     * Initialize a connection to shared catalogs only.
> +     */
> +    BackgroundWorkerInitializeConnection(NULL, NULL);
> +
> +    /*
> +     * Set up so first run processes shared catalogs, but not once in every db
> +     */

Comment should have a full stop like the above and below ones?

> +    ChecksumHelperShmem->process_shared_catalogs = true;
> +
> +    /*
> +     * Create a database list.  We don't need to concern ourselves with
> +     * rebuilding this list during runtime since any new created database will
> +     * be running with checksums turned on from the start.
> +     */

[...]

> +    DatabaseList = BuildDatabaseList();
> +
> +    /*
> +     * If there are no databases at all to checksum, we can exit immediately
> +     * as there is no work to do.
> +     */
> +    if (DatabaseList == NIL || list_length(DatabaseList) == 0)
> +        return;
> +
> +    while (true)
> +    {
> +        List       *remaining = NIL;
> +        ListCell   *lc,
> +                   *lc2;
> +        List       *CurrentDatabases = NIL;
> +
> +        foreach(lc, DatabaseList)
> +        {
> +            ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
> +
> +            if (ProcessDatabase(db))
> +            {
> +                pfree(db->dbname);
> +                pfree(db);
> +
> +                if (ChecksumHelperShmem->process_shared_catalogs)
> +
> +                    /*
> +                     * Now that one database has completed shared catalogs, we
> +                     * don't have to process them again .

Stray space.

> +                     */
> +                    ChecksumHelperShmem->process_shared_catalogs = false;
> +            }
> +            else
> +            {
> +                /*
> +                 * Put failed databases on the remaining list.
> +                 */
> +                remaining = lappend(remaining, db);
> +            }
> +        }
> +        list_free(DatabaseList);
> +
> +        DatabaseList = remaining;
> +        remaining = NIL;
> +
> +        /*
> +         * DatabaseList now has all databases not yet processed. This can be
> +         * because they failed for some reason, or because the database was
> +         * DROPed between us getting the database list and trying to process

DROPed looks wrong, and there's no other occurence of it in the source
tree. DROPped looks even weirder, so maybe just "dropped"?

> +         * it. Get a fresh list of databases to detect the second case with.

That sentence looks unfinished or at least is unclear to me.

> +         * Any database that still exists but failed we retry for a limited

I'm not a native speaker, but this looks wrong to me as well, maybe "We
retry any database that still exists but failed for a limited [...]"?

> diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
> index 0fe98a550e..4bb2b7e6ec 100644
> --- a/src/bin/pg_upgrade/controldata.c
> +++ b/src/bin/pg_upgrade/controldata.c
> @@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
>       */
>  
>      /*
> +     * If checksums have been turned on in the old cluster, but the
> +     * checksumhelper have yet to finish, then disallow upgrading. The user
> +     * should either let the process finish, or turn off checksums, before
> +     * retrying.
> +     */
> +    if (oldctrl->data_checksum_version == 2)
> +        pg_fatal("transition to data checksums not completed in old cluster\n");
> +
> +    /*
>       * We might eventually allow upgrades from checksum to no-checksum
>       * clusters.
>       */

See below about src/bin/pg_upgrade/pg_upgrade.h having
data_checksum_version be a bool. 

I checked pg_upgrade (from master to master though), and could find no
off-hand issues, i.e. it reported all issues correctly.

> --- /dev/null
> +++ b/src/bin/pg_verify_checksums/Makefile

[...]

> +check:
> +       $(prove_check)
> +
> +installcheck:
> +       $(prove_installcheck)

If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
error:

|src/bin/pg_verify_checksums$ LANG=C make check
|rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install
|/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log
|make -C '../../..' DESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install install >'/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log/install.log 2>&1
|rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check
|/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check
|cd /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums && TESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums' PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//bin:$PATH" LD_LIBRARY_PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//lib" PGPORT='65432' PG_REGRESS='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/test/perl/ -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums t/*.pl
|Cannot detect source of 't/*.pl'! at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 261.
|    TAP::Parser::IteratorFactory::detect_source(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 211
|    TAP::Parser::IteratorFactory::make_iterator(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser.pm line 472
|    TAP::Parser::_initialize(TAP::Parser=HASH(0x55eed10df328), HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 55
|    TAP::Object::new("TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 130
|    TAP::Object::_construct(TAP::Harness=HASH(0x55eed09176b0), "TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Harness.pm line 852
|    TAP::Harness::make_parser(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Scheduler::Job=HASH(0x55eed0fdc708)) called at /usr/share/perl/5.24/TAP/Harness.pm line 651
|    TAP::Harness::_aggregate_single(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), TAP::Parser::Scheduler=HASH(0x55eed0fdc6a8)) called at /usr/share/perl/5.24/TAP/Harness.pm line 743
|    TAP::Harness::aggregate_tests(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), "t/*.pl") called at /usr/share/perl/5.24/TAP/Harness.pm line 558
|    TAP::Harness::__ANON__() called at /usr/share/perl/5.24/TAP/Harness.pm line 571
|    TAP::Harness::runtests(TAP::Harness=HASH(0x55eed09176b0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 546
|    App::Prove::_runtests(App::Prove=HASH(0x55eed090b0c8), HASH(0x55eed0d79cf0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 504
|    App::Prove::run(App::Prove=HASH(0x55eed090b0c8)) called at /usr/bin/prove line 13
|Makefile:39: recipe for target 'check' failed
|make: *** [check] Error 2


> diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
> new file mode 100644
> index 0000000000..a4bfe7284d
> --- /dev/null
> +++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c

[...]

> +    if (DataDir == NULL)
> +    {
> +        if (optind < argc)
> +            DataDir = argv[optind++];
> +        else
> +            DataDir = getenv("PGDATA");
> +    }
> +
> +    /* Complain if any arguments remain */
> +    if (optind < argc)
> +    {
> +        fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
> +                progname, argv[optind]);
> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
> +                progname);
> +        exit(1);
> +    }
> +
> +    if (DataDir == NULL)
> +    {
> +        fprintf(stderr, _("%s: no data directory specified\n"), progname);
> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
> +        exit(1);
> +    }

Those two if (DataDir == NULL) checks could maybe be put together into
one block.

> diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
> new file mode 100644
> index 0000000000..7f296264a9
> --- /dev/null
> +++ b/src/include/postmaster/checksumhelper.h
> @@ -0,0 +1,31 @@
> +/*-------------------------------------------------------------------------
> + *
> + * checksumhelper.h
> + *      header file for checksum helper deamon

"deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
worker" is better?

> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
> index 85dd10c45a..bd46bf2ce6 100644
> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>   */
>  #define PG_PAGE_LAYOUT_VERSION        4
>  #define PG_DATA_CHECKSUM_VERSION    1
> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2

I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
1, but I assumed it was a version, like, if we ever decide to use a
different checksumming algorithm, we'd bump it to 2.

Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
is convenient, but is there some strategy what to do about this in case
the PG_DATA_CHECKSUM_VERSION needs to be increased?

In any case, src/bin/pg_upgrade/pg_upgrade.h has

        bool            data_checksum_version;

in the ControlData struct, which might need updating?


That's all for now.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:

> I had a closer look at v3 of the patch now.

Thanks, much appreciated!  Sorry for the late response, just came back from a
conference and have had little time for hacking.

All whitespace, punctuation and capitalization comments have been addressed
with your recommendations, so I took the liberty to trim them from the
response.

> I am aware that this is discussed already, but as-is the user experience
> is pretty bad, I think pg_enable_data_checksums() should either bail
> earlier if it cannot connect to all databases, or it should be better
> documented.

Personally I think we should first attempt to solve the "allow-connections in
background workers” issue which has been raised on another thread.  For now I’m
documenting this better.

>> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
>> index f4bc2d4161..89afecb341 100644
>> --- a/doc/src/sgml/wal.sgml
>> +++ b/doc/src/sgml/wal.sgml
>> @@ -230,6 +230,73 @@
> [...]
>> +   <para>
>> +    Checksums can be enabled or disabled online, by calling the appropriate
>> +    <link linkend="functions-admin-checksum">functions</link>.
>> +    Disabling of checksums takes effect immediately when the function is called.
>> +   </para>
>> +
>> +   <para>
>> +    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
>> +    During this time, checksums will be written but not verified. In addition to
>> +    this, a background worker process is started that enables checksums on all
>> +    existing data in the cluster. Once this worker has completed processing all
>> +    databases in the cluster, the checksum mode will automatically switch to
>> +    <literal>on</literal>.
>> +   </para>
>> +
>> +   <para>
>> +    If the cluster is stopped while in <literal>inprogress</literal> mode, for
>> +    any reason, then this process must be restarted manually. To do this,
>> +    re-execute the function <function>pg_enable_data_checksums()</function>
>> +    once the cluster has been restarted.
>> +   </para>
>
> Maybe document the above issue here, unless it is clear that the
> templat0-needs-to-allow-connections issue will go away before the patch
> is pushed.

I have added a paragraph on allowing connections here, as well as a note that
template0 will need to be handled.

>> +void
>> +SetDataChecksumsOn(void)
>> +{
>> +    if (!DataChecksumsEnabledOrInProgress())
>> +        elog(ERROR, "Checksums not enabled or in progress");
>> +
>> +    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>> +
>> +    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>> +    {
>> +        LWLockRelease(ControlFileLock);
>> +        elog(ERROR, "Checksums not in in_progress mode");
>
> The string used in "SHOW data_checksums" is "inprogress", not
> "in_progress”.

Fixed.

>> @@ -7769,6 +7839,16 @@ StartupXLOG(void)
>>     CompleteCommitTsInitialization();
>>
>>     /*
>> +     * If we reach this point with checksums in inprogress state, we notify
>> +     * the user that they need to manually restart the process to enable
>> +     * checksums.
>> +     */
>> +    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>> +        ereport(WARNING,
>> +                (errmsg("checksum state is \"in progress\" with no worker"),
>> +                 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or
pg_enable_data_checksums()functions."))); 
>
> Again, string is "inprogress", not "in progress", not sure if that
> matters.

I think it does; we need to be consistent in user-facing naming.  Updated.

>> diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
>> new file mode 100644
>> index 0000000000..6aa71bcf30
>> --- /dev/null
>> +++ b/src/backend/postmaster/checksumhelper.c
>> @@ -0,0 +1,631 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * checksumhelper.c
>> + *      Backend worker to walk the database and write checksums to pages
>
> Backend or Background?

“Background” is the right term here, fixed.

>> +static bool
>> +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
>> +{
>> +    BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
>> +    BlockNumber b;
>> +
>> +    for (b = 0; b < numblocks; b++)
>> +    {
>> +        Buffer        buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
>> +        Page        page;
>> +        PageHeader    pagehdr;
>> +        uint16        checksum;
>> +
>> +        /* Need to get an exclusive lock before we can flag as dirty */
>> +        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
>> +
>> +        /* Do we already have a valid checksum? */
>> +        page = BufferGetPage(buf);
>> +        pagehdr = (PageHeader) page;
>> +        checksum = pg_checksum_page((char *) page, b);
>
> This looks like it does not take the segment number into account;
> however it is also unclear to me what the purpose of this is, as
> checksum is never validated against the pagehdr, and nothing is done
> with it.

> I guess the above block running pg_checksum_page() is a leftover from
> previous versions of the patch and should be removed…

Correct, the checksum and pagehdr were used in the previous optimization to skip
forced writes where the checksum was valid.  That optimization was however
based on a faulty assumption and was removed in the v3 patch.  The leftover
variables are now removed.

>> +         * it. Get a fresh list of databases to detect the second case with.
>
> That sentence looks unfinished or at least is unclear to me.

Updated to indicate what we mean by “second case”.

>> +         * Any database that still exists but failed we retry for a limited
>
> I'm not a native speaker, but this looks wrong to me as well, maybe "We
> retry any database that still exists but failed for a limited [...]”?

Updated and extended a bit for clarity.

Fixing this review comment also made me realize that checksumhelper.c was using
%s for outputting the database name in errmsg(), but \”%s\” is what we commonly
use.  Updated all errmsg() invocations to quote the database name.

>> diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
>> index 0fe98a550e..4bb2b7e6ec 100644
>> --- a/src/bin/pg_upgrade/controldata.c
>> +++ b/src/bin/pg_upgrade/controldata.c
>> @@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
>>      */
>>
>>     /*
>> +     * If checksums have been turned on in the old cluster, but the
>> +     * checksumhelper have yet to finish, then disallow upgrading. The user
>> +     * should either let the process finish, or turn off checksums, before
>> +     * retrying.
>> +     */
>> +    if (oldctrl->data_checksum_version == 2)
>> +        pg_fatal("transition to data checksums not completed in old cluster\n");
>> +
>> +    /*
>>      * We might eventually allow upgrades from checksum to no-checksum
>>      * clusters.
>>      */
>
> See below about src/bin/pg_upgrade/pg_upgrade.h having
> data_checksum_version be a bool.
>
> I checked pg_upgrade (from master to master though), and could find no
> off-hand issues, i.e. it reported all issues correctly.

data_checksum_version is indeed defined bool, but rather than being an actual
bool it’s a bool backed by a char via a typedef in c.h.  This is why it works
to assign 0, 1 or 2 without issues.

That being said, I agree that it reads odd now that checksum version isn’t just
0 or 1 which mentally translate neatly to false or true.  Since this is the
patch introducing version 2, I changed data_checksum_version to a uint32 as
that makes the intent much clearer.

>> +check:
>> +       $(prove_check)
>> +
>> +installcheck:
>> +       $(prove_installcheck)
>
> If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
> error:

That's because the check and installcheck targets are copy-pasteos; there are
no tests for pg_verify_checksums currently, so the targets should not be in the
Makefile.  Fixed.

>> +    if (DataDir == NULL)
>> +    {
>> +        if (optind < argc)
>> +            DataDir = argv[optind++];
>> +        else
>> +            DataDir = getenv("PGDATA");
>> +    }
>> +
>> +    /* Complain if any arguments remain */
>> +    if (optind < argc)
>> +    {
>> +        fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
>> +                progname, argv[optind]);
>> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> +                progname);
>> +        exit(1);
>> +    }
>> +
>> +    if (DataDir == NULL)
>> +    {
>> +        fprintf(stderr, _("%s: no data directory specified\n"), progname);
>> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
>> +        exit(1);
>> +    }
>
> Those two if (DataDir == NULL) checks could maybe be put together into
> one block.

Moved the check into the first block, as it makes the code clearer and doesn’t
change the order in which the error messages for missing datadir and
too-many-arguments will be output.
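For illustration, the folded-together resolution could look roughly like the standalone sketch below. It mirrors the names in the patch, but `resolve_datadir` is a made-up helper and this is not the actual pg_verify_checksums code; the point is just that a NULL return leaves a single place to emit the "no data directory specified" error.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustrative sketch only: resolve the data directory from the first
 * remaining command-line argument, falling back to the PGDATA
 * environment variable.  A NULL return means no data directory was
 * specified anywhere, so the caller can emit the error in one place.
 */
static char *
resolve_datadir(int argc, char **argv, int *optind)
{
	char	   *DataDir = NULL;

	if (*optind < argc)
		DataDir = argv[(*optind)++];	/* consume the positional argument */
	else
		DataDir = getenv("PGDATA");		/* may still be NULL */

	return DataDir;
}
```

Any arguments remaining after this call would then trigger the existing "too many command-line arguments" complaint, keeping the message order unchanged.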

>> diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
>> new file mode 100644
>> index 0000000000..7f296264a9
>> --- /dev/null
>> +++ b/src/include/postmaster/checksumhelper.h
>> @@ -0,0 +1,31 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * checksumhelper.h
>> + *      header file for checksum helper deamon
>
> "deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
> worker" is better?

Yes, "background worker” is better, updated.

>> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
>> index 85dd10c45a..bd46bf2ce6 100644
>> --- a/src/include/storage/bufpage.h
>> +++ b/src/include/storage/bufpage.h
>> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>>  */
>> #define PG_PAGE_LAYOUT_VERSION        4
>> #define PG_DATA_CHECKSUM_VERSION    1
>> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2
>
> I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
> 1, but I assumed it was a version, like, if we ever decide to use a
> different checksumming algorithm, we'd bump it to 2.
>
> Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
> is convenient, but is there some strategy what to do about this in case
> the PG_DATA_CHECKSUM_VERSION needs to be increased?

I don’t think this has been discussed in any thread dealing with enabling
checksums in an online cluster, where using the version was mentioned (or my
archive search is failing me).  If another algorithm was added, I assume we’d
need something like the proposed checksumhelper to allow migrations from
version 1.  Just as we now use version 2 to indicate that we are going from
version 0 to version 1, the same can be used for going from version 1 to 3 as
long as there is a source and destination version recorded.

I don’t know what such a process would look like, but I don’t see that we are
blocking a future new checksum algorithm by using version 2 for inprogress
(although I agree that using the version here is closer to convenient than
elegant).
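To make the versioning question concrete, the values under discussion are sketched below. Note that the "off" define and the helper function are illustrative only and not part of the patch (today's code simply stores 0 for disabled); the strings match the off/inprogress/on states used by the patch.

```c
#include <stdint.h>

/*
 * Sketch of the data_checksum_version values discussed above.  0 and 1
 * keep their existing meanings; 2 is the proposed "inprogress" marker.
 * PG_DATA_CHECKSUM_OFF_VERSION and data_checksum_state() are purely
 * illustrative, not part of the patch.
 */
#define PG_DATA_CHECKSUM_OFF_VERSION		0
#define PG_DATA_CHECKSUM_VERSION			1
#define PG_DATA_CHECKSUM_INPROGRESS_VERSION	2

/* Map a control-file version to a user-facing state name. */
static const char *
data_checksum_state(uint32_t version)
{
	switch (version)
	{
		case PG_DATA_CHECKSUM_OFF_VERSION:
			return "off";
		case PG_DATA_CHECKSUM_VERSION:
			return "on";
		case PG_DATA_CHECKSUM_INPROGRESS_VERSION:
			return "inprogress";
		default:
			return "unknown";
	}
}
```

A future algorithm bump would presumably add a new "on" value (say 3) plus another in-progress marker, which is where recording source and destination versions becomes necessary.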

Attached is v4 of this patch, which addresses this review; I’m flipping the
status in the CF app back to Needs Review.

cheers ./daniel


Attachments

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
> > On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:
> > I am aware that this is discussed already, but as-is the user experience
> > is pretty bad, I think pg_enable_data_checksums() should either bail
> > earlier if it cannot connect to all databases, or it should be better
> > documented.
> 
> Personally I think we should first attempt to solve the "allow-connections in
> background workers” issue which has been raised on another thread.  For now I’m
> documenting this better.

I had a look at that thread and it seems stalled; I am a bit worried
that this will not be solved before the end of the CF.

So I think unless the above gets solved, pg_enable_data_checksums()
should error out with the hint.  I've had a quick look and it seems one
can partly duplicate the check from BuildDatabaseList() (or optionally
move it) to the beginning of StartChecksumHelperLauncher(), see
attached.
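The shape of that check can be modeled with a self-contained sketch. The real patch scans pg_database through the catalog access machinery; the struct and function names below are made up for illustration and only capture the decision being made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for a pg_database row; not a real catalog struct. */
typedef struct
{
	const char *datname;
	bool		datallowconn;
} DatabaseEntry;

/*
 * Return the name of the first database that refuses connections, or
 * NULL if all of them can be connected to.  The launcher would raise
 * an ERROR with a hint when this returns non-NULL, so the user sees
 * the failure immediately instead of in the server log.
 */
static const char *
first_unconnectable_database(const DatabaseEntry *dbs, size_t ndbs)
{
	for (size_t i = 0; i < ndbs; i++)
	{
		if (!dbs[i].datallowconn)
			return dbs[i].datname;
	}
	return NULL;
}
```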

That results in:

postgres=# SELECT pg_enable_data_checksums();
ERROR:  Database "template0" does not allow connections.
HINT:  Allow connections using ALTER DATABASE and try again.
postgres=# 

Which I think is much nicer than what we have right now:

postgres=# SELECT pg_enable_data_checksums();
 pg_enable_data_checksums 
--------------------------
 t
(1 row)

postgres=# \q
postgres@fock:~$ tail -3 pg1.log 
2018-03-18 14:00:08.512 CET [25514] ERROR:  Database "template0" does not allow connections.
2018-03-18 14:00:08.512 CET [25514] HINT:  Allow connections using ALTER DATABASE and try again.
2018-03-18 14:00:08.513 CET [24930] LOG:  background worker "checksum helper launcher" (PID 25514) exited with exit code 1
 

> Attached v4 of this patch, which addresses this review, and flipping status
> back in the CF app back to Needs Review.
 
Thanks!

The following errmsg() calls capitalize the error message without the first
word being a specific term, which I believe is not project style:

+            (errmsg("Checksum helper is currently running, cannot disable checksums"),
+                        (errmsg("Database \"%s\" dropped, skipping", db->dbname)));
+            (errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+                    (errmsg("Database \"%s\" does not allow connections.", NameStr(pgdb->datname)),
+            (errmsg("Checksum worker starting for database oid %d", dboid)));
+            (errmsg("Checksum worker completed in database oid %d", dboid)));

Also, in src/backend/postmaster/checksumhelper.c there are few
multi-line comments (which are not function comments) that do not have a
full stop at the end, which I think is also project style:

+         * Failed to set means somebody else started

Could be changed to a one-line (/* ... */) comment?

+     * Force a checkpoint to get everything out to disk

Should have a full stop.

+         * checksummed, so skip

Should have a full stop.

+     * Enable vacuum cost delay, if any

Could be changed to a one-line comment?

+     * Create and set the vacuum strategy as our buffer strategy

Could be changed to a one-line comment?

Apart from that, I previously complained about the error in
pg_verify_checksums:

+                               fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+                                               progname, fn, blockno, header->pd_checksum, csum);

I still propose something like in backend/storage/page/bufpage.c's
PageIsVerified(), e.g.:

|fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
|         progname, fn, blockno, csum, header->pd_checksum);

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.


Cheers,

Michael


Attachments

Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi!

> On 18 March 2018, at 19:02, Michael Banck <michael.banck@credativ.de> wrote:
>
> Otherwise, I had a quick look over v4 and found no further issues.
> Hopefully I will be able to test it on some bigger test databases next
> week.
>
> I'm switching the state back to 'Waiting on Author'; if you think the
> above points are moot, maybe switch it back to 'Needs Review' as Andrey
> Borodin also marked himself down as reviewer and might want to have
> another look as well.

Yep, I'm already doing another pass over the code. Hope to finish tomorrow.
My 2 cents: there's a typo in the word "connections":
+     <literal>template0</literal> is by default not accepting connetions, to

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 18 Mar 2018, at 15:02, Michael Banck <michael.banck@credativ.de> wrote:
> On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
>>> On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:
>>> I am aware that this is discussed already, but as-is the user experience
>>> is pretty bad, I think pg_enable_data_checksums() should either bail
>>> earlier if it cannot connect to all databases, or it should be better
>>> documented.
>>
>> Personally I think we should first attempt to solve the "allow-connections in
>> background workers” issue which has been raised on another thread.  For now I’m
>> documenting this better.
>
> I had a look at that thread and it seems stalled, I am a bit worried
> that this will not be solved before the end of the CF.
>
> So I think unless the above gets solved, pg_enable_data_checksums()
> should error out with the hint.  I've had a quick look and it seems one
> can partly duplicate the check from BuildDatabaseList() (or optionally
> move it) to the beginning of StartChecksumHelperLauncher(), see
> attached.

I’ve incorporated a slightly massaged version of your patch.  While it is a
little ugly to duplicate the logic, it’s hopefully a short-term fix, and I
agree that silently failing is even uglier. Thanks for the proposal!

It should be noted that a database may well be altered to disallow connections
between the point where the checksum helper starts (or gets the database list)
and the point where it tries to checksum that database, so we might still fail
on this very issue with the checks bypassed.  Still, it improves the UX by
catching the low-hanging fruit, of course.

> The following errmsg() capitalize the error message without the first
> word being a specific term, which I believe is not project style:

Fixed.

> Also, in src/backend/postmaster/checksumhelper.c there are few
> multi-line comments (which are not function comments) that do not have a
> full stop at the end, which I think is also project style:

I’ve addressed all of these, but I did leave one as a multi-line which I think
looks better as that.  I also did some spellchecking and general tidying up of
the error messages and comments in the checksum helper.

> Apart from that, I previously complained about the error in
> pg_verify_checksums:

Updated to your suggestion.  While in there I re-wrote a few other error
messages to be consistent in message and quoting.

> Otherwise, I had a quick look over v4 and found no further issues.
> Hopefully I will be able to test it on some bigger test databases next
> week.

Thanks a lot for your reviews!

cheers ./daniel


Attachments

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 18 Mar 2018, at 17:21, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>> On 18 March 2018, at 19:02, Michael Banck <michael.banck@credativ.de> wrote:
>>
>> Otherwise, I had a quick look over v4 and found no further issues.
>> Hopefully I will be able to test it on some bigger test databases next
>> week.
>>
>> I'm switching the state back to 'Waiting on Author'; if you think the
>> above points are moot, maybe switch it back to 'Needs Review' as Andrey
>> Borodin also marked himself down as reviewer and might want to have
>> another look as well.
>
> Yep, I'm already doing another pass on the code. Hope to finish tomorrow.
> My 2 cents: there's a typo in the word "connections"
> +     <literal>template0</literal> is by default not accepting connetions, to

Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
(version 5). Thanks!

cheers ./daniel

Re: Online enabling of checksums

From
Heikki Linnakangas
Date:
Hi,

The patch looks good to me at a high level. Some notes below. I didn't 
read through the whole thread, so sorry if some of these have been 
discussed already.

> +void
> +ShutdownChecksumHelperIfRunning(void)
> +{
> +    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
> +    {
> +        /* Launcher not started, so nothing to shut down */
> +        return;
> +    }
> +
> +    ereport(ERROR,
> +            (errmsg("checksum helper is currently running, cannot disable checksums"),
> +             errhint("Restart the cluster or wait for the worker to finish.")));
> +}

Is there no way to stop the checksum helper once it's started? That 
seems rather user-unfriendly. I can imagine it being a pretty common 
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to 
realize that you forgot to set the cost limit, and that it's hurting 
queries too much. At that point, you want to abort.

> + * This is intended to create the worklist for the workers to go through, and
> + * as we are only concerned with already existing databases we need to ever
> + * rebuild this list, which simplifies the coding.

I can't parse this sentence.

> +extern bool DataChecksumsEnabledOrInProgress(void);
> +extern bool DataChecksumsInProgress(void);
> +extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit 
long. And doing this in PageIsVerified looks a bit weird:

 > if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()
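Heikki's naming suggestion can be sketched as a tiny self-contained model of the three-state flag (the enum and variable names here are illustrative, not taken from the patch):

```c
/* Illustrative model only -- not the actual PostgreSQL implementation. */
#include <stdbool.h>

typedef enum
{
    CHECKSUMS_OFF,
    CHECKSUMS_INPROGRESS,
    CHECKSUMS_ON
} ChecksumState;

static ChecksumState data_checksum_state = CHECKSUMS_OFF;

/* Checksums should be computed on write in both "inprogress" and "on". */
static bool
DataChecksumNeedWrite(void)
{
    return data_checksum_state != CHECKSUMS_OFF;
}

/* Checksums should only be verified on read once the state is "on". */
static bool
DataChecksumNeedVerify(void)
{
    return data_checksum_state == CHECKSUMS_ON;
}
```

With predicates like these, the awkward `EnabledOrInProgress() && !InProgress()` combination collapses into a single call at each read or write site.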


> +     <literal>template0</literal> is by default not accepting connections, to
> +     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that 
it's bad.

> +CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
> +        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
> +  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
> +  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear 
to me when they would return what. I'd suggest marking them as 'void' 
instead.

> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>   */
>  #define PG_PAGE_LAYOUT_VERSION        4
>  #define PG_DATA_CHECKSUM_VERSION    1
> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2
>  

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. 
They're not actually stored in the page header, as you might think. I 
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to 
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk 
format. I think it was a bit weird even before this patch, but now it's 
worse. At least a better comment would be in order, or maybe move these 
elsewhere.

Feature request: it'd be nice to update the 'ps status' with 
set_ps_display, to show a rudimentary progress indicator. Like the name 
and block number of the relation being processed. It won't tell you how 
much is left to go, but at least it will give a warm fuzzy feeling to 
the DBA that something is happening.
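As a hedged sketch of that feature request, the progress line itself could be built like this (the function and parameter names are hypothetical; in the real backend the resulting string would be handed to set_ps_display() or reported via pg_stat_activity):

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a rudimentary progress indicator for the
 * checksum worker.  "relname", "blkno" and "nblocks" are illustrative
 * parameter names, not taken from the patch.
 */
static void
format_checksum_progress(char *buf, size_t buflen,
                         const char *relname, unsigned blkno, unsigned nblocks)
{
    snprintf(buf, buflen, "checksumming \"%s\", block %u/%u",
             relname, blkno, nblocks);
}
```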

I didn't review the offline tool, pg_verify_checksums.

- Heikki



Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi, Daniel!

> On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
> (version 5). Thanks!


I've been hacking a bit in a neighboring thread,
and came across one interesting thing. There was a patch in this CF on enabling checksums for SLRU. The thing is, CLOG
is not protected with checksums right now. But the bad thing about it is that there's no reserved place for checksums in
SLRU.
And this conversion from a page without a checksum to a page with a checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU
eventually.

What do you think about this problem?

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!

> On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
> (version 5). Thanks!


I've been hacking a bit in neighboring thread.
And come across one interesting thing. There was a patch on this CF on enabling checksums for SLRU. The thing is CLOG is not protected with checksums right now. But the bad thing about it is that there's no reserved place for checksums in SLRU.
And this conversion from page without checksum to page with checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.


--

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Mon, Mar 19, 2018 at 12:24 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!

> On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
> (version 5). Thanks!


I've been hacking a bit in neighboring thread.
And come across one interesting thing. There was a patch on this CF on enabling checksums for SLRU. The thing is CLOG is not protected with checksums right now. But the bad thing about it is that there's no reserved place for checksums in SLRU.
And this conversion from page without checksum to page with checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.


Makes no sense to *not* allow it, of course. Meaning yes, that should be handled.

We don't need to convert from "page format with no support for checksums" (pre-11) to "page format with support for checksums" (11+) online.

We do need to convert from "page format with support for checksums but no checksums enabled" (11+) to "checksums enabled" online.


--

Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi Magnus!

On 19 Mar 2018, at 16:57, Magnus Hagander <magnus@hagander.net> wrote:

On Mon, Mar 19, 2018 at 12:24 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!
If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.


Makes no sense to *not* allow it, of course. Meaning yes, that should be handled.

We don't need to convert from "page format with no support for checksums" (pre-11) to "page format with support for checksums" (11+) online.

We do need to convert from "page format with support for checksums but no checksums enabled" (11+) to "checksums enabled" online.

Oh, yes, you are right. Everything is fine, sorry for the noise.

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi!

> On 19 Mar 2018, at 11:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it
being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to
set the cost limit, and that it's hurting queries too much. At that point, you want to abort.
I've tried to pg_cancel_backend() and it worked.
But only if I cancel "checksum helper launcher" and then "checksum helper worker". If I cancel the helper first, it
spawns a new one.

Magnus, Daniel, is it safe to cancel worker or launcher?

BTW, I have some questions on pg_verify_checksums.
It does not check the catalog version. Is it true that it will work for any?
Also, pg_verify_checksums will stop on short reads. But we do not stop on a wrong checksum; maybe we should not stop on
short reads either?

I agree with all of Heikki's comments.

Besides these I have no other questions, patch looks good.

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Tue, Mar 20, 2018 at 10:29 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

> On 19 Mar 2018, at 11:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.
I've tried to pg_cancel_backend() and it worked.
But only if I cancel "checksum helper launcher" and then "checksum helper worker". If I cancel the helper first, it spawns a new one.

Magnus, Daniel, is it safe to cancel worker or launcher?

It should be perfectly safe yes. However, it's not very user friendly, because you have to go around and cancel both (meaning you actually have to cancel them in the right order, or a new one will be launched).

Daniel is working on a proper way to stop things.

 

BTW, I have some questions on pg_verify_checksums.
It does not check the catalog version. Is it true that it will work for any?

Yes. The actual page format for checksums has not changed since checksums were introduced. I have successfully used it to validate checksums on v10 clusters for example.

 
Also, pg_verify_checksums will stop on short reads. But we do not stop on a wrong checksum; maybe we should not stop on short reads either?


These are very different scenarios though -- it's explicitly intended to validate checksums, and short reads is a different issue. I prefer the way it does it now, but I am not strongly locked into that position and can be convinced otherwise :)

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Mon, Mar 19, 2018 at 7:27 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hi,

The patch looks good to me at a high level. Some notes below. I didn't read through the whole thread, so sorry if some of these have been discussed already.

+void
+ShutdownChecksumHelperIfRunning(void)
+{
+       if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+       {
+               /* Launcher not started, so nothing to shut down */
+               return;
+       }
+
+       ereport(ERROR,
+                       (errmsg("checksum helper is currently running, cannot disable checksums"),
+                        errhint("Restart the cluster or wait for the worker to finish.")));
+}

Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.

Agreed. You could do it with pg_terminate_backend(), but you'd have to do it in the right order, etc.

Attached patch fixes this by making it possible to abort the process by executing pg_disable_data_checksums() during the process. In this case the live workers will abort, and the checksums will be switched off again.
 


+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit long. And doing this in PageIsVerified looks a bit weird:

> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()

Agreed. We also need DataChecksumsInProgress() to make it work properly, but that makes the names a lot more clear. Adjusted as such. 

 
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that it's bad.

Do you have any opinion on the thread about how to handle that one? As in which way to go about solving it? (The second thread linked from the CF entry - it wasn't linked before as intended, but it is now)



+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear to me when they would return what. I'd suggest marking them as 'void' instead.

Agreed, changed.


--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION         4
 #define PG_DATA_CHECKSUM_VERSION       1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION            2
 

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. They're not actually stored in the page header, as you might think. I think the idea was to keep PG_DATA_CHECKSUM_VERSION close to PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk format. I think it was a bit weird even before this patch, but now it's worse. At least a better comment would be in order, or maybe move these elsewhere.

True, but they apply at the page level? I'm not sure about the original reasoning for putting them there, and figured it wasn't the responsibility of this patch to move them. But we can certainly do that.

But what would be an appropriate place elsewhere? First thought would be pg_control.h, but that would then not be consistent with e.g. wal_level (which is declared in xlog.h and not pg_control.h).


Feature request: it'd be nice to update the 'ps status' with set_ps_display, to show a rudimentary progress indicator. Like the name and block number of the relation being processed. It won't tell you how much is left to go, but at least it will give a warm fuzzy feeling to the DBA that something is happening.

In the attached patch it sets this information in pg_stat_activity. I think that makes more sense than the ps display, and I think it is more consistent with other ways we use them (e.g. autovacuum doesn't update its ps title for every table, but it does update pg_stat_activity).


I didn't review the offline tool, pg_verify_checksums.

PFA a patch that takes into account these comments and we believe all other pending ones as well.


--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I see enable_data_checksums() does this:

    if (cost_limit <= 0)
        ereport(ERROR,
                (errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Mon, Mar 26, 2018 at 10:09 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I see enable_data_checksums() does this:

    if (cost_limit <= 0)
        ereport(ERROR,
                (errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?


Eh. vacuum_cost_limit cannot be set to -1 (1 is the lowest). Neither can vacuum_cost_delay -- it is set to *0* to disable it (which is how the cost_delay parameter is handled here as well).

Are you thinking autovacuum_vacuum_cost_limit where -1 means "use vacuum_cost_limit"?

The reason to disallow cost_limit=0 is to avoid divide-by-zero. We could allow -1 and have it mean "use vacuum_cost_limit", but I'm not sure how relevant that really would be in this context?
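The divide-by-zero concern can be illustrated with a minimal model of vacuum-style cost accounting (the names below are illustrative, and the arithmetic only mirrors the vacuum delay logic, where the cost limit appears as a divisor -- hence the requirement that it be positive):

```c
/* Accumulated cost since the last sleep; persists across calls. */
static int cost_balance = 0;

/* Returns milliseconds to sleep after accounting for one page's cost. */
static int
checksum_throttle(int cost_delay, int cost_limit, int page_cost)
{
    int msec = 0;

    cost_balance += page_cost;
    if (cost_delay > 0 && cost_balance >= cost_limit)
    {
        /* cost_limit is a divisor here, so it must be > 0 */
        msec = cost_delay * cost_balance / cost_limit;
        cost_balance = 0;
    }
    return msec;
}
```

With cost_delay = 0 (the default) the throttling branch is never taken, matching how the delay parameter disables throttling in the patch.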

--

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 03/27/2018 08:56 AM, Magnus Hagander wrote:
> On Mon, Mar 26, 2018 at 10:09 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> 
>     Hi,
> 
>     I see enable_data_checksums() does this:
> 
>         if (cost_limit <= 0)
>             ereport(ERROR,
>                     (errmsg("cost limit must be a positive value")));
> 
>     Is there a reason not to allow -1 (no limit), just like for vacuum_cost?
> 
> 
> Eh. vaccum_cost_limit cannot be set to -1 (1 is the lowest). Neither can
> vacuum_cost_delay -- it is set to *0* to disable it (which is how the
> cost_delay parameter is handled here as well).
> 
> Are you thinking autovacuum_vacuum_cost_limit where -1 means "use
> vacuum_cost_limit"?
> 
> The reason to disallow cost_limit=0 is to avoid divide-by-zero. We could
> allow -1 and have it mean "use vacuum_cost_limit", but I'm not sure how
> relevant that really would be in this context?
> 

D'oh! You're right, of course.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I've just noticed the patch does not update

    src/backend/storage/page/README

which is in fact about checksums. Most of it remains valid, but it also
mentions that currently it's an initdb-time choice.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle the more rare case.

Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

    ERROR:  could not open relation with OID 16632
    LOG:  background worker "checksumhelper worker" (PID 27152) exited
          with exit code 1

Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

    foreach(lc, RelationList)
    {
        ChecksumHelperRelation *rel
             = (ChecksumHelperRelation *) lfirst(lc);

        if (!ProcessSingleRelationByOid(rel->reloid, strategy))
        {
            ChecksumHelperShmem->success = ABORTED;
            break;
        }
        else
            ChecksumHelperShmem->success = SUCCESSFUL;
    }
    list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

The launcher however does not realize anything went wrong, because the
flag still says SUCCESSFUL. And so it merrily switches checksums to
"on", leading to this on the rest of the relations:

    WARNING:  page verification failed, calculated checksum 58644 but
              expected 0
    ERROR:  invalid page in block 0 of relation base/16631/16653

Yikes!

IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworder.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

    ChecksumHelperResult local_success = SUCCESFUL;

    foreach(lc, RelationList)
    {
        ChecksumHelperRelation *rel
             = (ChecksumHelperRelation *) lfirst(lc);

        if (!ProcessSingleRelationByOid(rel->reloid, strategy))
        {
            local_success = ABORTED;
            break;
        }
    }
    list_free_deep(RelationList);

    ChecksumHelperShmem->success = local_success;

That is, leave the flag in shared memory set to FAILED until the very
last moment, and only when everything went fine set it to SUCCESSFUL.
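The two fixes together can be modelled in a self-contained sketch (stand-in names and types only; the real code would use try_relation_open() and the bgworker's shared-memory flag):

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { FAILED, ABORTED, SUCCESSFUL } HelperResult;

/* Stand-in for try_relation_open(): false when the relation was dropped. */
static bool
relation_exists(unsigned oid, const unsigned *catalog, size_t ncat)
{
    for (size_t i = 0; i < ncat; i++)
        if (catalog[i] == oid)
            return true;
    return false;
}

/*
 * Walk the worklist: (a) skip relations that were dropped since the list
 * was built, instead of erroring out, and (b) keep the shared flag at
 * FAILED until the whole list is done, publishing the result only at the
 * very end, so a crash mid-loop can never read as success.
 */
static HelperResult
process_worklist(const unsigned *worklist, size_t nwork,
                 const unsigned *catalog, size_t ncat,
                 HelperResult *shared_flag)
{
    HelperResult local_success = SUCCESSFUL;

    *shared_flag = FAILED;      /* stays FAILED if we die mid-loop */
    for (size_t i = 0; i < nwork; i++)
    {
        if (!relation_exists(worklist[i], catalog, ncat))
            continue;           /* dropped relation: skip, don't error */
        /* ... checksum the relation's blocks here ... */
    }
    *shared_flag = local_success;
    return local_success;
}
```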


BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

    ERROR:  failed to enable checksums in "test", giving up (attempts
            639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.

BTW both ChecksumHelperRelation and ChecksumHelperDatabase have
'success' field which is actually unused (and uninitialized).

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

  BEGIN;
  CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
  -- run pg_enable_data_checksums() from another session
  SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

  test=# SELECT COUNT(*) FROM t;
  WARNING:  page verification failed, calculated checksum 27170 but
            expected 0
  ERROR:  invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

  ERROR:  cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled anyway.

Not great, I guess :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Thanks!
 

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle the more rare case.

Uh, yes. I could've sworn we had code for that, but I fully agree with your assessment that it's not there :)



Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

    ERROR:  could not open relation with OID 16632
    LOG:  background worker "checksumhelper worker" (PID 27152) exited
          with exit code 1

Yeah. We need to trap that error an not crash and burn.


Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

    foreach(lc, RelationList)
    {
        ChecksumHelperRelation *rel
             = (ChecksumHelperRelation *) lfirst(lc);

        if (!ProcessSingleRelationByOid(rel->reloid, strategy))
        {
            ChecksumHelperShmem->success = ABORTED;
            break;
        }
        else
            ChecksumHelperShmem->success = SUCCESSFUL;
    }
    list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

Indeed, that's just a very simple bug. It shouldn't be set to SUCCESSFUL until *all* tables have been processed. I believe the code needs to be this:


IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworder.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

Not entirely sure what you mean by "even a simple does oid exist"? I mean, if we check for the file, that won't help us -- there will still be a tiny race between the check and us opening it, won't there?

However, we have try_relation_open().  Which is documented as:
 * Same as relation_open, except return NULL instead of failing
 * if the relation does not exist.

So I'm pretty sure it's just a matter of using try_relation_open() instead, and checking for NULL?


(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

<snip>
And now I see your code, which was below the fold when I first read it, after having written a very similar fix myself. I'm glad this code looks mostly identical to what I suggested above, so I think that's a good solution.

 
BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

No, I don't like that at all. We need to handle them gracefully, by skipping them - crash and restart is not acceptable for something that common.


Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

    ERROR:  failed to enable checksums in "test", giving up (attempts
            639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.

Eh. I don't have that "(attempts" part in my code at all. Is that either from some earlier version of the patch, or did you add that yourself for testing?



BTW both ChecksumHelperRelation and ChecksumHelperDatabase have
'success' field which is actually unused (and uninitialized).

Correct. These are old leftovers from the "partial restart" logic, and should be removed.


But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

  BEGIN;
  CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
  -- run pg_enable_data_checksums() from another session
  SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

Ugh. Interestingly enough I just put that on my TODO list *yesterday* that I forgot to check that specific case :/



  test=# SELECT COUNT(*) FROM t;
  WARNING:  page verification failed, calculated checksum 27170 but
            expected 0
  ERROR:  invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

I was thinking of that as one way to deal with it, yes.

I guess a reasonable way to do that would be to do it as part of BuildRelationList() -- basically have that one wait until there is no other running transaction in that specific database before we take the snapshot?

A first thought I had was to try to just take an access exclusive lock on pg_class for a very short time, but a transaction that does CREATE TABLE doesn't actually keep its lock on that table, so there is no conflict.


And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

  ERROR:  cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled anyway.

Yeah, there will be plenty of side-effect issues from that crash-with-wrong-status case. Fixing that will at least make things safer -- in that checksums won't be enabled when not put on all pages. 

I have attached a patch that fixes the "easy" ones per your first comments. No solution for the open-transaction yet, but I wanted to put the rest out there -- especially if you have automated tests you can send it through.

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

On 03/31/2018 02:02 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>
> ...
> 
>     (a) Should not be difficult to do, I think. We don't have relation_open
>     with a missing_ok flag, but implementing something like that should not
>     be difficult. Even a simple "does OID exist" should be enough.
> 
> 
> Not entirely sure what you mean with "even a simple does oid exist"
> means? I mean, if we check for the file, that won't help us -- there
> will still be a tiny race between the check and us opening it won't it?
> 

I meant to say "even a simple check if the OID still exists" but it was
a bit too late / not enough caffeine issue. You're right there would be
a tiny window of race condition - it'd be much shorter, possibly enough
to make the error+restart approach acceptable.

> However, we have try_relation_open().  Which is documented as:
>  *Same as relation_open, except return NULL instead of failing
>  *if the relation does not exist.
> 
> So I'm pretty sure it's just a matter of using try_relation_open()
> instead, and checking for NULL?
> 

Oh, right. I thought we had a relation_open variant that handles this,
but I was looking for one with a missing_ok flag and so I missed it.
try_relation_open should do the trick when it comes to dropped tables.

> 
>     (b) But just handling dropped relations is not enough, because I could
>     simply kill the bgworker directly, and it would have exactly the same
>     consequences. What needs to happen is something like this:
> 
> 
> <snip>
> And now I see your code, which was below-fold when I first read. After
> having writing a very similar fix myself. I'm glad this code looks
> mostly identical to what I suggested above, so I think that's a good
> solution.
> 

Good ;-)

>  
> 
>     BTW I don't think handling dropped relations by letting the bgworker
>     crash and restart is an acceptable approach. That would pretty much mean
>     any DDL changes are prohibited on the system while the checksum process
>     is running, which is not quite possible (e.g. for systems doing stuff
>     with temporary tables).
> 
> 
> No, I don't like that at all. We need to handle them gracefully, by
> skipping them - crash and restart is not acceptable for something that
> common.
> 

Yeah, I was just thinking aloud.

> 
>     Which however reminds me I've also ran into a bug in the automated retry
>     system, because you may get messages like this:
> 
>         ERROR:  failed to enable checksums in "test", giving up (attempts
>                 639968292).
> 
>     This happens because BuildDatabaseList() does just palloc() and does not
>     initialize the 'attempts' field. It may get initialized to 0 by chance,
>     but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
>     high value.
> 
> 
> Eh. I don't have that "(attempts" part in my code at all. Is that either
> from some earlier version of the patch, or did you add that yourself for
> testing?
> 

Apologies, you're right - I tweaked the message a bit (just adding the
number of attempts to it). The logic however remains the same, and the
bug is real.

>
>     But wait - there is more ;-) BuildRelationList is using heap_beginscan
>     with the regular snapshot, so it does not see uncommitted transactions.
>     So if you do this:
> 
>       BEGIN;
>       CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
>       -- run pg_enable_data_checksums() from another session
>       SELECT COUNT(*) FROM t;
> 
>     then the table will be invisible to the checksum worker, it won't have
>     checksums updated and the cluster will get checksums enabled. Which
>     means this:
> 
> 
> Ugh. Interestingly enough I just put that on my TODO list *yesterday*
> that I forgot to check that specific case :/
> 

But I was faster in reporting it ;-)

> 
>       test=# SELECT COUNT(*) FROM t;
>       WARNING:  page verification failed, calculated checksum 27170 but
>                 expected 0
>       ERROR:  invalid page in block 0 of relation base/16677/16683
> 
>     Not sure what's the best way to fix this - maybe we could wait for all
>     running transactions to end, before starting the work.
> 
> 
> I was thinking of that as one way to deal with it, yes.
> 
> I guess a reasonable way to do that would be to do it as part of
> BuildRelationList() -- basically have that one wait until there is no
> other running transaction in that specific database before we take the
> snapshot?
> 
> A first thought I had was to try to just take an access exclusive lock
> on pg_class for a very short time, but a transaction that does create
> table doesn't actually keep it's lock on that table so there is no conflict.
> 

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something.

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

> 
>     And if you try this with a temporary table (not hidden in transaction,
>     so the bgworker can see it), the worker will fail with this:
> 
>       ERROR:  cannot access temporary tables of other sessions
> 
>     But of course, this is just another way how to crash without updating
>     the result for the launcher, so checksums may end up being enabled
>     anyway.
> 
> 
> Yeah, there will be plenty of side-effect issues from that
> crash-with-wrong-status case. Fixing that will at least make things
> safer -- in that checksums won't be enabled when not put on all pages. 
> 

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that they do not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and that usage also
includes CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

> I have attached a patch that fixes the "easy" ones per your first
> comments. No solution for the open-transaction yet, but I wanted to put
> the rest out there -- especially if you have automated tests you can
> send it through.
> 

I don't have automated tests, but I'll take a look.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 03/31/2018 02:02 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
>
>     But wait - there is more ;-) BuildRelationList is using heap_beginscan
>     with the regular snapshot, so it does not see uncommitted transactions.
>     So if you do this:
>
>       BEGIN;
>       CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
>       -- run pg_enable_data_checksums() from another session
>       SELECT COUNT(*) FROM t;
>
>     then the table will be invisible to the checksum worker, it won't have
>     checksums updated and the cluster will get checksums enabled. Which
>     means this:
>
>
> Ugh. Interestingly enough I just put that on my TODO list *yesterday*
> that I forgot to check that specific case :/
>

But I was faster in reporting it ;-)

Indeed you were :)

 
>       test=# SELECT COUNT(*) FROM t;
>       WARNING:  page verification failed, calculated checksum 27170 but
>                 expected 0
>       ERROR:  invalid page in block 0 of relation base/16677/16683
>
>     Not sure what's the best way to fix this - maybe we could wait for all
>     running transactions to end, before starting the work.
>
>
> I was thinking of that as one way to deal with it, yes.
>
> I guess a reasonable way to do that would be to do it as part of
> BuildRelationList() -- basically have that one wait until there is no
> other running transaction in that specific database before we take the
> snapshot?
>
> A first thought I had was to try to just take an access exclusive lock
> on pg_class for a very short time, but a transaction that does create
> table doesn't actually keep it's lock on that table so there is no conflict.
>

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something. 

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of BuildRelationList(). We should just do it once in the launcher before starting; that'll be both easier and cleaner. Anything started after that will have checksums on it, so we should be fine.

PFA one that does this.


>     And if you try this with a temporary table (not hidden in transaction,
>     so the bgworker can see it), the worker will fail with this:
>
>       ERROR:  cannot access temporary tables of other sessions
>
>     But of course, this is just another way how to crash without updating
>     the result for the launcher, so checksums may end up being enabled
>     anyway.
>
>
> Yeah, there will be plenty of side-effect issues from that
> crash-with-wrong-status case. Fixing that will at least make things
> safer -- in that checksums won't be enabled when not put on all pages. 
>

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that they do not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and that usage also
includes CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.


Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is that temporary tables are *session* scoped, not transaction scoped, so we'd actually need the other session to finish, not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period. Instead, build a list of any temporary tables that existed when the worker started in this particular database (basically anything that we got in our scan). Once we have processed the complete database, keep re-scanning pg_class until those particular tables are gone (search by oid).

That means that any temporary tables that are created *while* we are processing a database are ignored, but they should already be receiving checksums.

It definitely leads to a potential issue with long running temp tables. But as long as we look at the *actual tables* (by oid), we should be able to handle long-running sessions once they have dropped their temp tables.

Does that sound workable to you?

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> ...
> 
>     I do think just waiting for all running transactions to complete is
>     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>     too, to some extent). So we have a precedent / working code we can copy.
> 
> 
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
> 
> PFA one that does this.
> 

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

> 
>     >     And if you try this with a temporary table (not hidden in transaction,
>     >     so the bgworker can see it), the worker will fail with this:
>     >
>     >       ERROR:  cannot access temporary tables of other sessions
>     >
>     >     But of course, this is just another way how to crash without updating
>     >     the result for the launcher, so checksums may end up being enabled
>     >     anyway.
>     >
>     >
>     > Yeah, there will be plenty of side-effect issues from that
>     > crash-with-wrong-status case. Fixing that will at least make things
>     > safer -- in that checksums won't be enabled when not put on all pages. 
>     >
> 
>     Sure, the outcome with checksums enabled incorrectly is a consequence of
>     bogus status, and fixing that will prevent that. But that wasn't my main
>     point here - not articulated very clearly, though.
> 
>     The bigger question is how to handle temporary tables gracefully, so
>     that it does not terminate the bgworker like this at all. This might be
>     even bigger issue than dropped relations, considering that temporary
>     tables are pretty common part of applications (and it also includes
>     CREATE/DROP).
> 
>     For some clusters it might mean the online checksum enabling would
>     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> 
>     Unfortunately, try_relation_open() won't fix this, as the error comes
>     from ReadBufferExtended. And it's not a matter of simply creating a
>     ReadBuffer variant without that error check, because temporary tables
>     use local buffers.
> 
>     I wonder if we could just go and set the checksums anyway, ignoring the
>     local buffers. If the other session does some changes, it'll overwrite
>     our changes, this time with the correct checksums. But it seems pretty
>     dangerous (I mean, what if they're writing stuff while we're updating
>     the checksums? Considering the various short-cuts for temporary tables,
>     I suspect that would be a boon for race conditions.)
> 
>     Another option would be to do something similar to running transactions,
>     i.e. wait until all temporary tables (that we've seen at the beginning)
>     disappear. But we're starting to wait on more and more stuff.
> 
>     If we do this, we should clearly log which backends we're waiting for,
>     so that the admins can go and interrupt them manually.
> 
> 
> 
> Yeah, waiting for all transactions at the beginning is pretty simple.
> 
> Making the worker simply ignore temporary tables would also be easy.
> 
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
> 
> I guess what we could do is something like this:
> 
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
> 
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
> 
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
> 
> Does that sound workable to you?
> 

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> ...
>
>     I do think just waiting for all running transactions to complete is
>     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>     too, to some extent). So we have a precedent / working code we can copy.
>
>
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
>
> PFA one that does this.
>

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.

 
>     >     And if you try this with a temporary table (not hidden in transaction,
>     >     so the bgworker can see it), the worker will fail with this:
>     >
>     >       ERROR:  cannot access temporary tables of other sessions
>     >
>     >     But of course, this is just another way how to crash without updating
>     >     the result for the launcher, so checksums may end up being enabled
>     >     anyway.
>     >
>     >
>     > Yeah, there will be plenty of side-effect issues from that
>     > crash-with-wrong-status case. Fixing that will at least make things
>     > safer -- in that checksums won't be enabled when not put on all pages. 
>     >
>
>     Sure, the outcome with checksums enabled incorrectly is a consequence of
>     bogus status, and fixing that will prevent that. But that wasn't my main
>     point here - not articulated very clearly, though.
>
>     The bigger question is how to handle temporary tables gracefully, so
>     that it does not terminate the bgworker like this at all. This might be
>     even bigger issue than dropped relations, considering that temporary
>     tables are pretty common part of applications (and it also includes
>     CREATE/DROP).
>
>     For some clusters it might mean the online checksum enabling would
>     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>
>     Unfortunately, try_relation_open() won't fix this, as the error comes
>     from ReadBufferExtended. And it's not a matter of simply creating a
>     ReadBuffer variant without that error check, because temporary tables
>     use local buffers.
>
>     I wonder if we could just go and set the checksums anyway, ignoring the
>     local buffers. If the other session does some changes, it'll overwrite
>     our changes, this time with the correct checksums. But it seems pretty
>     dangerous (I mean, what if they're writing stuff while we're updating
>     the checksums? Considering the various short-cuts for temporary tables,
>     I suspect that would be a boon for race conditions.)
>
>     Another option would be to do something similar to running transactions,
>     i.e. wait until all temporary tables (that we've seen at the beginning)
>     disappear. But we're starting to wait on more and more stuff.
>
>     If we do this, we should clearly log which backends we're waiting for,
>     so that the admins can go and interrupt them manually.
>
>
>
> Yeah, waiting for all transactions at the beginning is pretty simple.
>
> Making the worker simply ignore temporary tables would also be easy.
>
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
>
> I guess what we could do is something like this:
>
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
>
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
>
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
>
> Does that sound workable to you?
>

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp tables. That's also a predictable amount of information -- logging all temp tables may, as you say, lead to an insane amount of data.

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed that too.

--
Attachments

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> ...
>
>     I do think just waiting for all running transactions to complete is
>     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>     too, to some extent). So we have a precedent / working code we can copy.
>
>
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
>
> PFA one that does this.
>

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.

 
>     >     And if you try this with a temporary table (not hidden in transaction,
>     >     so the bgworker can see it), the worker will fail with this:
>     >
>     >       ERROR:  cannot access temporary tables of other sessions
>     >
>     >     But of course, this is just another way how to crash without updating
>     >     the result for the launcher, so checksums may end up being enabled
>     >     anyway.
>     >
>     >
>     > Yeah, there will be plenty of side-effect issues from that
>     > crash-with-wrong-status case. Fixing that will at least make things
>     > safer -- in that checksums won't be enabled when not put on all pages. 
>     >
>
>     Sure, the outcome with checksums enabled incorrectly is a consequence of
>     bogus status, and fixing that will prevent that. But that wasn't my main
>     point here - not articulated very clearly, though.
>
>     The bigger question is how to handle temporary tables gracefully, so
>     that it does not terminate the bgworker like this at all. This might be
>     even bigger issue than dropped relations, considering that temporary
>     tables are pretty common part of applications (and it also includes
>     CREATE/DROP).
>
>     For some clusters it might mean the online checksum enabling would
>     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>
>     Unfortunately, try_relation_open() won't fix this, as the error comes
>     from ReadBufferExtended. And it's not a matter of simply creating a
>     ReadBuffer variant without that error check, because temporary tables
>     use local buffers.
>
>     I wonder if we could just go and set the checksums anyway, ignoring the
>     local buffers. If the other session does some changes, it'll overwrite
>     our changes, this time with the correct checksums. But it seems pretty
>     dangerous (I mean, what if they're writing stuff while we're updating
>     the checksums? Considering the various short-cuts for temporary tables,
>     I suspect that would be a boon for race conditions.)
>
>     Another option would be to do something similar to running transactions,
>     i.e. wait until all temporary tables (that we've seen at the beginning)
>     disappear. But we're starting to wait on more and more stuff.
>
>     If we do this, we should clearly log which backends we're waiting for,
>     so that the admins can go and interrupt them manually.
>
>
>
> Yeah, waiting for all transactions at the beginning is pretty simple.
>
> Making the worker simply ignore temporary tables would also be easy.
>
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
>
> I guess what we could do is something like this:
>
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
>
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
>
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
>
> Does that sound workable to you?
>

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp tables. That's also a predictable amount of information -- logging all temp tables may, as you say, lead to an insane amount of data.

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed that too.


PFA a rebase on top of the just committed verify-checksums patch. 

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
> 
>     On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
>     <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
> 
>         On 03/31/2018 05:05 PM, Magnus Hagander wrote:
>         > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
>         > <tomas.vondra@2ndquadrant.com
>         <mailto:tomas.vondra@2ndquadrant.com>
>         <mailto:tomas.vondra@2ndquadrant.com
>         <mailto:tomas.vondra@2ndquadrant.com>>> wrote:
>         >
>         > ...
>         >
>         >     I do think just waiting for all running transactions to complete is
>         >     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>         >     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>         >     too, to some extent). So we have a precedent / working code we can copy.
>         >
>         >
>         > Thinking again, I don't think it should be done as part of
>         > BuildRelationList(). We should just do it once in the launcher before
>         > starting, that'll be both easier and cleaner. Anything started after
>         > that will have checksums on it, so we should be fine.
>         >
>         > PFA one that does this.
>         >
> 
>         Seems fine to me. I'd however log waitforxid, not the oldest one. If
>         you're a DBA and you want to make the checksumming to proceed,
>         knowing
>         the oldest running XID is useless for that. If we log
>         waitforxid, it can
>         be used to query pg_stat_activity and interrupt the sessions
>         somehow.
> 
> 
>     Yeah, makes sense. Updated.
> 
>      
> 
>         >     >     And if you try this with a temporary table (not
>         hidden in transaction,
>         >     >     so the bgworker can see it), the worker will fail
>         with this:
>         >     >
>         >     >       ERROR:  cannot access temporary tables of other
>         sessions
>         >     >
>         >     >     But of course, this is just another way how to crash
>         without updating
>         >     >     the result for the launcher, so checksums may end up
>         being enabled
>         >     >     anyway.
>         >     >
>         >     >
>         >     > Yeah, there will be plenty of side-effect issues from that
>         >     > crash-with-wrong-status case. Fixing that will at least
>         make things
>         >     > safer -- in that checksums won't be enabled when not put
>         on all pages. 
>         >     >
>         >
>         >     Sure, the outcome with checksums enabled incorrectly is a
>         consequence of
>         >     bogus status, and fixing that will prevent that. But that
>         wasn't my main
>         >     point here - not articulated very clearly, though.
>         >
>         >     The bigger question is how to handle temporary tables
>         gracefully, so
>         >     that it does not terminate the bgworker like this at all.
>         This might be
>         >     even bigger issue than dropped relations, considering that
>         temporary
>         >     tables are pretty common part of applications (and it also
>         includes
>         >     CREATE/DROP).
>         >
>         >     For some clusters it might mean the online checksum
>         enabling would
>         >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>         >
>         >     Unfortunately, try_relation_open() won't fix this, as the
>         error comes
>         >     from ReadBufferExtended. And it's not a matter of simply
>         creating a
>         >     ReadBuffer variant without that error check, because
>         temporary tables
>         >     use local buffers.
>         >
>         >     I wonder if we could just go and set the checksums anyway,
>         ignoring the
>         >     local buffers. If the other session does some changes,
>         it'll overwrite
>         >     our changes, this time with the correct checksums. But it
>         seems pretty
>         >     dangerous (I mean, what if they're writing stuff while
>         we're updating
>         >     the checksums? Considering the various short-cuts for
>         temporary tables,
>         >     I suspect that would be a boon for race conditions.)
>         >
>         >     Another option would be to do something similar to running
>         transactions,
>         >     i.e. wait until all temporary tables (that we've seen at
>         the beginning)
>         >     disappear. But we're starting to wait on more and more stuff.
>         >
>         >     If we do this, we should clearly log which backends we're
>         waiting for,
>         >     so that the admins can go and interrupt them manually.
>         >
>         >
>         >
>         > Yeah, waiting for all transactions at the beginning is pretty
>         simple.
>         >
>         > Making the worker simply ignore temporary tables would also be
>         easy.
>         >
>         > One of the bigger issues here is temporary tables are
>         *session* scope
>         > and not transaction, so we'd actually need the other session
>         to finish,
>         > not just the transaction.
>         >
>         > I guess what we could do is something like this:
>         >
>         > 1. Don't process temporary tables in the checksumworker, period.
>         > Instead, build a list of any temporary tables that existed
>         when the
>         > worker started in this particular database (basically anything
>         that we
>         > got in our scan). Once we have processed the complete
>         database, keep
>         > re-scanning pg_class until those particular tables are gone
>         (search by oid).
>         >
>         > That means that any temporary tables that are created *while*
>         we are
>         > processing a database are ignored, but they should already be
>         receiving
>         > checksums.
>         >
>         > It definitely leads to a potential issue with long running
>         temp tables.
>         > But as long as we look at the *actual tables* (by oid), we
>         should be
>         > able to handle long-running sessions once they have dropped
>         their temp
>         > tables.
>         >
>         > Does that sound workable to you?
>         >
> 
>         Yes, that's pretty much what I meant by 'wait until all
>         temporary tables
>         disappear'. Again, we need to make it easy to determine which
>         OIDs are
>         we waiting for, which sessions may need DBA's attention.
> 
>         I don't think it makes sense to log OIDs of the temporary
>         tables. There
>         can be many of them, and in most cases the connection/session is
>         managed
>         by the application, so the only thing you can do is kill the
>         connection.
> 
> 
>     Yeah, agreed. I think it makes sense to show the *number* of temp
>     tables. That's also a predictable amount of information -- logging
>     all temp tables may as you say lead to an insane amount of data.
> 
>     PFA a patch that does this. I've also added some docs for it.
> 
>     And I also noticed pg_verify_checksums wasn't installed, so fixed
>     that too.
> 
> 
> PFA a rebase on top of the just committed verify-checksums patch. 
> 

This seems OK in terms of handling errors in the worker and passing it
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

 WARNING:  attempts = -1684366952
 WARNING:  attempts = 1010514489
 WARNING:  attempts = -1145390664
 WARNING:  attempts = 1162101570

I guess those are not the droids we're looking for?

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  database "template0" does not allow connections
    HINT:  Allow connections using ALTER DATABASE and try again.
    test=# alter database template0 allow_connections = true;
    ALTER DATABASE
    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: already running
    test=# select pg_disable_data_checksums();
     pg_disable_data_checksums
    ---------------------------

    (1 row)

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: has been cancelled

At which point the only thing you can do is restarting the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Tue, Apr 03, 2018 at 02:05:04PM +0200, Magnus Hagander wrote:
> PFA a rebase on top of the just committed verify-checksums patch.

For the record, I am on vacation this week and won't be able to do
further in-depth review or testing of this patch before the end of the
commitfest, sorry.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
>
>     On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
>     <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
>         >     >     And if you try this with a temporary table (not
>         hidden in transaction,
>         >     >     so the bgworker can see it), the worker will fail
>         with this:
>         >     >
>         >     >       ERROR:  cannot access temporary tables of other
>         sessions
>         >     >
>         >     >     But of course, this is just another way how to crash
>         without updating
>         >     >     the result for the launcher, so checksums may end up
>         being enabled
>         >     >     anyway.
>         >     >
>         >     >
>         >     > Yeah, there will be plenty of side-effect issues from that
>         >     > crash-with-wrong-status case. Fixing that will at least
>         make things
>         >     > safer -- in that checksums won't be enabled when not put
>         on all pages. 
>         >     >
>         >
>         >     Sure, the outcome with checksums enabled incorrectly is a
>         consequence of
>         >     bogus status, and fixing that will prevent that. But that
>         wasn't my main
>         >     point here - not articulated very clearly, though.
>         >
>         >     The bigger question is how to handle temporary tables
>         gracefully, so
>         >     that it does not terminate the bgworker like this at all.
>         This might be
>         >     even bigger issue than dropped relations, considering that
>         temporary
>         >     tables are pretty common part of applications (and it also
>         includes
>         >     CREATE/DROP).
>         >
>         >     For some clusters it might mean the online checksum
>         enabling would
>         >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>         >
>         >     Unfortunately, try_relation_open() won't fix this, as the
>         error comes
>         >     from ReadBufferExtended. And it's not a matter of simply
>         creating a
>         >     ReadBuffer variant without that error check, because
>         temporary tables
>         >     use local buffers.
>         >
>         >     I wonder if we could just go and set the checksums anyway,
>         ignoring the
>         >     local buffers. If the other session does some changes,
>         it'll overwrite
>         >     our changes, this time with the correct checksums. But it
>         seems pretty
>         >     dangerous (I mean, what if they're writing stuff while
>         we're updating
>         >     the checksums? Considering the various short-cuts for
>         temporary tables,
>         >     I suspect that would be a boon for race conditions.)
>         >
>         >     Another option would be to do something similar to running
>         transactions,
>         >     i.e. wait until all temporary tables (that we've seen at
>         the beginning)
>         >     disappear. But we're starting to wait on more and more stuff.
>         >
>         >     If we do this, we should clearly log which backends we're
>         waiting for,
>         >     so that the admins can go and interrupt them manually.
>         >
>         >
>         >
>         > Yeah, waiting for all transactions at the beginning is pretty
>         simple.
>         >
>         > Making the worker simply ignore temporary tables would also be
>         easy.
>         >
>         > One of the bigger issues here is temporary tables are
>         *session* scope
>         > and not transaction, so we'd actually need the other session
>         to finish,
>         > not just the transaction.
>         >
>         > I guess what we could do is something like this:
>         >
>         > 1. Don't process temporary tables in the checksumworker, period.
>         > Instead, build a list of any temporary tables that existed
>         when the
>         > worker started in this particular database (basically anything
>         that we
>         > got in our scan). Once we have processed the complete
>         database, keep
>         > re-scanning pg_class until those particular tables are gone
>         (search by oid).
>         >
>         > That means that any temporary tables that are created *while*
>         we are
>         > processing a database are ignored, but they should already be
>         receiving
>         > checksums.
>         >
>         > It definitely leads to a potential issue with long running
>         temp tables.
>         > But as long as we look at the *actual tables* (by oid), we
>         should be
>         > able to handle long-running sessions once they have dropped
>         their temp
>         > tables.
>         >
>         > Does that sound workable to you?
>         >
>
>         Yes, that's pretty much what I meant by 'wait until all
>         temporary tables
>         disappear'. Again, we need to make it easy to determine which
>         OIDs are
>         we waiting for, which sessions may need DBA's attention.
>
>         I don't think it makes sense to log OIDs of the temporary
>         tables. There
>         can be many of them, and in most cases the connection/session is
>         managed
>         by the application, so the only thing you can do is kill the
>         connection.
>
>
>     Yeah, agreed. I think it makes sense to show the *number* of temp
>     tables. That's also a predictable amount of information -- logging
>     all temp tables may as you say lead to an insane amount of data.
>
>     PFA a patch that does this. I've also added some docs for it.
>
>     And I also noticed pg_verify_checksums wasn't installed, so fixed
>     that too.
>
>
> PFA a rebase on top of the just committed verify-checksums patch. 
>

This seems OK in terms of handling errors in the worker and passing it
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

 WARNING:  attempts = -1684366952
 WARNING:  attempts = 1010514489
 WARNING:  attempts = -1145390664
 WARNING:  attempts = 1162101570

I guess those are not the droids we're looking for?

When looking at that, and after a quick discussion, we decided it's better to completely remove the retry logic. As you mentioned in an earlier mail, we had all this logic to retry databases (where failure is unlikely) but not relations (where it is likely). The attached patch simplifies it to only detect the "database was dropped" case (which is fine), and to treat every other kind of failure as permanent, simply not turning on checksums in those cases.



Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed. (The whole memset-to-zero)


    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  database "template0" does not allow connections
    HINT:  Allow connections using ALTER DATABASE and try again.
    test=# alter database template0 allow_connections = true;
    ALTER DATABASE
    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: already running
    test=# select pg_disable_data_checksums();
     pg_disable_data_checksums
    ---------------------------

    (1 row)

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: has been cancelled

Turns out that wasn't the problem. The problem was that we *set* it before erroring out with the "does not allow connections" message, but never cleared it. In that case, it would be listed as launcher-is-running even though the launcher was never started.

Basically the check for datallowconn was put in the wrong place. That check should go away completely once we merge (because we should also merge the part that allows us to bypass it), but for now I have moved the check to the correct place.


At which point the only thing you can do is restarting the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Not at all.



Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

Thanks. This is included in the attached update, along with the above fixes and some other small touches from Daniel. 



--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 4/5/18 11:07 AM, Magnus Hagander wrote:
> 
> 
> On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> ...
>
>     It however still fails to initialize the attempts field after allocating
>     the db entry in BuildDatabaseList, so if you try running with
>     -DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
> 
>      WARNING:  attempts = -1684366952
>      WARNING:  attempts = 1010514489
>      WARNING:  attempts = -1145390664
>      WARNING:  attempts = 1162101570
> 
>     I guess those are not the droids we're looking for?
> 
> 
> When looking at that and after a quick discussion, we just decided it's
> better to completely remove the retry logic. As you mentioned in some
> earlier mail, we had all this logic to retry databases (unlikely) but
> not relations (likely). Attached patch simplifies it to only detect the
> "database was dropped" case (which is fine), and consider every other
> kind of failure a permanent one and just not turn on checksums in those
> cases.
> 

OK, works for me.

> 
> 
>     Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
>     initialized. I think it only ever gets set in launcher_exit(), but that
>     does not seem sufficient. I suspect it's the reason for this behavior:
> 
> 
> It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
> (The whole memset-to-zero)
> 

OK, seems fine now.

> 
>         test=# select pg_enable_data_checksums(10, 10);
>         ERROR:  database "template0" does not allow connections
>         HINT:  Allow connections using ALTER DATABASE and try again.
>         test=# alter database template0 allow_connections = true;
>         ALTER DATABASE
>         test=# select pg_enable_data_checksums(10, 10);
>         ERROR:  could not start checksumhelper: already running
>         test=# select pg_disable_data_checksums();
>          pg_disable_data_checksums
>         ---------------------------
> 
>         (1 row)
> 
>         test=# select pg_enable_data_checksums(10, 10);
>         ERROR:  could not start checksumhelper: has been cancelled
> 
> 
> Turns out that wasn't the problem. The problem was that we *set* it
> before erroring out with the "does not allow connections", but never
> cleared it. In that case, it would be listed as launcher-is-running even
> though the launcher was never started.
> 
> Basically the check for datallowconn was put in the wrong place. That
> check should go away completely once we merge (because we should also
> merge the part that allows us to bypass it), but for now I have moved
> the check to the correct place.
> 

Ah, OK. I was just guessing.

> 
> 
>     Attached is a diff with a couple of minor comment tweaks, and correct
>     initialization of the attempts field.
> 
> 
> Thanks. This is included in the attached update, along with the above
> fixes and some other small touches from Daniel. 
> 

This patch version seems fine to me. I'm inclined to mark it RFC.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andrey Borodin
Date:

> On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> This patch version seems fine to me. I'm inclined to mark it RFC.
+1
The patch works fine for me. I've tried different combinations of backend cancelation, and the only suspicious thing I found is that you can start multiple workers by cancelling the launcher and not cancelling the worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for an existing launcher for a reason; maybe it should check for a worker too?

I think it's worth capitalizing WAL in "re-write the page to wal".

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> This patch version seems fine to me. I'm inclined to mark it RFC.
+1
The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for worker too?

I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think (whereas duplicate launchers could cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

I think it worth to capitalize WAL in "re-write the page to wal".

In the comment, right? Yeah, easy fix.

--

Re: Online enabling of checksums

From
Andrey Borodin
Date:

> On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I
found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic
behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for
worker too?
>
> I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think
(whereas duplicate launchers could cause interesting things to happen).
>
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in a leg then by calling pg_disable_data_checksums(), but this is extremely stupid for
a user.

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for worker too?
>
> I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think (whereas duplicate launchers could cause interesting things to happen).
>
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in a leg then by calling pg_disable_data_checksums(), but this is extremely stupid for a user.


Ah, I didn't consider query cancel. I'm not sure how much we should actually care about it, but it's easy enough to trap that signal and just do a clean shutdown on it, so I've done that.

PFA a patch that does that, and also rebased over the datallowconn patch just landed (which also removes some docs).
 


--
Attachments

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 7:30 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for worker too?
>
> I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think (whereas duplicate launchers could cause interesting things to happen).
>
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in a leg then by calling pg_disable_data_checksums(), but this is extremely stupid for a user


Ah, didn't consider query cancel. I'm not sure how much  we should actually care about it, but it's easy enough to trap that signal and just do a clean shutdown on it, so I've done that.

PFA a patch that does that, and also rebased over the datallowconn patch just landed (which also removes some docs).
 

I have now pushed this latest version with some minor text adjustments and a catversion bump.

Thanks for all the reviews! 


--

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.
> 
> Thanks for all the reviews!

I want to be on the record that I think merging a nontrival feature that
got submitted 2018-02-21, just before the start of the last last CF, is
an abuse of process, and not cool.  We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

- Andres


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments and
> > a catversion bump.
> > 
> > Thanks for all the reviews!
> 
> I want to be on the record that I think merging a nontrival feature that
> got submitted 2018-02-21, just before the start of the last last CF, is
> an abuse of process, and not cool.  We've other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
> 
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

And even worse, without even announcing an intent to commit and giving
people a chance to object.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments and
> > a catversion bump.
> >
> > Thanks for all the reviews!
>
> I want to be on the record that I think merging a nontrival feature that
> got submitted 2018-02-21, just before the start of the last last CF, is
> an abuse of process, and not cool.  We've other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
>
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

I can understand those arguments, and if that's the view of others as well, I can of course revert that.


And even worse, without even announcing an intent to commit and giving
people a chance to object.

At least this patch was posted on the lists before commit, unlike many others from many different people. And AFAIK there has never been such a rule.

--

Re: Online enabling of checksums

From
Andres Freund
Date:

On April 5, 2018 1:20:52 PM PDT, Magnus Hagander <magnus@hagander.net> wrote:
>On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de>
>wrote:
>
>And even worse, without even announcing an intent to commit and giving
>> people a chance to object.
>>
>
>At least this patch was posted on the lists before commit, unlike many
>others from many different people. And AFAIK there has never been such
>a
>rule.

The more debatable a decision is, the more important it IMO becomes to give people a chance to object. I don't think
there needs to be a hard rule to always announce an intent to commit.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 04/05/2018 01:12 PM, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
>> I have now pushed this latest version with some minor text adjustments and
>> a catversion bump.
>>
>> Thanks for all the reviews!
> I want to be on the record that I think merging a nontrivial feature that
> got submitted 2018-02-21, just before the start of the last CF, is
> an abuse of process, and not cool.  We have other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
>
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

Perhaps I am missing something but there has been a lot of public 
discussion on this feature for the last 7 weeks, in which you barely 
participated. I certainly understand wanting some notice before commit 
but there has been lots of discussion, multiple people publicly 
commenting on the patch and Magnus has been very receptive to all 
feedback (that I have seen). Perhaps we are being a bit sensitive because 
of another patch that is actually ramrodding the process and we need to 
take a step back?

Thanks,

JD


>
> - Andres
>

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-05 13:51:41 -0700, Joshua D. Drake wrote:
> On 04/05/2018 01:12 PM, Andres Freund wrote:
> > I want to be on the record that I think merging a nontrivial feature that
> > got submitted 2018-02-21, just before the start of the last CF, is
> > an abuse of process, and not cool.  We have other people working hard to
> > follow the process, and circumventing it like this just signals to
> > people trying to follow the rules that they're fools.
> > 
> > Merging ~2kloc patches like that is going to cause pain. And even if
> > not, it causes procedural damage.
> 
> Perhaps I am missing something but there has been a lot of public discussion
> on this feature for the last 7 weeks of which you barely participated.

I've commented weeks ago about my doubts, and Robert concurred:
http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com


> I certainly understand wanting some notice before commit but there has
> been lots of discussion, multiple people publicly commenting on the
> patch and Magnus has been very receptive to all feedback (that I have
> seen).

It's perfectly reasonable to continue review / improvement cycles of a
patch, even if it's not going to get in the current release. What does
that have to do with what I am concerned about?


> Perhaps we are being a sensitive because of another patch that is
> actually ramrodding the process and we need to take a step back?

No. See link above.

Please don't use "we" in this childishness-implying fashion.

- Andres


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 04/05/2018 02:01 PM, Andres Freund wrote:
>
> No. See link above.
>
> Please don't use "we" in this childishness-implying fashion.

The term "we" was used on purpose because I too was annoyed and I was 
trying to be objective, non-combative and productive.

JD





Re: Online enabling of checksums

From
Peter Geoghegan
Date:
On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:
>>At least this patch was posted on the lists before commit, unlike many
>>others from many different people. And AFAIK there has never been such
>>a
>>rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing from its spirit.

> The more debatable a decision is, the more important it IMO becomes to give people a chance to object. Don't think
> there needs to be a hard rule to always announce an intent to commit.

+1

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

-- 
Peter Geoghegan


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 11:09 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:
>>At least this patch was posted on the lists before commit, unlike many
>>others from many different people. And AFAIK there has never been such
>>a
>>rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing from its spirit.

> The more debatable a decision is, the more important it IMO becomes to give people a chance to object. Don't think there needs to be a hard rule to always announce an intent to commit.

+1

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

I would argue that this is a pretty isolated patch. A large majority of the code is completely isolated from the rest. And I would argue that this reduces the risk of the patch substantially.

(And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

--

Re: Online enabling of checksums

From
Peter Geoghegan
Date:
On Thu, Apr 5, 2018 at 1:51 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> Perhaps I am missing something but there has been a lot of public discussion
> on this feature for the last 7 weeks, in which you barely participated. I
> certainly understand wanting some notice before commit but there has been
> lots of discussion, multiple people publicly commenting on the patch and
> Magnus has been very receptive to all feedback (that I have seen). Perhaps
> we are being a bit sensitive because of another patch that is actually
> ramrodding the process and we need to take a step back?

I wish it was just one patch. I can think of several.



Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them.  The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

Just plonking a barrier into DataChecksumsNeedWrite() etc is a
possibility, but it's also not free...

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them.  The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

So just to be clear about the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress" flag and continues to write without checksums?
 
That seems like quite a long time to me -- is that really a problem? I'm guessing you're seeing a shorter path between the two that I can't see right now (I'll blame the late evening...)?

--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
> > Is there any sort of locking that guarantees that worker processes see
> > an up2date value of
> > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > there's not. So you can afaict end up with checksums being computed by
> > the worker, but concurrent writes missing them.  The window is going to
> > be at most one missed checksum per process (as the unlocking of the page
> > is a barrier) and is probably not easy to hit, but that's dangerous
> > enough.
> >
> 
> So just to be clear about the case you're worried about. It's basically:
> Session #1 - sets checksums to inprogress
> Session #1 - starts dynamic background worker ("launcher")
> Launcher reads and enumerates pg_database
> Launcher starts worker in first database
> Worker processes first block of data in database
> And at this point, Session #2 has still not seen the "checksums inprogress"
> flag and continues to write without checksums?

Yes.  I think there are some variations of that, but yes, that's pretty
much it.


> That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.


> I'm guessing you're seeing a shorter path between the two that I can't
> see right now (I'll blame the late evening...)?

I don't think it matters terribly much how long that path is.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:


Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

old cluster uses data checksums but the new one does not
Failure, exiting


This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

            regards, tom lane


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 11:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:


Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

old cluster uses data checksums but the new one does not
Failure, exiting


This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

It's not supposed to.

Without checking into it (just about off to bed now), one guess is that it's actually a leftover from a previous stage -- what state is the cluster actually in when it does that upgrade? Because the specific checksums tests do leave their cluster with checksums on, which I think would perhaps be the outcome of the testmodules-install-check-C test. The actual definitions of those tests are somewhere in the buildfarm client code, right?

In that case, the easy fix is probably to have the checksums tests actually turn off the checksums again when they're done.
 
--

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 05 Apr 2018, at 23:13, Magnus Hagander <magnus@hagander.net> wrote:

> (And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag() is
defined to return true if it could set the flag.  When running without atomics, don’t we
need to do something like the below diff to make these APIs match?

--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-       return TAS((slock_t *) &ptr->sema);
+       return TAS((slock_t *) &ptr->sema) == 0;
 }

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

AFAICT there are no callers of this other than the online checksum code, and
this is not executed by the tests when running without atomics, which could
explain why nothing else is broken.

Before continuing the debugging, does this theory hold any water?  This isn’t
code I’m deeply familiar with so would appreciate any pointers.
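For illustration, the inverted-return issue can be reduced to a tiny self-contained C sketch (the TAS() and slock_t here are single-threaded stand-ins, not the real s_lock.h/atomics.c definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the spinlock fallback: like s_lock.h's TAS(),
 * it returns zero when the lock was successfully acquired. */
typedef int slock_t;

static int
TAS(slock_t *lock)
{
    int old = *lock;            /* single-threaded stand-in, no real atomicity */
    *lock = 1;
    return old;                 /* 0 => acquired, nonzero => already held */
}

/* pg_atomic_test_set_flag() is specified to return true when the flag was
 * successfully set, so the raw TAS() result must be inverted, as in the
 * diff above. */
static bool
test_set_flag_fixed(slock_t *flag)
{
    return TAS(flag) == 0;
}

/* The pre-patch variant returned TAS() directly, giving callers the
 * opposite answer. */
static bool
test_set_flag_buggy(slock_t *flag)
{
    return (bool) TAS(flag);
}
```

With the fix, the first caller on a clear flag gets true and every later caller gets false; the buggy version reports exactly the reverse.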

cheers ./daniel

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
> > Is there any sort of locking that guarantees that worker processes see
> > an up2date value of
> > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > there's not. So you can afaict end up with checksums being computed by
> > the worker, but concurrent writes missing them.  The window is going to
> > be at most one missed checksum per process (as the unlocking of the page
> > is a barrier) and is probably not easy to hit, but that's dangerous
> > enough.
> >
>
> So just to be clear about the case you're worried about. It's basically:
> Session #1 - sets checksums to inprogress
> Session #1 - starts dynamic background worker ("launcher")
> Launcher reads and enumerates pg_database
> Launcher starts worker in first database
> Worker processes first block of data in database
> And at this point, Session #2 has still not seen the "checksums inprogress"
> flag and continues to write without checksums?

Yes.  I think there are some variations of that, but yes, that's pretty
much it.


> That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.

Oh, that's not my intention either -- I just wanted to make sure I was thinking about the same issue you were.

Since you know a lot more about that type of interlocks than I do :) We already wait for all running transactions to finish before we start doing anything. Obviously transactions != buffer writes (and we have things like the checkpointer/bgwriter to consider). Is there something else that we could safely just *wait* for? I have no problem whatsoever if this is a long wait (given the total time). I mean to the point of "what if we just stick a sleep(10) in there" level waiting.

Or can that somehow be cleanly solved using some of the new atomic operators? Or is that likely to cause the same kind of overhead as throwing a barrier in there?


-- 

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 04/06/2018 11:25 AM, Magnus Hagander wrote:
> 
> 
> On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de
> <mailto:andres@anarazel.de>> wrote:
> 
>     Hi,
> 
>     On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
>     > On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:
>     > > Is there any sort of locking that guarantees that worker processes see
>     > > an up2date value of
>     > > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
>     > > there's not. So you can afaict end up with checksums being computed by
>     > > the worker, but concurrent writes missing them.  The window is going to
>     > > be at most one missed checksum per process (as the unlocking of the page
>     > > is a barrier) and is probably not easy to hit, but that's dangerous
>     > > enough.
>     > >
>     >
>     > So just to be clear of the case you're worried about. It's basically:
>     > Session #1 - sets checksums to inprogress
>     > Session #1 - starts dynamic background worker ("launcher")
>     > Launcher reads and enumerates pg_database
>     > Launcher starts worker in first database
>     > Worker processes first block of data in database
>     > And at this point, Session #2 has still not seen the "checksums inprogress"
>     > flag and continues to write without checksums?
> 
>     Yes.  I think there are some variations of that, but yes, that's pretty
>     much it.
> 
> 
>     > That seems like quite a long time to me -- is that really a problem?
> 
>     We don't generally build locking models that are only correct based on
>     likelihood. Especially not without a lengthy comment explaining that
>     analysis.
> 
> 
> Oh, that's not my intention either -- I just wanted to make sure I
> was thinking about the same issue you were.
> 

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

So I'm not sure I see the problem here. But perhaps LockBufHdr is not a
memory barrier?


> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start
> doing anything. Obviously transactions != buffer writes (and we have
> things like the checkpointer/bgwriter to consider). Is there something
> else that we could safely just *wait* for? I have no problem whatsoever
> if this is a long wait (given the total time). I mean to the point of
> "what if we just stick a sleep(10) in there" level waiting.
> 
> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as
> throwing a barrier in there?
> 

Perhaps the easiest thing we could do is walk shared buffers and do
LockBufHdr/UnlockBufHdr, which would guarantee no session is in the process
of writing out a buffer with a possibly stale checksum flag. Of course,
it's a bit brute-force-ish, but it's not that different from the waits
for running transactions and temporary tables.
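A minimal sketch of that brute-force walk (FakeBufferDesc and the fake_* lock functions are simulated stand-ins, not the real BufferDesc/LockBufHdr API):

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 8

/* Hypothetical miniature of a buffer header with its spinlock. */
typedef struct { bool locked; } FakeBufferDesc;

static FakeBufferDesc buffers[NBUFFERS];

/* Stand-ins for LockBufHdr/UnlockBufHdr; the real ones imply memory
 * barriers, which is what makes the walk below meaningful. */
static void
fake_LockBufHdr(FakeBufferDesc *buf)
{
    assert(!buf->locked);
    buf->locked = true;
}

static void
fake_UnlockBufHdr(FakeBufferDesc *buf)
{
    assert(buf->locked);
    buf->locked = false;
}

/* After flipping the checksum flag, acquire and release every buffer
 * header lock once.  Any writer that grabbed a header before the flag
 * change must have released it (and thus passed a barrier) by the time
 * we obtain its lock, so no in-flight write can still be using the
 * stale flag afterwards. */
static void
wait_for_stale_flag_writers(void)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        fake_LockBufHdr(&buffers[i]);
        fake_UnlockBufHdr(&buffers[i]);
    }
}
```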


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 11:25:59 +0200, Magnus Hagander wrote:
> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start doing
> anything. Obviously transactions != buffer writes (and we have things like
> the checkpointer/bgwriter to consider). Is there something else that we
> could safely just *wait* for? I have no problem whatsoever if this is a
> long wait (given the total time). I mean to the point of "what if we just
> stick a sleep(10) in there" level waiting.

I don't think anything just related to "time" is reasonable in any sort
of way. On an overloaded system you can see very long stalls of processes
that have done a lot of work. Locking protocols should be correct, and
that's that.


> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as throwing
> a barrier in there?

Worse.

I wonder if we could introduce "MegaExpensiveRareMemoryBarrier()" that
goes through pgproc and signals every process with a signal that
requires the other side to do an operation implying a memory barrier.

That's actually not hard to do (e.g. every latch operation qualifies);
the problem is that signal delivery isn't synchronous. So you need some
acknowledgement protocol.

I think you could introduce a procsignal message that does a memory
barrier and then sets PGPROC->barrierGeneration to
ProcArrayStruct->barrierGeneration. MegaExpensiveRareMemoryBarrier()
increments ProcArrayStruct->barrierGeneration, signals everyone, and
then waits till every PGPROC->barrierGeneration has surpassed
ProcArrayStruct->barrierGeneration.
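As a rough single-process sketch of that acknowledgement protocol (the barrierGeneration names follow the mail; the inline "signal delivery" and the elided memory barrier are simplifications, not PostgreSQL code):

```c
#include <assert.h>

#define NPROCS 4

/* Hypothetical miniatures of PGPROC and the shared generation counter. */
typedef struct { unsigned barrierGeneration; } FakePGPROC;

static FakePGPROC procs[NPROCS];
static unsigned shared_barrierGeneration;

/* What each backend's procsignal handler would do: execute a memory
 * barrier (elided in this single-threaded sketch), then acknowledge the
 * current generation. */
static void
absorb_barrier(FakePGPROC *proc)
{
    /* pg_memory_barrier(); */
    proc->barrierGeneration = shared_barrierGeneration;
}

/* MegaExpensiveRareMemoryBarrier(): bump the generation, signal every
 * process, then wait until all of them have caught up.  Here the
 * "signal" is delivered inline; in reality delivery is asynchronous,
 * which is exactly why the wait loop is needed. */
static void
MegaExpensiveRareMemoryBarrier(void)
{
    shared_barrierGeneration++;
    for (int i = 0; i < NPROCS; i++)
        absorb_barrier(&procs[i]);      /* stand-in for signalling the proc */
    for (int i = 0; i < NPROCS; i++)    /* stand-in for the wait loop */
        assert(procs[i].barrierGeneration >= shared_barrierGeneration);
}
```

Once the emitter's wait loop completes, every backend has executed a barrier, so a subsequent unlocked read such as DataChecksumsNeedWrite() cannot observe a stale value.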

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
> > Oh, that's not my intention either -- I just wanted to make sure I
> > was thinking about the same issue you were.

> I agree we shouldn't rely on chance here - if we might read a stale
> value, we need to fix that of course.

It's perfectly possible that some side-conditions mitigate this. What
concerns me is that
a) Nobody appears to have raised this issue beforehand, besides an
   unlocked read of a critical variable being a fairly obvious
   issue. This kind of thing needs to be carefully thought about.
b) If there's some "side channel" interlock, it's not documented.

I noticed the issue because of an IM question about the general feature,
and I did a three minute skim and saw the read without a comment.


> I'm not quite sure I fully understand the issue, though. I assume both
> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
> happen the process would need to be already past LockBufHdr when the
> checksum version is updated. In which case it can use a stale version
> when writing the buffer out. Correct?

Yes, they are memory barriers.


> I wonder if that's actually a problem, considering the checksum worker
> will then overwrite all data with correct checksums anyway. So the other
> process would have to overwrite the buffer after checksum worker, at
> which point it'll have to go through LockBufHdr.

Again, I'm not sure if there's some combination of issues that make this
not a problem in practice. I just *asked* if there's something
preventing this from being a problem.

The really problematic case would be if it is possible for some process
to wait long enough, without executing a barrier implying operation,
that it'd try to write out a page that the checksum worker has already
passed over.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 04/06/2018 07:22 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
>>> Oh, that's not my intention either -- I just wanted to make sure I
>>> was thinking about the same issue you were.
> 
>> I agree we shouldn't rely on chance here - if we might read a stale
>> value, we need to fix that of course.
> 
> It's perfectly possible that some side-conditions mitigate this. What
> concerns me is that
> a) Nobody appears to have raised this issue beforehand, besides an
>    unlocked read of a critical variable being a fairly obvious
>    issue. This kind of thing needs to be carefully thought about.
> b) If there's some "side channel" interlock, it's not documented.
> 
> I noticed the issue because of an IM question about the general feature,
> and I did a three minute skim and saw the read without a comment.
> 

All I can say is that I did consider this issue while reviewing the
patch, and I've managed to convince myself it's not an issue (using the
logic that I've just outlined here). Which is why I haven't raised it as
an issue, because I don't think it is.

You're right it might have been mentioned explicitly, of course.

In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
interlock. It's a pretty direct and intentional interlock, I think.

> 
>> I'm not quite sure I fully understand the issue, though. I assume both
>> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
>> happen the process would need to be already past LockBufHdr when the
>> checksum version is updated. In which case it can use a stale version
>> when writing the buffer out. Correct?
> 
> Yes, they are memory barriers.
> 

Phew! ;-)

> 
>> I wonder if that's actually a problem, considering the checksum worker
>> will then overwrite all data with correct checksums anyway. So the other
>> process would have to overwrite the buffer after checksum worker, at
>> which point it'll have to go through LockBufHdr.
> 
> Again, I'm not sure if there's some combination of issues that make this
> not a problem in practice. I just *asked* if there's something
> preventing this from being a problem.
> 
> The really problematic case would be if it is possible for some process
> to wait long enough, without executing a barrier implying operation,
> that it'd try to write out a page that the checksum worker has already
> passed over.
> 

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

regards



Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:
> In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
> interlock. It's a pretty direct and intentional interlock, I think.

I mean it's a side-channel as far as DataChecksumsNeedWrite() is
concerned. You're banking on all callers using a barrier implying
operation around it.


> Sure. But what would that be? I can't think of anything. A process that
> modifies a buffer (or any other piece of shared state) without holding
> some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 04/06/2018 07:46 PM, Andres Freund wrote:
> On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:
>> In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
>> interlock. It's a pretty direct and intentional interlock, I think.
> 
> I mean it's a side-channel as far as DataChecksumsNeedWrite() is 
> concerned. You're banking on all callers using a barrier implying 
> operation around it.

Ah, OK.

> 
>> Sure. But what would that be? I can't think of anything. A process that
>> modifies a buffer (or any other piece of shared state) without holding
>> some sort of lock seems broken by default.
> 
> You can quite possibly already *hold* a lock if it's not an exclusive
> one.
> 

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.


regards



Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
> On 04/06/2018 07:46 PM, Andres Freund wrote:
> >> Sure. But what would that be? I can't think of anything. A process that
> >> modifies a buffer (or any other piece of shared state) without holding
> >> some sort of lock seems broken by default.
> > 
> > You can quite possibly already *hold* a lock if it's not an exclusive
> > one.
> > 
> 
> Sure, but if you're holding the buffer lock when the checksum version is
> changed, then the checksumhelper is obviously not running yet. In which
> case it will update the checksum on the buffer later.

The buffer content lock itself doesn't generally give any such guarantee
afaict, as it's required that the content lock is held in shared mode
during IO. ProcessSingleRelationFork() happens to use exclusive mode
(which could and possibly should be optimized), so that's probably
sufficient from that end though.

I'm mainly disconcerted this isn't well discussed & documented.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Robert Haas
Date:
On Thu, Apr 5, 2018 at 5:01 PM, Andres Freund <andres@anarazel.de> wrote:
> I've commented weeks ago about my doubts, and Robert concurred:
> http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com

Yes, and I expressed some previous reservations as well.  Granted, my
reservations about this patch are less than for MERGE, and granted,
too, what is Magnus supposed to do about a couple of committers
expressing doubts about whether something really ought to be
committed?  Is that an absolute bar?  It wasn't phrased as such, nor
do we really have the authority.  At the same time, those concerns
didn't generate much discussion and, at least in my case, are not
withdrawn merely because time has passed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 14:14:40 -0400, Robert Haas wrote:
> and granted, too, what is Magnus supposed to do about a couple of
> committers expressing doubts about whether something really ought to
> be committed?  Is that an absolute bar?  It wasn't phrased as such,
> nor do we really have the authority.  At the same time, those concerns
> didn't generate much discussion and, at least in my case, are not
> withdrawn merely because time has passed.

Yea, I don't think they're an absolute blocker. But imo more than
sufficient reason to give the list a heads-up of a day or two that the
patch is intended to be committed.

I'd only pointed the message out because JD said something about me not
having participated in the earlier discussion.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 04/06/2018 08:13 PM, Andres Freund wrote:
> On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
>> On 04/06/2018 07:46 PM, Andres Freund wrote:
>>>> Sure. But what would that be? I can't think of anything. A process that
>>>> modifies a buffer (or any other piece of shared state) without holding
>>>> some sort of lock seems broken by default.
>>>
>>> You can quite possibly already *hold* a lock if it's not an exclusive
>>> one.
>>>
>>
>> Sure, but if you're holding the buffer lock when the checksum version is
>> changed, then the checksumhelper is obviously not running yet. In which
>> case it will update the checksum on the buffer later.
> 
> The buffer content lock itself doesn't generally give any such
> guarantee afaict, as it's required that the content lock is held in
> shared mode during IO. ProcessSingleRelationFork() happens to use
> exclusive mode (which could and possibly should be optimized), so
> that's probably sufficient from that end though.
> 

Yes.

> I'm mainly disconcerted this isn't well discussed & documented.
> 

Agreed, no argument here.


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 04/06/2018 08:13 PM, Andres Freund wrote:
> On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
>> On 04/06/2018 07:46 PM, Andres Freund wrote:
>>>> Sure. But what would that be? I can't think of anything. A process that
>>>> modifies a buffer (or any other piece of shared state) without holding
>>>> some sort of lock seems broken by default.
>>>
>>> You can quite possibly already *hold* a lock if it's not an exclusive
>>> one.
>>>
>>
>> Sure, but if you're holding the buffer lock when the checksum version is
>> changed, then the checksumhelper is obviously not running yet. In which
>> case it will update the checksum on the buffer later.
> 
> The buffer content lock itself doesn't generally give any such guarantee
> afaict, as it's required that the content lock is held in shared mode
> during IO. ProcessSingleRelationFork() happens to use exclusive mode
> (which could and possibly should be optimized), so that's probably
> sufficient from that end though.
> 

Oh, I've just realized the phrasing of my previous message was rather
confusing. What I meant to say is this:

  Sure, but the checksum version is changed before the checksumhelper
  launcher/worker is even started. So if you're holding the buffer lock
  at that time, then the buffer is essentially guaranteed to be updated
  by the worker later.

Sorry if it seemed I'm suggesting the buffer lock itself guarantees
something about the worker startup.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Looking into the isolationtester failure on piculet, which builds using
> --disable-atomics, and locust which doesn’t have atomics, the code for
> pg_atomic_test_set_flag seems a bit odd.
> 
> TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
> defined to return True if it could set.  When running without atomics, don’t we
> need to do something like the below diff to make these APIs match?  :
> 
> --- a/src/backend/port/atomics.c
> +++ b/src/backend/port/atomics.c
> @@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
>  bool
>  pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
>  {
> -       return TAS((slock_t *) &ptr->sema);
> +       return TAS((slock_t *) &ptr->sema) == 0;
>  }

Yes, this looks wrong.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Applying this makes the _cancel test pass, moving the failure instead to the
> following _enable test (which matches what coypu and mylodon are seeing).

FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
get the buildfarm green again.

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

The flag informing whether the worker has started is cleared via an
on_shmem_exit() hook:

static void
launcher_exit(int code, Datum arg)
{
    ChecksumHelperShmem->abort = false;
    pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
}

but the wait in the test is done via functions like:

    CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
    DECLARE
        enabled boolean;
    BEGIN
        LOOP
            SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
            IF enabled THEN
                EXIT;
            END IF;
            PERFORM pg_sleep(1);
        END LOOP;
        RETURN enabled;
    END;
    $$ LANGUAGE plpgsql;

    INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');

    CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
    DECLARE
        enabled boolean;
    BEGIN
        PERFORM pg_sleep(1);
        SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
        RETURN enabled;
    END;
    $$ LANGUAGE plpgsql;

which just waits for setting checksums to have finished.  It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting.  Then followup tests fail
because the process is still running.

Also:
    CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
    DECLARE
        counter integer;
    BEGIN
        FOR counter IN 1..30 LOOP
            PERFORM count(a) FROM t1;
            PERFORM pg_sleep(0.2);
        END LOOP;
        RETURN True;
    END;
    $$ LANGUAGE plpgsql;
}

really?  Let's just force the test to take at least 6s purely from
sleeping?

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 14:33:48 -0700, Andres Freund wrote:
> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> > Looking into the isolationtester failure on piculet, which builds using
> > --disable-atomics, and locust which doesn’t have atomics, the code for
> > pg_atomic_test_set_flag seems a bit odd.
> > 
> > TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
> > defined to return True if it could set.  When running without atomics, don’t we
> > need to do something like the below diff to make these APIs match?  :
> > 
> > --- a/src/backend/port/atomics.c
> > +++ b/src/backend/port/atomics.c
> > @@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
> >  bool
> >  pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> >  {
> > -       return TAS((slock_t *) &ptr->sema);
> > +       return TAS((slock_t *) &ptr->sema) == 0;
> >  }
> 
> Yes, this looks wrong.

And the reason the tests fail reliably after is because the locking
model around ChecksumHelperShmem->launcher_started arguably is broken:

    /* If the launcher isn't started, there is nothing to shut down */
    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
        return;

This uses a non-concurrency safe primitive. Which then spuriously
triggers:

#define PG_HAVE_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    /*
     * Can't do this efficiently in the semaphore based implementation - we'd
     * have to try to acquire the semaphore - so always return true. That's
     * correct, because this is only an unlocked test anyway. Do this in the
     * header so compilers can optimize the test away.
     */
    return true;
}

No one can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Robert Haas
Date:
On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:
> no one can entirely quibble with the rationale that this is ok (I'll
> post a patch cleaning up the atomics simulation of flags in a bit), but
> this is certainly not a correct locking strategy.

I think we have enough evidence at this point to conclude that this
patch, along with MERGE, should be reverted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 07 Apr 2018, at 00:23, Andres Freund <andres@anarazel.de> wrote:

> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
>> Applying this makes the _cancel test pass, moving the failure instead to the
>> following _enable test (which matches what coypu and mylodon are seeing).
>
> FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
> get the buildfarm green again.

Sorry about that, I’m a bit slow due to various $family situations at the
moment.

> I'm fairly certain that the bug here is a simple race condition in the
> test (not the main code!):

I wonder if it may perhaps be a case of both?

> It's
> exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> make sure that a process has finished exiting.  Then followup tests fail
> because the process is still running

I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination.  Checksumhelper cancellation uses pg_atomic_unlocked_test_flag() to
test if it’s running when asked to abort, something which seems unsafe to do in
semaphore simulation as it always returns true.  If I for debugging synthesize
a flag test with testset/clear, the tests pass green (with the upstream patch
for pg_atomic_test_set_flag_impl() applied).  Cancelling with semaphore sim is
thus doomed to never work IIUC.  Or it’s a red herring.

As Magnus mentioned upstream, rewriting to not use an atomic flag is probably
the best option, once the current failure is understood.

> really?  Let's just force the test take at least 6s purely from
> sleeping?

The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester?  I have failed to find a good way to repeat a step like
that, but I might be missing something.

cheers ./daniel

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> > I'm fairly certain that the bug here is a simple race condition in the
> > test (not the main code!):
> 
> I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.


> > It's
> > exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> > make sure that a process has finished exiting.  Then followup tests fail
> > because the process is still running
> 
> I can reproduce the error when building with --disable-atomics, and it seems
> that all the failing members either do that, lack atomic.h, lack atomics or a
> combination.

atomics.h isn't important, it's just relevant for solaris (IIRC).  Only
one of the failing ones lacks atomics afaict. See

On 2018-04-06 14:19:09 -0700, Andres Freund wrote:
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
> ? Those all don't seem fall under that? Having proper atomics?

So there it's the timing. Note that they didn't always fail either.


> > really?  Let's just force the test take at least 6s purely from
> > sleeping?
> 
> The test needs continuous reading in a session to try and trigger any bugs in
> read access on the cluster during checksumming, is there a good way to do that
> in the isolationtester?  I have failed to find a good way to repeat a step like
> that, but I might be missing something.

IDK, I know this isn't right.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:
> 
> On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
>>> I'm fairly certain that the bug here is a simple race condition in the
>>> test (not the main code!):
>> 
>> I wonder if it may perhaps be a case of both?
> 
> See my other message about the atomic fallback bit.

Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
suspicion.

cheers ./daniel


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:
> > no one can entirely quibble with the rationale that this is ok (I'll
> > post a patch cleaning up the atomics simulation of flags in a bit), but
> > this is certainly not a correct locking strategy.
>
> I think we have enough evidence at this point to conclude that this
> patch, along with MERGE, should be reverted.

I'm not sure that I see some issues around getting the locking correct
when starting/stopping the process is really evidence of a major problem
with the patch- yes, it obviously needs to be fixed and it would have
been unfortunate if we hadn't caught it, but a good bit of effort
appears to have been taken to ensure that exactly this is tested (which
is in part why the buildfarm is failing) and this evidently found an
existing bug, which is hardly this patch's fault.

In short, I don't agree (yet..) that this needs reverting.

I'm quite sure that bringing up MERGE in this thread and saying it needs
to be reverted without even having the committer of that feature on the
CC list isn't terribly useful and conflates two otherwise unrelated
patches and efforts.  Let's try to use the threads the way they're
intended and keep our responses to each on their respective threads.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:
> > > no one can entirely quibble with the rationale that this is ok (I'll
> > > post a patch cleaning up the atomics simulation of flags in a bit), but
> > > this is certainly not a correct locking strategy.
> > 
> > I think we have enough evidence at this point to conclude that this
> > patch, along with MERGE, should be reverted.
> 
> I'm not sure that I see some issues around getting the locking correct
> when starting/stopping the process is really evidence of a major problem
> with the patch-

Note that there've been several other things mentioned in the
thread. I'll add some more in a bit.


> yes, it obviously needs to be fixed and it would have been unfortuante
> if we hadn't caught it, but a good bit of effort appears to have been
> taken to ensure that exactly this is tested (which is in part why the
> buildfarm is failing) and this evidently found an existing bug, which
> is hardly this patch's fault.

THAT is the problem. It costs people that haven't been involved in the
feature time. I've friggin started debugging this because nobody else
could be bothered. Even though I'd planned to spend that time on other
patches that have been submitted far ahead in time.


> I'm quite sure that bringing up MERGE in this thread and saying it needs
> to be reverted without even having the committer of that feature on the
> CC list isn't terribly useful and conflates two otherwise unrelated
> patches and efforts.

Robert also mentioned it on the other thread, so... And no, they're not
unrelated matters, in that it's pushing half-baked stuff.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-07 01:27:13 +0200, Daniel Gustafsson wrote:
> > On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:
> > 
> > On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> >>> I'm fairly certain that the bug here is a simple race condition in the
> >>> test (not the main code!):
> >> 
> >> I wonder if it may perhaps be a case of both?
> > 
> > See my other message about the atomic fallback bit.
> 
> Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
> suspicion.

But note it fails because the code using it is WRONG. There's a reason
there's "unlocked" in the name. But even leaving that aside, it would
probably *still* be wrong if it were locked.

It seems *extremely* dubious that we'll allow to re-enable the checksums
while a worker is still doing stuff for the old cycle in the
background.  Consider what happens if the checksum helper is currently
doing RequestCheckpoint() (something that can certainly take a *LONG*
while). Another process disables checksums. Pages get written out
without checksums. Yet another process re-enables checksums. Helper
process does SetDataChecksumsOn(). Which succeeds because

    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
    {
        LWLockRelease(ControlFileLock);
        elog(ERROR, "Checksums not in inprogress mode");
    }

succeeds.  Boom. Cluster with partially set checksums but marked as
valid.


Greetings,

Andres Freund


Re: Online enabling of checksums

From
Stephen Frost
Date:
Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:
> > I'm quite sure that bringing up MERGE in this thread and saying it needs
> > to be reverted without even having the committer of that feature on the
> > CC list isn't terribly useful and conflates two otherwise unrelated
> > patches and efforts.
>
> Robert also mentioned it on the other thread, so... And no, they're not
> unrelated matters, in that it's pushing half baked stuff.

Apparently I've missed where he specifically called for it to be
reverted then, which is fine, and my apologies for missing it amongst
the depth of that particular thread.  I do think that specifically
asking for it to be reverted is distinct from expressing concerns about
it.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Andres Freund
Date:
Here's a pass through the patch:
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
         Assert(RedoRecPtr < Insert->RedoRecPtr);
         RedoRecPtr = Insert->RedoRecPtr;
     }
-    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());

     if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)

Why does this need an in-progress specific addition? Given that we
unconditionally log FPWs for all pages, and

#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)

it's not clear what this achieves?  At the very least needs a comment.


@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
     Assert(ControlFile != NULL);
     return (ControlFile->data_checksum_version > 0);
 }

+bool
+DataChecksumsNeedVerify(void)
+{
+    Assert(ControlFile != NULL);
+
+    /*
+     * Only verify checksums if they are fully enabled in the cluster. In
+     * inprogress state they are only updated, not verified.
+     */
+    return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+    Assert(ControlFile != NULL);
+    return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}

As previously mentioned, the locking model around this is unclear.  It's
probably fine due to to surrounding memory barriers, but that needs to
be very very explicitly documented.


+void
+SetDataChecksumsOn(void)
+{
+    Assert(ControlFile != NULL);
+
+    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+    {
+        LWLockRelease(ControlFileLock);
+        elog(ERROR, "Checksums not in inprogress mode");
+    }
+
+    ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+    UpdateControlFile();
+    LWLockRelease(ControlFileLock);
+
+    XlogChecksums(PG_DATA_CHECKSUM_VERSION);


As I've explained in
https://www.postgresql.org/message-id/20180406235126.d4sg4dtgicdpucnj@alap3.anarazel.de
this appears to be unsafe.  There's no guarantee that the
ControlFile->data_checksum_version hasn't intermittently set to 0.


@@ -7788,6 +7863,16 @@ StartupXLOG(void)
      */
     CompleteCommitTsInitialization();

+    /*
+     * If we reach this point with checksums in inprogress state, we notify
+     * the user that they need to manually restart the process to enable
+     * checksums.
+     */
+    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+        ereport(WARNING,
+                (errmsg("checksum state is \"inprogress\" with no worker"),
+                 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or
pg_enable_data_checksums() functions.")));
 
+

Hm, so one manually has to take action here. Any reason we can't just
re-start the worker? Also, this'll be issued on standbys etc too, that
seems misleading?

+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+    /*
+     * If we don't need to write new checksums, then clearly they are already
+     * disabled.
+     */
+    if (!DataChecksumsNeedWrite())
+        ereport(ERROR,
+                (errmsg("data checksums already disabled")));
+
+    ShutdownChecksumHelperIfRunning();
+
+    SetDataChecksumsOff();
+
+    PG_RETURN_VOID();

Unsafe, see SetDataChecksumsOn comment above.

Shouldn't this be named with a pg_? We normally do that for SQL callable
functions, no? See the preceding functions.

This function is marked PROPARALLEL_SAFE. That can't be right?

Also, shouldn't this refuse to work if called in recovery mode? Erroring
out with
ERROR:  XX000: cannot make new WAL entries during recovery
doesn't seem right.


+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)

This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
on it.


+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)

entry point sounds a bit like it's the bgw invoked routine...


+/*
+ * ShutdownChecksumHelperIfRunning
+ *        Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+    /* If the launcher isn't started, there is nothing to shut down */
+    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+        return;

Using an unlocked op here without docs seems wrong.  See also mail
referenced above.


+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
...
+    BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);

Shouldn't this have a comment explaining that this is safe because all
pages that concurrently get added to the end are going to be checksummed
by the extender?


+        /* Need to get an exclusive lock before we can flag as dirty */
+        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

Hm. So we'll need exclusive locks on all pages in the database. On busy
inner tables that's going to be painful. It shouldn't be too hard to
reduce this to share locks.


+        START_CRIT_SECTION();
+        MarkBufferDirty(buf);
+        log_newpage_buffer(buf, false);

Hm. So we always log buffers as non-standard ones? That's going to cause
quite the increase in FPW space.


+    elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+         relationId, (aborted ? "aborted" : "finished"));

Hm. Wouldn't it be advisable to include actual relation names?  It's
pretty annoying to keep track this way.


+void
+ChecksumHelperLauncherMain(Datum arg)
....

+    /*
+     * Create a database list.  We don't need to concern ourselves with
+     * rebuilding this list during runtime since any database created after
+     * this process started will be running with checksums turned on from the
+     * start.
+     */

Why is this true? What if somebody runs CREATE DATABASE while the
launcher / worker are processing a different database? It'll copy the
template database on the filesystem level, and it very well might not
yet have checksums set?  Afaict the second time we go through this list
that's not cought.


+        if (processing == SUCCESSFUL)
+        {
+            pfree(db->dbname);
+            pfree(db);
...

Why bother with that and other deallocations here and in other places?
Launcher's going to quit anyway...


+        else if (processing == FAILED)
+        {
+            /*
+             * Put failed databases on the remaining list.
+             */
+            remaining = lappend(remaining, db);

Uh. So we continue checksumming the entire cluster if one database
failed? Given there's no restart capability at all, that seems hard to
defend?


+    CurrentDatabases = BuildDatabaseList();
+
+    foreach(lc, remaining)
+    {
+        ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+        bool        found = false;
+
+        foreach(lc2, CurrentDatabases)
+        {
+            ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+            if (db->dboid == db2->dboid)
+            {
+                found = true;

This is an O(N^2) comparison logic? There's clusters with tens of
thousands of databases... The comparison costs are low, but still.


+        /*
+         * Foreign tables have by definition no local storage that can be
+         * checksummed, so skip.
+         */
+        if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+            continue;
+

That strikes me as a dangerous form of test. Shouldn't we instead check
whether a relfilenode exists? I'll note that this test currently
includes plain views.  It's just the smgrexist() test that makes it
work.


- Andres


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> +    /*
> +     * Create a database list.  We don't need to concern ourselves with
> +     * rebuilding this list during runtime since any database created after
> +     * this process started will be running with checksums turned on from the
> +     * start.
> +     */
> 
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set?  Afaict the second time we go through this list
> that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0



further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects isolationtests to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.


The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set.  If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.


If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.


Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> +     /*
> +      * Create a database list.  We don't need to concern ourselves with
> +      * rebuilding this list during runtime since any database created after
> +      * this process started will be running with checksums turned on from the
> +      * start.
> +      */
>
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set?  Afaict the second time we go through this list
> that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0



further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.


The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set.  If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.


If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

While I do think that it's still definitely fixable in time for 11, I won't argue for it. Will revert.

Note however that I'm sans-laptop until Sunday, so I will revert it then or possibly Monday. 

--

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Sat, Apr 07, 2018 at 08:57:03AM +0200, Magnus Hagander wrote:
> On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:
> > If it's not obvious: This isn't ready, should be reverted, cleaned up,
> > and re-submitted for v12.
> 
> While I do think that it's still definitely fixable in time for 11, I won't
> argue for it. Will revert.

:(

Can the pg_verify_checksums command be kept at least, please? 

AFAICT this one is not contentious, the code is isolated, it's really
useful, orthogonal to online checksum activation and arguably could've
been committed as a separate patch anyway.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-07 10:14:49 +0200, Michael Banck wrote:
> Can the pg_verify_checksums command be kept at least, please? 
> 
> AFAICT this one is not contentious, the code is isolated, it's really
> useful, orthogonal to online checksum activation and arguably could've
> been committed as a separate patch anyway.

I've not looked at it in any meaningful amount of detail, but it does
seem a lot lower risk from here.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactivate the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactive the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for the time being. 

--

Re: Online enabling of checksums

From
Robert Haas
Date:
On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund <andres@anarazel.de> wrote:
> This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query.  It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in).  It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Tue, Apr 10, 2018 at 6:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund <andres@anarazel.de> wrote:
> This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query.  It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in).  It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.


That's a good summary, thanks!

Just to be clear in this case though -- the function itself doesn't write out *anything*. It only starts a background worker that later does the writing. The background worker itself is not parallelized, so the risk in this particular use case would be that we ended up starting multiple workers (or just failed), I think.

But the summary is very good to have regardless! :) 

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Mon, Apr 9, 2018 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactive the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for the time being. 


PFA an updated version of the patch for the next CF. We believe this one takes care of all the things pointed out so far. 

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by simply requiring a restart of PostgreSQL to initiate the conversion background worker. That is definitely going to guarantee a memory barrier. It's certainly not ideal, but restarting the cluster is still a *lot* better than having to do the entire conversion offline. This can of course be improved upon in the future, but for now we stuck to the safe way.
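As a point of reference, the checksum state machine from the original design proposal (quoted at the top of this thread) is small enough to sketch. The Python below is purely illustrative, not PostgreSQL source; only the state names and allowed transitions come from the proposal.

```python
# Illustrative sketch (not PostgreSQL code) of the checksum state machine
# from the design proposal: off -> inprogress -> on, with disable allowed
# from either "inprogress" or "on".
ALLOWED_TRANSITIONS = {
    ("off", "inprogress"),   # pg_enable_data_checksums()
    ("inprogress", "on"),    # checksumhelper finished every database
    ("inprogress", "off"),   # pg_disable_data_checksums(), or giving up
    ("on", "off"),           # pg_disable_data_checksums()
}

def transition(current, target):
    """Return the new state, refusing any transition the design forbids."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal checksum state change: {current} -> {target}")
    return target
```

Note in particular that there is no direct "off" to "on" edge: the cluster always passes through "inprogress", where checksums are written but not yet verified.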

The concurrent create-database-from-one-that-had-no-checksums is handled by simply looping over the list of databases as long as new databases show up, and waiting for all open transactions to finish at the right moment to ensure there is no concurrently running one as we get the database list.
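The loop described above can be sketched as follows. This is an illustrative Python sketch of the algorithm only; the callback names are hypothetical stand-ins for the actual C code in the launcher.

```python
# Sketch of "loop over the database list until no new databases show up".
# All three callbacks are hypothetical: in the real patch this logic runs
# in the checksumhelper launcher inside the server.
def enable_checksums(list_databases, process_database, wait_for_open_transactions):
    done = set()
    while True:
        # Wait out open transactions so a concurrent CREATE DATABASE from a
        # not-yet-processed template cannot slip past the snapshot of the list.
        wait_for_open_transactions()
        pending = [db for db in list_databases() if db not in done]
        if not pending:
            break                      # list is stable: every database processed
        for db in pending:
            process_database(db)       # verify/rewrite checksums for all relations
            done.add(db)
    return done
```

A database created mid-run simply shows up as `pending` on the next iteration, so the launcher converges once no new databases appear.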

Since the worker is now a regular background worker started from postmaster, the cost-delay parameters had to be made GUCs instead of function arguments.

(And the more or less broken isolation tests are simply removed)

--
Attachments

Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hello

I tried build this patch and got error during make docs
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"

Both new GUCs, checksumhelper_cost_delay and checksumhelper_cost_limit, are mentioned in postgresql.conf with the
special value -1 (-1 to use vacuum_cost_limit), but this value is not mentioned in the docs. I also noticed that the
code and documentation describe different defaults.
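For context, the vacuum-style cost throttling these GUCs borrow can be sketched like this. This is illustrative Python only; the constants and the exact -1 fallback semantics are assumptions modeled on the vacuum cost delay mechanism, not the patch's actual code.

```python
# Toy model of cost-based throttling as used by vacuum and, per this
# thread, the checksumhelper. Constants are illustrative assumptions.
import time

VACUUM_COST_LIMIT = 200    # stand-in for the vacuum_cost_limit default
COST_PER_PAGE = 10         # assumed cost charged per page processed

def effective_limit(checksumhelper_cost_limit):
    # -1 means "fall back to vacuum_cost_limit", per the discussion above
    if checksumhelper_cost_limit == -1:
        return VACUUM_COST_LIMIT
    return checksumhelper_cost_limit

def process_pages(n_pages, cost_limit, cost_delay_ms, sleep=time.sleep):
    """Charge a cost per page; once the accumulated cost reaches the
    limit, nap for cost_delay_ms and reset the balance. Returns the
    number of naps taken. A delay of 0 disables throttling."""
    balance = 0
    naps = 0
    for _ in range(n_pages):
        balance += COST_PER_PAGE
        if cost_delay_ms > 0 and balance >= cost_limit:
            sleep(cost_delay_ms / 1000.0)
            naps += 1
            balance = 0
    return naps
```

The point of the model: the delay bounds the I/O rate of the background worker without affecting correctness, which is why a -1 fallback to the vacuum settings is a plausible default.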
 
Also, I found one "<literal>in progress</literal>" in the pg_enable_data_checksums() description. In other places the
status is called "inprogress" (without a space).
 

>    VacuumPageHit = 0;
>    VacuumPageMiss = 0;
>    VacuumPageDirty = 0;
Hm, why are these settings set to 0 in the checksumhelper process?

> /*
> * Force a checkpoint to get everything out to disk. XXX: this should
> * probably not be an IMMEDIATE checkpoint, but leave it there for now for
> * testing
> */
> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
We must not forget that.

regards, Sergei


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, failed

Hello
As I wrote a few weeks ago, I cannot build the documentation due to these errors:
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"

After removing these xrefs for test purposes, the patch passes check-world.

regards, Sergei

The new status of this patch is: Waiting on Author

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            tested, failed
>
> Hello
> As i wrote few weeks ago i can not build documentation due errors:
>> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-delay"
>> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-limit"
>
> After remove such xref for test purposes patch pass check-world.

Hi!

Thanks for reviewing. I’ve updated the patch to fix the above-mentioned incorrect
linkends, and also addressed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

cheers ./daniel


Attachments

Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hello
Thank you for the update! I only did a quick test for now: the patch applied and built cleanly. But I get a reproducible error during
check-world:

> t/001_standby_checksum.pl .. 6/10 
> #   Failed test 'ensure checksums are enabled on standby'
> #   at t/001_standby_checksum.pl line 84.
> #          got: 'inprogress'
> #     expected: 'on'

In the standby log I found this error:
> 2018-07-25 13:13:05.463 MSK [16544] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000003 has already been removed
 

The checksumhelper obviously writes a lot of WAL. The test passes if I change the restart order to restart the standby first:
> $node_standby_1->restart();
> $node_master->restart();
Or we need to set up a replication slot.

Also, we get this log record after startup:
> data checksums in pending state, starting background worker to enable
even in recovery, although the background worker actually starts only at recovery end (or promote). I think it would be
better to use a DEBUG ereport in the postmaster and LOG in the checksumhelper.
 

regards, Sergei

25.07.2018, 12:35, "Daniel Gustafsson" <daniel@yesql.se>:
>>  On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:
>>
>>  The following review has been posted through the commitfest application:
>>  make installcheck-world: tested, failed
>>  Implements feature: not tested
>>  Spec compliant: not tested
>>  Documentation: tested, failed
>>
>>  Hello
>>  As i wrote few weeks ago i can not build documentation due errors:
>>>  postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-delay"
>>>  postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-limit"
>>
>>  After remove such xref for test purposes patch pass check-world.
>
> Hi!,
>
> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
> linkends as well as fixed the comments you made in a previous review.
>
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.
>
> cheers ./daniel


Re: Online enabling of checksums

From
Robert Haas
Date:
On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander <magnus@hagander.net> wrote:
> PFA an updated version of the patch for the next CF. We believe this one
> takes care of all the things pointed out so far.
>
> For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
> simply requiring a restart of PostgreSQL to initiate the conversion
> background. That is definitely going to guarantee a memory barrier. It's
> certainly not ideal, but restarting the cluster is still a *lot* better than
> having to do the entire conversion offline. This can of course be improved
> upon in the future, but for now we stuck to the safe way.

Honestly, I feel like the bar for this feature ought to be higher than that.

(I half-expect a vigorous discussion of whether I have set the bar for
the features I've developed in the right place or not, but I think
that's not really a fair response.  If somebody thinks some feature I
implemented should've been more baked, they might be right, but that's
not what this thread is about.  I'm giving you MY opinion about THIS
patch, nothing more or less.)

Why can't we do better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Andres Freund
Date:

On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
>On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander <magnus@hagander.net>
>wrote:
>> PFA an updated version of the patch for the next CF. We believe this
>one
>> takes care of all the things pointed out so far.
>>
>> For this version, we "implemented" the
>MegaExpensiveRareMemoryBarrier() by
>> simply requiring a restart of PostgreSQL to initiate the conversion
>> background. That is definitely going to guarantee a memory barrier.
>It's
>> certainly not ideal, but restarting the cluster is still a *lot*
>better than
>> having to do the entire conversion offline. This can of course be
>improved
>> upon in the future, but for now we stuck to the safe way.
>
>Honestly, I feel like the bar for this feature ought to be higher than
>that.
>
>(I half-expect a vigorous discussion of whether I have set the bar for
>the features I've developed in the right place or not, but I think
>that's not really a fair response.  If somebody thinks some feature I
>implemented should've been more baked, they might be right, but that's
>not what this thread is about.  I'm giving you MY opinion about THIS
>patch, nothing more or less.)

+1

>Why can't we do better?

I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can
(sketch: use a procsignal-based acknowledgment protocol, using a 64-bit integer. Useful for plenty of other things).
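A rough illustration of the acknowledgment protocol sketched here: bump a shared 64-bit generation counter, signal every backend, and wait until each backend has absorbed at least that generation. The Python below is only a toy model of the idea; every name is hypothetical, and the real thing would live in shared memory with actual procsignal-based signaling.

```python
# Toy model of a generation-counter acknowledgment barrier. Hypothetical
# names; not an actual PostgreSQL API.
class BarrierState:
    def __init__(self, backend_pids):
        self.generation = 0                          # shared 64-bit counter
        self.acked = {pid: 0 for pid in backend_pids}

def emit_barrier(state, signal_backend):
    """Bump the generation and poke every backend (e.g. via procsignal)."""
    state.generation += 1
    for pid in state.acked:
        signal_backend(pid)
    return state.generation

def backend_absorb(state, pid):
    """Runs in each backend at its next safe interrupt point."""
    state.acked[pid] = state.generation

def barrier_reached(state, generation):
    """True once every backend has absorbed at least `generation`."""
    return all(g >= generation for g in state.acked.values())
```

The emitter would simply wait until `barrier_reached` is true before relying on the new checksum setting, which is what makes the cluster-wide restart unnecessary.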

Andres


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Wed, Jul 25, 2018 at 11:35:31AM +0200, Daniel Gustafsson wrote:
> > On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:
> > 
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, failed
> > Implements feature:       not tested
> > Spec compliant:           not tested
> > Documentation:            tested, failed
> > 
> > Hello
> > As i wrote few weeks ago i can not build documentation due errors:
> >> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-delay"
> >> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-limit"
> > 
> > After remove such xref for test purposes patch pass check-world.
> 
> Hi!,
> 
> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
> linkends as well as fixed the comments you made in a previous review.
> 
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
                                                    ^

Is "helper" the right word?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 07/31/2018 12:45 PM, Bruce Momjian wrote:
>
>> Hi!,
>>
>> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
>> linkends as well as fixed the comments you made in a previous review.
>>
>> The CF-builder-bot is red, but it’s because it’s trying to apply the already
>> committed patch which is in the attached datallowconn thread.
> I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
>                                                      ^
>
> Is "helper" the right word?

Based on other terminology within postgresql.conf, should it be
"checksum_worker_cost_delay"?

JD

>

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Tue, Jul 31, 2018 at 12:52:40PM -0700, Joshua Drake wrote:
> On 07/31/2018 12:45 PM, Bruce Momjian wrote:
> >
> >>Hi!,
> >>
> >>Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
> >>linkends as well as fixed the comments you made in a previous review.
> >>
> >>The CF-builder-bot is red, but it’s because it’s trying to apply the already
> >>committed patch which is in the attached datallowconn thread.
> >I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
> >                                                     ^
> >
> >Is "helper" the right word?
> 
> Based on other terminology within the postgresql.conf should it be
> "checksum_worker_cost_delay"?

+1

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 31 Jul 2018, at 21:52, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> On 07/31/2018 12:45 PM, Bruce Momjian wrote:

>>> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
>>> linkends as well as fixed the comments you made in a previous review.
>>>
>>> The CF-builder-bot is red, but it’s because it’s trying to apply the already
>>> committed patch which is in the attached datallowconn thread.
>> I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
>>                                                     ^
>> Is "helper" the right word?

IIRC, “helper” was chosen to signal that it’s a single process where “worker”
may be thought of as a process of which there can be many.

> Based on other terminology within the postgresql.conf should it be "checksum_worker_cost_delay”?

Yes, I think it makes sense to rename it to “worker” to align better with the
postgres nomenclature. Will fix.

cheers ./daniel

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 26 Jul 2018, at 19:35, Andres Freund <andres@anarazel.de> wrote:
> On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:

>> Why can't we do better?
>
> I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can
> (sketch: use procsignal based acknowledgment protocol, using a 64 bit integer. Useful for plenty other things).

Not really arguing for or against, but just to understand the reasoning before
starting hacking.  Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process?  Is it from a usability or
technical point of view?  Just want to make sure we are on the same page before
digging in, so as not to hack on this patch in a direction which isn’t what is
requested.

cheers ./daniel

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> > On 26 Jul 2018, at 19:35, Andres Freund <andres@anarazel.de> wrote:
> > On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
> 
> >> Why can't we do better?
> > 
> > I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can
> > (sketch: use procsignal based acknowledgment protocol, using a 64 bit integer. Useful for plenty other things).
 
> 
> Not really arguing for or against, but just to understand the reasoning before
> starting hacking.  Why do we feel that a restart (intended for safety here) in
> this case is a burden on a use-once process?  Is it from a usability or
> technical point of view?  Just want to make sure we are on the same page before
> digging in to not hack on this patch in a direction which isn’t what is
> requested.

Having to restart the server at some arbitrary-seeming point in the middle of
enabling checksums makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
>> Not really arguing for or against, but just to understand the reasoning before
>> starting hacking.  Why do we feel that a restart (intended for safety here) in
>> this case is a burden on a use-once process?  Is it from a usability or
>> technical point of view?  Just want to make sure we are on the same page before
>> digging in to not hack on this patch in a direction which isn’t what is
>> requested.

> Having, at some arbitrary seeming point in the middle of enabling
> checksums to restart the server makes it harder to use and to schedule.
> The restart is only needed to fix a relatively small issue, and doesn't
> save that much code.

Without taking a position on the merits ... I don't see how you can
claim "it doesn't save that much code" when we don't have a patch to
compare to that doesn't require the restart.  Maybe it will turn out
not to be much code, but we don't know that now.

            regards, tom lane


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-07-31 17:28:41 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> >> Not really arguing for or against, but just to understand the reasoning before
> >> starting hacking.  Why do we feel that a restart (intended for safety here) in
> >> this case is a burden on a use-once process?  Is it from a usability or
> >> technical point of view?  Just want to make sure we are on the same page before
> >> digging in to not hack on this patch in a direction which isn’t what is
> >> requested.
> 
> > Having, at some arbitrary seeming point in the middle of enabling
> > checksums to restart the server makes it harder to use and to schedule.
> > The restart is only needed to fix a relatively small issue, and doesn't
> > save that much code.
> 
> Without taking a position on the merits ... I don't see how you can
> claim "it doesn't save that much code" when we don't have a patch to
> compare to that doesn't require the restart.  Maybe it will turn out
> not to be much code, but we don't know that now.

IIRC I outlined a solution around the feature freeze, and I've since
offered to go into further depth if needed. And I'd pointed out the
issue at hand. So while I'd obviously not want to predict a specific
linecount, I'm fairly sure I have a reasonable guesstimate about the
complexity.

- Andres


Re: Online enabling of checksums

From
Alvaro Herrera
Date:
On 2018-Jul-31, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> >> Not really arguing for or against, but just to understand the reasoning before
> >> starting hacking.  Why do we feel that a restart (intended for safety here) in
> >> this case is a burden on a use-once process?  Is it from a usability or
> >> technical point of view?  Just want to make sure we are on the same page before
> >> digging in to not hack on this patch in a direction which isn’t what is
> >> requested.
> 
> > Having, at some arbitrary seeming point in the middle of enabling
> > checksums to restart the server makes it harder to use and to schedule.
> > The restart is only needed to fix a relatively small issue, and doesn't
> > save that much code.
> 
> Without taking a position on the merits ... I don't see how you can
> claim "it doesn't save that much code" when we don't have a patch to
> compare to that doesn't require the restart.  Maybe it will turn out
> not to be much code, but we don't know that now.

The ability to get checksums enabled is a killer feature; the ability to
do it with no restart ... okay, it's better than requiring a restart,
but it's not *that* big a deal.

In the spirit of supporting incremental development, I think it's quite
sensible to get the current thing done, then see what it takes to get
the next thing done.  Each is an improvement on its own merits.  And it
doesn't have to be made by the same people.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:
> In the spirit of supporting incremental development, I think it's quite
> sensible to get the current thing done, then see what it takes to get
> the next thing done.  Each is an improvement on its own merits.  And it
> doesn't have to be made by the same people.

I just don't buy this. An earlier version of this feature was committed
to v11 without the restart, over objections. There's now extra state in
the control file to support the restart based system, there's extra
tests, extra docs. And it'd not be much code to just make it work
without the restart. The process around this patchset is just plain
weird.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

Am Dienstag, den 31.07.2018, 18:56 -0400 schrieb Alvaro Herrera:
> The ability to get checksums enabled is a killer feature; the ability to
> do it with no restart ... okay, it's better than requiring a restart,
> but it's not *that* big a deal.

Well, it's a downtime and service interruption from the client's POV,
and arguably we should remove the 'online' from the patch subject then.
You can activate checksums on an offline instance already via the
pg_checksums extensions to pg_verify_checksums that Michael Paquier and
I wrote independently; of course, that downtime will be linerarily
longer the more data you have.

If this was one week before feature freeze, I would agree with you that
it makes sense to ship it with the restart requirement rather than not
shipping it at all. But we're several commitfests away from v12, so
making an effort to having this work without a downtime looks like a
reasonable requirement to me.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 08/01/2018 10:40 AM, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 31.07.2018, 18:56 -0400 schrieb Alvaro Herrera:
>> The ability to get checksums enabled is a killer feature; the
>> ability to do it with no restart ... okay, it's better than
>> requiring a restart, but it's not *that* big a deal.
> 
> Well, it's a downtime and service interruption from the client's POV,
> and arguably we should remove the 'online' from the patch subject
> then. You can activate checksums on an offline instance already via
> the pg_checksums extensions to pg_verify_checksums that Michael
> Paquier and I wrote independently; of course, that downtime will be
> linerarily longer the more data you have.
> 

IMHO the main work still happens while the instance is running, so I
don't see why the restart would make it "not online".

But keeping or not keeping "online" is not the main dilemma faced by
this patch, I think. That is, if we drop "online" from the name I doubt
it'll make it any more acceptable for those objecting to having to
restart the cluster.

> If this was one week before feature freeze, I would agree with you 
> that it makes sense to ship it with the restart requirement rather 
> than not shipping it at all. But we're several commitfests away from 
> v12, so making an effort to having this work without a downtime
> looks like a reasonable requirement to me.
> 

Why would all those pieces have to be committed at once? Why not
commit what we have now (with the restart) and then remove the
restriction in a later commit?

I understand the desire to be able to enable checksums without a
restart, but kinda agree with Alvaro regarding incremental development.

In a way, the question is how far can we reasonably push the patch
author(s) to implement stuff we consider desirable, but he/she/they
decided it's not worth the time investment at this point.

To me, it seems like an immensely useful feature even with the restart,
and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hello
I think one restart is acceptable for such a feature. I doubt users will often want to disable and re-enable checksums. In many cases
checksums will be enabled once for the entire life of the cluster. We need more downtime for minor updates (4/year) and for
changes to PGC_POSTMASTER settings (max_connections or autovacuum workers, for example).
 

regards, Sergei


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 10:40:24 +0200, Michael Banck wrote:
> If this was one week before feature freeze, I would agree with you that
> it makes sense to ship it with the restart requirement rather than not
> shipping it at all. But we're several commitfests away from v12, so
> making an effort to having this work without a downtime looks like a
> reasonable requirement to me.

My problem isn't just that I shouldn't think this should be committed
without at least a firm commitment to do better, my problem is that I
think the "restart" approach is just using the entirely wrong hammer to
solve the problem at hand.  At the very least it's very problematic in
respect to replicas, which need to know about the setting too, and can
have similar problems the restart on the primary is supposed to
prevent.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:
> On 08/01/2018 10:40 AM, Michael Banck wrote:
> > If this was one week before feature freeze, I would agree with you 
> > that it makes sense to ship it with the restart requirement rather 
> > than not shipping it at all. But we're several commitfests away from 
> > v12, so making an effort to having this work without a downtime
> > looks like a reasonable requirement to me.
> > 
> 
> Why would all those pieces had to be committed at once? Why not to
> commit what we have now (with the restart) and then remove the
> restriction in a later commit?

Sure, if all the pieces existed in various degrees of solidness (with
the earlier pieces committable, but later ones needing work), I'd feel
*much* less concerned about it.


> In a way, the question is how far can we reasonably push the patch
> author(s) to implement stuff we consider desirable, but he/she/they
> decided it's not worth the time investment at this point.

We push people to only implement something really consistent all the
time.


> To me, it seems like an immensely useful feature even with the restart,
> and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

I think it's a problem, my problem is more that I don't think it's really
a solution to the problem.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Alvaro Herrera
Date:
Hello

On 2018-Aug-01, Andres Freund wrote:

> My problem isn't just that I shouldn't think this should be committed
> without at least a firm commitment to do better,

I take "I think this shouldn't be committed" is what you meant.

I'm not sure I agree with this line of argument.  The reality is that
real life or diverging priorities preclude people from working on
$stuff.  This is a useful feature-1 we have here, and if we stall it
until we have feature-2, we may not get either until a year later.
That's not a great outcome.  We didn't wait for partitioning, parallel
query, DDL progress reporting, logical replication, JIT, wait events (to
name but a few) to solve world hunger in order to start getting
committed.  We move forward step by step, and that's a good thing.

Firm commitments are great things to have, and if the firmness leads to
feature-2 being part of the same release, great, but if it's not firm
enough, we can have feature-2 the next release (or whenever).  Even if
there's no such commitment, feature-1 is useful on its own.

> my problem is that I think the "restart" approach is just using the
> entirely wrong hammer to solve the problem at hand.  At the very least
> it's very problematic in respect to replicas, which need to know about
> the setting too, and can have similar problems the restart on the
> primary is supposed to prevent.

If we define "restart" to mean taking all the servers down
simultaneously, that can be planned.  For users that cannot do that,
that's too bad, they'll have to wait for the next release in order to
enable checksums (assuming they fund the necessary development).  But
there are many systems where it *is* possible to take everything down
for five seconds, then back up.  They can definitely take advantage of
checksummed data.

Currently, the only way to enable checksums is to initdb and create a
new copy of the data from a logical backup, which could take hours or
even days if data is large, or use logical replication.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 08/01/2018 05:58 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:
>> On 08/01/2018 10:40 AM, Michael Banck wrote:
>>> If this was one week before feature freeze, I would agree with you
>>> that it makes sense to ship it with the restart requirement rather
>>> than not shipping it at all. But we're several commitfests away from
>>> v12, so making an effort to having this work without a downtime
>>> looks like a reasonable requirement to me.
>>>
>>
>> Why would all those pieces have to be committed at once? Why not
>> commit what we have now (with the restart) and then remove the
>> restriction in a later commit?
> 
> Sure, if all the pieces existed in various degrees of solidness (with
> the earlier pieces committable, but later ones needing work), I'd feel
> *much* less concerned about it.
> 

That's not what I meant, sorry for not being clearer. My point was that 
I see the "without restart" as desirable but optional, and would not 
mind treating it as a future improvement.

> 
>> In a way, the question is how far can we reasonably push the patch
>> author(s) to implement stuff we consider desirable, but he/she/they
>> decided it's not worth the time investment at this point.
> 
> We push people to only implement something really consistent all the
> time.
> 

Sure, but it's a somewhat subjective matter - to me this limitation does 
not make this particular patch inconsistent. If we can remove it, great. 
If not, it's still an immensely useful improvement.

> 
>> To me, it seems like an immensely useful feature even with the restart,
>> and I don't think the restart is a major burden for most systems (it can
>> be, if your system has no maintenance windows, of course).
> 
> I think it's a problem, my problem is more that I don't think it's really
> a solution to the problem.
> 

Sure, if there are issues with this approach, that would make it 
unacceptable. I'm not sure why it would be an issue for replicas (which 
is what you mention elsewhere), considering those don't write data and 
so can't fail to update a checksum?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-08-01 12:20:12 -0400, Alvaro Herrera wrote:
> Hello
> 
> On 2018-Aug-01, Andres Freund wrote:
> 
> > My problem isn't just that I shouldn't think this should be committed
> > without at least a firm commitment to do better,
> 
> I take "I think this shouldn't be committed" is what you meant.

Yep.


> I'm not sure I agree with this line of argument.  The reality is that
> real life or diverging priorities preclude people from working on
> $stuff.

Right.


> This is a useful feature-1 we have here, and if we stall it
> until we have feature-2, we may not get either until a year later.
> That's not a great outcome.  We didn't wait for partitioning, parallel
> query, DDL progress reporting, logical replication, JIT, wait events (to
> name but a few) to solve world hunger in order to start getting
> committed.  We move forward step by step, and that's a good thing.

But we asked that they implement something consistent, and rejected many
that were not deemed to be that.


> > my problem is that I think the "restart" approach is just using the
> > entirely wrong hammer to solve the problem at hand.  At the very least
> > it's very problematic in respect to replicas, which need to know about
> > the setting too, and can have similar problems the restart on the
> > primary is supposed to prevent.
> 
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.

It really can't realistically. That'd essentially mean breaking PITR.
You'd have to schedule the restart of any replicas to happen after a
specific record.  And what if there's replicas that are on a delay? What
if there's data centers that are currently offline?

And again, this isn't hard to do properly. I don't get why we're talking
about an at least operationally complex workaround when the proper
solution isn't hard.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 18:25:48 +0200, Tomas Vondra wrote:
> Sure, if there are issues with this approach, that would make it
> unacceptable. I'm not sure why it would be an issue for replicas (which is
> what you mention elsewhere), considering those don't write data and so can't
> fail to update a checksum?

Standbys compute checksums on writeout as well, no? We compute checksums
not at buffer modification, but at writeout time. And replay just marks
buffers dirty, it doesn't directly write to disk.

Architecturally there'd also be hint bits as a source, but I think we
probably neutered them enough for that not to be a problem during
replay.

And then there's also promotions.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Tue, Jul 31, 2018 at 04:05:23PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:
> > In the spirit of supporting incremental development, I think it's quite
> > sensible to get the current thing done, then see what it takes to get
> > the next thing done.  Each is an improvement on its own merits.  And it
> > doesn't have to be made by the same people.
> 
> I just don't buy this. An earlier version of this feature was committed
> to v11 without the restart, over objections. There's now extra state in
> the control file to support the restart based system, there's extra
> tests, extra docs. And it'd not be much code to just make it work
> without the restart. The process around this patchset is just plain
> weird.

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running.  We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.  Forcing a restart is probably part of that
primitive orchestration.  We will probably have similar challenges if we
ever allowed Postgres to change its data format on the fly.  These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> This patchset is weird because it is perhaps our first case of trying to
> change the state of the server while it is running.  We just don't have
> an established protocol for how to orchestrate that, so we are limping
> along toward a solution.

There's a number of GUCs that do this? Even in related areas,
cf. full_page_writes.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Wed, Aug  1, 2018 at 09:39:43AM -0700, Andres Freund wrote:
> On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> > This patchset is weird because it is perhaps our first case of trying to
> > change the state of the server while it is running.  We just don't have
> > an established protocol for how to orchestrate that, so we are limping
> > along toward a solution.
> 
> There's a number of GUCs that do this? Even in related areas,
> cf. full_page_writes.

How is that taking the server from one state to the next in a
non-instantaneous way?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 12:45:27 -0400, Bruce Momjian wrote:
> On Wed, Aug  1, 2018 at 09:39:43AM -0700, Andres Freund wrote:
> > On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> > > This patchset is weird because it is perhaps our first case of trying to
> > > change the state of the server while it is running.  We just don't have
> > > an established protocol for how to orchestrate that, so we are limping
> > > along toward a solution.
> > 
> > There's a number of GUCs that do this? Even in related areas,
> > cf. full_page_writes.
> 
> How is that taking the server from one state to the next in a
> non-instantaneous way?

Because it requires coordination around checkpoints, it can only really
fully take effect after the start of a checkpoint.  And it even has both
primary and replica differences...

Greetings,

Andres Freund


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 08/01/2018 09:20 AM, Alvaro Herrera wrote:
>
>> my problem is that I think the "restart" approach is just using the
>> entirely wrong hammer to solve the problem at hand.  At the very least
>> it's very problematic in respect to replicas, which need to know about
>> the setting too, and can have similar problems the restart on the
>> primary is supposed to prevent.
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.

People in mission critical environments do not "restart all servers". 
They fail over to a secondary to do maintenance on a primary. When you 
have a system where you literally lose thousands of dollars every minute 
the database is down you can't do what you are proposing. When you have 
a system that if the database is down for longer than X minutes, you 
actually lose a whole day because all of the fabricators have to 
revalidate before they begin work, you can't do that. Granted that is 
not the majority (which you mention) but let's not forget them.

The one place where a restart does happen and will continue to happen 
for around 5 (3 if you incorporate pg_logical and 9.6) more years is 
upgrades. Although we have logical replication for upgrades now, we are 
5 years away from the majority of users being on a version of PostgreSQL 
that supports logical replication for upgrades. So, I can see an 
argument for an incremental approach because people could enable 
checksums as part of their upgrade restart.

> For users that cannot do that,
> that's too bad, they'll have to wait for the next release in order to
> enable checksums (assuming they fund the necessary development).  But

I have to say, as a proponent of funded development for longer than most, 
I like to see this refreshing take on the fact that all of this does take 
money.

> there are many systems where it *is* possible to take everything down
> for five seconds, then back up.  They can definitely take advantage of
> checksummed data.

This is a good point.

> Currently, the only way to enable checksums is to initdb and create a
> new copy of the data from a logical backup, which could take hours or
> even days if data is large, or use logical replication.

Originally, I was going to -1 how this is being implemented. I too wish 
we had the "ALTER DATABASE ENABLE CHECKSUM" or equivalent without a 
restart. However, being able to just restart is a huge step forward from 
what we have now.

Lastly, I think Alvaro has a point with the incremental development and 
I also think some others on this thread need to, "show me the patch" 
instead of being armchair directors of development.

JD



-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 10:34:55 -0700, Joshua D. Drake wrote:
> Lastly, I think Alvaro has a point with the incremental development and I
> also think some others on this thread need to, "show me the patch" instead
> of being armchair directors of development.

Oh, FFS. I pointed out the issue that led to the restart being
introduced (reminder, in the committed but then reverted version that
didn't exist). I explained how the much less intrusive version would
work. I think it's absurd to describe that as "armchair directors of
development".

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hi

> They fail over to a secondary to do maintenance on a primary.
But this is not a problem even in the patch's current state. We can restart
the replica before failover and it works. I tested this behavior during my
review.

We can:
- call pg_enable_data_checksums() on the master
- wait for data_checksums to change to inprogress on the replica
- restart the replica - we can restart it before the promote, right?
- promote this replica
- the checksum helper is launched now and works on this promoted cluster

regards, Sergei


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 21:03:22 +0300, Sergei Kornilov wrote:
> > They fail over to a secondary to do maintenance on a primary.
> But this is not a problem even in the patch's current state. We can
> restart the replica before failover and it works. I tested this
> behavior during my review.
> We can:
> - call pg_enable_data_checksums() on the master
> - wait for data_checksums to change to inprogress on the replica

That's *precisely* the problem. What if your replicas are delayed
(e.g. recovery_min_apply_delay)? How would you schedule that restart
properly?  What if you later need to do PITR?


> - restart replica - we can restart replica before promote, right?
> - promote this replica
> - checksum helper is launched now and working on this promoted cluster

This doesn't test the consequences of the restart being skipped, nor
does it review the correctness at the code level.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hi

> This doesn't test the consequences of the restart being skipped, nor
> does it review on a code level the correctness.
I checked more than one thing during my review. I looked at the code too: the bgworker in checksumhelper.c is registered with:
> bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
and it then processes the whole cluster (even if checksumhelper ran before but exited before it completed). Or does BgWorkerStart_RecoveryFinished not guarantee starting only after recovery has finished?

Before starting any real work (and after recovery ends), checksumhelper checks the current cluster status again:

> +     * If a standby was restarted when in pending state, a background worker
> +     * was registered to start. If it's later promoted after the master has
> +     * completed enabling checksums, we need to terminate immediately and not
> +     * do anything. If the cluster is still in pending state when promoted,
> +     * the background worker should start to complete the job.

> What if your replicas are delayed (e.g. recovery_min_apply_delay)?
> What if you later need to do PITR?
If we start after replaying pg_enable_data_checksums and before it has completed, we plan to start the bgworker when recovery finishes. If we have replayed the checksumhelper finish, we _can_ start checksumhelper again, and this is handled during checksumhelper startup.

The behavior seems correct to me. Am I missing something very wrong?

regards, Sergei


Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> While looking at the online checksum verification patch (which I guess
> will get committed before this one), it occurred to me that disabling
> checksums may need to be more elaborate, to protect against someone
> using the stale flag value (instead of simply switching to "off"
> assuming that's fine).
>
> The signals etc. seem good enough for our internal stuff, but what if
> someone uses the flag in a different way? E.g. the online checksum
> verification runs as an independent process (i.e. not a backend) and
> reads the control file to find out if the checksums are enabled or not.
> So if we just switch from "on" to "off" that will break.
>
> Of course, we may also say "Don't disable checksums while online
> verification is running!" but that's not ideal.

I'm not really sure what else we could say here..?  I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 09/29/2018 02:19 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> While looking at the online checksum verification patch (which I guess
>> will get committed before this one), it occurred to me that disabling
>> checksums may need to be more elaborate, to protect against someone
>> using the stale flag value (instead of simply switching to "off"
>> assuming that's fine).
>>
>> The signals etc. seem good enough for our internal stuff, but what if
>> someone uses the flag in a different way? E.g. the online checksum
>> verification runs as an independent process (i.e. not a backend) and
>> reads the control file to find out if the checksums are enabled or not.
>> So if we just switch from "on" to "off" that will break.
>>
>> Of course, we may also say "Don't disable checksums while online
>> verification is running!" but that's not ideal.
> 
> I'm not really sure what else we could say here..?  I don't particularly
> see an issue with telling people that if they disable checksums while
> they're running a tool that's checking the checksums that they're going
> to get odd results.
> 

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> While looking at the online checksum verification patch (which I guess
> >> will get committed before this one), it occurred to me that disabling
> >> checksums may need to be more elaborate, to protect against someone
> >> using the stale flag value (instead of simply switching to "off"
> >> assuming that's fine).
> >>
> >> The signals etc. seem good enough for our internal stuff, but what if
> >> someone uses the flag in a different way? E.g. the online checksum
> >> verification runs as an independent process (i.e. not a backend) and
> >> reads the control file to find out if the checksums are enabled or not.
> >> So if we just switch from "on" to "off" that will break.
> >>
> >> Of course, we may also say "Don't disable checksums while online
> >> verification is running!" but that's not ideal.
> >
> > I'm not really sure what else we could say here..?  I don't particularly
> > see an issue with telling people that if they disable checksums while
> > they're running a tool that's checking the checksums that they're going
> > to get odd results.
>
> I don't know, to be honest. I was merely looking at the online
> verification patch and realized that if someone disables checksums it
> won't notice it (because it only reads the flag once, at the very
> beginning) and will likely produce bogus errors.
>
> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> fix it. The checkpoint LSN is read from the same controlfile as the
> flag, so we know the checksums were enabled during that checkpoint. So
> if we ignore failures with a newer LSN, that should do the trick, no?
>
> So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 09/29/2018 06:51 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 09/29/2018 02:19 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>>>> While looking at the online checksum verification patch (which I guess
>>>> will get committed before this one), it occurred to me that disabling
>>>> checksums may need to be more elaborate, to protect against someone
>>>> using the stale flag value (instead of simply switching to "off"
>>>> assuming that's fine).
>>>>
>>>> The signals etc. seem good enough for our internal stuff, but what if
>>>> someone uses the flag in a different way? E.g. the online checksum
>>>> verification runs as an independent process (i.e. not a backend) and
>>>> reads the control file to find out if the checksums are enabled or not.
>>>> So if we just switch from "on" to "off" that will break.
>>>>
>>>> Of course, we may also say "Don't disable checksums while online
>>>> verification is running!" but that's not ideal.
>>>
>>> I'm not really sure what else we could say here..?  I don't particularly
>>> see an issue with telling people that if they disable checksums while
>>> they're running a tool that's checking the checksums that they're going
>>> to get odd results.
>>
>> I don't know, to be honest. I was merely looking at the online
>> verification patch and realized that if someone disables checksums it
>> won't notice it (because it only reads the flag once, at the very
>> beginning) and will likely produce bogus errors.
>>
>> Although, maybe it won't - it now uses a checkpoint LSN, so that might
>> fix it. The checkpoint LSN is read from the same controlfile as the
>> flag, so we know the checksums were enabled during that checkpoint. So
>> if we ignore failures with a newer LSN, that should do the trick, no?
>>
>> So perhaps that's the right "protocol" to handle this?
> 
> I certainly don't think we need to do anything more.
> 

Not sure I agree. I'm not suggesting we absolutely have to write a huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into docs.

FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
> 
> 
> On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > Greetings,
> > 
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >>>> While looking at the online checksum verification patch (which I guess
> >>>> will get committed before this one), it occurred to me that disabling
> >>>> checksums may need to be more elaborate, to protect against someone
> >>>> using the stale flag value (instead of simply switching to "off"
> >>>> assuming that's fine).
> >>>>
> >>>> The signals etc. seem good enough for our internal stuff, but what if
> >>>> someone uses the flag in a different way? E.g. the online checksum
> >>>> verification runs as an independent process (i.e. not a backend) and
> >>>> reads the control file to find out if the checksums are enabled or not.
> >>>> So if we just switch from "on" to "off" that will break.
> >>>>
> >>>> Of course, we may also say "Don't disable checksums while online
> >>>> verification is running!" but that's not ideal.
> >>>
> >>> I'm not really sure what else we could say here..?  I don't particularly
> >>> see an issue with telling people that if they disable checksums while
> >>> they're running a tool that's checking the checksums that they're going
> >>> to get odd results.
> >>
> >> I don't know, to be honest. I was merely looking at the online
> >> verification patch and realized that if someone disables checksums it
> >> won't notice it (because it only reads the flag once, at the very
> >> beginning) and will likely produce bogus errors.
> >>
> >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> >> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
> >> if we ignore failures with a newer LSN, that should do the trick, no?
> >>
> >> So perhaps that's the right "protocol" to handle this?
> > 
> > I certainly don't think we need to do anything more.
> > 
> 
> Not sure I agree. I'm not suggesting we absolutely have to write a huge
> amount of code to deal with this issue, but I hope we agree we need to
> at least understand the issue so that we can put warnings into docs.
> 
> FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
> too AFAICS, and it seems rather unfriendly to just start reporting
> checksum errors during backup in that case.
> 
> But as I mentioned, maybe there's no problem at all and using the
> checkpoint LSN deals with it automatically.

Given that this patch has not been developed in a few months, I don't
see why this has an active 2019-01 CF entry? I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Jan 31, 2019 at 11:57 AM Andres Freund <andres@anarazel.de> wrote:
On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
>
>
> On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > Greetings,
> >
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >>>> While looking at the online checksum verification patch (which I guess
> >>>> will get committed before this one), it occurred to me that disabling
> >>>> checksums may need to be more elaborate, to protect against someone
> >>>> using the stale flag value (instead of simply switching to "off"
> >>>> assuming that's fine).
> >>>>
> >>>> The signals etc. seem good enough for our internal stuff, but what if
> >>>> someone uses the flag in a different way? E.g. the online checksum
> >>>> verification runs as an independent process (i.e. not a backend) and
> >>>> reads the control file to find out if the checksums are enabled or not.
> >>>> So if we just switch from "on" to "off" that will break.
> >>>>
> >>>> Of course, we may also say "Don't disable checksums while online
> >>>> verification is running!" but that's not ideal.
> >>>
> >>> I'm not really sure what else we could say here..?  I don't particularly
> >>> see an issue with telling people that if they disable checksums while
> >>> they're running a tool that's checking the checksums that they're going
> >>> to get odd results.
> >>
> >> I don't know, to be honest. I was merely looking at the online
> >> verification patch and realized that if someone disables checksums it
> >> won't notice it (because it only reads the flag once, at the very
> >> beginning) and will likely produce bogus errors.
> >>
> >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> >> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
> >> if we ignore failures with a newer LSN, that should do the trick, no?
> >>
> >> So perhaps that's the right "protocol" to handle this?
> >
> > I certainly don't think we need to do anything more.
> >
>
> Not sure I agree. I'm not suggesting we absolutely have to write a huge
> amount of code to deal with this issue, but I hope we agree we need to
> at least understand the issue so that we can put warnings into docs.
>
> FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
> too AFAICS, and it seems rather unfriendly to just start reporting
> checksum errors during backup in that case.
>
> But as I mentioned, maybe there's no problem at all and using the
> checkpoint LSN deals with it automatically.

Given that this patch has not been developed in a few months, I don't
see why this has an active 2019-01 CF entry? I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/


Unfortunately, I agree. I wish that wasn't the case, but due to things outside my control, that's what happened.

So yes, let's punt it.

//Magnus