Re: [DESIGN] Incremental checksums

From Amit Kapila
Subject Re: [DESIGN] Incremental checksums
Msg-id CAA4eK1+bVuact-dhUv1igSXe2E03QV+xw-j_ReO6FLssjd0bvQ@mail.gmail.com
In response to [DESIGN] Incremental checksums  (David Christensen <david@endpoint.com>)
Responses Re: [DESIGN] Incremental checksums  (Andres Freund <andres@anarazel.de>)
Re: [DESIGN] Incremental checksums  (David Christensen <david@endpoint.com>)
List pgsql-hackers
On Tue, Jul 14, 2015 at 1:56 AM, David Christensen <david@endpoint.com> wrote:

>
> For any relation that it finds in the database which is not checksummed, it starts an actual worker to handle the checksum process for this table.  Since the state of the cluster is already either "enforcing" or "revalidating", any block writes will get checksums added automatically, so the only thing the bgworker needs to do is load each block in the relation and explicitly mark as dirty (unless that's not required for FlushBuffer() to do its thing).  After every block in the relation is visited this way and checksummed, its pg_class record will have "rellastchecksum" updated.
>

If the system crashes partway through the scan of a relation, say
after half of its blocks have been checksummed, then under the above
scheme a restart would have to read all of the blocks again, even
though some of them were already checksummed in the previous cycle.
That is okay if it happens for a few small or medium-sized relations,
but assume several large relations are in the same state (half of
their blocks checksummed) when the crash occurs: it could then lead
to much more I/O than required.
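
To put rough numbers on this concern, here is a minimal sketch in Python
(not the proposed implementation). It assumes the scheme as quoted: no
per-block progress is saved, so a restart rescans each relation from its
first block. The function names and the relation sizes are hypothetical
and chosen only for illustration.

```python
def blocks_read_with_restart(total_blocks: int, crash_after: int) -> int:
    """Blocks read when a crash hits after `crash_after` blocks and the
    relation is then rescanned from block 0 (no progress was saved)."""
    if not 0 <= crash_after <= total_blocks:
        raise ValueError("crash_after must be between 0 and total_blocks")
    return crash_after + total_blocks


def extra_io(relations: list[tuple[int, int]]) -> int:
    """Block reads beyond the minimum of one read per block, summed over
    all relations. Each entry is (total_blocks, crash_after)."""
    return sum(blocks_read_with_restart(t, c) - t for t, c in relations)


# Three hypothetical large relations, each half done when the crash occurs.
relations = [(1_000_000, 500_000)] * 3
print(extra_io(relations))  # 1500000 extra block reads
```

The extra reads are exactly the blocks that had already been visited
before the crash, which is why the cost grows with the number and size
of relations that were in progress.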

> ** Function API:
>
> Interface to the functionality will be via the following Utility functions:
>
>   - pg_enable_checksums(void) => turn checksums on for a cluster.  Will error if the state is anything but "disabled".  If this is the first time this cluster has run this, this will initialize ControlFile->data_checksum_version to the preferred built-in algorithm (since there's only one currently, we just set it to 1).  This increments the ControlFile->data_checksum_cycle variable, then sets the state to "enabling", which means that the next time the bgworker checks if there is anything to do it will see that state,  scan all the databases' "datlastchecksum" fields, and start kicking off the bgworker processes to handle the checksumming of the actual relation files.
>
>   - pg_disable_checksums(void) => turn checksums off for a cluster.  Sets the state to "disabled", which means bg_worker will not do anything.
>
>   - pg_request_checksum_cycle(void) => if checksums are "enabled", increment the data_checksum_cycle counter and set the state to "enabling".
>

If checksums are already enabled for the cluster, then why is any
further action needed?
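
For reference, the state transitions described in the quoted API can be
modelled as below. This is only a sketch in Python of my reading of the
proposal, not the actual C code. The class name, the use of 0 to mean
"version never initialized", and the bgworker_finished() helper are
assumptions added so the example runs; the proposal does not spell out
how "enabling" becomes "enabled".

```python
class ChecksumControl:
    """Toy model of the control-file fields named in the quoted proposal."""

    def __init__(self) -> None:
        self.state = "disabled"
        self.data_checksum_version = 0  # assumption: 0 = never initialized
        self.data_checksum_cycle = 0

    def pg_enable_checksums(self) -> None:
        # Errors if the state is anything but "disabled".
        if self.state != "disabled":
            raise RuntimeError(f"cannot enable checksums in state {self.state!r}")
        if self.data_checksum_version == 0:
            self.data_checksum_version = 1  # only one algorithm currently
        self.data_checksum_cycle += 1
        self.state = "enabling"

    def pg_disable_checksums(self) -> None:
        self.state = "disabled"

    def pg_request_checksum_cycle(self) -> None:
        # Only acts when checksums are already "enabled".
        if self.state == "enabled":
            self.data_checksum_cycle += 1
            self.state = "enabling"

    def bgworker_finished(self) -> None:
        # Hypothetical: the bgworker has visited every relation.
        if self.state == "enabling":
            self.state = "enabled"


ctl = ChecksumControl()
ctl.pg_enable_checksums()
ctl.bgworker_finished()
ctl.pg_request_checksum_cycle()
print(ctl.state, ctl.data_checksum_cycle)  # enabling 2
```

In this model, pg_request_checksum_cycle() sends a fully enabled
cluster back to "enabling" with a new cycle number, i.e. it triggers
another pass over every relation.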


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
