Re: Enabling Checksums

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Enabling Checksums
Дата
Msg-id 5135BC52.3020609@vmware.com
обсуждение исходный текст
Ответ на Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On 04.03.2013 09:11, Simon Riggs wrote:
> On 3 March 2013 18:24, Greg Smith<greg@2ndquadrant.com>  wrote:
>
>> The 16-bit checksum feature seems functional, with two sources of overhead.
>> There's some CPU time burned to compute checksums when pages enter the
>> system.  And there's extra overhead for WAL logging hint bits.  I'll
>> quantify both of those better in another message.
>
> It's crunch time. Do you and Jeff believe this patch should be
> committed to Postgres core?
>
> Are there objectors?

In addition to my hostility towards this patch in general, there are 
some specifics in the patch I'd like to raise (read out in a grumpy voice):

If you enable checksums, the free space map never gets updated in a 
standby. It will slowly drift to be completely out of sync with reality, 
which could lead to significant slowdown and bloat after failover.

Since the checksums are an all-or-nothing cluster-wide setting, the 
three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and 
PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the 
code simpler, and leaves the bits free for future use. If we want to 
enable such per-page setting in the future, we can add it later. For a 
per-relation scheme, they're not needed.

> + * The checksum algorithm is a modified Fletcher 64-bit (which is
> + * order-sensitive). The modification is because, at the end, we have two
> + * 64-bit sums, but we only have room for a 16-bit checksum. So, instead of
> + * using a modulus of 2^32 - 1, we use 2^8 - 1; making it also resemble a
> + * Fletcher 16-bit. We don't use Fletcher 16-bit directly, because processing
> + * single bytes at a time is slower.

How does the error detection rate of this compare with e.g CRC-16? Is 
there any ill effect from truncating the Fletcher sums like this?

> +    /*
> +     * Store the sums as bytes in the checksum. We add one to shift the range
> +     * from 0..255 to 1..256, to make zero invalid for checksum bytes (which
> +     * seems wise).
> +     */
> +    p8Checksum[0] = (sum1 % 255) + 1;
> +    p8Checksum[1] = (sum2 % 255) + 1;

That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall 
seeing that in other checksum implementations either. 16-bits is not 
very wide for a checksum, and this eats about 1% of the space of valid 
values.

I can see that it might be a handy debugging aid to avoid 0. But there's 
probably no need to avoid 0 in both bytes, it seems enough to avoid a 
completely zero return value.

XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN 
without a lock. That's not atomic, so it could incorrectly determine 
that a page doesn't need to be backed up. We used to always hold an 
exclusive lock on the buffer when it's called, which prevents 
modifications to the LSN, but that's no longer the case.

Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I 
think it will generate WAL records for unlogged tables as it is.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro HORIGUCHI
Дата:
Сообщение: Re: 9.2.3 crashes during archive recovery
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Support for REINDEX CONCURRENTLY