Re: Cost of XLogInsert CRC calculations

From: Simon Riggs
Subject: Re: Cost of XLogInsert CRC calculations
Date:
Msg-id: 1110239639.6117.197.camel@localhost.localdomain
In reply to: Re: Cost of XLogInsert CRC calculations  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Cost of XLogInsert CRC calculations
Cost of XLogInsert CRC calculations
Re: Cost of XLogInsert CRC calculations
List: pgsql-hackers
On Mon, 2005-03-07 at 09:39 -0500, Tom Lane wrote:
> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > Wow, a 64-bit CRC does seem excessive, especially when going back to Zmodem
> > days where a 50-100k file seemed to be easily protected by a 32-bit CRC. I'm
> > sure there are some error rates somewhere dependent upon the polynomial and
> > the types of error detected.... Try the following link towards the bottom:
> > http://www.ee.unb.ca/tervo/ee4253/crc.htm for some theory on detection rates
> > vs. CRC size.
> 
> When the CRC size was decided, I recall someone arguing that it would
> really make a difference to have 1-in-2^64 chance of failure rather than
> 1-in-2^32.  I was dubious about this at the time, but didn't have any
> evidence showing that we shouldn't go for 64.  I suppose we ought to try
> the same example with a 32-bit CRC and see how much it helps.

I think some of the additional run-time may be coming from processor
stalls associated with some of the constants used in the CRC checks.
I'll come back with more info on that later.

Well, we're using the CRC in 3 separate places...
(1) for xlog records
(2) for complete blocks copied to xlog
(3) for control files

For (1), records are so short that probably CRC16 would be sufficient
without increasing the error rate noticeably.

I think I'd like to keep (3) at CRC64... it's just too important. Plus
that's slightly less code to change.

My money is on (2) being the source of most of that run-time anyway,
since when we enclose a whole block it takes a lot longer to CRC64 all
BLCKSZ bytes than it would do to CRC a single record in (1). But of
course, longer stretches of data need better error detection rates.

If Ethernet is using CRC32, it seems somewhat strange to use anything
stronger than that, seeing as we're very likely to be sending xlog files
across the net anyway. Packet size is mostly comparable to BLCKSZ, isn't
it?

So, yes, CRC32 seems more reasonable.

One of the things I was thinking about was whether we could use those
cycles more effectively. If we ran a compression routine before
calculating the CRC, that would
- reduce the size of the blocks to be written, hence reduce size of xlog
- reduce the cost of the CRC calculation that follows

I was thinking about using a simple run-length encoding to massively
shrink half-empty blocks with lots of zero padding, but we've already
got code to LZW-compress the data, too.

Best Regards, Simon Riggs


