Re: cyclical redundancy checksum algorithm(s)?

From Karen Hill
Subject Re: cyclical redundancy checksum algorithm(s)?
Date
Msg-id 1159393768.405555.28730@d34g2000cwd.googlegroups.com
In reply to Re: cyclical redundancy checksum algorithm(s)?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: cyclical redundancy checksum algorithm(s)?  (John Sidney-Woollett <johnsw@wardbrook.com>)
Re: cyclical redundancy checksum algorithm(s)?  (Teodor Sigaev <teodor@sigaev.ru>)
Re: cyclical redundancy checksum algorithm(s)?  (Jonathan Leffler <jleffler@earthlink.net>)
Re: cyclical redundancy checksum algorithm(s)?  ("David Portas" <REMOVE_BEFORE_REPLYING_dportas@acm.org>)
Re: cyclical redundancy checksum algorithm(s)?  ("Cimode" <cimode@hotmail.com>)
List pgsql-general
Tom Lane wrote:
> "Karen Hill" <karen_hill22@yahoo.com> writes:
> > Ralph Kimball states that this is a way to check for changes.  You just
> > have an extra column for the crc checksum.  When you go to update data,
> > generate a crc checksum and compare it to the one in the crc column.
> > If they are same, your data has not changed.
>
> You sure that's actually what he said?  A change in CRC proves the data
> changed, but lack of a change does not prove it didn't.


On page 100 in the book, "The Data Warehouse Toolkit" Second Edition,
Ralph Kimball writes the following:

"Rather than checking each field to see if something has changed, we
instead compute a checksum for the entire row all at once.  A cyclic
redundancy checksum (CRC) algorithm helps us quickly recognize that a
wide messy row has changed without looking at each of its constituent
fields."

On page 360 he writes:

"To quickly determine if rows have changed, we rely on a cyclic
redundancy checksum (CRC) algorithm.   If the CRC is identical for the
extracted record and the most recent row in the master table, then we
ignore the extracted record.  We don't need to check every column to be
certain that the two rows match exactly."
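
For illustration, the comparison Kimball describes could be sketched
roughly as follows.  This is only a minimal sketch, not anything from
the book: the helper names row_crc and needs_update, the use of
Python's zlib.crc32 (a 32-bit CRC), and the way the fields are
serialized are all assumptions made for the example.

```python
import zlib

FIELD_SEP = "\x1f"  # ASCII unit separator; an assumed choice of delimiter


def row_crc(row):
    """Return a 32-bit CRC over all fields of a row (hypothetical helper)."""
    # repr() keeps None distinct from '' and escapes control characters,
    # so the separator cannot appear inside a serialized field.
    payload = FIELD_SEP.join(repr(value) for value in row)
    return zlib.crc32(payload.encode("utf-8"))


def needs_update(extracted_row, stored_crc):
    """True if the extracted row's CRC differs from the stored CRC.

    A differing CRC proves the row changed.  An equal CRC only suggests
    that it did not, because two different rows can share a CRC.
    """
    return row_crc(extracted_row) != stored_crc


stored = row_crc(("Karen", "Hill", 42))              # value kept in the crc column
print(needs_update(("Karen", "Hill", 42), stored))   # same data
print(needs_update(("Karen", "Hill", 43), stored))   # one field changed
```

Serializing the whole row into one string is what lets a single
checksum stand in for a column-by-column comparison; the cost is that
equality of checksums is probabilistic rather than certain.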

>
> People do sometimes use this logic in connection with much wider
> "summary" functions, such as an MD5 hash.  I wouldn't trust it at all
> with a 32-bit CRC, and not much with a 64-bit CRC.  Too much risk of
> collision.
>
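
To put rough numbers on the collision risk, here is a small sketch.
It assumes an idealized model in which each genuinely changed row has
an independent 2^-bits chance of producing the same checksum as the
old row.  Real CRCs are not random functions, so treat the result as
an order-of-magnitude guide only; the function name p_missed_change
is made up for the example.

```python
import math


def p_missed_change(changed_rows, bits):
    """Probability that at least one changed row keeps the same checksum.

    Assumes each changed row collides independently with probability
    2**-bits (an idealization, not a property proven for any real CRC).
    """
    per_row = 2.0 ** -bits
    # 1 - (1 - per_row)**changed_rows, computed in a numerically stable way.
    return -math.expm1(changed_rows * math.log1p(-per_row))


for bits in (32, 64, 128):
    print(bits, p_missed_change(10**9, bits))
```

Under this model, a billion changed rows checked with a 32-bit
checksum give roughly a one-in-five chance that at least one change
goes unnoticed, while at 128 bits (the width of an MD5 hash) the
chance is negligible.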

