Обсуждение: [PATCH] Add crc32(text) & crc32(bytea)
Hi, While answering one of the recent questions [1] I wanted to use crc32(text) and discovered that it's missing out-of-the box. Of course, one can use `substr(md5(x), 1, 8)` with almost the same effect but it's less convenient and could be slower (I didn't do actual benchmarks though). Also it's incompatible with third-party software that may calculate crc32's and store the results in PostgreSQL. I vaguely recall that I faced this problem before. Supporting crc32 was requested on the mailing list [2] and a number of workarounds exist in PL/pgSQL [3][4]. Since there seems to be a demand and it costs us nothing to maintain crc32() I suggest adding it. The proposed patch exposes our internal crc32 implementation to the user. I chose to return a hex string similarly to md5(). In my humble experience this is most convenient in practical use. However if the majority believes that the function should return a bigint (in order to fit an unsigned int32) or a bytea (as SHA* functions do), I'm fine with whatever consensus the community reaches. [1]: https://www.postgresql.org/message-id/CAJ7c6TOurV4uA5Yz%3DaJ-ae4czL_zdFNqxbu47eyVrYFefrWoog%40mail.gmail.com [2]: https://www.postgresql.org/message-id/flat/auto-000557707157%40umail.ru [3]: https://stackoverflow.com/questions/28179335/crc32-function-with-pl-pgsql [4]: https://gist.github.com/cuber/bcf0a3a96fc9a790d96d -- Best regards, Aleksander Alekseev
Вложения
On Thu, Jul 18, 2024 at 02:24:23PM +0300, Aleksander Alekseev wrote: > I vaguely recall that I faced this problem before. Supporting crc32 > was requested on the mailing list [2] and a number of workarounds > exist in PL/pgSQL [3][4]. Since there seems to be a demand and it > costs us nothing to maintain crc32() I suggest adding it. This sounds generally reasonable to me, especially given the apparent demand. Should we also introduce crc32c() while we're at it? -- nathan
Hi, > This sounds generally reasonable to me, especially given the apparent > demand. Should we also introduce crc32c() while we're at it? Might be a good idea. However I didn't see a demand for crc32c() SQL function yet. Also I'm not sure whether the best interface for it would be crc32c() or crc32(x, version='c') or perhaps crc32(x, polinomial=...). I propose keeping the scope small this time. -- Best regards, Aleksander Alekseev
On Fri, Jul 26, 2024 at 12:01:40PM +0300, Aleksander Alekseev wrote: >> This sounds generally reasonable to me, especially given the apparent >> demand. Should we also introduce crc32c() while we're at it? > > Might be a good idea. However I didn't see a demand for crc32c() SQL > function yet. Also I'm not sure whether the best interface for it > would be crc32c() or crc32(x, version='c') or perhaps crc32(x, > polinomial=...). I propose keeping the scope small this time. I don't think adding crc32c() would sufficiently increase the scope. We'd use the existing implementations for both crc32() and crc32c(). And besides, this could be useful for adding tests for that code. + <function>crc32</function> ( <type>text</type> ) Do we need a version of the function that takes a text input? It's easy enough to cast to a bytea. + <returnvalue>text</returnvalue> My first reaction is that we should just have this return bytea like the SHA ones do, if for no other reason than commit 10cfce3 seems intended to move us away from returning text for these kinds of functions. Upthread, you mentioned the possibility of returning a bigint, too. I think I'd still prefer bytea in case we want to add, say, crc64() or crc16() in the future. That would allow us to keep all of these functions consistent instead of returning different types for each. However, I understand that returning the numeric types might be more convenient. I'm curious what others think about this. + Computes the CRC32 <link linkend="functions-hash-note">hash</link> of + the binary string, with the result written in hexadecimal. I'm not sure we should call the check values "hashes." Wikipedia does include them in the "List of hash functions" page [0], but it seems to deliberately avoid calling them hashes in the CRC page [1]. I'd suggest calling them "CRC32 values" instead. [0] https://en.wikipedia.org/wiki/List_of_hash_functions [1] https://en.wikipedia.org/wiki/Cyclic_redundancy_check -- nathan