Re: [HACKERS] Hash Functions

Поиск
Список
Период
Сортировка
От amul sul
Тема Re: [HACKERS] Hash Functions
Дата
Msg-id CAAJ_b96jYefFEXUgVqhLbNh=fM9NfXcx6RLktr9nZC-WNvsucg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Hash Functions  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, Aug 30, 2017 at 9:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 30, 2017 at 10:43 AM, amul sul <sulamul@gmail.com> wrote:
> Thanks for the suggestion, I have updated 0002-patch accordingly.
> Using this I found some strange behaviours as follow:
>
> 1) standard and extended0 output for the jsonb_hash case is not same.
> 2) standard and extended0 output for the hash_range case is not same when
> input
>    is int4range(550274, 1550274)  other case in the patch are fine. This can
> be
>    reproducible with other values as well, for e.g. int8range(1570275,
> 208112489).
>
> Will look into this tomorrow.

Those sound like bugs in your patch.  Specifically:

+    /* Same approach as hash_range */
+    result = hash_uint32_extended((uint32) flags, seed);
+    result ^= lower_hash;
+    result = (result << 1) | (result >> 63);
+    result ^= upper_hash;
Yes, you are correct.​
 

That doesn't give compatible results.  The easiest thing to do might
be to rotate the high 32 bits and the low 32 bits separately.
JsonbHashScalarValueExtended has the same problem.  Maybe create a
helper function that does something like this (untested):

((x << 1) & UINT64COUNT(0xfffffffefffffffe)) | ((x >> 31) &
UINT64CONST(0x100000001))
This working as expected,  also tested by executing the following SQL multiple times: 

SELECT v as value, hash_range(v)::bit(32) as standard,
       hash_range_extended(v, 0)::bit(32) as extended0,
       hash_range_extended(v, 1)::bit(32) as extended1
FROM   (VALUES (int8range(floor(random() * 100)::int8, floor(random() * 1000)::int8)),
        (int8range(floor(random() * 1000)::int8, floor(random() * 10000)::int8)),
        (int8range(floor(random() * 10000)::int8, floor(random() * 100000)::int8)),
         (int8range(floor(random() * 10000000)::int8, floor(random() * 100000000)::int8)),
         (int8range(floor(random() * 100000000)::int8, floor(random() * 1000000000)::int8))) x(v)
WHERE  hash_range(v)::bit(32) != hash_range_extended(v, 0)::bit(32)
       OR hash_range(v)::bit(32) = hash_range_extended(v, 1)::bit(32);  ​

 
> Another case which I want to discuss is, extended and standard version of
> hashfloat4, hashfloat8 & hash_numeric function will have the same output for
> zero
> value irrespective of seed value. Do you think we need a fix for this?

Yes, I think you should return the seed rather than 0 in the cases
where the current code hard-codes a 0 return.  That will give the same
behavior in the seed == 0 case while cheaply getting us something a
bit different when there is a seed.

​Fixed in the attached version.​

 
BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this patch.

Added i
n the attached version
​.​

Thanks for your all the suggestions.

Regards,
Amul
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] log_destination=file
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: [HACKERS] pgbench tap tests & minor fixes.