Re: hash function in Postgres

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: hash function in Postgres
Дата
Msg-id 54D04808.3040609@BlueTreble.com
обсуждение исходный текст
Ответ на Re: hash function in Postgres  (Adrian Klaver <adrian.klaver@aklaver.com>)
Список pgsql-general
On 1/24/15 9:03 AM, Adrian Klaver wrote:
> On 01/23/2015 10:42 PM, Ravi Kiran wrote:
>> hi,
>>
>>
>> I want to know what kind of hash function postgresql uses while joining.
>> I was debugging through gdb, I found out that it is not using bob
>> jenkins hash function but a different hash function *hash_uint32() and
>> hash_any() *functions if the joining attribute is an integer, and a
>> different kind of hash function for a different type of joining
>> attribute.
>>
>> I want to know whether the hash functions will change if the number of
>> tuples in the table are very large or very low, and if it changes,
>> please tell me what hash function it uses if the tuples are very large
>> and what hash function it uses if the number is very low, also I came to
>> know that the hash function will change depending on the type of the
>> attribute on which the join takes place, but will it always remains the
>> same for the integer type of joining attribute or will it it change.
>
> Seems a good starting point is here:
>
>
http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;h=6e44d337d3dc3e8d5439e2dbbc54ce0c33d3b5d4;hb=8ca336f4ac3f08a5f23e76c6e9a5f2c8064f5883

http://git.postgresql.org/gitweb/?p=postgresql.git;a=tree;f=src/backend/access/hash;hb=HEAD
is even better, since that's HEAD.

To answer your question, AFAIK there's no consideration made for the
number of tuples, and I rather doubt that would help much. If you want
to go mucking around with this I think it'd be more useful to look at
using a different hash algorithm for types that could be significant in
size (namely varchar, text and bytea). There are algorithms newer than
Jenkins that look to be significantly faster on larger volumes of data.

You might find https://github.com/decibel/hash_testing useful; it's way
to test hash performance of several hashes.
https://github.com/decibel/postgres/tree/xxhash is where I pulled xxhash
into Postgres to try using it for hashing shared buffer identifiers. It
didn't help there, but it might be faster than Jenkins for things like
hashing text fields.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Synchronous archiving
Следующее
От: Juan Pablo L
Дата:
Сообщение: Re: array in a store procedure in C