Олег Самойлов wrote:
> \set table_size 1000000
> begin;
> create table gender (gender varchar);
>
> insert into gender (gender) select case when random < 0.50 then 'female' when random < 0.99 then 'male' else 'other' end
> from (select random() as random, generate_series(1, :table_size)) as subselect;
>
> create index gender_btree on gender using btree (gender);
> create index gender_hash on gender using hash (gender);
> commit;
> vacuum full analyze;
>
> Vacuum full is not necessary here, just a little voodoo programming. I expected that the hash index would be much
> smaller and quicker than the btree index, because it doesn't keep values inside itself, only hashes. But:
>
> => \d+
> List of relations
> Schema | Name | Type | Owner | Size | Description
> --------+--------+-------+-------+-------+-------------
> public | gender | table | olleg | 35 MB |
> (1 row)
>
> => \di+
> List of relations
> Schema | Name | Type | Owner | Table | Size | Description
> --------+--------------+-------+-------+--------+-------+-------------
> public | gender_btree | index | olleg | gender | 21 MB |
> public | gender_hash | index | olleg | gender | 47 MB |
> (2 rows)
>
> The hash index is not only larger than the btree index, but also bigger than the table itself. What is wrong with
> the hash index?
I guess the problem here is that there are so few distinct values that
all the index items end up in only three hash buckets, forming large
linked lists.
I can't tell off-hand why that would make the index so large though.
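One way to look inside the index is pgstathashindex() from the
pgstattuple contrib extension (available since PostgreSQL 10). It
reports bucket, overflow and bitmap pages, so a large overflow_pages
count would support the overflowing-buckets theory. A sketch, using
the index name from your test:

```sql
-- requires the pgstattuple contrib extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- page breakdown of the hash index: with only three distinct values,
-- most of the space should show up as overflow pages
SELECT bucket_pages, overflow_pages, bitmap_pages, unused_pages, live_items
FROM pgstathashindex('gender_hash');
```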
Anyway, indexes are pretty useless in such a case.
Is the behavior the same if you have many distinct values?
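For comparison, a high-cardinality version of the same test could look
like this (a sketch; the table and index names are made up, and
md5(i::text) just produces a distinct value per row):

```sql
\set table_size 1000000
begin;
create table words (word text);
-- one distinct value per row instead of three in total
insert into words (word)
  select md5(i::text) from generate_series(1, :table_size) as s(i);
create index words_btree on words using btree (word);
create index words_hash on words using hash (word);
commit;
```

If the hash index comes out smaller than the btree index there, that
would confirm that the bloat is specific to the low-cardinality case.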
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com