Re: [HACKERS] Multicolumn hash indexes

Поиск
Список
Период
Сортировка
От Tomasz Ostrowski
Тема Re: [HACKERS] Multicolumn hash indexes
Дата
Msg-id 3d44a329-7ced-6d43-06c3-4a2d0bb62f38@ato.waw.pl
обсуждение исходный текст
Ответ на Re: [HACKERS] Multicolumn hash indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] Multicolumn hash indexes  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 09/27/2017 05:57 PM, Tom Lane wrote:
> If we follow GIST's lead that the leading column is "most important",
> the idea could be to require a search constraint on the first column,
> which produces the hash that determines the bucket assignment.  Hashes
> for additional columns would just be payload data in the index entry.
> If you have search constraint(s) on low-order column(s), you can check
> for hash matches before visiting the heap, but they don't reduce how
> much of the index you have to search.  Even btree works that way for
> many combinations of incomplete index constraints.

I feel that this would eliminate a large amount of potential gains from 
such an index. This would be usable only when a sufficiently variable 
column exists, in which case a simple hash index on the column wouldn't 
be much worse.

But I have an idea. What if there was a requirement for the search 
criteria to use tuple equality comparison:where (a,b,c)=(?,?,?)
orwhere (a,b,c) in ((?,?,?),(?,?,?),(?,?,?),(?,?,?))

Wouldn't it eliminate problems with disappearing conditions?

Actually maybe this could be implemented just like a functional index. 
So it would implement reasonably something that otherwise would be a 
terribly hackish and slow solution like:
create or replace function hashhack(a bytea, b bytea)returns bigintlanguage sqlimmutableas $$    -- uses 'x1e' (record
separator)   -- to ensure hashhack('a','')!=hashhack('','a')    select (      'x'      ||      substr(
md5($1||'\x1e'::bytea||$2),      1,       16      )    )::bit(64)::bigint;$$;
 
create index t_hashhack_a_b_idx  on t( hashhack(a::bytea,b::bytea) );
        select * from t  where a='a' and b='b'      and    hashhack(a::bytea, b::bytea)    =
hashhack('a'::bytea,'b'::bytea);

If if was automatic man could avoid the overhead of converting data to 
bytea/string, concatenating, truncating, converting back to bigint, 
rechecking condition etc. that make this kind of hack not very sane.

Even providing a specially crafted function or operator for queries 
specifically targeted for the index would be quite sufficient:where pg_equal( (a,b,c), (?,?,?) );

-- 
Tomasz "Tometzky" Ostrowski


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Use of RangeVar for partitioned tables in autovacuum
Следующее
От: Nico Williams
Дата:
Сообщение: Re: [HACKERS] Multicolumn hash indexes