Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

От: Mischa Sandberg
Тема: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
Дата: ,
Msg-id: 1115748765.4280f99d52dd5@webmail.telus.net
(см: обсуждение, исходный текст)
Ответ на: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby")
Список: pgsql-performance

Скрыть дерево обсуждения

"Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (Ying Lu, )
 Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (Neil Conway, )
  Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (Christopher Petrilli, )
   Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
    Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
     Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
      Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
       Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
        Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )
            Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
             Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )
              Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
               Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )
                Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
                 Federated PG servers -- Was: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
            Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
        Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mark Lewis, )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
       Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Bruce Momjian, )
 Re: "Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (David Roussel, )
 Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
  Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
 Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )

Quoting "Jim C. Nasby" <>:

> I'm not really familiar enough with hash indexes to know if this
> would
> work, but if the maximum bucket size was known you could use that to
> determine a maximum range of buckets to look at. In some cases, that
> range would include only one bucket, otherwise it would be a set of
> buckets. If you found a set of buckets, I think you could then just
> go
> to the specific one you need.
>
> If we assume that the maximum bucket size is one page it becomes
> more
> realistic to take an existing large bucket and split it into several
> smaller ones. This could be done on an update to the index page, or
> a
> background process could handle it.
>
> In any case, should this go on the TODO list?
>
> > Allowing a bucket size to be specified at CREATE INDEX doesn't seem
> out
> > of line though.  We'd have to think up a scheme for
> index-AM-specific
> > index parameters ...
> --
> Jim C. Nasby, Database Consultant               
> Give your computer some brain candy! www.distributed.net Team #1828

Google "dynamic hash" or "linear hash". It takes care of not needing to
have varying bucket sizes.

Hash indexes are useful if you ALWAYS require disk access; they behave
like worst-case random cache-thrash tests. That's probably why dbs have
gravitated toward tree indexes instead. On the other hand, there's more
(good) to hashing than initially meets the eye.

Dynamic multiway hashing has come a long way from just splicing the bits
together from multiple columns' hash values. If you can lay your hands
on Tim Merrett's old text "Relational Information Systems", it's an
eye-opener. Picture an efficient terabyte spreadsheet.

For one thing, unlike a btree, a multicolumn hash is symmetric: it
doesn't matter which column(s) you do not specify in a partial match.

For another, a multiway hash is useful for much lower selectivity than a
btree. I built such indexes for OLAP cubes, and some dimensions were
only 10 elements wide. At the point where btree indexing becomes worse
than seqscan, a multiway hash tells you which 10% of the disk to scan.



В списке pgsql-performance по дате сообщения:

От: Greg Stark
Дата:
Сообщение: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
От: Christopher Kings-Lynne
Дата:
Сообщение: Re: Partitioning / Clustering