Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

От: Tom Lane
Тема: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
Дата: ,
Msg-id: 4407.1115698257@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway)
Ответы: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway)
Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby")
Список: pgsql-performance

Скрыть дерево обсуждения

"Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (Ying Lu, )
 Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (Neil Conway, )
  Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (Christopher Petrilli, )
   Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
    Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
     Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
      Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
       Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
        Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )
            Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
             Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )
              Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
               Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )
                Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
                 Federated PG servers -- Was: Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
            Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
        Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  ("Jim C. Nasby", )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
         Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mark Lewis, )
          Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
           Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Mischa Sandberg, )
       Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Bruce Momjian, )
 Re: "Hash index" vs. "b-tree index" (PostgreSQL 8.0)  (David Roussel, )
 Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Tom Lane, )
  Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Neil Conway, )
 Re: [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL  (Greg Stark, )

Neil Conway <> writes:
> Jim C. Nasby wrote:
>>> No, hash joins and hash indexes are unrelated.
>> I know they are now, but does that have to be the case?

> I mean, the algorithms are fundamentally unrelated. They share a bit of
> code such as the hash functions themselves, but they are really solving
> two different problems

In fact, up till fairly recently they didn't even share the hash
functions.  Which was a bug not a feature, but the fact remains ---
there's not a lot of commonality.

>> Like I said, I don't know the history, so I don't know why we even
>> have them to begin with.

I think it's largely because some Berkeley grad student had a need to
implement hash indexes for academic reasons ;-)

> As I said, the idea of using hash indexes for better performance on
> equality scans is perfectly valid, it is just the implementation that
> needs work.

I was thinking about that earlier today.  It seems to me there is a
window within which hash indexes are theoretically superior, but it
might be pretty narrow.  The basic allure of a hash index is that you
look at the search key, do some allegedly-trivial computations, and go
directly to the relevant index page(s); whereas a btree index requires
descending through several upper levels of index pages to reach the
target leaf page.  On the other hand, once you reach the target index
page, a hash index has no better method than linear scan through all
the page's index entries to find the actually wanted key(s); in fact you
have to search all the pages in that index bucket.  A btree index can
use binary search to find the relevant items within the page.

So it strikes me that important parameters include the index entry size
and the number of entries matching any particular key value.

btree will probably win for smaller keys, on two grounds: it will have
fewer tree levels to descend through, because of higher fan-out, and it
will be much more efficient at finding the target entries within the
target page when there are many entries per page.  (As against this,
it will have to work harder at each upper tree page to decide where to
descend to --- but I think that's a second-order consideration.)

hash will tend to win as the number of duplicate keys increases, because
its relative inefficiency at finding the matches within a particular
bucket will become less significant.  (The ideal situation for a hash
index is to have only one actual key value per bucket.  You can't really
afford to store only one index entry per bucket, because of the sheer
I/O volume that would result, so you need multiple entries that will all
be responsive to your search.)  (This also brings up the thought that
it might be interesting to support hash buckets smaller than a page ...
but I don't know how to make that work in an adaptive fashion.)

I suspect that people haven't looked hard for a combination of these
parameters within which hash can win.  Of course the real question for
us is whether the window is wide enough to justify the maintenance
effort for hash.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: Alex Stapleton
Дата:
Сообщение: Partitioning / Clustering
От: Matt Olson
Дата:
Сообщение: Prefetch