Discussion: Re: On-disk bitmap index patch


Re: On-disk bitmap index patch

From:
"Luke Lonergan"
Date:
I think we do know, have you reviewed the results in the briefing?


- Luke

Sent from my GoodLink synchronized handheld (www.good.com)

-----Original Message-----
From:     mark@mark.mielke.cc [mailto:mark@mark.mielke.cc]
Sent:    Tuesday, July 25, 2006 01:09 AM Eastern Standard Time
To:    Tom Lane
Cc:    Bruce Momjian; Jie Zhang; Hannu Krosing; Gavin Sherry; pgsql-hackers@postgresql.org; Luke Lonergan
Subject:    Re: [HACKERS] On-disk bitmap index patch

On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> > Reading 1/4, for a larger table, has a good chance of being faster than
> > reading 4/4 of the table. :-)
> Really?
>
> If you have to hit one tuple out of four, it's pretty much guaranteed
> that you will need to fetch every heap page.  So using an index provides
> zero I/O savings on the heap side, and any fetches needed to read the
> index are pure cost.  Now you have to demonstrate that the CPU costs
> involved in processing the index are significantly cheaper than the cost
> of just testing the WHERE qual at every heap tuple --- not a bet that's
> likely to win at a one-in-four ratio.

Haha. Of course - but that's assuming uniform spread of the values.
Next I would try clustering the table on the bitmap index... :-)

My databases aren't as large as many of yours. Most or all of them
will fit in 1 Gbyte of RAM. The I/O cost isn't substantial for these,
but the WHERE-clause cost might be.

But yeah - we don't know whether it's a waste of code or a performance boost.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them
 all and in the darkness bind them...
                          http://mark.mielke.cc/





Re: On-disk bitmap index patch

From:
Tom Lane
Date:
"Luke Lonergan" <LLonergan@greenplum.com> writes:
> I think we do know, have you reviewed the results in the briefing?

I find those results moderately unconvincing, primarily because they
are based on choosing the least efficient possible data representation
(viz char(n)), and thus the btree indexes suffer severe and quite
artificial bloat.  A database schema chosen with even minimal attention
to PG-specific tuning would probably have btree indexes half the size.
That wouldn't completely eliminate the performance differential shown,
but it would bring it down into the ballpark where you have to question
whether it makes sense to support another index AM.
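
A quick way to see that padding effect for yourself (a rough sketch; the
table and index names are made up for illustration):

    CREATE TABLE pad_test (c_fixed char(25), c_var varchar(25));
    INSERT INTO pad_test
        SELECT 'x', 'x' FROM generate_series(1, 1000000);
    CREATE INDEX pad_fixed_idx ON pad_test (c_fixed);  -- keys blank-padded to 25 chars
    CREATE INDEX pad_var_idx   ON pad_test (c_var);    -- keys stay 1 char long

    SELECT relname, pg_relation_size(oid) AS bytes
      FROM pg_class
     WHERE relname IN ('pad_fixed_idx', 'pad_var_idx');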

The reason I have such high sales resistance is that we've carried the
hash and rtree AMs for years, hoping that someone would do the work to
make them actually worth using, with little result.  I don't want any
more second-class-citizen index AMs, and that's why I'm questioning
what the scope of applicability of bitmap indexes really is.  They need
to be popular enough that people will be motivated to work on them,
or they shouldn't be in core.
        regards, tom lane


Re: On-disk bitmap index patch

From:
Hannu Krosing
Date:
On Tue, 2006-07-25 at 13:06, Tom Lane wrote:
> "Luke Lonergan" <LLonergan@greenplum.com> writes:
> > I think we do know, have you reviewed the results in the briefing?
> 
> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat. 

Hmm, maybe this should be fixed in btree then?

Do we really need to store padding blanks in btree?

> A database schema chosen with even minimal attention
> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.

It still depends on your data volumes. If you would otherwise have to spend
lots and lots of $ on hardware just to store surplus index bloat, another
index AM may well be worth it.

Remember that the Bizgres folks develop these things for real-world
data warehousing, where saving a few (tens or hundreds of) terabytes of
storage and a corresponding amount of RAM is a real win.

> The reason I have such high sales resistance is that we've carried the
> hash and rtree AMs for years, hoping that someone would do the work to
> make them actually worth using, with little result.

What would be the use-case for hash indexes? And what should be done to
make them faster than btree? I know that they are not WAL-logged, but
this is beside the point for DWH apps.

And wasn't the rtree index recently deprecated in favour of a GiST
implementation of the same?

> I don't want any
> more second-class-citizen index AMs, and that's why I'm questioning
> what the scope of applicability of bitmap indexes really is. They need
> to be popular enough that people will be motivated to work on them,
> or they shouldn't be in core.

Is there an easy way to put them into contrib/ for some test period so
that they can become popular among mainstream PostgreSQL users?

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: On-disk bitmap index patch

From:
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> On Tue, 2006-07-25 at 13:06, Tom Lane wrote:
>> The reason I have such high sales resistance is that we've carried the
>> hash and rtree AMs for years, hoping that someone would do the work to
>> make them actually worth using, with little result.

> What would be the use-case for hash indexes ? And what should be done to
> make them faster than btree ?

If we knew, we'd do it ;-)  But no one's put enough effort into it
to find out.

> and was'nt the rtree index recently deprecated in favour of GIST
> implementation of the same ?

Yeah, we finally gave up on rtree entirely.  I don't want to see any
other index AMs languishing in the closet like that.  If bitmap can
hold its own to the extent that people are interested in working on
it/improving it, then great, but I'm worried that it's not going to
have a wide enough use-case to attract maintainers.
        regards, tom lane


Re: On-disk bitmap index patch

From:
"Luke Lonergan"
Date:
Tom,

On 7/25/06 1:31 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Yeah, we finally gave up on rtree entirely.  I don't want to see any
> other index AMs languishing in the closet like that.  If bitmap can
> hold its own to the extent that people are interested in working on
> it/improving it, then great, but I'm worried that it's not going to
> have a wide enough use-case to attract maintainers.

How do we close the gap?

I think Jie is interested in maintaining it, and we're looking to extend the
range of applications for both the AM and extensions that use the raw bitmap
comparators made available to the executor.  This should be just the start
of some really great work on speedy access using bitmaps.

Even as it sits, the on-disk bitmap is over 100x faster in the cases it
suits, and other commercial DBMSs have had this popular feature for
years.

- Luke 




Re: On-disk bitmap index patch

From:
"Ayush Parashar"
Date:
> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat.  A database schema chosen with even minimal attention
The specific data was chosen for the presentation because it comes from the
TPC-H definition (data that everybody understands), and those were the only
columns where a bitmap index would be more beneficial (i.e. the
low-cardinality columns).

> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.
Below is a comparison of index creation time and index size for similar
cardinalities on *integer* and *numeric* data types, for b-tree and bitmap
indexes. Benefits are seen in all cases: index creation time, space taken by
the index, and query time:
Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:

Column |       Type        | Cardinality
--------+-------------------+------------
c1     | integer           |
i10    | integer           | 10
i50    | integer           | 50
n10    | numeric           | 10
n50    | numeric           | 50
ctext  | character varying |
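
For reference, the setup can be sketched roughly like this (the USING bitmap
clause assumes the Bizgres on-disk bitmap AM is installed; the index names
are made up):

    CREATE TABLE btree_test (
        c1    integer,
        i10   integer,            -- 10 distinct values
        i50   integer,            -- 50 distinct values
        n10   numeric,            -- 10 distinct values
        n50   numeric,            -- 50 distinct values
        ctext character varying
    );

    -- one index of each kind per low-cardinality column, e.g. for i10:
    CREATE INDEX i10_btree  ON btree_test USING btree  (i10);
    CREATE INDEX i10_bitmap ON btree_test USING bitmap (i10);

    -- size in kbytes, as reported below
    SELECT relname, pg_relation_size(oid) / 1024 AS kbytes
      FROM pg_class
     WHERE relname IN ('i10_btree', 'i10_bitmap');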


Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):

-------------------------------------------------------------------------
Column  B-tree size     Bitmap size     B-tree time     Bitmap time
-------------------------------------------------------------------------
i10     35056           2392            11.8            6.0
i50     35056           4328            10.8            6.4
n10     52264           2656            34.8            9.6
n50     52616           4272            34.3            11.7
-------------------------------------------------------------------------

Query timings (in seconds):
-------------------------------------------------------------------------
Query                                           B-tree          Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where           4.08            1.61
i10=5 and i50=2 and  n10=0.1 and n50=0.05;
-------------------------------------------------------------------------

This test case fits in memory; the benefits would be even larger with a
bigger test case.
I would like to reiterate the benefits of bitmap indexes:
1. The size of a bitmap index, and the time to create it, are smaller than
for a b-tree index on a low-cardinality column of any data type.
2. The gain in query performance can be attributed to 'Bitmap And' and
'Bitmap Or' operations being more efficient for bitmap indexes as compared
to b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method; however, the behavior is different for each. With a
b-tree index, the b-tree indexes are scanned to create a temporary in-memory
bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
'Bitmap And' and 'Bitmap Or' operators. The smaller size of the bitmap
indexes combined with the efficiency of the And and Or operators are the
reasons for the increase in query speed.
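
To see the plan shape described above, one can simply EXPLAIN the test query;
with either index type the planner can combine the per-column bitmaps with a
'Bitmap And' node feeding a Bitmap Heap Scan (a sketch only, plan output
omitted):

    EXPLAIN
    SELECT count(*) FROM btree_test
     WHERE i10 = 5 AND i50 = 2 AND n10 = 0.1 AND n50 = 0.05;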

Ayush




Hash indexes (was: On-disk bitmap index patch)

From:
Jim Nasby
Date:
On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
>> On Tue, 2006-07-25 at 13:06, Tom Lane wrote:
>>> The reason I have such high sales resistance is that we've
>>> carried the
>>> hash and rtree AMs for years, hoping that someone would do the
>>> work to
>>> make them actually worth using, with little result.
>
>> What would be the use-case for hash indexes ? And what should be
>> done to
>> make them faster than btree ?
>
> If we knew, we'd do it ;-)  But no one's put enough effort into it
> to find out.

Do they use the same hash algorithm as hash joins/aggregation? If so,
wouldn't hash indexes be faster for those operations than regular
indexes?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461




Re: Hash indexes (was: On-disk bitmap index patch)

From:
Alvaro Herrera
Date:
Jim Nasby wrote:
> On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> >Hannu Krosing <hannu@skype.net> writes:

> >>What would be the use-case for hash indexes ? And what should be
> >>done to make them faster than btree ?
> >
> >If we knew, we'd do it ;-)  But no one's put enough effort into it
> >to find out.
> 
> Do they use the same hash algorithm as hash joins/aggregation? If so,  
> wouldn't hash indexes be faster for those operations than regular  
> indexes?

The main problem doesn't seem to be in the hash algorithm (which I
understand to mean the hashing function), but in the protocol for
concurrent access of index pages, and the distribution of keys in pages
of a single hash key.

This is described in a README file or a code comment somewhere in the
hash AM code.  Someone needs to do some profiling to find out what the
bottleneck really is, and ideally find a way to fix it.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Hash indexes (was: On-disk bitmap index patch)

From:
"Jim C. Nasby"
Date:
On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> Jim Nasby wrote:
> > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > >Hannu Krosing <hannu@skype.net> writes:
> 
> > >>What would be the use-case for hash indexes ? And what should be
> > >>done to make them faster than btree ?
> > >
> > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > >to find out.
> > 
> > Do they use the same hash algorithm as hash joins/aggregation? If so,  
> > wouldn't hash indexes be faster for those operations than regular  
> > indexes?
> 
> The main problem doesn't seem to be in the hash algorithm (which I
> understand to mean the hashing function), but in the protocol for
> concurrent access of index pages, and the distribution of keys in pages
> of a single hash key.
> 
> This is described in a README file or a code comment somewhere in the
> hash AM code.  Someone needs to do some profiling to find out what the
> bottleneck really is, and ideally find a way to fix it.

What I'm getting at is that I've never seen any explanation for the
theoretical use cases where a hash index would outperform a btree. If we
knew what kind of problems hash indexes were supposed to solve, we could
try and interest people who are solving those kinds of problems in
fixing hash indexes.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Hash indexes (was: On-disk bitmap index patch)

From:
Alvaro Herrera
Date:
Jim C. Nasby wrote:

> What I'm getting at is that I've never seen any explanation for the
> theoretical use cases where a hash index would outperform a btree. If we
> knew what kind of problems hash indexes were supposed to solve, we could
> try and interest people who are solving those kinds of problems in
> fixing hash indexes.

The btree index needs to descend potentially many pages before getting
to the leaf page, where the actual index is stored.  The hash index can
get at the "leaf" node in --supposedly-- one fetch.  Btree is O(logN) to
get a single key, while hash is O(1).  Our problem lies in the
constants; for btree they are smaller than for hash, so in practice
that O(logN) is always smaller than O(1).

I've heard other database systems manage to have hash indexes that are
actually faster than btree, so either (1) our btree absolutely rocks, or
(2) their hash implementations are better (probably both).
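
A rough way to measure those constants on your own machine (a sketch with
made-up names; keep in mind hash indexes are not WAL-logged, so this is for
testing only):

    CREATE TABLE lookup_test AS
        SELECT generate_series(1, 1000000) AS k;

    CREATE INDEX lookup_btree ON lookup_test USING btree (k);
    CREATE INDEX lookup_hash  ON lookup_test USING hash  (k);

    -- probe a single key; drop one index at a time to force each method
    EXPLAIN ANALYZE SELECT * FROM lookup_test WHERE k = 123456;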

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Hash indexes (was: On-disk bitmap index patch)

From:
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The btree index needs to descend potentially many pages before getting
> to the leaf page, where the actual index is stored.  The hash index can
> get at the "leaf" node in --supposedly-- one fetch.  Btree is O(logN) to
> get a single key, while hash is O(1).  Our problem lies in the
> constants; for btree they are smaller than for hash, so in practice
> that O(logN) is always smaller than O(1).

> I've heard other database systems manage to have hash indexes that are
> actually faster than btree, so either (1) our btree absolutely rocks, or
> (2) their hash implementations are better (probably both).

I think the problem may well be that we use hash buckets that are too
large (ie, whole pages).  After we fetch the page, we have to grovel
through every tuple on it to find the one(s) that really match the
query, whereas btree has a much more intelligent strategy (viz binary
search) to do its intrapage searches.  Smaller buckets would help make
up for this.

Another issue is that we don't store the raw hashcode in the index
tuples, so the only way to test a tuple is to actually invoke the
datatype equality function.  If we stored the whole 32-bit hashcode
we could eliminate non-matching hashcodes cheaply.  I'm not sure how
painful it'd be to do this though ... hash uses the same index tuple
layout as everybody else, and so there's no convenient place to put
the hashcode.

Anyway the bottom line here is that no one's tried hard to fix it,
but there are certainly things that might help.
        regards, tom lane


Re: Hash indexes (was: On-disk bitmap index patch)

From:
"Jim C. Nasby"
Date:
On Fri, Jul 28, 2006 at 03:14:33PM -0400, Alvaro Herrera wrote:
> Jim C. Nasby wrote:
> 
> > What I'm getting at is that I've never seen any explanation for the
> > theoretical use cases where a hash index would outperform a btree. If we
> > knew what kind of problems hash indexes were supposed to solve, we could
> > try and interest people who are solving those kinds of problems in
> > fixing hash indexes.
> 
> The btree index needs to descend potentially many pages before getting
> to the leaf page, where the actual index is stored.  The hash index can
> get at the "leaf" node in --supposedly-- one fetch.  Btree is O(logN) to
> get a single key, while hash is O(1).  Our problem lies in the
> constants; for btree they are smaller than for hash, so in practice
> that O(logN) is always smaller than O(1).
> 
> I've heard other database systems manage to have hash indexes that are
> actually faster than btree, so either (1) our btree absolutely rocks, or
> (2) their hash implementations are better (probably both).

In that case, perhaps this is something Greenplum might be interested
in, since it might fit nicely between bitmap and btree indexes.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: On-disk bitmap index patch

From:
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Luke Lonergan
> Sent: Friday, July 28, 2006 12:18 PM
> To: Jim C. Nasby; Jie Zhang
> Cc: Tom Lane; Mark Kirkwood; Josh Berkus; Gavin Sherry; pgsql-
> hackers@postgresql.org
> Subject: Re: [HACKERS] On-disk bitmap index patch
>
> Jim,
>
> On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:
>
> > If the usefulness of bitmap indexes is still in doubt, could someone at
> > Greenplum provide data from actual data warehouses from actual
> > customers?
>
> First, is anyone in doubt?

Others have looked into the usefulness of bitmap indexes.  Here is what
they found:
http://www.oracle.com/technology/pub/articles/sharma_indexes.html
http://citeseer.ist.psu.edu/stockinger02bitmap.html

Oracle, IBM, and even Microsoft[1] support them.  Probably not just to
be trendy.

[1] Microsoft SQL Server creates temporary bitmap indexes during some
queries, though you cannot declaratively create a bitmap index.
> - Luke
>


Re: On-disk bitmap index patch

From:
Tom Lane
Date:
"Dann Corbit" <DCorbit@connx.com> writes:
> Others have looked into the usefulness of bitmap indexes.  Here is what
> they found:
> http://www.oracle.com/technology/pub/articles/sharma_indexes.html

I like this guy's style of argument: he admits a bitmap index on a
unique column will be much bigger than a btree, and then airily
dismisses it as not a problem.  Not real convincing.

> http://citeseer.ist.psu.edu/stockinger02bitmap.html

Both of these pages say up front that they are considering read-only
data.  So one of the questions that has to be answered (and the
submitters have been entirely mum about) is exactly how bad is the
update performance?  If it's really awful that's going to constrain
the use cases quite a lot, whereas merely "a bit slower than btree"
wouldn't be such a problem.

In any case, arguing that other DBs find it's a win will cut no ice
with me.  See adjacent discussion about hash indexes --- those *ought*
to be a win, but they aren't in Postgres, for reasons that are still
guesses.  The translation gap between other DBs' experience and ours
can be large.
        regards, tom lane


Re: On-disk bitmap index patch

From:
Bruce Momjian
Date:
What we don't want to happen is for us to release bitmapped indexes, and
find out later that btree is better in all cases.  Then we have to tell
people not to use bitmapped indexes until we fix it in the next major
release.  FYI, that is basically where we are right now with hash
indexes.

---------------------------------------------------------------------------

Tom Lane wrote:
> "Dann Corbit" <DCorbit@connx.com> writes:
> > Others have looked into the usefulness of bitmap indexes.  Here is what
> > they found:
> > http://www.oracle.com/technology/pub/articles/sharma_indexes.html
> 
> I like this guy's style of argument: he admits a bitmap index on a
> unique column will be much bigger than a btree, and then airily
> dismisses it as not a problem.  Not real convincing.
> 
> > http://citeseer.ist.psu.edu/stockinger02bitmap.html
> 
> Both of these pages say up front that they are considering read-only
> data.  So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance?  If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.
> 
> In any case, arguing that other DBs find it's a win will cut no ice
> with me.  See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses.  The translation gap between other DBs' experience and ours
> can be large.
> 
>             regards, tom lane
> 

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: On-disk bitmap index patch

From:
"Luke Lonergan"
Date:
Bruce,

On 7/28/06 1:25 PM, "Bruce Momjian" <bruce@momjian.us> wrote:

> What we don't want to happen is for us to release bitmapped indexes, and
> find out later that btree is better in all cases.  Then we have to tell
> people not to use bitmapped indexes until we fix it in the next major
> releasse.  FYI, that is  basically where we are right now with hash
> indexes.

On this thread people have presented results that show clear and irrefutable
evidence that there are use cases where bitmap indexes outperform Btree for
many datatypes on realistic problems, including the TPC-H benchmark.

In many cases the bitmap indexes outperform BTREE by a factor of 50 and are
a tiny fraction of the size and also take dramatically less time to build.

Of the cases presented, we need someone to specifically address them
and point out why they aren't proof of bitmap index performance.  So far
this has not been done; rather, there are some unsupported opinions about
other cases that might be problematic.

- Luke




Re: On-disk bitmap index patch

From:
Andrew Dunstan
Date:
Tom Lane wrote:
>
> Both of these pages say up front that they are considering read-only
> data.  So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance?  If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.
>
> In any case, arguing that other DBs find it's a win will cut no ice
> with me.  See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses.  The translation gap between other DBs' experience and ours
> can be large.
>   


Notwithstanding that, I have a couple of non-postgres data points / 
anecdotes on this.

Back in my days as an Ingres DBA in the mid 90s, our fairly highly tuned 
system used hash organised tables only for small fairly static 
lookup-type tables (state codes, postcodes, etc). Everything that was 
more dynamic was done with btree indexed tables.

A few years later, I was producing very large tables of email addresses 
using BDB. I quickly stopped using hash tables when I found that the 
reorganisation penalty was huge. Switching to btree worked just fine, 
with no sudden performance blip. This might not be directly relevant, 
but clearly the bucket size is.

I guess what we need to demonstrate is that the better hash performance 
will actually persist to a scale where it is worth it - surely 
for very small tables the index method won't matter much anyway.

cheers

andrew


Re: On-disk bitmap index patch

From:
Hannu Krosing
Date:
On Fri, 2006-07-28 at 16:18, Tom Lane wrote:
> "Dann Corbit" <DCorbit@connx.com> writes:
> > Others have looked into the usefulness of bitmap indexes.  Here is what
> > they found:
> > http://www.oracle.com/technology/pub/articles/sharma_indexes.html
> 
> I like this guy's style of argument: he admits a bitmap index on a
> unique column will be much bigger than a btree, and then airily
> dismisses it as not a problem.  Not real convincing.

This problem can be easily avoided by not creating bitmap indexes on
unique columns. So I think it is ok to dismiss it.

> > http://citeseer.ist.psu.edu/stockinger02bitmap.html
> 
> Both of these pages say up front that they are considering read-only
> data.  So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance?  If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.

Maybe.

OTOH, in OLAP databases you may be better off dropping the indexes
before data loading and rebuilding them after. And it has been shown
that bitmap indexes build a lot faster than btrees.
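
I.e. something like this for each load batch (just a sketch; the table,
index, and file names are made up):

    DROP INDEX facts_status_bmp;
    COPY facts FROM '/data/batch_2006_07.csv' WITH CSV;
    CREATE INDEX facts_status_bmp ON facts USING bitmap (status);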

> In any case, arguing that other DBs find it's a win will cut no ice
> with me. 

How about a more general argument: I claim that an index that is small
and fits in RAM is faster than a big one that does not fit in RAM.

> See adjacent discussion about hash indexes --- those *ought*
> to be a win, but they aren't in Postgres, for reasons that are still
> guesses.  The translation gap between other DBs' experience and ours
> can be large.

IIRC the tests showing bitmap indexes being much faster on TPC-H were
done on PostgreSQL, no?

You pointed out that btree indexes are more bloated in this case, as they
store padding spaces for all CHAR(N) fields whereas a bitmap index stores
the padding spaces only once for each distinct value.

Are there any plans to start optimising the btree storage model in the
foreseeable future?

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: On-disk bitmap index patch

From:
Hannu Krosing
Date:
On Fri, 2006-07-28 at 16:25, Bruce Momjian wrote:
> What we don't want to happen is for us to release bitmapped indexes, and
> find out later that btree is better in all cases.  

Actually I'd love it if adding bitmap indexes to core pg would magically
make btree several times faster for the cases where bitmap indexes are
faster now :)

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: On-disk bitmap index patch

From:
mark@mark.mielke.cc
Date:
On Fri, Jul 28, 2006 at 02:43:23PM -0700, Luke Lonergan wrote:
> On 7/28/06 1:25 PM, "Bruce Momjian" <bruce@momjian.us> wrote:
> > What we don't want to happen is for us to release bitmapped indexes, and
> > find out later that btree is better in all cases.  Then we have to tell
> > people not to use bitmapped indexes until we fix it in the next major
> > releasse.  FYI, that is  basically where we are right now with hash
> > indexes.
> On this thread people have presented results that show clear and irrefutable
> evidence that there are use cases where bitmap indexes outperform Btree for
> many datatypes on realistic problems, including the TPC-H benchmark.

Irrefutable is a little optimistic, don't you think? :-)

There is reason to believe that a bitmap index is useful in some
scenarios. We're not yet clear on what these are, whether they apply
to production use scenarios, or whether b-tree could not be optimized
to be better.

I support you - I want to see these great things for myself.

But irrefutable? Irrefutable is not true. :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them
 all and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: On-disk bitmap index patch

From:
"Luke Lonergan"
Date:
Mark,

> -----Original Message-----
> From: mark@mark.mielke.cc [mailto:mark@mark.mielke.cc]
> Sent: Friday, July 28, 2006 9:26 PM
>
> But irrefutable? Irrefutable is not true. :-)

How about unrefuted?  The evidence has not been refuted, nor directly
discussed or discounted.

BTREE cannot be optimized to produce the results we've presented; the
discussion about char(n) datatypes was irrelevant, as we had shown
results for INT, numeric and char/varchar, and they were all dramatically
better than BTREE.

I am hopeful this discussion takes a rapid turn toward the quantitative
assessment of the results.

- Luke



Re: On-disk bitmap index patch

From:
Bruce Momjian
Date:
Luke Lonergan wrote:
> Mark, 
> 
> > -----Original Message-----
> > From: mark@mark.mielke.cc [mailto:mark@mark.mielke.cc] 
> > Sent: Friday, July 28, 2006 9:26 PM
> > 
> > But irrefutable? Irrefutable is not true. :-)
> 
> How about unrefuted.  The evidence has not been refuted, and not
> directly discussed or discounted.
> 
> BTREE can not be optimized to produce the results we've presented, the
> discussion about char(n) datatypes was irrelevant as we had shown
> results for INT, numeric and char/varchar and they were all dramatically
> better than BTREE.
> 
> I am hopeful this discussion takes a rapid turn toward the quantitative
> assessment of the results.

Right.  People need a patch to test on their workloads, and analysis can
be done after feature freeze.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: On-disk bitmap index patch

From:
"Luke Lonergan"
Date:
Bruce,

On 7/29/06 6:31 AM, "Bruce Momjian" <bruce@momjian.us> wrote:

> Right.  People need a patch to test on their workloads, and analysis can
> be done after feature freeze.

Fair enough.

- Luke




Re: Hash indexes (was: On-disk bitmap index patch)

From:
Kenneth Marshall
Date:
On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > Jim Nasby wrote:
> > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > >Hannu Krosing <hannu@skype.net> writes:
> > 
> > > >>What would be the use-case for hash indexes ? And what should be
> > > >>done to make them faster than btree ?
> > > >
> > > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > > >to find out.
> > > 
> > > Do they use the same hash algorithm as hash joins/aggregation? If so,  
> > > wouldn't hash indexes be faster for those operations than regular  
> > > indexes?
> > 
> > The main problem doesn't seem to be in the hash algorithm (which I
> > understand to mean the hashing function), but in the protocol for
> > concurrent access of index pages, and the distribution of keys in pages
> > of a single hash key.
> > 
> > This is described in a README file or a code comment somewhere in the
> > hash AM code.  Someone needs to do some profiling to find out what the
> > bottleneck really is, and ideally find a way to fix it.
> 
> What I'm getting at is that I've never seen any explanation for the
> theoretical use cases where a hash index would outperform a btree. If we
> knew what kind of problems hash indexes were supposed to solve, we could
> try and interest people who are solving those kinds of problems in
> fixing hash indexes.

The big win for hash indexes is the idea that searching for a single
value should take only 1 I/O operation in a perfect world. Btree cannot
do that.

Ken


Re: Hash indexes (was: On-disk bitmap index patch)

From:
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I think the problem may well be that we use hash buckets that are too
> large (ie, whole pages).  After we fetch the page, we have to grovel
> through every tuple on it to find the one(s) that really match the
> query, whereas btree has a much more intelligent strategy (viz binary
> search) to do its intrapage searches.  Smaller buckets would help make
> up for this.

Hm, you would expect hash indexes to still be a win for very large indexes
where you're worried more about I/O than CPU resources.

> Another issue is that we don't store the raw hashcode in the index
> tuples, so the only way to test a tuple is to actually invoke the
> datatype equality function.  If we stored the whole 32-bit hashcode
> we could eliminate non-matching hashcodes cheaply.  I'm not sure how
> painful it'd be to do this though ... hash uses the same index tuple
> layout as everybody else, and so there's no convenient place to put
> the hashcode.

I looked a while back and was suspicious about the actual hash functions too.
It seemed like a lot of them were vastly suboptimal. That would mean we're
often dealing with mostly empty and mostly full buckets instead of well
distributed hash tables.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: Hash indexes

From:
Andrew Dunstan
Date:
Gregory Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>   
>> I think the problem may well be that we use hash buckets that are too
>> large (ie, whole pages).  After we fetch the page, we have to grovel
>> through every tuple on it to find the one(s) that really match the
>> query, whereas btree has a much more intelligent strategy (viz binary
>> search) to do its intrapage searches.  Smaller buckets would help make
>> up for this.
>>     
>
> Hm, you would expect hash indexes to still be a win for very large indexes
> where you're worried more about i/o than cpu resources.
>
>   
>> Another issue is that we don't store the raw hashcode in the index
>> tuples, so the only way to test a tuple is to actually invoke the
>> datatype equality function.  If we stored the whole 32-bit hashcode
>> we could eliminate non-matching hashcodes cheaply.  I'm not sure how
>> painful it'd be to do this though ... hash uses the same index tuple
>> layout as everybody else, and so there's no convenient place to put
>> the hashcode.
>>     
>
> I looked a while back and was suspicious about the actual hash functions too.
> It seemed like a lot of them were vastly suboptimal. That would mean we're
> often dealing with mostly empty and mostly full buckets instead of well
> distributed hash tables.
>
>
>   

This is now sounding like a lot of low hanging fruit ... highly 
performant hash indexed tables could possibly be a very big win.

cheers

andrew


Re: Hash indexes (was: On-disk bitmap index patch)

From:
"Luke Lonergan"
Date:
Jim,

On 7/28/06 12:27 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> In that case, perhaps this is something Greenplum might be interested
> in, since it might fit nicely between bitmap and btree indexes.

I'm certainly following the thread.

We have talked about both hash- and btree-organized tables here, but haven't
gotten far enough along to evaluate what's already there in pg.

Looks like this thread has nicely characterized the problems with what's
there.

WRT hashing - we use the FNV hash, which is a very nice, very fast modern
hash algorithm.  We would contribute that if we worked on this.

- Luke




Re: Hash indexes (was: On-disk bitmap index patch)

From:
Neil Conway
Date:
On Tue, 2006-08-01 at 07:55 -0700, Luke Lonergan wrote:
> WRT hashing - we use FNV hash which is a very nice, very fast modern hash
> algorithm.  We would contribute that if we worked on this.

We currently use Bob Jenkins' hash function[1], which is apparently
faster than FNV on most architectures except the Pentium IV (because the
P4 has slow shifting -- see [2]). I haven't compared their collision
rates -- it may be that we could improve matters incrementally by
switching to FNV, but the fundamental problems with our hash index
implementation lie elsewhere.

-Neil

[1] http://burtleburtle.net/bob/hash/doobs.html
[2] http://www.azillionmonkeys.com/qed/hash.html



Re: Hash indexes

From:
"Jim C. Nasby"
Date:
On Tue, Aug 01, 2006 at 10:54:10AM -0400, Andrew Dunstan wrote:
> This is now sounding like a lot of low hanging fruit ... highly 
> performant hash indexed tables could possibly be a very big win.

So does someone want to 'take up the cause' for them?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Hash indexes

From:
Hannu Krosing
Date:
On Tue, 2006-08-01 at 10:54, Andrew Dunstan wrote:
> Gregory Stark wrote:
> > 
> > I looked a while back and was suspicious about the actual hash functions too.
> > It seemed like a lot of them were vastly suboptimal. That would mean we're
> > often dealing with mostly empty and mostly full buckets instead of well
> > distributed hash tables.
> >
> >
> >   
> 
> This is now sounding like a lot of low hanging fruit ... highly 
> performant hash indexed tables could possibly be a very big win.
> 

Are you sure about the badness of our hash functions?

I just tested, and hashtext(text) has about 1.4% collisions on about
120M distinct texts, which is not bad considering that the total space for
hashes is 4G, meaning that 120M already covers 3% of the possible
hash space.
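
(Measured roughly like this, against a one-column table of the distinct
texts - the table and column names here are just for illustration:)

    SELECT count(*)                                AS distinct_texts,
           count(DISTINCT hashtext(t))             AS distinct_hash_codes,
           count(*) - count(DISTINCT hashtext(t))  AS colliding_values
      FROM sample_texts;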


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: Hash indexes

From:
Greg Stark
Date:
Hannu Krosing <hannu@skype.net> writes:

> On Tue, 2006-08-01 at 10:54, Andrew Dunstan wrote:
> > Gregory Stark wrote:
> > > 
> > > I looked a while back and was suspicious about the actual hash functions too.
> > > It seemed like a lot of them were vastly suboptimal. That would mean we're
> > > often dealing with mostly empty and mostly full buckets instead of well
> > > distributed hash tables.
> > >
> > >
> > >   
> > 
> > This is now sounding like a lot of low hanging fruit ... highly 
> > performant hash indexed tables could possibly be a very big win.
> > 
> 
> Are you sure about the badness of our hash functions ?
> 
> I just tested and hashtext(text) has about 1.4% of collisions on about
> 120M distinct texts, which is not bad considering thet total space for
> hashes is 4G, meaning that 120M covers itself already 3% of possible
> hash space.

I don't recall exactly what triggered my suspicions, but I think it had more
to do with things like how int4 and int8 were mapped to hash codes and what
the hash function looks like that compresses the hash codes into the actual
bucket. IIRC it's just the low order bits for int8 and it's the actual int4
which then only takes the low order bits. I seem to recall wondering what
would happen if you had, say, a list of /24 network addresses for example.

I didn't actually do any tests though so it's quite possible I missed
something that mitigated the issues I feared.


-- 
greg



Re: Hash indexes (was: On-disk bitmap index patch)

From:
bob_jenkins@burtleburtle.net
Date:
Kenneth Marshall wrote:
> On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > > Jim Nasby wrote:
> > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > > >Hannu Krosing <hannu@skype.net> writes:
> > >
> > > > >>What would be the use-case for hash indexes ? And what should be
> > > > >>done to make them faster than btree ?
> > > > >
> > > > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > > > >to find out.
> > > >
> > > > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > > > wouldn't hash indexes be faster for those operations than regular
> > > > indexes?
> > >
> > > The main problem doesn't seem to be in the hash algorithm (which I
> > > understand to mean the hashing function), but in the protocol for
> > > concurrent access of index pages, and the distribution of keys in pages
> > > of a single hash key.
> > >
> > > This is described in a README file or a code comment somewhere in the
> > > hash AM code.  Someone needs to do some profiling to find out what the
> > > bottleneck really is, and ideally find a way to fix it.
> >
> > What I'm getting at is that I've never seen any explanation for the
> > theoretical use cases where a hash index would outperform a btree. If we
> > knew what kind of problems hash indexes were supposed to solve, we could
> > try and interest people who are solving those kinds of problems in
> > fixing hash indexes.
>
> The big win for hash indexes is the idea that searching for a single
> value should only take 1 I/O operation in a perfect world. Btree can
> not do that.

Hash indexes stored on disk still need a level of indirection -- you've
got to look up what range of blocks contains your hash value.  How big
your table of ranges is depends on how big the hash index is and how
big your ranges are.  Almost always you can fit that table into a block
cached in memory.  But, the root of a BTree is often cached in memory
too.  So there's no advantage for a hash index over a BTree index until
the BTree needs to grow to three levels deep, what is that, usually
10,000 or 100,000 records.  Beyond that, you're right, the BTree slowly
grows deeper while the hash index doesn't.



Re: On-disk bitmap index patch

From:
Ron Mayer
Date:
Tom Lane wrote:
> Both of these pages say up front that they are considering read-only
> data.  

Can I assume read-mostly partitions could use the read-I/O-efficient
indexes, while update-intensive partitions of the same
table could use b-tree indexes?

All of my larger (90GB+) tables can have partitions that are either
almost read-only (spatial data updated one state/country at
a time, about quarterly) or totally read-only (previous
months' and years' historical data).

> So one of the questions that has to be answered (and the
> submitters have been entirely mum about) is exactly how bad is the
> update performance?  If it's really awful that's going to constrain
> the use cases quite a lot, whereas merely "a bit slower than btree"
> wouldn't be such a problem.

Once we have easy-to-use partitioning, would it be the case that
many larger tables will have near-read-only partitions that could
use I/O friendlier indexes (GIN?  Hash? Bitmap?)?  Any examples
of a large data set that can't be partitioned into hot areas and
read-mostly areas?
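
For example, with the inheritance-based partitioning we have today, something
like this (a sketch; the sales table, its columns, and the USING bitmap
clause are all hypothetical here):

    -- historical, read-only partition: compact bitmap index
    CREATE TABLE sales_2006_06 (
        CHECK (ts >= '2006-06-01' AND ts < '2006-07-01')
    ) INHERITS (sales);
    CREATE INDEX sales_2006_06_region_bmp
        ON sales_2006_06 USING bitmap (region);

    -- current, update-intensive partition: ordinary btree
    CREATE TABLE sales_2006_07 (
        CHECK (ts >= '2006-07-01' AND ts < '2006-08-01')
    ) INHERITS (sales);
    CREATE INDEX sales_2006_07_region
        ON sales_2006_07 USING btree (region);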



Re: Hash indexes (was: On-disk bitmap index patch)

From:
Kenneth Marshall
Date:
On Tue, Aug 01, 2006 at 02:26:18PM -0700, bob_jenkins@burtleburtle.net wrote:
> Kenneth Marshall wrote:
> > On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote:
> > > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote:
> > > > Jim Nasby wrote:
> > > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote:
> > > > > >Hannu Krosing <hannu@skype.net> writes:
> > > >
> > > > > >>What would be the use-case for hash indexes ? And what should be
> > > > > >>done to make them faster than btree ?
> > > > > >
> > > > > >If we knew, we'd do it ;-)  But no one's put enough effort into it
> > > > > >to find out.
> > > > >
> > > > > Do they use the same hash algorithm as hash joins/aggregation? If so,
> > > > > wouldn't hash indexes be faster for those operations than regular
> > > > > indexes?
> > > >
> > > > The main problem doesn't seem to be in the hash algorithm (which I
> > > > understand to mean the hashing function), but in the protocol for
> > > > concurrent access of index pages, and the distribution of keys in pages
> > > > of a single hash key.
> > > >
> > > > This is described in a README file or a code comment somewhere in the
> > > > hash AM code.  Someone needs to do some profiling to find out what the
> > > > bottleneck really is, and ideally find a way to fix it.
> > >
> > > What I'm getting at is that I've never seen any explanation for the
> > > theoretical use cases where a hash index would outperform a btree. If we
> > > knew what kind of problems hash indexes were supposed to solve, we could
> > > try and interest people who are solving those kinds of problems in
> > > fixing hash indexes.
> >
> > The big win for hash indexes is the idea that searching for a single
> > value should only take 1 I/O operation in a perfect world. Btree can
> > not do that.
> 
> Hash indexes stored on disk still need a level of indirection -- you've
> got to look up what range of blocks contains your hash value.  How big
> your table of ranges is depends on how big the hash index is and how
> big your ranges are.  Almost always you can fit that table into a block
> cached in memory.  But, the root of a BTree is often cached in memory
> too.  So there's no advantage for a hash index over a BTree index until
> the BTree needs to grow to three levels deep, what is that, usually
> 10,000 or 100,000 records.  Beyond that, you're right, the BTree slowly
> grows deeper while the hash index doesn't.
> 

I have seen some clever hash techniques that used knowledge of the
file and directory structure to avoid the indirection, allowing a single
I/O operation to retrieve the index entry without needing another
layer of indirection. So it is possible, given appropriate constraints.
Of course, PostgreSQL would need to check for tuple validity unless that
could be incorporated into the index in some fashion.

Ken