Обсуждение: Re: On-disk bitmap index patch
I think we do know, have you reviewed the results in the briefing?
- Luke
Sent from my GoodLink synchronized handheld (www.good.com)
-----Original Message-----
From:     mark@mark.mielke.cc [mailto:mark@mark.mielke.cc]
Sent:    Tuesday, July 25, 2006 01:09 AM Eastern Standard Time
To:    Tom Lane
Cc:    Bruce Momjian; Jie Zhang; Hannu Krosing; Gavin Sherry; pgsql-hackers@postgresql.org; Luke Lonergan
Subject:    Re: [HACKERS] On-disk bitmap index patch
On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> > Reading 1/4, for a larger table, has a good chance of being faster than
> > reading 4/4 of the table. :-)
> Really?
>
> If you have to hit one tuple out of four, it's pretty much guaranteed
> that you will need to fetch every heap page.  So using an index provides
> zero I/O savings on the heap side, and any fetches needed to read the
> index are pure cost.  Now you have to demonstrate that the CPU costs
> involved in processing the index are significantly cheaper than the cost
> of just testing the WHERE qual at every heap tuple --- not a bet that's
> likely to win at a one-in-four ratio.
Haha. Of course - but that's assuming uniform spread of the values.
Next I would try clustering the table on the bitmap index... :-)
My databases aren't as large as many of yours. Most or all of them
will fit in 1 Gbytes of RAM. The I/O cost isn't substantial for these,
but the WHERE clause might be.
But yeah - we don't know. Waste of code or performance boost.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem... 
                          http://mark.mielke.cc/
			
		"Luke Lonergan" <LLonergan@greenplum.com> writes:
> I think we do know, have you reviewed the results in the briefing?
I find those results moderately unconvincing, primarily because they
are based on choosing the least efficient possible data representation
(viz char(n)), and thus the btree indexes suffer severe and quite
artificial bloat.  A database schema chosen with even minimal attention
to PG-specific tuning would probably have btree indexes half the size.
That wouldn't completely eliminate the performance differential shown,
but it would bring it down into the ballpark where you have to question
whether it makes sense to support another index AM.
The reason I have such high sales resistance is that we've carried the
hash and rtree AMs for years, hoping that someone would do the work to
make them actually worth using, with little result.  I don't want any
more second-class-citizen index AMs, and that's why I'm questioning
what the scope of applicability of bitmap indexes really is.  They need
to be popular enough that people will be motivated to work on them,
or they shouldn't be in core.
        regards, tom lane
			
		Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: > "Luke Lonergan" <LLonergan@greenplum.com> writes: > > I think we do know, have you reviewed the results in the briefing? > > I find those results moderately unconvincing, primarily because they > are based on choosing the least efficient possible data representation > (viz char(n)), and thus the btree indexes suffer severe and quite > artificial bloat. hmm, maybe this should be fixed in btree then ? do we really need to store padding blanks in btree ? > A database schema chosen with even minimal attention > to PG-specific tuning would probably have btree indexes half the size. > That wouldn't completely eliminate the performance differential shown, > but it would bring it down into the ballpark where you have to question > whether it makes sense to support another index AM. It still depends on your data volumes. if you spend lots and lots of $ on hardware just to store surplus index bloat, it may be worth it. Remember, that the bizgres folks develop these things for real-world datawarehousing, where saving a few (tens or hundreds of) terabytes of storage and corresponging amount of RAM is a real win. > The reason I have such high sales resistance is that we've carried the > hash and rtree AMs for years, hoping that someone would do the work to > make them actually worth using, with little result. What would be the use-case for hash indexes ? And what should be done to make them faster than btree ? I know that they are not wal-logged, but this is beside the point for DWH apps. and was'nt the rtree index recently deprecated in favour of GIST implementation of the same ? > I don't want any > more second-class-citizen index AMs, and that's why I'm questioning > what the scope of applicability of bitmap indexes really is. They need > to be popular enough that people will be motivated to work on them, > or they shouldn't be in core. Is there an easy way to put them into contrib/ for some test period so that they can become popular among mainstream postgresql users ? -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hannu Krosing <hannu@skype.net> writes:
> Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane:
>> The reason I have such high sales resistance is that we've carried the
>> hash and rtree AMs for years, hoping that someone would do the work to
>> make them actually worth using, with little result.
> What would be the use-case for hash indexes ? And what should be done to
> make them faster than btree ?
If we knew, we'd do it ;-)  But no one's put enough effort into it
to find out.
> and was'nt the rtree index recently deprecated in favour of GIST
> implementation of the same ?
Yeah, we finally gave up on rtree entirely.  I don't want to see any
other index AMs languishing in the closet like that.  If bitmap can
hold its own to the extent that people are interested in working on
it/improving it, then great, but I'm worried that it's not going to
have a wide enough use-case to attract maintainers.
        regards, tom lane
			
		Tom, On 7/25/06 1:31 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: > Yeah, we finally gave up on rtree entirely. I don't want to see any > other index AMs languishing in the closet like that. If bitmap can > hold its own to the extent that people are interested in working on > it/improving it, then great, but I'm worried that it's not going to > have a wide enough use-case to attract maintainers. How do we close the gap? I think Jie is interested in maintaining it, and we're looking to extend the range of applications for both the AM and extensions that use the raw bitmap comparators made available to the executor. This should be just the start of some really great work on speedy access using bitmaps. Even as it sits, the on-disk bitmap is over 100x faster in cases where it's suited and the other commercial DBMS have had this popular feature for years. - Luke
> I find those results moderately unconvincing, primarily because they
> are based on choosing the least efficient possible data representation
> (viz char(n)), and thus the btree indexes suffer severe and quite
> artificial bloat.  A database schema chosen with even minimal attention
The specific data was chosen in presentation, because it comes from TPC-H
definition (data that everybody understands) and they were the only columns
where bitmap index will be more beneficial (i.e. the columns were low
cardinality ones).
> to PG-specific tuning would probably have btree indexes half the size.
> That wouldn't completely eliminate the performance differential shown,
> but it would bring it down into the ballpark where you have to question
> whether it makes sense to support another index AM.
Comparison of index creation time and index size for similar cardinalities
for *integer* and *numeric* data-types for b-tree and bitmap index. Benefits
are seen in all the cases, index creation time, space taken by index as well
as querying:
Total rows in the table: 2 million
Size of table in kbytes: 431976
Table definition and column cardinalities:Column |       Type        | Cardinality
--------+-------------------+-----------c1     | integer           |i10    | integer           | 10i50    | integer
     | 50n10    | numeric           | 10n50    | numeric           | 50ctext  | character varying | 
Note: Time in seconds and size in kbytes (as seen from select
pg_relation_size()):
-------------------------------------------------------------------------
Column  B-tree size     Bitmap size     B-tree time     Bitmap time
-------------------------------------------------------------------------
i10     35056           2392            11.8            6.0
i50     35056           4328            10.8            6.4
n10     52264           2656            34.8            9.6
n50     52616           4272            34.3            11.7
-------------------------------------------------------------------------
Query timings (in seconds):
-------------------------------------------------------------------------
Query                                           B-tree          Bitmap
-------------------------------------------------------------------------
select count(*) from btree_test where           4.08            1.61
i10=5 and i50=2 and  n10=0.1 and n50=0.05;
-------------------------------------------------------------------------
This test case fits in memory, the benefits seen will be more if we take
bigger test case.
I will like to re-iterate the benefits of bitmap index:
1. Size and time to create bitmap index is less than b-tree index for low
cardinality column of any data-type.
2. The gain in query performance can be attributed to Bitmap And¹ and
Bitmap Or¹ operations being more efficient for bitmap indexes as compared
to b-trees. Note that both bitmap and b-tree indexes use the bitmap index
scan access method, however the behavior is different for each. With a
b-tree index, the b-tree indexes are scanned to create a temporary in-memory
bitmap index. With the Bizgres on-disk bitmap index, the bitmap scan
retrieves several on-disk bitmap pages at once and provides them to the
Bitmap And¹ and Bitmap Or¹ operators. The smaller size of the bitmap
indexes combined with the efficiency of the And and Or operators are the
reasons for the increase in query speed.
Ayush
			
		On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: >> Ühel kenal päeval, T, 2006-07-25 kell 13:06, kirjutas Tom Lane: >>> The reason I have such high sales resistance is that we've >>> carried the >>> hash and rtree AMs for years, hoping that someone would do the >>> work to >>> make them actually worth using, with little result. > >> What would be the use-case for hash indexes ? And what should be >> done to >> make them faster than btree ? > > If we knew, we'd do it ;-) But no one's put enough effort into it > to find out. Do they use the same hash algorithm as hash joins/aggregation? If so, wouldn't hash indexes be faster for those operations than regular indexes? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim Nasby wrote: > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > >Hannu Krosing <hannu@skype.net> writes: > >>What would be the use-case for hash indexes ? And what should be > >>done to make them faster than btree ? > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > >to find out. > > Do they use the same hash algorithm as hash joins/aggregation? If so, > wouldn't hash indexes be faster for those operations than regular > indexes? The main problem doesn't seem to be in the hash algorithm (which I understand to mean the hashing function), but in the protocol for concurrent access of index pages, and the distribution of keys in pages of a single hash key. This is described in a README file or a code comment somewhere in the hash AM code. Someone needs to do some profiling to find out what the bottleneck really is, and ideally find a way to fix it. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote: > Jim Nasby wrote: > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > > >Hannu Krosing <hannu@skype.net> writes: > > > >>What would be the use-case for hash indexes ? And what should be > > >>done to make them faster than btree ? > > > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > > >to find out. > > > > Do they use the same hash algorithm as hash joins/aggregation? If so, > > wouldn't hash indexes be faster for those operations than regular > > indexes? > > The main problem doesn't seem to be in the hash algorithm (which I > understand to mean the hashing function), but in the protocol for > concurrent access of index pages, and the distribution of keys in pages > of a single hash key. > > This is described in a README file or a code comment somewhere in the > hash AM code. Someone needs to do some profiling to find out what the > bottleneck really is, and ideally find a way to fix it. What I'm getting at is that I've never seen any explanation for the theoretical use cases where a hash index would outperform a btree. If we knew what kind of problems hash indexes were supposed to solve, we could try and interest people who are solving those kinds of problems in fixing hash indexes. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > What I'm getting at is that I've never seen any explanation for the > theoretical use cases where a hash index would outperform a btree. If we > knew what kind of problems hash indexes were supposed to solve, we could > try and interest people who are solving those kinds of problems in > fixing hash indexes. The btree index needs to descend potentially many pages before getting to the leaf page, where the actual index is stored. The hash index can get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to get a single key, while hash is O(1). Our problem lies in the constants; for btree they are smaller than for hash, so in practice that O(logN) is always smaller than O(1). I've heard other database systems manage to have hash indexes that are actually faster than btree, so either (1) our btree absolutely rocks, or (2) their hash implementations are better (probably both). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The btree index needs to descend potentially many pages before getting
> to the leaf page, where the actual index is stored.  The hash index can
> get at the "leaf" node in --supposedly-- one fetch.  Btree is O(logN) to
> get a single key, while hash is O(1).  Our problem lies in the
> constants; for btree they are smaller than for hash, so in practice
> that O(logN) is always smaller than O(1).
> I've heard other database systems manage to have hash indexes that are
> actually faster than btree, so either (1) our btree absolutely rocks, or
> (2) their hash implementations are better (probably both).
I think the problem may well be that we use hash buckets that are too
large (ie, whole pages).  After we fetch the page, we have to grovel
through every tuple on it to find the one(s) that really match the
query, whereas btree has a much more intelligent strategy (viz binary
search) to do its intrapage searches.  Smaller buckets would help make
up for this.
Another issue is that we don't store the raw hashcode in the index
tuples, so the only way to test a tuple is to actually invoke the
datatype equality function.  If we stored the whole 32-bit hashcode
we could eliminate non-matching hashcodes cheaply.  I'm not sure how
painful it'd be to do this though ... hash uses the same index tuple
layout as everybody else, and so there's no convenient place to put
the hashcode.
Anyway the bottom line here is that no one's tried hard to fix it,
but there are certainly things that might help.
        regards, tom lane
			
		On Fri, Jul 28, 2006 at 03:14:33PM -0400, Alvaro Herrera wrote: > Jim C. Nasby wrote: > > > What I'm getting at is that I've never seen any explanation for the > > theoretical use cases where a hash index would outperform a btree. If we > > knew what kind of problems hash indexes were supposed to solve, we could > > try and interest people who are solving those kinds of problems in > > fixing hash indexes. > > The btree index needs to descend potentially many pages before getting > to the leaf page, where the actual index is stored. The hash index can > get at the "leaf" node in --supposedly-- one fetch. Btree is O(logN) to > get a single key, while hash is O(1). Our problem lies in the > constants; for btree they are smaller than for hash, so in practice > that O(logN) is always smaller than O(1). > > I've heard other database systems manage to have hash indexes that are > actually faster than btree, so either (1) our btree absolutely rocks, or > (2) their hash implementations are better (probably both). In that case, perhaps this is something Greenplum might be interested in, since it might fit nicely between bitmap and btree indexes. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Luke Lonergan > Sent: Friday, July 28, 2006 12:18 PM > To: Jim C. Nasby; Jie Zhang > Cc: Tom Lane; Mark Kirkwood; Josh Berkus; Gavin Sherry; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] On-disk bitmap index patch > > Jim, > > On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > > > If the usefulness of bitmap indexes is still in doubt, could someone at > > Greenplum provide data from actual data warehouses from actual > > customers? > > First, is anyone in doubt? Others have looked into the usefulness of bitmap indexes. Here is what they found: http://www.oracle.com/technology/pub/articles/sharma_indexes.html http://citeseer.ist.psu.edu/stockinger02bitmap.html Oracle, IBM, and even Microsoft[1] supports them. Probably not just to be trendy. [1] Microsoft SQL Server creates temporary bitmap indexes during some queries, though you cannot declaratively create a bitmap index. > - Luke > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings
"Dann Corbit" <DCorbit@connx.com> writes: > Others have looked into the usefulness of bitmap indexes. Here is what > they found: > http://www.oracle.com/technology/pub/articles/sharma_indexes.html I like this guy's style of argument: he admits a bitmap index on a unique column will be much bigger than a btree, and then airily dismisses it as not a problem. Not real convincing. > http://citeseer.ist.psu.edu/stockinger02bitmap.html Both of these pages say up front that they are considering read-only data. So one of the questions that has to be answered (and the submitters have been entirely mum about) is exactly how bad is the update performance? If it's really awful that's going to constrain the use cases quite a lot, whereas merely "a bit slower than btree" wouldn't be such a problem. In any case, arguing that other DBs find it's a win will cut no ice with me. See adjacent discussion about hash indexes --- those *ought* to be a win, but they aren't in Postgres, for reasons that are still guesses. The translation gap between other DBs' experience and ours can be large. regards, tom lane
What we don't want to happen is for us to release bitmapped indexes, and find out later that btree is better in all cases. Then we have to tell people not to use bitmapped indexes until we fix it in the next major releasse. FYI, that is basically where we are right now with hash indexes. --------------------------------------------------------------------------- Tom Lane wrote: > "Dann Corbit" <DCorbit@connx.com> writes: > > Others have looked into the usefulness of bitmap indexes. Here is what > > they found: > > http://www.oracle.com/technology/pub/articles/sharma_indexes.html > > I like this guy's style of argument: he admits a bitmap index on a > unique column will be much bigger than a btree, and then airily > dismisses it as not a problem. Not real convincing. > > > http://citeseer.ist.psu.edu/stockinger02bitmap.html > > Both of these pages say up front that they are considering read-only > data. So one of the questions that has to be answered (and the > submitters have been entirely mum about) is exactly how bad is the > update performance? If it's really awful that's going to constrain > the use cases quite a lot, whereas merely "a bit slower than btree" > wouldn't be such a problem. > > In any case, arguing that other DBs find it's a win will cut no ice > with me. See adjacent discussion about hash indexes --- those *ought* > to be a win, but they aren't in Postgres, for reasons that are still > guesses. The translation gap between other DBs' experience and ours > can be large. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, On 7/28/06 1:25 PM, "Bruce Momjian" <bruce@momjian.us> wrote: > What we don't want to happen is for us to release bitmapped indexes, and > find out later that btree is better in all cases. Then we have to tell > people not to use bitmapped indexes until we fix it in the next major > releasse. FYI, that is basically where we are right now with hash > indexes. On this thread people have presented results that show clear and irrefutable evidence that there are use cases where bitmap indexes outperform Btree for many datatypes on realistic problems, including the TPC-H benchmark. In many cases the bitmap indexes outperform BTREE by a factor of 50 and are a tiny fraction of the size and also take dramatically less time to build. Of the cases presented, we need to have someone specifically address them and point out why they aren't proof of bitmap index performance. So far this has not been done, rather there are some unsupported opinions about other cases that might be problematic. - Luke
Tom Lane wrote: > > Both of these pages say up front that they are considering read-only > data. So one of the questions that has to be answered (and the > submitters have been entirely mum about) is exactly how bad is the > update performance? If it's really awful that's going to constrain > the use cases quite a lot, whereas merely "a bit slower than btree" > wouldn't be such a problem. > > In any case, arguing that other DBs find it's a win will cut no ice > with me. See adjacent discussion about hash indexes --- those *ought* > to be a win, but they aren't in Postgres, for reasons that are still > guesses. The translation gap between other DBs' experience and ours > can be large. > Notwithstanding that, I have a couple of non-postgres data points / anecdotes on this. Back in my days as an Ingres DBA in the mid 90s, our fairly highly tuned system used hash organised tables only for small fairly static lookup-type tables (state codes, postcodes, etc). Everything that was more dynamic was done with btree indexed tables. A few years later, I was producing very large tables of email addresses using BDB. I quickly stopped using hash tables when I found that the reorganisation penalty was huge. Switching to btree worked just fine, with no sudden performance blip. This might not be directly relevant, but clearly the bucket size is. I guess what we need to demonstrate is that the better hash performance will actually persist to a scale where it is actually worth it - surely for very small tables the index method won't matter much anyway. cheers andrew
Ühel kenal päeval, R, 2006-07-28 kell 16:18, kirjutas Tom Lane: > "Dann Corbit" <DCorbit@connx.com> writes: > > Others have looked into the usefulness of bitmap indexes. Here is what > > they found: > > http://www.oracle.com/technology/pub/articles/sharma_indexes.html > > I like this guy's style of argument: he admits a bitmap index on a > unique column will be much bigger than a btree, and then airily > dismisses it as not a problem. Not real convincing. This problem can be easyly avoided by not creating bitmap indexes on unique columns. So I think it is ok to dismiss it. > > http://citeseer.ist.psu.edu/stockinger02bitmap.html > > Both of these pages say up front that they are considering read-only > data. So one of the questions that has to be answered (and the > submitters have been entirely mum about) is exactly how bad is the > update performance? If it's really awful that's going to constrain > the use cases quite a lot, whereas merely "a bit slower than btree" > wouldn't be such a problem. May be. OTOH, in OLAP databases you may be better off dropping the indexes before data loading and rebuilding them after. And it has been shown that bitmap indexes build a lot faster than btrees. > In any case, arguing that other DBs find it's a win will cut no ice > with me. How about a more general argument. I claim that an index that is small and fits in RAM is faster than a big one that does not fit in RAM. > See adjacent discussion about hash indexes --- those *ought* > to be a win, but they aren't in Postgres, for reasons that are still > guesses. The translation gap between other DBs' experience and ours > can be large. IIRC the tests showing bitmap indexes being much faster on TPC-H were done on postgresql, no ? You pointed out that btree indexes are more bloated in this case as they store padding spaces for all CHAR(N) fields whereas bitmap index stores padding spaces only once for each distinct value. Are there any plans to start optimising btree storage model in forseeable future ? -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Ühel kenal päeval, R, 2006-07-28 kell 16:25, kirjutas Bruce Momjian: > What we don't want to happen is for us to release bitmapped indexes, and > find out later that btree is better in all cases. Actually I'd love it if adding bitmap indexes to core pg would magically make btree several times faster for the cases where bitmap indexes are faster now :) -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
On Fri, Jul 28, 2006 at 02:43:23PM -0700, Luke Lonergan wrote:
> On 7/28/06 1:25 PM, "Bruce Momjian" <bruce@momjian.us> wrote:
> > What we don't want to happen is for us to release bitmapped indexes, and
> > find out later that btree is better in all cases.  Then we have to tell
> > people not to use bitmapped indexes until we fix it in the next major
> > releasse.  FYI, that is  basically where we are right now with hash
> > indexes.
> On this thread people have presented results that show clear and irrefutable
> evidence that there are use cases where bitmap indexes outperform Btree for
> many datatypes on realistic problems, including the TPC-H benchmark.
Irrefutable is a little optimistic, don't you think? :-)
There is reason to believe that a bitmap index is useful in some
scenarios. We're not yet clear on what these are, whether they apply
to production use scenarios, or whether b-tree could not be optimized
to be better.
I support you - I want to see these great things for myself.
But irrefutable? Irrefutable is not true. :-)
Cheers,
mark
-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/
			
		Mark, > -----Original Message----- > From: mark@mark.mielke.cc [mailto:mark@mark.mielke.cc] > Sent: Friday, July 28, 2006 9:26 PM > > But irrefutable? Irrefutable is not true. :-) How about unrefuted. The evidence has not been refuted, and not directly discussed or discounted. BTREE can not be optimized to produce the results we've presented, the discussion about char(n) datatypes was irrelevant as we had shown results for INT, numeric and char/varchar and they were all dramatically better than BTREE. I am hopeful this discussion takes a rapid turn toward the quantitative assessment of the results. - Luke
Luke Lonergan wrote: > Mark, > > > -----Original Message----- > > From: mark@mark.mielke.cc [mailto:mark@mark.mielke.cc] > > Sent: Friday, July 28, 2006 9:26 PM > > > > But irrefutable? Irrefutable is not true. :-) > > How about unrefuted. The evidence has not been refuted, and not > directly discussed or discounted. > > BTREE can not be optimized to produce the results we've presented, the > discussion about char(n) datatypes was irrelevant as we had shown > results for INT, numeric and char/varchar and they were all dramatically > better than BTREE. > > I am hopeful this discussion takes a rapid turn toward the quantitative > assessment of the results. Right. People need a patch to test on their workloads, and analysis can be done after feature freeze. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce, On 7/29/06 6:31 AM, "Bruce Momjian" <bruce@momjian.us> wrote: > Right. People need a patch to test on their workloads, and analysis can > be done after feature freeze. Fair enough. - Luke
On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote: > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote: > > Jim Nasby wrote: > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > > > >Hannu Krosing <hannu@skype.net> writes: > > > > > >>What would be the use-case for hash indexes ? And what should be > > > >>done to make them faster than btree ? > > > > > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > > > >to find out. > > > > > > Do they use the same hash algorithm as hash joins/aggregation? If so, > > > wouldn't hash indexes be faster for those operations than regular > > > indexes? > > > > The main problem doesn't seem to be in the hash algorithm (which I > > understand to mean the hashing function), but in the protocol for > > concurrent access of index pages, and the distribution of keys in pages > > of a single hash key. > > > > This is described in a README file or a code comment somewhere in the > > hash AM code. Someone needs to do some profiling to find out what the > > bottleneck really is, and ideally find a way to fix it. > > What I'm getting at is that I've never seen any explanation for the > theoretical use cases where a hash index would outperform a btree. If we > knew what kind of problems hash indexes were supposed to solve, we could > try and interest people who are solving those kinds of problems in > fixing hash indexes. The big win for hash indexes is the idea that searching for a single value should only take 1 I/O operation in a perfect world. Btree can not do that. Ken
Tom Lane <tgl@sss.pgh.pa.us> writes: > I think the problem may well be that we use hash buckets that are too > large (ie, whole pages). After we fetch the page, we have to grovel > through every tuple on it to find the one(s) that really match the > query, whereas btree has a much more intelligent strategy (viz binary > search) to do its intrapage searches. Smaller buckets would help make > up for this. Hm, you would expect hash indexes to still be a win for very large indexes where you're worried more about i/o than cpu resources. > Another issue is that we don't store the raw hashcode in the index > tuples, so the only way to test a tuple is to actually invoke the > datatype equality function. If we stored the whole 32-bit hashcode > we could eliminate non-matching hashcodes cheaply. I'm not sure how > painful it'd be to do this though ... hash uses the same index tuple > layout as everybody else, and so there's no convenient place to put > the hashcode. I looked a while back and was suspicious about the actual hash functions too. It seemed like a lot of them were vastly suboptimal. That would mean we're often dealing with mostly empty and mostly full buckets instead of well distributed hash tables. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > >> I think the problem may well be that we use hash buckets that are too >> large (ie, whole pages). After we fetch the page, we have to grovel >> through every tuple on it to find the one(s) that really match the >> query, whereas btree has a much more intelligent strategy (viz binary >> search) to do its intrapage searches. Smaller buckets would help make >> up for this. >> > > Hm, you would expect hash indexes to still be a win for very large indexes > where you're worried more about i/o than cpu resources. > > >> Another issue is that we don't store the raw hashcode in the index >> tuples, so the only way to test a tuple is to actually invoke the >> datatype equality function. If we stored the whole 32-bit hashcode >> we could eliminate non-matching hashcodes cheaply. I'm not sure how >> painful it'd be to do this though ... hash uses the same index tuple >> layout as everybody else, and so there's no convenient place to put >> the hashcode. >> > > I looked a while back and was suspicious about the actual hash functions too. > It seemed like a lot of them were vastly suboptimal. That would mean we're > often dealing with mostly empty and mostly full buckets instead of well > distributed hash tables. > > > This is now sounding like a lot of low hanging fruit ... highly performant hash indexed tables could possibly be a very big win. cheers andrew
Jim, On 7/28/06 12:27 PM, "Jim C. Nasby" <jnasby@pervasive.com> wrote: > In that case, perhaps this is something Greenplum might be interested > in, since it might fit nicely between bitmap and btree indexes. I'm certainly following the thread. We have talked about hash and btree organized tables both here, but haven't gotten far enough along to evaluate what's already there in pg. Looks like this thread has nicely characterized the problems with what's there. WRT hashing - we use FNV hash which is a very nice, very fast modern hash algorithm. We would contribute that if we worked on this. - Luke
On Tue, 2006-08-01 at 07:55 -0700, Luke Lonergan wrote: > WRT hashing - we use FNV hash which is a very nice, very fast modern hash > algorithm. We would contribute that if we worked on this. We currently use Bob Jenkins' hash function[1], which is apparently faster than FNV on most architectures except the Pentium IV (because the P4 has slow shifting -- see [2]). I haven't compared their collision rates -- it may be that we could improve matters incrementally by switching to FNV, but the fundamental problems with our hash index implementation lie elsewhere. -Neil [1] http://burtleburtle.net/bob/hash/doobs.html [2] http://www.azillionmonkeys.com/qed/hash.html
On Tue, Aug 01, 2006 at 10:54:10AM -0400, Andrew Dunstan wrote: > This is now sounding like a lot of low hanging fruit ... highly > performant hash indexed tables could possibly be a very big win. So does someone want to 'take up the cause' for them? -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Ühel kenal päeval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan: > Gregory Stark wrote: > > > > I looked a while back and was suspicious about the actual hash functions too. > > It seemed like a lot of them were vastly suboptimal. That would mean we're > > often dealing with mostly empty and mostly full buckets instead of well > > distributed hash tables. > > > > > > > > This is now sounding like a lot of low hanging fruit ... highly > performant hash indexed tables could possibly be a very big win. > Are you sure about the badness of our hash functions ? I just tested and hashtext(text) has about 1.4% of collisions on about 120M distinct texts, which is not bad considering thet total space for hashes is 4G, meaning that 120M covers itself already 3% of possible hash space. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hannu Krosing <hannu@skype.net> writes: > Ãhel kenal päeval, T, 2006-08-01 kell 10:54, kirjutas Andrew Dunstan: > > Gregory Stark wrote: > > > > > > I looked a while back and was suspicious about the actual hash functions too. > > > It seemed like a lot of them were vastly suboptimal. That would mean we're > > > often dealing with mostly empty and mostly full buckets instead of well > > > distributed hash tables. > > > > > > > > > > > > > This is now sounding like a lot of low hanging fruit ... highly > > performant hash indexed tables could possibly be a very big win. > > > > Are you sure about the badness of our hash functions ? > > I just tested and hashtext(text) has about 1.4% of collisions on about > 120M distinct texts, which is not bad considering thet total space for > hashes is 4G, meaning that 120M covers itself already 3% of possible > hash space. I don't recall exactly what triggered my suspicions, but I think it had more to do with things like how int4 and int8 were mapped to hash codes and what the hash function looks like that compresses the hash codes into the actual bucket. IIRC it's just the low order bits for int8 and it's the actual int4 which then only takes the low order bits. I seem to recall wondering what would happen if you had, say, a list of /24 network addresses for example. I didn't actually do any tests though so it's quite possible I missed something that mitigated the issues I feared. -- greg
Kenneth Marshall wrote: > On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote: > > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote: > > > Jim Nasby wrote: > > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > > > > >Hannu Krosing <hannu@skype.net> writes: > > > > > > > >>What would be the use-case for hash indexes ? And what should be > > > > >>done to make them faster than btree ? > > > > > > > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > > > > >to find out. > > > > > > > > Do they use the same hash algorithm as hash joins/aggregation? If so, > > > > wouldn't hash indexes be faster for those operations than regular > > > > indexes? > > > > > > The main problem doesn't seem to be in the hash algorithm (which I > > > understand to mean the hashing function), but in the protocol for > > > concurrent access of index pages, and the distribution of keys in pages > > > of a single hash key. > > > > > > This is described in a README file or a code comment somewhere in the > > > hash AM code. Someone needs to do some profiling to find out what the > > > bottleneck really is, and ideally find a way to fix it. > > > > What I'm getting at is that I've never seen any explanation for the > > theoretical use cases where a hash index would outperform a btree. If we > > knew what kind of problems hash indexes were supposed to solve, we could > > try and interest people who are solving those kinds of problems in > > fixing hash indexes. > > The big win for hash indexes is the idea that searching for a single > value should only take 1 I/O operation in a perfect world. Btree can > not do that. Hash indexes stored on disk still need a level of indirection -- you've got to look up what range of blocks contains your hash value. How big your table of ranges is depends on how big the hash index is and how big your ranges are. Almost always you can fit that table into a block cached in memory. But, the root of a BTree is often cached in memory too. So there's no advantage for a hash index over a BTree index until the BTree needs to grow to three levels deep, what is that, usually 10,000 or 100,000 records. Beyond that, you're right, the BTree slowly grows deeper while the hash index doesn't.
Tom Lane wrote: > Both of these pages say up front that they are considering read-only > data. Can I assume read-mostly partitions could use the read-I/O efficient indexes on update-intensive partitions of the same table could use b-tree indexes? All of my larger (90GB+) tables can have partitions that are either almost read-only (spatial data updated one state/country at a time, about quarterly) or are totally read-only (previous months and years historical data). > So one of the questions that has to be answered (and the > submitters have been entirely mum about) is exactly how bad is the > update performance? If it's really awful that's going to constrain > the use cases quite a lot, whereas merely "a bit slower than btree" > wouldn't be such a problem. Once we have easy-to-use partitioning, would it be the case that many larger tables will have near-read-only partitions that could use I/O friendlier indexes (GIN? Hash? Bitmap?)? Any examples of a large data set that can't be partitioned into hot areas and read-mostly areas?
On Tue, Aug 01, 2006 at 02:26:18PM -0700, bob_jenkins@burtleburtle.net wrote: > Kenneth Marshall wrote: > > On Fri, Jul 28, 2006 at 12:14:49PM -0500, Jim C. Nasby wrote: > > > On Thu, Jul 27, 2006 at 01:46:01PM -0400, Alvaro Herrera wrote: > > > > Jim Nasby wrote: > > > > > On Jul 25, 2006, at 3:31 PM, Tom Lane wrote: > > > > > >Hannu Krosing <hannu@skype.net> writes: > > > > > > > > > >>What would be the use-case for hash indexes ? And what should be > > > > > >>done to make them faster than btree ? > > > > > > > > > > > >If we knew, we'd do it ;-) But no one's put enough effort into it > > > > > >to find out. > > > > > > > > > > Do they use the same hash algorithm as hash joins/aggregation? If so, > > > > > wouldn't hash indexes be faster for those operations than regular > > > > > indexes? > > > > > > > > The main problem doesn't seem to be in the hash algorithm (which I > > > > understand to mean the hashing function), but in the protocol for > > > > concurrent access of index pages, and the distribution of keys in pages > > > > of a single hash key. > > > > > > > > This is described in a README file or a code comment somewhere in the > > > > hash AM code. Someone needs to do some profiling to find out what the > > > > bottleneck really is, and ideally find a way to fix it. > > > > > > What I'm getting at is that I've never seen any explanation for the > > > theoretical use cases where a hash index would outperform a btree. If we > > > knew what kind of problems hash indexes were supposed to solve, we could > > > try and interest people who are solving those kinds of problems in > > > fixing hash indexes. > > > > The big win for hash indexes is the idea that searching for a single > > value should only take 1 I/O operation in a perfect world. Btree can > > not do that. > > Hash indexes stored on disk still need a level of indirection -- you've > got to look up what range of blocks contains your hash value. How big > your table of ranges is depends on how big the hash index is and how > big your ranges are. Almost always you can fit that table into a block > cached in memory. But, the root of a BTree is often cached in memory > too. So there's no advantage for a hash index over a BTree index until > the BTree needs to grow to three levels deep, what is that, usually > 10,000 or 100,000 records. Beyond that, you're right, the BTree slowly > grows deeper while the hash index doesn't. > I have seen some clever hash techniques that used knowledge ofo the file and directory structure to avoid the indirection allowing a single I/O operation to retrieve the value of the index without needing another layer of indirection. So it is possible given appropriate constraints. Of course, postgresql would need to check for tuple validity unless that could be incorporated into the index in some fashion. Ken