Discussion: Hash index use presently(?) discouraged since 2005: revive or bury it?
The doc at http://www.postgresql.org/docs/current/interactive/indexes-types.html says:

"Caution: Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. They are also not replicated over streaming or file-based replication. For these reasons, hash index use is presently discouraged."

I found a thread here http://archives.postgresql.org/pgsql-general/2005-05/msg00370.php about "Hash index" vs. "b-tree index" (PostgreSQL 8.0) mentioning some issues, like they

* are not faster than B-trees even for = comparisons,
* aren't WAL-safe,
* have poor concurrency (require coarser locks),
* are significantly slower to create than a b+-tree index.

In fact these statements seem to rely on the docs back in version 7.2 (see http://www.postgresql.org/docs/7.2/static/indexes-types.html ).

Has this been verified on a recent release? I can't believe that hash performs so badly on all these points. Theory tells me otherwise, and http://en.wikipedia.org/wiki/Hash_table seems to be a success.

Are there any plans to give hash index another chance (or to bury it with a reason)?

Stefan
On 14 September 2011 00:04, Stefan Keller <sfkeller@gmail.com> wrote:
> Has this been verified on a recent release? I can't believe that hash
> performs so bad over all these points. Theory tells me otherwise and
> http://en.wikipedia.org/wiki/Hash_table seems to be a success.

Hash indexes have been improved since 2005 - their performance was improved quite a bit in 9.0. Here's a more recent analysis:

http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes:
> On 14 September 2011 00:04, Stefan Keller <sfkeller@gmail.com> wrote:
>> Has this been verified on a recent release? I can't believe that hash
>> performs so bad over all these points. Theory tells me otherwise and
>> http://en.wikipedia.org/wiki/Hash_table seems to be a success.

> Hash indexes have been improved since 2005 - their performance was
> improved quite a bit in 9.0. Here's a more recent analysis:

> http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/

Yeah, looking into the git logs shows several separate major changes committed during 2008, including storing only the hash code, not the whole indexed value (big win on wide values, and lets you index values larger than one index page, which doesn't work in btree). I think that the current state of affairs is still what depesz said, namely that there might be cases where they'd be a win to use, except the lack of WAL support is a killer. I imagine somebody will step up and do that eventually.

The big picture though is that we're not going to remove hash indexes, even if they're nearly useless in themselves, because hash index opclasses provide the foundation for the system's knowledge of how to do the datatype-specific hashing needed for hash joins and hash aggregation. And those things *are* big wins, even if hash indexes themselves never become so.

regards, tom lane
2011/9/14 Tom Lane <tgl@sss.pgh.pa.us>:
> (...) I think that the current state of affairs is still what depesz
> said, namely that there might be cases where they'd be a win to use,
> except the lack of WAL support is a killer. I imagine somebody will
> step up and do that eventually.

Should I open a ticket?

Stefan
Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
From: Heikki Linnakangas
On 14.09.2011 09:39, Stefan Keller wrote:
> Should I open a ticket?

What ticket? With whom?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
From: Heikki Linnakangas
On 14.09.2011 03:24, Tom Lane wrote:
> The big picture though is that we're not going to remove hash indexes,
> even if they're nearly useless in themselves, because hash index
> opclasses provide the foundation for the system's knowledge of how to
> do the datatype-specific hashing needed for hash joins and hash
> aggregation. And those things *are* big wins, even if hash indexes
> themselves never become so.

We could drop the hash indexam code but keep the opclasses etc. I'm not sure what that would gain us, though.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
From: Leonardo Francalanci
>> Hash indexes have been improved since 2005 - their performance was
>> improved quite a bit in 9.0. Here's a more recent analysis:
>>
>> http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/
>
> The big picture though is that we're not going to remove hash indexes,
> even if they're nearly useless in themselves

Well, if they provide 3x the performance of btree indexes on index creation, I wouldn't call them "useless" just because they're not logged or they can't be unique.

In fact, I think the docs should specify that in index creation they're actually better than btree (if, in fact, they are, and the "depesz" test is not a corner case).
2011/9/14 Tom Lane <tgl@sss.pgh.pa.us> writes:
> (...) I think that the current state of affairs is still what depesz
> said, namely that there might be cases where they'd be a win to use,
> except the lack of WAL support is a killer. I imagine somebody will
> step up and do that eventually.

How much work (in man-days) do you estimate this would mean for someone who can program but has to learn PG internals first?

Stefan
Stefan Keller <sfkeller@gmail.com> writes:
> 2011/9/14 Tom Lane <tgl@sss.pgh.pa.us> writes:
>> (...) I think that the current state of affairs is still what depesz
>> said, namely that there might be cases where they'd be a win to use,
>> except the lack of WAL support is a killer. I imagine somebody will
>> step up and do that eventually.

> How much of work (in man days) do you estimate would this mean for
> someone who can program but has to learn PG internals first?

No idea ... I'm probably not the best person to estimate how long it would take someone to get up to speed on the relevant internals, but I'm sure that would take longer than actually doing the work. While it's not a trivial task, I think it fits the definition of "a small matter of programming": a piece of code whose anticipated length is significantly greater than its complexity.

regards, tom lane
On Wed, Sep 14, 2011 at 4:03 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> On 14.09.2011 03:24, Tom Lane wrote:
>> The big picture though is that we're not going to remove hash indexes,
>> even if they're nearly useless in themselves, because hash index
>> opclasses provide the foundation for the system's knowledge of how to
>> do the datatype-specific hashing needed for hash joins and hash
>> aggregation. And those things *are* big wins, even if hash indexes
>> themselves never become so.
>
> We could drop the hash indexam code but keep the opclasses etc. I'm not
> sure that would gain us, though.

Hm, what if you junked the current hash indexam, and just implemented a wrapper over btree so that the 'hash index' was just shorthand for hashing the value into a standard index?

merlin
On Thu, Sep 15, 2011 at 5:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> HM, what if you junked the current hash indexam, and just implemented
> a wrapper over btree so that the 'hash index' was just short hand for
> hashing the value into a standard index?

I'm doing this (only by hand, indexing on hash(blah)) on an application, and it works wonders. But... it's kinda not a hash table. It's still O(log N).

However, it would be a *very* useful feature if it can be made transparent for applications. And I would prefer it over a true hashtable, in the end. Hashes are, in fact, O(N) worst case.
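Claudio's by-hand scheme can be sketched as an expression index. This is a minimal illustration, not his actual schema: the table and column names are made up, and it leans on hashtext(), an internal PostgreSQL function that is not part of the documented API:

```sql
-- Hypothetical table with a long natural key
CREATE TABLE docs (url text, body text);

-- A btree over the 32-bit hash of the key: compact, WAL-logged, concurrent
CREATE INDEX docs_url_hash_idx ON docs (hashtext(url));

-- Queries must repeat the indexed expression so the planner can use it,
-- and must recheck the real column because hashtext() can collide:
SELECT body
  FROM docs
 WHERE hashtext(url) = hashtext('http://example.com/x')
   AND url = 'http://example.com/x';
```

The index stores only 4-byte integers no matter how long the strings are, which is presumably why this "works wonders" on wide keys; the equality recheck on the real column is what makes hash collisions harmless.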
On Thu, Sep 15, 2011 at 3:28 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Sep 15, 2011 at 5:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> HM, what if you junked the current hash indexam, and just implemented
>> a wrapper over btree so that the 'hash index' was just short hand for
>> hashing the value into a standard index?
>
> I'm doing this (only by hand, indexing on hash(blah)) on an
> application, and it works wonders.
> But... it's kinda not a hash table. It's still O(log N).
>
> However, it would be a *very* useful feature if it can be made
> transparent for applications.
> And I would prefer it over a true hashtable, in the end. Hashes are,
> in fact, O(N) worst case.

yeah -- in my (limited) testing, with int4 or int8, btree handily meets or beats hash on creation, access time, and index size. this suggests to me that a separate index implementation for hash isn't buying us much -- the integer btree code is highly optimized.

merlin
Merlin Moncure <mmoncure@gmail.com> writes:
> HM, what if you junked the current hash indexam, and just implemented
> a wrapper over btree so that the 'hash index' was just short hand for
> hashing the value into a standard index?

Surely creating such a wrapper would be *more* work than adding WAL support to the hash AM.

I'm not entirely following this eagerness to junk that AM, anyway. We've put a lot of sweat into it over the years, in the hopes that it would eventually be good for something. It's on the edge of being good for something now, and there's doubtless room for more improvements, so why are the knives out?

regards, tom lane
On Fri, Sep 16, 2011 at 12:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not entirely following this eagerness to junk that AM, anyway.
> We've put a lot of sweat into it over the years, in the hopes that
> it would eventually be good for something. It's on the edge of
> being good for something now, and there's doubtless room for more
> improvements, so why are the knives out?

There are lots of issues with hash tables. I'm not going to enumerate them; you probably know them better than I. But the reality of it is that a btree on hash values is a very useful index kind. It has stable performance, is very compact, and supports any type, even user-defined, if the hashing function can be customized.

They're better than hashes in all my tests for my use case (which is indexing over a column with long strings), and the only drawback is that they have to be supported by application code. If PG could have a native implementation, I'm sure lots of people would find it useful.

Maybe scrapping the hash index is too much, but support for indexing with btree-with-hashing would be very neat. I read recently that hash indexes removed the need to store the value in the index, so I don't expect such a wrapper to be difficult to write.
On Thu, Sep 15, 2011 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> HM, what if you junked the current hash indexam, and just implemented
>> a wrapper over btree so that the 'hash index' was just short hand for
>> hashing the value into a standard index?
>
> Surely creating such a wrapper would be *more* work than adding WAL
> support to the hash AM.
>
> I'm not entirely following this eagerness to junk that AM, anyway.
> We've put a lot of sweat into it over the years, in the hopes that
> it would eventually be good for something. It's on the edge of
> being good for something now, and there's doubtless room for more
> improvements, so why are the knives out?

Just making an observation. Some quick tests follow the sig. I think the point here is that something has to be done -- now that the replication train has left the station, not having WAL has gone from quirky annoyance to major functionality failure. The recent hash work has brought index build times down to a reasonable level, but they are still getting beat by btree. Of course, it's not quite apples to apples (I figure the timings will even out to an extent once you add in the hashing wrapper), but I can't help but wonder if the btree code is a better driver and consolidating code is a good thing.
merlin

postgres=# create table v as select generate_series(1,10000000) as x;
SELECT 10000000
Time: 16750.961 ms
postgres=# create index on v(x);
CREATE INDEX
Time: 15158.637 ms
postgres=# create index on v using hash(x);
CREATE INDEX
Time: 22505.468 ms
postgres=# \d v
        Table "public.v"
 Column |  Type   | Modifiers
--------+---------+-----------
 x      | integer |
Indexes:
    "v_x_idx" btree (x)
    "v_x_idx1" hash (x)

postgres=# select relname, relfilenode from pg_class where relname like 'v_x%';
 relname  | relfilenode
----------+-------------
 v_x_idx  |       16525
 v_x_idx1 |       16526
(2 rows)

c:\Program Files\PostgreSQL\9.0\data>dir/s | grep 16525
09/15/2011 07:46 PM 224,641,024 16525

c:\Program Files\PostgreSQL\9.0\data>dir/s | grep 16526
09/15/2011 07:49 PM 268,451,840 16526
On Fri, Sep 16, 2011 at 3:00 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> c:\Program Files\PostgreSQL\9.0\data>dir/s | grep 16525
> 09/15/2011 07:46 PM 224,641,024 16525
>
> c:\Program Files\PostgreSQL\9.0\data>dir/s | grep 16526
> 09/15/2011 07:49 PM 268,451,840 16526

That's not surprising at all. Hashes need to be bigger to avoid collisions. What's more interesting than index creation is index maintenance and access costs.

In my experience, btree beats hash. I haven't tried with 9.1, though.
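Claudio's earlier remark that "hashes are, in fact, O(N) worst case" can be demonstrated concretely with any in-memory hash table. The sketch below (illustrative Python, nothing to do with PostgreSQL's implementation) forces every key into one collision chain and counts the equality probes a single lookup performs:

```python
class Colliding:
    """Toy key whose hash is constant, so every instance lands in the
    same hash bucket -- the degenerate case Claudio alludes to."""

    eq_calls = 0  # counts equality probes across all instances

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return 42  # pathological: all keys collide

    def __eq__(self, other):
        Colliding.eq_calls += 1
        return self.v == other.v


N = 1000
d = {Colliding(i): i for i in range(N)}

# Looking up the most recently inserted key walks the whole collision
# chain: the dict compares against (almost) every other key first.
Colliding.eq_calls = 0
value = d[Colliding(N - 1)]
print(value, Colliding.eq_calls)  # eq_calls is on the order of N
```

A btree over hash values avoids this cliff: a pathological key distribution degrades it to ordinary O(log N) comparisons, never to a linear scan.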
On Thu, Sep 15, 2011 at 8:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Sep 15, 2011 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Surely creating such a wrapper would be *more* work than adding WAL
>> support to the hash AM.
>>
>> I'm not entirely following this eagerness to junk that AM, anyway.
>> We've put a lot of sweat into it over the years, in the hopes that
>> it would eventually be good for something. It's on the edge of
>> being good for something now, and there's doubtless room for more
>> improvements, so why are the knives out?
>
> Just making an observation. Some quick tests follow the sig. I think
> the point here is that something has to be done -- now that the
> replication train has left the station, not having WAL has gone from
> quirky annoyance to major functionality failure. The recent hash work
> has brought down index build times to a reasonable level, but they are
> still getting beat by btree. Of course, it's not quite apples to
> apples (I figure the timings will even out to an extent once you add
> in the hashing wrapper), but I can't help but wonder if the btree code
> is a better driver and consolidating code is a good thing.

odd: I was pondering Claudio's point about maintenance of hash indexes vs btree and decided to do some more tests. Something very strange is happening: I decided to compare 'update v set x=x+1', historically one of postgres's weaker points, on the 10M table indexed hash vs btree. The btree typically muddled through in about 5 minutes:

postgres=# update v set x=x+1;
UPDATE 10000000
Time: 302341.466 ms

recreating the table and hash index, I ran it again.
47 minutes into the query, I started to get curious and noticed that cpu time and disk usage are hovering near zero but nothing is blocked. disk space on the index is *slowly* increasing, now at:

09/15/2011 11:08 PM 541,024,256 16531

this is obviously, uh, windows, and I don't have good tools set up. I'll repeat the test when I get into the office this morning. thought I'd point it out.

hm, cancelled the query, dropped the index, and re-ran the update without any indexes; everything is normal -- the update whistled through the table in about 35 seconds. hm, recreated the hash index and looking more carefully now. still seeing the lousy behavior. this definitely bears more investigation (this is 9.0.4)...

merlin
2011/9/16 Tom Lane <tgl@sss.pgh.pa.us>:
> I'm not entirely following this eagerness to junk that AM, anyway.
> We've put a lot of sweat into it over the years, in the hopes that
> it would eventually be good for something. It's on the edge of
> being good for something now, and there's doubtless room for more
> improvements, so why are the knives out?

No knives from my side. Sorry for the exaggerated subject title. I'm also in favor of an enhanced hash index for cases where only "=" tests are processed and where only few inserts/deletes occur.

Stefan
On 15.9.2011 01:40, Tom Lane wrote:
> Stefan Keller <sfkeller@gmail.com> writes:
>> How much of work (in man days) do you estimate would this mean for
>> someone who can program but has to learn PG internals first?
>
> No idea ... I'm probably not the best person to estimate how long it
> would take someone to get up to speed on the relevant internals,
> but I'm sure that would take longer than actually doing the work.
> While it's not a trivial task, I think it fits the definition of
> "a small matter of programming": a piece of code whose anticipated
> length is significantly greater than its complexity.

We've been asked by a local university for PostgreSQL-related topics for theses and seminar works, so I'm wondering if adding WAL support to hash indexes would be a good fit ...

Can anyone estimate whether a student with reasonable C knowledge can implement this in about 4 months? It seems like a reasonable amount of research and work to me.

Tomas

PS: All interesting thesis ideas are welcome.
On Tue, Sep 13, 2011 at 5:04 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 14 September 2011 00:04, Stefan Keller <sfkeller@gmail.com> wrote:
>> Has this been verified on a recent release? I can't believe that hash
>> performs so bad over all these points. Theory tells me otherwise and
>> http://en.wikipedia.org/wiki/Hash_table seems to be a success.

My understanding is that a huge amount of work has gone into making btree what it is in PG, and not nearly as much work has gone into making hash indexes what they could be.

> Hash indexes have been improved since 2005 - their performance was
> improved quite a bit in 9.0. Here's a more recent analysis:
>
> http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/

They are 3 times faster to build. But if you ripped the WAL logging out of btree, how much faster would that get?

Also, that link doesn't address concurrency of selects at all, only of inserts.

Cheers,

Jeff
On Wed, Sep 14, 2011 at 4:03 PM, Stefan Keller <sfkeller@gmail.com> wrote:
> 2011/9/14 Tom Lane <tgl@sss.pgh.pa.us> writes:
>> (...) I think that the current state of affairs is still what depesz
>> said, namely that there might be cases where they'd be a win to use,
>> except the lack of WAL support is a killer. I imagine somebody will
>> step up and do that eventually.

I think that adding WAL to hash indexes without first addressing the heavy-weight locking issue would be a mistake. Even if the WAL were fixed, the bad performance under concurrent selects would still make them at best a narrow niche thing. And fixing the locking *after* WAL is in place would probably be very much harder than the other order.

> How much of work (in man days) do you estimate would this mean for
> someone who can program but has to learn PG internals first?

Are these 8 hour days? :)

I think it could be several months at least, with a high likelihood of not getting done at all (depending on how good the person is, of course). They would first have to become familiar with the WAL log and replay system. This is quite hairy.

Also, I think that adding WAL to hash indexes would be even harder than for other indexes, because of bucket splits, which can touch an arbitrarily high number of pages. At least, that is what led me to give up on this the last time I looked into it seriously.

I think that if it were not for those bucket splits, it would be relatively easy both to get rid of the heavy-weight locks and to add WAL logging. I had considered proposing making hash indexes have a fixed number of buckets specified at creation time. That would be an unfortunate limitation, but I think it would be a net win over the non-WAL, non-highly-concurrent hash indexes that currently exist. Especially if the number of buckets could be enlarged by concurrently making a new, larger index and then dropping the old one.
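Jeff's point about bucket splits can be seen in a toy linear-hashing table. This is a from-scratch Python sketch of the textbook algorithm, not PostgreSQL's hash AM: one split rehashes every entry of the bucket being split, and in an on-disk index each of those entries may live on a different overflow page, so a single split can touch an unbounded number of pages that would all have to be covered by WAL:

```python
class LinearHashTable:
    """Toy linear-hashing table: capacity grows one bucket split at a time."""

    def __init__(self, initial_buckets=4, max_load=2.0):
        self.n0 = initial_buckets        # bucket count at start of this round
        self.next_split = 0              # next bucket index to split
        self.max_load = max_load
        self.count = 0
        self.moved_last_split = 0        # entries touched by the last split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _bucket(self, key):
        h = hash(key)
        i = h % self.n0
        if i < self.next_split:          # bucket already split this round
            i = h % (2 * self.n0)
        return i

    def insert(self, key, value):
        self.buckets[self._bucket(key)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._split()

    def _split(self):
        # Split bucket `next_split`: rehash ALL of its entries between the
        # old bucket and its new image -- this is the unbounded touch.
        s = self.next_split
        entries = self.buckets[s]
        self.buckets[s] = []
        self.buckets.append([])          # new bucket s + n0
        self.moved_last_split = len(entries)
        for key, value in entries:
            self.buckets[hash(key) % (2 * self.n0)].append((key, value))
        self.next_split += 1
        if self.next_split == self.n0:   # round complete: table has doubled
            self.n0 *= 2
            self.next_split = 0

    def lookup(self, key):
        for k, v in self.buckets[self._bucket(key)]:
            if k == key:
                return v
        return None


t = LinearHashTable()
for i in range(200):
    t.insert(i, i * 10)
print(len(t.buckets), t.moved_last_split)
```

Jeff's fixed-bucket-count idea amounts to never calling `_split()`: every insert and delete then touches exactly one bucket, which is far easier to lock at fine granularity and to WAL-log.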
I've only thought about proposing it, because currently I don't have time to do anything on it even if the proposal were well received.

Cheers,

Jeff
On Sat, Sep 17, 2011 at 4:48 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Sep 13, 2011 at 5:04 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> On 14 September 2011 00:04, Stefan Keller <sfkeller@gmail.com> wrote:
>>> Has this been verified on a recent release? I can't believe that hash
>>> performs so bad over all these points. Theory tells me otherwise and
>>> http://en.wikipedia.org/wiki/Hash_table seems to be a success.
>
> My understanding is that a huge amount of work has gone into making
> btree what it is in PG, and not nearly as much work has gone into
> making hash indexes what they could be.
>
>> Hash indexes have been improved since 2005 - their performance was
>> improved quite a bit in 9.0. Here's a more recent analysis:
>>
>> http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/
>
> They are 3 time faster to build. But if you rip the WAL logging out
> of btree, how much faster would those get?
>
> Also, that link doesn't address concurrency of selects at all, only of inserts.

Of course hash indexes are faster to build than varlen string indexes :-). I use natural keys 50-80% of the time, and hash indexing would remove some of the pain in cases where I don't need ordering and range operations. In fact, if they are made to properly support WAL logging and uniqueness, I imagine they should supplant btree in a broad range of cases, so much so that it would be awful nice to be able to have syntax to choose hash for primary keys and unique constraints.

@ Jeff:
> I think that adding WAL to hash indexes without first addressing the
> heavy-weight locking issue would be a mistake. Even if the WAL was
> fixed, the bad performance under concurrent selects would still make
> it at best a narrow niche thing. And fixing the locking *after* WAL
> is in place would probably be very much harder than the other order.
Here again, I think that any proposed improvement in the current hash index code should be measured against wrapping a btree index. You get wal logging and high concurrency for free if you decide to do that. merlin
On Thu, Sep 15, 2011 at 9:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> odd: I was pondering Claudio's point about maintenance of hash indexes
> vs btree and decided to do some more tests. Something very strange is
> happening: I decided to compare 'update v set x=x+1', historically
> one of postgres's weaker points, on the 10M table indexed hash vs
> btree. The btree typically muddled through in about 5 minutes:
>
> postgres=# update v set x=x+1;
> UPDATE 10000000
> Time: 302341.466 ms
>
> recreating the table and hash index, I ran it again. 47 minutes into
> the query, I started to get curious and noticed that cpu time disk
> usage are hovering near zero but nothing is blocked. disk space on the
> index is *slowly* increasing, now at:
> 09/15/2011 11:08 PM 541,024,256 16531

The way you created the table, I think the rows are basically going to be in order in the table, which means the btree index accesses are going to visit the same block over and over again before going to the next block. With hash indexes, it will jump all over the place.

Cheers,

Jeff
Merlin and Jeff,

General remark again: It's hard for me to imagine that btree is superior for all the issues mentioned before. I still believe in hash indexes for primary keys and certain unique constraints where you need equality search and don't need ordering or range search.

2011/9/17 Jeff Janes <jeff.janes@gmail.com>:
(...)
> Also, that link doesn't address concurrency of selects at all, only of inserts.

How would (or did) you test and benchmark concurrency of inserts and selects? Use pgbench with your own config for a blackbox test?

2011/9/18 Merlin Moncure <mmoncure@gmail.com>:
> Here again, I think that any proposed improvement in the current hash
> index code should be measured against wrapping a btree index. You
> get wal logging and high concurrency for free if you decide to do
> that.

As I understand it, this would be an enhancement of btree. That's ok for btree, but it wouldn't really exploit all the advantages of a separate hash index, would it?

Stefan
On Sun, Sep 18, 2011 at 7:59 AM, Stefan Keller <sfkeller@gmail.com> wrote:
> Merlin and Jeff,
>
> General remark again: It's hard for me to imagine that btree is
> superior for all the issues mentioned before. I still believe in hash
> index for primary keys and certain unique constraints where you need
> equality search and don't need ordering or range search.

I certainly agree that hash indexes as implemented in PG could be improved on.

> 2011/9/17 Jeff Janes <jeff.janes@gmail.com>:
> (...)
>> Also, that link doesn't address concurrency of selects at all, only of inserts.
>
> How would (or did) you test and benchmark concurrency of inserts and selects?
> Use pgbench with own config for a blackbox test?

I used pgbench -S -M prepared with a scale that fits in shared_buffers, at various concurrencies. Drop the pgbench_accounts primary key and build, alternately, a regular index and a hash index between runs. (If the scale doesn't fit in memory, that should advantage the hash, but I haven't seen a large one -- I've never tested a size at which the branch blocks don't fit in memory.)

It is hard to see real differences here because the index is not the main bottleneck, regardless of which index is in use (at least on only 8 CPUs; with enough CPUs you might be able to drive the hash index over the edge).

I also used a custom pgbench option, -P (a patch adding which feature I was supposed to submit to this commitfest, but missed). It cuts down on a lot of the network chatter, locking, and other overhead, and so simulates an index lookup occurring on the inside of a nested loop. The performance at -c 1 was roughly equal, but at -c 8 the hash was three times slower.

I don't recall testing concurrent inserts (not for speed, anyway).

Cheers,

Jeff
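Jeff's setup can be roughly reconstructed as a shell recipe. The scale, database name, and durations below are illustrative guesses, it requires a running PostgreSQL server, and his custom -P option is omitted since that patch was never posted:

```shell
# Initialize a dataset small enough to fit in shared_buffers (scale is a guess)
pgbench -i -s 100 bench

# Replace the primary key with a hash index for one set of runs
psql bench -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey"
psql bench -c "CREATE INDEX ON pgbench_accounts USING hash (aid)"
pgbench -S -M prepared -c 1 -T 60 bench        # select-only, single client
pgbench -S -M prepared -c 8 -j 8 -T 60 bench   # 8 concurrent clients

# Rebuild as a plain btree and repeat
psql bench -c "DROP INDEX pgbench_accounts_aid_idx"
psql bench -c "ALTER TABLE pgbench_accounts ADD PRIMARY KEY (aid)"
pgbench -S -M prepared -c 1 -T 60 bench
pgbench -S -M prepared -c 8 -j 8 -T 60 bench
```

Comparing the reported tps between the two index builds at each client count is the blackbox test Stefan asked about; per Jeff's numbers, the gap only shows up at higher concurrency.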
On Sun, Sep 18, 2011 at 9:31 PM, Stefan Keller <sfkeller@gmail.com> wrote:
> I'm simply referring to the literature (like the intro by Ramakrishnan & Gehrke).
> I just know that Oracle and Mysql actually do have them too and use them
> without those current implementation-specific restrictions in
> Postgres.

Where did you get the idea that Oracle has hash indexes? I can't seem to find them:
http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/indexiot.htm#sthref293

Are you mixing this up with hash partitioning?
http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/schemaob.htm#sthref443

Or am I missing something?

> IMHO by design hash indexes (e.g. linear hashing) work best when:
> 1. only equality (=) tests are used (on whole values)
> 2. columns (key values) have very high cardinality
>
> And ideally, but not necessarily, when index values do not change and the
> number of rows is known ahead of time (avoiding the O(N) worst case - but
> there are approaches to chaining with dynamic resizing).
>
> I just collected this to encourage ourselves that enhancing hash
> indexes could be worthwhile.

There's still the locking issue Jeff mentioned. At least every time a table resize occurs the whole index must be locked. Or is there a more fine-granular locking strategy which I am overlooking?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Robert Klemme, 19.09.2011 13:13:
> On Sun, Sep 18, 2011 at 9:31 PM, Stefan Keller <sfkeller@gmail.com> wrote:
>> I'm simply referring to literature (like the intro Ramakrishnan & Gehrke).
>> I just know that Oracle and Mysql actually do have them too and use it
>> without those current implementation specific restrictions in
>> Postgres.
>
> Where did you get the idea that Oracle has hash indexes? I
> can't seem to find them:
> http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/indexiot.htm#sthref293
>
> Are you mixing this up with hash partitioning?
> http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/schemaob.htm#sthref443
>
> Or am I missing something?

Maybe he was referring to a hash cluster:
http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_5001.htm

This is a storage option where you can store related rows (e.g. in a parent/child relationship) in the same physical database block based on a hash value. That enables the database to read parent and child rows with just a single IO. In the background Oracle probably has something like a hash index to support that.

Thomas
On Sun, Sep 18, 2011 at 9:59 AM, Stefan Keller <sfkeller@gmail.com> wrote:
> Merlin and Jeff,
>
> General remark again: It's hard for me to imagine that btree is
> superior for all the issues mentioned before. I still believe in hash
> indexes for primary keys and certain unique constraints where you need
> equality search and don't need ordering or range search.

It is -- but please understand I'm talking about int32 btree vs hash. Hashing as a technique is of course absolutely going to cream btree for all kinds of data because of the advantages of working with decomposed data -- we are all taught that in comp-sci 101 :-). The debate here is not about the advantages of hashing, but about the specific implementation of the hash index used.

Postgres's hash index implementation used to be pretty horrible -- it stored the pre-hashed datum in the index which, while making it easier to do certain things, made it horribly slow and, for all intents and purposes, useless. Somewhat recently, a lot of work was put in to fix that -- the index now packs only the hash code, which made it competitive with btree and superior for larger keys. However, certain technical limitations like the lack of WAL logging and uniqueness support hold hash indexing back from being used like it really should be. In cases where I really *do* need hash indexing, I do it in userland.

create table foo
(
  a_long_field text
);
create index on foo(hash(a_long_field));

select * from foo where hash(a_long_field) = hash(some_value) and
a_long_field = some_value;

This technique works fine -- the main disadvantage is that enforcing uniqueness is a PITA, but since the standard hash index doesn't support it either it's no great loss. I also have the option of getting 'uniqueness' and being able to skip the equality operation if I sacrifice some performance and choose a strong digest. Until the hash index issues are worked out, I submit that this remains the go-to method to do this.
Now, my point here is that I've noticed that even with the latest optimizations btree seems to still be superior to hash indexing by most metrics, so that:

create table foo
(
  an_int_field int,
  a_long_field text
);

create index on foo(an_int_field);
create index on foo using hash(a_long_field);

On performance grounds alone, the btree index seems to be (from my very limited testing) a better bet. So I'm conjecturing that the current hash implementation should be replaced with a formalization of the userland technique shown above -- when you request a hash index, the database will silently hash your field and weave it into a btree. It's a hybrid: a hashed btree.

To truly demonstrate whether the technique is effective, though, it would have to be coded up -- it's only fair to compare if the btree-based hash is also double-checking the value in the heap, which the standard hash index must do.

The other way to go, of course, is to try and fix up the existing hash index code -- add WAL logging, etc. In theory, a customized hash structure should be able to beat btree all day long, which argues for continuing in this direction.

@ Jeff:
> The way you created the table, I think the rows are basically going to
> be in order in the table, which means the btree index accesses are
> going to visit the same block over and over again before going to the
> next block.

This does not explain the behavior. Yeah -- it may take longer, but your computer should not be sitting idle during create index operations :-). Unfortunately, I was not able to reproduce it on linux. I have to bite the bullet and get the mingw up if I want to try and diagnose -- perhaps it is stalling in the semop calls.

merlin
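The "hashed btree" hybrid proposed above can be modeled outside the database in a few lines. This is a toy Python sketch, not PostgreSQL code: the `HashedBtree` class and the use of `crc32` as a stand-in for the hash opclass are illustrative assumptions. The point it demonstrates is the two-step lookup the thread describes: binary-search an ordered structure keyed by a fixed-width hash code (the "btree" part), then re-check the actual value in the heap to weed out hash collisions.

```python
import bisect
import zlib

def h32(s: str) -> int:
    """32-bit hash of the key; crc32 stands in for a real hash opclass."""
    return zlib.crc32(s.encode())

class HashedBtree:
    def __init__(self):
        self.entries = []  # sorted list of (hash_code, row_id), modeling a btree on hash codes

    def insert(self, key: str, row_id: int):
        bisect.insort(self.entries, (h32(key), row_id))

    def lookup(self, key: str, heap):
        """Binary-search the hash code, then re-check the heap value (collision filter)."""
        code = h32(key)
        i = bisect.bisect_left(self.entries, (code,))
        hits = []
        while i < len(self.entries) and self.entries[i][0] == code:
            row_id = self.entries[i][1]
            if heap[row_id] == key:  # the equality recheck the thread insists on
                hits.append(row_id)
            i += 1
        return hits

# 'heap' models the table: row_id -> long value
heap = {1: "acbdefghijklmnop", 2: "another long value"}
idx = HashedBtree()
for rid, val in heap.items():
    idx.insert(val, rid)
print(idx.lookup("acbdefghijklmnop", heap))  # -> [1]
```

Note that, as with the userland SQL trick, the index never stores the long value itself -- only the fixed-width code -- which is what makes it competitive for large keys.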
On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Sun, Sep 18, 2011 at 9:59 AM, Stefan Keller <sfkeller@gmail.com> wrote: >> Merlin and Jeff, >> >> General remark again:It's hard for me to imagine that btree is >> superior for all the issues mentioned before. I still believe in hash >> index for primary keys and certain unique constraints where you need >> equality search and don't need ordering or range search. > > It is -- but please understand I'm talking about int32 tree vs hash. > Hashing as a technique of course is absolutely going to cream btree > for all kinds of data because of the advantages of working with > decomposed data -- we are all taught that in comp-sci 101 :-). The > debate here is not about the advantages of hashing, but the specific > implementation of the hash index used. > > Postgres's hash index implementation used to be pretty horrible -- it > stored the pre-hashed datum in the index which, while making it easier > to do certain things, made it horribly slow, and, for all intents and > purposes, useless. Somewhat recently,a lot of work was put in to fix > that -- the index now packs the hash code only which made it > competitive with btree and superior for larger keys. However, certain > technical limitations like lack of WAL logging and uniqueness hold > hash indexing back from being used like it really should be. In cases > where I really *do* need hash indexing, I do it in userland. > > create table foo > ( > a_long_field text; > ); > create index on foo(hash(a_long_field)); > > select * from foo where hash(a_long_field) = hash(some_value) and > a_long_field = some_value; > > This technique works fine -- the main disadvantage is that enforcing > uniqueness is a PITA but since the standard index doesn't support it > either it's no great loss. I also have the option of getting > 'uniqueness' and being able to skip the equality operation if I > sacrifice some performance and choose a strong digest. 
Until the hash > index issues are worked out, I submit that this remains the go-to > method to do this. Is this approach (storing the hash code in a btree) really faster than a regular btree index on "a_long_field"? And if so, for which kind of data and load? > Now, my point here is that I've noticed that even with the latest > optimizations btree seems to still be superior to the hash indexing by > most metrics, so that: > create table foo > ( > an_int_field int; > a_long_field text; > ); > > create index on foo(an_int_field); > create index on foo using hash(a_long_field); > > On performance grounds alone, the btree index seems to be (from my > very limited testing) a better bet. So I'm conjecturing that the > current hash implementation should be replaced with a formalization of > the userland technique shown above -- when you request a hash index, > the database will silently hash your field and weave it into a btree. > It's a hybrid: a hashed btree. I'd rather call it a "btreefied hash" because you are storing a hash but in a btree structure. :-) But that's a detail. What I find worrying is that then there is a certain level of obscuring the real nature since "create index ... using hash" is not exactly creating a hash table. > To truly demonstrate if the technique > was effective though, it would have to be coded up -- it's only fair > to compare if the btree based hash is also double checking the value > in the heap which the standard hash index must do. Right. > The other way to go of course is to try and fix up the existing hash > index code -- add wal logging, etc. In theory, a customized hash > structure should be able to beat btree all day long which argues to > continue in this direction. I still haven't seen a solution to locking when a hash table needs resizing. All hashing algorithms I can think of at the moment would require a lock on the whole beast during the resize which makes this type of index impractical for certain loads (heavy updating). 
One solution would be to apply partitioning to the hash table itself (e.g. have four partitions for the two least significant bits, or 16 for the four least significant bits) and treat them independently. How that would interact with WAL I have no idea though.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
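The partitioning idea above can be sketched concretely. This is a hypothetical Python illustration (the names and the choice of `crc32` are mine, not from the thread): the low bits of the hash code select a partition, each partition has its own lock, and so an exclusive operation such as a resize would only stall the one partition being resized rather than the whole index.

```python
import threading
import zlib

N_PARTITIONS = 4  # two least significant bits, as suggested above

class PartitionedHash:
    def __init__(self):
        self.parts = [dict() for _ in range(N_PARTITIONS)]
        self.locks = [threading.Lock() for _ in range(N_PARTITIONS)]

    def _slot(self, key: str) -> int:
        # low bits of the hash code pick the partition
        return zlib.crc32(key.encode()) & (N_PARTITIONS - 1)

    def put(self, key, value):
        i = self._slot(key)
        with self.locks[i]:  # blocks only this partition, not the whole table
            self.parts[i][key] = value

    def get(self, key):
        i = self._slot(key)
        with self.locks[i]:
            return self.parts[i].get(key)

t = PartitionedHash()
t.put("a_long_field", 42)
print(t.get("a_long_field"))  # -> 42
```

A resize under this scheme would take one partition's lock exclusively while leaving the other partitions fully readable and writable, which is the property the heavy-update workloads discussed here would need.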
Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
From:
Vitalii Tymchyshyn
Date:
19.09.11 18:19, Robert Klemme wrote:
> On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>> Postgres's hash index implementation used to be pretty horrible -- it
>> stored the pre-hashed datum in the index which, while making it easier
>> to do certain things, made it horribly slow, and, for all intents and
>> purposes, useless. Somewhat recently, a lot of work was put in to fix
>> that -- the index now packs the hash code only which made it
>> competitive with btree and superior for larger keys. However, certain
>> technical limitations like lack of WAL logging and uniqueness hold
>> hash indexing back from being used like it really should be. In cases
>> where I really *do* need hash indexing, I do it in userland.
>>
>> create table foo
>> (
>>   a_long_field text
>> );
>> create index on foo(hash(a_long_field));
>>
>> select * from foo where hash(a_long_field) = hash(some_value) and
>> a_long_field = some_value;
>>
>> This technique works fine -- the main disadvantage is that enforcing
>> uniqueness is a PITA but since the standard index doesn't support it
>> either it's no great loss. I also have the option of getting
>> 'uniqueness' and being able to skip the equality operation if I
>> sacrifice some performance and choose a strong digest. Until the hash
>> index issues are worked out, I submit that this remains the go-to
>> method to do this.
>
> Is this approach (storing the hash code in a btree) really faster than
> a regular btree index on "a_long_field"? And if so, for which kind of
> data and load?

Actually, sometimes the field is [potentially] so long that you can't use a regular b-tree because it won't fit in a page. Say it is of type "text": if you create a regular index, you effectively limit the column value size to a few KB. I am using md5(text) indexes in this case, coupled with rather ugly queries (see above). Native support would be nice.

Best regards, Vitalii Tymchyshyn.
Robert Klemme <shortcutter@googlemail.com> writes: > I still haven't seen a solution to locking when a hash table needs > resizing. All hashing algorithms I can think of at the moment would > require a lock on the whole beast during the resize which makes this > type of index impractical for certain loads (heavy updating). That seems rather drastically overstated. The existing hash index code only needs to hold an index-scope lock for a short interval while it updates the bucket mapping information after a bucket split. All other locks are per-bucket or per-page. The conflicting share-lockers of the index-wide lock also only need to hold it for a short time, not for their whole indexscans. So that doesn't seem to me to be materially worse than the locking situation for a btree, where we also sometimes need exclusive lock on the btree root page, thus blocking incoming indexscans for a short time. regards, tom lane
Re: Hash index use presently(?) discouraged since 2005: revive or bury it?
From:
Vitalii Tymchyshyn
Date:
19.09.11 18:19, Robert Klemme wrote:
>
> I still haven't seen a solution to locking when a hash table needs
> resizing. All hashing algorithms I can think of at the moment would
> require a lock on the whole beast during the resize which makes this
> type of index impractical for certain loads (heavy updating).

Sorry for the second reply; I should not have started writing until I'd read all of your post. Anyway:

Do you need a read lock? I'd say readers could use the "old" copy of the hash table up until the moment the new, bigger copy is ready. This will simply look like the update has not started yet, which AFAIK is OK for MVCC. Yes, all the writers will wait.

Another option could be to start a background build of the larger hash -- for some time your performance will be degraded, since you are writing to two indexes instead of one and the second one is still being built, but I'd say a low-latency solution is possible here.

One more: I don't actually see why you can't have a "rolling" expansion of the hash table. I will try to describe it; correct me if I am wrong:

1) The algorithm I am talking about takes "n" bits of the hash code to form the hash table, so during expansion it will double the number of buckets.
2) Say we are going from 2^n = n1 to 2^(n+1) = n2 = n1 * 2 buckets. Each new pair of buckets will take data from a single source bucket depending on the value of the new hash bit used. E.g. if n were 2, we had 4 buckets and the new table will have 8 buckets; everything from old bucket #1 will go into new buckets #2 and #3 depending on the hash value.
3) So we can keep a counter of the number of buckets processed. Any operation on a lower-numbered bucket goes to the "new set"; any operation on a higher-numbered bucket goes to the "old set"; any operation on the bucket currently being converted blocks until the conversion is done.

P.S. Sorry for a lot of possibly dumb thoughts; I don't know why I've got such a thought stream on this topic :)

Best regards, Vitalii Tymchyshyn.
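For what it's worth, the "rolling" expansion in points 1-3 is essentially incremental, linear-hashing-style splitting, and its bookkeeping can be sketched in a few lines. This is a toy single-threaded Python model (the class and method names are illustrative assumptions; real index code would also need the per-bucket blocking described in point 3): a counter `split_ptr` records how many old buckets have been moved, and every operation routes to the old or new table by comparing its bucket number against that counter.

```python
class RollingHash:
    """Toy model of a rolling 2^n -> 2^(n+1) hash-table expansion."""
    def __init__(self, n_bits=2):
        self.n = n_bits
        self.old = [[] for _ in range(1 << n_bits)]   # 2^n buckets
        self.new = None                               # 2^(n+1) buckets while expanding
        self.split_ptr = 0                            # old buckets below this are moved

    def _h(self, key):
        return hash(key) & 0xFFFFFFFF

    def _route(self, key):
        """Pick the live bucket: the new table if the key's old bucket was already split."""
        if self.new is not None and (self._h(key) & ((1 << self.n) - 1)) < self.split_ptr:
            return self.new[self._h(key) & ((1 << (self.n + 1)) - 1)]
        return self.old[self._h(key) & ((1 << self.n) - 1)]

    def insert(self, key, val):
        self._route(key).append((key, val))

    def lookup(self, key):
        return [v for k, v in self._route(key) if k == key]

    def begin_expand(self):
        self.new = [[] for _ in range(1 << (self.n + 1))]

    def split_one(self):
        """Move one old bucket into its two new buckets (point 3 above)."""
        for key, val in self.old[self.split_ptr]:
            self.new[self._h(key) & ((1 << (self.n + 1)) - 1)].append((key, val))
        self.old[self.split_ptr] = []
        self.split_ptr += 1
        if self.split_ptr == len(self.old):           # expansion complete
            self.old, self.new = self.new, None
            self.n += 1
            self.split_ptr = 0

t = RollingHash()
t.insert("some_key", 1)
t.begin_expand()
t.split_one()                 # only one bucket is touched per step
print(t.lookup("some_key"))   # -> [1]
```

The key property is that only the single bucket being converted would ever need an exclusive lock; everything else stays readable and writable throughout the expansion, which is exactly the objection to whole-table resize locking this sketch tries to answer.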
On Mon, Sep 19, 2011 at 12:54 PM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote: > 19.09.11 18:19, Robert Klemme написав(ла): >> >> I still haven't seen a solution to locking when a hash table needs >> resizing. All hashing algorithms I can think of at the moment would >> require a lock on the whole beast during the resize which makes this >> type of index impractical for certain loads (heavy updating). > > Sorry for the second reply, I should have not start writing until I've read > all your post. Anyway. > Do you need read lock? I'd say readers could use "old" copy of hash table up > until the moment new bigger copy is ready. This will simply look like the > update is not started yet, which AFAIK is OK for MVCC. > Yep, all the writers will wait. All this would get solved if there's no automatic hash index resizing. DBAs would have to recreate (possibly concurrently) the hash to make it bigger. Still, hash has lots of issues. I'm not sure how the hash is implemented in PG, but usually, for low collision rates pseudorandom walks are used to traverse collision chains. But pseudorandom collision chains mean random I/O which is awful for a DB. Those techniques have not been designed to work with secondary memory. So, they would have to be adapted to working with secondary memory, and that means a lot of R&D. It's not impossible, it's just a lot of work. I subscribe to the idea that, *in the meanwhile*, without scrapping the hash index and in parallel to improving it, an option for transparently-hashed btrees would be valuable.
On Mon, Sep 19, 2011 at 8:19 AM, Robert Klemme <shortcutter@googlemail.com> wrote: > On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > >> The other way to go of course is to try and fix up the existing hash >> index code -- add wal logging, etc. In theory, a customized hash >> structure should be able to beat btree all day long which argues to >> continue in this direction. > > I still haven't seen a solution to locking when a hash table needs > resizing. All hashing algorithms I can think of at the moment would > require a lock on the whole beast during the resize which makes this > type of index impractical for certain loads (heavy updating). The current implementation doesn't EX lock the whole beast during resizing, except a brief one at the beginning (and maybe at the end?) of the split. It does EX lock the entire bucket being split for the duration of the split, though. The main problem that I see is that due to the potential for deadlocks, the locks involved have to be heavy-weight. Which means the shared locks which non-splitting processes have to use to block against those EX locks have to be heavy-weight too, and getting even shared heavy-weight locks means exclusive light-weight locks, which means contention. One way out would be to have a special process (probably vacuum) do all the resizing/splitting, rather than having regular backends doing it. It should be possible to make this dead-lock free. Another would be to make hash indexes only support bit-map scans. Then the scanning hash-code would never have a need to pass control back to the executor while still holding a bucket lock. Another would be to make the current position of the hash index scan be "refindable" if a split should occur during the scan, so that you don't need to inhibit splits during a scan of the same bucket. This would probably be easy if there were no overflow pages. 
But the overflow pages get shuffled in with each other and with the main bucket page during a split. It would take quite some gymnastics to get around that. Cheers, Jeff
On Mon, Sep 19, 2011 at 10:19 AM, Robert Klemme <shortcutter@googlemail.com> wrote: > On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> On Sun, Sep 18, 2011 at 9:59 AM, Stefan Keller <sfkeller@gmail.com> wrote: >>> Merlin and Jeff, >>> >>> General remark again:It's hard for me to imagine that btree is >>> superior for all the issues mentioned before. I still believe in hash >>> index for primary keys and certain unique constraints where you need >>> equality search and don't need ordering or range search. >> >> It is -- but please understand I'm talking about int32 tree vs hash. >> Hashing as a technique of course is absolutely going to cream btree >> for all kinds of data because of the advantages of working with >> decomposed data -- we are all taught that in comp-sci 101 :-). The >> debate here is not about the advantages of hashing, but the specific >> implementation of the hash index used. >> >> Postgres's hash index implementation used to be pretty horrible -- it >> stored the pre-hashed datum in the index which, while making it easier >> to do certain things, made it horribly slow, and, for all intents and >> purposes, useless. Somewhat recently,a lot of work was put in to fix >> that -- the index now packs the hash code only which made it >> competitive with btree and superior for larger keys. However, certain >> technical limitations like lack of WAL logging and uniqueness hold >> hash indexing back from being used like it really should be. In cases >> where I really *do* need hash indexing, I do it in userland. >> >> create table foo >> ( >> a_long_field text; >> ); >> create index on foo(hash(a_long_field)); >> >> select * from foo where hash(a_long_field) = hash(some_value) and >> a_long_field = some_value; >> >> This technique works fine -- the main disadvantage is that enforcing >> uniqueness is a PITA but since the standard index doesn't support it >> either it's no great loss. 
I also have the option of getting >> 'uniqueness' and being able to skip the equality operation if I >> sacrifice some performance and choose a strong digest. Until the hash >> index issues are worked out, I submit that this remains the go-to >> method to do this. > > Is this approach (storing the hash code in a btree) really faster than > a regular btree index on "a_long_field"? And if so, for which kind of > data and load? > >> Now, my point here is that I've noticed that even with the latest >> optimizations btree seems to still be superior to the hash indexing by >> most metrics, so that: >> create table foo >> ( >> an_int_field int; >> a_long_field text; >> ); >> >> create index on foo(an_int_field); >> create index on foo using hash(a_long_field); >> >> On performance grounds alone, the btree index seems to be (from my >> very limited testing) a better bet. So I'm conjecturing that the >> current hash implementation should be replaced with a formalization of >> the userland technique shown above -- when you request a hash index, >> the database will silently hash your field and weave it into a btree. >> It's a hybrid: a hashed btree. > > I'd rather call it a "btreefied hash" because you are storing a hash > but in a btree structure. :-) But that's a detail. What I find > worrying is that then there is a certain level of obscuring the real > nature since "create index ... using hash" is not exactly creating a > hash table. > >> To truly demonstrate if the technique >> was effective though, it would have to be coded up -- it's only fair >> to compare if the btree based hash is also double checking the value >> in the heap which the standard hash index must do. > > Right. so, i was curious, and decided to do some more performance testing. 
I created a table like this:

create table foo as
  select r, r::text || 'acbdefghijklmnop' as b
  from generate_series(1,10000000) r;
create index on foo(r);
create index on foo using hash(b);

To simulate the btree/hash hybrid, I cut a pgbench file like so (btree.sql):

\setrandom x 1 100000
select * from foo where r = :x and b = :x::text || 'acbdefghijklmnop';

and this for the standard hash (hash.sql):

\setrandom x 1 100000
select * from foo where b = :x::text || 'acbdefghijklmnop';

pgbench -n -c2 -T 10 -f hash.sql etc.

On my test machine, the hybrid hash ekes out a slight win on in-cache tests:

merlin@mmoncure-ubuntu:~$ pgbench -n -c2 -T 100 -f btree.sql -p 5490
tps = 3250.793656 (excluding connections establishing)

vs

merlin@mmoncure-ubuntu:~$ pgbench -n -c2 -T 100 -f hash.sql -p 5490
tps = 3081.730400 (excluding connections establishing)

To make the test i/o bound, I changed the setrandom range from 100000 to 10000000; this produced some unexpected results. The hash index is pulling about double the tps (~80 vs ~40) over the hybrid version. Well, unless my methodology is wrong, it's unfair to claim btree is beating hash in 'all cases'. hm.

merlin
On Mon, Sep 19, 2011 at 3:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > To make the test into i/o bound, I change the setrandom from 100000 to > 10000000; this produced some unexpected results. The hash index is > pulling about double the tps (~80 vs ~ 40) over the hybrid version. > Well, unless my methodology is wrong, it's unfair to claim btree is > beating hash in 'all cases'. hm. Is this only selects? Hash performs badly with updates, IIRC. I haven't tried in a long while, though.
On Mon, Sep 19, 2011 at 1:53 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Sep 19, 2011 at 3:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> To make the test into i/o bound, I change the setrandom from 100000 to
>> 10000000; this produced some unexpected results. The hash index is
>> pulling about double the tps (~80 vs ~ 40) over the hybrid version.
>> Well, unless my methodology is wrong, it's unfair to claim btree is
>> beating hash in 'all cases'. hm.
>
> Is this only selects?
> Hash performs badly with updates, IIRC.
> I haven't tried in a long while, though.

Just selects. The update test is also very interesting -- the only test I did for updates was 'update foo set x=x+1', which was a win for btree (20-30% faster typically). Perhaps this isn't algorithm-induced though -- lack of WAL logging could actually hurt time to commit because it deserializes i/o.

merlin
Merlin Moncure <mmoncure@gmail.com> writes: > just selects. update test is also very interesting -- the only test I > did for for updates is 'update foo set x=x+1' which was a win for > btree (20-30% faster typically). perhaps this isn't algorithm induced > though -- lack of wal logging could actually hurt time to commit > because it deserializes i/o. In 9.1+, you could remove WAL from the comparison by doing the tests on an unlogged table. regards, tom lane