Re: hash index concurrency
От | Jeff Janes |
---|---|
Тема | Re: hash index concurrency |
Дата | |
Msg-id | CAMkU=1xUy52xnn3qt+Briqt_7M259QEqLQqRNeBkPjGv0a3-pQ@mail.gmail.com обсуждение исходный текст |
Ответ на | hash index concurrency (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: hash index concurrency
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
On Tue, May 29, 2012 at 5:19 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I ran a SELECT-only pgbench test today on the IBM POWER7 box with 64 > concurrent clients and got roughly 305,000 tps. Then, I created a > hash index on pgbench_accounts (aid), dropped the primary key, and > reran the test. I got roughly 104,000 tps. 'perf -g -e cs' suggested > lock contention in _hash_first(), so I whacked out the calls to > _hash_getlock(rel, 0, HASH_SHARE) and _hash_droplock(rel, 0, > HASH_SHARE). With that change, I got roughly 270,000 TPS. Taking a > little further, I then changed the definition of USELOCKING() to 0, > effectively removing all the heavyweight page locks. With that > change, I got 324,000 tps. Also, with this change, the CPU is fully > saturated - about 77% user time, 23% system time. Did the data fit in shared_buffers, or only in RAM? > > I don't immediately have a proposal to deal with this, but it seems > that if we want good hash index performance under high concurrency, we > need to work a bit harder. This was a hobby horse of mine a couple of years ago, but I never got much traction. The main question I have is, what do we even want hash indexes to be? NBTree is very good, has been extensively optimized, and extensively tested. If there is a niche left for hash indexes, what is it? Is it just very large keys which don't do well in BTrees, or something else? If we really want to embrace a niche other than very large keys, I would say rip out the dynamic bucket splitting altogether. Specify the number of buckets when the index is built, and if you later change your mind then use concurrent index to make a new larger one and then drop the old one. Not needing to deal with dynamic buffer splitting would make both highly concurrent access and Write-Ahead Logging far simpler to implement. (It seems to me that solving either concurrency or WAL is not going to get very far, they both need to be solved together--or at least the design of the first to be implemented needs to take into account the future implementation of the other). Barring that: 1) on each main bucket page, record what the level of split for that bucket is. That way you don't need to interlock the heavy locks between the meta page and the main bucket page. Rather, when you leave the meta page you remember not just what bucket you are heading for but also how far it has been split. Then when you get there, if the split level you find doesn't match what you remember, you have to go back to the meta page and try again. Indeed, this way you not only don't need to heavy-lock the meta page, you don't even need to visit it at all. Just store the look-up table in the relcache, and then you only need to reload it from the metapage when you realize it is out of date. But, you still need to heavy lock the bucket itself. But implementing this does require an on-disk change to the hash index format. 2) Only support bitmap scans and not ordinary tid scans (the way gin indexes already do). Now you never need to pass execution back out of the hash am while you are still in a bucket, so you don't need heavy weight locks at all. You would still need to interlock LWLocks as you move down the chain of overflow pages, but that should be much more concurrent. Cheers, Jeff
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Peter GeogheganДата:
Сообщение: Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)