Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
От | Bruce Momjian |
---|---|
Тема | Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch) |
Дата | |
Msg-id | 200709260836.l8Q8aUd11470@momjian.us обсуждение исходный текст |
Ответ на | Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch) ("Simon Riggs" <simon@2ndquadrant.com>) |
Список | pgsql-hackers |
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Simon Riggs wrote: > On Sat, 2007-07-21 at 12:20 +0100, Simon Riggs wrote: > > > I'd like to help where I can if nobody else is currently doing this. I > > would focus initially on some analysis of the various use cases to give > > a better view on what we would need B-tree, clustered indexes and bitmap > > indexes to do for us. > > I've done some further analysis of bitmap indexes in preparation for a > comparison with clustered indexes (GIT), to help understand the use > cases for each. > > Overall, my conclusion is that BMI and GIT have separate use cases, > almost opposite use cases or at least orthogonal ones. I would > eventually like both. BMI optimises for high numbers of rows per value, > whilst GIT optimises for clustering of values. BMI is not useful at all > for PKs, whilst GIT is specifically designed to handle them. Both handle > INSERTs well, though GIT handles growing numbers of values easily, BMI > prefers to keep the distribution more constant. GIT needs HOT to > continue to operate effectively for long periods, whereas BMI doesn't > seem to handle UPDATEs well at all (but more testing required on that > one). > > --- > > Neither the latest bitmap index nor the latest GIT patch applied > cleanly. The bitmap patch was close, but GIT needs an update yet to > integrate Alexey's recent work. > > My test case was a table with 10 million rows, with columns with varying > numbers of unique values. So Ndistinct = 100 means 100,000 rows per > value. > > BITMAP INDEXES > > Ndistinct Best time Size in blocks > 1 10.6s 100 > 10 10.4s 102 > 100 11.7s 2002 > 1000 15.1s 6006 > 10000 19.8s 10046 > 100000 82.1s 100442 > 1000000 - >450000 > > Size exactly equivalent for both Integer and Text (same values). Build > time was similar also. > > The test for 1 million distinct values didn't return after over 4 CPU > minutes expended with the disk going crazy. After a number of minutes I > decided to cancel the index build. Multiple cancels didn't stop the > build, so after some more time I decided to kill it, which then crashed > the server. Automatic restart crashed as well with a "could not find > transaction id 0" error. Clearly some WAL-weirdness to investigate... > > Overall, I'd have to say that's quite enough for me to say bitmap is not > quite ready yet without clear health warnings. I had hopes... > > B-TREE INDEXES (Integers) > > Rows/value Best time Size in blocks > 10000000 49s 21899 > 1000000 49s 21899 > 100000 49s 21899 > 10000 47s 21899 > 1000 43s 21899 > 100 38s 21899 > 10 38s 21899 > 1 33s 21899 > > Build time for Integers shown. Build time for Text ~x5-6 times as long. > > Testing against equivalent b-tree builds, the fastest b-tree build I > could get was 33s on a unique integer index. So BMI build time is > certainly optimised for low numbers of distinct values, but doesn't have > any optimisation for when the BMI is built on a poor candidate column. > GIT does degrade down to a normal b-tree when clustering isn't > sufficient to give reduction in index size. > > The cross-over point was between 10^4 and 10^5 distinct values for both > size and build time; on that test around 100-1000 rows per value. So > BMIs are probably still useful with varying number of rows per value, > but overall high Ndistinct proves inefficient in both build time and > space allocation. This isn't such a surprise since we know that b-tree > build uses a sort-based plan whereas BMI uses a hash based plan; neither > will win all of the time, we know that from the executor. > > GIT works well even with unique indexes, since each grouped tuple covers > a range of values. I'll re-run the tests when I can to get timings. GIT > can compress typically down to 1-5% with clustered data, not quite as > good as bitmap's 0.5% best. > > GIT's design was to have an index that was tuned for clustered data, yet > degrades cleanly to a standard b-tree when conditions are not right. > This makes me think that a hybrid b-tree should be possible, even > desirable. When the data is clustered, use the grouping technique to > reduce he number of tuples stored and when the data is highly non-unique > use the bitmap technique to reduce numbers of tuples. Using both > techniques in the same index would offer even wider flexibility, since > we'd be able to cater for real-world data more easily. Both GIT and BMI > use bitmaps, just in different ways. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
В списке pgsql-hackers по дате отправления: