Обсуждение: encourging bitmap AND
hello -- i have a schema similar to the following create table foo ( id integer not null, val integer not null, s integer not null, e integer not null ); create index foo_s_idx on foo using btree (s); create index foo_e_idx on foo using btree (e); i want to do queries like select * from foo where 150 between s and e; this usually gives me index or bitmap scans on one of the indices, plus a filter for the other condition. this is not terriblyefficient as the table is large (billions of rows), but there should only be a few thousand rows with s < k < e forany k. the data is id, value, interval (s, e), with s < e, and e - s is "small". i am experimenting and would like to see the effect of using a bitmap index "AND" scan using both indices. as far as i cantell, there are no easy ways to force or encourage this -- there are no switches like enable_seqscan and such which forcethe use of bitmap AND, and i don't know how to tell the query planner about the structure of the data (i don't thinkthis is adequately captured in any of the statistics it generates, i would need multi-column statistics.) any clues? best regards, ben
Ben <midfield@gmail.com> writes: > i have a schema similar to the following > create index foo_s_idx on foo using btree (s); > create index foo_e_idx on foo using btree (e); > i want to do queries like > select * from foo where 150 between s and e; That index structure is really entirely unsuited to what you want to do, so it's not surprising that the planner isn't impressed with the idea of a bitmap AND. I'd suggest setting up something involving a gist index over an interval-ish datatype. The PERIOD datatype that Jeff Davis is fooling with would do what you want --- see http://pgfoundry.org/projects/temporal http://thoughts.j-davis.com/2009/11/08/temporal-keys-part-2/ If you don't want any add-on parts involved, you could fake it by using a box or possibly lseg. regards, tom lane
On Dec 23, 2010, at 12:52 PM, Tom Lane wrote: > Ben <midfield@gmail.com> writes: >> i have a schema similar to the following > >> create index foo_s_idx on foo using btree (s); >> create index foo_e_idx on foo using btree (e); > >> i want to do queries like > >> select * from foo where 150 between s and e; > > That index structure is really entirely unsuited to what you want to do, > so it's not surprising that the planner isn't impressed with the idea of > a bitmap AND. > > I'd suggest setting up something involving a gist index over an > interval-ish datatype. The PERIOD datatype that Jeff Davis is fooling > with would do what you want --- see > http://pgfoundry.org/projects/temporal > http://thoughts.j-davis.com/2009/11/08/temporal-keys-part-2/ > If you don't want any add-on parts involved, you could fake it by using > a box or possibly lseg. Thanks for the quick response. I've already played a lot with the PERIOD datatype and GIST, it works pretty good, but Ifound that the lack of statistics and real selectivity functions hurt me. I was experimenting with the two column setupas an alternative, but if you think this is a dead end I'll look elsewhere. Best regards, Ben
On Thu, Dec 23, 2010 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ben <midfield@gmail.com> writes: >> i have a schema similar to the following > >> create index foo_s_idx on foo using btree (s); >> create index foo_e_idx on foo using btree (e); > >> i want to do queries like > >> select * from foo where 150 between s and e; > > That index structure is really entirely unsuited to what you want to do, > so it's not surprising that the planner isn't impressed with the idea of > a bitmap AND. Why is it unsuited for this query? It expands to (150 < s AND 150 > e) which should work nicely with bitmap AND as far as I can tell. Regards, Marti
Marti Raudsepp <marti@juffo.org> writes: > On Thu, Dec 23, 2010 at 22:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That index structure is really entirely unsuited to what you want to do, >> so it's not surprising that the planner isn't impressed with the idea of >> a bitmap AND. > Why is it unsuited for this query? It expands to (150 < s AND 150 > e) > which should work nicely with bitmap AND as far as I can tell. Well, maybe for small values of "nicely". If you do it like that, then on average each indexscan will scan about half of its index and return a bitmap representing about half the rows in the table. That's an expensive indexscan, and an expensive bitmap-AND operation, even if the final number of rows out of the AND is small. Plus you're at serious risk that the bitmaps will become lossy, which degrades the performance of the final bitmap heapscan. If you're doing interval queries enough to worry about having an index for them, you really want an indexing structure that is designed to do interval queries efficiently. regards, tom lane
On Dec 26, 2010, at 11:24 AM, Tom Lane wrote: > If you're doing interval queries enough to worry about having an index > for them, you really want an indexing structure that is designed to do > interval queries efficiently. BTW, one way to accomplish that is to transform your data into geometric shapes and then index them accordingly. Prior tothe work Jeff Davis has done on time intervals it was common to treat time as points and ranges as lines or boxes. Whilewe no longer need to play those games for time, I don't think there's an equivalent for non-time datatypes. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net