Re: [GENERAL] Yet another btree gotcha
От | Mike Mascari |
---|---|
Тема | Re: [GENERAL] Yet another btree gotcha |
Дата | |
Msg-id | 19991006071831.4990.rocketmail@web2102.mail.yahoo.com обсуждение исходный текст |
Ответы |
Re: [GENERAL] Yet another btree gotcha
("Gene Selkov, Jr." <selkovjr@mcs.anl.gov>)
|
Список | pgsql-general |
I realize that this is somewhat off topic, but I have seen far better improvments in query speeds when using a multifield index instead of individual ones. So, for the given select below: SELECT km.km, su.su FROM km, su WHERE km.id = su.id AND km.rel = su.rel AND km.item = su.item AND su ~ '^ethanol'; multifield indexes should be created: CREATE INDEX k_km on km(id, rel, item); CREATE INDEX k_su on su(id, rel, item); and the query should be rewritten as: SELECT km.km, su.su FROM km, su WHERE (km.id,km.rel,km.item) = (su.id,su.rel,su.item) AND su ~ '^ethanol'; Perhaps the distribution of your data is more "normal" using multikey indexes (I hope). Note that if you have a multifield index, and you still submit your original query, the planner/optimizer doesn't appear smart enough to use the multikey index except for the first join condition, so to realize all of the benefits, you have to use the WHERE (t1.field1,t1.field2) = (t2.field1,t2.field2) construction instead of: WHERE t1.field1 = t2.field1 AND t1.field2 = t2.field2 On another note, looking through the backend sources reveals that, at one point, partial indices were once allowed: CREATE INDEX k_km on km(item) WHERE item <> 1; but the grammer has been removed and is now illegal (although the code to handle it still exists). Perhaps someday this feature will be restored. Hope the above helps some, Mike Mascari (mascarim@yahoo.com) --- "Gene Selkov, Jr." <selkovjr@mcs.anl.gov> wrote: > I am wondering whether it's normal to see dropping a > btree improve the > query which could never complete enough that it > completes in a blink? > > I realize the data I have here represent the worst > possible input to > btree, and I'm probably better off without any index > at all, but I > guess it is something that the optimizer or the > access method itself > should be able to decide. > > I am joining two tables, "km" and "su" on an int2 > attribute, "item". > Just take a look at the value histograms for item: > > emp=> SELECT item, count (*) AS count FROM km GROUP > BY item; > item|count > ----+----- > 1|31262 > 2| 110 > 3| 3 > 4| 1 > (4 rows) > > emp=> SELECT item, count (*) AS count FROM su GROUP > BY item; > item|count > ----+----- > 1|94108 > 2| 1697 > 3| 773 > 4| 482 > 5| 237 > 6| 146 > 7| 105 > 8| 68 > 9| 41 > 10| 29 > 11| 22 > 12| 15 > 13| 13 > 14| 8 > 15| 7 > 16| 6 > 17| 5 > 18| 4 > 19| 4 > 20| 4 > 21| 4 > 22| 3 > 23| 3 > 24| 3 > 25| 1 > 26| 1 > 27| 1 > 28| 1 > 29| 1 > 30| 1 > 31| 1 > > As a default rule, I used to create the btree > indices for all integer > types, regardless of their values. Not anymore. It > took me quite a > while to figure that the following query did not > work because of the > faulty btree index on "item" (other joined > attributes are char()): > > SELECT km.km, su.su > FROM km, su > WHERE km.id = su.id > AND km.rel = su.rel > AND km.item = su.item > AND su ~ '^ethanol'; > > Can the btree or any other AM be smart enough and > bail out from CREATE > INDEX saying, "your data isn't worth indexing"? > > --Gene ===== __________________________________________________ Do You Yahoo!? Bid and sell for free at http://auctions.yahoo.com
В списке pgsql-general по дате отправления: