Thread: Planner creates a slow plan without using an available index
Hi All,

I've got a weird problem with the planner which causes my queries to take
ages... I'll try to explain it briefly. :) I have the following table
(which has 1.2 million rows):

         Table "public.items"
   Column   |     Type     | Modifiers
------------+--------------+-----------
 items_id   | text         | not null
 price      | numeric(8,2) | not null
 left       | integer      |
 right      | integer      |
Indexes:
    "items_items_id_key" UNIQUE, btree (items_id)
    "items_left" btree (left)
    "items_left_right" btree (left, right)

From that table I created the next table, in order to avoid having to
"ORDER BY price" in the queries:

bh.com=# CREATE TABLE items_price AS SELECT * FROM items ORDER BY price;

After creating that table I created indexes exactly matching those of the
items table (the source table). Later I ran "VACUUM FULL ANALYZE" on both
tables. Now here the weird stuff starts....

bh.com=# EXPLAIN ANALYZE SELECT * FROM items
bh.com-# WHERE left >= (SELECT left FROM category WHERE category_id=821)
bh.com-# AND right <= (SELECT right FROM category WHERE category_id=821)
bh.com-# OFFSET 24 LIMIT 13;

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Limit  (cost=58.27..86.55 rows=13 width=619) (actual time=0.811..130.993 rows=9 loops=1)
   InitPlan
     ->  Index Scan using category_pkey on category  (cost=0.00..3.03 rows=1 width=4) (actual time=0.118..0.124 rows=1 loops=1)
           Index Cond: (category_id = 821)
     ->  Index Scan using category_pkey on category  (cost=0.00..3.03 rows=1 width=4) (actual time=0.012..0.016 rows=1 loops=1)
           Index Cond: (category_id = 821)
   ->  Index Scan using items_left_right on items  (cost=0.00..294897.72 rows=135553 width=619) (actual time=0.314..130.815 rows=33 loops=1)
         Index Cond: ((left >= $0) AND (right <= $1))
 Total runtime: 131.140 ms
(9 rows)

bh.com=# ANALYZE items;
ANALYZE
bh.com=# EXPLAIN ANALYZE SELECT * FROM items
bh.com-# WHERE left >= (SELECT left FROM category WHERE category_id=821)
bh.com-# AND right <= (SELECT right FROM category WHERE category_id=821)
bh.com-# OFFSET 24 LIMIT 13;

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Limit  (cost=57.11..84.77 rows=13 width=626) (actual time=45.512..145316.423 rows=9 loops=1)
   InitPlan
     ->  Index Scan using category_pkey on category  (cost=0.00..3.03 rows=1 width=4) (actual time=0.185..0.191 rows=1 loops=1)
           Index Cond: (category_id = 821)
     ->  Index Scan using category_pkey on category  (cost=0.00..3.03 rows=1 width=4) (actual time=0.026..0.032 rows=1 loops=1)
           Index Cond: (category_id = 821)
   ->  Index Scan using items_left on items  (cost=0.00..293408.52 rows=137924 width=626) (actual time=45.008..145316.246 rows=33 loops=1)
         Index Cond: (left >= $0)
         Filter: (right <= $1)
 Total runtime: 145316.590 ms
(10 rows)

The "ANALYZE items" actually made the planner stop using the index, and
the query became a lot slower! After running VACUUM ANALYZE on the items
table I get good results back again. Now, I do know the difference
between the two actions (VACUUM ANALYZE vs. ANALYZE), but what bugs me is
that when I do the exact same operations on items_price (which is exactly
the same table with the same indexes, just ordered differently) I get a
slow result no matter what I do! I tried to play with "ALTER TABLE
items_price ALTER right SET STATISTICS ;" (and also on left) with
different values, up to even 1000, but that didn't help a bit (I did run
VACUUM ANALYZE after each change).

I'm quite clueless and also quite in a hurry to finish this project, so
any help or a piece of a clue will be welcomed gladly!

Thanks a lot in advance (even only for reading what I wrote :P),
Ben-Nes Yonatan
Canaan Surfing Ltd.
http://www.canaan.net.il
Ben-Nes Yonatan <da@canaan.co.il> writes:
> Indexes:
>     "items_items_id_key" UNIQUE, btree (items_id)
>     "items_left" btree (left)
>     "items_left_right" btree (left, right)

You could get rid of the items_left index --- it's redundant with the
first column of the combined index anyway.

> bh.com=# EXPLAIN ANALYZE SELECT * FROM items WHERE left>=(SELECT left
> FROM category WHERE category_id=821) AND right<=(SELECT right FROM
> category WHERE category_id=821) OFFSET 24 LIMIT 13;

Doing OFFSET/LIMIT without an ORDER BY is just asking for trouble.
If you were to specify "ORDER BY left, right" that would probably
convince the planner to use the index you want.

However ... this query is basically going to suck with any btree index,
because btree can't usefully do range checks on two separate variables.
There's an exactly similar problem being discussed over in pgsql-novice:
http://archives.postgresql.org/pgsql-novice/2005-08/msg00243.php

			regards, tom lane
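[Editor's note: applied to the original query, Tom's suggestion would look
roughly like the sketch below. This is untested against the poster's
schema; note that `left` and `right` are reserved words in SQL and are
double-quoted here, whereas the original post shows them unquoted.]

```sql
-- Adding an ORDER BY that matches the (left, right) index makes the
-- OFFSET/LIMIT deterministic and nudges the planner toward that index.
EXPLAIN ANALYZE
SELECT *
FROM items
WHERE "left"  >= (SELECT "left"  FROM category WHERE category_id = 821)
  AND "right" <= (SELECT "right" FROM category WHERE category_id = 821)
ORDER BY "left", "right"
OFFSET 24 LIMIT 13;
```

Without the ORDER BY, which 13 rows the OFFSET/LIMIT returns is
unspecified and can change from plan to plan.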
Tom Lane wrote:
> You could get rid of the items_left index --- it's redundant with the
> first column of the combined index anyway.
>
> Doing OFFSET/LIMIT without an ORDER BY is just asking for trouble.
> If you were to specify "ORDER BY left, right" that would probably
> convince the planner to use the index you want.
>
> However ... this query is basically going to suck with any btree index,
> because btree can't usefully do range checks on two separate variables.
> There's an exactly similar problem being discussed over in pgsql-novice:
> http://archives.postgresql.org/pgsql-novice/2005-08/msg00243.php

First of all, thanks -- I did succeed in using the index that way and now
get responses in under 80 ms, but if further improvement is possible I
would like to pursue it.

If a btree index is not suitable for this query, then which index is? As
far as I understand, the rtree index doesn't support range checks, and
the hash index is recommended against by almost everyone (including the
manual), so the only one left is GiST. Is that the most suitable index
for this query? If so, can you give me a link to where I can learn how to
use such an index efficiently? (By the way, the only link that worked in
the PostgreSQL manual's "Chapter 48. GiST Indexes" is the one to "the
University of California at Berkeley's GiST Indexing Project web site";
the other two links lead to 404 pages, and I guess they should be
removed.)

Thanks a lot,
Ben-Nes Yonatan
On Tue, Aug 30, 2005 at 11:25:26AM +0200, Ben-Nes Yonatan wrote:
> If btree index is not suitable for this query then which index is? as
> far as I understand the rtree index doesnt support range checks and the
> hash index is not recommended by almost everyone (including the manual)
> so the only one left is the gist, is that the most suitable index for
> this query?

Basically, no index is really set up for the query as you wrote it.
Indexes generally improve performance by taking advantage of order. You
have two constants (the two subqueries) and two variables (left and
right), and you are applying a range test (not equality) to each one.
There is no way to order an index that would give the answer you want.

rtree works on multidimensional (geometric) data. It can do range tests
(is object A to the left of object B), but it's only applicable if your
conditions can be interpreted that way.

GiST is for creating custom index types, hardly likely to be useful in
your case.

I can't tell from your query (and PostgreSQL certainly can't), but are
there other constraints, such as left <= right or something similar for
the constants? If so, maybe adding that will help PostgreSQL optimise
your query. Maybe indexing on (left+right) might work for you, but it all
depends on the structure of your data.
If you want more help, you're going to need to provide information on
your database, including what left, right and items actually are.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
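[Editor's note: if the data follows the usual nested-set invariant
(parent.left <= child.left <= child.right <= parent.right) -- which the
thread does not confirm, so treat it as an assumption -- Martijn's idea of
adding an implied constraint can be sketched like this. Since every
qualifying row has left <= right, the condition right <= R implies
left <= R, which puts both bounds on the leading btree column.]

```sql
-- Both bounds now land on "left", the first column of items_left_right,
-- turning the open-ended index scan into a bounded range scan.
SELECT *
FROM items
WHERE "left"  >= (SELECT "left"  FROM category WHERE category_id = 821)
  AND "left"  <= (SELECT "right" FROM category WHERE category_id = 821)
  AND "right" <= (SELECT "right" FROM category WHERE category_id = 821)
ORDER BY "left", "right"
OFFSET 24 LIMIT 13;
```

The remaining "right" <= ... test is applied as a filter, but only within
the narrow slice of the index between the two "left" bounds.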
On Tue, Aug 30, 2005 at 11:25:26 +0200, Ben-Nes Yonatan <da@canaan.co.il> wrote:
> If btree index is not suitable for this query then which index is? as
> far as I understand the rtree index doesnt support range checks and the
> hash index is not recommended by almost everyone (including the manual)
> so the only one left is the gist, is that the most suitable index for
> this query?

rtree indexes allow you to quickly check for containment. Range checking
is one-dimensional containment.
Martijn van Oosterhout <kleptog@svana.org> writes:
> rtree works on multidimensional (geometric) data. It can do range tests
> (is object A to the left of object B) but it's only applicable if your
> conditions can be interpreted that way.

> GiST is for creating custom index types, hardly likely to be useful
> in your case.

Actually either rtree or GiST should be able to do something useful with
this, since it's basically a 1-D overlap query. The main problem with
GiST is to find a suitable opclass, since there aren't any in the core
system. Possibly contrib/seg could be used.

			regards, tom lane
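[Editor's note: Tom's contrib/seg pointer could look roughly like the
following. This is a hypothetical sketch, not tested against 8.0: the
column name `span`, building seg values from the text form `'a .. b'`,
and the `<@` ("contained in") operator spelling are all assumptions
(older seg releases spelled containment differently), and contrib/seg
must first be installed from its SQL script.]

```sql
-- Store each item's [left, right] interval as one seg value and index it
-- with GiST, so the two range checks collapse into a single indexable
-- containment test.
ALTER TABLE items ADD COLUMN span seg;
UPDATE items SET span = ("left" || ' .. ' || "right")::seg;
CREATE INDEX items_span ON items USING gist (span);

-- "Item interval contained in the category's interval":
SELECT i.*
FROM items i, category c
WHERE c.category_id = 821
  AND i.span <@ (c."left" || ' .. ' || c."right")::seg;
```

One caveat: seg stores float4 endpoints, which represent integers exactly
only up to 2^24, so this only works if the left/right values stay below
about 16 million.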
Tom Lane wrote:
> Actually either rtree or GiST should be able to do something useful with
> this, since it's basically a 1-D overlap query. The main problem with
> GiST is to find a suitable opclass, since there aren't any in the core
> system. Possibly contrib/seg could be used.

OK, first of all thanks guys, as always, for your help; I will try to use
rtree to improve my query (hopefully I'll be able to come back and say
that it worked :)).

I've got another question which is not connected, and it's probably just
me being paranoid late at night, but still... :) Chapter "21.1.3.
Preventing transaction ID wraparound failures" of the PostgreSQL manual
says the following:

"Prior to PostgreSQL 7.2, the only defense against XID wraparound was to
re-initdb at least every 4 billion transactions. This of course was not
very satisfactory for high-traffic sites, so a better solution has been
devised. The new approach allows a server to remain up indefinitely,
without initdb or any sort of restart. The price is this maintenance
requirement: every table in the database must be vacuumed at least once
every billion transactions."

My PostgreSQL version is 8.0.1 (I should have mentioned that at the
start, no? :)).

Now, again, I'm probably just paranoid, but when I start a transaction,
make more than 4 billion different queries in it
(select, insert, update, truncate...) and then close it, it's counted as
only one transaction, right? (Should I duck to avoid the manual? ;))

As always, thanks a lot!
Ben-Nes Yonatan
On Wed, Aug 31, 2005 at 01:27:30AM +0200, Ben-Nes Yonatan wrote:
> Now again, I'm probably just paranoid, but when I start a transaction,
> make more than 4 billion different queries in it
> (select, insert, update, truncate...) and then close it, it's counted as
> only one transaction, right? (Should I duck to avoid the manual? ;))

Yes, one transaction. You cannot do more than 4 billion commands -- the
limit is 2^32 anyway.

-- 
Alvaro Herrera <alvherre[]alvh.no-ip.org>
Architect, www.EnterpriseDB.com
"Los románticos son seres que mueren de deseos de vida"
On Wed, Aug 31, 2005 at 01:27:30 +0200, Ben-Nes Yonatan <da@canaan.co.il> wrote:
> Now again, I'm probably just paranoid, but when I start a transaction,
> make more than 4 billion different queries in it
> (select, insert, update, truncate...) and then close it, it's counted as
> only one transaction, right? (Should I duck to avoid the manual? ;))

I believe there is a limit on the number of queries in a transaction of
2 or 4 billion (though this may be just in functions).

Ignoring subtransactions, all these queries count as just one
transaction. I am not sure how subtransactions are counted.
On Tue, Aug 30, 2005 at 10:39:57PM -0500, Bruno Wolff III wrote:
> I believe there is a limit on the number of queries in a transaction of
> 2 or 4 billion (though this may be just in functions).
>
> Ignoring subtransactions, all these queries count as just one
> transaction. I am not sure how subtransactions are counted.

If the subtransaction writes at least a tuple, it counts as another
transaction. Else it doesn't count.

-- 
Alvaro Herrera <alvherre[]alvh.no-ip.org>
Architect, www.EnterpriseDB.com
"I'm always right, but sometimes I'm more right than other times."
                                                  (Linus Torvalds)
Alvaro Herrera wrote:
> If the subtransaction writes at least a tuple, it counts as another
> transaction. Else it doesn't count.

Oh crap, I fear that now I'm in serious trouble....

Where can I read about this limitation? And besides that, what if I count
the number of queries and every 900,000 or so create a subtransaction and
continue my process with it -- will that work, or am I just trying to be
a smart ass with the db?

As always, thanks a lot,
Ben-Nes Yonatan
On Wed, Aug 31, 2005 at 09:19:05AM +0200, Ben-Nes Yonatan wrote:
> Where can I read about this limitation? And besides that, what if I
> count the number of queries and every 900,000 or so create a
> subtransaction and continue my process with it -- will that work, or am
> I just trying to be a smart ass with the db?

Um, 1 billion transactions is 1 thousand million. So 900,000
inserts/updates are not even one tenth of one percent of the limit for
one transaction.

Are you really approaching a billion inserts/updates per transaction?
That's a lot of disk space being used...

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
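[Editor's note: the arithmetic behind Martijn's reassurance, as a quick
sanity check.]

```python
# The per-transaction command counter and the XID space are both 32-bit
# (Alvaro's "the limit is 2^32 anyway"), and the manual's vacuum horizon
# is one billion transactions. 900,000 commands per batch is a tiny
# fraction of either figure.
command_limit = 2 ** 32          # 32-bit command counter
vacuum_horizon = 1_000_000_000   # "vacuumed at least once every billion transactions"
batch = 900_000

fraction = batch / vacuum_horizon
print(f"{fraction:.4%}")         # well under "one tenth of one percent"

assert command_limit == 4_294_967_296
assert fraction < 0.001          # i.e. less than 0.1%
```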
Martijn van Oosterhout wrote:
> Um, 1 billion transactions is 1 thousand million. So 900,000
> inserts/updates are not even one tenth of one percent of the limit for
> one transaction.
>
> Are you really approaching a billion inserts/updates per transaction?
> That's a lot of disk space being used...

No, apparently I just lack decent sleep.... I think I'll stop asking you
guys questions before you decide to get your clubs out... :P In other
words, I was mistaken and thought about a million, not a billion. :)

With hopes that this is the end of my bugging :)

Thanks a lot,
Ben-Nes Yonatan