Discussion: Slow query: bitmap scan troubles
Hi guys (and girls)

I've been banging my head over this for a few days now, so if any of you kind souls could take a minute to look at this I would be eternally grateful.

I have a pretty straightforward query that is very slow by default, and about 70 times faster when I set enable_bitmapscan=off. I would like to convince the planner to use my lovely indexes.

The scenario is this: I have two tables, trade and position_effect. A trade is a deal we do with somebody to exchange something for something else. It has a time it was done, and is associated with a particular book for accounting purposes. A position effect records changes to our position (e.g. how much we have) of a particular asset. One trade can have many position effects (usually only 1, 2 or 3). For example, if I do a trade of USD/GBP I get two position effects, +1000 GBP and -1200 USD.

SCHEMA:
-------

The actual schema is a bit more complicated but I will put the important parts here (if you think it important, the full schema for the two tables is here: http://pastebin.com/6Y52aDFL):

CREATE TABLE trade
(
  id bigserial NOT NULL,
  time_executed timestamp with time zone NOT NULL,
  id_book integer NOT NULL,
  CONSTRAINT cons_trade_primary_key PRIMARY KEY (id)
);

CREATE INDEX idx_trade_id_book
  ON trade
  USING btree (id_book, time_executed, id);

CREATE TABLE position_effect
(
  id bigserial NOT NULL,
  id_trade bigint NOT NULL,
  id_asset integer NOT NULL,
  quantity double precision NOT NULL,
  CONSTRAINT cons_pe_primary_key PRIMARY KEY (id_trade, id_asset)
);

SETUP:
------

These tables are relatively large (~100 million rows in position_effect). The box is a pretty beastly affair with 512Mb of ram and 4x10 2GHz cores. The postgres configuration is here: http://pastebin.com/48uyiak7

I am using a 64-bit PostgreSQL 9.2.1, hand-compiled on a RedHat 6.2 box.

QUERY:
------

What I want to do is sum all of the position effects for a particular asset, joining to the trade table to filter on the time it was executed and the book it was traded into:

SELECT sum(position_effect.quantity)
FROM trade, position_effect
WHERE trade.id = position_effect.id_trade
  AND position_effect.id_asset = 1837
  AND trade.time_executed >= '2012-10-28 00:00:00'
  AND trade.id_book = 41

In this case there are only 11 rows that need to be summed.

If I just let postgres do its thing, that query takes 5000ms (which, when multiplied over many books and assets, gets very slow). I think this is because it is bitmapping the whole position_effect table, which is very large.

If I disable bitmap scans:

set enable_bitmapscan = off;

the query takes 43ms, and properly uses the indexes I have set up.

Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
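For anyone who wants to reproduce the comparison, here is a minimal sketch of running both plans in a single psql session. The schema and parameter values are the ones above; the posted plans appear to include buffer counts, so EXPLAIN (ANALYZE, BUFFERS) is assumed here, and the SET is session-local so it does not affect other backends:

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(position_effect.quantity)
FROM trade, position_effect
WHERE trade.id = position_effect.id_trade
  AND position_effect.id_asset = 1837
  AND trade.time_executed >= '2012-10-28 00:00:00'
  AND trade.id_book = 41;

SET enable_bitmapscan = off;  -- session-local override

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(position_effect.quantity)
FROM trade, position_effect
WHERE trade.id = position_effect.id_trade
  AND position_effect.id_asset = 1837
  AND trade.time_executed >= '2012-10-28 00:00:00'
  AND trade.id_book = 41;

RESET enable_bitmapscan;      -- back to the default planner behaviour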
Bad form to reply to yourself, I know, but just re-reading that for the third time I noticed two mistakes:

- The box has 128Gb of ram, not 512Mb
- There is an additional constraint on the position_effect table (though I don't think it matters for this discussion):
  CONSTRAINT cons_pe_trade FOREIGN KEY (id_trade) REFERENCES trade (id)

Sorry to clog your inboxes further!

Regards,

Philip
On Tue, Dec 4, 2012 at 12:06 PM, <postgresql@foo.me.uk> wrote:
> Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
> Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG

If you check the "fast" plan, it has a higher cost compared against the "slow" plan. The difference between cost estimation and actual cost of your queries, under relatively precise row estimates, seems to suggest your e_c_s or r_p_c aren't a reflection of your hardware's performance.

First, make sure caching isn't interfering with your results. Run each query several times. Then, if the difference persists, you may have to tweak effective_cache_size first, maybe random_page_cost too, to better match your I/O subsystem's actual performance.
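A minimal sketch of trying those two settings without touching postgresql.conf, in one session only (the values below are purely illustrative, not recommendations; the actual numbers used come up later in the thread):

SET effective_cache_size = '88GB';   -- illustrative; size it to the RAM actually available for caching
SET random_page_cost = 1.5;          -- illustrative; lower it only if random I/O really is cheap
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the example query here and compare plans
RESET effective_cache_size;
RESET random_page_cost;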
On Tue, Dec 4, 2012 at 7:27 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Dec 4, 2012 at 12:06 PM, <postgresql@foo.me.uk> wrote:
>> Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
>> Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
>
> If you check the "fast" plan, it has a higher cost compared against
> the "slow" plan.
>
> The difference between cost estimation and actual cost of your
> queries, under relatively precise row estimates, seems to suggest your
> e_c_s or r_p_c aren't a reflection of your hardware's performance.

But the row estimates are not precise at the top of the join/filter. It thinks there will be 2120 rows, but there are only 11. So it seems like there is a negative correlation between the two tables which is not recognized.

> First, make sure caching isn't interfering with your results. Run each
> query several times.

If that is not how the production system works (running the same query over and over) then you want to model the cold cache, not the hot one. But in any case, the posted explains indicate that all buffers were cached.

Cheers,

Jeff
On Tue, Dec 4, 2012 at 2:22 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Dec 4, 2012 at 7:27 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Tue, Dec 4, 2012 at 12:06 PM, <postgresql@foo.me.uk> wrote:
>>> Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
>>> Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
>>
>> If you check the "fast" plan, it has a higher cost compared against
>> the "slow" plan.
>>
>> The difference between cost estimation and actual cost of your
>> queries, under relatively precise row estimates, seems to suggest your
>> e_c_s or r_p_c aren't a reflection of your hardware's performance.
>
> But the row estimates are not precise at the top of the join/filter.
> It thinks there will be 2120 rows, but there are only 11.

Ah... I didn't spot that one...
>> But the row estimates are not precise at the top of the join/filter.
>> It thinks there will be 2120 rows, but there are only 11.

> Ah... I didn't spot that one...

Yes, you are right there - this is probably a slightly atypical query of this sort actually; 2120 is a pretty good guess.

On Claudio's suggestion I have found lots more things to read up on and am eagerly awaiting 6pm when I can bring the DB down and start tweaking. The effective_work_mem setting is going from 6Gb->88Gb, which I think will make quite a difference.

I still can't quite wrap my head around why accessing an index is expected to use more disk access than doing a bitmap scan of the table itself, but I guess it does make a bit of sense if postgres assumes the table is more likely to be cached.

It's all quite, quite fascinating :)

I'll let you know how it goes.

- Phil
> But the row estimates are not precise at the top of the join/filter.
> It thinks there will be 2120 rows, but there are only 11.
> So it seems like there is a negative correlation between the two tables which is not recognized.

Yes, you are right there. I am only just beginning to understand how to parse these explain reports. As I mentioned above, I probably picked a bad example to run that query on; 11 is an unusually low number of results to get back, a few thousand would be more normal. Though that doesn't account for the 70x difference between the speed of the two queries in actuality, given a pretty similar expected speed (does it?). It does go some way to explaining why a bad choice of plan was made.

Is there some nice bit of literature somewhere that explains what sort of costs are associated with the different types of lookup? I have found bits and bobs online but I still don't have a really clear idea in my head of what the difference between a bitmap index scan and an index-only scan is, though I can sort of guess. I don't see why one would be considered more likely to use the disk than the other on the 'slow' query (with the better predicted cost).

>> First, make sure caching isn't interfering with your results. Run each
>> query several times.

> If that is not how the production system works (running the same query over and over) then you want to model the cold cache, not the hot one.
> But in any case, the posted explains indicate that all buffers were cached.

We are in the rather pleasant situation here in that we are willing to spend money on the box (up to a point, but quite a large point) to get it up to the spec so that it should hardly ever need to touch the disk; the trick is figuring out how to let our favourite database server know that. I've just discovered pgtune and am having some fun with that too.

Cheers,

Phil
On Tue, Dec 4, 2012 at 3:03 PM, <postgresql@foo.me.uk> wrote:
>
> Though that doesn't account for the 70x difference between the speed of the
> two queries in actuality given a pretty similar expected speed (does it?).
> It does go some way to explaining why a bad choice of plan was made.

I still don't think it does. I still think the problem is the GUC settings.

The slow plan joins in a way that processes all 3M rows in both sides of the join, and pg knows it. The fast plan only processes 5k of them, and pg knows it. Why is it choosing to process 3M rows?

If there's negative correlation, it only means fewer rows will be produced, but the nested loop and the right-hand index scan still happen.
On Tue, Dec 4, 2012 at 3:31 PM, Philip Scott <pscott@foo.me.uk> wrote:
> r_p_c 2-> 1 (s_p_c 1->0.5):

Is this really necessary? (looks like a no-op, unless your CPU is slow)
Ah, okay - my reasoning was there's a big fancy-pants raid array behind it that makes disk operations faster relative to CPU ones. I'll test it and see if it actually makes any difference.

-----Original Message-----
From: Claudio Freire [mailto:klaussfreire@gmail.com]
Sent: 04 December 2012 18:33
To: Philip Scott
Cc: postgresql@foo.me.uk; postgres performance list
Subject: Re: [PERFORM] Slow query: bitmap scan troubles

On Tue, Dec 4, 2012 at 3:31 PM, Philip Scott <pscott@foo.me.uk> wrote:
> r_p_c 2-> 1 (s_p_c 1->0.5):

Is this really necessary? (looks like a no-op, unless your CPU is slow)
Ah okay, thanks. I knew I could set various things but not effective_work_mem (I tried reloading the edited config file but it didn't seem to pick it up).
From: Vitalii Tymchyshyn [mailto:tivv00@gmail.com]
Sent: 04 December 2012 18:51
To: postgresql@foo.me.uk
Cc: postgres performance list
Subject: Re: [PERFORM] Slow query: bitmap scan troubles
Well, you don't need to put anything down. Most settings that change planner decisions can be tuned on a per-query basis by issuing SET commands in the given session. This should not affect other queries more than is needed to run the query in the way the planner chooses.
Best regards, Vitalii Tymchyshyn
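A minimal sketch of that per-query approach, using SET LOCAL inside a transaction so the override disappears at commit (the settings and values here are illustrative):

BEGIN;
SET LOCAL enable_bitmapscan = off;   -- applies only to this transaction
SET LOCAL random_page_cost = 1.0;    -- illustrative value, not a recommendation
-- run the query, or EXPLAIN (ANALYZE, BUFFERS) it, here
COMMIT;                              -- the overrides are discarded at commit/rollback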
postgresql@foo.me.uk wrote:
> Ah okay, thanks. I knew I could set various things but not
> effective_work_mem (I tried reloading the edited config file but
> it didn't seem to pick it up)

Check the server log, maybe there was a typo or capitalization error. To test on a single connection you should be able to just run:

SET effective_cache_size = '88GB';

By the way, one other setting that I have found a need to adjust to get good plans is cpu_tuple_cost. In my experience, better plans are chosen when this is in the 0.03 to 0.05 range than with the default of 0.01.

-Kevin
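A sketch of one way to check whether a reload actually picked the setting up (these are standard admin functions and catalog views; the sourcefile column is only visible to superusers):

-- After editing postgresql.conf, reload without a restart:
SELECT pg_reload_conf();

-- Then verify what the server actually sees, and where the value came from:
SHOW effective_cache_size;
SELECT name, setting, source, sourcefile
FROM pg_settings
WHERE name IN ('effective_cache_size', 'random_page_cost', 'cpu_tuple_cost');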
On Tue, Dec 4, 2012 at 9:47 AM, <postgresql@foo.me.uk> wrote:
> eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
> effective_work_mem setting is going from 6Gb->88Gb which I think will make
> quite a difference.

I also wonder if increasing default_statistics_target (say, x10), or just doing ALTER TABLE ... SET STATISTICS for particular columns, will help. It will make the planner produce more precise estimates. Do not forget to ANALYZE after changing it.

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp
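A hedged sketch of what those suggestions could look like in SQL; the choice of columns here is just a guess based on the filters in the example query:

-- Raise the per-column statistics target and re-analyze:
ALTER TABLE position_effect ALTER COLUMN id_asset SET STATISTICS 1000;
ALTER TABLE trade ALTER COLUMN id_book SET STATISTICS 1000;
ANALYZE position_effect;
ANALYZE trade;

-- Or raise the default for everything (in postgresql.conf, or per session,
-- where it affects subsequent ANALYZE runs in that session):
SET default_statistics_target = 1000;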
On Tue, Dec 4, 2012 at 9:47 AM, <postgresql@foo.me.uk> wrote:
>
>>> But the row estimates are not precise at the top of the join/filter.
>>> It thinks there will be 2120 rows, but there are only 11.
>
>> Ah... I didn't spot that one...
>
> Yes, you are right there - this is probably a slightly atypical query of
> this sort actually; 2120 is a pretty good guess.

What do the timings look like on a more realistic example?

> On Claudio's suggestion I have found lots more things to read up on and am
> eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
> effective_work_mem setting is going from 6Gb->88Gb which I think will make
> quite a difference.

You can change effective_cache_size just in your own session, or do it globally with a "reload" or SIGHUP, no need to bring down the server. However, I don't think it will make much difference. Even though it thinks it is hitting the index 14,085 times, that is still small compared to the overall size of the table.

> I still can't quite wrap my head around why accessing an index is expected
> to use more disk access than doing a bitmap scan of the table itself,

It is only doing a bitmap scan of those parts of the table which contain relevant data, and it is doing them in physical order, so it thinks that much of the IO which it thinks it is going to do is largely sequential.

> but I
> guess it does make a bit of sense if postgres assumes the table is more
> likely to be cached.

Unfortunately, postgres's planner doesn't know anything about that. From your "explain" I can see in hindsight that everything you needed was cached, but that is not information that the planner can use (currently). And I don't know if *everything* is cached, or if just those particular blocks are, because you already ran the same query with the same parameters recently.

Also, your work_mem is pretty low given the amount of RAM you have:

work_mem = 1MB

I don't think the current planner attempts to take account of the fact that a bitmap scan which overflows work_mem and so becomes "lossy" is quite a performance set-back. Nor does it look like explain analyze informs you of this happening. But maybe I'm just looking in the wrong places.

Cheers,

Jeff
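A minimal sketch of testing the work_mem theory for a single session (the value is only an example; on 9.2, as noted above, EXPLAIN ANALYZE does not report lossy bitmap pages, so the comparison has to be indirect, via timing and plan choice):

SET work_mem = '256MB';   -- example session-level value, not a global recommendation
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the example query here
RESET work_mem;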
On Tue, Dec 4, 2012 at 10:03 AM, <postgresql@foo.me.uk> wrote:
>
> Though that doesn't account for the 70x difference between the speed of the
> two queries in actuality given a pretty similar expected speed (does it?).

It kind of does. The expected speed is predicated on the number of rows being 200 fold higher. If the number of rows actually was that much higher, the two speeds might be closer together. That is why it would be interesting to see a more typical case where the actual number of rows is closer to the 2000 estimate.

But I am curious about how the cost estimate for the primary key look up is arrived at:

Index Scan using cons_pe_primary_key on position_effect (cost=0.00..42.96 rows=1 width=16)

There should be a random page for the index leaf page, and a random page for the heap page. Since you set random_page_cost to 2, that comes up to 4. Then there would be some almost negligible CPU costs. Where the heck is the extra 38 cost coming from?

> It does go some way to explaining why a bad choice of plan was made.
>
> Is there some nice bit of literature somewhere that explains what sort of
> costs are associated with the different types of lookup?

I've heard good things about Greg Smith's book, but I don't know if it covers this particular thing. Otherwise, I don't know of a good single place which is a tutorial rather than a reference (or the code itself).

>>> First, make sure caching isn't interfering with your results. Run each
>>> query several times.

>> If that is not how the production system works (running the same query
>> over and over) then you want to model the cold cache, not the hot one.

>> But in any case, the posted explains indicate that all buffers were
>> cached.

> We are in the rather pleasant situation here in that we are willing to spend
> money on the box (up to a point, but quite a large point) to get it up to
> the spec so that it should hardly ever need to touch the disk, the trick is
> figuring out how to let our favourite database server know that.

Well, that part is fairly easy. Make random_page_cost and seq_page_cost much smaller than their defaults. Like, 0.04 and 0.03, for example.

I think the *_page_cost should be strictly an estimate of actually doing IO, with a separate parameter to reflect the likelihood of needing to do the IO, like *_page_cachedness. But that isn't the way it is done currently.

Cheers,

Jeff
On Tue, Dec 4, 2012 at 3:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote (regarding http://explain.depesz.com/s/4MWG):
>
> But I am curious about how the cost estimate for the primary key look
> up is arrived at:
>
> Index Scan using cons_pe_primary_key on position_effect
> (cost=0.00..42.96 rows=1 width=16)
>
> There should be a random page for the index leaf page, and a random
> page for the heap page. Since you set random_page_cost to 2, that
> comes up to 4. Then there would be some almost negligible CPU costs.
> Where the heck is the extra 38 cost coming from?

I now see where the cost is coming from. In commit 21a39de5809 (first appearing in 9.2) the "fudge factor" cost estimate for large indexes was increased by about 10 fold, which really hits this index hard.

This was fixed in commit bf01e34b556 "Tweak genericcostestimate's fudge factor for index size", by changing it to use the log of the index size. But that commit probably won't be shipped until 9.3.

I'm not sure that this change would fix your problem, because it might also change the costs of the alternative plans in a way that neutralizes things. But I suspect it would fix it. Of course, a correct estimate of the join size would also fix it--you have kind of a perfect storm here.

Cheers,

Jeff
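Since the fudge factor scales with index size, it may be worth checking how large the indexes involved actually are. A sketch using the standard size functions, with the index names taken from the schema in the original post:

SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS on_disk_size
FROM pg_class
WHERE relname IN ('cons_pe_primary_key', 'cons_trade_primary_key', 'idx_trade_id_book');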
On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I'm not sure that this change would fix your problem, because it might
> also change the costs of the alternative plans in a way that
> neutralizes things. But I suspect it would fix it. Of course, a
> correct estimate of the join size would also fix it--you have kind of
> a perfect storm here.

As far as I can see on the explain, the misestimation is 3x~4x not 200x.
Jeff Janes <jeff.janes@gmail.com> writes:
> I now see where the cost is coming from. In commit 21a39de5809 (first
> appearing in 9.2) the "fudge factor" cost estimate for large indexes
> was increased by about 10 fold, which really hits this index hard.
> This was fixed in commit bf01e34b556 "Tweak genericcostestimate's
> fudge factor for index size", by changing it to use the log of the
> index size. But that commit probably won't be shipped until 9.3.

Hm. To tell you the truth, in October I'd completely forgotten about the January patch, and was thinking that the 1/10000 cost had a lot of history behind it. But if we never shipped it before 9.2 then of course that idea is false.

Perhaps we should backpatch the log curve into 9.2 --- that would reduce the amount of differential between what 9.2 does and what previous branches do for large indexes. It would definitely be interesting to know if applying bf01e34b556 helps the OP's example.

			regards, tom lane
That is very interesting indeed, these indexes are quite large!

I will apply that patch and try it out this evening and let you know.

Thank you very much everyone for your time, the support has been amazing.

PS: Just looked at this thread on the archives page and realised I don't have my name in the FROM: field, which is a misconfiguration of my email client, but figured I would leave it to prevent confusion, sorry about that.

All the best,

Philip Scott
> I also wonder if increasing default_statistics_target (say, x10), or just doing ALTER TABLE ... SET STATISTICS for particular columns, will help.
> It will make the planner produce more precise estimates. Do not forget to ANALYZE after changing it.

Thanks Sergey, I will try this too. I think the bother here is that the statistics are pretty good (we do analyse regularly and default_statistics_target is already 1000), but once I start filtering the two tables the correlations alter quite a bit. I don't think there is that much that can be done about that :)

- Phil
Hi Jeff

> It kind of does. The expected speed is predicated on the number of rows being 200 fold higher. If the number of rows actually was that much higher, the two speeds might be closer together. That is why it would be interesting to see a more typical case where the actual number of rows is closer to the 2000 estimate.

Ah, I see of course. Makes a lot of sense when you think about it. This has been quite an enlightening adventure into the guts of postgres for me :)

> But I am curious about how the cost estimate for the primary key look up is arrived at:

(Dealt with in your next reply, thanks for figuring that out! I will certainly try the patch.)

> I've heard good things about Greg Smith's book, but I don't know if it covers this particular thing.

A copy is on its way, thank you.

>> We are in the rather pleasant situation here in that we are willing to
>> spend money on the box (up to a point, but quite a large point) to get
>> it up to the spec so that it should hardly ever need to touch the
>> disk, the trick is figuring out how to let our favourite database server know that.

> Well, that part is fairly easy. Make random_page_cost and seq_page_cost much smaller than their defaults. Like, 0.04 and 0.03, for example.

Yes, I have been playing a lot with that and it makes a lot of difference. When I tweak them down I end up getting a lot of nested loops instead of hash or merge joins and they are much faster (presumably we might have gotten a nested loop out of the planner if it could correctly estimate the low number of rows returned).

I've got plenty of ammunition now to dig deeper, you guys have been invaluable.

Cheers,

Phil
On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> I'm not sure that this change would fix your problem, because it might
>> also change the costs of the alternative plans in a way that
>> neutralizes things. But I suspect it would fix it. Of course, a
>> correct estimate of the join size would also fix it--you have kind of
>> a perfect storm here.
>
> As far as I can see on the explain, the misestimation is 3x~4x not 200x.

It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index Only Scan using idx_trade_id_book on trade".

But for the join of both tables it is estimate 2120 vs actual 11.

Cheers,

Jeff
On Thu, Dec 6, 2012 at 2:27 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Wed, Dec 5, 2012 at 2:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> I'm not sure that this change would fix your problem, because it might
>>> also change the costs of the alternative plans in a way that
>>> neutralizes things. But I suspect it would fix it. Of course, a
>>> correct estimate of the join size would also fix it--you have kind of
>>> a perfect storm here.
>>
>> As far as I can see on the explain, the misestimation is 3x~4x not 200x.
>
> It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
> Only Scan using idx_trade_id_book on trade".
>
> But for the join of both tables it is estimate 2120 vs actual 11.

But the final result set isn't further worked on (except for the aggregate), which means it doesn't affect the cost by much.
On Thu, Dec 6, 2012 at 12:05 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Dec 6, 2012 at 2:27 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, Dec 5, 2012 at 9:43 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> As far as I can see on the explain, the misestimation is 3x~4x not 200x.
>>
>> It is 3x (14085 vs 4588) for selectivity on one of the tables, "Index
>> Only Scan using idx_trade_id_book on trade".
>>
>> But for the join of both tables it is estimate 2120 vs actual 11.
>
> But the final result set isn't further worked on (except for the
> aggregate), which means it doesn't affect the cost by much.

Good point. Both the NL and hash join do about the same amount of work probing for success whether the success is actually there or not. So scratch what I said about the correlation being important; in this case it is not.

The 3x error is enough to push it over the edge, but the fudge factor is what gets it so close to that edge in the first place. And I'm now pretty sure the fudge factor change would fix this. The truly-fast NL plan is getting overcharged by the fudge factor once for each of the 14,085 loop iterations, while the truly-slow bitmap scan is overcharged only once for the entire scan. So the change is by no means neutralized between the two plans.

I don't know if my other theory that the bitmap scan is overflowing work_mem (but not costed for doing so) is also contributing.

Cheers,

Jeff
On Tue, 2012-12-04 at 15:42 -0800, Jeff Janes wrote:
> On Tue, Dec 4, 2012 at 10:03 AM, <postgresql@foo.me.uk> wrote:
>> [...]
>>
>> Is there some nice bit of literature somewhere that explains what sort of
>> costs are associated with the different types of lookup?
>
> I've heard good things about Greg Smith's book, but I don't know if it
> covers this particular thing.
>
> Otherwise, I don't know of a good single place which is a tutorial
> rather than a reference (or the code itself)

Greg's book is awesome. It really gives a lot of information/tips/whatever on performance. I mostly remember all the information about hardware, OS, PostgreSQL configuration, and such. Not much on the EXPLAIN part.

On the EXPLAIN part, you may have better luck with some slides available here and there. Robert Haas gave a talk on the query planner at pgCon 2010. The audio feed of Robert Haas's talk is available in this file:
http://www.pgcon.org/2010/audio/15%20The%20PostgreSQL%20Query%20Planner.mp3
You can also find the slides on
https://sites.google.com/site/robertmhaas/presentations

You can also read the "Explaining the Postgres Query Optimizer" talk written by Bruce Momjian. It's available there:
http://momjian.us/main/presentations/internals.html

And finally, you can grab my slides over here:
http://www.dalibo.org/_media/understanding_explain.pdf
There's more than slides; I tried to put a lot of information in there.

--
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com
> The difference between cost estimation and actual cost of your queries, under relatively precise row estimates, seems to suggest your e_c_s or r_p_c aren't a reflection of your hardware's performance.

Wow, so tweaking these has fixed it and then some. It now picks a slightly different plan than the 'fast' one previously:

New super fast version with e_c_s 6GB -> 88GB and r_p_c 2 -> 1 (s_p_c 1 -> 0.5): http://explain.depesz.com/s/ECk

For reference:
> Slow version with bitmapscan enabled: http://explain.depesz.com/s/6I7
> Fast version with bitmapscan disabled: http://explain.depesz.com/s/4MWG
> Greg's book is awesome. It really gives a lot of information/tips/whatever on performance. I mostly remember all the
> information about hardware, OS, PostgreSQL configuration, and such. Not much on the EXPLAIN part.

Arrived this morning :)

> http://www.pgcon.org/2010/audio/15%20The%20PostgreSQL%20Query%20Planner.mp3
> https://sites.google.com/site/robertmhaas/presentations
> http://momjian.us/main/presentations/internals.html
> http://www.dalibo.org/_media/understanding_explain.pdf

Well, that is my evenings occupied for the next week. Thank you kindly.

- Phil