Обсуждение: Bad plan for nested loop + limit
This dinky little query takes about 4 seconds to run: select event_occurrences.* from event_occurrences join section_items on section_items.subject_id = event_occurrences.event_id and section_items.subject_type = 'Event' and section_items.sandbox_id = 16399 where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100' order by event_occurrences.start_time limit 4; Output from "explain analyze": Limit (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1) -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1) -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1) -> Index Scan using index_event_occurrences_on_start_time on event_occurrences (cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152 rows=155197 loops=1) Index Cond: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) -> Index Scan using index_section_items_on_subject_type_and_subject_id on section_items (cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=155197) Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id)) Filter: (section_items.sandbox_id = 16399) -> Index Scan using event_instances_pkey on events (cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: (events.id = event_occurrences.event_id) Filter: (events.read_permissions = (-1)) Total runtime: 4039.788 ms Now, if I use "limit 50" it uses a plan that is several orders of magnitude more efficient: Limit (cost=6202.38..6202.51 rows=50 width=48) (actual time=0.170..0.171 rows=1 loops=1) -> Sort (cost=6202.38..6203.20 rows=326 width=48) (actual time=0.170..0.170 rows=1 loops=1) Sort Key: event_occurrences.start_time Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=5.09..6191.55 rows=326 width=48) (actual time=0.160..0.161 rows=1 loops=1) -> Bitmap Heap Scan on section_items (cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7 loops=1) Recheck Cond: (sandbox_id = 16399) Filter: ((subject_type)::text = 'Event'::text) -> Bitmap Index Scan on index_section_items_on_sandbox_id (cost=0.00..5.06 rows=107 width=0) (actual time=0.018..0.018 rows=7 loops=1) Index Cond: (sandbox_id = 16399) -> Index Scan using index_event_occurrences_on_event_id on event_occurrences (cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0 loops=7) Index Cond: (event_occurrences.event_id = section_items.subject_id) Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) Total runtime: 0.210 ms Similarly if I disable nested joins with "set enable_nestloop = off": Limit (cost=10900.13..10900.14 rows=4 width=48) (actual time=191.476..191.478 rows=1 loops=1) -> Sort (cost=10900.13..10900.95 rows=326 width=48) (actual time=191.474..191.475 rows=1 loops=1) Sort Key: event_occurrences.start_time Sort Method: quicksort Memory: 25kB -> Hash Join (cost=8944.52..10895.24 rows=326 width=48) (actual time=162.104..191.463 rows=1 loops=1) Hash Cond: (section_items.subject_id = event_occurrences.event_id) -> Bitmap Heap Scan on section_items (cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7 loops=1) Recheck Cond: (sandbox_id = 16399) Filter: ((subject_type)::text = 'Event'::text) -> Bitmap Index Scan on index_section_items_on_sandbox_id (cost=0.00..5.06 rows=107 width=0) (actual time=0.019..0.019 rows=7 loops=1) Index Cond: (sandbox_id = 16399) -> Hash (cost=5580.54..5580.54 rows=157752 width=48) (actual time=161.832..161.832 rows=155197 loops=1) -> Seq Scan on event_occurrences (cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406 rows=155197 loops=1) Filter: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) Total runtime: 192.496 ms Some statistics: # # select attname, n_distinct from pg_stats where tablename = 'event_occurrences'; attname | n_distinct ------------+------------ id | -1 created_at | -0.291615 updated_at | -0.294081 created_by | 715 updated_by | 715 event_id | 2146 start_time | -0.10047 end_time | 5602 # select attname, n_distinct from pg_stats where tablename = 'section_items'; attname | n_distinct ----------------------+------------ id | -1 created_by | 1612 created_at | -0.708649 updated_at | -0.83635 updated_by | 1190 posted_at | -0.930831 section_id | 456 sandbox_id | 455 reference | 2 subject_id | -0.546929 subject_type | 5 conversation_id | 1981 read_permissions | 8 permission_policy_id | 11 Anything I can do to fix the query? This is PostgreSQL 8.3.5. Standard planner configs. Before testing I reindexed, vacuumed and analyzed the tables. Alexander.
On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote: > > Output from "explain analyze": > > Limit (cost=0.00..973.63 rows=4 width=48) (actual > time=61.554..4039.704 rows=1 loops=1) > -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual > time=61.552..4039.700 rows=1 loops=1) > -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) > (actual time=61.535..4039.682 rows=1 loops=1) Those estimates are pretty far off. Did you try increasing the statistics target? Also, is the first query repeatable (that is, is it already in cache when you do the test, or alternately, are all queries *out* of cache when you test?) -- - David T. Wilson david.t.wilson@gmail.com
On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@gmail.com> wrote: > On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote: >> >> Output from "explain analyze": >> >> Limit (cost=0.00..973.63 rows=4 width=48) (actual >> time=61.554..4039.704 rows=1 loops=1) >> -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual >> time=61.552..4039.700 rows=1 loops=1) >> -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) >> (actual time=61.535..4039.682 rows=1 loops=1) > > Those estimates are pretty far off. Did you try increasing the > statistics target? Also, is the first query repeatable (that is, is it > already in cache when you do the test, or alternately, are all queries > *out* of cache when you test?) All in the cache when I do the test. Ok, so upping the statistics to 100 on section_items.subject_id fixed it: Limit (cost=3530.95..3530.96 rows=4 width=48) (actual time=0.107..0.107 rows=1 loops=1) -> Sort (cost=3530.95..3531.12 rows=66 width=48) (actual time=0.106..0.106 rows=1 loops=1) Sort Key: event_occurrences.start_time Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=0.00..3529.96 rows=66 width=48) (actual time=0.098..0.100 rows=1 loops=1) -> Index Scan using index_section_items_on_sandbox_id on section_items (cost=0.00..104.29 rows=22 width=4) (actual time=0.017..0.033 rows=7 loops=1) Index Cond: (sandbox_id = 16399) Filter: ((subject_type)::text = 'Event'::text) -> Index Scan using index_event_occurrences_on_event_id on event_occurrences (cost=0.00..154.79 rows=74 width=48) (actual time=0.008..0.008 rows=0 loops=7) Index Cond: (event_occurrences.event_id = section_items.subject_id) Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone) Total runtime: 0.142 ms Thanks. Alexander.
On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex@bengler.no> wrote: > On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@gmail.com> wrote: >> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote: >>> >>> Output from "explain analyze": >>> >>> Limit (cost=0.00..973.63 rows=4 width=48) (actual >>> time=61.554..4039.704 rows=1 loops=1) >>> -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual >>> time=61.552..4039.700 rows=1 loops=1) >>> -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) >>> (actual time=61.535..4039.682 rows=1 loops=1) >> >> Those estimates are pretty far off. Did you try increasing the >> statistics target? Also, is the first query repeatable (that is, is it >> already in cache when you do the test, or alternately, are all queries >> *out* of cache when you test?) All right, this query keeps coming back to bite me. If this part of the join: ... and section_items.sandbox_id = 16399 yields a sufficiently large number of matches, then performance goes 'boink', like so: Limit (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1) -> Nested Loop (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1) -> Index Scan using index_event_occurrences_on_start_time on event_occurrences (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1) Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone) -> Index Scan using index_section_items_on_subject_type_and_subject_id on section_items (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108) Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id)) Filter: (section_items.sandbox_id = 9) Total runtime: 4348.777 ms In this case: # select count(*) from section_items where sandbox_id = 9; count ------- 3126 If I remove the start_time > ... clause, performance is fine. Upping the statistics setting on any of the columns involved seems to have no effect. Is this a pathological border case, or is there something I can do to *generally* make this query run fast? Keep in mind that the query itself returns no rows at all. I want to avoid doing an initial "select count(...)" just to avoid the bad plan. Suffice to say, having a web request take 5 seconds is asking too much from our users. Alexander.
On Fri, Feb 27, 2009 at 3:18 PM, Alexander Staubo <alex@bengler.no> wrote: > On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex@bengler.no> wrote: >> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@gmail.com> wrote: >>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote: >>>> >>>> Output from "explain analyze": >>>> >>>> Limit (cost=0.00..973.63 rows=4 width=48) (actual >>>> time=61.554..4039.704 rows=1 loops=1) >>>> -> Nested Loop (cost=0.00..70101.65 rows=288 width=48) (actual >>>> time=61.552..4039.700 rows=1 loops=1) >>>> -> Nested Loop (cost=0.00..68247.77 rows=297 width=52) >>>> (actual time=61.535..4039.682 rows=1 loops=1) >>> >>> Those estimates are pretty far off. Did you try increasing the >>> statistics target? Also, is the first query repeatable (that is, is it >>> already in cache when you do the test, or alternately, are all queries >>> *out* of cache when you test?) > > All right, this query keeps coming back to bite me. If this part of the join: > > ... and section_items.sandbox_id = 16399 > > yields a sufficiently large number of matches, then performance goes > 'boink', like so: > > Limit (cost=0.00..34.86 rows=4 width=48) (actual > time=4348.696..4348.696 rows=0 loops=1) > -> Nested Loop (cost=0.00..60521.56 rows=6944 width=48) (actual > time=4348.695..4348.695 rows=0 loops=1) > -> Index Scan using index_event_occurrences_on_start_time on > event_occurrences (cost=0.00..11965.38 rows=145712 width=48) (actual > time=0.093..138.029 rows=145108 loops=1) > Index Cond: (start_time > '2009-02-27 > 18:01:14.739411+01'::timestamp with time zone) > -> Index Scan using > index_section_items_on_subject_type_and_subject_id on section_items > (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 > loops=145108) > Index Cond: (((section_items.subject_type)::text = > 'Event'::text) AND (section_items.subject_id = > event_occurrences.event_id)) > Filter: (section_items.sandbox_id = 9) > Total runtime: 4348.777 ms > > In this case: > > # select count(*) from section_items where sandbox_id = 9; > count > ------- > 3126 > > If I remove the start_time > ... clause, performance is fine. Upping > the statistics setting on any of the columns involved seems to have no > effect. > > Is this a pathological border case, or is there something I can do to > *generally* make this query run fast? Keep in mind that the query > itself returns no rows at all. I want to avoid doing an initial > "select count(...)" just to avoid the bad plan. Suffice to say, having > a web request take 5 seconds is asking too much from our users. The problem here is that the planner estimates the cost of a Limit plan node by adding up (1) the startup cost of the underlying plan node, in this case 0 for the nestjoin, and (2) a percentage of the run cost, based on the ratio of the number of rows expected to be returned to the total number of rows. In this case, the nested loop is expected to return 6944 rows, so it figures it won't have to get very far to find the 4 you requested. So when the LIMIT clause is a little bigger, or missing, the planner tries to minimize the cost of the whole operation, whereas when the limit is very small, it picks a plan that is much slower overall on theory that it will be able to quit long before finishing the whole thing. When that turns out to be false, you get burned. That means that the root cause of the problem is the fact that the join is estimated to return hundreds or thousands of rows. But it's hard to think that you can make that estimate any better. The nestloop is expected to output 6944 rows, and the index scan on event_occurrences is expected to return 145712 rows. So the planner knows that only a tiny fraction of the rows in event_occurrences are going to have a match in section_items - it just doesn't think the fraction is quite tiny enough to keep it from making a bad decision. Interestingly, I think the solution Tom and I were talking about to another problem in this area would make your case MUCH WORSE. http://archives.postgresql.org/message-id/603c8f070902180908j3ae46774g535d96ece2c90e74@mail.gmail.com I will think about this some more but nothing is occurring to me off the top of my head. ...Robert
On Fri, Feb 27, 2009 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: > The problem here is that the planner estimates the cost of a Limit > plan node by adding up (1) the startup cost of the underlying plan > node, in this case 0 for the nestjoin, and (2) a percentage of the run > cost, based on the ratio of the number of rows expected to be returned > to the total number of rows. In this case, the nested loop is > expected to return 6944 rows, so it figures it won't have to get very > far to find the 4 you requested. [...] > I will think about this some more but nothing is occurring to me off > the top of my head. Thanks for explaining. Is there any way to rewrite the query in a way that will avoid the nested loop join -- other than actually disabling nested loop joins? If I do the latter, the resulting query uses a hash join and completes in 80-100 ms, which is still pretty horrible, especially for a query that returns nothing, but extremely auspicious compared to the unthinkable 4-5 seconds for the current query. Alexander.
On Sat, Feb 28, 2009 at 11:20 AM, Alexander Staubo <alex@bengler.no> wrote: > On Fri, Feb 27, 2009 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> The problem here is that the planner estimates the cost of a Limit >> plan node by adding up (1) the startup cost of the underlying plan >> node, in this case 0 for the nestjoin, and (2) a percentage of the run >> cost, based on the ratio of the number of rows expected to be returned >> to the total number of rows. In this case, the nested loop is >> expected to return 6944 rows, so it figures it won't have to get very >> far to find the 4 you requested. > [...] >> I will think about this some more but nothing is occurring to me off >> the top of my head. > > Thanks for explaining. Is there any way to rewrite the query in a way > that will avoid the nested loop join -- other than actually disabling > nested loop joins? If I do the latter, the resulting query uses a hash > join and completes in 80-100 ms, which is still pretty horrible, > especially for a query that returns nothing, but extremely auspicious > compared to the unthinkable 4-5 seconds for the current query. Can you post the schema for the two tables in question? Feel free to omit any columns that aren't included in the query, but make sure to include any unique indices, etc. What do you have default_statistics_target set to? If it's less than 100, you should probably raise it to 100 and re-analyze (the default value for 8.4 will be 100, but for 8.3 and prior it is 10). What is the approximate total number of rows in each of these two tables? Of the rows in section_items, how many have subject_type = 'Event'? ...Robert
On Sun, Mar 1, 2009 at 4:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: > What do you have default_statistics_target set to? If it's less than > 100, you should probably raise it to 100 and re-analyze (the default > value for 8.4 will be 100, but for 8.3 and prior it is 10). Changing it to 100 fixed the problem. Thanks for alerting me to the existence of default_statistics_target. Alexander.