Discussion: [PERFORM] Rowcount estimation changes based on from clause order
I stumbled upon a severe row count underestimation that confusingly went away when two inner joins in the from clause were reordered. I whittled it down to a reproducible test case.

Schema:

CREATE TABLE small (id serial primary key, ref_id int not null, subset int not null);
CREATE TABLE big (id serial primary key, small_id int not null);
INSERT INTO small (ref_id, subset) SELECT i/2+1, i/2+1 FROM generate_series(1,1000) i;
INSERT INTO big (small_id) SELECT (i % 1000) + 1 FROM generate_series(1,1000000) i;
CREATE INDEX ON small (ref_id);
CREATE INDEX ON big (small_id);
ANALYZE;

And the queries, differing only in the order of the joins:

SELECT *
FROM small
INNER JOIN big ON small.id = big.small_id
INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON lookup.ref = small.ref_id
WHERE small.subset = 42;

SELECT *
FROM small
INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON lookup.ref = small.ref_id
INNER JOIN big ON small.id = big.small_id
WHERE small.subset = 42;

Resulting plan for the first case:

Nested Loop  (cost=20.45..2272.13 rows=8 width=24)
  ->  Nested Loop  (cost=0.28..16.69 rows=1 width=16)
        ->  Append  (cost=0.00..0.04 rows=2 width=4)
              ->  Result  (cost=0.00..0.01 rows=1 width=4)
              ->  Result  (cost=0.00..0.01 rows=1 width=4)
        ->  Index Scan using small_ref_id_idx on small  (cost=0.28..8.32 rows=1 width=12)
              Index Cond: (ref_id = (1))
              Filter: (subset = 42)
  ->  Bitmap Heap Scan on big  (cost=20.18..2245.44 rows=1000 width=8)
        Recheck Cond: (small_id = small.id)
        ->  Bitmap Index Scan on big_small_id_idx  (cost=0.00..19.93 rows=1000 width=0)
              Index Cond: (small_id = small.id)

The plan for the second case is identical except for the row count of the topmost nested loop:

Nested Loop  (cost=20.45..2272.13 rows=1000 width=24)

The union subselect was in reality somewhat more complicated, but for the row count issue the simplification does not seem to matter. The behavior is seen both on 9.4 and on master. Does anybody have any idea what is going on here?
In the real-world case this is based on, the estimate was 5 rows instead of 200k, which resulted in quite bad plan choices downstream.

Regards,
Ants Aasma

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Ants Aasma <ants.aasma@eesti.ee> writes:
> I stumbled upon a severe row count underestimation that confusingly
> went away when two inner joins in the from clause were reordered.

Hm, looks more like an overestimate in this example, but anyway ...

> Does anybody have any idea what is going on here?

set_joinrel_size_estimates says

 * Since there is more than one way to make a joinrel for more than two
 * base relations, the results we get here could depend on which component
 * rel pair is provided.  In theory we should get the same answers no matter
 * which pair is provided; in practice, since the selectivity estimation
 * routines don't handle all cases equally well, we might not.  But there's
 * not much to be done about it.

In this example I think the core of the issue is actually not so much bad selectivity estimates as rowcount roundoff error.

If we first consider joining "small" with "big", we get an estimate of 2000 rows (which is dead on for what would happen if we just joined those). Then we estimate the final result size as the join of that to "lookup". The selectivity number for that step is somewhat hogwash but happens to yield a result that's not awful (8 rows).

In the other case we first estimate the size of the join of "small" with the "lookup" subquery, and we get a rounded-off estimate of one row, whereas without the roundoff it would have been probably about 0.01. When that's joined to "big", we are computing one row times 1 million rows times a selectivity estimate that's about right for the "small.id = big.small_id" clause; but because the roundoff already inflated the first join's size so much, you end up with an inflated final result.

This suggests that there might be some value in considering the sub-relations from largest to smallest, so that roundoff error in the earlier estimates is less likely to contaminate the final answer. Not sure how expensive it would be to do that or what sort of instability it might introduce into plan choices.
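The roundoff arithmetic described above can be sketched numerically. This is an illustration, not PostgreSQL source: `clamp_row_est` here only mimics the planner's habit of rounding joinrel size estimates to whole rows with a floor of 1, and the selectivities are assumptions read off the plans (500 distinct ref_ids in small, 1000 distinct small_ids in big).

```python
# Sketch of the rowcount roundoff effect (illustration only, not PostgreSQL code).
def clamp_row_est(nrows):
    # Mimics the planner: round to a whole number of rows, floor at 1.
    return max(1, round(nrows))

# Inputs read off the EXPLAIN output above (assumed values):
small_rows = 1          # "small" after the subset = 42 filter
lookup_rows = 2         # the two-row UNION ALL subselect
big_rows = 1_000_000
sel_ref = 1 / 500       # small.ref_id = lookup.ref
sel_id = 1 / 1000       # small.id = big.small_id

# Second join order: (small JOIN lookup) is estimated first.  The true
# intermediate size is ~0.004 rows, but clamping inflates it to 1 row.
inner = clamp_row_est(small_rows * lookup_rows * sel_ref)   # 0.004 -> 1
order2 = clamp_row_est(inner * big_rows * sel_id)           # 1 * 1e6 * 0.001 -> 1000

# Without the intermediate clamp, the same arithmetic stays near the truth.
unclamped = clamp_row_est(small_rows * lookup_rows * sel_ref
                          * big_rows * sel_id)              # ~0.004 * 1e6 * 0.001 -> 4
```

Propagating the 0.004-row intermediate instead of clamping it would have kept the final estimate at 4 rows rather than the 1000 shown in the second plan.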
Whether that's got anything directly to do with your original problem is hard to say. Joins to subqueries, which we normally lack any stats for, tend to produce pretty bogus selectivity numbers in themselves; so the original problem might've been more of that nature.

			regards, tom lane
On Thu, Oct 12, 2017 at 11:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ants Aasma <ants.aasma@eesti.ee> writes:
>> I stumbled upon a severe row count underestimation that confusingly
>> went away when two inner joins in the from clause were reordered.
>
> Hm, looks more like an overestimate in this example, but anyway ...
>
>> Does anybody have any idea what is going on here?
>
> set_joinrel_size_estimates says
>
> * Since there is more than one way to make a joinrel for more than two
> * base relations, the results we get here could depend on which component
> * rel pair is provided.  In theory we should get the same answers no matter
> * which pair is provided; in practice, since the selectivity estimation
> * routines don't handle all cases equally well, we might not.  But there's
> * not much to be done about it.
>
> In this example I think the core of the issue is actually not so much
> bad selectivity estimates as rowcount roundoff error.
>
> If we first consider joining "small" with "big", we get an estimate of
> 2000 rows (which is dead on for what would happen if we just joined
> those). Then we estimate the final result size as the join of that to
> "lookup". The selectivity number for that step is somewhat hogwash but
> happens to yield a result that's not awful (8 rows).
>
> In the other case we first estimate the size of the join of "small" with
> the "lookup" subquery, and we get a rounded-off estimate of one row,
> whereas without the roundoff it would have been probably about 0.01.
> When that's joined to "big", we are computing one row times 1 million rows
> times a selectivity estimate that's about right for the "small.id =
> big.small_id" clause; but because the roundoff already inflated the first
> join's size so much, you end up with an inflated final result.
> This suggests that there might be some value in considering the
> sub-relations from largest to smallest, so that roundoff error
> in the earlier estimates is less likely to contaminate the final
> answer. Not sure how expensive it would be to do that or what
> sort of instability it might introduce into plan choices.
>
> Whether that's got anything directly to do with your original problem is
> hard to say. Joins to subqueries, which we normally lack any stats for,
> tend to produce pretty bogus selectivity numbers in themselves; so the
> original problem might've been more of that nature.

Thanks for pointing me in the correct direction. The original issue was that the values from lookup joined to ref_id and the subset filter on the small table were almost perfectly correlated, which caused the underestimate. In the second join order this was hidden by the intermediate clamping to 1, which accidentally produced a more correct estimate.

I actually think it might be better to consider relations from smallest to largest. The reasoning being: a join cannot produce a fraction of a row, it will either produce 0 rows or 1, and we should probably plan for the case where it does return something.

Going even further, and I haven't looked at how feasible this is, but I have run into several cases lately where cardinality underestimates clamping to 1 result in catastrophically bad plans. Bad as in a stack of nested loops with unparameterized GroupAggregates and HashAggregates as inner sides. It seems to me that row estimates should clamp to something slightly larger than 1 unless the estimate is provably exactly 1.

Regards,
Ants Aasma
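The alternative floated above, not clamping intermediate estimates all the way down to 1, can be sketched with the same numbers as in the test case. This is a hypothetical helper for illustration only (`clamp_fractional` is not a PostgreSQL function, and the floor value is an arbitrary assumption); it just shows how keeping the fractional intermediate estimate avoids the 250x inflation.

```python
# Hypothetical alternative clamp (illustration only): keep fractional
# row estimates instead of rounding intermediate results up to 1 row.
def clamp_fractional(nrows, floor=1e-5):
    # Only guard against zero/negative estimates; keep the fraction.
    return max(floor, nrows)

# Same assumed inputs as the test case: 1 filtered "small" row, a 2-row
# lookup subselect, 1M "big" rows, selectivities 1/500 and 1/1000.
inner = clamp_fractional(1 * 2 * (1 / 500))     # stays ~0.004 instead of 1
final = inner * 1_000_000 * (1 / 1000)          # ~4 rows, near the truth
```

Keeping the fraction internally (while still never *displaying* or costing less than one row) would let the later multiplication by the big relation land near 4 rows instead of the inflated 1000.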