Re: BUG #15007: LIMIT not respected in sub-queries
From: Tomas Vondra
Subject: Re: BUG #15007: LIMIT not respected in sub-queries
Date:
Msg-id: 7bbee86a-d4c9-2f9b-64bf-d4f372062676@2ndquadrant.com
In reply to: Re: BUG #15007: LIMIT not respected in sub-queries (Will Storey <will@summercat.com>)
List: pgsql-bugs
On 01/19/2018 02:55 AM, Will Storey wrote:
> On Thu 2018-01-18 00:08:31 +0100, Tomas Vondra wrote:
>
> <snip>
>
>> I went through the test case, and I think I know what's going on. The
>> script did not reproduce the issue for me, but I think I've been able to
>> tweak the query to use a plan producing more issues.
>>
>> The key is to force a particular join order, and disable operations that
>> would materialize intermediate results of the t1 scan. That is, we want
>> something like this:
>>
>>   -> nestloop
>>        -> nestloop t2 + t3
>>        -> seq scan t1
>>
>> Firstly, I've replaced the dynamic random() condition with a static one:
>>
>>   random() <= 0.5
>>
>> this is not strictly necessary, but it simplifies the plan. We need the
>> random() call though, as I'll explain later.
>>
>> Then I've disabled a bunch of plan nodes that would materialize results
>> of the t1 seq scan:
>>
>>   set enable_material = off;
>>   set enable_sort = off;
>>   set enable_hashjoin = off;
>>
>> And finally, I've disabled join reordering by setting
>>
>>   set join_collapse_limit = 1;
>>
>> Now, if you rewrite the query like this (which essentially just forces a
>> particular join order, when combined with join_collapse_limit=1, nothing
>> else):
>>
>>   explain analyze SELECT * FROM
>>     (t3 JOIN t2 USING (t2_id)) JOIN
>>     (SELECT * FROM t1
>>       WHERE t1_id IS NOT NULL AND
>>             t1_id < 100 AND
>>             t1_val LIKE 'h%' AND
>>             random() <= 0.5
>>       LIMIT 5
>>     ) AS t1
>>   ON t3.t1_id = t1.t1_id
>>   WHERE t2.t2_val LIKE 'he%';
>>
>> you will get plans like this:
>>
>>                              QUERY PLAN
>> ------------------------------------------------------------------------
>>  Nested Loop  (cost=0.00..31.41 rows=5 width=21)
>>               (actual time=0.052..1.825 rows=7 loops=1)
>>    Join Filter: (t3.t1_id = t1.t1_id)
>>    Rows Removed by Join Filter: 73
>>    ->  Nested Loop  (cost=0.00..16.10 rows=16 width=14)
>>                     (actual time=0.030..0.917 rows=16 loops=1)
>>          Join Filter: (t3.t2_id = t2.t2_id)
>>          Rows Removed by Join Filter: 160
>>          ->  Seq Scan on t2  (cost=0.00..1.14 rows=11 width=10)
>>                              (actual time=0.015..0.041 rows=11 loops=1)
>>                Filter: ((t2_val)::text ~~ 'he%'::text)
>>          ->  Seq Scan on t3  (cost=0.00..1.16 rows=16 width=8)
>>                              (actual time=0.003..0.037 rows=16 loops=11)
>>    ->  Limit  (cost=0.00..0.84 rows=5 width=7)
>>               (actual time=0.008..0.037 rows=5 loops=16)
>>          ->  Seq Scan on t1  (cost=0.00..1.52 rows=9 width=7)
>>                              (actual time=0.004..0.016 rows=5 loops=16)
>>                Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND
>>                         ((t1_val)::text ~~ 'h%'::text) AND
>>                         (random() <= '0.5'::double precision))
>>                Rows Removed by Filter: 5
>>  Planning time: 0.625 ms
>>  Execution time: 1.911 ms
>> (15 rows)
>>
>>
>> This should be equivalent to the original query, and should produce the
>> same results (modulo random() of course).
>>
>> But notice it actually does produce 7 rows!
>>
>>  Nested Loop  (cost=0.00..31.41 rows=5 width=21)
>>               (actual time=0.052..1.825 rows=7 loops=1)
>>                                         ^
>>
>> The problem is that it ends up executing the sequential scan on t1
>> repeatedly (because it's the inner relation in a nested loop), and
>> because random() is volatile, the results of the scan are likely
>> different.
>>
>> Each rescan individually still respects the LIMIT, but when combined
>> the result may be larger - there may be more unique IDs, matching
>> additional rows from the other tables.
>>
>>>
>>> I believe a better way to write this query would be to use a CTE.
>>>
>>
>> Yes, that will stabilize the output of the random() function,
>> eliminating the nondeterminism during rescans.
>
> Wow, thank you for the great explanation of what is going on!
>
> It sounds like this is not really a bug then and is just something to
> be expected when using LIMIT with such volatile joins.
>

The volatile part here is not the join, but the random function during
rescans of a relation. But yeah, I don't think this qualifies as a bug.
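[Editor's note: the rescan effect described above can be seen in isolation, without any join. A minimal sketch, assuming the thread's test table t1(t1_id, t1_val):

```sql
-- random() is VOLATILE, so it is re-evaluated for every row on every
-- execution. Running this twice (as happens on each rescan of the
-- nested loop's inner side) can keep a different subset of rows each
-- time, so the two LIMIT-5 result sets need not contain the same IDs.
SELECT t1_id FROM t1 WHERE random() <= 0.5 LIMIT 5;
SELECT t1_id FROM t1 WHERE random() <= 0.5 LIMIT 5;
```

Each execution honors LIMIT 5, but the union of the sets across rescans can exceed 5 distinct IDs, which is exactly why the join above returns 7 rows.]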
> I suppose I expected that if there is a LIMIT then that would be the
> maximum number of rows the subquery would ever provide. The planner
> would have to force materializing/stabilizing in such cases it sounds
> like. Maybe that is not possible or not a good idea. It does seem
> like a pretty edge case.
>

Yes, something like that would be necessary. I don't know how difficult
that would be, though.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
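[Editor's note: a sketch of the CTE rewrite suggested in the thread, using the table and column names from the query above. At the time of this thread (PostgreSQL 10/11), a CTE was always materialized, i.e. evaluated exactly once, so random() fires only once and the LIMIT 5 row set is fixed for all rescans:

```sql
-- Pre-PostgreSQL-12, a WITH query is an optimization fence: the
-- subquery runs once and its 5 rows are reused on every rescan,
-- so the outer join can never see more than 5 distinct t1 rows.
WITH t1_limited AS (
    SELECT * FROM t1
    WHERE t1_id IS NOT NULL
      AND t1_id < 100
      AND t1_val LIKE 'h%'
      AND random() <= 0.5
    LIMIT 5
)
SELECT *
FROM (t3 JOIN t2 USING (t2_id))
JOIN t1_limited AS t1 ON t3.t1_id = t1.t1_id
WHERE t2.t2_val LIKE 'he%';
```

On PostgreSQL 12 and later, CTEs may be inlined by default, so the same guarantee requires writing `WITH t1_limited AS MATERIALIZED (...)`.]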