Re: BUG #15007: LIMIT not respected in sub-queries
От | Tomas Vondra |
---|---|
Тема | Re: BUG #15007: LIMIT not respected in sub-queries |
Дата | |
Msg-id | f74052d0-828c-428d-efe6-0d32d029fa21@2ndquadrant.com обсуждение исходный текст |
Ответ на | BUG #15007: LIMIT not respected in sub-queries (PG Bug reporting form <noreply@postgresql.org>) |
Ответы |
Re: BUG #15007: LIMIT not respected in sub-queries
(Will Storey <will@summercat.com>)
|
Список | pgsql-bugs |
Hi, On 01/11/2018 10:16 PM, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 15007 > Logged by: Will Storey > Email address: will@summercat.com > PostgreSQL version: 10.1 > Operating system: Ubuntu 16.04 > Description: > > Hello, > > I am not sure this is a bug. But it is surprising to me and seems to > contradict the documentation in terms of join nesting. > > I have a SELECT query with a sub-SELECT in it. The sub-SELECT has a LIMIT > clause. I've found that sometimes I receive more rows (at most one extra in > my testing) than the LIMIT, where I expected only as many rows as the LIMIT. > This depends on the query plan. With some plans it never happens, and with > others it happens frequently. > > In looking into this behaviour, I came across hints that this is a known > quirk. I found bug reports related specifically to UPDATE/DELETE that sound > similar to this, but no mention that the behaviour can happen with SELECT: > > https://dba.stackexchange.com/questions/69471/postgres-update-limit-1?noredirect=1&lq=1 > (note the comments on the accepted answer) > https://www.postgresql.org/message-id/1399649764731-5803406.post%40n5.nabble.com > (and the thread) > https://www.postgresql.org/message-id/1385918761589-5781081.post%40n5.nabble.com > > This happens with both PostgreSQL 10.1 on Ubuntu 16.04 (from the PostgreSQL > repos: PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit) as well as on PostgreSQL > 9.6.5 (where I initially encountered the behaviour). > I think you're right those issues are related - a change of plan may affect the results in a somewhat non-deterministic way. In the cases you linked it's UPDATE on additional rows, in your case it's more about random() during rescans. I'll get to that in a minute. > Unfortunately my test case is not very clean and it is somewhat long, so > I've put it in a gist on GitHub: > > https://gist.github.com/horgh/f3e8ede81d866844e7d162d677968bf0 > > The SELECT query (run by the Perl program) quickly prints out that it > receives 6 rows. > > As you can see in the EXPLAIN ANALYZE output, the innermost Nested Loop has > loops > 1. I believe this is the cause of the behaviour. If I tweak the test > to have a plan where that node runs before the Seq Scan, there are never > more than 5 rows. > I went through the test case, and I think I know what's going on. The script did not reproduce the issue for me, but I think I've been able to tweak the query to use a plan producing more issues. The key is to force a particular join order, and disable operations that would materialize intermediate results of the t1 scan. That is, we want something like this: -> nestloop -> nestloop t2 + t3 -> seq scan t1 Firstly, I've replaced the dynamic random() condition with a static one: random() <= 0.5 this is not strictly necessary, but it simplifies the plan. We need the random() call though, as I'll explain later. Then I've disabled a bunch of plan nodes that would materialize results of the t1 seq scan: set enable_material = off; set enable_sort = off; set enable_hashjoin = off; And finally, I've disabled join reordering by setting set join_collapse_limit = 1; Now, if you rewrite the query like this (which essentially just forces a particular join order, when combined with join_collapse_limit=1, nothing else): explain analyze SELECT * FROM (t3 JOIN t2 USING (t2_id)) JOIN (SELECT * FROM t1 WHERE t1_id IS NOT NULL AND t1_id < 100 AND t1_val LIKE 'h%' AND random() <= 0.5 LIMIT 5 ) AS t1 ON t3.t1_id = t1.t1_id WHERE t2.t2_val LIKE 'he%'; you will get plans like this: QUERY PLAN ------------------------------------------------------------------------ Nested Loop (cost=0.00..31.41 rows=5 width=21) (actual time=0.052..1.825 rows=7 loops=1) Join Filter: (t3.t1_id = t1.t1_id) Rows Removed by Join Filter: 73 -> Nested Loop (cost=0.00..16.10 rows=16 width=14) (actual time=0.030..0.917 rows=16 loops=1) Join Filter: (t3.t2_id = t2.t2_id) Rows Removed by Join Filter: 160 -> Seq Scan on t2 (cost=0.00..1.14 rows=11 width=10) (actual time=0.015..0.041 rows=11 loops=1) Filter: ((t2_val)::text ~~ 'he%'::text) -> Seq Scan on t3 (cost=0.00..1.16 rows=16 width=8) (actual time=0.003..0.037 rows=16 loops=11) -> Limit (cost=0.00..0.84 rows=5 width=7) (actual time=0.008..0.037 rows=5 loops=16) -> Seq Scan on t1 (cost=0.00..1.52 rows=9 width=7) (actual time=0.004..0.016 rows=5 loops=16) Filter: ((t1_id IS NOT NULL) AND (t1_id < 100) AND ((t1_val)::text ~~ 'h%'::text) AND (random() <= '0.5'::double precision)) Rows Removed by Filter: 5 Planning time: 0.625 ms Execution time: 1.911 ms (15 rows) This should be equivalent to the original query, and should produce the same results (modulo random() of course). But notice it actually does produce 7 rows! Nested Loop (cost=0.00..31.41 rows=5 width=21) (actual time=0.052..1.825 rows=7 loops=1) ^ The problem is that it ends up executing the sequential scan on t1 repeatedly (because it's the inner relation in a nested loop), and because random() is volatile, the results of the scan are likely different. Each rescan individually still respects the LIMIT, but when combined result may be larger - there may be more unique IDs, matching additional rows from the other tables. > > I believe a better way to write this query would be to use a CTE. > Yes, that will stabilize the output of the random() function, eliminating the nondeterminism during rescans. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-bugs по дате отправления:
Предыдущее
От: "David G. Johnston"Дата:
Сообщение: Re: BUG #15006: "make check" error if current user is "user"
Следующее
От: Tom LaneДата:
Сообщение: Re: BUG #15006: "make check" error if current user is "user"