Re: Consider parallel for lateral subqueries with limit
От | Brian Davis |
---|---|
Тема | Re: Consider parallel for lateral subqueries with limit |
Дата | |
Msg-id | a50766a4-a927-41c4-984c-76e513b6d1c4@www.fastmail.com обсуждение исходный текст |
Ответ на | Consider parallel for lateral subqueries with limit (James Coleman <jtc331@gmail.com>) |
Список | pgsql-hackers |
> Note that near the end of grouping planner we have a similar check: > > if (final_rel->consider_parallel && root->query_level > 1 && > !limit_needed(parse)) > > guarding copying the partial paths from the current rel to the final > rel. I haven't managed to come up with a test case that exposes that Played around with this a bit, here's a non-correlated subquery that gets us to that if statement DROP TABLE IF EXISTS foo; CREATE TABLE foo (bar int); INSERT INTO foo (bar) SELECT g FROM generate_series(1, 10000) AS g; SELECT ( SELECT bar FROM foo LIMIT 1 ) AS y FROM foo; I also was thinking about the LATERAL part. I couldn't think of any reason why the uncorrelated subquery's results would need to be shared and therefore the same, whenwe'll be "looping" over each row of the source table, running the subquery anew for each, conceptually. But then I tried this... test=# CREATE TABLE foo (bar int); CREATE TABLE test=# test=# INSERT INTO foo (bar) test-# SELECT test-# g test-# FROM test-# generate_series(1, 10) AS g; INSERT 0 10 test=# test=# test=# SELECT test-# foo.bar, test-# lat.bar test-# FROM test-# foo JOIN LATERAL ( test(# SELECT test(# bar test(# FROM test(# foo AS foo2 test(# ORDER BY test(# random() test(# LIMIT 1 test(# ) AS lat ON true; bar | bar -----+----- 1 | 7 2 | 7 3 | 7 4 | 7 5 | 7 6 | 7 7 | 7 8 | 7 9 | 7 10 | 7 (10 rows) As you can see, random() is only called once. If postgres were supposed to be running the subquery for each source row,conceptually, it would be a mistake to cache the results of a volatile function like random(). The docs say: "When a FROM item contains LATERAL cross-references, evaluation proceeds as follows: for each row of the FROMitem providing the cross-referenced column(s), or set of rows of multiple FROM items providing the columns, the LATERALitem is evaluated using that row or row set's values of the columns. The resulting row(s) are joined as usual withthe rows they were computed from. This is repeated for each row or set of rows from the column source table(s)." They don't say what happens with LATERAL when there aren't cross-references though. As we expect, adding one does show random()being called once for each source row. test=# SELECT test-# foo.bar, test-# lat.bar test-# FROM test-# foo JOIN LATERAL ( test(# SELECT test(# bar test(# FROM test(# foo AS foo2 test(# WHERE test(# foo2.bar < foo.bar + 100000 test(# ORDER BY test(# random() test(# LIMIT 1 test(# ) AS lat ON true; bar | bar -----+----- 1 | 5 2 | 8 3 | 3 4 | 4 5 | 5 6 | 5 7 | 1 8 | 3 9 | 7 10 | 3 (10 rows) It seems like to keep the same behavior that exists today, results of LATERAL subqueries would need to be the same if theyaren't correlated, and so you couldn't run them in parallel with a limit if the order wasn't guaranteed. But I'll bethe first to admit that it's easy enough for me to miss a key piece of logic on something like this, so I could be wayoff base too.
В списке pgsql-hackers по дате отправления:
Предыдущее
От: David RowleyДата:
Сообщение: Re: Hybrid Hash/Nested Loop joins and caching results from subplans