Re: Consider parallel for lateral subqueries with limit

From: James Coleman
Subject: Re: Consider parallel for lateral subqueries with limit
Msg-id: CAAaqYe_ssUwJmYkdxO0oKqsrxPB0Ktndu7i5YiThjCor7+mqOg@mail.gmail.com
In reply to: Consider parallel for lateral subqueries with limit  (James Coleman <jtc331@gmail.com>)
Responses: Re: Consider parallel for lateral subqueries with limit  (James Coleman <jtc331@gmail.com>)
List: pgsql-hackers
On Mon, Nov 30, 2020 at 7:00 PM James Coleman <jtc331@gmail.com> wrote:
>
> I've been investigating parallelizing certain correlated subqueries,
> and during that work stumbled across the fact that
> set_rel_consider_parallel disallows parallel query on what seems like
> a fairly simple case.
>
> Consider this query:
>
> select t.unique1
> from tenk1 t
> join lateral (select t.unique1 from tenk1 offset 0) l on true;
>
> Currently, set_rel_consider_parallel sets consider_parallel=false on the
> subquery rel because it has a limit/offset. That restriction makes a
> lot of sense when we have a subquery whose results conceptually need
> to be "shared" (or at least be the same) across multiple workers
> (indeed the relevant comment in that function notes that cases where
> we could prove a unique ordering would also qualify, but punts on
> implementing that due to complexity). But if the subquery is LATERAL,
> then no such conceptual restriction applies.
>
> If we change the code slightly to allow considering parallel query
> even in the face of LIMIT/OFFSET for LATERAL subqueries, then our
> query above changes from the following plan:
>
>  Nested Loop
>    Output: t.unique1
>    ->  Gather
>          Output: t.unique1
>          Workers Planned: 2
>          ->  Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t
>                Output: t.unique1
>    ->  Gather
>          Output: NULL::integer
>          Workers Planned: 2
>          ->  Parallel Index Only Scan using tenk1_hundred on public.tenk1
>                Output: NULL::integer
>
> to this plan:
>
>  Gather
>    Output: t.unique1
>    Workers Planned: 2
>    ->  Nested Loop
>          Output: t.unique1
>          ->  Parallel Index Only Scan using tenk1_unique1 on public.tenk1 t
>                Output: t.unique1
>          ->  Index Only Scan using tenk1_hundred on public.tenk1
>                Output: NULL::integer
>
> The code change itself is quite simple (1 line). As far as I can tell
> we don't need to expressly check parallel safety of the limit/offset
> expressions; that appears to happen elsewhere (and that makes sense
> since the RTE_RELATION case doesn't check those clauses either).
>
> If I'm missing something about the safety of this (or any other
> issue), I'd appreciate the feedback.
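
To make the proposed one-line change concrete, here is a minimal standalone sketch of the decision described in the quoted message. It is a simplified model, not the planner code: the struct SubqueryModel and both function names are hypothetical, and the real check in set_rel_consider_parallel looks at considerably more than these two flags.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a subquery RTE. */
typedef struct SubqueryModel
{
    bool has_limit_or_offset;   /* assumed analogue of limit_needed() */
    bool is_lateral;            /* assumed analogue of the RTE's lateral flag */
} SubqueryModel;

/* Behavior as described above: any LIMIT/OFFSET rules out parallelism. */
bool
consider_parallel_current(const SubqueryModel *sq)
{
    if (sq->has_limit_or_offset)
        return false;
    return true;
}

/*
 * Proposed behavior: LIMIT/OFFSET only rules out parallelism when the
 * subquery is not LATERAL, since a LATERAL subquery is re-evaluated per
 * outer row and its result need not be shared across workers.
 */
bool
consider_parallel_proposed(const SubqueryModel *sq)
{
    if (!sq->is_lateral && sq->has_limit_or_offset)
        return false;
    return true;
}
```

In this model, a LATERAL subquery with OFFSET 0 (as in the example query) is rejected by the first function and accepted by the second, while a non-LATERAL subquery with a limit is rejected by both.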

Note that near the end of grouping_planner we have a similar check:

if (final_rel->consider_parallel && root->query_level > 1 &&
        !limit_needed(parse))

which guards copying the partial paths from the current rel to the final
rel. I haven't managed to come up with a test case that exposes it,
though, since simple examples like the one above get converted into a
JOIN, so we're not in grouping_planner for a subquery. Making the
subquery above correlated does get us to that point, but the subquery
isn't currently marked as parallel safe for other reasons (because it
has params), so that's not a useful test. I'm not sure whether there are
cases that we can't convert to a join but that also don't involve
params; I haven't thought about it a lot, though.
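
For reference, the effect of that guard can be sketched as a small standalone model. This is an illustration under assumptions, not the planner code: RelModel, MAX_PATHS, and copy_partial_paths are hypothetical names, paths are reduced to integer ids, and the limit_needed result is passed in as a plain flag.

```c
#include <stdbool.h>

#define MAX_PATHS 8

/* Hypothetical, simplified stand-in for a planner rel. */
typedef struct RelModel
{
    bool consider_parallel;
    int  npartial;                  /* number of partial paths held */
    int  partial_paths[MAX_PATHS];  /* opaque path ids */
} RelModel;

/*
 * Copy partial paths from current_rel to final_rel only when the final
 * rel allows parallelism, we are planning a subquery (query_level > 1),
 * and no LIMIT/OFFSET is needed. Returns the number of paths copied.
 */
int
copy_partial_paths(RelModel *final_rel, const RelModel *current_rel,
                   int query_level, bool limit_needed)
{
    int copied = 0;

    if (final_rel->consider_parallel && query_level > 1 && !limit_needed)
    {
        for (int i = 0;
             i < current_rel->npartial && final_rel->npartial < MAX_PATHS;
             i++)
        {
            final_rel->partial_paths[final_rel->npartial++] =
                current_rel->partial_paths[i];
            copied++;
        }
    }
    return copied;
}
```

In this model, a subquery with a limit never hands its partial paths to the final rel, which is the restriction a LATERAL exception would presumably also have to relax here.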

James


