Re: segfault with incremental sort

Поиск

Список

Период

Сортировка

От	James Coleman
Тема	Re: segfault with incremental sort
Дата	25 ноября 2020 г. 05:27:18
Msg-id	CAAaqYe8NtzyMny3bgZqNOrUsFPOVkZ0jv6tQ3GJLFwzVgVkNUw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: segfault with incremental sort (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: segfault with incremental sort (Amit Kapila <amit.kapila16@gmail.com>)
Список	pgsql-bugs

Дерево обсуждения

On Mon, Nov 23, 2020 at 9:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Nov 23, 2020 at 7:23 PM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Mon, Nov 23, 2020 at 7:56 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Nov 21, 2020 at 1:21 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > IIRC, the reason was that for correlated subplans each time we need to
> > > send the param for execution to workers, and for that, we don't have
> > > an implementation yet. Basically, if the param size changes each time
> > > (say for varchar type of params), the amount of memory required would
> > > be different each time. It is not that we can't implement it but I
> > > think we have left it originally because we were not sure of the
> > > number of cases it can benefit and certainly it needs some more work.
> > > I am not following this and related discussions closely but can
> > > explain to me why you think the query/plan you are talking about is
> > > safe with respect to the above hazard?
> >
> > Thanks for the explanation.
> >
> > In this particular case we're not dealing with variable length fields
> > (it's an int), so that particular problem wouldn't inherently apply
> > (though I understand the generalized rule).
> >
> > But I'm not really quite sure how sending params to workers (from the
> > leader I assume) is relevant here. In another thread you can see the
> > full plan [1], but effectively we have:
> >
> > Gather Merge
> >   Sort
> >     Nested Loop
> >       Bitmap Heap Scan
> >       Index Only Scan
> >       Subplan 1
> >       Subplan 2
> >
> > where the two subplans are returning the single result of a correlated
> > subquery (in a SELECT). As I understand it this doesn't require any
> > coordination with/from the leader at all; all of the information in
> > this case should actually be local to the individual worker. Each
> > worker needs to, for each tuple in its index scan, do another index
> > lookup based on that tuple.
> >
>
> Yeah, this doesn't require any communication between leader and worker
> but to enable it for such cases, we need to identify when we have
> correlated subquery where we can make it parallel-safe and then
> probably allow it. IIRC, we only allow cases of un-correlated subplans
> and initplans when those are at the same or a level above the Gather
> (Merge) node.

In principle do you see any reason why given:

select distinct
  unique1,
  (select t.unique1 from tenk1 where tenk1.unique1 = t.unique1)
from tenk1 t, generate_series(1, 1000);

it wouldn't be valid to go from the current (with patch from [1]) plan of:

 Unique
   ->  Sort
         Sort Key: t.unique1, ((SubPlan 1))
         ->  Gather
               Workers Planned: 2
               ->  Nested Loop
                     ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t
                     ->  Function Scan on generate_series
               SubPlan 1
                 ->  Index Only Scan using tenk1_unique1 on tenk1
                       Index Cond: (unique1 = t.unique1)

to this plan?

 Unique
   ->  Gather Merge
         Workers Planned: 2
         ->  Sort
               Sort Key: t.unique1, ((SubPlan 1))
               ->  Nested Loop
                     ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t
                     ->  Function Scan on generate_series
                     SubPlan 1
                       ->  Index Only Scan using tenk1_unique1 on tenk1
                             Index Cond: (unique1 = t.unique1)

My intuition is that it would be safe (not just in the parallel sense
but also in the sense of safety for pushing down expression evaluation
into the target list to push the sort down), but I want to make sure
that's correct before proceeding.

But I also have a bigger question:

I've been stepping through this in the debugger, and have noticed that
the only reason we aren't selecting the second plan (again, with the
fix from [1]) is that the plan for "Index Only Scan using
tenk1_unique1 on tenk1" when passed into build_subplan has
"plan->parallel_safe = false". Earlier "consider_parallel = false" has
been set on the path by set_rel_consider_parallel because
is_parallel_safe doesn't find any safe_param_ids, and thus the
correlated subquery having a PARAM_EXEC param fails the safe_param_ids
member check in max_parallel_hazard_walker since the param isn't from
this level.

So far that's correct, but then when we come back around to checking
parallel safety of the expression for a parallel sort we call
is_parallel_safe, and the max_parallel_hazard_walker SubPlan case
checks subplan->parallel_safe, finds it false
("max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)" returns
true), and thus doesn't proceed to actually checking the subplan's
testexpr or args.

That seems a bit of miss to me, because while such a subplan is
parallel unsafe in isolation, it is not in this context. That is, if
we re-run the checks on testexpr and args in this context, then we'll
not find anything parallel unsafe about them (the param can safely be
found in the current context), so we'll be free to execute the subplan
in workers.

> Now, I think it is quite possible that in PG-13 we have
> unintentionally allowed plans containing correlated sub-queries but
> missed to take care of it at all required places.

Yes, we're discussing a fix in [1] for PG13's missing checks for
parallel safety here.

James

1: https://www.postgresql.org/message-id/CAAaqYe8cK3g5CfLC4w7bs%3DhC0mSksZC%3DH5M8LSchj5e5OxpTAg%40mail.gmail.com

В списке pgsql-bugs по дате отправления:

Предыдущее

От: Tom Lane
Дата: 24 ноября 2020 г., 21:59:02
Сообщение: Re: BUG #16739: Temporary files not deleting from data folder on disk

Следующее

От: Manoj Kumar
Дата: 25 ноября 2020 г., 07:39:37
Сообщение: Re: BUG #16739: Temporary files not deleting from data folder on disk

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: segfault with incremental sort

Предыдущее

Следующее