Re: SegFault on 9.6.14

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: SegFault on 9.6.14
Дата
Msg-id CAA4eK1LE=Nnrc007sS00_nNt0eYdOejmikKcZ7S92mXqFi7n=A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: SegFault on 9.6.14  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Wed, Aug 7, 2019 at 3:15 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Jul 31, 2019 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 31, 2019 at 12:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > The other idea we had discussed which comes closer to adopting Tom's
> > position was that during ExecShutdownNode, we just destroy parallel
> > workers, collect instrumentation data and don't destroy the parallel
> > context.  The parallel context could be destroyed in ExecEndNode
> > (ExecEndGather(Merge)) code path.  The problem with this idea is that
> > ExitParallelMode doesn't expect parallel context to be active.  Now,
> > we can either change the location of Exit/EnterParallelMode or relax
> > that restriction.  As mentioned above that restriction appears good to
> > me, so I am not in favor of changing it unless we have some other
> > solid way to install it.   I am not sure if this idea is better than
> > other approaches we are discussing.
> >
> >
> I have made a patch based on the above lines.
> I have tested the scenarios which Thomas had shared in the earlier
> mail and few more tests based on Thomas's tests.
> I'm not sure if we will be going ahead with this solution or not.
> Let me know your opinion on the same.
> If you feel this approach is ok, we can add few of this tests into pg tests.
>

This patch is on the lines of what I had in mind, but I see some
problems in this which are explained below.  The other approach to fix
this was to move Enter/ExitParallelMode to the outer layer.  For ex.,
can we enter in parallel mode during InitPlan and exit from it during
ExecEndPlan?  That might not be good to backpatch, but it might turn
out to be more robust than the current approach.

Few comments on your patch:
1.
@@ -569,13 +569,6 @@ ExecParallelCleanup(ParallelExecutorInfo *pei)
  if (pei->instrumentation)
  ExecParallelRetrieveInstrumentation(pei->planstate,
  pei->instrumentation);
-
- if (pei->pcxt != NULL)
- {
- DestroyParallelContext(pei->pcxt);
- pei->pcxt = NULL;
- }
- pfree(pei);
 }

Here, you have just removed parallel context-free, but I think we
can't detach from parallel context area here as well, otherwise, it
will create similar problems in other cases. Note, that we create the
area only in ExecInitParallelPlan and just reuse it in
ExecParallelReinitialize.  So, if we allow getting it destroyed in
ExecParallelCleanup (which is called via ExecShutdownNode), we won't
have access to it in rescan code path.  IT is better if we have a test
for the same as well.  I think we should only retrieve the
instrumentation information here.  Also, if we do that, then we might
also want to change function name and comments atop of this function.

2.
ExecEndGather(GatherState *node)
 {
+ ParallelExecutorInfo *pei = node->pei;
  ExecShutdownGather(node);
+
+ if (pei != NULL)
+ {
+ if (pei->pcxt != NULL)
+ {
+ DestroyParallelContext(pei->pcxt);
+ pei->pcxt = NULL;
+ }
+
+ pfree(pei);
+ node->pei = NULL;
+ }

I feel that it is better you move a collection of instrumentation
information from ExecParallelCleanup to a separate function and then
use ExecParallelCleanup here.

3.
extern bool IsInParallelMode(void);
+extern bool getParallelModeLevel(void);

To be consistent, it better to name the function as GetParallelModeLevel.

4.
@@ -1461,6 +1461,8 @@ ExecEndPlan(PlanState *planstate, EState *estate)
  ExecEndNode(subplanstate);
  }

+ if (estate->es_use_parallel_mode)
+ Assert (getParallelModeLevel() > 0 || !ParallelContextActive());

Add some comments here to explain about this Assert.  I am not sure if
this is correct because it won't fail even if the parallel mode is
non-zero and there is no parallel
context.  At this stage, we must have exited the parallel mode.

5.
explain analyze
   select count(*) from join_foo
    left join (select b1.id from join_bar b1
 limit 1000) ss

All the tests in your test file use left join to reproduce the issue,
but I think it should be reproducible with inner join as well.  This
comment is not that your test case is wrong, but I want to see if we
can further simplify it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Proposal for Signal Detection Refactoring
Следующее
От: Andres Freund
Дата:
Сообщение: Re: POC: converting Lists into arrays