Re: Cancelling parallel query leads to segfault

From Andres Freund
Subject Re: Cancelling parallel query leads to segfault
Msg-id 20180214185651.277g7o3xdzys624d@alap3.anarazel.de
In response to Re: Cancelling parallel query leads to segfault  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: Cancelling parallel query leads to segfault
List pgsql-hackers
On 2018-02-12 15:43:49 -0500, Peter Eisentraut wrote:
> On 2/6/18 12:06, Andres Freund wrote:
> > On 2018-02-06 12:01:08 -0500, Peter Eisentraut wrote:
> >> On 2/1/18 20:35, Andres Freund wrote:
> >>> On February 1, 2018 11:13:06 PM GMT+01:00, Peter Eisentraut
> >>> <peter.eisentraut@2ndquadrant.com> wrote:
> >>>> Here is a patch to implement that idea. Do you have a way to test it
> >>>> repeatedly, or do you just randomly cancel queries?
> >>>
> >>> For me cancelling the long running parallel queries I tried reliably
> >>> triggers the issue. I encountered it while cancelling tpch q1 during JIT
> >>> work.
> >>
> >> Why does canceling a query result in elog(FATAL)?  It should just be
> >> elog(ERROR), which wouldn't trigger this issue.
> > 
> > The workers are shut down.
> 
> I have used the setup mentioned in
> <https://www.postgresql.org/message-id/6a909374-2602-7136-8c70-397330a418f3%402ndquadrant.com>
> to reproduce this, without success.  I have tried statement_timeout and
> manual cancels.  Any other ideas?
> 
> I don't doubt that the issue exists, but it would be nice to be able to
> reproduce it.

With your example I can reliably trigger the issue if I shut down the
server while the query is running:

^C2018-02-14 10:54:06.786 PST [22261][] LOG:  received fast shutdown request
2018-02-14 10:54:06.786 PST [22261][] LOG:  aborting any active transactions
2018-02-14 10:54:06.786 PST [22275][4/3] FATAL:  terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22275][4/3] STATEMENT:  select from t1 where a = 55;
2018-02-14 10:54:06.786 PST [22274][5/3] FATAL:  terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22274][5/3] STATEMENT:  select from t1 where a = 55;
2018-02-14 10:54:06.786 PST [22271][3/2] FATAL:  terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22271][3/2] STATEMENT:  select from t1 where a = 55;
2018-02-14 10:54:06.787 PST [22261][] LOG:  background worker "logical replication launcher" (PID 22268) exited with exit code 1
2018-02-14 10:54:06.787 PST [22261][] LOG:  background worker "parallel worker" (PID 22274) exited with exit code 1
2018-02-14 10:54:06.787 PST [22261][] LOG:  background worker "parallel worker" (PID 22275) exited with exit code 1
2018-02-14 10:54:06.788 PST [22261][] LOG:  server process (PID 22271) was terminated by signal 11: Segmentation fault
2018-02-14 10:54:06.788 PST [22261][] DETAIL:  Failed process was running: select from t1 where a = 55;
2018-02-14 10:54:06.788 PST [22261][] LOG:  terminating any other active server processes
2018-02-14 10:54:06.789 PST [22285][] FATAL:  the database system is shutting down
2018-02-14 10:54:06.789 PST [22261][] LOG:  abnormal database system shutdown
2018-02-14 10:54:06.790 PST [22261][] LOG:  database system is shut down

but only if I don't use EXPLAIN ANALYZE. Not quite sure what that is
about.

Your patch appears to fix the issue.

Greetings,

Andres Freund

