Обсуждение: Segmentation fault with parallelism PG 10.4

Поиск
Список
Период
Сортировка

Segmentation fault with parallelism PG 10.4

От
"Heinemann, Manfred (IMS)"
Дата:

Hi All,

 

We recently upgraded to Postgresql 10.4 from 9.6.7.

We allow some user querying against the database and a user wrote sql that would repeatedly cause a segmentation fault. We can’t share the data but I will show what I can.

 

Here is an example from the postgres logs with the sql simplified but still throwing the error. It is not very well written but we can’t have a bad query bring down the instance.

 

<2018-06-12 08:59:28 PDT [9109]: [38-1] db=, user=, host=> LOG:  server process (PID 11471) was terminated by signal 11: Segmentation fault

<2018-06-12 08:59:28 PDT [9109]: [39-1] db=, user=, host=> DETAIL:  Failed process was running: SELECT p.vital_status, c.addr_at_dx_street_name

        FROM ctc c

        INNER JOIN patient p ON c.pat_id = p.pat_id

        LEFT JOIN ctc_registry cr ON c.ctc_id = cr.ctc_id

        LEFT JOIN facility_admission fa ON fa.ctc_id = cr.ctc_id AND fa.fac_id = cr.follow_up_hospital

        LEFT JOIN patient_reference_id pr ON pr.pat_id = p.pat_id AND pr.fac_id = cr.follow_up_hospital

        WHERE

        c.primary_site between 'C500' and 'C509'

        AND c.date_of_diagnosis_mm between case when c.date_of_diagnosis_yyyy = 2006 then 7 else 1 end and case when c.date_of_diagnosis_yyyy = 2015 then 6 else 12 end

        AND (c.sequence_number = '00'

        OR (c.sequence_number = '02'

        AND p.pat_id in (select pat_id

        from ctc

        where behavior_icdo3 = 2

        and sequence_number = '01'

        and deleted = 0

        and pat_id in (select pat_id

        from ctc

        where deleted = 0)))

        )

        ;

 

We changed max_parallel_workers_per_gather to 0 from the default of 2 and the sql runs fine. I attached the explain plans with parallel turned on and off to see if that helps.

We would obviously like to be able to run the system with parallel turned on.

 

Thanks,

Manfred

 




Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are not the addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender of the error.
Вложения

Re: Segmentation fault with parallelism PG 10.4

От
Tom Lane
Дата:
"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
> We allow some user querying against the database and a user wrote sql that would repeatedly cause a segmentation
fault.We can't share the data but I will show what I can. 

The info you've provided is not of much help, but if you could show
a stack trace from the point of the segfault, that might narrow
things down.

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane


RE: Segmentation fault with parallelism PG 10.4

От
"Heinemann, Manfred (IMS)"
Дата:
This is what I see in a stack trace

Program received signal SIGSEGV, Segmentation fault.
0x000000000084dcf3 in pglz_decompress ()

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, June 12, 2018 1:45 PM
To: Heinemann, Manfred (IMS) <HeinemannM@imsweb.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: Segmentation fault with parallelism PG 10.4

"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
> We allow some user querying against the database and a user wrote sql that would repeatedly cause a segmentation
fault.We can't share the data but I will show what I can. 

The info you've provided is not of much help, but if you could show a stack trace from the point of the segfault, that
mightnarrow things down. 

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane

________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 


RE: Segmentation fault with parallelism PG 10.4

От
"Heinemann, Manfred (IMS)"
Дата:
Is this stack trace helpful at all?
Is there anything else I could do to investigate?

Thanks,
Manfred

>This is what I see in a stack trace
>
>Program received signal SIGSEGV, Segmentation fault.
>0x000000000084dcf3 in pglz_decompress ()
>
>>-----Original Message-----
>>From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>>Sent: Tuesday, June 12, 2018 1:45 PM
>>To: Heinemann, Manfred (IMS) <HeinemannM@imsweb.com>
>>Cc: pgsql-admin@lists.postgresql.org
>>Subject: Re: Segmentation fault with parallelism PG 10.4
>>
>>"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
>>> We allow some user querying against the database and a user wrote sql that would repeatedly cause a segmentation
fault.We can't share the data but I will show what I can. 
>>
>>The info you've provided is not of much help, but if you could show a stack trace from the point of the segfault,
thatmight narrow things down. 
>>
>>https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
>>
>>regards, tom lane
________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 


________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 


Re: Segmentation fault with parallelism PG 10.4

От
Tom Lane
Дата:
"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
> Is this stack trace helpful at all?

No, because it isn't actually a stack trace, just the top frame.  See

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane


RE: Segmentation fault with parallelism PG 10.4

От
"Heinemann, Manfred (IMS)"
Дата:
Here is what I get for the stack trace, is it more helpful?

Program received signal SIGSEGV, Segmentation fault.
0x000000000084dcb0 in pglz_decompress ()
#0  0x000000000084dcb0 in pglz_decompress ()
#1  0x00000000004b8d2f in toast_decompress_datum ()
#2  0x0000000000793ce0 in numeric_eq ()
#3  0x00000000005e68eb in ExecInterpExpr ()
#4  0x00000000005ef589 in ExecScan ()
#5  0x0000000000608784 in ExecNestLoop ()
#6  0x00000000005fc6b5 in ExecGather ()
#7  0x00000000005feb35 in ExecHashJoin ()
#8  0x00000000006086d4 in ExecNestLoop ()
#9  0x00000000005e9b02 in standard_ExecutorRun ()
#10 0x000000000071305b in PortalRunSelect ()
#11 0x00000000007142d1 in PortalRun ()
#12 0x000000000071038b in exec_simple_query ()
#13 0x0000000000711589 in PostgresMain ()
#14 0x0000000000476399 in ServerLoop ()
#15 0x00000000006a9acc in PostmasterMain ()
#16 0x000000000062b58c in main ()

Thanks,
Manfred

>"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
>> Is this stack trace helpful at all?
>
>No, because it isn't actually a stack trace, just the top frame.  See
>
>https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
>
>regards, tom lane

________________________________

Information in this e-mail may be confidential. It is intended only for the addressee(s) identified above. If you are
notthe addressee(s), or an employee or agent of the addressee(s), please note that any dissemination, distribution, or
copyingof this communication is strictly prohibited. If you have received this e-mail in error, please notify the
senderof the error. 


Re: Segmentation fault with parallelism PG 10.4

От
Tom Lane
Дата:
"Heinemann, Manfred (IMS)" <HeinemannM@imsweb.com> writes:
> Here is what I get for the stack trace, is it more helpful?
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000084dcb0 in pglz_decompress ()
> #0  0x000000000084dcb0 in pglz_decompress ()
> #1  0x00000000004b8d2f in toast_decompress_datum ()
> #2  0x0000000000793ce0 in numeric_eq ()
> #3  0x00000000005e68eb in ExecInterpExpr ()
> #4  0x00000000005ef589 in ExecScan ()
> #5  0x0000000000608784 in ExecNestLoop ()
> #6  0x00000000005fc6b5 in ExecGather ()
> #7  0x00000000005feb35 in ExecHashJoin ()
> #8  0x00000000006086d4 in ExecNestLoop ()
> #9  0x00000000005e9b02 in standard_ExecutorRun ()
> #10 0x000000000071305b in PortalRunSelect ()
> #11 0x00000000007142d1 in PortalRun ()
> #12 0x000000000071038b in exec_simple_query ()
> #13 0x0000000000711589 in PostgresMain ()
> #14 0x0000000000476399 in ServerLoop ()
> #15 0x00000000006a9acc in PostmasterMain ()
> #16 0x000000000062b58c in main ()

Hmm.  So (1) apparently we have some corrupt data in a numeric column,
and (2) it looks like the crash is happening in a parallel leader
process, not a worker process, which makes it pretty mystifying why
it's connected to parallelism at all.

But we're not going to be able to get far with that amount of info.
Is there any chance of extracting a self-contained test case?

            regards, tom lane