Discussion: The segmentation fault of Postgresql 9.6.24

The segmentation fault of Postgresql 9.6.24

From
Kevin Wang
Date:
Hello hackers,

Our prod databases are still PG 9.6.24. We have one primary plus 3 streaming replicas that have all been working well for a long time. However, when I promoted one standby database to the primary role, we got the below error message from the PG log:
=======================
2023-12-01 06:57:35.541 UTC,,,1553,,6569738f.611,639,,2023-12-01 05:47:59 UTC,,0,LOG,00000,"server process (PID 31839) was terminated by signal 11: Segmentation fault","Failed process was running: UPDATE xxxx SET employee_id = (9489910) WHERE id = (1162120221)",,,,,,,,""



Here is the message from dmesg:
=======================
[ 3676.406247] postgres[27789]: segfault at 0 ip 00005618bf79bfe4 sp 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]
[ 3676.406265] Code: ff ff 48 83 c2 40 ff d0 e8 19 9c ff ff e8 44 0f c4 ff 0f 1f 40 00 f3 0f 1e fa e9 27 be cc ff 0f 1f 80 00 00 00 00 f3 0f 1e fa <0f> b6 17 89 d1 83 e1 03 80 f9 02 74 0f 80 fa 01 74 0a 48 89 f8 c3
[ 3715.937850] postgres[27928]: segfault at 0 ip 00005618bf79bfe4 sp 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]
[ 3715.937858] Code: ff ff 48 83 c2 40 ff d0 e8 19 9c ff ff e8 44 0f c4 ff 0f 1f 40 00 f3 0f 1e fa e9 27 be cc ff 0f 1f 80 00 00 00 00 f3 0f 1e fa <0f> b6 17 89 d1 83 e1 03 80 f9 02 74 0f 80 fa 01 74 0a 48 89 f8 c3
[ 3732.278367] postgres[28212]: segfault at 0 ip 00005618bf79bfe4 sp 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]
[ 3732.278384] Code: ff ff 48 83 c2 40 ff d0 e8 19 9c ff ff e8 44 0f c4 ff 0f 1f 40 00 f3 0f 1e fa e9 27 be cc ff 0f 1f 80 00 00 00 00 f3 0f 1e fa <0f> b6 17 89 d1 83 e1 03 80 f9 02 74 0f 80 fa 01 74 0a 48 89 f8 c3

Error 4 is the error related to unmapping memory. But the database worked well for a long time as a standby database. After it was promoted to the primary role, no memory parameters were changed at all.

Could you give us some hints on how to fix this issue?

Regards,

Kevin

Re: The segmentation fault of Postgresql 9.6.24

From
Tomas Vondra
Date:
On 12/28/23 21:09, Kevin Wang wrote:
> Hello hackers,
> 
> Our prod databases are still PG 9.6.24.  We have one primary plus 3
> stream replications that are all working well for a long time. 

Everything is working well until the day it breaks ...

> However, when I promoted one standby database to the primary role,
> we got the below error message from the PG log:
> =======================
> 2023-12-01 06:57:35.541 UTC,,,1553,,6569738f.611,639,,2023-12-01
> 05:47:59 UTC,,0,LOG,00000,"server process (PID 31839) was terminated by
> signal 11: Segmentation fault","Failed process was running: UPDATE xxxx
> SET employee_id = (9489910) WHERE id = (1162120221)",,,,,,,,""
> 
> 
> 
> Here is the message from dmesg:
> =======================
> [ 3676.406247] postgres[27789]: segfault at 0 ip 00005618bf79bfe4 sp
> 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]
> [ 3676.406265] Code: ff ff 48 83 c2 40 ff d0 e8 19 9c ff ff e8 44 0f c4
> ff 0f 1f 40 00 f3 0f 1e fa e9 27 be cc ff 0f 1f 80 00 00 00 00 f3 0f 1e
> fa <0f> b6 17 89 d1
>  83 e1 03 80 f9 02 74 0f 80 fa 01 74 0a 48 89 f8 c3
> [ 3715.937850] postgres[27928]: segfault at 0 ip 00005618bf79bfe4 sp
> 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]
> [ 3715.937858] Code: ff ff 48 83 c2 40 ff d0 e8 19 9c ff ff e8 44 0f c4
> ff 0f 1f 40 00 f3 0f 1e fa e9 27 be cc ff 0f 1f 80 00 00 00 00 f3 0f 1e
> fa <0f> b6 17 89 d1
>  83 e1 03 80 f9 02 74 0f 80 fa 01 74 0a 48 89 f8 c3
> [ 3732.278367] postgres[28212]: segfault at 0 ip 00005618bf79bfe4 sp
> 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]
> [ 3732.278384] Code: ff ff 48 83 c2 40 ff d0 e8 19 9c ff ff e8 44 0f c4
> ff 0f 1f 40 00 f3 0f 1e fa e9 27 be cc ff 0f 1f 80 00 00 00 00 f3 0f 1e
> fa <0f> b6 17 89 d1
>  83 e1 03 80 f9 02 74 0f 80 fa 01 74 0a 48 89 f8 c3
> 
> Error 4 is the error related to unmapping memory. But the database works
> well for long time as the standby database. After it was promoted to the
> primary role, no memory parameter change at all.
> 

Why do you think "4" means unmapping memory? 4 is the error code for
"user-mode access" (i.e. the invalid memory access did not come from
the kernel).
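
To make the meaning of the "error 4" field concrete, here is a minimal
sketch that decodes the x86 page-fault error code bits as the Linux kernel
reports them in segfault messages. The constant and function names are
illustrative assumptions, not kernel or PostgreSQL APIs; only the bit
meanings are taken from the x86 page-fault error code layout.

```python
# Bits of the x86 page-fault error code, printed by the kernel as "error N".
# Names below are chosen for this sketch only.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: kernel-mode access, 1: user-mode access
PF_RSVD = 1 << 3   # reserved bit was set in a page-table entry
PF_INSTR = 1 << 4  # fault was caused by an instruction fetch


def decode_page_fault_error(code):
    """Return a list of short descriptions for the bits set in `code`."""
    parts = [
        "protection violation" if code & PF_PROT else "page not present",
        "write access" if code & PF_WRITE else "read access",
        "user mode" if code & PF_USER else "kernel mode",
    ]
    if code & PF_RSVD:
        parts.append("reserved bit set")
    if code & PF_INSTR:
        parts.append("instruction fetch")
    return parts


if __name__ == "__main__":
    # The value reported in the dmesg lines quoted in this thread.
    print(decode_page_fault_error(4))
```

Read this way, "error 4" together with "segfault at 0" describes a
user-mode read of an unmapped page at address 0, which is what a NULL
pointer dereference typically looks like. It says nothing about memory
being unmapped by a configuration change.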

> Could you give us some hint where to fix this issue?
> 

This could be pretty much anything, and without seeing where exactly it
fails it's impossible to say. I see you apparently hit the issue
repeatedly, and all the information is *exactly* the same - addresses,
code, etc. Try decoding the addresses with addr2line, or even better get
a proper backtrace - either from a core file, or using gdb.
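
As a rough illustration of the addr2line suggestion, the sketch below
parses one of the dmesg lines quoted above and computes how far the
instruction pointer lies from the start of the reported mapping. The
regular expression, the function name and the /path/to/ placeholder are
assumptions made for this example. Whether the resulting offset can be
passed to addr2line unchanged depends on how the binary was built and
mapped (for instance, it assumes the reported mapping starts at the load
base of a position-independent executable), and useful output also
requires debug symbols for the exact binary that crashed.

```python
import re

# Matches the x86 segfault line format seen in the dmesg output above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def offset_in_mapping(line):
    """Return (object name, ip minus mapping base) for a segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognizable segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    if not base <= ip < base + size:
        raise ValueError("ip lies outside the reported mapping")
    return m.group("obj"), ip - base


if __name__ == "__main__":
    line = ("postgres[27789]: segfault at 0 ip 00005618bf79bfe4 "
            "sp 00007ffcd9a75dc8 error 4 in postgres[5618bf3db000+3f7000]")
    obj, off = offset_in_mapping(line)
    # /path/to/ is a placeholder for the directory of the crashed binary.
    print(f"addr2line -f -e /path/to/{obj} {off:#x}")
```

Because all three crashes report the same ip and the same mapping, they
yield the same offset, so a single lookup covers every occurrence. A
backtrace from a core file or gdb remains the more reliable option, as
it also shows the callers.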


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The segmentation fault of Postgresql 9.6.24

From
Bruce Momjian
Date:
On Thu, Dec 28, 2023 at 11:20:18PM +0100, Tomas Vondra wrote:
> This could be pretty much anything, and without seeing where exactly it
> fails it's impossible to say. I see you apparently hit the issue
> repeatedly, and all the information is *exactly* the same - addresses,
> code, etc. Try decoding the addresses with addr2line, or even better get
> a proper backtrace - either from a core file, or using gdb.

Also, Postgres 9.6 went out of support on November 11, 2021:

    https://www.postgresql.org/support/versioning/

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.