RE: Troubleshooting a segfault and instance crash

From: Jan Bilek
Subject: RE: Troubleshooting a segfault and instance crash
Msg-id: SYXPR01MB0701A7C2340B64AF661408C0B5DF0@SYXPR01MB0701.ausprd01.prod.outlook.com
In reply to: Re: Troubleshooting a segfault and instance crash (Pavel Stehule <pavel.stehule@gmail.com>)
Replies: Re: Troubleshooting a segfault and instance crash
List: pgsql-general
Hi Blair, Pavel,

We are using the procedure described in https://access.redhat.com/solutions/4896 to automate the collection of crash details on our production RHEL 7 systems.

Perhaps something like this can help on your side.
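Whatever automation is used, it helps to first confirm that the kernel is allowed to write a core at all. The Python sketch below is an illustration, not Red Hat's procedure: on Linux it reports the soft and hard RLIMIT_CORE of the process running it and the kernel's core_pattern. The limit is per process and inherited, so the result only describes the postmaster if the check runs in the same environment that starts the server; PostgreSQL's `pg_ctl start -c` lifts the soft core-file limit for the server it starts.

```python
import resource
from pathlib import Path


def core_dump_status(pattern_path="/proc/sys/kernel/core_pattern"):
    """Report whether the *current* process could write a core file.

    Only this process's RLIMIT_CORE and the kernel core_pattern are inspected;
    the limits of an already-running postmaster are not read.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = Path(pattern_path).read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not readable here
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        # A soft limit of 0 suppresses core files entirely.
        "cores_enabled": soft != 0,
        # A leading '|' means the kernel pipes cores to a helper program
        # (for example abrt on RHEL) instead of writing a file itself.
        "piped_to_handler": bool(pattern) and pattern.startswith("|"),
        "core_pattern": pattern,
    }


def raise_core_limit():
    """Lift the soft core-size limit to the hard limit for this process and its children."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(core_dump_status())
```

Raising the limit only affects processes started afterwards from that environment, which is why it belongs in whatever starts the server rather than in an ad-hoc shell.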

Kind Regards,
Jan

On 2018-03-09 04:35:05+10:00 Pavel Stehule wrote:
 
 
2018-03-08 19:16 GMT+01:00 Blair Boadway <bboadway@abebooks.com>:

Hi Pavel,  

I don’t have a core yet; the only way I could get one now is to intentionally crash the prod system a couple of times. I haven’t resorted to that yet.

It is hard to help without a backtrace, and for that you need a core dump.
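Once a core exists, the backtrace can be pulled from it non-interactively with gdb's batch mode. A minimal Python sketch follows; the binary and core paths in the example are hypothetical (the PGDG RPM layout and an example core name), and the binary must be the exact postgres executable that crashed. The output is only really readable with debug symbols matching the installed PostgreSQL build, typically from debuginfo packages on RHEL.

```python
import shlex
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a non-interactive gdb invocation that prints a full backtrace.

    `binary` must be the postgres executable that crashed and `core` the
    core file it left behind; both paths are supplied by the caller.
    """
    return ["gdb", "--batch", "--quiet", "-ex", "bt full", binary, core]


def get_backtrace(binary, core, timeout=300):
    """Run gdb on the core file and return the backtrace text it prints."""
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if gdb exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths: PGDG-style binary location and an example core file name.
    cmd = gdb_backtrace_cmd("/usr/pgsql-9.6/bin/postgres", "/var/crash/core.29351")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command instead of running it makes it easy to paste into a shell on the crashed host.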
 
 

Interesting that you mentioned pgaudit. It is installed on this system because it is part of our standard installation, but on this particular system we haven’t yet needed audits, so the audit role is ‘empty’. (On a different system with the same installation and heavy use of auditing, we’ve seen no segfaults.)

The other extensions are simple, unrelated to DDL, or well known, so pgaudit is the best candidate. But the error could be anywhere.
 
Regards
 
Pavel

On this system:

pgaudit.role = 'auditor'

pgaudit.log_parameter = off

pgaudit.log_catalog = off

pgaudit.log_statement_once = on

pgaudit.log_level = log  

select * from information_schema.role_table_grants where grantee = 'auditor';

(0 rows)  

thanks, Blair  

From: Pavel Stehule <pavel.stehule@gmail.com>
Date: Thursday, March 8, 2018 at 9:49 AM
To: Blair Boadway <bboadway@abebooks.com>
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Subject: Re: Troubleshooting a segfault and instance crash

Hi

 

2018-03-08 18:40 GMT+01:00 Blair Boadway <bboadway@abebooks.com>:

Hello,  

We’re seeing an occasional segfault on a particular database:

Mar  7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip 000000302f32868a sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]

Mar  7 14:46:35 pgprod2 POSTGRES[21262]: [5] user=,db=,app=client= LOG:  server process (PID 29351) was terminated by signal 11: Segmentation fault  
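The kernel line can be decoded a little further without a core. The Python sketch below assumes the x86 page-fault error-code layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and the message format shown above; the regular expression is tuned to this one line, and other kernel versions may print extra fields.

```python
import re

# Matches the x86 kernel's "segfault at ... ip ... sp ... error ... in obj[base+size]" message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one kernel segfault line; return None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr, ip, base, size = (int(m.group(k), 16) for k in ("addr", "ip", "base", "size"))
    err = int(m.group("err"), 16)  # the kernel prints the page-fault error code in hex
    return {
        "fault_addr": addr,
        "object": m.group("obj"),
        # Instruction pointer relative to where the object was mapped.
        "offset_in_object": ip - base,
        "ip_inside_object": base <= ip < base + size,
        # x86 page-fault error-code bits.
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
    }


line = ("Mar  7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip 000000302f32868a "
        "sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]")
info = decode_segfault(line)
print(hex(info["offset_in_object"]), info["access"], info["mode"], info["cause"])
# prints: 0x12868a read user page not present
```

For this line it reports a user-mode read of address 0 at offset 0x12868a into libc, which is consistent with a NULL pointer being handed to a libc routine by its caller. That is an inference from the log alone; only a backtrace from a core shows which postgres or extension function made the call.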

It crashes the database, though it starts again on its own without any apparent issues. This has happened 3 times in 2 months, and each time the segfault error and memory address are the same. We’ve only seen it on one database, though we’ve seen it on both hosts of a primary/standby setup: we switched the primary over to the other host and got a segfault there, which seems to rule out a hardware issue. Oddly, the database has no issues with normal DML workloads (it is a moderately busy prod OLTP system), but the segfault has happened very shortly after DDL changes were made. Most recently it happened while running a series of grants for new db users we were deploying (i.e., running a SQL script from psql on the primary host):

grant usage on schema app to app_user1;

grant usage on schema app to app_user2;

...  
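Since the crash has not been reproduced on a test system, one hypothetical approach is to replay the same kind of grant script there with the same extensions preloaded (pgaudit in particular). A minimal Python sketch that regenerates such a script; the role names are placeholders following the pattern above and must already exist on the test instance. Running the output with `psql -e -v ON_ERROR_STOP=1 -f grants.sql` echoes each statement before sending it and stops at the first error, so if the backend crashes, the last echoed GRANT is most likely the one that triggered it.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap it in double quotes and double any embedded '"'.

    Quoted identifiers are case-sensitive, so pass names exactly as stored
    (all lower case for names created unquoted, such as app_user1).
    """
    return '"' + name.replace('"', '""') + '"'


def grant_usage_script(schema, roles):
    """Return a psql script with one GRANT USAGE statement per role."""
    lines = [f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {quote_ident(r)};" for r in roles]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical role names following the pattern in the post.
    print(grant_usage_script("app", [f"app_user{i}" for i in range(1, 4)]), end="")
```

Quoting every identifier keeps the script safe for unusual role names without changing the meaning for ordinary lower-case ones.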

Our setup is:

RHEL 6.9  - 2.6.32-696.16.1.el6.x86_64

PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit

Extensions - pg_cron,repmgr_funcs,pgaudit,pg_stat_statements,pg_hint_plan,pglogical  

So far we can’t reproduce it on a test system. We have just added some OS configuration to collect a core dump but haven’t captured one yet. There isn’t any particular config change or extension that we can link to the problem; this system has run for months without problems since the last config changes. I’d appreciate any ideas.

Can you get a core dump? Maybe it is a pgaudit bug? It is a complex extension.

Regards

Pavel

 

Regards,

Blair
