Discussion: Segmentation Fault

Segmentation Fault

From
Benson Jin
Date:
Hi All,

We are having a problem with our streaming replication read-only node. It has crashed a few times for a couple of different reasons, mostly "segmentation fault". The latest logs are listed below:

2012-05-30 23:56:37.385 UTC::: LOG:  server process (PID 19476) was terminated by signal 11: Segmentation fault
2012-05-30 23:56:37.385 UTC::: LOG:  terminating any other active server processes
2012-05-30 23:56:37.385 UTC:10.43.6.61:webmaster:panorama WARNING:  terminating connection because of crash of another server process
2012-05-30 23:56:37.385 UTC:10.43.6.61:webmaster:panorama DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-05-30 23:56:37.385 UTC:10.43.6.61:webmaster:panorama HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2012-05-30 23:56:37.385 UTC:10.43.6.81:webmaster:panorama WARNING:  terminating connection because of crash of another server process
2012-05-30 23:56:37.385 UTC:10.43.6.81:webmaster:panorama DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-05-30 23:56:37.385 UTC:10.43.6.81:webmaster:panorama HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2012-05-30 23:56:37.385 UTC:10.43.6.81:webmaster:panorama WARNING:  terminating connection because of crash of another server process
2012-05-30 23:56:37.385 UTC:10.43.6.81:webmaster:panorama DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-05-30 23:56:37.385 UTC:10.43.6.81:webmaster:panorama HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2012-05-30 23:56:37.575 UTC:10.43.6.81:webmaster:panorama FATAL:  the database system is in recovery mode
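
When several instances crash on different nights, it can help to pull just the crash lines out of each log so the timestamps can be compared with the nightly purge and vacuum window. A minimal sketch, assuming the log line format shown above; crash_lines is a made-up helper name, and the sample input is the first line of the log in this post:

```shell
#!/bin/sh
# Sketch: extract timestamp, PID and signal number from backend crash lines.
# Assumes the log_line_prefix used above, where each line starts with
# "YYYY-MM-DD HH:MM:SS.mmm".

crash_lines() {
    # Reads a PostgreSQL log on stdin; prints "<timestamp> pid=<PID> signal=<N>"
    sed -n 's/^\([0-9-]* [0-9:.]*\) .*server process (PID \([0-9]*\)) was terminated by signal \([0-9]*\).*/\1 pid=\2 signal=\3/p'
}

crash_lines <<'EOF'
2012-05-30 23:56:37.385 UTC::: LOG:  server process (PID 19476) was terminated by signal 11: Segmentation fault
2012-05-30 23:56:37.385 UTC::: LOG:  terminating any other active server processes
EOF
```

In real use the function would be fed the server log files of each instance instead of the inline sample.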


Our setup:
2x physical servers - Dell PE R815, 64GB ECC RAM, 2 CPUs (12 cores each), storing pgsql data on SAN-backed volumes.
CentOS 5.6
PostgreSQL 9.0.8, compiled *without* int64 datetime.
Both servers are identically configured (or at least as closely as we could ensure).
One is the master, the other is a streaming read-only node.
The master runs two instances of PostgreSQL, while the slave runs 5 instances of PostgreSQL. 2 of the 5 are streaming replicas of the master; the other 3 are streaming replicas of other DB nodes. Those 2 instances serve clients as read-only. The master node has never crashed so far. However, the 2 instances on the slave have crashed 3 times by now: once on one read-only instance and twice on the other. The log above was generated by one of those instances.

All three crashes happened while the database was vacuuming. We automatically purge some data every night and run VACUUM ANALYZE right after that. Our CPU load is generally around the 40%-60% mark.

I have run a complete set of hardware diagnostics on the slave, with no faulty hardware detected. Can someone kindly shed some light on this? I am not sure where to look at this point.


Cheers,  
 

Bo Jin 

Operating/IT Manager
Troo Corporation [www.troo.com]
43 Auriga Drive, Suite 102, Ottawa, ON K2E 7Y8
Ph: +1 877.702.8766 x156
Fax: +1 855.726.8766

Re: Segmentation Fault

From
Craig Ringer
Date:
On 06/11/2012 11:34 AM, Benson Jin wrote:
> Hi All,
>
> We are having a problem with our streaming replication read only node. It has crashed a few times with a couple of different reasons, mostly "segmentation fault".

Any chance of examining a core dump and getting a stack trace?

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
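
For reference, a rough sketch of getting a backtrace out of a core file once one exists. The binary path, the core file path and the backtrace.txt output name are placeholders, not values from this thread. Core dumps also have to be enabled (for example with ulimit -c unlimited) in the environment that starts the postmaster before the next crash happens.

```shell
#!/bin/sh
# Sketch: dump a full backtrace from a core file using gdb in batch mode.
# PG_BIN and CORE_FILE are placeholders; point them at the real postgres
# executable and at the core file left behind by the crashed backend.
PG_BIN="${PG_BIN:-/usr/local/pgsql/bin/postgres}"
CORE_FILE="${CORE_FILE:-/path/to/core}"

if command -v gdb >/dev/null 2>&1 && [ -x "$PG_BIN" ] && [ -r "$CORE_FILE" ]; then
    # --batch exits after running the -ex commands; "bt full" also prints locals.
    gdb --batch -ex 'bt full' "$PG_BIN" "$CORE_FILE" > backtrace.txt 2>&1
    status="backtrace written to backtrace.txt"
else
    status="skipped: need gdb, the postgres binary and a readable core file"
fi
echo "$status"
```

The backtrace is only as good as the symbols in the binary that produced the core, so it should be read against the exact executable that crashed.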

--
Craig Ringer

Re: Segmentation Fault

From
"Dickson S. Guedes"
Date:
2012/6/11 Benson Jin <benson.jin@troo.com>:
> Hi All,
>
> We are having a problem with our streaming replication read only node. It
> has crashed a few times with a couple of different reasons, mostly
> "segmentation fault". The latest logs are listed below:
>
> 2012-05-30 23:56:37.385 UTC::: LOG:  server process (PID 19476) was
> terminated by signal 11: Segmentation fault
> 2012-05-30 23:56:37.385 UTC::: LOG:  terminating any other active server
> processes
> 2012-05-30 23:56:37.385 UTC:10.43.6.61:webmaster:panorama WARNING:
>  terminating connection because of crash of another server process


[... cut ...]


> Our setup:
> 2x physical server - Dell PE R815, 64GB ECC RAM, 2 CPUs (12 cores each),
>  storing pgsql data on SAN backed volumes.
> CentOS 5.6
> PostgreSQL 9.0.8, compiled *without* int64 datetime.
> Both servers are identically configured (or at least as much as we could
> ensure)


Did you compile it from scratch, e.g. make clean && make && make install?


[]s
--
Dickson S. Guedes
mail/xmpp: guedes@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

Re: Segmentation Fault

From
Benson Jin
Date:
I will try to produce one and submit it here.


From: "Craig Ringer" <ringerc@ringerc.id.au>
To: "Benson Jin" <benson.jin@troo.com>
Cc: pgsql-general@postgresql.org
Sent: Monday, June 11, 2012 3:49:41 AM
Subject: Re: [GENERAL] Segmentation Fault

On 06/11/2012 11:34 AM, Benson Jin wrote:
Hi All,

We are having a problem with our streaming replication read only node. It has crashed a few times with a couple of different reasons, mostly "segmentation fault".

Any chance of examining a core dump and getting a stack trace?

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Craig Ringer

Re: Segmentation Fault

From
Benson Jin
Date:
Hi All,

A silly question... how do I install external symbols for PostgreSQL if I compiled it myself previously? Do I recompile it with the --enable-debug option?

Cheers,

Benson



Re: Segmentation Fault

From
Craig Ringer
Date:
On 06/12/2012 03:15 AM, Benson Jin wrote:
> Hi All,
>
> A silly question... how do I install external symbols for PostgreSQL if I compiled it myself previously? Do I recompile it with the --enable-debug option?

If you didn't strip the executable then the cores produced even without --enable-debug will be somewhat useful, but debug info is ideal. You could try the existing executables first, since that gives you a chance to make sure you can at least get and analyse a core file before spending time recompiling and reinstalling Pg.
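
One quick way to see whether the installed executable was stripped is file(1), which reports "stripped" or "not stripped" for ELF binaries. A minimal sketch; the path is a placeholder and symbol_status is a made-up helper name:

```shell
#!/bin/sh
# Sketch: report whether an ELF binary still carries its symbol table,
# based on the wording of file(1) output.

symbol_status() {
    # $1: path to a binary, e.g. the installed postgres executable
    info=$(file -L "$1" 2>/dev/null)
    case "$info" in
        *"not stripped"*) echo "not stripped" ;;
        *stripped*)       echo "stripped" ;;
        *)                echo "unknown" ;;   # not ELF, missing file, or no file(1)
    esac
}

# Placeholder path; adjust to the real installation prefix.
symbol_status "${PG_BIN:-/usr/local/pgsql/bin/postgres}"
```

"not stripped" only says the symbol table is present; full debug info still needs an --enable-debug build.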

Recompiling with --enable-debug is the easy way to get executables with debug info embedded in them. If the installed libraries etc. haven't changed since you compiled your original copy of PostgreSQL, and if you're compiling the EXACT same source code, you can debug the core against these executables without replacing the ones you're actually running. Something like a libc upgrade since the original copies were built might mean that the new debug executables are no longer exactly compatible with the ones your core file came from, though, so the results can't be totally trusted.
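
A sketch of rebuilding with the same options plus --enable-debug. pg_config --configure prints the options the installed build was configured with, so they do not have to be remembered. debug_configure_cmd is a hypothetical helper that only prints the resulting command for review; it does not run configure itself.

```shell
#!/bin/sh
# Sketch only: print a configure command that repeats the original build
# options and adds --enable-debug if it is not already there.

debug_configure_cmd() {
    # $1: output of `pg_config --configure` from the existing installation
    case " $1 " in
        *--enable-debug*) printf './configure %s\n' "$1" ;;
        *)                printf './configure %s --enable-debug\n' "$1" ;;
    esac
}

# Meant to be run from the top of the identical source tree.
if command -v pg_config >/dev/null 2>&1; then
    debug_configure_cmd "$(pg_config --configure)"
else
    echo "pg_config not found in PATH; pass the original options by hand" >&2
fi
```

After checking the printed command, it would be followed by the usual make clean && make && make install, ideally into a separate prefix if the running binaries should stay untouched.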

Ideally you want to run the debug executables and get the core to debug from them. An --enable-debug build on gcc will still be optimised and should have no detectable performance difference. The executables are huge, but the ELF sections containing the debug info never get mapped in, so it doesn't actually matter.

An alternative is to build with --enable-debug and then split the debug info out into external symbol files using "strip --only-keep-debug". There isn't much point unless disk space consumed by executables is a big concern, though.
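
Roughly, splitting the debug info out with GNU binutils could look like the sketch below. split_debug and the RUN variable are made up for illustration, and the path is a placeholder. By default the commands are only printed; calling it with RUN set to an empty string executes them.

```shell
#!/bin/sh
# Sketch: move debug info from a binary into a separate <binary>.debug file.
# Assumes GNU binutils (strip, objcopy).

split_debug() {
    # $1: path to an executable built with --enable-debug
    run=${RUN-echo}   # default: dry run; RUN= (empty) really runs the commands
    # 1. Write a copy that contains only the debug sections.
    $run strip --only-keep-debug -o "$1.debug" "$1"
    # 2. Remove the debug sections from the executable itself.
    $run strip --strip-debug "$1"
    # 3. Record the name of the debug file so gdb can find it.
    $run objcopy --add-gnu-debuglink="$1.debug" "$1"
}

# Placeholder path; prints the three commands without touching anything.
split_debug "${PG_BIN:-/usr/local/pgsql/bin/postgres}"
```

The order matters: the debug-only copy has to be written before the executable is stripped.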

I always use --enable-debug when building Pg. I rarely need the symbols, but it's handy to have them when I do.

--
Craig Ringer