Thread: Server Crash


Server Crash

From
"Hajek, Nick"
Date:
All,
We experienced a crash of a PostgreSQL server which, from the log, appears to have begun with this entry:
 
Log:  background writer process (PID 3457) was terminated by signal 9
 
After the db was restarted, it apparently operated normally for about 15 minutes and then crashed again, with the log recording the same message at the beginning of the second event.  After that crash, I rebooted the server and it has run normally since then, although that has been less than one hour.
 
System details: PostgreSQL 8.2.4, SUSE Linux 10.1 (kernel 2.6.16), HP DL380 with RAID drives.
 
Anyone have any thoughts?
 
thanks,
 
Nick

Re: Server Crash

From
"Scott Marlowe"
Date:
On Tue, Apr 22, 2008 at 8:14 AM, Hajek, Nick <Nick.Hajek@vishay.com> wrote:
>
>
> All,
> We experienced a crash of a PostgreSQL server which, from the log, appears to
> have begun with this entry:
>
> Log:  background writer process (PID 3457) was terminated by signal 9

Kill -9 is the "shoot it in the head" signal.  It is not generated by
PostgreSQL in normal operation.  It can be generated by "pg_ctl -m
immediate stop".  At least I think that's what signal it sends.

Anyway, the most common cause of kill -9s randomly showing up in Linux
is the OOM killer.

It's quite possible you're running your machine out of memory / swap
somehow and Linux is killing the biggest, fattest process it can find,
which is pgsql.

You might wanna run vmstat 1 to see what's happening during these times.

Re: Server Crash

From
Ray Stell
Date:
On Tue, Apr 22, 2008 at 09:13:09AM -0600, Scott Marlowe wrote:
> On Tue, Apr 22, 2008 at 8:14 AM, Hajek, Nick <Nick.Hajek@vishay.com> wrote:
> It's quite possible you're running your machine out of memory / swap
> somehow and linux is killing the biggest, fattest process it can find,
> which is pgsql.

syslog would have something to say about that, also.
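
A minimal sketch of that syslog check, assuming the kernel messages land in a plain-text log such as /var/log/messages. The exact wording of OOM-killer messages varies between kernel versions, so the pattern below is a loose guess rather than an exact format, and `find_oom_lines` is a hypothetical helper name.

```python
import re

# Assumption: kernel wording differs between versions, so match loosely on
# "oom-killer" / "oom_kill" / "OOM killer" and on "Out of memory".
OOM_PATTERN = re.compile(r"oom[-_ ]?kill|out of memory", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the log lines that look like OOM-killer activity."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]
```

To use it, read the log file into a string (for example `open("/var/log/messages", errors="replace").read()`, where the path is an assumption that depends on the distribution) and print whatever `find_oom_lines` returns.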

Re: Server Crash

From
"Hajek, Nick"
Date:

Bingo.  I checked the syslog and found the OOM killer and indications
that the free swap space was zero.  Now I just need to find what's
eating memory.  Thanks for the help.
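
As a rough illustration of the "free swap space was zero" check, the sketch below parses text in the format of /proc/meminfo on Linux. The helper names and the threshold parameter are hypothetical, and values are assumed to be reported in kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_exhausted(meminfo, threshold_kb=0):
    """True when swap is configured and SwapFree is at or below the threshold."""
    return (meminfo.get("SwapTotal", 0) > 0
            and meminfo.get("SwapFree", 0) <= threshold_kb)
```

Calling `swap_exhausted(parse_meminfo(open("/proc/meminfo").read()))` periodically, alongside `vmstat 1`, is one way to catch the machine approaching the state described above.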

Re: Server Crash

From
Tom Lane
Date:
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>> Kill -9 is the "shoot it in the head" signal.  It is not
>> generated by postgresql in normal operation.  It can be
>> generated by "pg_ctl -m immediate stop" .  At least I think
>> that's what signal it sends.

Just for the archives: Postgres never generates kill -9 at all.
(Immediate stop uses SIGQUIT, instead.)  When you see that in
the log, you can be sure it was a manual action or the OOM killer.

            regards, tom lane
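
For reference, a small sketch that pulls the PID and signal number out of a log line like the one quoted at the start of the thread and names the signal. The regular expression is an assumption based only on that one log line, and `describe_termination` is a hypothetical helper.

```python
import re
import signal

# Assumption: matches "... (PID 3457) was terminated by signal 9".
TERMINATED = re.compile(r"\(PID (\d+)\) was terminated by signal (\d+)")


def describe_termination(log_line):
    """Return (pid, signal number, signal name), or None if no match."""
    m = TERMINATED.search(log_line)
    if m is None:
        return None
    pid, signum = int(m.group(1)), int(m.group(2))
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return pid, signum, name
```

On Linux, signal 9 maps to SIGKILL, which matches the "kill -9" reading above.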

Re: Server Crash

From
"Scott Marlowe"
Date:
On Tue, Apr 22, 2008 at 10:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> >> Kill -9 is the "shoot it in the head" signal.  It is not
> >> generated by postgresql in normal operation.  It can be
> >> generated by "pg_ctl -m immediate stop" .  At least I think
> >> that's what signal it sends.
>
> Just for the archives: Postgres never generates kill -9 at all.
> (Immediate stop uses SIGQUIT, instead.)  When you see that in
> the log, you can be sure it was a manual action or the OOM killer.

Thanks.  Just wondering, what's the difference in behavior, from
pgsql's perspective, between sigquit and sigkill?  Is sigkill more
dangerous than sigquit?

Re: Server Crash

From
Tom Lane
Date:
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
> Thanks.  Just wondering, what's the difference in behavior, from
> pgsql's perspective, between sigquit and sigkill?  Is sigkill more
> dangerous than sigquit?

Yes it is, because sigkill can't be trapped --- it causes instant
process death with no chance to clean up.  Not that we have backends
do a lot of cleanup after sigquit either, but at least the option
exists.  The real difference is in the postmaster: kill -9 on the
postmaster is a seriously bad idea, because it gets no chance to shut
down its children.

            regards, tom lane

Re: Server Crash

From
Fabio Pardi
Date:

Hi Anjul,

Please avoid cross-posting to multiple mailing lists.

Also, asking again does not help and is usually counterproductive: members who see a reply may instead spend their effort helping somebody who has not had a reply yet.

Please take good note of it.

About your problem:

I would suggest upgrading to a newer version, since Postgres 9.1 is too old and no longer receives updates.

Besides that, not being a Perl expert, I cannot help with your procedure. In any case, I think the problem might be somewhere else. I suspect that your server crashes and that what you read there is only a consequence, not the cause.

Could you post any relevant log entries from /var/log/messages, and anything else appearing in the Postgres log file?


What kind of machine are you running on? Can we have more specs?

How is your server configured?


regards,

fabio pardi




On 27/06/18 10:51, Anjul Tyagi wrote:
Hi All,

Can you please advise on the issue we are facing?

Regards,

Anjul TYAGI


------ Original Message ------
From: "Anjul Tyagi" <anjul@ibosstech-us.com>
Sent: 26-06-2018 18:17:26
Subject: Server Crash

Hi All,

We recently deployed a couple of new plperl SPROCs on our Postgres production server, and since then the server has started throwing errors. We are currently using Postgres 9.1 and are planning to upgrade to PG 10.

But we are not sure whether the same error will occur in PG 10 as well. We really need help.

Error Message:

2018-06-21 13:43:29 EDT [22212]: [4-1] user=cmsuser,db=forte ,host=10.10.1.3,port=10.10.1.3(56601),id=5b2bdea5.56c4,line=4DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

 

2018-06-21 13:43:29 EDT [3673]: [5-1] user=cmsuser,db=forte ,host=10.10.1.3,port=10.10.1.3(55649),id=5b2bd568.e59,line=5DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.


Server Details: 
Linux - Red Hat Enterprise Linux Server release 6.3 (Santiago)
Postgres -  Postgres 9.1.13

Below is the sample SPROC code:


CREATE OR REPLACE FUNCTION getauthcode(
    claim character varying,
    oflag character varying,
    pacode character varying,
    dispos character varying)
  RETURNS text AS
$BODY$
use strict;
use warnings;
use SOAP::Lite;
use JSON;


try
{
    my $host = `hostname`;
    my $rv = spi_exec_query("select * from getsprocurl('getauthcode','".$host."')");
    my $url = $rv->{rows}[0]->{ret_url};
    elog(NOTICE, 'Host Name  ' . $host . ' URL '. $url );
   
    my $soap = SOAP::Lite->new();
    my $service = $soap->service($url);
    my %params =  ("claim" => $_[0], "oflag" => $_[1], "pacode" => $_[2], "dispos" => $_[3]);
    my $response = $service->getAuthCode(%params);
    my $json_array = decode_json($response);
   
    for my $report ( $json_array) {
    my $status =  $report->{status}, '\n';
        if ($status eq 'success')
        {
            return $report->{PriorApprovalCode};
        }
        else{
            return $report->{message};
        }
    }   
}
catch Exception with
{
    my $ex = shift;
    return 'SOAPFAULT: ' . date('H:i:s') . ' ' . exception($ex);
}
 
$BODY$
  LANGUAGE plperlu VOLATILE STRICT
  COST 100;

Thanks in advance for your help.

Regards,

Anjul TYAGI


Re: Server Crash

From
"Anjul Tyagi"
Date:
Hi Fabio,

I understand your point; however, I am under tremendous pressure to provide a solution.

I have attached the log file for review.

We have a Red Hat machine and Postgres version 9.1.13.

I appreciate your response and help.

Regards,

Anjul TYAGI


Attachments

Re: Server Crash

From
Fabio Pardi
Date:

Hi,

What was running as PID 7471?

What happened before the first message at 10:29:52?

What is the 'alarm clock' appearing on the first line of the logfile?


Still missing answers from my previous message:

/var/log/messages?

How much RAM is on the machine?

What are the memory settings of Postgres?

regards,

fabio pardi
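
One reason the RAM size and the Postgres memory settings matter together can be shown with a back-of-the-envelope sketch. This is a common rough heuristic, not a formula from the PostgreSQL documentation, and not a true upper bound: a single query can use several multiples of work_mem. The function names, the headroom factor, and all numbers in the usage example are hypothetical.

```python
def rough_memory_estimate_mb(shared_buffers_mb, work_mem_mb, max_connections,
                             maintenance_work_mem_mb=0):
    """Rough heuristic: shared memory plus one work_mem per connection."""
    return (shared_buffers_mb
            + max_connections * work_mem_mb
            + maintenance_work_mem_mb)


def fits_in_ram(estimate_mb, ram_mb, headroom=0.8):
    """True if the estimate stays within a chosen fraction of physical RAM."""
    return estimate_mb <= ram_mb * headroom
```

For example, `rough_memory_estimate_mb(2048, 16, 200, 256)` gives 5504 MB, which would not fit comfortably on a 4096 MB machine; a result like that is a hint to look at the settings or at what else is running on the box.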


