Re: pg crashing

Поиск
Список
Период
Сортировка
От Roberts, Jon
Тема Re: pg crashing
Дата
Msg-id 1A6E6D554222284AB25ABE3229A92762E9A688@nrtexcus702.int.asurion.com
обсуждение исходный текст
Ответ на Re: pg crashing  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: pg crashing  (Craig Ringer <craig@postnewspapers.com.au>)
Re: pg crashing  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
> "Roberts, Jon" <Jon.Roberts@asurion.com> writes:
> > Version: "PostgreSQL 8.3.0, compiled by Visual C++ build 1400"
>
> Well, there are plenty of known bugs in 8.3.0 by now.  You really
> should update before complaining, not after.

I'm not complaining.  I just want to make sure that if I upgrade, it
will fix the problem.  An upgrade could possible introduce a new
problem.  I also wonder if this is isolated to Win32 because we are
upgrading to Solaris very soon.

>
> > Problem: My database keeps on crashing every few days with this type
of
> > error message:
>
> > 2008-07-01 10:46:30 CDT LOG:  all server processes terminated;
> > reinitializing
>
> I think your real problem is with what happened *before* that.

I found the first instance of "crash" and then got the rest of the log
file.

2008-07-01 10:43:42 CDT LOG:  server process (PID 3524) exited with exit
code 128
2008-07-01 10:43:42 CDT LOG:  terminating any other active server
processes
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT:  SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
    PL/pgSQL function "fn_update_status" line 136 at PERFORM
    SQL statement "SELECT  gp_load.fn_update_status( $1 ,  $2 ,
'Processing', '', 0)"
    PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT:  PL/pgSQL function "fn_get_job_details"
line 114 at IF
    PL/pgSQL function "fn_load" line 465 at FOR over SELECT rows
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT:  SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
    PL/pgSQL function "fn_update_status" line 136 at PERFORM
    SQL statement "SELECT  gp_load.fn_update_status( $1 ,  $2 ,
'Processing', '', 0)"
    PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT:  SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
    PL/pgSQL function "fn_update_status" line 136 at PERFORM
    SQL statement "SELECT  gp_load.fn_update_status( $1 ,  $2 ,
'Processing', '', 0)"
    PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT:  SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
    PL/pgSQL function "fn_update_status" line 136 at PERFORM
    SQL statement "SELECT  gp_load.fn_update_status( $1 ,  $2 ,
'Processing', '', 0)"
    PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT:  SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
    PL/pgSQL function "fn_update_status" line 136 at PERFORM
    SQL statement "SELECT  gp_load.fn_update_status( $1 ,  $2 ,
'Processing', '', 0)"
    PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
INFO:  2008-07-01 10:44:42.373008-05:3000:Executing Query ..... create
temporary table order_iud on commit drop as select
s."jid",s."order_id",s."parent_order_id",s."claim_id",s."fulfillment_met
hod",s."order_status_id",s."payment_status",s."created_by",s."created_dt
m",s."modified_by",s."modified_dtm",s."deleted_flg",s."claimed_item_id",
s."order_state_hashcode",s."autopaymentauth_flg",s."salvageexpectation_s
tatus",s."snr_fee",s."salvageexpectation_reason",s."salvageexpectation_n
ote",s."nocharge_override",s."change_datetime",s."change_type",s."edw_pr
ocess_flag",s."edw_process_ts" from stage_intelliset.order s, (select
max("order".jid) as jid, "order".order_id from stage_intelliset."order"
group by "order".order_id) m where m.jid = s.jid and m.order_id::text =
s.order_id::text distributed by (order_id)
INFO:  2008-07-01 10:44:43.185777-05:3000:Execution Complete.
INFO:  2008-07-01 10:44:46.988228-05:5000:Executing Update ..... update
replica_intelliset.invoiceautopayment set "order_id" = y."order_id",
"invoice_id" = y."invoice_id", "transaction_id" = y."transaction_id",
"paymenttype_code" = y."paymenttype_code", "amount" = y."amount",
"status_code" = y."status_code", "deleted_flg" = y."deleted_flg",
"created_by" = y."created_by", "created_dtm" = y."created_dtm",
"modified_by" = y."modified_by", "modified_dtm" = y."modified_dtm",
edw_modified_dt = y.edw_process_ts, edw_modified_id = y.jid,
edw_clock_ts = clock_timestamp() from (select
a."invoiceautopayement_id", a."order_id", a."invoice_id",
a."transaction_id", a."paymenttype_code", a."amount", a."status_code",
a."deleted_flg", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm",  a.edw_process_ts, a.jid, a.change_type from
invoiceautopayment_iud a inner join
replica_intelliset.invoiceautopayment b on a.invoiceautopayement_id =
b.invoiceautopayement_id) y where
replica_intelliset.invoiceautopayment.invoiceautopayement_id =
y.invoiceautopayement_id and y.change_type <> 3
INFO:  2008-07-01 10:44:48.785215-05:5000:Update Complete.
INFO:  2008-07-01 10:44:48.786175-05:6000:Executing Insert ..... insert
into replica_intelliset.invoiceautopayment ("invoiceautopayement_id",
"order_id", "invoice_id", "transaction_id", "paymenttype_code",
"amount", "status_code", "deleted_flg", "created_by", "created_dtm",
"modified_by", "modified_dtm",  edw_created_dt, edw_modified_dt,
edw_created_id, edw_modified_id, edw_clock_ts) select
a."invoiceautopayement_id", a."order_id", a."invoice_id",
a."transaction_id", a."paymenttype_code", a."amount", a."status_code",
a."deleted_flg", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm",  edw_process_ts as edw_created_dt, null as
edw_modified_dt, a.jid as edw_created_id, null as edw_modified_id,
clock_timestamp() as edw_clock_ts from invoiceautopayment_iud a left
outer join replica_intelliset.invoiceautopayment b on
a.invoiceautopayement_id = b.invoiceautopayement_id where
(b.invoiceautopayement_id is null) and a.change_type <> 3
INFO:  2008-07-01 10:44:50.738058-05:6000:Insert Complete.
INFO:  2008-07-01 10:44:50.750247-05:8000:Executing insert into archive
table..... insert into stage_intelliset.invoiceautopayment_arch select *
from stage_intelliset.invoiceautopayment
INFO:  2008-07-01 10:44:50.839105-05:8000:Insert complete
2008-07-01 10:43:55 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:55 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:55 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:55 CDT CONTEXT:  SQL statement "select * from
dblink('gp',  $1 ) as t1 (return_text varchar)"
    PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
    SQL statement "SELECT  gp_execute.fn_remote_sql( $1 ,  $2 )"
    PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:55 CDT LOG:  could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:43:55 CDT CONTEXT:  SQL statement "select * from
dblink('gp',  $1 ) as t1 (return_text varchar)"
    PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
    SQL statement "SELECT  gp_execute.fn_remote_sql( $1 ,  $2 )"
    PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:55 CDT STATEMENT:  select * from gp_load.fn_load(809)
INFO:  2008-07-01 10:44:53.103065-05:5000:Executing Update ..... update
replica_intelliset.order set "parent_order_id" = y."parent_order_id",
"claim_id" = y."claim_id", "fulfillment_method" =
y."fulfillment_method", "order_status_id" = y."order_status_id",
"payment_status" = y."payment_status", "created_by" = y."created_by",
"created_dtm" = y."created_dtm", "modified_by" = y."modified_by",
"modified_dtm" = y."modified_dtm", "deleted_flg" = y."deleted_flg",
"claimed_item_id" = y."claimed_item_id", "order_state_hashcode" =
y."order_state_hashcode", "autopaymentauth_flg" =
y."autopaymentauth_flg", "salvageexpectation_status" =
y."salvageexpectation_status", "snr_fee" = y."snr_fee",
"salvageexpectation_reason" = y."salvageexpectation_reason",
"salvageexpectation_note" = y."salvageexpectation_note",
"nocharge_override" = y."nocharge_override", edw_modified_dt =
y.edw_process_ts, edw_modified_id = y.jid, edw_clock_ts =
clock_timestamp() from (select a."order_id", a."parent_order_id",
a."claim_id", a."fulfillment_method", a."order_status_id",
a."payment_status", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm", a."deleted_flg", a."claimed_item_id",
a."order_state_hashcode", a."autopaymentauth_flg",
a."salvageexpectation_status", a."snr_fee",
a."salvageexpectation_reason", a."salvageexpectation_note",
a."nocharge_override",  a.edw_process_ts, a.jid, a.change_type from
order_iud a inner join replica_intelliset.order b on a.order_id =
b.order_id) y where replica_intelliset.order.order_id = y.order_id and
y.change_type <> 3
INFO:  2008-07-01 10:44:54.536366-05:5000:Update Complete.
INFO:  2008-07-01 10:44:54.537312-05:6000:Executing Insert ..... insert
into replica_intelliset.order ("order_id", "parent_order_id",
"claim_id", "fulfillment_method", "order_status_id", "payment_status",
"created_by", "created_dtm", "modified_by", "modified_dtm",
"deleted_flg", "claimed_item_id", "order_state_hashcode",
"autopaymentauth_flg", "salvageexpectation_status", "snr_fee",
"salvageexpectation_reason", "salvageexpectation_note",
"nocharge_override",  edw_created_dt, edw_modified_dt, edw_created_id,
edw_modified_id, edw_clock_ts) select a."order_id", a."parent_order_id",
a."claim_id", a."fulfillment_method", a."order_status_id",
a."payment_status", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm", a."deleted_flg", a."claimed_item_id",
a."order_state_hashcode", a."autopaymentauth_flg",
a."salvageexpectation_status", a."snr_fee",
a."salvageexpectation_reason", a."salvageexpectation_note",
a."nocharge_override",  edw_process_ts as edw_created_dt, null as
edw_modified_dt, a.jid as edw_created_id, null as edw_modified_id,
clock_timestamp() as edw_clock_ts from order_iud a left outer join
replica_intelliset.order b on a.order_id = b.order_id where (b.order_id
is null) and a.change_type <> 3
INFO:  2008-07-01 10:44:54.577749-05:6000:Insert Complete.
INFO:  2008-07-01 10:44:54.583708-05:8000:Executing insert into archive
table..... insert into stage_intelliset.order_arch select * from
stage_intelliset.order
INFO:  2008-07-01 10:44:54.715591-05:8000:Insert complete
2008-07-01 10:43:58 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:43:58 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:58 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:58 CDT CONTEXT:  SQL statement "select * from
dblink('gp',  $1 ) as t1 (return_text varchar)"
    PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
    SQL statement "SELECT  gp_execute.fn_remote_sql( $1 ,  $2 )"
    PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:58 CDT LOG:  could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:43:58 CDT CONTEXT:  SQL statement "select * from
dblink('gp',  $1 ) as t1 (return_text varchar)"
    PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
    SQL statement "SELECT  gp_execute.fn_remote_sql( $1 ,  $2 )"
    PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:58 CDT STATEMENT:  select * from gp_load.fn_load(813)
2008-07-01 10:46:30 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:46:30 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:46:30 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:46:30 CDT CONTEXT:  PL/pgSQL function "fn_load" line 271
at IF
2008-07-01 10:46:30 CDT LOG:  could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:46:30 CDT CONTEXT:  PL/pgSQL function "fn_load" line 271
at IF
2008-07-01 10:46:30 CDT STATEMENT:  select * from gp_load.fn_load(757)
2008-07-01 10:46:30 CDT WARNING:  terminating connection because of
crash of another server process
2008-07-01 10:46:30 CDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:46:30 CDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:46:30 CDT CONTEXT:  SQL statement "select
remote_execute.fn_get_max( $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8
,  $9 ,  $10 ,  $11 )"
    PL/pgSQL function "fn_load" line 265 at SQL statement
2008-07-01 10:46:30 CDT LOG:  could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:46:30 CDT CONTEXT:  SQL statement "select
remote_execute.fn_get_max( $1 ,  $2 ,  $3 ,  $4 ,  $5 ,  $6 ,  $7 ,  $8
,  $9 ,  $10 ,  $11 )"
    PL/pgSQL function "fn_load" line 265 at SQL statement
2008-07-01 10:46:30 CDT STATEMENT:  select * from gp_load.fn_load(965)
2008-07-01 10:46:30 CDT LOG:  all server processes terminated;
reinitializing
2008-07-01 10:46:31 CDT FATAL:  pre-existing shared memory block is
still in use
2008-07-01 10:46:31 CDT HINT:  Check if there are any old server
processes still running, and terminate them.

> But:
>
> > 2008-07-01 10:46:31 CDT FATAL:  pre-existing shared memory block is
> > still in use
> > 2008-07-01 10:46:31 CDT HINT:  Check if there are any old server
> > processes still running, and terminate them.
>
> Hmm ... the code in win32_shmem.c that generates this message seems
> mighty bogus to me --- it's just hoping that one-second delay is
> enough.  Another problem is that postmaster children that do
> PGSharedMemoryDetach will still have valid inherited handles for
> the shmem segment --- does that factor into the behavior?  It looks
> to me like the CloseHandle ought to be in PGSharedMemoryDetach.
>
>             regards, tom lane

This wouldn't be a problem in Unix?


Jon

В списке pgsql-general по дате отправления:

Предыдущее
От: "Roberts, Jon"
Дата:
Сообщение: Re: simple tool for building web forms
Следующее
От: "Roberts, Jon"
Дата:
Сообщение: Re: pg crashing