segmentation fault postgres 9.3.5 core dump perlu related ?

From: Day, David
Subject: segmentation fault postgres 9.3.5 core dump perlu related ?
Msg-id 401084E5E73F4241A44F3C9E6FD7942801158483CC@exch-01
Responses: Re: segmentation fault postgres 9.3.5 core dump perlu related ?  (Guy Helmer <ghelmer@palisadesystems.com>)
List: pgsql-general

Update and information sharing on my pursuit of the segmentation faults.

 

FreeBSD 10.0-RELEASE-p12 amd64

Postgres version 9.3.5

 

Below are back traces from three postgres core files generated on two different machines (Georgia and Alabama) on Feb 11.

These cores could not have been caused by the environment update issue that I last suspected might be behind the segfaults.

So I am more or less back to square one in understanding what is occurring.

 

I am not sure that I understand the entries in the postgres log file around the time of each core. Are they simply whatever happened to be running in the other forked postgres processes when the crashed process was detected?

If that is the case, then I have probably been reading too much into the content of the postgres log file at the time of the core.

It probably just represents collateral damage to routine transactions that were running in other forked processes at the time one of the processes cored.
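One way to separate the crashed backend from the bystanders is to key on the postmaster's own messages: its "server process (PID N) was terminated by signal" line names the crashed PID, and the "Failed process was running" DETAIL line that follows gives that backend's statement. Entries logged by other PIDs belong to other backends. The following is a minimal Python sketch, not a definitive tool. It assumes the wrapped log lines have already been joined, and that the log prefix carries a `proc=` field as in this post. The names `summarize_crashes` and `SAMPLE` are hypothetical.

```python
import re

# Postmaster crash report, e.g.
# "LOG:  server process (PID 38319) was terminated by signal 11: Segmentation fault"
CRASH_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")
# Follow-up detail line naming the statement the failed process was running.
STMT_RE = re.compile(r"DETAIL:\s+Failed process was running:\s*(.*)")
# "proc=NNN" field of the log prefix used in this post (an assumption about the format).
PROC_RE = re.compile(r"\bproc=(\d+)")


def summarize_crashes(lines):
    """Return one dict per crash report found in already-unwrapped log lines."""
    crashes = []
    for line in lines:
        crash = CRASH_RE.search(line)
        if crash:
            crashes.append({"pid": int(crash.group(1)),
                            "signal": int(crash.group(2)),
                            "statement": None,
                            "bystanders": set()})
            continue
        if not crashes:
            continue  # nothing to attach this line to yet
        current = crashes[-1]
        stmt = STMT_RE.search(line)
        if stmt and current["statement"] is None:
            current["statement"] = stmt.group(1).strip()
            continue
        # Backends told to exit because of the crash are bystanders, not the cause.
        if "terminating connection because of crash" in line:
            proc = PROC_RE.search(line)
            if proc:
                current["bystanders"].add(int(proc.group(1)))
    return crashes


# Three entries copied from the Georgia 17:55 log in this post, with wraps joined.
SAMPLE = [
    "2015-02-11T17:55:13.833624-05:00 georgia local0 info postgres[1018]: [11-1] "
    "user=, db=, proc=1018, audit=, LOG:  server process (PID 38319) was "
    "terminated by signal 11: Segmentation fault",
    "2015-02-11T17:55:13.833669-05:00 georgia local0 info postgres[1018]: [11-2] "
    "user=, db=, proc=1018, audit=, DETAIL:  Failed process was running: "
    "SELECT * FROM cc.register_port_sip_user($1, $2, $3, $4, $5, $6, $7, $8, $9, $10 )",
    "2015-02-11T17:55:13.833896-05:00 georgia local0 notice postgres[38321]: [7237-1] "
    "user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, WARNING:  "
    "terminating connection because of crash of another server process",
]

crashes = summarize_crashes(SAMPLE)
for c in crashes:
    print(c["pid"], c["signal"], c["statement"], sorted(c["bystanders"]))
```

On these sample lines the crashed PID is 38319, while the COMMIT logged just before the crash came from PID 38321, which is consistent with the reading that the neighbouring log entries come from unrelated backends.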

 

Therefore I would now just assert that postgres has a sporadic segmentation fault problem with no known way to reliably reproduce it,

and I am uncertain how to proceed to resolve it.

 

 

Georgia 8:38

Georgia 17:55

Alabama 15:30

 

--

 

 

If someone sees something in these core file back traces that suggests a direction to pursue, it would be much appreciated.

 

 

 

Thanks

 

 

Dave

 

Georgia - Core 17:55 - Feb 11

(gdb) bt

#0  0x00000000006f8670 in SearchCatCache ()

#1  0x0000000000672537 in enum_in ()

#2  0x000000000071375b in InputFunctionCall ()

#3  0x0000000000713b7e in OidInputFunctionCall ()

#4  0x0000000000509a3d in coerce_type ()

#5  0x0000000000511af3 in make_fn_arguments ()

#6  0x0000000000513fed in make_op ()

#7  0x000000000050f53b in ?? ()

#8  0x000000000050d706 in transformExpr ()

#9  0x0000000000518333 in transformTargetList ()

#10 0x00000000004f02bc in transformStmt ()

#11 0x000000000064109d in pg_analyze_and_rewrite_params ()

#12 0x00000000006fbc6b in ?? ()

#13 0x00000000006fb6f5 in GetCachedPlan ()

#14 0x000000000059597a in SPI_plan_get_cached_plan ()

#15 0x00000008024ed34d in ?? () from /usr/local/lib/postgresql/plpgsql.so

#16 0x00000008024f2590 in ?? () from /usr/local/lib/postgresql/plpgsql.so

#17 0x00000008024ee0d0 in ?? () from /usr/local/lib/postgresql/plpgsql.so

#18 0x00000008024eaf3b in ?? () from /usr/local/lib/postgresql/plpgsql.so

#19 0x00000008024ea243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so

#20 0x00000008024e6551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so

#21 0x000000000057611f in ExecMakeTableFunctionResult ()

#22 0x000000000058b6c7 in ?? ()

#23 0x000000000057bab2 in ExecScan ()

#24 0x00000000005756b8 in ExecProcNode ()

#25 0x0000000000573630 in standard_ExecutorRun ()

#26 0x0000000000645b0a in ?? ()

#27 0x0000000000645719 in PortalRun ()

#28 0x00000000006438ea in PostgresMain ()

#29 0x00000000005ff267 in PostmasterMain ()

#30 0x00000000005a31ba in main ()

(gdb) info threads

  Id   Target Id         Frame

* 2    Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()

* 1    Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()

 

 

The gdb info threads response is still a puzzling piece of information. Attaching gdb to a healthy running postmaster gives the same thread count (2) as the core file.

However, other system tools (top, ps) that report the number of threads for a process show only one thread on the healthy process. So I think this is a debugger bug.
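In the `info threads` output shown in this post, both rows carry the same thread id and the same LWP number, which fits the idea that gdb is listing one thread twice. The check can be sketched in Python by collecting the distinct (thread id, LWP) pairs. This is a rough sketch that assumes the row layout seen here. The names `distinct_threads` and `GEORGIA_1755_THREADS` are hypothetical.

```python
import re

# Row shape seen in this post:
# "* 2    Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()"
THREAD_ROW_RE = re.compile(r"^\*?\s*(\d+)\s+Thread\s+([0-9a-fA-F]+)\s+\(LWP\s+(\d+)\)")


def distinct_threads(info_threads_output):
    """Return the set of distinct (thread id, LWP) pairs listed by gdb."""
    seen = set()
    for line in info_threads_output.splitlines():
        row = THREAD_ROW_RE.match(line.strip())
        if row:  # the "Id   Target Id   Frame" header does not match
            seen.add((row.group(2), int(row.group(3))))
    return seen


# Copied from the Georgia 17:55 core in this post.
GEORGIA_1755_THREADS = """
  Id   Target Id         Frame
* 2    Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()
* 1    Thread 802c06400 (LWP 100070) 0x00000000006f8670 in SearchCatCache ()
"""

threads = distinct_threads(GEORGIA_1755_THREADS)
print(len(threads), threads)
```

A single distinct pair means the two rows describe the same kernel thread, matching what top and ps report.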

 

 

 

2015-02-11T17:55:13.732147-05:00 georgia local0 info postgres[38321]: [7236-1] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, LOG:  duration: 4.384 ms  statement: COMMIT

2015-02-11T17:55:13.743399-05:00 georgia local0 info postgres[86738]: [12-1] user=redcom, db=ace_db, proc=86738, audit=[unknown], LOG:  duration: 14.581 ms  statement: SELECT database, COALESCE(max(extract(epoch FROM CURRENT_TIMESTAMP-prepared)),0) FROM pg_prepared_xacts JOIN pg_database ON datname=database WHERE datname='ace_db' GROUP BY database ORDER BY 1

2015-02-11T17:55:13.833624-05:00 georgia local0 info postgres[1018]: [11-1] user=, db=, proc=1018, audit=, LOG:  server process (PID 38319) was terminated by signal 11: Segmentation fault

2015-02-11T17:55:13.833669-05:00 georgia local0 info postgres[1018]: [11-2] user=, db=, proc=1018, audit=, DETAIL:  Failed process was running: SELECT * FROM cc.register_port_sip_user($1, $2, $3, $4, $5, $6, $7, $8, $9, $10 )

2015-02-11T17:55:13.833701-05:00 georgia local0 info postgres[1018]: [12-1] user=, db=, proc=1018, audit=, LOG:  terminating any other active server processes

2015-02-11T17:55:13.833896-05:00 georgia local0 notice postgres[38321]: [7237-1] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, WARNING:  terminating connection because of crash of another server process

2015-02-11T17:55:13.833923-05:00 georgia local0 notice postgres[38321]: [7237-2] user=ace_db_client, db=ace_db, proc=38321, audit=dbm_client9, DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2015-02

 

 

Georgia - Core 8:38 - Feb 11

[New process 101032]

[New Thread 802c06400 (LWP 101032)]

Core was generated by `postgres'.

Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

(gdb) bt

#0  0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

#1  0x000000080c4cab49 in Perl_sv_clear () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

#2  0x000000080c4cb13a in Perl_sv_free2 () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

#3  0x000000080c4e5102 in Perl_free_tmps () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

#4  0x000000080bcfedea in plperl_destroy_interp () from /usr/local/lib/postgresql/plperl.so

#5  0x000000080bcfec05 in plperl_fini () from /usr/local/lib/postgresql/plperl.so

#6  0x00000000006292c6 in ?? ()

#7  0x000000000062918d in proc_exit ()

#8  0x00000000006443f3 in PostgresMain ()

#9  0x00000000005ff267 in PostmasterMain ()

#10 0x00000000005a31ba in main ()

(gdb) info threads

  Id   Target Id         Frame

* 2    Thread 802c06400 (LWP 101032) 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

* 1    Thread 802c06400 (LWP 101032) 0x000000080c4b6d51 in Perl_hfree_next_entry () from /usr/local/lib/perl5/5.18/mach/CORE/libperl.so.5.18

 

Postgres.log content

 

ation: 0.087 ms  statement: UNLISTEN "tbl_changed"

2015-02-11T08:38:48.227368-05:00 georgia local0 info postgres[27177]: [1276-1] user=ace_db_client, db=ace_db, proc=27177, audit=dbm_client6, LOG:  duration: 0.152 ms  statement: UNLISTEN "tbl_changed"

2015-02-11T08:38:48.246438-05:00 georgia local0 info postgres[27176]: [1262-1] user=ace_db_client, db=ace_db, proc=27176, audit=dbm_client8, LOG:  duration: 0.155 ms  statement: UNLISTEN "tbl_changed"

2015-02-11T08:38:48.576282-05:00 georgia local0 info postgres[27174]: [388-1] user=ace_db_client, db=ace_db, proc=27174, audit=dbm_client2, LOG:  duration: 0.094 ms  statement: UNLISTEN "tbl_changed"

2015-02-11T08:38:49.754208-05:00 georgia local0 info postgres[1018]: [7-1] user=, db=, proc=1018, audit=, LOG:  server process (PID 27172) was terminated by signal 11: Segmentation fault

2015-02-11T08:38:49.754236-05:00 georgia local0 info postgres[1018]: [8-1] user=, db=, proc=1018, audit=, LOG:  terminating any other active server processes

2015-02-11T08:38:49.763667-05:00 georgia local0 notice postgres[19938]: [7-1] user=, db=, proc=19938, audit=, WARNING:  terminating connection because of crash of another server process

2015-02-11T08:38:49.763693-05:00 georgia local0 notice postgres[19938]: [7-2] user=, db=, proc=19938, audit=, DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2015-02-11T08:38:49.763711-05:00 georgia local0 notice postgres[19938]: [7-3] user=, db=, proc=19938, audit=, HINT:  In a moment you should be able to reconnect to the database and repeat your command.

2015-02-11T08:38:49.769432-05:00 georgia local0 notice postgres[20073]: [9-1] user=redcom, db=ace_db, proc=20073, audit=[unknown], WARNING:  terminating connection because of crash of another server process

2015-02-11T08:38:49.769657-05:00 georgia local0 notice postgres[20073]: [9-2] user=redcom, db=ace_db, proc=20073, audit=[unknown], DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memo

 

 

 

 

Alabama - Core 15:30 - Feb 11

 

Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x0000000801da7883 in ?? () from /lib/libc.so.7

(gdb) bt

#0  0x0000000801da7883 in ?? () from /lib/libc.so.7

#1  0x0000000801da943b in ?? () from /lib/libc.so.7

#2  0x0000000801db457c in free () from /lib/libc.so.7

#3  0x000000000072b739 in ?? ()

#4  0x000000000072b9bd in MemoryContextDelete ()

#5  0x00000000006fbc17 in ?? ()

#6  0x00000000006fb6f5 in GetCachedPlan ()

#7  0x0000000000594eec in ?? ()

#8  0x00000008024ee8c5 in ?? () from /usr/local/lib/postgresql/plpgsql.so

#9  0x00000008024ef3e5 in ?? () from /usr/local/lib/postgresql/plpgsql.so

#10 0x00000008024ebf3b in ?? () from /usr/local/lib/postgresql/plpgsql.so

#11 0x00000008024eb243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so

#12 0x00000008024e7551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so

#13 0x000000000057611f in ExecMakeTableFunctionResult ()

#14 0x000000000058b6c7 in ?? ()

#15 0x000000000057bab2 in ExecScan ()

#16 0x00000000005756b8 in ExecProcNode ()

#17 0x0000000000573630 in standard_ExecutorRun ()

#18 0x0000000000645b0a in ?? ()

#19 0x0000000000645719 in PortalRun ()

#20 0x00000000006438ea in PostgresMain ()

#21 0x00000000005ff267 in PostmasterMain ()

#22 0x00000000005a31ba in main ()

(gdb) info threads

  Id   Target Id         Frame

* 2    Thread 802c06400 (LWP 100574) 0x0000000801da7883 in ?? () from /lib/libc.so.7

* 1    Thread 802c06400 (LWP 100574) 0x0000000801da7883 in ?? () from /lib/libc.so.7
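To look for a direction in the back traces, one rough approach is to list the named frames that two cores have in common. The Python sketch below does that for abbreviated excerpts of the Georgia 17:55 and Alabama 15:30 traces from this post. It assumes the `#N 0x... in name ()` frame layout that gdb printed here and skips unresolved `??` frames. The names `named_frames` and `shared_frames` are hypothetical. Shared frames only show where the code paths overlap; they do not by themselves identify a root cause.

```python
import re

# Frame shape seen in this post: "#13 0x00000000006fb6f5 in GetCachedPlan ()"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s+\(")


def named_frames(backtrace):
    """Return resolved function names from a gdb back trace, innermost first."""
    names = []
    for line in backtrace.splitlines():
        frame = FRAME_RE.match(line.strip())
        if frame and frame.group(2) != "??":  # skip frames gdb could not resolve
            names.append(frame.group(2))
    return names


def shared_frames(bt_a, bt_b):
    """Return names from bt_a that also appear in bt_b, in bt_a's order."""
    in_b = set(named_frames(bt_b))
    return [name for name in named_frames(bt_a) if name in in_b]


# Abbreviated excerpts copied from the traces in this post.
GEORGIA_1755 = """
#0  0x00000000006f8670 in SearchCatCache ()
#1  0x0000000000672537 in enum_in ()
#12 0x00000000006fbc6b in ?? ()
#13 0x00000000006fb6f5 in GetCachedPlan ()
#14 0x000000000059597a in SPI_plan_get_cached_plan ()
#19 0x00000008024ea243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#20 0x00000008024e6551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
"""

ALABAMA_1530 = """
#2  0x0000000801db457c in free () from /lib/libc.so.7
#4  0x000000000072b9bd in MemoryContextDelete ()
#6  0x00000000006fb6f5 in GetCachedPlan ()
#11 0x00000008024eb243 in plpgsql_exec_function () from /usr/local/lib/postgresql/plpgsql.so
#12 0x00000008024e7551 in plpgsql_call_handler () from /usr/local/lib/postgresql/plpgsql.so
"""

shared = shared_frames(GEORGIA_1755, ALABAMA_1530)
print(shared)
```

On these excerpts both cores pass through `GetCachedPlan` while being called from plpgsql, whereas the Georgia 8:38 core is in `plperl_fini` during process exit and follows a different path.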

 

2015-02-11T15:16:19.029980-05:00 alabama local0 warning postgres[1980]: [7-6] #011

2015-02-11T15:16:19.029989-05:00 alabama local0 warning postgres[1980]: [7-7] #011

2015-02-11T15:16:19.030000-05:00 alabama local0 warning postgres[1980]: [7-8] #011

2015-02-11T15:30:44.991096-05:00 alabama local0 info postgres[54202]: [3-1] user=, db=, proc=54202, audit=, LOG:  server process (PID 87242) was terminated by signal 11: Segmentation fault

2015-02-11T15:30:44.991122-05:00 alabama local0 info postgres[54202]: [3-2] user=, db=, proc=54202, audit=, DETAIL:  Failed process was running: SELECT * FROM cc.get_port_and_registration_data($1, $2, $3, $4, $5)

2015-02-11T15:30:44.991175-05:00 alabama local0 info postgres[54202]: [4-1] user=, db=, proc=54202, audit=, LOG:  terminating any other active server processes

2015-02-11T15:30:45.004506-05:00 alabama local0 notice postgres[87241]: [3-1] user=ace_db_client, db=ace_db, proc=87241, audit=dbm_client5, WARNING:  terminating connection because of crash of another server process

2015-02-11T15:30:45.004567-05:00 alabama local0 notice postgres[87241]: [3-2] user=ace_db_client, db=ace_db, proc=87241, audit=dbm_client5, DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2015-02-11T15:30:45.129123-05:00 alabama local0 notice postgres[87241]: [3-3] user=ace_db_client, db=ace_db, proc=87241, audit=dbm_client5, HINT:  In a moment you should be able to reconnect to the database and repeat your command.

2015-02-11T15:30:45.129437-05:00 alabama local0 notice postgres[87238]: [3-1] user=ace_db_client, db=ace_db, proc=87238, audit=dbm_client2, WARNING:  terminating connection because of crash of another server process
