Обсуждение: postmaster core ( finally I have it )
Hi all, since long time ( in the mean time I did Postgres upgrade four time and now I'm using 7.3.3 ) I'm having, at least once in a week, a signal 11 on a backend, and how you can immagine with the subseguent drop of all connections, finally now I have the core. The process killed made always the same select ( with different id_package ): SELECT id_publisher, publisher_name, id_package, package_name FROM v_psl_package_info WHERE id_package = 177; Here it is: bash-2.05$ file core core: ELF 32-bit LSB core file of 'postmaster' (signal 11), Intel 80386, version 1, from 'postmaster' bash-2.05$ gdb /usr/bin/postmaster core GNU gdb Red Hat Linux (5.2-2) Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux"...(no debugging symbols found)... warning: core file may not match specified executable file. Core was generated by `postgres: push empdb 192.168'. Program terminated with signal 11, Segmentation fault. Reading symbols from /lib/libpam.so.0...(no debugging symbols found)...done. Loaded symbols for /lib/libpam.so.0 Reading symbols from /lib/libssl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libssl.so.2 Reading symbols from /lib/libcrypto.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libcrypto.so.2 Reading symbols from /usr/kerberos/lib/libkrb5.so.3...(no debugging symbols found)...done. Loaded symbols for /usr/kerberos/lib/libkrb5.so.3 Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.4...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libreadline.so.4 Reading symbols from /lib/libtermcap.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libm.so.6...done. Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/libc.so.6...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /usr/kerberos/lib/libcom_err.so.3...done. Loaded symbols for /usr/kerberos/lib/libcom_err.so.3 Reading symbols from /usr/kerberos/lib/libk5crypto.so.3...done. Loaded symbols for /usr/kerberos/lib/libk5crypto.so.3 Reading symbols from /lib/ld-linux.so.2...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 #0 0x08171fdd in RelationBuildRuleLock () (gdb) where #0 0x08171fdd in RelationBuildRuleLock () #1 0x081722ab in RelationBuildDesc () #2 0x0817311c in RelationIdGetRelation () #3 0x08077b95 in relation_open () #4 0x08077d24 in heap_openrv () #5 0x080b1015 in addRangeTableEntry () #6 0x080ab6de in transformTableEntry () #7 0x080ab942 in transformFromClauseItem () #8 0x080ab321 in transformFromClause () #9 0x0809f639 in transformSelectStmt () #10 0x0809dcc4 in transformStmt () #11 0x0809da06 in parse_analyze () #12 0x08128a35 in pg_analyze_and_rewrite () #13 0x08128ce2 in pg_exec_query_string () #14 0x08129fc3 in PostgresMain () #15 0x08110344 in DoBackend () #16 0x0810fc2d in BackendStartup () #17 0x0810ed21 in ServerLoop () #18 0x0810e6f9 in PostmasterMain () #19 0x080e9cdd in main () #20 0x40249336 in __libc_start_main (main=0x80e9ab0 <main>, argc=3, ubp_av=0xbffffb04, init=0x806b2ec <_init>, fini=0x8188830 <_fini>, rtld_fini=0x4000d2fc <_dl_fini>, stack_end=0xbffffafc) at ../sysdeps/generic/libc-start.c:129 (gdb) quit and here all the activity performed by the process: Jul 22 20:00:02 sdr2 postgres[25079]: [2896523] LOG: connection received: host= xxx.xxx.xxx.xxx port=53809 Jul 22 20:00:02 sdr2 postgres[25079]: [2896524] LOG: connection authorized: user=push database=empdb Jul 22 20:00:02 sdr2 postgres[25079]: [2896525] DEBUG: /usr/bin/postmaster child[25079]: starting with ( Jul 22 20:00:02 sdr2 postgres[25079]: [2896526] DEBUG: ^Ipostgres Jul 22 20:00:02 sdr2 postgres[25079]: [2896527] DEBUG: ^I-v131072 Jul 22 20:00:02 sdr2 postgres[25079]: [2896528] DEBUG: ^I-p Jul 22 20:00:02 sdr2 postgres[25079]: [2896529] DEBUG: ^Iempdb Jul 22 20:00:02 sdr2 postgres[25079]: [2896530] DEBUG: ) Jul 22 20:00:02 sdr2 postgres[25079]: [2896531] DEBUG: InitPostgres Jul 22 20:00:02 sdr2 postgres[25079]: [2896532] DEBUG: StartTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896533-1] LOG: query: set datestyle to 'ISO'; select version(), case when pg_encoding_to_char(1) = 'SQL_ASCII' then 'UNKNOWN' else Jul 22 20:00:02 sdr2 postgres[25079]: [2896533-2] getdatabaseencoding() end; Jul 22 20:00:02 sdr2 postgres[25079]: [2896534] DEBUG: ProcessUtility Jul 22 20:00:02 sdr2 postgres[25079]: [2896535] DEBUG: CommitTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896536] DEBUG: StartTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896537] DEBUG: ProcessQuery Jul 22 20:00:02 sdr2 postgres[25079]: [2896538] DEBUG: CommitTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896539] LOG: duration: 0.035168 sec Jul 22 20:00:02 sdr2 postgres[25079]: [2896540] DEBUG: StartTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896541] LOG: query: set client_encoding = 'UNICODE'; show autocommit Jul 22 20:00:02 sdr2 postgres[25079]: [2896542] DEBUG: ProcessUtility Jul 22 20:00:02 sdr2 postgres[25079]: [2896543] DEBUG: CommitTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896544] DEBUG: StartTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896545] DEBUG: ProcessUtility Jul 22 20:00:02 sdr2 postgres[25079]: [2896546] DEBUG: CommitTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896547] LOG: duration: 0.003609 sec Jul 22 20:00:02 sdr2 postgres[25079]: [2896548] DEBUG: StartTransactionCommand Jul 22 20:00:02 sdr2 postgres[25079]: [2896549] LOG: query: SELECT id_publisher, publisher_name, id_package, package_name FROM v_psl_package_info WHERE id_package = 177 Do you need others informations or may I do something in order to permit you to isolate the problem ? Regards Gaetano
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> The process killed made always the same select ( with different
> id_package ):
> SELECT id_publisher, publisher_name, id_package, package_name
> FROM v_psl_package_info
> WHERE id_package = 177;
> (gdb) where
> #0  0x08171fdd in RelationBuildRuleLock ()
> #1  0x081722ab in RelationBuildDesc ()
> #2  0x0817311c in RelationIdGetRelation ()
> #3  0x08077b95 in relation_open ()
> #4  0x08077d24 in heap_openrv ()
> #5  0x080b1015 in addRangeTableEntry ()
> #6  0x080ab6de in transformTableEntry ()
> #7  0x080ab942 in transformFromClauseItem ()
> #8  0x080ab321 in transformFromClause ()
> #9  0x0809f639 in transformSelectStmt ()
> #10 0x0809dcc4 in transformStmt ()
> #11 0x0809da06 in parse_analyze ()
Hmm.  RelationBuildRuleLock is concerned with reading the rule
information for a table; specifically, it's evidently trying to
read the rules for v_psl_package_info.  I suspect some form of
data corruption in the pg_rewrite row(s) for this table.  Do you
see any misbehavior when you do
select * from pg_rewrite where ev_class = 'v_psl_package_info'::regclass
In particular, are any of the columns NULL in these rows?
        regards, tom lane
			
		From: "Tom Lane" <tgl@sss.pgh.pa.us> > "Mendola Gaetano" <mendola@bigfoot.com> writes: > > The process killed made always the same select ( with different > > id_package ): > > > SELECT id_publisher, publisher_name, id_package, package_name > > FROM v_psl_package_info > > WHERE id_package = 177; > > > (gdb) where > > #0 0x08171fdd in RelationBuildRuleLock () > > #1 0x081722ab in RelationBuildDesc () > > #2 0x0817311c in RelationIdGetRelation () > > #3 0x08077b95 in relation_open () > > #4 0x08077d24 in heap_openrv () > > #5 0x080b1015 in addRangeTableEntry () > > #6 0x080ab6de in transformTableEntry () > > #7 0x080ab942 in transformFromClauseItem () > > #8 0x080ab321 in transformFromClause () > > #9 0x0809f639 in transformSelectStmt () > > #10 0x0809dcc4 in transformStmt () > > #11 0x0809da06 in parse_analyze () > > Hmm. RelationBuildRuleLock is concerned with reading the rule > information for a table; specifically, it's evidently trying to > read the rules for v_psl_package_info. I suspect some form of > data corruption in the pg_rewrite row(s) for this table. Do you > see any misbehavior when you do > > select * from pg_rewrite where ev_class = 'v_psl_package_info'::regclass All seems good. > In particular, are any of the columns NULL in these rows? I did: select * from pg_rewrite where ev_class = 'v_psl_package_info'::regclass and ( rulename is null or ev_class is null or ev_attr isnull or ev_type is null or is_instead is null or ev_qual is null or ev_action is null ); and I had: No rows. v_psl_package_info is a view, shall we look in another direction ? thank you Gaetano
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> From: "Tom Lane" <tgl@sss.pgh.pa.us>
>> I suspect some form of
>> data corruption in the pg_rewrite row(s) for this table.  Do you
>> see any misbehavior when you do
>> 
>> select * from pg_rewrite where ev_class = 'v_psl_package_info'::regclass
> All seems good.
I really don't see any other explanation for a crash in that routine
than problems with the pg_rewrite data.
Do you get this any time you try to do something with the
v_psl_package_info view, or is it a once-in-a-while problem?
If the latter, I wonder whether it could be an issue with a
flaky disk sector ...
        regards, tom lane
			
		From: "Tom Lane" <tgl@sss.pgh.pa.us> > "Mendola Gaetano" <mendola@bigfoot.com> writes: > > From: "Tom Lane" <tgl@sss.pgh.pa.us> > >> I suspect some form of > >> data corruption in the pg_rewrite row(s) for this table. Do you > >> see any misbehavior when you do > >> > >> select * from pg_rewrite where ev_class = 'v_psl_package_info'::regclass > > > All seems good. > > I really don't see any other explanation for a crash in that routine > than problems with the pg_rewrite data. > > Do you get this any time you try to do something with the > v_psl_package_info view, or is it a once-in-a-while problem? > If the latter, I wonder whether it could be an issue with a > flaky disk sector ... Is once-in-a-while but always at 00 minutes. This select is performed each 20 minutes and the core happen always at 00 never at 20 and never at 40! I have scheduled also a vacuumdb -z each 15 minutes so only at 00 these are running together. I had another core at 21:00 today and this is the log of the vacuum ( there was that error just during the pg_rewrite vacuum!): INFO: --Relation pg_catalog.pg_description-- INFO: Pages 12: Changed 0, Empty 0; Tup 1390: Vac 0, Keep 0, UnUsed 1. Total CPU 0.00s/0.00u sec elapsed 0.20 sec. INFO: --Relation pg_toast.pg_toast_16416-- INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.01 sec. INFO: Analyzing pg_catalog.pg_description INFO: --Relation pg_catalog.pg_group-- INFO: Pages 1: Changed 0, Empty 0; Tup 3: Vac 0, Keep 0, UnUsed 9. Total CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: --Relation pg_toast.pg_toast_1261-- INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.01 sec. INFO: Analyzing pg_catalog.pg_group INFO: --Relation pg_catalog.pg_proc-- INFO: Pages 111: Changed 0, Empty 0; Tup 1822: Vac 0, Keep 0, UnUsed 272. Total CPU 0.02s/0.00u sec elapsed 0.29 sec. INFO: --Relation pg_toast.pg_toast_1255-- INFO: Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0. Total CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: Analyzing pg_catalog.pg_proc INFO: --Relation pg_catalog.pg_rewrite-- INFO: Pages 47: Changed 0, Empty 0; Tup 231: Vac 0, Keep 0, UnUsed 74. Total CPU 0.00s/0.00u sec elapsed 0.07 sec. INFO: --Relation pg_toast.pg_toast_16410-- INFO: Pages 22: Changed 0, Empty 0; Tup 77: Vac 0, Keep 0, UnUsed 33. Total CPU 0.00s/0.00u sec elapsed 0.03 sec. INFO: Analyzing pg_catalog.pg_rewrite WARNING: Message from PostgreSQL backend: The Postmaster has informed me that some other backend died abnormallyand possibly corrupted shared memory. I have rolled back the current transaction and am going to terminateyour database system connection and exit. Please reconnect to the database system and repeat your query. server closed the connection unexpectedly This probably means the server terminated abnormally before or whileprocessing the request. connection to server was lost vacuumdb: vacuum empdb failed I think we are near the solution now. Thank you Gaetano
"Mendola Gaetano" <mendola@bigfoot.com> writes:
> Is once-in-a-while but always at 00 minutes. This select is performed each
> 20 minutes and
> the core happen always at 00 never at 20 and never at 40!
Now that is very interesting ... why would that be?
Could we see the definition of this view?
> I had another core at 21:00 today and this is the log of the vacuum
> ( there was that error just during the pg_rewrite vacuum!):
I think that's just coincidence.
        regards, tom lane
			
		"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> "Mendola Gaetano" <mendola@bigfoot.com> writes:
> > Is once-in-a-while but always at 00 minutes. This select is performed
each
> > 20 minutes and
> > the core happen always at 00 never at 20 and never at 40!
>
> Now that is very interesting ... why would that be?
>
> Could we see the definition of this view?
Of course here it is:
CREATE OR REPLACE VIEW v_psl_package_info AS SELECT pb.id_publisher   AS id_publisher,        pb.name           AS
publisher_name,       pk.id_package     AS id_package,        pk.name           AS package_name
 
 FROM  v_packages pk JOIN v_publishers pb USING (id_publisher);
CREATE OR REPLACE VIEW v_packages AS SELECT p.id_package            AS id_package,        p.id_publisher          AS
id_publisher,       p.name                  AS name,        p.information           AS information,
p.description          AS description,        sp_lookup_key('package_type', p.id_package_type)
     AS TYPE,        sp_lookup_key('target', p.id_target)                                AS target,        p.port
          AS port,        p.priority              AS priority,        sp_lookup_key('fec', p.id_fec)
           AS fec,        p.input_group           AS input_group,        p.output_group          AS output_group,
CASE WHEN p.updatable             THEN 1             ELSE 0        END                     AS updatable,
p.checksum             AS checksum,        p.version               AS version,        p.start_file            AS
start_file,       p.view_target_group     AS view_target_group,        p.target_group          AS target_group,
CASEWHEN p.auto_listen             THEN 1             ELSE 0        END                     AS auto_listen,        CASE
WHENp.public_flag             THEN 1             ELSE 0        END                     AS public_flag,
p.needed_version       AS needed_version,        p.logic_version         AS logic_version,        p.package_size
 AS package_size,        ps.id_drm_process       AS id_drm_process,        ps.id_cas_service       AS id_cas_service,
    ps.id_cas_settings      AS id_cas_settings,        ps.id_drm_service       AS id_drm_service
 
 FROM packages p LEFT OUTER JOIN package_security ps USING (id_package) ORDER BY p.id_publisher, p.name;
CREATE OR REPLACE VIEW v_publishers AS SELECT p.id_publisher AS id_publisher,        p.login        AS login,
p.password    AS password,        p.ftp_password AS ftp_password,        p.name         AS name,        p.address
ASaddress,        p.city         AS city,        sp_lookup_descr('country', p.id_country)                       AS
country,       p.zip          AS zip,        p.phone_number AS phone_number,        p.fax_number   AS fax_number,
p.email        AS email,        p.web_site     AS web_site,        p.description  AS description,        v.id_pid
ASid_pid,        v.id_transponder AS id_transponder,        v.transponder  AS transponder,        v.pid          AS
pid,       v.satellite    AS satellite,        v.frequency    AS downlink_freq,        v.polarization AS polarization,
     v.fec          AS fec,        v.symbol_rate  AS symbol_rate,        v.description  AS txp_description,
a.multicast_ipAS multicast_ip
 
 FROM publishers p JOIN v_pids v USING (id_pid)                   JOIN addresses a USING (id_publisher) WHERE
id_publisher<>0ORDER BY p.name;
 
CREATE OR REPLACE VIEW v_pids AS SELECT p.id_pid          AS id_pid,        t.id_transponder  AS id_transponder,
t.name           AS transponder,        p.pid             AS pid,        sp_lookup_descr('orbital_ptn', t.orbital_pos)
                       AS satellite,        t.frequency       AS frequency,        t.polarization    AS polarization,
    t.fec             AS fec,        t.symbol_rate     AS symbol_rate,        t.description     AS description
 
 FROM  pids p JOIN transponders t USING (id_transponder);
and the code for the function sp_lookup_key:
CREATE OR REPLACE FUNCTION sp_lookup_key ( TEXT,INTEGER )
RETURNS TEXT AS'
DECLARE lookup_name  ALIAS FOR $1; my_id_key    ALIAS FOR $2; my_id_lookup INTEGER; my_key       TEXT;
BEGIN
  SELECT INTO my_id_lookup id_lookup  FROM v_lookup_tables  WHERE name = lookup_name::varchar;
  IF NOT FOUND THEN     RETURN NULL;  END IF;
  SELECT INTO my_key key  FROM lookup_tables  WHERE id_table = my_id_lookup  AND   id_key   = my_id_key;
  IF NOT FOUND THEN     RETURN NULL;  END IF;
  RETURN my_key;
END;
' LANGUAGE 'plpgsql'
WITH ( iscachable );
> > I had another core at 21:00 today and this is the log of the vacuum
> > ( there was that error just during the pg_rewrite vacuum!):
>
> I think that's just coincidence.
>
> regards, tom lane
May be was coincidence, I'm going to avoid the vacuum and that select
concurreny and see what happen in these next days.
Just to add some others information,
this is what happens 22 Jul at 20:00
( this time was during the pg_attribute vacuum)
INFO:  --Relation pg_catalog.pg_description--
INFO:  Pages 12: Changed 0, Empty 0; Tup 1390: Vac 0, Keep 0, UnUsed 1.       Total CPU 0.01s/0.00u sec elapsed 0.03
sec.
INFO:  --Relation pg_toast.pg_toast_16416--
INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.       Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Analyzing pg_catalog.pg_description
INFO:  --Relation pg_catalog.pg_group--
INFO:  Pages 1: Changed 0, Empty 0; Tup 3: Vac 0, Keep 0, UnUsed 9.       Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_1261--
INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.       Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Analyzing pg_catalog.pg_group
INFO:  --Relation pg_catalog.pg_proc--
INFO:  Pages 110: Changed 0, Empty 0; Tup 1821: Vac 0, Keep 0, UnUsed 269.       Total CPU 0.01s/0.00u sec elapsed 0.22
sec.
INFO:  --Relation pg_toast.pg_toast_1255--
INFO:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.       Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Analyzing pg_catalog.pg_proc
INFO:  --Relation pg_catalog.pg_rewrite--
INFO:  Pages 47: Changed 0, Empty 0; Tup 231: Vac 0, Keep 0, UnUsed 73.       Total CPU 0.01s/0.00u sec elapsed 0.18
sec.
INFO:  --Relation pg_toast.pg_toast_16410--
INFO:  Pages 22: Changed 0, Empty 0; Tup 77: Vac 0, Keep 0, UnUsed 33.       Total CPU 0.00s/0.00u sec elapsed 0.02
sec.
INFO:  Analyzing pg_catalog.pg_rewrite
INFO:  --Relation pg_catalog.pg_type--
INFO:  Pages 10: Changed 0, Empty 0; Tup 523: Vac 0, Keep 0, UnUsed 27.       Total CPU 0.00s/0.00u sec elapsed 0.03
sec.
INFO:  Analyzing pg_catalog.pg_type
INFO:  --Relation pg_catalog.pg_attribute--
WARNING:  Message from PostgreSQL backend:       The Postmaster has informed me that some other backend       died
abnormallyand possibly corrupted shared memory.       I have rolled back the current transaction and am       going to
terminateyour database system connection and exit.       Please reconnect to the database system and repeat your
query.
server closed the connection unexpectedly       This probably means the server terminated abnormally       before or
whileprocessing the request.
 
connection to server was lost
vacuumdb: vacuum  empdb failed
Thank you
Gaetano