Обсуждение: panic: index siblings mismatch
Hello, today I recognized an error while playing with pgpool-II 2.1 and postgresql 8.3.5 (on 64bit linux). I really don't know if the error is caused by pgpool or if it's postgresql-related...?!? There are up to 200 active connections in parallel and thereby sometimes high load (with high write:read ratio, peak level ~60% write). Can this be a problem in such a usage scenario, are there some limitations in pgpool-II <-> number of connections / load ??? any hints are much appreciated, since we don't know what happened... thanks in advance...::GERD::... LOG DETAILS: a) postgresql log of second db node where the error occured b) pgpool log a) ================================================================== 2009-03-10 11:16:33 CET [11008]: [1-1] PANIC: right sibling's left-link doesn't match: block 695 links to 785 instead of expected 694 in index "PartyAddress_index" 2009-03-10 11:16:33 CET [11008]: [2-1] STATEMENT: INSERT INTO "PartyAddress" ("Name1", "Name2", "Street1", "Street2", " City", "Postcode", "Region", "Country", "Timezone", "Qualifier", "Compressed", "PositionId") VALUES (E'Testing', E'', E' Public street', E'', E'somewhere', E'12345', E'-', E'DE', E'UTC+1', E'address.loading', FALSE, NULL); select lastval() 2009-03-10 11:16:33 CET [29026]: [3-1] LOG: server process (PID 11008) was terminated by signal 6: Aborted 2009-03-10 11:16:33 CET [29026]: [4-1] LOG: terminating any other active server processes 2009-03-10 11:16:33 CET [10978]: [5-1] WARNING: terminating connection because of crash of another server process 2009-03-10 11:16:33 CET [10978]: [6-1] DETAIL: The postmaster has commanded this server process to roll back the curren t transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2009-03-10 11:16:33 CET [10978]: [7-1] HINT: In a moment you should be able to reconnect to the database and repeat you r command. 2009-03-10 11:16:33 CET [10947]: [1-1] WARNING: terminating connection because of crash of another server process ..... ..... 2009-03-10 11:16:33 CET [29026]: [6-1] LOG: all server processes terminated; reinitializing 2009-03-10 11:16:33 CET [11015]: [1-1] LOG: database system was interrupted; last known up at 2009-03-10 11:09:10 CET 2009-03-10 11:16:33 CET [11015]: [2-1] LOG: starting archive recovery 2009-03-10 11:16:33 CET [11015]: [3-1] LOG: restore_command = 'cp /exchange/wal_from_iweblin/%f %p' cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such file or directory cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA': No such file or directory 2009-03-10 11:16:33 CET [11015]: [4-1] LOG: automatic recovery in progress 2009-03-10 11:16:33 CET [11015]: [5-1] LOG: redo starts at 1/AABEBFA8 2009-03-10 11:16:33 CET [11015]: [6-1] LOG: unexpected pageaddr 0/36BEE000 in log file 1, segment 170, offset 12509184 2009-03-10 11:16:33 CET [11015]: [7-1] LOG: redo done at 1/AABED938 2009-03-10 11:16:33 CET [11015]: [8-1] LOG: last completed transaction was at log time 2009-03-10 11:15:17.533913+01 cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA': No such file or directory cp: cannot stat `/exchange/wal_from_iweblin/00000004.history': No such file or directory 2009-03-10 11:16:33 CET [11015]: [9-1] LOG: selected new timeline ID: 4 cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such file or directory 2009-03-10 11:16:33 CET [11015]: [10-1] LOG: archive recovery complete 2009-03-10 11:16:33 CET [11015]: [11-1] LOG: checkpoint starting: shutdown immediate 2009-03-10 11:16:34 CET [11015]: [12-1] LOG: checkpoint complete: wrote 4 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.005 s, sync=0.086 s, total=0.242 s 2009-03-10 11:16:34 CET [11023]: [1-1] LOG: autovacuum launcher started 2009-03-10 11:16:34 CET [29026]: [7-1] LOG: database system is ready to accept connections ======================================= b) ========================================== 2009-03-10 11:16:23 DEBUG: pid 19059: read kind from frontend X(58) 2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend Q(51) 2009-03-10 11:16:31 LOG: pid 27107: statement: SELECT d.oid, datname as name, pg_get_userbyid(datdba) as owner, ds.des cription, pg_encoding_to_char(encoding) as dbencoding, datistemplate, datlastsysoid, datconfig, s.spcname as tblspace, d atconnlimit AS conlimit FROM pg_database d LEFT OUTER JOIN pg_description ds ON d.oid = ds.objoid LEFT OUTER JOIN pg_tab lespace s ON d.dattablespace = s.oid WHERE datname = 'test' 2009-03-10 11:16:31 DEBUG: pid 27107: waiting for backend 0 completing the query 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind from 0 th backend T NUM_BACKENDS: 1 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from backend: T 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind from 0 th backend D NUM_BACKENDS: 1 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from backend: D 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind from 0 th backend C NUM_BACKENDS: 1 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from backend: C 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind from 0 th backend Z NUM_BACKENDS: 1 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from backend: Z 2009-03-10 11:16:31 DEBUG: pid 27107: pool_read_message_length: slot: 0 length: 5 2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: message length: 5 2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: transaction state: I 2009-03-10 11:16:31 DEBUG: pid 27107: end_load_balance: end load balance mode 2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend X(58) 2009-03-10 11:16:36 DEBUG: pid 27107: I am 27107 accept fd 5 2009-03-10 11:16:36 LOG: pid 27107: connection received: host=172.28.1.126 port=1946 2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 1234 Minor: 5679 database: user: 2009-03-10 11:16:36 DEBUG: pid 27107: SSLRequest: sent N; retry startup 2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 3 Minor: 0 database: test user: test 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 0 backend 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 1 backend 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: skipping slot 1 because backend_status = 3
On Tue, Mar 10, 2009 at 3:35 PM, Gerd König <koenig@transporeon.com> wrote: > Hello, > > today I recognized an error while playing with pgpool-II 2.1 and > postgresql 8.3.5 (on 64bit linux). > I really don't know if the error is caused by pgpool or if it's > postgresql-related...?!? > any hints are much appreciated, since we don't know what happened... > > thanks in advance...::GERD::... > > a) postgresql log of second db node where the error occured > b) pgpool log > > a) > ================================================================== > 2009-03-10 11:16:33 CET [11008]: [1-1] PANIC: right sibling's left-link > doesn't match: block 695 links to 785 instead of expected 694 in index > "PartyAddress_index" This is a sign of index corruption. Have you had any unexpected power outages lately? you could try reindexing the table that's affected and see if the problem goes away, but I'd be far more worried about what caused the original corruption to the index.
Have you turned off autovacuum on the first db node before starting online recovery? -- Tatsuo Ishii SRA OSS, Inc. Japan > Hello, > > today I recognized an error while playing with pgpool-II 2.1 and > postgresql 8.3.5 (on 64bit linux). > I really don't know if the error is caused by pgpool or if it's > postgresql-related...?!? > > There are up to 200 active connections in parallel and thereby sometimes > high load (with high write:read ratio, peak level ~60% write). > Can this be a problem in such a usage scenario, are there some > limitations in pgpool-II <-> number of connections / load ??? > > any hints are much appreciated, since we don't know what happened... > > thanks in advance...::GERD::... > > > > LOG DETAILS: > > a) postgresql log of second db node where the error occured > b) pgpool log > > > a) > ================================================================== > 2009-03-10 11:16:33 CET [11008]: [1-1] PANIC: right sibling's left-link > doesn't match: block 695 links to 785 instead of expected 694 in index > "PartyAddress_index" > 2009-03-10 11:16:33 CET [11008]: [2-1] STATEMENT: INSERT INTO > "PartyAddress" ("Name1", "Name2", "Street1", "Street2", " > City", "Postcode", "Region", "Country", "Timezone", "Qualifier", > "Compressed", "PositionId") VALUES (E'Testing', E'', E' > Public street', E'', E'somewhere', E'12345', E'-', E'DE', E'UTC+1', > E'address.loading', FALSE, NULL); select lastval() > 2009-03-10 11:16:33 CET [29026]: [3-1] LOG: server process (PID 11008) > was terminated by signal 6: Aborted > 2009-03-10 11:16:33 CET [29026]: [4-1] LOG: terminating any other > active server processes > 2009-03-10 11:16:33 CET [10978]: [5-1] WARNING: terminating connection > because of crash of another server process > 2009-03-10 11:16:33 CET [10978]: [6-1] DETAIL: The postmaster has > commanded this server process to roll back the curren > t transaction and exit, because another server process exited abnormally > and possibly corrupted shared memory. > 2009-03-10 11:16:33 CET [10978]: [7-1] HINT: In a moment you should be > able to reconnect to the database and repeat you > r command. > 2009-03-10 11:16:33 CET [10947]: [1-1] WARNING: terminating connection > because of crash of another server process > ..... > ..... > 2009-03-10 11:16:33 CET [29026]: [6-1] LOG: all server processes > terminated; reinitializing > 2009-03-10 11:16:33 CET [11015]: [1-1] LOG: database system was > interrupted; last known up at 2009-03-10 11:09:10 CET > 2009-03-10 11:16:33 CET [11015]: [2-1] LOG: starting archive recovery > 2009-03-10 11:16:33 CET [11015]: [3-1] LOG: restore_command = 'cp > /exchange/wal_from_iweblin/%f %p' > cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such > file or directory > cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA': > No such file or directory > 2009-03-10 11:16:33 CET [11015]: [4-1] LOG: automatic recovery in progress > 2009-03-10 11:16:33 CET [11015]: [5-1] LOG: redo starts at 1/AABEBFA8 > 2009-03-10 11:16:33 CET [11015]: [6-1] LOG: unexpected pageaddr > 0/36BEE000 in log file 1, segment 170, offset 12509184 > 2009-03-10 11:16:33 CET [11015]: [7-1] LOG: redo done at 1/AABED938 > 2009-03-10 11:16:33 CET [11015]: [8-1] LOG: last completed transaction > was at log time 2009-03-10 11:15:17.533913+01 > cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA': > No such file or directory > cp: cannot stat `/exchange/wal_from_iweblin/00000004.history': No such > file or directory > 2009-03-10 11:16:33 CET [11015]: [9-1] LOG: selected new timeline ID: 4 > cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such > file or directory > 2009-03-10 11:16:33 CET [11015]: [10-1] LOG: archive recovery complete > 2009-03-10 11:16:33 CET [11015]: [11-1] LOG: checkpoint starting: > shutdown immediate > 2009-03-10 11:16:34 CET [11015]: [12-1] LOG: checkpoint complete: wrote > 4 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 > recycled; write=0.005 s, sync=0.086 s, total=0.242 s > 2009-03-10 11:16:34 CET [11023]: [1-1] LOG: autovacuum launcher started > 2009-03-10 11:16:34 CET [29026]: [7-1] LOG: database system is ready to > accept connections > ======================================= > > > > b) > ========================================== > 2009-03-10 11:16:23 DEBUG: pid 19059: read kind from frontend X(58) > 2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend Q(51) > 2009-03-10 11:16:31 LOG: pid 27107: statement: SELECT d.oid, datname > as name, pg_get_userbyid(datdba) as owner, ds.des > cription, pg_encoding_to_char(encoding) as dbencoding, datistemplate, > datlastsysoid, datconfig, s.spcname as tblspace, d > atconnlimit AS conlimit FROM pg_database d LEFT OUTER JOIN > pg_description ds ON d.oid = ds.objoid LEFT OUTER JOIN pg_tab > lespace s ON d.dattablespace = s.oid WHERE datname = 'test' > > 2009-03-10 11:16:31 DEBUG: pid 27107: waiting for backend 0 completing > the query > 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind > from 0 th backend T NUM_BACKENDS: 1 > 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from > backend: T > 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind > from 0 th backend D NUM_BACKENDS: 1 > 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from > backend: D > 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind > from 0 th backend C NUM_BACKENDS: 1 > 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from > backend: C > 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind > from 0 th backend Z NUM_BACKENDS: 1 > 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from > backend: Z > 2009-03-10 11:16:31 DEBUG: pid 27107: pool_read_message_length: slot: 0 > length: 5 > 2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: message length: 5 > 2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: transaction state: I > 2009-03-10 11:16:31 DEBUG: pid 27107: end_load_balance: end load balance > mode > 2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend X(58) > 2009-03-10 11:16:36 DEBUG: pid 27107: I am 27107 accept fd 5 > 2009-03-10 11:16:36 LOG: pid 27107: connection received: > host=172.28.1.126 port=1946 > 2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 1234 Minor: 5679 > database: user: > 2009-03-10 11:16:36 DEBUG: pid 27107: SSLRequest: sent N; retry startup > 2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 3 Minor: 0 > database: test user: test > 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 0 backend > 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 1 backend > 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: skipping slot 1 > because backend_status = 3 > _______________________________________________ > Pgpool-general mailing list > Pgpool-general@pgfoundry.org > http://pgfoundry.org/mailman/listinfo/pgpool-general