Обсуждение: panic: index siblings mismatch

Поиск
Список
Период
Сортировка

panic: index siblings mismatch

От
Gerd König
Дата:
Hello,

today I recognized an error while playing with pgpool-II 2.1 and
postgresql 8.3.5 (on 64bit linux).
I really don't know if the error is caused by pgpool or if it's
postgresql-related...?!?

There are up to 200 active connections in parallel and thereby sometimes
high load (with high write:read ratio, peak level ~60% write).
Can this be a problem in such a usage scenario, are there some
limitations in pgpool-II <-> number of connections / load ???

any hints are much appreciated, since we don't know what happened...

thanks in advance...::GERD::...



LOG DETAILS:

a) postgresql log of second db node where the error occured
b) pgpool log


a)
==================================================================
2009-03-10 11:16:33 CET [11008]: [1-1] PANIC:  right sibling's left-link
doesn't match: block 695 links to 785 instead of expected 694 in index
"PartyAddress_index"
2009-03-10 11:16:33 CET [11008]: [2-1] STATEMENT:  INSERT INTO
"PartyAddress" ("Name1", "Name2", "Street1", "Street2", "
City", "Postcode", "Region", "Country", "Timezone", "Qualifier",
"Compressed", "PositionId") VALUES (E'Testing', E'', E'
Public street', E'', E'somewhere', E'12345', E'-', E'DE', E'UTC+1',
E'address.loading', FALSE, NULL); select lastval()
2009-03-10 11:16:33 CET [29026]: [3-1] LOG:  server process (PID 11008)
was terminated by signal 6: Aborted
2009-03-10 11:16:33 CET [29026]: [4-1] LOG:  terminating any other
active server processes
2009-03-10 11:16:33 CET [10978]: [5-1] WARNING:  terminating connection
because of crash of another server process
2009-03-10 11:16:33 CET [10978]: [6-1] DETAIL:  The postmaster has
commanded this server process to roll back the curren
t transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
2009-03-10 11:16:33 CET [10978]: [7-1] HINT:  In a moment you should be
able to reconnect to the database and repeat you
r command.
2009-03-10 11:16:33 CET [10947]: [1-1] WARNING:  terminating connection
because of crash of another server process
.....
.....
2009-03-10 11:16:33 CET [29026]: [6-1] LOG:  all server processes
terminated; reinitializing
2009-03-10 11:16:33 CET [11015]: [1-1] LOG:  database system was
interrupted; last known up at 2009-03-10 11:09:10 CET
2009-03-10 11:16:33 CET [11015]: [2-1] LOG:  starting archive recovery
2009-03-10 11:16:33 CET [11015]: [3-1] LOG:  restore_command = 'cp
/exchange/wal_from_iweblin/%f %p'
cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
file or directory
cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
No such file or directory
2009-03-10 11:16:33 CET [11015]: [4-1] LOG:  automatic recovery in progress
2009-03-10 11:16:33 CET [11015]: [5-1] LOG:  redo starts at 1/AABEBFA8
2009-03-10 11:16:33 CET [11015]: [6-1] LOG:  unexpected pageaddr
0/36BEE000 in log file 1, segment 170, offset 12509184
2009-03-10 11:16:33 CET [11015]: [7-1] LOG:  redo done at 1/AABED938
2009-03-10 11:16:33 CET [11015]: [8-1] LOG:  last completed transaction
was at log time 2009-03-10 11:15:17.533913+01
cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
No such file or directory
cp: cannot stat `/exchange/wal_from_iweblin/00000004.history': No such
file or directory
2009-03-10 11:16:33 CET [11015]: [9-1] LOG:  selected new timeline ID: 4
cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
file or directory
2009-03-10 11:16:33 CET [11015]: [10-1] LOG:  archive recovery complete
2009-03-10 11:16:33 CET [11015]: [11-1] LOG:  checkpoint starting:
shutdown immediate
2009-03-10 11:16:34 CET [11015]: [12-1] LOG:  checkpoint complete: wrote
4 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0
recycled; write=0.005 s, sync=0.086 s, total=0.242 s
2009-03-10 11:16:34 CET [11023]: [1-1] LOG:  autovacuum launcher started
2009-03-10 11:16:34 CET [29026]: [7-1] LOG:  database system is ready to
accept connections
=======================================



b)
==========================================
2009-03-10 11:16:23 DEBUG: pid 19059: read kind from frontend X(58)
2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend Q(51)
2009-03-10 11:16:31 LOG:   pid 27107: statement: SELECT d.oid, datname
as name, pg_get_userbyid(datdba) as owner, ds.des
cription, pg_encoding_to_char(encoding) as dbencoding, datistemplate,
datlastsysoid, datconfig, s.spcname as tblspace, d
atconnlimit AS conlimit FROM pg_database d LEFT OUTER JOIN
pg_description ds ON d.oid = ds.objoid LEFT OUTER JOIN pg_tab
lespace s ON d.dattablespace = s.oid WHERE datname = 'test'

2009-03-10 11:16:31 DEBUG: pid 27107: waiting for backend 0 completing
the query
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend T NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: T
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend D NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: D
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend C NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: C
2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
from 0 th backend Z NUM_BACKENDS: 1
2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
backend: Z
2009-03-10 11:16:31 DEBUG: pid 27107: pool_read_message_length: slot: 0
length: 5
2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: message length: 5
2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: transaction state: I
2009-03-10 11:16:31 DEBUG: pid 27107: end_load_balance: end load balance
mode
2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend X(58)
2009-03-10 11:16:36 DEBUG: pid 27107: I am 27107 accept fd 5
2009-03-10 11:16:36 LOG:   pid 27107: connection received:
host=172.28.1.126 port=1946
2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 1234 Minor: 5679
database:  user:
2009-03-10 11:16:36 DEBUG: pid 27107: SSLRequest: sent N; retry startup
2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 3 Minor: 0
database: test user: test
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 0 backend
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 1 backend
2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: skipping slot 1
because backend_status = 3

Re: panic: index siblings mismatch

От
Scott Marlowe
Дата:
On Tue, Mar 10, 2009 at 3:35 PM, Gerd König <koenig@transporeon.com> wrote:
> Hello,
>
> today I recognized an error while playing with pgpool-II 2.1 and
> postgresql 8.3.5 (on 64bit linux).
> I really don't know if the error is caused by pgpool or if it's
> postgresql-related...?!?
> any hints are much appreciated, since we don't know what happened...
>
> thanks in advance...::GERD::...
>
> a) postgresql log of second db node where the error occured
> b) pgpool log
>
> a)
> ==================================================================
> 2009-03-10 11:16:33 CET [11008]: [1-1] PANIC:  right sibling's left-link
> doesn't match: block 695 links to 785 instead of expected 694 in index
> "PartyAddress_index"

This is a sign of index corruption.  Have you had any unexpected power
outages lately?  you could try reindexing the table that's affected
and see if the problem goes away, but I'd be far more worried about
what caused the original corruption to the index.

Re: [Pgpool-general] panic: index siblings mismatch

От
Tatsuo Ishii
Дата:
Have you turned off autovacuum on the first db node before starting
online recovery?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Hello,
>
> today I recognized an error while playing with pgpool-II 2.1 and
> postgresql 8.3.5 (on 64bit linux).
> I really don't know if the error is caused by pgpool or if it's
> postgresql-related...?!?
>
> There are up to 200 active connections in parallel and thereby sometimes
> high load (with high write:read ratio, peak level ~60% write).
> Can this be a problem in such a usage scenario, are there some
> limitations in pgpool-II <-> number of connections / load ???
>
> any hints are much appreciated, since we don't know what happened...
>
> thanks in advance...::GERD::...
>
>
>
> LOG DETAILS:
>
> a) postgresql log of second db node where the error occured
> b) pgpool log
>
>
> a)
> ==================================================================
> 2009-03-10 11:16:33 CET [11008]: [1-1] PANIC:  right sibling's left-link
> doesn't match: block 695 links to 785 instead of expected 694 in index
> "PartyAddress_index"
> 2009-03-10 11:16:33 CET [11008]: [2-1] STATEMENT:  INSERT INTO
> "PartyAddress" ("Name1", "Name2", "Street1", "Street2", "
> City", "Postcode", "Region", "Country", "Timezone", "Qualifier",
> "Compressed", "PositionId") VALUES (E'Testing', E'', E'
> Public street', E'', E'somewhere', E'12345', E'-', E'DE', E'UTC+1',
> E'address.loading', FALSE, NULL); select lastval()
> 2009-03-10 11:16:33 CET [29026]: [3-1] LOG:  server process (PID 11008)
> was terminated by signal 6: Aborted
> 2009-03-10 11:16:33 CET [29026]: [4-1] LOG:  terminating any other
> active server processes
> 2009-03-10 11:16:33 CET [10978]: [5-1] WARNING:  terminating connection
> because of crash of another server process
> 2009-03-10 11:16:33 CET [10978]: [6-1] DETAIL:  The postmaster has
> commanded this server process to roll back the curren
> t transaction and exit, because another server process exited abnormally
> and possibly corrupted shared memory.
> 2009-03-10 11:16:33 CET [10978]: [7-1] HINT:  In a moment you should be
> able to reconnect to the database and repeat you
> r command.
> 2009-03-10 11:16:33 CET [10947]: [1-1] WARNING:  terminating connection
> because of crash of another server process
> .....
> .....
> 2009-03-10 11:16:33 CET [29026]: [6-1] LOG:  all server processes
> terminated; reinitializing
> 2009-03-10 11:16:33 CET [11015]: [1-1] LOG:  database system was
> interrupted; last known up at 2009-03-10 11:09:10 CET
> 2009-03-10 11:16:33 CET [11015]: [2-1] LOG:  starting archive recovery
> 2009-03-10 11:16:33 CET [11015]: [3-1] LOG:  restore_command = 'cp
> /exchange/wal_from_iweblin/%f %p'
> cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
> file or directory
> cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
> No such file or directory
> 2009-03-10 11:16:33 CET [11015]: [4-1] LOG:  automatic recovery in progress
> 2009-03-10 11:16:33 CET [11015]: [5-1] LOG:  redo starts at 1/AABEBFA8
> 2009-03-10 11:16:33 CET [11015]: [6-1] LOG:  unexpected pageaddr
> 0/36BEE000 in log file 1, segment 170, offset 12509184
> 2009-03-10 11:16:33 CET [11015]: [7-1] LOG:  redo done at 1/AABED938
> 2009-03-10 11:16:33 CET [11015]: [8-1] LOG:  last completed transaction
> was at log time 2009-03-10 11:15:17.533913+01
> cp: cannot stat `/exchange/wal_from_iweblin/0000000300000001000000AA':
> No such file or directory
> cp: cannot stat `/exchange/wal_from_iweblin/00000004.history': No such
> file or directory
> 2009-03-10 11:16:33 CET [11015]: [9-1] LOG:  selected new timeline ID: 4
> cp: cannot stat `/exchange/wal_from_iweblin/00000003.history': No such
> file or directory
> 2009-03-10 11:16:33 CET [11015]: [10-1] LOG:  archive recovery complete
> 2009-03-10 11:16:33 CET [11015]: [11-1] LOG:  checkpoint starting:
> shutdown immediate
> 2009-03-10 11:16:34 CET [11015]: [12-1] LOG:  checkpoint complete: wrote
> 4 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0
> recycled; write=0.005 s, sync=0.086 s, total=0.242 s
> 2009-03-10 11:16:34 CET [11023]: [1-1] LOG:  autovacuum launcher started
> 2009-03-10 11:16:34 CET [29026]: [7-1] LOG:  database system is ready to
> accept connections
> =======================================
>
>
>
> b)
> ==========================================
> 2009-03-10 11:16:23 DEBUG: pid 19059: read kind from frontend X(58)
> 2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend Q(51)
> 2009-03-10 11:16:31 LOG:   pid 27107: statement: SELECT d.oid, datname
> as name, pg_get_userbyid(datdba) as owner, ds.des
> cription, pg_encoding_to_char(encoding) as dbencoding, datistemplate,
> datlastsysoid, datconfig, s.spcname as tblspace, d
> atconnlimit AS conlimit FROM pg_database d LEFT OUTER JOIN
> pg_description ds ON d.oid = ds.objoid LEFT OUTER JOIN pg_tab
> lespace s ON d.dattablespace = s.oid WHERE datname = 'test'
>
> 2009-03-10 11:16:31 DEBUG: pid 27107: waiting for backend 0 completing
> the query
> 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
> from 0 th backend T NUM_BACKENDS: 1
> 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
> backend: T
> 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
> from 0 th backend D NUM_BACKENDS: 1
> 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
> backend: D
> 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
> from 0 th backend C NUM_BACKENDS: 1
> 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
> backend: C
> 2009-03-10 11:16:31 DEBUG: pid 27107: read_kind_from_backend: read kind
> from 0 th backend Z NUM_BACKENDS: 1
> 2009-03-10 11:16:31 DEBUG: pid 27107: pool_process_query: kind from
> backend: Z
> 2009-03-10 11:16:31 DEBUG: pid 27107: pool_read_message_length: slot: 0
> length: 5
> 2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: message length: 5
> 2009-03-10 11:16:31 DEBUG: pid 27107: ReadyForQuery: transaction state: I
> 2009-03-10 11:16:31 DEBUG: pid 27107: end_load_balance: end load balance
> mode
> 2009-03-10 11:16:31 DEBUG: pid 27107: read kind from frontend X(58)
> 2009-03-10 11:16:36 DEBUG: pid 27107: I am 27107 accept fd 5
> 2009-03-10 11:16:36 LOG:   pid 27107: connection received:
> host=172.28.1.126 port=1946
> 2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 1234 Minor: 5679
> database:  user:
> 2009-03-10 11:16:36 DEBUG: pid 27107: SSLRequest: sent N; retry startup
> 2009-03-10 11:16:36 DEBUG: pid 27107: Protocol Major: 3 Minor: 0
> database: test user: test
> 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 0 backend
> 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: connecting 1 backend
> 2009-03-10 11:16:36 DEBUG: pid 27107: new_connection: skipping slot 1
> because backend_status = 3
> _______________________________________________
> Pgpool-general mailing list
> Pgpool-general@pgfoundry.org
> http://pgfoundry.org/mailman/listinfo/pgpool-general