Обсуждение: Still big problems with pg_dump!

Поиск
Список
Период
Сортировка

Still big problems with pg_dump!

От
Wim
Дата:

Hello guys,

I have still problems with dumping my database....

I have postgres 7.2.1 running on a solaris 8 server. When I try to do a
pg_dump of my database, I get the following message:
pg_dump: query to obtain list of tables failed: server closed the
connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

I connect to the database and try to view the tables with \dt and \dS.
Now I get:
ERROR:  AllocSetFree: cannot find block containing chunk 4c5ad0
I retry:
ERROR:  AllocSetFree: cannot find block containing chunk 4860d0

I can view the tables with: \d tablename

Some people suggest a drive failure, but I checked that and found no
problems...
I REINDEXED the whole database... problem still the same...
Tried a VACUUM... still not working...


I must say that one of the table contains more than 3.000.000 rows,
another more than 1.400.000...
Select, update, delete, insert works, just the pg_dump(all) and the \dt
\dS commands...
I must say that I had this problem a few months before, I got some help
then, but that couldn't solve my problem,
I recreated the database from scratch and copied the data, to fix thing
quickly. Thing went well for about two months :-(
Now the problem raises again, and I'm trying to find a solution without
reinstalling the whole thing allover again.


Some advice/help from the specialists?

Cheers!

Wim.

Some info from the debug logfile:
---------------------------------

DEBUG:  StartTransactionCommand
DEBUG:  query: SELECT c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'i'
THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' END as "Type",
  u.usename as "Owner"
FROM pg_class c LEFT JOIN pg_user u ON c.relowner = u.usesysid
WHERE c.relkind IN ('r','')
  AND c.relname !~ '^pg_'
ORDER BY 1;
DEBUG:  parse tree: { QUERY :command 1  :utility <> :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable ({ RTE :relname pg_class :relid 1259
:subquery <> :alias { ATTR :relname c :attrs <>} :eref { ATTR :relname c
:attrs ( "relname"   "reltype"   "relowner"   "relam"   "relfilenode"
"relpages"   "reltuples"   "reltoastrelid"   "reltoastidxid"
"relhasindex"   "relisshared"   "relkind"   "relnatts"   "relchecks"
"reltriggers"   "relukeys"   "relfkeys"   "relrefs"   "relhasoids"
"relhaspkey"   "relhasrules"   "relhassubclass"   "relacl" )} :inh true
:inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0} {
RTE :relname pg_user :relid 16478  :subquery <> :alias { ATTR :relname u
:attrs <>} :eref { ATTR :relname u :attrs ( "usename"   "usesysid"
"usecreatedb"   "usetrace"   "usesuper"   "usecatupd"   "passwd"
"valuntil" )} :inh true :inFromCl true :checkForRead true :checkForWrite
false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ JOINEXPR
:jointype 1 :isNatural false :larg { RANGETBLREF 1 } :rarg { RANGETBLREF
2 } :using <> :quals { EXPR :typeOid 16  :opType op :oper { OPER :opno
96 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 3 :vartype
23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 3} { VAR :varno
2 :varattno 2 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 2
:varoattno 2})} :alias <> :colnames ( "relname"   "reltype"
"relowner"   "relam"   "relfilenode"   "relpages"   "reltuples"
"reltoastrelid"   "reltoastidxid"   "relhasindex"   "relisshared"
"relkind"   "relnatts"   "relchecks"   "reltriggers"   "relukeys"
"relfkeys"   "relrefs"   "relhasoids"   "relhaspkey"   "relhasrules"
"relhassubclass"   "relacl"   "usename"   "usesysid"   "usecreatedb"
"usetrace"   "usesuper"   "usecatupd"   "passwd"   "valuntil" ) :colvars
({ VAR :varno 1 :varattno 1 :vartype 19 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 1} { VAR :varno 1 :varattno 2 :vartype 26
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 2} { VAR :varno 1
:varattno 3 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 3} { VAR :varno 1 :varattno 4 :vartype 26 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 4} { VAR :varno 1 :varattno 5
:vartype 26 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 5} {
VAR :varno 1 :varattno 6 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 6} { VAR :varno 1 :varattno 7 :vartype 700
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 7} { VAR :varno 1
:varattno 8 :vartype 26 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 8} { VAR :varno 1 :varattno 9 :vartype 26 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 9} { VAR :varno 1 :varattno 10
:vartype 16 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 10} {
VAR :varno 1 :varattno 11 :vartype 16 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 11} { VAR :varno 1 :varattno 12 :vartype 18
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 12} { VAR :varno 1
:varattno 13 :vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 13} { VAR :varno 1 :varattno 14 :vartype 21 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 14} { VAR :varno 1 :varattno 15
:vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 15} {
VAR :varno 1 :varattno 16 :vartype 21 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 16} { VAR :varno 1 :varattno 17 :vartype 21
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 17} { VAR :varno 1
:varattno 18 :vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 18} { VAR :varno 1 :varattno 19 :vartype 16 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 19} { VAR :varno 1 :varattno 20
:vartype 16 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 20} {
VAR :varno 1 :varattno 21 :vartype 16 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 21} { VAR :varno 1 :varattno 22 :vartype 16
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 22} { VAR :varno 1
:varattno 23 :vartype 1034 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 23} { VAR :varno 2 :varattno 1 :vartype 19 :vartypmod -1
:varlevelsup 0 :varnoold 2 :varoattno 1} { VAR :varno 2 :varattno 2
:vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 2 :varoattno 2} {
VAR :varno 2 :varattno 3 :vartype 16 :vartypmod -1  :varlevelsup 0
:varnoold 2 :varoattno 3} { VAR :varno 2 :varattno 4 :vartype 16
:vartypmod -1  :varlevelsup 0 :varnoold 2 :varoattno 4} { VAR :varno 2
:varattno 5 :vartype 16 :vartypmod -1  :varlevelsup 0 :varnoold 2
:varoattno 5} { VAR :varno 2 :varattno 6 :vartype 16 :vartypmod -1
:varlevelsup 0 :varnoold 2 :varoattno 6} { VAR :varno 2 :varattno 7
:vartype 25 :vartypmod -1  :varlevelsup 0 :varnoold 2 :varoattno 7} {
VAR :varno 2 :varattno 8 :vartype 702 :vartypmod -1  :varlevelsup 0
:varnoold 2 :varoattno 8})}) :quals { EXPR :typeOid 16  :opType and
:oper <> :args ({ EXPR :typeOid 16  :opType or :oper <> :args ({ EXPR
:typeOid 16  :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 }
:args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18
:constlen 1 :constbyval true :constisnull false :constvalue  1 [ 0 0 0
114 ] })} { EXPR :typeOid 16  :opType op :oper { OPER :opno 92 :opid 0
:opresulttype 16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 12} { CONST
:consttype 18 :constlen 1 :constbyval true :constisnull false
:constvalue  1 [ 0 0 0 0 ] })})} { EXPR :typeOid 16  :opType op :oper {
OPER :opno 640 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1
:varattno 1 :vartype 19 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 1} { CONST :consttype 25 :constlen -1 :constbyval false
:constisnull false :constvalue  8 [ 0 0 0 8 94 112 103 95 ] })})}}
:rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1
:restype 19 :restypmod -1 :resname Name :reskey 0 :reskeyop 0
:ressortgroupref 1 :resjunk false } :expr { VAR :varno 1 :varattno 1
:vartype 19 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 1}} {
TARGETENTRY :resdom { RESDOM :resno 2 :restype 25 :restypmod -1 :resname
Type :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr {
CASE :casetype 25 :arg <> :args ({ WHEN { EXPR :typeOid 16  :opType op
:oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1
:varattno 12 :vartype 18 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval true
:constisnull false :constvalue  1 [ 0 0 0 114 ] })} :then { CONST
:consttype 25 :constlen -1 :constbyval false :constisnull false
:constvalue  9 [ 0 0 0 9 116 97 98 108 101 ] }} { WHEN { EXPR :typeOid
16  :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({
VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval
true :constisnull false :constvalue  1 [ 0 0 0 118 ] })} :then { CONST
:consttype 25 :constlen -1 :constbyval false :constisnull false
:constvalue  8 [ 0 0 0 8 118 105 101 119 ] }} { WHEN { EXPR :typeOid 16
:opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({ VAR
:varno 1 :varattno 12 :vartype 18 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval
true :constisnull false :constvalue  1 [ 0 0 0 105 ] })} :then { CONST
:consttype 25 :constlen -1 :constbyval false :constisnull false
:constvalue  9 [ 0 0 0 9 105 110 100 101 120 ] }} { WHEN { EXPR :typeOid
16  :opType op :oper { OPER :opno 92 :opid 0 :opresulttype 16 } :args ({
VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno 12} { CONST :consttype 18 :constlen 1 :constbyval
true :constisnull false :constvalue  1 [ 0 0 0 83 ] })} :then { CONST
:consttype 25 :constlen -1 :constbyval false :constisnull false
:constvalue  12 [ 0 0 0 12 115 101 113 117 101 110 99 101 ] }} { WHEN {
EXPR :typeOid 16  :opType op :oper { OPER :opno 92 :opid 0 :opresulttype
16 } :args ({ VAR :varno 1 :varattno 12 :vartype 18 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 12} { CONST :consttype 18
:constlen 1 :constbyval true :constisnull false :constvalue  1 [ 0 0 0
115 ] })} :then { CONST :consttype 25 :constlen -1 :constbyval false
:constisnull false :constvalue  11 [ 0 0 0 11 115 112 101 99 105 97 108
] }}) :defresult { CONST :consttype 25 :constlen -1 :constbyval false
:constisnull true :constvalue <>}}} { TARGETENTRY :resdom { RESDOM
:resno 3 :restype 19 :restypmod -1 :resname Owner :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 2 :varattno 1
:vartype 19 :vartypmod -1  :varlevelsup 0 :varnoold 2 :varoattno 1}})
:groupClause <> :havingQual <> :distinctClause <> :sortClause ({
SORTCLAUSE :tleSortGroupRef 1 :sortop 660 }) :limitOffset <> :limitCount
<> :setOperations <> :resultRelations ()}
ERROR:  AllocSetFree: cannot find block containing chunk 48a6d8
DEBUG:  AbortCurrentTransaction
DEBUG:  BackendStartup: forked pid=9789 socket=9
postmaster child[9789]: starting with (postgres -d3 -v131072 -p
belbonedb_v2 )
DEBUG:  InitPostgres
DEBUG:  StartTransactionCommand
DEBUG:  query: set DateStyle to 'ISO'
DEBUG:  parse tree: { QUERY :command 5  :utility ?  :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable <> :jointree <> :rowMarks () :targetList <>
:groupClause <> :havingQual <> :distinctClause <> :sortClause <>
:limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}
DEBUG:  ProcessUtility: set DateStyle to 'ISO'
DEBUG:  CommitTransactionCommand
DEBUG:  StartTransactionCommand
DEBUG:  query: set geqo to 'OFF'
DEBUG:  parse tree: { QUERY :command 5  :utility ?  :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable <> :jointree <> :rowMarks () :targetList <>
:groupClause <> :havingQual <> :distinctClause <> :sortClause <>
:limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}
DEBUG:  ProcessUtility: set geqo to 'OFF'
DEBUG:  CommitTransactionCommand
DEBUG:  StartTransactionCommand
DEBUG:  query: set ksqo to 'ON'
DEBUG:  parse tree: { QUERY :command 5  :utility ?  :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable <> :jointree <> :rowMarks () :targetList <>
:groupClause <> :havingQual <> :distinctClause <> :sortClause <>
:limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}
DEBUG:  ProcessUtility: set ksqo to 'ON'
DEBUG:  CommitTransactionCommand
DEBUG:  StartTransactionCommand
DEBUG:  query: select oid from pg_type where typname='lo'
DEBUG:  parse tree: { QUERY :command 1  :utility <> :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable ({ RTE :relname pg_type :relid 1247
:subquery <> :alias <> :eref { ATTR :relname pg_type :attrs (
"typname"   "typowner"   "typlen"   "typprtlen"   "typbyval"
"typtype"   "typisdefined"   "typdelim"   "typrelid"   "typelem"
"typinput"   "typoutput"   "typreceive"   "typsend"   "typalign"
"typstorage"   "typdefault" )} :inh true :inFromCl true :checkForRead
true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR
:fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16  :opType op
:oper { OPER :opno 93 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1
:varattno 1 :vartype 19 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 1} { CONST :consttype 19 :constlen 32 :constbyval false
:constisnull false :constvalue  32 [ 108 111 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] })}} :rowMarks () :targetList ({
TARGETENTRY :resdom { RESDOM :resno 1 :restype 26 :restypmod -1 :resname
oid :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr {
VAR :varno 1 :varattno -2 :vartype 26 :vartypmod -1  :varlevelsup 0
:varnoold 1 :varoattno -2}}) :groupClause <> :havingQual <>
:distinctClause <> :sortClause <> :limitOffset <> :limitCount <>
:setOperations <> :resultRelations ()}
DEBUG:  ProcessQuery
DEBUG:  CommitTransactionCommand
DEBUG:  StartTransactionCommand
DEBUG:  query: select version()
DEBUG:  parse tree: { QUERY :command 1  :utility <> :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable <> :jointree { FROMEXPR :fromlist <> :quals
<>} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1
:restype 25 :restypmod -1 :resname version :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 25  :opType
func :oper { FUNC :funcid 89 :functype 25 } :args <>}}) :groupClause <>
:havingQual <> :distinctClause <> :sortClause <> :limitOffset <>
:limitCount <> :setOperations <> :resultRelations ()}
DEBUG:  ProcessQuery
DEBUG:  CommitTransactionCommand
DEBUG:  StartTransactionCommand
DEBUG:  query: select pg_client_encoding()
DEBUG:  parse tree: { QUERY :command 1  :utility <> :resultRelation 0
:into <> :isPortal false :isBinary false :isTemp false :hasAggs false
:hasSubLinks false :rtable <> :jointree { FROMEXPR :fromlist <> :quals
<>} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1
:restype 19 :restypmod -1 :resname pg_client_encoding :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid
19  :opType func :oper { FUNC :funcid 810 :functype 19 } :args <>}})
:groupClause <> :havingQual <> :distinctClause <> :sortClause <>
:limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}
DEBUG:  ProcessQuery
DEBUG:  CommitTransactionCommand
DEBUG:  StartTransactionCommand
DEBUG:  query: SELECT Config, nValue FROM MSysConf
ERROR:  Relation "msysconf" does not exist
DEBUG:  AbortCurrentTransaction
DEBUG:  StartTransactionCommand





Re: Still big problems with pg_dump!

От
Andrew Sullivan
Дата:
-hackers removed.

On Tue, Sep 17, 2002 at 10:11:41AM +0200, Wim wrote:

> ERROR:  AllocSetFree: cannot find block containing chunk 4c5ad0

This is definitely some sort of disk problem.  Either you've written
bad data to the disk in some way, or else the disk is corrupted or
damaged.

If it is a hardware problem, the obvious suspects are memory (I'd
discount this idea unless everything else doesn't check out), a disk
failure, or a controller failure.

It could be OS related as well.  Several of the 2.4 Linux kernel
series, for instance, had roblems with massive filesystem corruption.

> Some people suggest a drive failure, but I checked that and found no
> problems...

How did you check?

> I must say that one of the table contains more than 3.000.000 rows,
> another more than 1.400.000...

When is your most recent backup?  If you can't pg_dump, you will be
needing that backup.

> I must say that I had this problem a few months before, I got some help
> then, but that couldn't solve my problem,
> I recreated the database from scratch and copied the data, to fix thing
> quickly. Thing went well for about two months :-(

So you re-installed the data set on a machine that had somehow
failed, you don't know why, and hoped that the problem would
solve itself?  Uh, that wasn't a good idea.  In the future, if you
have a problem which people suggest might be, for instance, a bad
disk, it'd be a _very good_ idea to figure out precisely what the
problem is before relying on the identical hardware again.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Still big problems with pg_dump!

От
Wim
Дата:

Andrew Sullivan wrote:

>-hackers removed.
>
>On Tue, Sep 17, 2002 at 10:11:41AM +0200, Wim wrote:
>
>
>
>>ERROR:  AllocSetFree: cannot find block containing chunk 4c5ad0
>>
>>
>
>This is definitely some sort of disk problem.  Either you've written
>bad data to the disk in some way, or else the disk is corrupted or
>damaged.
>
>If it is a hardware problem, the obvious suspects are memory (I'd
>discount this idea unless everything else doesn't check out), a disk
>failure, or a controller failure.
>
>It could be OS related as well.  Several of the 2.4 Linux kernel
>series, for instance, had roblems with massive filesystem corruption.
>
>
Postgres is running on solaris 8...
It is the same database as previous time that has the problem, but not
the same table.
I think it's an error is the system tables.

>
>
>>Some people suggest a drive failure, but I checked that and found no
>>problems...
>>
>>
>
>How did you check?
>
>
with fsck.

>
>
>>I must say that one of the table contains more than 3.000.000 rows,
>>another more than 1.400.000...
>>
>>
>
>When is your most recent backup?  If you can't pg_dump, you will be
>needing that backup.
>
>
Have backup... I can still SQL COPY to a text file, so that's no problem
so far.

>
>
>>I must say that I had this problem a few months before, I got some help
>>then, but that couldn't solve my problem,
>>I recreated the database from scratch and copied the data, to fix thing
>>quickly. Thing went well for about two months :-(
>>
>>
>
>So you re-installed the data set on a machine that had somehow
>failed, you don't know why, and hoped that the problem would
>solve itself?  Uh, that wasn't a good idea.  In the future, if you
>have a problem which people suggest might be, for instance, a bad
>disk, it'd be a _very good_ idea to figure out precisely what the
>problem is before relying on the identical hardware again.
>
>A
>
>
>
I know, I don't have much spare hardware, and the database had to work
quickly, it was the
only solution then.
Checked the disk, reinstalled the OS and still waiting for a CPU and
memory upgrade.



Re: Still big problems with pg_dump!

От
Andrew Sullivan
Дата:
On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote:
> >
> >>ERROR:  AllocSetFree: cannot find block containing chunk 4c5ad0
> >>
> >>
> >
> >This is definitely some sort of disk problem.  Either you've written
> >bad data to the disk in some way, or else the disk is corrupted or
> >damaged.
> >
> Postgres is running on solaris 8...
> It is the same database as previous time that has the problem, but not
> the same table.

Someone else suggested that this would not be the error when you have
written bad data to the disk (I thought you could have this if the
controller was flakey and wrote bad data in the past.  Maybe I'm
wrong.  Probably).

> >How did you check?
> >
> >
> with fsck.

That won't help you if the controller is coming and going; you might
find that it works one time, and not another.  Indeed, a disk on its
way out can even pass fsck sometimes, although it's pretty unusual.

> Have backup... I can still SQL COPY to a text file, so that's no problem
> so far.

Well, that's good.  I'd suggest backing up _really often_ until you
know what the problem is, especially since this is production.

> I know, I don't have much spare hardware, and the database had to work
> quickly, it was the
> only solution then.
> Checked the disk, reinstalled the OS and still waiting for a CPU and
> memory upgrade.

Do you have another place to store the database in the meantime -- an
Intel box with a cheap disk, or anything?  At least you'd have
another copy of the database somewhere that way.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Still big problems with pg_dump!

От
"Shridhar Daithankar"
Дата:
On 17 Sep 2002 at 10:33, Andrew Sullivan wrote:

> On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote:
> > I know, I don't have much spare hardware, and the database had to work
> > quickly, it was the
> > only solution then.
> > Checked the disk, reinstalled the OS and still waiting for a CPU and
> > memory upgrade.
>
> Do you have another place to store the database in the meantime -- an
> Intel box with a cheap disk, or anything?  At least you'd have
> another copy of the database somewhere that way.

Yeah. NFS or SMB mounted database would work.. albeit slowly.. Hopefully it's
not your root disk..

Bye
 Shridhar

--
Davis's Dictum:    Problems that go away by themselves, come back by themselves.


Re: Still big problems with pg_dump!

От
Wim
Дата:

Andrew Sullivan wrote:

<snip>...

>>>How did you check?
>>>
>>>
>>>
>>>
>>with fsck.
>>
>>
>
>That won't help you if the controller is coming and going; you might
>find that it works one time, and not another.  Indeed, a disk on its
>way out can even pass fsck sometimes, although it's pretty unusual.
>
The DB is located on a RAID5 disk array...

>
>
>>Have backup... I can still SQL COPY to a text file, so that's no problem
>>so far.
>>
>>
>
>Well, that's good.  I'd suggest backing up _really often_ until you
>know what the problem is, especially since this is production.
>
>
>
>>I know, I don't have much spare hardware, and the database had to work
>>quickly, it was the
>>only solution then.
>>Checked the disk, reinstalled the OS and still waiting for a CPU and
>>memory upgrade.
>>
>>
>
>Do you have another place to store the database in the meantime -- an
>Intel box with a cheap disk, or anything?  At least you'd have
>another copy of the database somewhere that way.
>
>A
>
>
>
Don't have a disk that can store my database...


Still searching....


Cheers!

Wim


Re: Still big problems with pg_dump!

От
Andrew Sullivan
Дата:
On Tue, Sep 17, 2002 at 04:46:00PM +0200, Wim wrote:
> >
> >That won't help you if the controller is coming and going; you might
> >find that it works one time, and not another.  Indeed, a disk on its
> >way out can even pass fsck sometimes, although it's pretty unusual.
> >
> The DB is located on a RAID5 disk array...

Hmm.  _That's_ interesting.  I'd bet on a flakey controller, then.
Is it hardware RAID?  (I assume so.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: [ADMIN] Still big problems with pg_dump!

От
Tom Lane
Дата:
Andrew Sullivan <andrew@libertyrms.info> writes:
> On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote:
>> ERROR:  AllocSetFree: cannot find block containing chunk 4c5ad0
>>
>> Postgres is running on solaris 8...

> Someone else suggested that this would not be the error when you have
> written bad data to the disk (I thought you could have this if the
> controller was flakey and wrote bad data in the past.  Maybe I'm
> wrong.  Probably).

Actually, what it looks like to me is a memory clobber; I don't think
bad data on disk would be likely to lead to this particular type of
failure.  But writing one byte too many into a string, and thereby
zeroing the high-order byte of an adjacent pointer, could lead to
exactly this message when we later try to pfree() the pointer.

I am wondering if Wim is running into that same Solaris snprintf() bug
that we discovered awhile back --- it was not clear if the bug still
exists in Solaris 8, but the symptoms sure match.  See
http://archives.postgresql.org/pgsql-bugs/2002-07/msg00059.php

It would be useful to see a stack traceback from the point of the error,
if possible.

            regards, tom lane

Re: [ADMIN] Still big problems with pg_dump!

От
Wim
Дата:

Andrew Sullivan wrote:

>On Tue, Sep 17, 2002 at 04:46:00PM +0200, Wim wrote:
>
>
>>>That won't help you if the controller is coming and going; you might
>>>find that it works one time, and not another.  Indeed, a disk on its
>>>way out can even pass fsck sometimes, although it's pretty unusual.
>>>
>>>
>>>
>>The DB is located on a RAID5 disk array...
>>
>>
>
>Hmm.  _That's_ interesting.  I'd bet on a flakey controller, then.
>Is it hardware RAID?  (I assume so.)
>
>A
>
>
>
Yep, hardware RAID, infact it's a SUN T3 disk array  with 9*36GB SCSI
disks...

Cheers!

Wim


Re: [ADMIN] Still big problems with pg_dump!

От
Andrew Sullivan
Дата:
On Tue, Sep 17, 2002 at 10:55:18AM -0400, Tom Lane wrote:
>
> I am wondering if Wim is running into that same Solaris snprintf() bug
> that we discovered awhile back --- it was not clear if the bug still
> exists in Solaris 8, but the symptoms sure match.  See
> http://archives.postgresql.org/pgsql-bugs/2002-07/msg00059.php

Hmm, good point.  That was only a problem when compiled with the
64-bit libraries, IIRC.  Wim, what does 'file postmaster' say?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: [ADMIN] Still big problems with pg_dump!

От
Wim
Дата:

Tom Lane wrote:

>Andrew Sullivan <andrew@libertyrms.info> writes:
>
>
>>On Tue, Sep 17, 2002 at 04:25:45PM +0200, Wim wrote:
>>
>>
>>>ERROR:  AllocSetFree: cannot find block containing chunk 4c5ad0
>>>
>>>Postgres is running on solaris 8...
>>>
>>>
>
>
>
>>Someone else suggested that this would not be the error when you have
>>written bad data to the disk (I thought you could have this if the
>>controller was flakey and wrote bad data in the past.  Maybe I'm
>>wrong.  Probably).
>>
>>
>
>Actually, what it looks like to me is a memory clobber; I don't think
>bad data on disk would be likely to lead to this particular type of
>failure.  But writing one byte too many into a string, and thereby
>zeroing the high-order byte of an adjacent pointer, could lead to
>exactly this message when we later try to pfree() the pointer.
>
>I am wondering if Wim is running into that same Solaris snprintf() bug
>that we discovered awhile back --- it was not clear if the bug still
>exists in Solaris 8, but the symptoms sure match.  See
>http://archives.postgresql.org/pgsql-bugs/2002-07/msg00059.php
>
>It would be useful to see a stack traceback from the point of the error,
>if possible.
>
>            regards, tom lane
>
I Would like to send a stack traceback, but I need some halp on this
(never done this before).

some add. info:

SELECT relname FROM pg_class WHERE relname like 'pg_%' AND relkind = 'r';
gives:
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

SELECT relname FROM pg_class;
works well...

SELECT relname, relkind from pg_class WHERE relkind='r';
works also...

SELECT relname, relkind from pg_class WHERE relname like 'pg_%';
produces the same error as above...

>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html
>
>
>
>
Cheers!

Wim


Re: [ADMIN] Still big problems with pg_dump!

От
Tom Lane
Дата:
Wim <wdh@belbone.be> writes:
> Tom Lane wrote:
>> It would be useful to see a stack traceback from the point of the error,
>> if possible.

> I Would like to send a stack traceback, but I need some halp on this
> (never done this before).

> some add. info:

> SELECT relname FROM pg_class WHERE relname like 'pg_%' AND relkind = 'r';
> gives:
> server closed the connection unexpectedly

This should be producing a core file in your database directory
($PGDATA/base/yourdboid/).  With gdb you'd do
    gdb /path/to/postgres-executable /path/to/corefile
    gdb> bt
    gdb> quit
I don't remember the equivalent incantations with Solaris' debugger.

            regards, tom lane

Re: [ADMIN] Still big problems with pg_dump!

От
Andrew Sullivan
Дата:
On Tue, Sep 17, 2002 at 05:04:23PM +0200, Wim wrote:
>
> SELECT relname FROM pg_class WHERE relname like 'pg_%' AND relkind = 'r';
> gives:
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> SELECT relname, relkind from pg_class WHERE relname like 'pg_%';
> produces the same error as above...

That does rather suggest a memory clobber, as Tom suggested.  Wim's
'file postmaster' shows it's a 32-bit binary, though, and I verified
in my notes that the snprintf bug was only in the 64-bit library.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: [ADMIN] Still big problems with pg_dump!

От
Andrew Sullivan
Дата:
On Tue, Sep 17, 2002 at 11:08:46AM -0400, Tom Lane wrote:

> This should be producing a core file in your database directory
> ($PGDATA/base/yourdboid/).  With gdb you'd do
>     gdb /path/to/postgres-executable /path/to/corefile
>     gdb> bt
>     gdb> quit
> I don't remember the equivalent incantations with Solaris' debugger.

I think it's

adb /path/to/postgres-executable /path/to/corefile
$c

[or]

$C

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: [ADMIN] Still big problems with pg_dump!

От
Wim
Дата:
gdb gives me this:

bash-2.05$ adb /usr/local/pgsql/bin/postgres
/data/postgres/base/17903709/core
core file = /data/postgres/base/17903709/core -- program
``/usr/local/pgsql/bin/postgres'' on platform SUNW,Ultra-60
SIGBUS: Bus Error
$c
AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13)
MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00)
MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541)
make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1)
prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b,
ffbfe6a0)
patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728)
likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809)
OidFunctionCall4+0x124(71b, 4c0780, 4b7, 4c7b90, 1, ffbfe7e1)
restriction_selectivity+0x64(4c0780, 4b7, 4c7b90, 1, 0, ffbfe8a0)
clauselist_selectivity+0x164(4c0780, 4c8478, 1, 0, 0, 4c8295)
restrictlist_selectivity+0x2c(4c0780, 4c7c18, 1, 0, 0, ff0000)
set_baserel_size_estimates+0x2c(4c0780, 4c7d70, ffffffff, fffffff8, 0,
4c83f9)
set_plain_rel_pathlist+0x18(4c0780, 4c7d70, 4c0808, 53, 4c0348, 20)
set_base_rel_pathlists+0xf8(4c0780, 4c8460, 1, 0, 0, 4c7c00)
make_one_rel+0xc(4c0780, 0, ffbfecb8, ffbfecb0, 0, 0)
subplanner+0x148(4c0780, 4c7cb8, 0, 0, 0, 0)
query_planner+0x98(4c0780, 4c7a48, 0, 0, 0, 0)
grouping_planner+0x7cc(4c0780, bff00000, 0, ff13a000, 0, 0)
subquery_planner+0x260(4c0780, bff00000, 0, 7efefeff, 81010100, ff0000)
planner+0x54(4c0780, 4c1e00, 4c15e0, fffffff8, 0, ffbfffd5)
pg_plan_query+0x54(4c0780, 29b99c, 0, 53, 4c0348, 20)
pg_exec_query_string+0x388(4c0348, 2, 476010, 4c0330, 800000, 0)
PostgresMain+0x1398(5, ffbff2d0, 40c1d9, 473, 0, ffbff1c8)
DoBackend+0x7d8(40c0a8, 1, 22730, 1552dc, 0, 40c2d9)
BackendStartup+0xb0(40c0a8, 5, ffbff800, ffbff5d8, ffbff658, 0)
ServerLoop+0x370(297024, 49a0, 0, 3f1f88, 297004, 2d560000)
PostmasterMain+0xbe4(5, 3f2980, 65720000, 0, 65720000, 65720000)
main+0x294(5, ffbffd8c, ffbffda4, 3e64c0, 0, 0)
_start+0x5c(0, 0, 0, 0, 0, 0)
$C
ffbfe340 AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13)
ffbfe3e0 MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff,
81010100, ff00)
ffbfe450 MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0,
ffbfe541)
ffbfe4c8 make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1)
ffbfe548 prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670,
ffbfe68b, ffbfe6a0)
ffbfe5d8 patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728)
ffbfe6b0 likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809)
ffbfe728 OidFunctionCall4+0x124(71b, 4c0780, 4b7, 4c7b90, 1, ffbfe7e1)
ffbfe828 restriction_selectivity+0x64(4c0780, 4b7, 4c7b90, 1, 0, ffbfe8a0)
ffbfe8b0 clauselist_selectivity+0x164(4c0780, 4c8478, 1, 0, 0, 4c8295)
ffbfe958 restrictlist_selectivity+0x2c(4c0780, 4c7c18, 1, 0, 0, ff0000)
ffbfe9d8 set_baserel_size_estimates+0x2c(4c0780, 4c7d70, ffffffff,
fffffff8, 0, 4c83f9)
ffbfea48 set_plain_rel_pathlist+0x18(4c0780, 4c7d70, 4c0808, 53, 4c0348, 20)
ffbfeab8 set_base_rel_pathlists+0xf8(4c0780, 4c8460, 1, 0, 0, 4c7c00)
ffbfeb40 make_one_rel+0xc(4c0780, 0, ffbfecb8, ffbfecb0, 0, 0)
ffbfebb8 subplanner+0x148(4c0780, 4c7cb8, 0, 0, 0, 0)
ffbfec70 query_planner+0x98(4c0780, 4c7a48, 0, 0, 0, 0)
ffbfecf8 grouping_planner+0x7cc(4c0780, bff00000, 0, ff13a000, 0, 0)
ffbfedc8 subquery_planner+0x260(4c0780, bff00000, 0, 7efefeff, 81010100,
ff0000)
ffbfee58 planner+0x54(4c0780, 4c1e00, 4c15e0, fffffff8, 0, ffbfffd5)
ffbfeed8 pg_plan_query+0x54(4c0780, 29b99c, 0, 53, 4c0348, 20)
ffbfef50 pg_exec_query_string+0x388(4c0348, 2, 476010, 4c0330, 800000, 0)
ffbff038 PostgresMain+0x1398(5, ffbff2d0, 40c1d9, 473, 0, ffbff1c8)
ffbff0f8 DoBackend+0x7d8(40c0a8, 1, 22730, 1552dc, 0, 40c2d9)
ffbff4e8 BackendStartup+0xb0(40c0a8, 5, ffbff800, ffbff5d8, ffbff658, 0)
ffbff568 ServerLoop+0x370(297024, 49a0, 0, 3f1f88, 297004, 2d560000)
ffbff810 PostmasterMain+0xbe4(5, 3f2980, 65720000, 0, 65720000, 65720000)
ffbffca0 main+0x294(5, ffbffd8c, ffbffda4, 3e64c0, 0, 0)
ffbffd28 _start+0x5c(0, 0, 0, 0, 0, 0)



Andrew Sullivan wrote:

>On Tue, Sep 17, 2002 at 11:08:46AM -0400, Tom Lane wrote:
>
>
>>This should be producing a core file in your database directory
>>($PGDATA/base/yourdboid/).  With gdb you'd do
>>    gdb /path/to/postgres-executable /path/to/corefile
>>    gdb> bt
>>    gdb> quit
>>I don't remember the equivalent incantations with Solaris' debugger.
>>
>
>I think it's
>
>adb /path/to/postgres-executable /path/to/corefile
>$c
>
>[or]
>
>$C
>
>A
>
>



Re: [GENERAL] Still big problems with pg_dump!

От
Tom Lane
Дата:
Wim <wdh@belbone.be> writes:
> gdb gives me this:
> bash-2.05$ adb /usr/local/pgsql/bin/postgres
> /data/postgres/base/17903709/core
> core file = /data/postgres/base/17903709/core -- program
> ``/usr/local/pgsql/bin/postgres'' on platform SUNW,Ultra-60
> SIGBUS: Bus Error
> $c
> AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13)
> MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00)
> MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541)
> make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1)
> prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b,
> ffbfe6a0)
> patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728)
> likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809)

Hm.  Are you running in a multibyte character encoding?  I had a note
that make_greater_string may have problems in the MULTIBYTE case.

            regards, tom lane

Re: [ADMIN] Still big problems with pg_dump!

От
Wim
Дата:

Tom Lane wrote:

>Wim <wdh@belbone.be> writes:
>
>
>>gdb gives me this:
>>bash-2.05$ adb /usr/local/pgsql/bin/postgres
>>/data/postgres/base/17903709/core
>>core file = /data/postgres/base/17903709/core -- program
>>``/usr/local/pgsql/bin/postgres'' on platform SUNW,Ultra-60
>>SIGBUS: Bus Error
>>$c
>>AllocSetAlloc+0x18c(476120, 3, 226018, 0, 0, 13)
>>MemoryContextAlloc+0x68(476120, 3, 70670000, 7efefeff, 81010100, ff00)
>>MemoryContextStrdup+0x28(476120, 4c8568, ffffffff, fffffff8, 0, ffbfe541)
>>make_greater_string+0x1c(4c8568, 13, 297, 4c8730, 0, ffbfe5d1)
>>prefix_selectivity+0xcc(4c0780, 4c7ab8, 4c8568, ffbfe670, ffbfe68b,
>>ffbfe6a0)
>>patternsel+0x278(ffbfe7a0, 0, 476120, 0, 0, ffbfe728)
>>likesel+0x10(ffbfe7a0, ffbfe7a0, 1d7080, fffffff8, 0, ffbfe809)
>>
>>
>
>Hm.  Are you running in a multibyte character encoding?  I had a note
>that make_greater_string may have problems in the MULTIBYTE case.
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
>
>
Yes, I compiled postgres with multibyte and ODBC support . Is there a
workaround possibel to fix my problem?

Cheers!

Wim


Re: [ADMIN] Still big problems with pg_dump!

От
Tom Lane
Дата:
Wim <wdh@belbone.be> writes:
>> Hm.  Are you running in a multibyte character encoding?  I had a note
>> that make_greater_string may have problems in the MULTIBYTE case.

> Yes, I compiled postgres with multibyte and ODBC support .

But are you actually *using* the multibyte code?  What does "psql -l"
show as the encoding for your database?

            regards, tom lane

Re: [ADMIN] Still big problems with pg_dump!

От
Wim
Дата:

Tom Lane wrote:

>Wim <wdh@belbone.be> writes:
>
>>>Hm.  Are you running in a multibyte character encoding?  I had a note
>>>that make_greater_string may have problems in the MULTIBYTE case.
>>>
>
>>Yes, I compiled postgres with multibyte and ODBC support .
>>
>
>But are you actually *using* the multibyte code?  What does "psql -l"
>show as the encoding for your database?
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html
>
>
>
psql -l shows:

          List of databases
     Name      |  Owner   | Encoding
---------------+----------+-----------
 addressIP     | postgres | SQL_ASCII
 belbonedb_v2  | postgres | SQL_ASCII
 belbonedb_v21 | postgres | SQL_ASCII
 peering       | postgres | SQL_ASCII
 peering_v2    | postgres | SQL_ASCII
 postgres      | postgres | SQL_ASCII
 smsbilling    | postgres | SQL_ASCII
 template0     | postgres | SQL_ASCII
 template1     | postgres | SQL_ASCII
(9 rows)

Maybe I should recompile Postgres without multibyte support...


Cheers!

Wim