Обсуждение: 8.3.5 broken after power fail

Поиск
Список
Период
Сортировка

8.3.5 broken after power fail

От
Michael Monnerie
Дата:
Now I'm sure that the following configuration can destroy your database:

* Linux host with XEN, XFS filesystem with "nobarrier", RAID controller
with battery backed cache.
* XEN vm with XFS filesystem with "nobarrier" with postgresql
* your daughter with 3.5 years switching off the power supply of the
server
I guess one shouldn't use "nobarrier" on a XEN XFS domU machine.

Now, can somebody help me fixing this error? I did "vacuum analyze
verbose" and it stopped here:

INFO:  vacuuming "pg_toast.pg_toast_1281127"
ERROR:  could not access status of transaction 2299723776
DETAIL:  Could not open file "pg_clog/0891": No such file or directory.

What can I do?

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4




Вложения

Re: 8.3.5 broken after power fail

От
Achilleas Mantzios
Дата:
Στις Tuesday 17 February 2009 10:54:52 ο/η Michael Monnerie έγραψε:

> * your daughter with 3.5 years switching off the power supply of the
> server
1st line of defense is to set your system to shutdown normally when the power button is pressed.
2nd line of defense is to get your self a decent UPS unit

My daughter does this all the time on our family FreeBSD box. No probs.

Also at work at more than 20 tanker vessels running 7.4.2, the captains do that on a constant
basis and PgSQL always has survived (more than the rest of the system anyways..)

> What can I do?
>
> mfg zmi



--
Achilleas Mantzios

Re: 8.3.5 broken after power fail

От
Scott Marlowe
Дата:
2009/2/17 Achilleas Mantzios <achill@matrix.gatewaynet.com>:
> Στις Tuesday 17 February 2009 10:54:52 ο/η Michael Monnerie έγραψε:
>
>> * your daughter with 3.5 years switching off the power supply of the
>> server
> 1st line of defense is to set your system to shutdown normally when the power button is pressed.
> 2nd line of defense is to get your self a decent UPS unit
>
> My daughter does this all the time on our family FreeBSD box. No probs.
>
> Also at work at more than 20 tanker vessels running 7.4.2, the captains do that on a constant
> basis and PgSQL always has survived (more than the rest of the system anyways..)

Those are all good to have.  But no UPS is a replacement for hard
drives / RAID controllers / file systems that don't lie about fsync.

Nothing makes your database shine like being the only one in the
hosting center that survives sudden catastrophic power failure.

>> What can I do?

tether your daughter to the other side of the room?  I'm not sure
which parts of those mount options are dangerous or not.  I use ext3
stock with noatime.  And a battery backed RAID.  Smaller slower work
group / station controllers (i.e. 5 year old server conrollers) go for
pretty cheap and give pretty good performance with 2 or 4 drives.

AS for fixing it, I believe the answer involves creating a clog file
full of zeros 16Meg or so, and pg_reset_xlog.  Don't count on all your
data being there or all your FK-PK type relationships to be correct,
Immediately dump it, initdb and reload your data, fixing it as you go.

Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
> 1st line of defense is to set your system to shutdown normally when
> the power button is pressed. 2nd line of defense is to get your self
> a decent UPS unit

I have both. She managed to switch off behind the UPS :-)

> Also at work at more than 20 tanker vessels running 7.4.2, the
> captains do that on a constant basis and PgSQL always has survived
> (more than the rest of the system anyways..)

Even with a XEN vm running PostgreSQL? Both on XFS with nobarrier mount
option? Try it, it's fun ;-)

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Вложения

Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
>  tether your daughter to the other side of the room? I'm not sure
>  which parts of those mount options are dangerous or not. I use ext3
>  stock with noatime. And a battery backed RAID. Smaller slower work
>  group / station controllers (i.e. 5 year old server conrollers) go
> for pretty cheap and give pretty good performance with 2 or 4 drives.

I know the problem is with XFS and mount option "nobarrier" within a XEN
vm. Just last week I discussed with the XFS devs if that would be save
to use with a battery backed RAID. They said "depends on the
hypervisor". So now I proofed (not that I wanted!) that within XEN, XFS
should always have "barrier" on.

>  AS for fixing it, I believe the answer involves creating a clog file
>  full of zeros 16Meg or so, and pg_reset_xlog. Don't count on all
> your data being there or all your FK-PK type relationships to be
> correct, Immediately dump it, initdb and reload your data, fixing it
> as you go.

OK, that's a start. Trying... still an error:

pg_dump: Fehlermeldung vom Server: ERROR:  missing chunk number 0 for
toast value 1460201 in pg_toast_1281127
pg_dump: Die Anweisung war: COPY public.dbmail_messageblks
(messageblk_idnr, physmessage_id, messageblk, blocksize, is_header) TO
stdout;

What now?

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4



Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
"Michael Monnerie" <michael.monnerie@is.it-management.at> schrieb:
> pg_dump: Fehlermeldung vom Server: ERROR:  missing chunk number 0 for
> toast value 1460201 in pg_toast_1281127
> pg_dump: Die Anweisung war: COPY public.dbmail_messageblks
> (messageblk_idnr, physmessage_id, messageblk, blocksize, is_header) TO
> stdout;

The file is there:
# ls -l base/16386/1281127
-rw------- 1 postgres postgres 417447936 17. Feb 12:14 base/16386/1281127

What can I do about the missing chunk? I need the data in there...

mfg zmi



Re: 8.3.5 broken after power fail

От
Achilleas Mantzios
Дата:
Στις Tuesday 17 February 2009 15:53:33 ο/η Michael Monnerie έγραψε:
> "Michael Monnerie" <michael.monnerie@is.it-management.at> schrieb:
> > pg_dump: Fehlermeldung vom Server: ERROR:  missing chunk number 0 for
> > toast value 1460201 in pg_toast_1281127
> > pg_dump: Die Anweisung war: COPY public.dbmail_messageblks
> > (messageblk_idnr, physmessage_id, messageblk, blocksize, is_header) TO
> > stdout;
>
> The file is there:
> # ls -l base/16386/1281127
> -rw------- 1 postgres postgres 417447936 17. Feb 12:14 base/16386/1281127
>

did you reindex pg_toast_1281127?
take a look at
http://archives.free.net.ph/message/20080924.191644.d692f468.el.html

> What can I do about the missing chunk? I need the data in there...
>
> mfg zmi
>
>
>



--
Achilleas Mantzios

Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
"Achilleas Mantzios" <achill@matrix.gatewaynet.com> schrieb:
> did you reindex pg_toast_1281127?
> take a look at
> http://archives.free.net.ph/message/20080924.191644.d692f468.el.html

REINDEX TABLE pg_toast.pg_toast_1281127;

That runs now since 33 CPU minutes, using 100% with no visible I/O
happening. top says:
29687 postgres  20   0  605m 187m 3024 R  100 15.4  32:58.65 postmaster

And "strace -p 29687" displays nothing. Could it be the process hangs? It
shouldn't take that long I guess, but I'm not sure how to find out if it's
borked or really working... ideas?

mfg zmi



Re: 8.3.5 broken after power fail

От
Achilleas Mantzios
Дата:
Στις Tuesday 17 February 2009 17:11:41 ο/η Michael Monnerie έγραψε:
> "Achilleas Mantzios" <achill@matrix.gatewaynet.com> schrieb:
> > did you reindex pg_toast_1281127?
> > take a look at
> > http://archives.free.net.ph/message/20080924.191644.d692f468.el.html
>
> REINDEX TABLE pg_toast.pg_toast_1281127;
>
> That runs now since 33 CPU minutes, using 100% with no visible I/O
> happening. top says:
> 29687 postgres  20   0  605m 187m 3024 R  100 15.4  32:58.65 postmaster
>
> And "strace -p 29687" displays nothing. Could it be the process hangs? It
> shouldn't take that long I guess, but I'm not sure how to find out if it's
> borked or really working... ideas?

iostat systat vmstat report anything?

>
> mfg zmi
>
>
>



--
Achilleas Mantzios

Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
"Achilleas Mantzios" <achill@matrix.gatewaynet.com> schrieb:
>> That runs now since 33 CPU minutes, using 100% with no visible I/O
>> happening. top says:
>> 29687 postgres  20   0  605m 187m 3024 R  100 15.4  32:58.65 postmaster
>>
>> And "strace -p 29687" displays nothing. Could it be the process hangs? It
>> shouldn't take that long I guess, but I'm not sure how to find out if
it's
>> borked or really working... ideas?
>
> iostat systat vmstat report anything?

Sorry, I wasn't detailed enough. With "no visible I/O" I meant I looked with
iostat.

After 45 CPU minutes I tried to stop it - no chance.
Stop postgres - no chance. Wow, a really hard problem.
Had to reboot the VM.

Can I somehow dump that table without using an index?

Here's the info about \d dbmail_messageblks
                              Tabelle »public.dbmail_messageblks«
     Spalte      |   Typ    |                            Attribute
-----------------+----------+------------------------------------------------------------------
 messageblk_idnr | bigint   | not null default
nextval('dbmail_messageblk_idnr_seq'::regclass)
 physmessage_id  | bigint   |
 messageblk      | bytea    | not null
 blocksize       | bigint   | not null default (0)::bigint
 is_header       | smallint | not null default (0)::smallint
Indexe:
    »dbmail_messageblks_pkey« PRIMARY KEY, btree (messageblk_idnr)
    »dbmail_messageblks_1« UNIQUE, btree (physmessage_id, messageblk_idnr)
CLUSTER
    »dbmail_messageblks_physmessage_is_header_idx« btree (physmessage_id,
is_header)
Fremdschlüssel-Constraints:
    »dbmail_messageblks_physmessage_id_fkey« FOREIGN KEY (physmessage_id)
REFERENCES dbmail_physmessage(id) ON UPDATE CASCADE ON DELETE CASCADE

This works:
COPY public.dbmail_messageblks (messageblk_idnr, physmessage_id, blocksize,
is_header) TO stdout;
This does NOT work:
COPY public.dbmail_messageblks (messageblk_idnr, physmessage_id, messageblk,
blocksize, is_header) TO stdout;

So I'm sure the "messageblk" toast got toasted too much. Now I need a way to
get out all values that are readable and piss on the rest. I looked into the
output of the first and second COPY, it should be messageblk_idnr=3904492
that's broken. I tried
select * from dbmail_messageblks where messageblk_idnr not in (3904492);

That helps. Slowly I may be able to find all broken entries. Is there a more
automated way?

mfg zmi



Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
"Achilleas Mantzios" <achill@matrix.gatewaynet.com> schrieb:
>> That runs now since 33 CPU minutes, using 100% with no visible I/O
>> happening. top says:
>> 29687 postgres  20   0  605m 187m 3024 R  100 15.4  32:58.65 postmaster
>>
>> And "strace -p 29687" displays nothing. Could it be the process hangs? It
>> shouldn't take that long I guess, but I'm not sure how to find out if
it's
>> borked or really working... ideas?
>
> iostat systat vmstat report anything?

Sorry, I wasn't detailed enough. With "no visible I/O" I meant I looked with
iostat.

After 45 CPU minutes I tried to stop it - no chance.
Stop postgres - no chance. Wow, a really hard problem.
Had to reboot the VM.

Can I somehow dump that table without using an index?

Here's the info about \d dbmail_messageblks
                              Tabelle »public.dbmail_messageblks«
     Spalte      |   Typ    |                            Attribute
-----------------+----------+------------------------------------------------------------------
 messageblk_idnr | bigint   | not null default
nextval('dbmail_messageblk_idnr_seq'::regclass)
 physmessage_id  | bigint   |
 messageblk      | bytea    | not null
 blocksize       | bigint   | not null default (0)::bigint
 is_header       | smallint | not null default (0)::smallint
Indexe:
    »dbmail_messageblks_pkey« PRIMARY KEY, btree (messageblk_idnr)
    »dbmail_messageblks_1« UNIQUE, btree (physmessage_id, messageblk_idnr)
CLUSTER
    »dbmail_messageblks_physmessage_is_header_idx« btree (physmessage_id,
is_header)
Fremdschlüssel-Constraints:
    »dbmail_messageblks_physmessage_id_fkey« FOREIGN KEY (physmessage_id)
REFERENCES dbmail_physmessage(id) ON UPDATE CASCADE ON DELETE CASCADE

This works:
COPY public.dbmail_messageblks (messageblk_idnr, physmessage_id, blocksize,
is_header) TO stdout;
This does NOT work:
COPY public.dbmail_messageblks (messageblk_idnr, physmessage_id, messageblk,
blocksize, is_header) TO stdout;

So I'm sure the "messageblk" toast got toasted too much. Now I need a way to
get out all values that are readable and piss on the rest. I looked into the
output of the first and second COPY, it should be messageblk_idnr=3904492
that's broken. I tried
select * from dbmail_messageblks where messageblk_idnr not in (3904492);

That helps. Slowly I may be able to find all broken entries. Is there a more
automated way?

mfg zmi



Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
"Michael Monnerie" <michael.monnerie@is.it-management.at> schrieb:
> select * from dbmail_messageblks where messageblk_idnr not in (3904492);
>
> That helps. Slowly I may be able to find all broken entries. Is there a
more
> automated way?

Now I already have
select * from dbmail_messageblks where messageblk_idnr not in
(3904492,3904495,3904496,3904499,3904500,3904501,3904503);
ERROR:  unexpected chunk number 1 (expected 0) for toast value 1460210 in
pg_toast_1281127

Is there a way to find out what "toast value 1460210" stands for? I'd like
to find the connection between messageblk_idnr and that toast value, as it's
not easy to find that.

mfg zmi



Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
"Michael Monnerie" <michael.monnerie@is.it-management.at> schrieb:
> Now I already have
> select * from dbmail_messageblks where messageblk_idnr not in
> (3904492,3904495,3904496,3904499,3904500,3904501,3904503);
> ERROR:  unexpected chunk number 1 (expected 0) for toast value 1460210 in
> pg_toast_1281127
>
> Is there a way to find out what "toast value 1460210" stands for? I'd like
> to find the connection between messageblk_idnr and that toast value, as
it's
> not easy to find that.

My problem is that there seems to be a jump in messageblk_idnr, and I cannot
find the relation to the missing toast value. Is there a way to find this?

Or, I'd need to use something like COPY .. WHERE .. to exclude the broken
toasts and see where it stops next. SELECT only shows the error with the
toast value number, I'd like to see what it the last row that is processed
then I could find the next and know it's messageblk_idnr. Is that possible?

mfg zmi



Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
News! The server even crashes when accessing certain messageblks:

The "physmessage" table contains mails with date, mails are stored in pieces
in "messageblks" (see ER model at
http://www.dbmail.org/dokuwiki/doku.php?id=er-model ). I try to find first
and last message with a problem, here the last:

select substring(m.messageblk,0,5) from dbmail_messageblks m join
dbmail_physmessage p on (p.id=m.physmessage_id) where p.id >= 1831634;

But that crashes the server completely:

2009-02-17 23:48:43 CET pid=5481 1/2 idle: LOG:  statement: select
substring(m.messageblk,0,5) from dbmail_messageblks m join
dbmail_physmessage p on (p.id=m.physmessage_id) where p.id >= 1831634;
2009-02-17 23:48:44 CET pid=5482  /usr/bin/postmaster: LOG:  connection
received: host=[local]
2009-02-17 23:48:44 CET pid=5413  : LOG:  server process (PID 5481) was
terminated by signal 11: Segmentation fault
2009-02-17 23:48:44 CET pid=5413  : LOG:  terminating any other active
server processes
2009-02-17 23:48:44 CET pid=5413  : LOG:  all server processes terminated;
reinitializing
2009-02-17 23:48:44 CET pid=5483 -1/0 : LOG:  database system was
interrupted; last known up at 2009-02-17 23:48:32 CET
2009-02-17 23:48:44 CET pid=5483 -1/0 : LOG:  database system was not
properly shut down; automatic recovery in progress
2009-02-17 23:48:44 CET pid=5483 -1/0 : LOG:  record with zero length at
E/382481B8
2009-02-17 23:48:44 CET pid=5483 -1/0 : LOG:  redo is not required
2009-02-17 23:48:44 CET pid=5483 -1/0 : LOG:  checkpoint starting: shutdown
immediate
2009-02-17 23:48:44 CET pid=5483 -1/0 : LOG:  checkpoint complete: wrote 0
buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=0.003 s, sync=0.000 s, total=0.013 s
2009-02-17 23:48:44 CET pid=5486 -1/0 : LOG:  autovacuum launcher started
2009-02-17 23:48:44 CET pid=5413  : LOG:  database system is ready to accept
connections

Damn, that crash really busted my db. I'd only like to have the rest of the
data from that table...

mfg zmi



Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
I have the impression I'm asking the wrong things. Still I try.
Could I do something about this?

psql:/tmp/x:119: ERROR:  invalid page header in block 973698 of relation
"pg_toast_1281127"

It seems that toast table got truncated after host+XEN crash and XFS
recovery. Would adding some zeroes to that file help? If yes, how could I
find out what the file is named?

mfg zmi



Re: 8.3.5 broken after power fail

От
Scott Marlowe
Дата:
Look into zero damaged pages setting.

On Wed, Feb 18, 2009 at 2:35 PM, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:
> I have the impression I'm asking the wrong things. Still I try.
> Could I do something about this?
>
> psql:/tmp/x:119: ERROR:  invalid page header in block 973698 of relation
> "pg_toast_1281127"
>
> It seems that toast table got truncated after host+XEN crash and XFS
> recovery. Would adding some zeroes to that file help? If yes, how could I
> find out what the file is named?
>
> mfg zmi
>
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>



--
When fascism comes to America, it will be the intolerant selling it as
diversity.

how to check locale of a cluster?

От
Kasia Tuszynska
Дата:
Hi Everybody,

I am comparing selection results between postgres clusters on RHEL4 and Windows Server 2003.

The query is:
select count(*)from schema.table where route < 'Interstate 12'
on the RHEL machine the result has: 102 records
on the windows machine the result has: 586 records
the entire table has 687 records

On both clusters the column route is of type character varying (40)
On both clusters the server encoding is UTF8 - which I can see in pgAdminIII or by querying pg_database

I would like to figure out if the locale on both clusters is the same, because if the locales are different that would
explainthe different result sets for the same query.  

Does anyone know how to check the locale of a cluster?

The documentation has yielded this " You can check the LC_COLLATE and LC_CTYPE settings of a database with the
contrib/pg_controldatautility program." 
I do not have a contrib folder on either machine, I suppose that I could go install it but is there an easier way?

Thank you,
Sincerely,
Kasia

Re: how to check locale of a cluster?

От
Tom Lane
Дата:
Kasia Tuszynska <ktuszynska@esri.com> writes:
> Does anyone know how to check the locale of a cluster?

"show lc_collate" has worked since 7.4.

> The documentation has yielded this " You can check the LC_COLLATE and LC_CTYPE settings of a database with the
contrib/pg_controldatautility program." 

The docs have not said that since 7.3.  I really hope that is not what
you are running ... especially not on the Windows machine.

            regards, tom lane

Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
On Mittwoch 18 Februar 2009 Scott Marlowe wrote:
> Look into zero damaged pages setting.

Thanks for the hint Scott, I tried it and did:

select * into mb from dbmail_messageblks;
WARNING:  invalid page header in block 973075 of relation
"pg_toast_1281127"; zeroing out page
ERROR:  missing chunk number 0 for toast value 1623370 in
pg_toast_1281127

That missing chunks still stop the query. I found out what messageblks
are destroyed, and can get out all the rest. There was something
interesting:

As the data in that table is about 10GB and therefore time consuming to
read, I tried reading only small parts of that big "messageblk" field of
type "bytea", which is the longest field. So I tried reading every row
with char_length(messageblk,5) and on success noted that block as good.
But upon reading there still could be errors, meaning you must really
test to read the full messageblk field. Just if someone else needs to
perform such a test.

Also a question: Because I must read all data, the psql client runs out
of memory, trying to cache all the 10GB from that table. I circumvented
this with selecting only parts of the table all the time. Is there a
smart way to do such a select without caching the results in memory? Is
that what temporary tables and "select into" are made for? I just want
to know the recommended way for doing huge queries.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Вложения

Re: 8.3.5 broken after power fail SOLVED

От
Michael Monnerie
Дата:
I managed to recover the data that was still readable. About 650
messageblock entries got lost. What makes me nervous a bit is that
postgres kept running despite (partially) being destroyed. It should
really have shutdown itself after the first problem was found. That
database is for mails, and I would understand to have lost some from the
time before the power loss, but I even lost entries from *after* the
crash. That means the error happened after the system was up again and
happily did it's work, but lost entries to a table. Could there be a
better way to check the db at crash recovery startup time?

If someone is interested, I have a full postgres log with every single
command done to the database.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Вложения

Re: 8.3.5 broken after power fail SOLVED

От
Scott Marlowe
Дата:
On Sat, Feb 21, 2009 at 1:43 AM, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:
> I managed to recover the data that was still readable. About 650
> messageblock entries got lost. What makes me nervous a bit is that
> postgres kept running despite (partially) being destroyed. It should
> really have shutdown itself after the first problem was found. That
> database is for mails, and I would understand to have lost some from the
> time before the power loss, but I even lost entries from *after* the
> crash. That means the error happened after the system was up again and
> happily did it's work, but lost entries to a table. Could there be a
> better way to check the db at crash recovery startup time?
>
> If someone is interested, I have a full postgres log with every single
> command done to the database.

We preach this again and again.  PostgreSQL can only survive a power
outage type failure ONLY if the hardware / OS / filesystem don't lie
about fsync.  If they do, all bets are off, and this kind of failure
means you should really failover to another machine or restore a
backup.

It's why you have to do possibly destructive tests to see if your
server stands at least some chance of surviving this kind of failure,
log shipping for recovery, and / or replication of another form (slony
etc...) to have a reliable server.

The recommendations for recovery of data are just that, recovery
oriented.  They can't fix a broken database at that point.  You need
to take it offline after this kind of failure if you can't trust your
hardware.

Usually when it finds something wrong it just won't start up.

Re: 8.3.5 broken after power fail

От
Scott Marlowe
Дата:
On Sat, Feb 21, 2009 at 1:23 AM, Michael Monnerie
<michael.monnerie@is.it-management.at> wrote:

> Also a question: Because I must read all data, the psql client runs out
> of memory, trying to cache all the 10GB from that table. I circumvented
> this with selecting only parts of the table all the time. Is there a
> smart way to do such a select without caching the results in memory? Is
> that what temporary tables and "select into" are made for? I just want
> to know the recommended way for doing huge queries.

You can dump individual tables with pg_dump -t table1 -t table2.  That
should work without running out of memory.  And yeah, temp tables and
select into are a good way to get your data ready to be pg_dumped.

Re: 8.3.5 broken after power fail

От
Michael Monnerie
Дата:
On Samstag 21 Februar 2009 Scott Marlowe wrote:
> You can dump individual tables with pg_dump -t table1 -t table2.
>  That should work without running out of memory.  And yeah, temp
> tables and select into are a good way to get your data ready to be
> pg_dumped.

Yes, actually I did pg_dump -T broken_table, and got all others out.
Thanks to Murphy's Law that one broken table was the biggest, the one
containing the e-mail data.

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Вложения

Re: 8.3.5 broken after power fail SOLVED

От
Michael Monnerie
Дата:
On Samstag 21 Februar 2009 Scott Marlowe wrote:
> We preach this again and again.  PostgreSQL can only survive a power
> outage type failure ONLY if the hardware / OS / filesystem don't lie
> about fsync.  If they do, all bets are off, and this kind of failure
> means you should really failover to another machine or restore a
> backup.

The shit thing is, I just discussed with the XFS devs last week, whether
it is save to have a virtualization like VMware or XEN, and the answer
was "depends on the hypervisor". I had such an issue with VMware 2 years
ago, and now with XEN, so I would say they are not save. But there must
be something you can configure in order not to have such drastic errors
on power fail. It's just nobody seems to know (or want to tell) how to
do that. At least, not to me ;-)

> It's why you have to do possibly destructive tests to see if your
> server stands at least some chance of surviving this kind of failure,
> log shipping for recovery, and / or replication of another form
> (slony etc...) to have a reliable server.

As I need another Postgres setup with a server syncing dbmail to
another, I guess I'll do that with WAL, so at least then I can recover
to that latest entry.

> The recommendations for recovery of data are just that, recovery
> oriented.  They can't fix a broken database at that point.  You need
> to take it offline after this kind of failure if you can't trust your
> hardware.
>
> Usually when it finds something wrong it just won't start up.

The problem was I wasn't working this week, and did just a basic check
if everything is up again. There were e-mails arriving, so I thought
it's OK. I was very pissed when some days later I found strange things
happening, and then to see that a table was broken and ate nearly all e-
mails. If at least Postgres would have whined and stopped working...

I know it's not Postgres' fault to have fsync messed up, but at least
error recovery should have found the problem, latest at the moment the
first transaction touched the problematic table. Instead of throwing the
data effectively to /dev/null :-(

mfg zmi
--
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


Вложения

Re: 8.3.5 broken after power fail SOLVED

От
Naomi Walker
Дата:
Other than disaster tests, how would I know if I have an system that
lies about fsync?


>
> We preach this again and again.  PostgreSQL can only survive a power
> outage type failure ONLY if the hardware / OS / filesystem don't lie
> about fsync.  If they do, all bets are off, and this kind of failure
> means you should really failover to another machine or restore a
> backup.
>
> It's why you have to do possibly destructive tests to see if your
> server stands at least some chance of surviving this kind of failure,
> log shipping for recovery, and / or replication of another form (slony
> etc...) to have a reliable server.
>
> The recommendations for recovery of data are just that, recovery
> oriented.  They can't fix a broken database at that point.  You need
> to take it offline after this kind of failure if you can't trust your
> hardware.
>
> Usually when it finds something wrong it just won't start up.
>
>


--
------------------------------------------------------------------------
Naomi Walker                          Chief Information Officer
Eldorado Computing, Inc               nwalker@eldocomp.com
  ---An EDS Company                   602-604-3100
------------------------------------------------------------------------
Hard things are put in our way, not to stop us, but to call out our
courage and strength. -- Anonymous
------------------------------------------------------------------------


Re: 8.3.5 broken after power fail SOLVED

От
Ron Mayer
Дата:
Naomi Walker wrote:
> Other than disaster tests, how would I know if I have an system that
> lies about fsync?

Well, the linux kernel tries to detect it on bootup and
will give messages like this:
  %dmesg | grep 'disabling barriers'
  JBD: barrier-based sync failed on md1 - disabling barriers
  JBD: barrier-based sync failed on hda3 - disabling barriers
when it detects certain types of unreliable fsync's. The command
  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT
will give you clues if a hard drive itself even can support
a non-lying fsync when it's internal cache is enabled.


Sadly some filesystems (ext3) lie even above and beyond what
Linux does - by only using the write barriers correctly
when the inode itself is modified; not when the data is modified.
A test program here:
http://archives.postgresql.org/pgsql-performance/2008-08/msg00159.php
can detect those cases where the kernel & drive don't lie
about fsync but ext3 lies in spite of them; with more background
info here:
 http://article.gmane.org/gmane.linux.file-systems/21373
 http://thread.gmane.org/gmane.linux.kernel/646040


Elsewhere in the archives you can find programs that measure
how fast fsyncs happen - but on your hardware, and you can
try to see if those numbers approximately match how fast your
disks spin.  But then you still need to make sure the test
program used the same methods for syncing the drive that your
postgres configuration files are choosing.

I wonder if the only really safe way is to run a very
write intensive database script and pull and kill your
system in a number of ways, including yanking power to
the system; to disk arrays, etc and see if your database died.


>
>
>>
>> We preach this again and again.  PostgreSQL can only survive a power
>> outage type failure ONLY if the hardware / OS / filesystem don't lie
>> about fsync.  If they do, all bets are off, and this kind of failure
>> means you should really failover to another machine or restore a
>> backup.
>>
>> It's why you have to do possibly destructive tests to see if your
>> server stands at least some chance of surviving this kind of failure,
>> log shipping for recovery, and / or replication of another form (slony
>> etc...) to have a reliable server.
>>
>> The recommendations for recovery of data are just that, recovery
>> oriented.  They can't fix a broken database at that point.  You need
>> to take it offline after this kind of failure if you can't trust your
>> hardware.
>>
>> Usually when it finds something wrong it just won't start up.
>>
>>
>
>


Re: 8.3.5 broken after power fail SOLVED

От
Scott Marlowe
Дата:
On Sat, Feb 21, 2009 at 3:41 PM, Ron Mayer
<rm_pg@cheapcomplexdevices.com> wrote:
> Naomi Walker wrote:
>> Other than disaster tests, how would I know if I have an system that
>> lies about fsync?
>
> Well, the linux kernel tries to detect it on bootup and
> will give messages like this:
>  %dmesg | grep 'disabling barriers'
>  JBD: barrier-based sync failed on md1 - disabling barriers
>  JBD: barrier-based sync failed on hda3 - disabling barriers
> when it detects certain types of unreliable fsync's. The command
>  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT
> will give you clues if a hard drive itself even can support
> a non-lying fsync when it's internal cache is enabled.
>
>
> Sadly some filesystems (ext3) lie even above and beyond what
> Linux does - by only using the write barriers correctly
> when the inode itself is modified; not when the data is modified.
> A test program here:
> http://archives.postgresql.org/pgsql-performance/2008-08/msg00159.php
> can detect those cases where the kernel & drive don't lie
> about fsync but ext3 lies in spite of them; with more background
> info here:
>  http://article.gmane.org/gmane.linux.file-systems/21373
>  http://thread.gmane.org/gmane.linux.kernel/646040
>
>
> Elsewhere in the archives you can find programs that measure
> how fast fsyncs happen - but on your hardware, and you can
> try to see if those numbers approximately match how fast your
> disks spin.  But then you still need to make sure the test
> program used the same methods for syncing the drive that your
> postgres configuration files are choosing.
>
> I wonder if the only really safe way is to run a very
> write intensive database script and pull and kill your
> system in a number of ways, including yanking power to
> the system; to disk arrays, etc and see if your database died.

Well, you can't prove it's 100% safe but you can usually find most of
the not safe systems this way.  I usually setup a big pgbench db, run
500 or so concurrent, wait 5 or 10 minutes, run a checkpoint, and
halfway through it pull the plug.  It's still possible for a system to
fail after passing this test, but I feel a lot better knowing I've
done it a couple of times and the db came back up without problems.