Thread: Disk Failure Scenarios

Disk Failure Scenarios

From:
"Michael Artz"
Date:
(Sorry if this gets posted twice ... forgot that the list doesn't like new, unregistered email addresses)

I'm setting up PG, and am curious about the failure scenarios of
Postgres with respect to crashed disks.  In a given Postgres
installation across many disks, which sections of Postgres can fail
"gracefully" (i.e. the customer data is safe and the installation can
be recreated without backups)?  I'm thinking of the scenario where you
have numerous tablespaces with tables and indexes spread across them
and have separated pg_xlog onto a separate disk.  So the setup might be
something like this:

Disk 1: OS + Postgres install
Disk 2: pg_xlog
Disk 3: initialized tablespace containing table_master
Disk 4: tablespaceA containing tableA
Disk 5: tablespaceB containing indexB

In this simplistic configuration, only Disk 4 contains any real
customer data, right?  If any of the other disks fail, would it be
possible to slap in a replacement disk and rebuild the database install
around disk 4?  I.e.:

-Disk 1: If the OS/Postgres install disk fails, it's possible to
reinstall the OS and the same version of Postgres and point it at disk
3 and everything should run, right?

-Disk 2: If the transaction log dies, all changes since the last
checkpoint are lost, right?  Again, if I set up an empty pg_xlog
directory somewhere else, the DB should run just fine, right?

-Disk 3: This holds all the pg_* tables, which means the structure of
the DB, right?  If this disk goes, would it be possible to reinitialize
the database directory, create the new database, create a new
tablespaceA on Disk 4, and create a new tableA, and somehow have it use
the data pages for tableA that are already on disk?  Does it change if
tableA inherits from table_master?

-Disk 4: We're screwed without backups.

-Disk 5: I figure that we can just recreate any indexes, right?  Can we
safely drop indexB if the data pages for the index don't exist on disk
(i.e. the tablespace is empty)?  Will Postgres do the "right" thing and
delete the knowledge of the index from the pg_* tables and then stop?

Thanks for any help,
-Mike

Re: Disk Failure Scenarios

From:
"Jim C. Nasby"
Date:
On Wed, Apr 26, 2006 at 11:54:23PM -0400, Michael Artz wrote:
> -Disk 1: If the OS/Postgres install disk fails, it's possible to
> reinstall the OS and the same version of Postgres and point it at disk
> 3 and everything should run, right?

Only if that disk doesn't also hold $PGDATA.

> -Disk 2: If the transaction log dies, all changes since the last
> checkpoint are lost, right?  Again, if I set up an empty pg_xlog
> directory somewhere else, the DB should run just fine, right?

No, because there's no way to know what state the data pages are in.
Data may have made it to disk, may not have, partial page write, etc...
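
The point above can be illustrated with a toy model. This is only a sketch, not how PostgreSQL is implemented: the class ToyCluster, its methods, and the two-page "transfer" are all hypothetical names made up for illustration. It assumes that dirty pages can reach the data disk at any time, not only at a checkpoint, and that the WAL is the only record of which changes belong together.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical model: data pages on one disk, WAL records on another."""
    disk_pages: dict = field(default_factory=dict)  # page id -> value on the data disk
    buffers: dict = field(default_factory=dict)     # dirty pages that exist only in memory
    wal: list = field(default_factory=list)         # (page id, new value) redo records

    def update(self, page, value):
        # Write-ahead rule: log the change first, then modify the in-memory page.
        self.wal.append((page, value))
        self.buffers[page] = value

    def flush_page(self, page):
        # Assumption: a dirty page may be written out between checkpoints.
        self.disk_pages[page] = self.buffers.pop(page)

    def crash(self, wal_survives):
        # Memory is lost in a crash; the WAL is lost too if its disk died.
        self.buffers.clear()
        if not wal_survives:
            self.wal.clear()

    def recover(self):
        # Redo: replay every surviving WAL record onto the data pages.
        for page, value in self.wal:
            self.disk_pages[page] = value
        return dict(self.disk_pages)


def transfer_then_crash(wal_survives):
    cluster = ToyCluster(disk_pages={"A": 100, "B": 0})
    # One logical change moves 40 from page A to page B.
    cluster.update("A", 60)
    cluster.update("B", 40)
    cluster.flush_page("A")  # only one of the two pages reaches the data disk
    cluster.crash(wal_survives)
    return cluster.recover()


with_wal = transfer_then_crash(wal_survives=True)
without_wal = transfer_then_crash(wal_survives=False)
print(with_wal)
print(without_wal)
```

With the WAL intact, replay brings both pages to the new state. Without it, page A holds the new value and page B the old one, and nothing on the data disk says so.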

> -Disk 3: This holds all the pg_* tables, which means the structure of
> the DB, right?  If this disk goes, would it be possible to reinitialize
> the database directory, create the new database, create a new
> tablespaceA on Disk 4, and create a new tableA, and somehow have it use
> the data pages for tableA that are already on disk?  Does it change if
> tableA inherits from table_master?

*maybe*, but it's likely to be extremely painful, if it works at all.

> -Disk 4: We're screwed without backups.
>
> -Disk 5: I figure that we can just recreate any indexes, right?  Can we
> safely drop indexB if the data pages for the index don't exist on disk
> (i.e. the tablespace is empty)?  Will Postgres do the "right" thing and
> delete the knowledge of the index from the pg_* tables and then stop?

You'll probably need to re-create the appropriate files and then
REINDEX. This is the only disk where you have any real chance of
recovering from a failure without losing data (other than the binaries).
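
The reason an index disk is recoverable can be sketched in a few lines: an index holds nothing that cannot be derived again from the table. The function build_index and the sample rows are hypothetical and stand in for a rebuild from the table's data; they are not PostgreSQL's index format or API.

```python
def build_index(table_rows, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = {}
    for position, row in enumerate(table_rows):
        index.setdefault(row[column], []).append(position)
    return index


# Hypothetical contents of tableA (the data that lives on disk 4).
table_a = [
    {"id": 1, "customer": "acme"},
    {"id": 2, "customer": "globex"},
    {"id": 3, "customer": "acme"},
]

index_b = build_index(table_a, "customer")
index_b = None  # disk 5 fails: the index contents are gone

# The table is untouched, so the index can be derived again in full.
rebuilt = build_index(table_a, "customer")
print(rebuilt)
```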

Now the real question is: why are you trying to run without raid?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Disk Failure Scenarios

From:
"Michael Artz"
Date:

> > -Disk 2: If the transaction log dies, all changes since the last
> > checkpoint are lost, right?  Again, if I set up an empty pg_xlog
> > directory somewhere else, the DB should run just fine, right?
>
> No, because there's no way to know what state the data pages are in.
> Data may have made it to disk, may not have, partial page write, etc...

As far as I understand it, data is only written to the WAL except when the WAL is checkpointing, right?  So if your WAL disk crashes and you aren't in the middle of a checkpoint, there is a chance that you would just lose the data written since the last checkpoint.  Am I missing something?

> Now the real question is: why are you trying to run without raid?

I have a single, very fast disk lying around, and I was just wondering what parts of the DB I could "safely" put on it.  I was thinking either the WAL or an index.  I have essentially fifteen 10K drives and one 15K drive, and don't quite know what to do with the 15K one.

-Mike