How to design for cheap reliability with PostgreSQL?

From: "Michael Meier"
Date:
Hi,

I am in the process of designing a distributed application (Java EE 5)
which needs database functionality at several points. Some of
these databases need to provide truly mission-critical reliability,
i.e. it would be the end of our world if we suffered data
loss or silent data corruption. Availability and serviceability are nice
but not that important. This means it isn't critical that a
transaction commits, but once a transaction has committed, the
data should be 100% safe, without any chance of loss or
corruption.
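
On the PostgreSQL side, I understand that commit durability hinges on a
handful of postgresql.conf settings staying at their safe values; as a
sketch (all of these exist in 8.3):

  # postgresql.conf -- durability settings
  fsync = on                   # flush WAL to disk at commit; never disable here
  synchronous_commit = on      # COMMIT returns only after the WAL is on disk
  full_page_writes = on        # guards against torn pages after a crash
  wal_sync_method = fsync      # portable default; benchmark alternatives on Solaris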

On the other hand, truly robust reliability is expensive, and we do not
have the money for serious hardware or "unbreakable" database
licences. I assume a DB size of < 50GB with moderate write I/O and
heavier read I/O, so I would like to get a machine with 64 GB of
RAM for in-memory caching.
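
As a rough sketch, that memory might be split something like this
(illustrative numbers I would still validate against our workload):

  # postgresql.conf -- memory settings (illustrative for a 64 GB box)
  shared_buffers = 8GB             # PostgreSQL's own buffer cache
  effective_cache_size = 48GB      # planner hint: what the OS / ZFS ARC caches
  work_mem = 32MB                  # per sort/hash operation; raise with care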

I am thinking of PostgreSQL 8.3.n with n >= 5 on top of Solaris 10 with ZFS.

So an extremely cheap low-end hardware platform for the problem might
look like this:
Dual quad-core Xeon machine
64 GB FB-DIMM ECC RAM
Two 8-port SAS controllers
Internal 16 x 73 GB 15k hard drives, allocated as follows:
- 4 drives: 3-way RAID-1 ZFS mirror with one additional hot spare for Solaris
- 6 drives: 4-way RAID-1 ZFS mirror with two additional hot spares for
PostgreSQL tables
- 6 drives: 4-way RAID-1 ZFS mirror with two additional hot spares for
the PostgreSQL WAL
ZFS scrubbing every night.
Approx. $24,000 list price from Sun (x4250) with Sun Platinum support
for the hardware and Solaris,
or approx. $12,000 self-assembled from Newegg.
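
As a concrete sketch, the two database pools might be created like this
(device names are placeholders; the recordsize and checksum settings are
common tuning suggestions, not defaults):

  # 4-way mirror for table data, two hot spares (device names illustrative)
  zpool create pgdata mirror c1t0d0 c1t1d0 c1t2d0 c1t3d0 spare c1t4d0 c1t5d0
  zfs set recordsize=8k pgdata       # match PostgreSQL's 8 kB block size
  zfs set checksum=sha256 pgdata     # stronger than the default fletcher checksum

  # 4-way mirror for the WAL, two hot spares
  zpool create pgwal mirror c2t0d0 c2t1d0 c2t2d0 c2t3d0 spare c2t4d0 c2t5d0

  # nightly scrub from cron (zpool scrub runs asynchronously)
  0 3 * * * /usr/sbin/zpool scrub pgdata && /usr/sbin/zpool scrub pgwal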

And an extremely expensive high-end solution for the problem might
look like this:
Sun SPARC Enterprise M4000 server with two SPARC64 VII CPUs
64 GB registered ECC RAM
Sun M-Series RAS features (checksumming of CPU registers, etc.)
An external drive array (e.g. J4400) with 16 drives, partitioned like
the internal drives above.
ZFS scrubbing every night.
Approx. $160,000.

This insane price difference would buy us data integrity beyond the
ZFS checksumming feature: it would protect the data even in the CPU
registers.

So the questions are:

1) Is this necessary? Do bit errors happen with configurations like
the cheap Xeon one above?
2) If this is necessary, is there any (software) way around it? Is
there a PostgreSQL clustering solution available where the cluster
nodes check each other's data integrity?
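
To make question 2 concrete, I could imagine an application-level
cross-check like the sketch below: run the same deterministic query
against two replicas over JDBC and compare checksums of the result sets.
Host names, the table, and the query are placeholders, not a real design:

  import java.security.MessageDigest;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class ReplicaCheck {

      // Hash the textual form of every row a query returns, in order.
      static byte[] digest(String url, String sql) throws Exception {
          MessageDigest md = MessageDigest.getInstance("MD5");
          try (Connection con = DriverManager.getConnection(url);
               Statement st = con.createStatement();
               ResultSet rs = st.executeQuery(sql)) {
              int cols = rs.getMetaData().getColumnCount();
              while (rs.next()) {
                  for (int i = 1; i <= cols; i++) {
                      String v = rs.getString(i);
                      md.update((v == null ? "\\N" : v).getBytes("UTF-8"));
                      md.update((byte) 0);        // column separator
                  }
                  md.update((byte) '\n');         // row separator
              }
          }
          return md.digest();
      }

      public static void main(String[] args) throws Exception {
          // Hypothetical replica URLs; the query needs a deterministic ORDER BY.
          String sql = "SELECT id, payload FROM important_data ORDER BY id";
          byte[] a = digest("jdbc:postgresql://replica1/appdb?user=check", sql);
          byte[] b = digest("jdbc:postgresql://replica2/appdb?user=check", sql);
          System.out.println(MessageDigest.isEqual(a, b)
                  ? "replicas agree" : "MISMATCH - investigate!");
      }
  }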

Thanks a lot for any hints!

Michael

Re: How to design for cheap reliability with PostgreSQL?

From: Richard Huxton
Date:
Michael Meier wrote:
> Hi,
>
> I am in the process of designing a distributed application (Java EE 5)
> which needs database functionality at several points. Some of
> these databases need to provide truly mission-critical reliability,
> i.e. it would be the end of our world if we suffered data
> loss or silent data corruption. Availability and serviceability are nice
> but not that important. This means it isn't critical that a
> transaction commits, but once a transaction has committed, the
> data should be 100% safe, without any chance of loss or
> corruption.

I'm assuming you mean 99.99..9% safe.

> On the other hand, truly robust reliability is expensive, and we do not
> have the money for serious hardware or "unbreakable" database
> licences. I assume a DB size of < 50GB with moderate write I/O and
> heavier read I/O, so I would like to get a machine with 64 GB of
> RAM for in-memory caching.

An "unbreakable" database isn't necessarily going to help you with the
sort of errors you're thinking of.

> I am thinking of PostgreSQL 8.3.n with n >= 5 on top of Solaris 10 with ZFS.

Not sure if it's of interest, but there's been discussion of block-level
checksums for 8.4. You might want to check the archives for the -hackers
mailing list. I've never used ZFS, but from what I've read it would make
sense.

> So an extremely cheap low-end hardware platform for the problem might
> look like this:

You and I are at different scales of "cheap" :-)

> Dual quad-core Xeon machine
> 64 GB FB-DIMM ECC RAM
> Two 8-port SAS controllers
> Internal 16 x 73 GB 15k hard drives, allocated as follows:
> - 4 drives: 3-way RAID-1 ZFS mirror with one additional hot spare for Solaris
> - 6 drives: 4-way RAID-1 ZFS mirror with two additional hot spares for
> PostgreSQL tables
> - 6 drives: 4-way RAID-1 ZFS mirror with two additional hot spares for
> the PostgreSQL WAL

That's a lot of disks for a 50GB database, unless you've got a *lot* of
writes.

> ZFS scrubbing every night.
> Approx. $24,000 list price from Sun (x4250) with Sun Platinum support
> for the hardware and Solaris,
> or approx. $12,000 self-assembled from Newegg.

Well, several machines at either of those prices must be preferable to
your expensive version.

> And an extremely expensive high-end solution for the problem might
> look like this:
> Sun SPARC Enterprise M4000 server with two SPARC64 VII CPUs
> 64 GB registered ECC RAM
> Sun M-Series RAS features (checksumming of CPU registers, etc.)
> An external drive array (e.g. J4400) with 16 drives, partitioned like
> the internal drives above.
> ZFS scrubbing every night.
> Approx. $160,000.
>
> This insane price difference would buy us data integrity beyond the
> ZFS checksumming feature: it would protect the data even in the CPU
> registers.
>
> So the questions are:
>
> 1) Is this necessary? Do bit errors happen with configurations like
> the cheap Xeon one above?

You'll probably want to estimate the odds of earthquake/volcano/nuclear
war and see how much effort it's worth.

> 2) If this is necessary, is there any (software) way around it? Is
> there a PostgreSQL clustering solution available where the cluster
> nodes check each other's data integrity?

Hmm - there are commercial variants of PG that do clustering/replication
(Greenplum and EnterpriseDB, for a start) but I'm not sure they're quite
what you're after.

I don't know what sort of queries you're running, but you could do
something like the following (a WAL-archiving sketch follows the list):
- pgpool to send queries to two front-end servers
- both replicate to a third server over Slony, which compares the
replicated copies (synchronisation issues here, though)
- you keep the replication stream or keep separate WAL archives, so if a
problem is found you stop everything and rewind to a known
good point.
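
The rewind part is essentially PostgreSQL's built-in continuous WAL
archiving plus point-in-time recovery. Roughly, with placeholder paths
and a placeholder target time:

  # postgresql.conf on the master -- continuous WAL archiving (8.3)
  archive_mode = on
  archive_command = 'cp %p /safe/wal_archive/%f'  # %p/%f expanded by PostgreSQL

  # recovery.conf on a restored base backup -- rewind to a known good point
  restore_command = 'cp /safe/wal_archive/%f %p'
  recovery_target_time = '2009-01-01 12:00:00'    # placeholder timestamp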

Of course, all this just introduces more code with more chances of a
bug. It would *look* more reliable, but I'm not sure it would be.

HTH

--
  Richard Huxton
  Archonet Ltd