Discussion: stability issues

stability issues

From: "Willy-Bas Loos"
Date:
Hi,

I'm running a PostgreSQL 8.1.9 server on Debian etch (default installation, with PostGIS from the Debian repositories).
The machine has dual Xeon 5130 CPUs, 4 GB of DDR2 ECC registered RAM and a two-SATA-disk RAID 0 array. I don't know the brand of either the memory or the HDDs. The RAID controller is a 3Ware 8006-2.

Lately I've run into failures in several PostgreSQL clusters.
Three weeks ago a cluster crashed while I was reindexing it, beyond my ability to repair.
Last week we got TOAST warnings and shared index problems on another cluster. I solved the shared index issue, but the TOAST trouble remained, in more than one database on that cluster. I dumped the databases, created a new cluster and shut the troubled one down, which solved the problems.
I asked our provider to run a filesystem check (e2fsck -f /dev/sda1), which indicated that "there's nothing wrong with the filesystem".

This doesn't feel right.
Is there anything else I should check?

thx,

WBL




Re: stability issues

From: Tom Lane
Date:
"Willy-Bas Loos" <willybas@gmail.com> writes:
> I'm running a PostgreSQL 8.1.9 server on Debian etch (default installation,
> with PostGIS from the Debian repositories).
> The machine has dual Xeon 5130 CPUs, 4 GB of DDR2 ECC registered RAM and a
> two-SATA-disk RAID 0 array. I don't know the brand of either the memory or
> the HDDs. The RAID controller is a 3Ware 8006-2.

> Lately I've run into failures in several PostgreSQL clusters.
> Three weeks ago a cluster crashed while I was reindexing it, beyond my
> ability to repair.
> Last week we got TOAST warnings and shared index problems on another
> cluster. I solved the shared index issue, but the TOAST trouble remained, in
> more than one database on that cluster. I dumped the databases, created a
> new cluster and shut the troubled one down, which solved the problems.
> I asked our provider to run a filesystem check (e2fsck -f /dev/sda1), which
> indicated that "there's nothing wrong with the filesystem".

> This doesn't feel right.

No, it sure doesn't.  It sounds to me like the hardware is getting
flaky.  Memory tests might be the first thing to run.  There's also the
old clean-and-reseat-all-the-boards-and-connectors exercise...

            regards, tom lane
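
Tom's first suggestion, memory tests, normally means booting a dedicated tester such as memtest86 with the machine offline. Where a hosting provider won't do that, the write-then-verify idea behind such testers can at least be illustrated in user space. This Python sketch is a hypothetical helper (the name `pattern_test`, the buffer size and the byte patterns are all assumptions), and it is far weaker than memtest86: it only touches the pages the OS happens to give this process, so a zero result proves very little, while a nonzero result would be a strong hint of bad RAM.

```python
def pattern_test(size_mib=64, patterns=(0x00, 0xFF, 0x55, 0xAA), passes=1):
    """Write each byte pattern into one buffer, read it back, count bad bytes.

    Hypothetical helper: it only exercises memory the OS gives this process,
    so it is no substitute for a boot-time tester such as memtest86.
    """
    size = size_mib * 1024 * 1024
    buf = bytearray(size)
    mismatches = 0
    for _ in range(passes):
        for p in patterns:
            expected = bytes([p]) * size
            buf[:] = expected            # write phase: fill the whole buffer
            if buf != expected:          # read-back phase: fast bulk compare
                # Only on a mismatch, count exactly how many bytes differ.
                mismatches += sum(1 for a, b in zip(buf, expected) if a != b)
    return mismatches
```

Running `pattern_test(size_mib=1024, passes=10)` on an otherwise idle box exercises about 1 GiB repeatedly; any nonzero count is worth escalating to the provider.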

Re: stability issues

From: "Scott Marlowe"
Date:
On 9/18/07, Willy-Bas Loos <willybas@gmail.com> wrote:
> Hi,
>
> I'm running a PostgreSQL 8.1.9 server on Debian etch (default installation,
> with PostGIS from the Debian repositories).
> The machine has dual Xeon 5130 CPUs, 4 GB of DDR2 ECC registered RAM and a
> two-SATA-disk RAID 0 array. I don't know the brand of either the memory or
> the HDDs. The RAID controller is a 3Ware 8006-2.
>
> Lately I've run into failures in several PostgreSQL clusters.
> Three weeks ago a cluster crashed while I was reindexing it, beyond my
> ability to repair.
> Last week we got TOAST warnings and shared index problems on another
> cluster. I solved the shared index issue, but the TOAST trouble remained, in
> more than one database on that cluster. I dumped the databases, created a
> new cluster and shut the troubled one down, which solved the problems.
> I asked our provider to run a filesystem check (e2fsck -f /dev/sda1), which
> indicated that "there's nothing wrong with the filesystem".

That's really too little.  This system needs serious diagnostics run
on it to find out what the problem is.

> This doesn't feel right.
> Is there anything else I should check?

Yes.  If you can't get them to run real tests with memtest86 etc., then
try running very large compiles, like the Linux kernel with -j4 or -j8,
and watch for signal 11s (segfaults) while doing it.  Set it up to run
the compiles in a loop, looking for errors in compiling.

Better yet, find a hosting provider who knows what they're doing.
Your data / uptime are worth it.
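
Scott's compile-loop test can be sketched as a small Python driver. This is a hedged sketch, not a tool from the thread: it assumes a source tree (for example a Linux kernel checkout) where `make clean` and `make -j8` work, and the names `stress_loop` and `looks_like_sigsegv` and the marker strings are hypothetical choices for illustration. On POSIX, `subprocess` reports a child killed by a signal as a negative return code, so -11 means SIGSEGV; when a compiler pass crashes under make, make itself just exits nonzero, so the build output is scanned for crash messages too.

```python
import signal
import subprocess

# Strings that commonly appear when a compiler pass dies on bad hardware.
# These are illustrative assumptions; extend them for your toolchain.
SEGV_MARKERS = ("segmentation fault", "signal 11",
                "internal compiler error", "internal error")


def looks_like_sigsegv(returncode, output):
    """Return True if a run was killed by SIGSEGV or its output mentions one."""
    # subprocess reports "killed by signal N" as returncode -N on POSIX.
    if returncode == -int(signal.SIGSEGV):
        return True
    lowered = output.lower()
    return any(marker in lowered for marker in SEGV_MARKERS)


def stress_loop(cmd, iterations, cwd=None, clean_cmd=None):
    """Run `cmd` repeatedly; return (run index, returncode, segv_seen) per failed run."""
    failures = []
    for i in range(iterations):
        if clean_cmd:
            # Start each pass from a clean tree so every run really recompiles.
            subprocess.run(clean_cmd, cwd=cwd,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        proc = subprocess.run(cmd, cwd=cwd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True,
                              errors="replace")
        segv = looks_like_sigsegv(proc.returncode, proc.stdout)
        if proc.returncode != 0 or segv:
            failures.append((i, proc.returncode, segv))
    return failures
```

For example, from the top of a kernel tree, `stress_loop(["make", "-j8"], 50, clean_cmd=["make", "clean"])` runs fifty clean rebuilds and returns the failing runs. Since the source never changes between runs, failures that appear at random points, in different files, typically point at hardware rather than at the code being compiled.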