Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342

From	Tambet Matiisen
Subject	Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342
Date	17 March 2011, 06:35:36
Msg-id	4D810A97.3000602@gmail.com
In reply to	Re: BUG #5929: ERROR: found toasted toast chunk for toast value 260340218 in pg_toast_260339342 ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List	pgsql-bugs

On 16.03.2011 17:09, Kevin Grittner wrote:
> Tambet Matiisen <tambet.matiisen@gmail.com> wrote:
>
>> Pre-live database is restored from live database dump every night.
>
> How is that done?  A single pg_dump of the entire live database
> restored using psql?  Are both database servers at the same
> PostgreSQL version?

Yes, I run pg_dump on the live server and the result is copied to the
development server with rdiff-backup. The whole SQL dump is 12 GB
uncompressed and the rdiff delta is about 10-20 MB every day. Then I drop
the pre-live database on the development server and recreate it using
createdb and psql.
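
The nightly refresh described above could be sketched roughly as follows. This is only an illustration under assumptions: the database names and dump path are hypothetical, the rdiff-backup transfer between the two machines is not shown, and the ON_ERROR_STOP setting is an addition that the original setup may not use.

```python
def nightly_refresh_commands(live_db, prelive_db, dump_path):
    """Return the argv lists for a dump-and-restore refresh, in run order.

    All three arguments are hypothetical placeholders; the real scripts
    in this thread were not posted.
    """
    return [
        # On the live server: write a plain SQL dump to dump_path.
        ["pg_dump", "-f", dump_path, live_db],
        # On the development server, after the dump has been transferred:
        # drop the old copy and create an empty database.
        ["dropdb", prelive_db],
        ["createdb", prelive_db],
        # Replay the SQL dump. ON_ERROR_STOP makes psql exit at the first
        # failed statement instead of continuing with a partial restore.
        ["psql", "-v", "ON_ERROR_STOP=1", "-d", prelive_db, "-f", dump_path],
    ]
```

Each list could be passed to subprocess.run(argv, check=True), so that a failing step stops the sequence and a later pg_dump never runs against a half-restored database.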

For a while the development server was running 8.4 and the live server
8.1. Now both are 8.4, but this shouldn't matter, as I do backup and
restore via SQL.

>
>> So far the errors have been in pre-live database,
>
> You're running pg_dump against a database you just restored from a
> pg_dump image?

Hmm, yeah. This sounds rather dumb, but I haven't got around to changing
that yet. The development server also contains some additional databases
that do not exist on the live server.

>
>> Usually the next day error was gone. I mostly blamed badly timed
>> backup and restore scripts, although this shouldn't result in
>> errors.
>
> No it shouldn't -- if you're following any of the documented backup
> and restore techniques.  I have a suspicion that you're just doing a
> file copy without stopping the live database or properly following
> the documented PITR backup and recovery techniques.

No, I don't do any advanced backup tricks. Just plain pg_dump and psql.

>
>> This time the error is not in pre-live database and therefore it
>> doesn't go away.
>
> If I understand you, this sounds like corruption in the live
> database; nothing on the pre-live database is part of causing this
> problem.

That would be the case if I did a filesystem-level copy, but I do not.

>
>> The server is also running [...] Samba [...]
>
> I hope you're not trusting Samba too far.  For a while we were using
> it in backups across our WAN, and it mangled at least one file
> almost every day.  We had to take to running md5sum against both
> ends for each file to ensure we didn't get garbage (until we
> converted everything to use TCP communications, which have never
> mangled anything for us).

As I said, I'm using rdiff-backup to transfer pure SQL files.
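
The checksum comparison mentioned in the quoted text can be applied to a transferred SQL dump as well. A minimal sketch, assuming both copies of the file are readable from wherever the script runs (in practice the digest would be computed on each machine and the two hex strings compared); the function names are made up for illustration.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte dump is never held in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b, algorithm="md5"):
    """True if both files hash to the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

md5 is the default here only to mirror the md5sum check described above; passing algorithm="sha256" works the same way.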

>
>> Both fsync and full_page_writes are on.
>
> Good.  Without those an OS or hardware crash can corrupt your
> database.

Actually they are commented out in postgresql.conf, but I suppose this
means they are left at the default, "on".
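
A commented-out line in postgresql.conf does leave the built-in default in place, and the default for both fsync and full_page_writes is "on". The sketch below illustrates that rule for these two settings only. It is a simplified, hypothetical parser: it ignores include files and command-line overrides, and it would mishandle a quoted value containing "#". Running SHOW fsync; in psql remains the reliable way to see the value actually in effect.

```python
# Built-in defaults for the two settings discussed in this thread.
DEFAULTS = {"fsync": "on", "full_page_writes": "on"}


def effective_settings(conf_text, defaults=DEFAULTS):
    """Return the tracked settings after applying uncommented lines of conf_text."""
    settings = dict(defaults)
    for raw in conf_text.splitlines():
        # Everything after '#' is a comment, so a line such as
        # "#fsync = on" contributes nothing and the default stands.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" in line:
            name, value = line.split("=", 1)
        else:
            # The '=' between name and value is optional in postgresql.conf.
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue
            name, value = parts
        name = name.strip().lower()
        if name in settings:
            settings[name] = value.strip().strip("'")
    return settings
```

For a file where both lines are commented out, the function returns "on" for each, which matches the supposition above.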

>
>> OK, I don't have UPS for this machine, but power has been stable.
>> Current uptime is 32 days, which I bet is from the last kernel
>> update.
>
> OK.  A power outage wouldn't be too likely to matter if you have
> fsync and full_page_writes on.

That's a relief :).

>
>> Currently I blame either faulty memory or faulty software RAID
>> driver.  I can easily eliminate the memory cause by running
>> memtest86 for few hours
>
> Is this ECC memory?  If not, even a good test doesn't prove that a
> RAM problem didn't cause the corruption.

It's not ECC memory.

>
>> Now, off to buy UPS...
>
> Not a bad idea, but it doesn't sound like lack of that is likely to
> have caused the corruption in your live database, based on the
> settings you mentioned.  (Assuming those settings are in use on the
> live server.)

I checked the live server; it also has fsync=on and full_page_writes=on.
But it shouldn't matter, because the backup of the live server doesn't
give any errors.

It is possible that the restore of the pre-live database using psql takes
so long that the pg_dump backup of the same database has already started.
But again, this shouldn't matter, and it doesn't explain why the latest
error is in another database that hasn't changed for months.

Now I have to find time to run memtest.

   Tambet
