Re: PANIC during VACUUM

From	German Becker
Subject	Re: PANIC during VACUUM
Date	30 April 2013, 15:26:18
Msg-id	CALyjCLvoWybyrcmvH65J=rOYPp1bO6fUbMSBEOUjh8mQd3uv1Q@mail.gmail.com
In reply to	Re: PANIC during VACUUM (Kevin Grittner <kgrittn@ymail.com>)
List	pgsql-admin

OK, I apologise for the lack of clarity in my first message. Let me summarize the steps that led me to the error.

I have 2 servers running Ubuntu 12.04 on which I am testing Postgres 9.1.9. I set up streaming replication between them (no synchronous replication).

Both servers have 4 SATA hard drives with the ext3 file system, set up as follows:

sda --> / (main OS and the database files, except for the ones listed below)

sdb --> pg_xlog directory

sdc --> one tablespace where heavy-transaction tables are stored

sdd --> another tablespace where big historic tables are stored

Archive mode is on and the archive location is on sda (and from there the files go to the hot-standby server).

For testing, I populate the database with the data currently in production (which currently runs Postgres 8.3).

Then I run several load tests, etc.

To tune and improve the archiving process I needed to generate a large amount of WAL. To do so I just deleted the contents of one big table and then ran VACUUM on it, like this:

DELETE FROM bigtable;

VACUUM bigtable;

And I got the error I reported.

I repeated the whole process (creating a new cluster, populating it with data, always the same data, and setting up replication) a couple of times after that and got the error again about 90% of the time. I tried deleting a big portion of the table and the error did not appear; it only appears after deleting ALL rows. Also, in some cases I didn't run the VACUUM command manually, and the error occurred during autovacuum.

My last test, in case there was a hardware problem on the primary, was to trigger the standby server and try the vacuum there, with the same results.

Here is a chunk of the log:

2013-04-29 17:02:21 ART [12024]: [32-1] PANIC: XX001: corrupted item pointer: offset = 8128, size = 80

2013-04-29 17:02:21 ART [12024]: [33-1] LOCATION: PageIndexMultiDelete, bufpage.c:779

2013-04-29 17:02:21 ART [12024]: [34-1] STATEMENT: VACUUM callshopcdrs ;

2013-04-29 17:02:21 ART [23787]: [8-1] LOG: server process (PID 12024) was terminated by signal 6: Aborte

2013-04-29 17:02:21 ART [23787]: [9-1] LOG: terminating any other active server processes

2013-04-29 17:02:21 ART [7300]: [2-1] WARNING: terminating connection because of crash of another server process

2013-04-29 17:02:21 ART [7300]: [3-1] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2013-04-29 17:02:21 ART [7300]: [4-1] HINT: In a moment you should be able to reconnect to the database and repeat your command.

2013-04-29 17:02:21 ART [30304]: [1-1] FATAL: the database system is in recovery mode

2013-04-29 17:02:21 ART [23787]: [10-1] LOG: archiver process (PID 7301) exited with exit code 1

2013-04-29 17:02:21 ART [23787]: [11-1] LOG: all server processes terminated; reinitializing

2013-04-29 17:02:21 ART [30305]: [1-1] LOG: database system was interrupted; last known up at 2013-04-29 16:59:01 ART

2013-04-29 17:02:21 ART [30305]: [2-1] LOG: database system was not properly shut down; automatic recovery in progress

2013-04-29 17:02:21 ART [30305]: [3-1] LOG: redo starts at 11/497D4338

2013-04-29 17:02:21 ART [30305]: [4-1] LOG: invalid magic number 0000 in log file 17, segment 73, offset 8216576

2013-04-29 17:02:21 ART [30305]: [5-1] LOG: redo done at 11/497D4440

2013-04-29 17:02:22 ART [30308]: [1-1] LOG: autovacuum launcher started

2013-04-29 17:02:22 ART [23787]: [12-1] LOG: database system is ready to accept connections
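
As a side check on the numbers in this log, here is a minimal Python sketch. It assumes the default build settings (8192-byte data pages, 8192-byte WAL pages, 16 MB WAL segments), none of which are stated above, and the helper names are made up for illustration. It only does arithmetic on values copied from the log; it is not a diagnosis of the corruption.

```python
import re

# Assumptions (PostgreSQL defaults, not confirmed for this installation):
BLCKSZ = 8192                      # data page size in bytes
XLOG_BLCKSZ = 8192                 # WAL page size in bytes
XLOG_SEG_SIZE = 16 * 1024 * 1024   # WAL segment size in bytes


def item_end(panic_line):
    """Parse 'offset = N, size = M' and return (offset, size, offset + size)."""
    m = re.search(r"offset = (\d+), size = (\d+)", panic_line)
    offset, size = int(m.group(1)), int(m.group(2))
    return offset, size, offset + size


def split_lsn(lsn):
    """Split a 9.1-style 'xlogid/xrecoff' LSN (hex) into
    (log file, segment, byte offset within the segment)."""
    log_hex, off_hex = lsn.split("/")
    rec_off = int(off_hex, 16)
    return int(log_hex, 16), rec_off // XLOG_SEG_SIZE, rec_off % XLOG_SEG_SIZE


def next_wal_page(offset_in_segment):
    """Start offset of the WAL page following the one containing the offset."""
    return (offset_in_segment // XLOG_BLCKSZ + 1) * XLOG_BLCKSZ


offset, size, end = item_end(
    "PANIC: XX001: corrupted item pointer: offset = 8128, size = 80")
# The item would end at byte 8208, past an 8192-byte page.
print("item ends at", end, "page size", BLCKSZ, "overruns page:", end > BLCKSZ)

log_file, segment, seg_off = split_lsn("11/497D4440")  # "redo done at"
# (17, 73, 8209472): same log file and segment as the "invalid magic number" line.
print("redo done in log file", log_file, "segment", segment, "offset", seg_off)
# 8216576: the offset reported in the "invalid magic number" line.
print("next WAL page starts at", next_wal_page(seg_off))
```

Under those assumptions, the item pointer in the PANIC line describes an item that would extend 16 bytes past the end of the page, which is consistent with it being rejected as corrupted. The "invalid magic number" line points at the first WAL page after the last replayed record, which looks like recovery simply reaching the end of the WAL rather than a second problem.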

A core file was generated; it is 7 GB in size:

$ file core

core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'postgres: postgres tvoip3 [local] VACUUM'

Many thanks for your help, and let me know if any extra information would be useful.

German

On Tue, Apr 30, 2013 at 8:51 AM, Kevin Grittner <kgrittn@ymail.com> wrote:

[please don't top-post]

German Becker <german.becker@gmail.com> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
>> German Becker wrote:

>>> I am testing version 9.1.9 before putting it in production. One
>>> of my tests involved deleting the contents of a big table ( ~
>>> 13 GB size) and then VACUUMing it. During VACUUM PANICS.

>> If you mess with the database files, errors like this are to be
>> expected.

> Thanks for your reply. In which sense did I mess with the
> database files?

You didn't say how you deleted the contents of that big table, and
it appears that Albe assumed you deleted or truncated the
underlying disk file rather than using the DELETE or TRUNCATE SQL
statement.

In any event, more details would help people come up with ideas on
what might be wrong.

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
