[ADMIN] database corruption on disk-full?

From: Peter
Subject: [ADMIN] database corruption on disk-full?
Date:
Msg-id: 20170905092146.GA32634@gate.oper.dinoex.org
List: pgsql-admin
On Sep 1 I found strange errors on my database:

Sep  1 18:55:52 <local0.notice> edge postgres[1691]: [7-1] :[] WARNING:  archiving transaction log file "0000000100000022000000C9" failed too many times, will try again later
Sep  1 19:01:11 <local0.warn> edge postgres[77410]: [11-1] bacula:bacula[192.168.98.2(37891)] ERROR:  could not read block 80461 in file "pg_tblspc/16391/PG_9.5_201510051/16394/1435214": read only 0 of 8192 bytes
Sep  1 19:01:11 <local0.warn> edge postgres[77410]: [11-2] bacula:bacula[192.168.98.2(37891)] STATEMENT:  INSERT INTO File...
Sep  1 19:04:59 <local0.warn> edge postgres[68877]: [8-1] bacula:bacula[192.168.98.2(36617)] ERROR:  could not read block 80461 in file "pg_tblspc/16391/PG_9.5_201510051/16394/1435214": read only 0 of 8192 bytes
Sep  1 19:04:59 <local0.warn> edge postgres[68877]: [8-2] bacula:bacula[192.168.98.2(36617)] STATEMENT:  INSERT INTO File...
Sep  1 19:43:48 <local0.notice> edge postgres[1691]: [12-1] :[] WARNING:  archiving transaction log file "0000000100000022000000CE" failed too many times, will try again later
Sep  1 19:57:48 <local0.warn> edge postgres[92864]: [14-1] bacula:bacula[192.168.98.2(12726)] ERROR:  deadlock detected at character 13
Sep  1 19:57:48 <local0.warn> edge postgres[92864]: [14-2] bacula:bacula[192.168.98.2(12726)] STATEMENT:  DELETE FROM jobmedia WHERE jobid IN
Sep  1 19:57:48 <local0.warn> edge postgres[92864]: [14-3]        (SELECT jobid FROM idliste);
Sep  1 19:57:48 <local0.warn> edge postgres[92864]: [15-1] bacula:bacula[192.168.98.2(12726)] ERROR:  current transaction is aborted, commands ignored until end of transaction block
Sep  1 19:57:48 <local0.warn> edge postgres[92864]: [15-2] bacula:bacula[192.168.98.2(12726)] STATEMENT:  DELETE FROM job WHERE jobid IN
Sep  1 19:57:48 <local0.warn> edge postgres[92864]: [15-3]        (SELECT jobid FROM idliste);
Sep  1 20:07:25 <local0.warn> edge postgres[76452]: [8-1] bacula:bacula[192.168.98.2(53112)] ERROR:  could not read block 80461 in file "pg_tblspc/16391/PG_9.5_201510051/16394/1435214": read only 0 of 8192 bytes
Sep  1 20:07:25 <local0.warn> edge postgres[76452]: [8-2] bacula:bacula[192.168.98.2(53112)] STATEMENT:  INSERT INTO File...

I found that the unreadable relation 16394/1435214 is an index on the
"File" table, found it to be much too short on disk (less than
80461 blocks), did a "REINDEX" on it, and since then everything seems
fine again (except that I am missing some payload data, because the
"INSERT" commands obviously did not execute).

Now I am trying to do the forensics, because in my opinion such
behaviour should not occur.

What had happened: at about 3am a daily VACUUM was started (there is no
autovacuum in place), along with some other housekeeping jobs that
made the system very slow. Then around 5:30am a monthly VACUUM FULL
was started, and from then on the two VACUUMs were busily competing for
locks, until finally after 7pm both got through:

Sep  1 19:29:10 <local0.info> edge postgres[76238]: [15-1] pgsql:bacula[192.168.97.9(21132)] LOG:  duration: 45103183.199 ms  statement: VACUUM (ANALYZE);
Sep  1 19:30:26 <local0.info> edge postgres[76730]: [20-1] pgsql:bacula[192.168.97.9(25710)] LOG:  duration: 44924828.356 ms  statement: VACUUM (FULL, ANALYZE);
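
Next time I would watch the contention while it happens; a minimal
sketch for 9.5 (where pg_stat_activity still has the boolean "waiting"
column) would be:

    -- Which backends are running a VACUUM, and are they waiting on a lock?
    SELECT pid, waiting, query_start, query
      FROM pg_stat_activity
     WHERE query ILIKE 'vacuum%';

    -- Ungranted locks show who is queued behind which relation.
    SELECT pid, locktype, mode, relation::regclass, granted
      FROM pg_locks
     WHERE NOT granted;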

There are no errors reported from either of them, nor any other
errors in the system logs. But something must have gone wrong before
19:01:11, yielding a truncated index. And the only thing that
should rewrite the index is the VACUUM FULL.
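
One way to confirm that the VACUUM FULL is indeed what rewrote the
index: a rewrite assigns a new relfilenode, so the file path changes.
A sketch, with table and index names as placeholders for mine:

    -- relfilenode changes whenever the relation is rewritten (VACUUM FULL,
    -- REINDEX, ...), so after the rewrite the old file 1435214 is no longer
    -- referenced by the catalog.
    SELECT relname, relfilenode, pg_relation_filepath(oid)
      FROM pg_class
     WHERE relname IN ('file', 'file_jpfid_idx');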

Doing the math, I figure that the disk might have been too small:
the bacula database is on a separate tablespace, which was 48% full
when I recognized the issue. Considering that VACUUM FULL needs
double the space, and that the index is temporarily written to the
pgsql_tmp directory (on the same disk/tablespace), it may well
have hit the limit.
The log shows the temporary files for the indexes being used, all
with their proper sizes - so it must have been the final creation of
the index file that failed.
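
The numbers behind that estimate come straight from the catalogs; a
rough check (tablespace and table names are placeholders for mine):

    -- VACUUM FULL keeps the old copy until the new one is complete, and the
    -- index rebuild goes through pgsql_tmp on the same tablespace, so the
    -- free space must cover roughly the full relation size once more.
    SELECT pg_size_pretty(pg_tablespace_size('bacula_ts'))  AS tablespace_used,
           pg_size_pretty(pg_total_relation_size('file'))   AS table_plus_indexes,
           pg_size_pretty(pg_relation_size('file'))         AS heap_only;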

The base postgres cluster installation with the default tablespace
(and pg_xlog etc.) cannot have been harmed by a disk-full on that
tablespace.
The reason why there is also a WAL archiving failure at 18:55 should
be that my temporary staging area was full due to the huge amount of
journals created by the VACUUM FULL, while bacula was at the same time
unable to move them to permanent storage because it was blocked by
that very VACUUM FULL.
But there is plenty of reserve space in pg_xlog, and since the complaint
at 18:55 is the first of its kind, pg_xlog itself cannot have run full.
So these complaints should be harmless "works as designed" and not
related to the issue.
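
The archiver's failures can also be checked after the fact; since 9.4
there is pg_stat_archiver:

    -- Last successfully archived segment vs. last failure; the failure time
    -- should line up with the 18:55 warning above.
    SELECT archived_count, last_archived_wal, last_archived_time,
           failed_count,   last_failed_wal,   last_failed_time
      FROM pg_stat_archiver;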

Finally, I cannot get any clue from the deadlock at 19:57:48. This
might be somehow related to the broken index, but I do not see how
that could be.
-----------------------------------

Concluding:
There was a broken index (which I would consider a database
corruption, although nothing too bad has arisen from it) and there
are NO messages/errors of any kind giving a clue as to how this might
have happened.

There MAY have been a disk-full event, but then again there are NO
messages/errors reporting a disk-full.
The underlying file system is ZFS, and from what I know it gets
annoyingly slow when filling up, but I do not know precisely what it
does when it actually gets full. Probably it does not report directly
to syslog, but at least it should return a failed system call (which
should then yield some error being reported) - files should not get
silently truncated.

So, I might assume that somewhere in here a bug is hidden. But then,
is the problem in postgres (9.5.7), in the OS (FreeBSD 11.1), or in ZFS?

Anyway, documentation section 28.2 ("disk-full will not result in
data corruption") seems questionable.

PMc


--
Peter Much * Fichtenstr. 28 * 65527 Niedernhausen
home +49-6127-967111 * road +49-151-21416768

