WAL logging problem in 9.4.3?

Поиск
Список
Период
Сортировка
От Martijn van Oosterhout
Тема WAL logging problem in 9.4.3?
Дата
Msg-id 20150702220524.GA9392@svana.org
обсуждение исходный текст
Ответы Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hoi,

I ran into this in our CI setup and I thinks it's an actual bug. The
issue appears to be that when a table is created *and truncated* in a
single transaction, that the WAL log is logging a truncate record it
shouldn't, such that if the database crashes you have a broken index.
It would also lose any data that was in the table at commit time.

I didn't test 9.4.4 yet, though I don't see anything in the release
notes that resemble this.

Detail:

=== Start with an empty database

martijn@martijn-jessie:$ psql ctmp -h localhost -U username
Password for user username:
psql (9.4.3)
Type "help" for help.

ctmp=# begin;
BEGIN
ctmp=# create table test(id serial primary key);
CREATE TABLE
ctmp=# truncate table test;
TRUNCATE TABLE
ctmp=# commit;
COMMIT
ctmp=# select relname, relfilenode from pg_class where relname like 'test%';  relname   | relfilenode
-------------+-------------test        |       16389test_id_seq |       16387test_pkey   |       16393
(3 rows)

=== Note: if you do a CHECKPOINT here the issue doesn't happen
=== obviously.

ctmp=# \q
martijn@martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393}
[sudo] password for martijn:
-rw------- 1 messagebus ssl-cert 8192 Jul  2 23:34 /data/postgres/base/16385/16387
-rw------- 1 messagebus ssl-cert    0 Jul  2 23:34 /data/postgres/base/16385/16389
-rw------- 1 messagebus ssl-cert 8192 Jul  2 23:34 /data/postgres/base/16385/16393

=== Note the index file is 8KB.
=== At this point nuke the database server (in this case it was simply
=== destroying the container it was running in.

=== Dump the xlogs just to show what got recorded. Note there's a
=== truncate for the data file and the index file.

martijn@martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/
000000010000000000000001|grep -wE '16389|16387|16393' 
rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc:
checkpoint:redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB
1;oldest multi 1 in DB 1; oldest running xid 0; shutdown 
rmgr: Storage     len (rec/tot):     16/    48, tx:          0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: file
create:base/16385/16387 
rmgr: Sequence    len (rec/tot):    158/   190, tx:        686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log:
rel1663/16385/16387 
rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: file
create:base/16385/16389 
rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: file
create:base/16385/16393 
rmgr: Sequence    len (rec/tot):    158/   190, tx:        686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log:
rel1663/16385/16387 
rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: file
truncate:base/16385/16389 to 0 blocks 
rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: file
truncate:base/16385/16393 to 0 blocks 
pg_xlogdump: FATAL:  error in WAL record at 0/16BE710: record with zero length at 0/16BE740

=== Start the DB up again

database_1 | LOG:  database system was interrupted; last known up at 2015-07-02 21:08:05 UTC
database_1 | LOG:  database system was not properly shut down; automatic recovery in progress
database_1 | LOG:  redo starts at 0/16A92A8
database_1 | LOG:  record with zero length at 0/16BE740
database_1 | LOG:  redo done at 0/16BE710
database_1 | LOG:  last completed transaction was at log time 2015-07-02 21:34:45.664989+00
database_1 | LOG:  database system is ready to accept connections
database_1 | LOG:  autovacuum launcher started

=== Oops, the index file is empty now

martijn@martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393}
-rw------- 1 messagebus ssl-cert 8192 Jul  2 23:37 /data/postgres/base/16385/16387
-rw------- 1 messagebus ssl-cert    0 Jul  2 23:34 /data/postgres/base/16385/16389
-rw------- 1 messagebus ssl-cert    0 Jul  2 23:37 /data/postgres/base/16385/16393

martijn@martijn-jessie:$ psql ctmp -h localhost -U username
Password for user username:
psql (9.4.3)
Type "help" for help.

=== And now the index is broken. I think the only reason it doesn't
=== complain about the data file is because zero bytes there is OK.  But if
=== the table had data before it would be gone now.

ctmp=# select * from test;
ERROR:  could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes

ctmp=# select version();                                           version
-----------------------------------------------------------------------------------------------PostgreSQL 9.4.3 on
x86_64-unknown-linux-gnu,compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit 
(1 row)

Hope this helps.
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Refactoring speculative insertion with unique indexes a little
Следующее
От: Qingqing Zhou
Дата:
Сообщение: Re: Memory Accounting v11