Re: WAL logging problem in 9.4.3?

Поиск
Список
Период
Сортировка
От Martijn van Oosterhout
Тема Re: WAL logging problem in 9.4.3?
Дата
Msg-id 20150703195736.GB2841@svana.org
обсуждение исходный текст
Ответ на Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
Ответы Re: WAL logging problem in 9.4.3?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Fri, Jul 03, 2015 at 07:21:21PM +0200, Andres Freund wrote:
> On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote:
> > Am I missing something. ISTM that if the truncate record was simply not
> > logged at all everything would work fine. The whole point is that the
> > table was created in this transaction and so if it exists the table on
> > disk must be the correct representation.
>
> That'd not work either. Consider:
>
> BEGIN;
> CREATE TABLE ...
> INSERT;
> TRUNCATE;
> INSERT;
> COMMIT;
>
> If you replay that without a truncation wal record the second INSERT
> will try to add stuff to already occupied space. And they can have
> different lengths and stuff, so you cannot just ignore that fact.

I was about to disagree with you by suggesting that if the table was
created in this transaction then WAL logging is skipped. But testing
shows that inserts are indeed logged, as you point out.

With inserts the WAL records look as follows (relfilenodes changed):

martijn@martijn-jessie:~/git/ctm/docker$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /tmp/pgtest/postgres/pg_xlog/
000000010000000000000001|grep -wE '16386|16384|16390' 
rmgr: Storage     len (rec/tot):     16/    48, tx:          0, lsn: 0/016A79C8, prev 0/016A79A0, bkp: 0000, desc: file
create:base/12139/16384 
rmgr: Sequence    len (rec/tot):    158/   190, tx:        683, lsn: 0/016B4258, prev 0/016B2508, bkp: 0000, desc: log:
rel1663/12139/16384 
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 0/016B4318, prev 0/016B4258, bkp: 0000, desc: file
create:base/12139/16386 
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 0/016B9468, prev 0/016B9418, bkp: 0000, desc: file
create:base/12139/16390 
rmgr: Sequence    len (rec/tot):    158/   190, tx:        683, lsn: 0/016BC938, prev 0/016BC880, bkp: 0000, desc: log:
rel1663/12139/16384 
rmgr: Sequence    len (rec/tot):    158/   190, tx:        683, lsn: 0/016BCAF0, prev 0/016BCAA0, bkp: 0000, desc: log:
rel1663/12139/16384 
rmgr: Heap        len (rec/tot):     35/    67, tx:        683, lsn: 0/016BCBB0, prev 0/016BCAF0, bkp: 0000, desc:
insert(init):rel 1663/12139/16386; tid 0/1 
rmgr: Btree       len (rec/tot):     20/    52, tx:        683, lsn: 0/016BCBF8, prev 0/016BCBB0, bkp: 0000, desc:
newroot:rel 1663/12139/16390; root 1 lev 0 
rmgr: Btree       len (rec/tot):     34/    66, tx:        683, lsn: 0/016BCC30, prev 0/016BCBF8, bkp: 0000, desc:
insert:rel 1663/12139/16390; tid 1/1 
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 0/016BCC78, prev 0/016BCC30, bkp: 0000, desc: file
truncate:base/12139/16386 to 0 blocks 
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 0/016BCCA8, prev 0/016BCC78, bkp: 0000, desc: file
truncate:base/12139/16390 to 0 blocks 
rmgr: Heap        len (rec/tot):     35/    67, tx:        683, lsn: 0/016BCCD8, prev 0/016BCCA8, bkp: 0000, desc:
insert(init):rel 1663/12139/16386; tid 0/1 
rmgr: Btree       len (rec/tot):     20/    52, tx:        683, lsn: 0/016BCD20, prev 0/016BCCD8, bkp: 0000, desc:
newroot:rel 1663/12139/16390; root 1 lev 0 
rmgr: Btree       len (rec/tot):     34/    66, tx:        683, lsn: 0/016BCD58, prev 0/016BCD20, bkp: 0000, desc:
insert:rel 1663/12139/16390; tid 1/1  relname   | relfilenode  
-------------+-------------test        |       16386test_id_seq |       16384test_pkey   |       16390
(3 rows)

And amazingly, the database cluster successfuly recovers and there's no
error now.  So the problem is *only* because there is no data in the
table at commit time.  Which indicates that it's the 'newroot" record
that saves the day normally.  And it's apparently generated by the
first insert.

> Agreed. I think the problem is something else though. Namely that we
> reuse the relfilenode for heap_truncate_one_rel(). That's just entirely
> broken afaics. We need to allocate a new relfilenode and write stuff
> into that. Then we can forgo WAL logging the truncation record.

Would that properly initialise the index though?

Anyway, this is way outside my expertise, so I'll bow out now. Let me
know if I can be of more assistance.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: psql :: support for \ev viewname and \sv viewname
Следующее
От: Tom Lane
Дата:
Сообщение: Re: WAL logging problem in 9.4.3?