Re: WAL logging problem in 9.4.3?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: WAL logging problem in 9.4.3?
Дата
Msg-id 27532.1436195680@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: WAL logging problem in 9.4.3?  (Fujii Masao <masao.fujii@gmail.com>)
Ответы Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Fujii Masao <masao.fujii@gmail.com> writes:
> On Sat, Jul 4, 2015 at 2:26 AM, Andres Freund <andres@anarazel.de> wrote:
>> We actually used to use a different relfilenode, but optimized that
>> away: cab9a0656c36739f59277b34fea8ab9438395869
>> 
>> commit cab9a0656c36739f59277b34fea8ab9438395869
>> Author: Tom Lane <tgl@sss.pgh.pa.us>
>> Date:   Sun Aug 23 19:23:41 2009 +0000
>> 
>> Make TRUNCATE do truncate-in-place when processing a relation that was created
>> or previously truncated in the current (sub)transaction.  This is safe since
>> if the (sub)transaction later rolls back, we'd just discard the rel's current
>> physical file anyway.  This avoids unreasonable growth in the number of
>> transient files when a relation is repeatedly truncated.  Per a performance
>> gripe a couple weeks ago from Todd Cook.
>> 
>> to me the reasoning here looks flawed.

> Before this commit, when I ran the above test scenario, no data loss happened.

Actually I think what is broken here is COPY's test to decide whether it
can omit writing WAL:
    * Check to see if we can avoid writing WAL    *    * If archive logging/streaming is not enabled *and* either    *
 - table was created in same transaction as this COPY    *    - data is being written to relfilenode created in this
transaction   * then we can skip writing WAL.  It's safe because if the transaction    * doesn't commit, we'll discard
thetable (or the new relfilenode file).    * If it does commit, we'll have done the heap_sync at the bottom of this
*routine first.
 

The problem with that analysis is that it supposes that, if we crash and
recover, the WAL replay sequence will not touch the data.  What's killing
us in this example is the replay of the TRUNCATE, but that is not the only
possibility.  For example consider this modification of Fujii-san's test
case:

BEGIN;
CREATE TABLE test (i int primary key);
INSERT INTO test VALUES(-1);
\copy test from /tmp/num.csv with csv
COMMIT;
SELECT COUNT(*) FROM test;

The COUNT() correctly says 11 rows, but after crash-and-recover,
only the row with -1 is there.  This is because the INSERT writes
out an INSERT+INIT WAL record, which we happily replay, clobbering
the data added later by COPY.

We might have to give up on this COPY optimization :-(.  I'm not
sure what would be a safe rule for deciding that we can skip WAL
logging in this situation, but I am pretty sure that it would
require keeping information we don't currently keep about what's
happened earlier in the transaction.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: ERROR: unexpected data beyond EOF
Следующее
От: Andres Freund
Дата:
Сообщение: Re: WAL logging problem in 9.4.3?