Re: WAL logging problem in 9.4.3?

From: Fujii Masao
Subject: Re: WAL logging problem in 9.4.3?
Date:
Msg-id: CAHGQGwHGMeqA7PbRj9e-cgA7-Sy09c+Ysyy=6Ts=vLOynpU2Hg@mail.gmail.com
In reply to: Re: WAL logging problem in 9.4.3? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: WAL logging problem in 9.4.3? (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Tue, Jul 7, 2015 at 12:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2015-07-06 11:14:40 -0400, Tom Lane wrote:
>>> The COUNT() correctly says 11 rows, but after crash-and-recover,
>>> only the row with -1 is there.  This is because the INSERT writes
>>> out an INSERT+INIT WAL record, which we happily replay, clobbering
>>> the data added later by COPY.
>
>> ISTM any WAL logged action that touches a relfilenode essentially needs
>> to disable further optimization based on the knowledge that the relation
>> is new.
>
> After a bit more thought, I think it's not so much "any WAL logged action"
> as "any unconditionally-replayed action".  INSERT+INIT breaks this
> example because heap_xlog_insert will unconditionally replay the action,
> even if the page is valid and has same or newer LSN.  Similarly, TRUNCATE
> is problematic because we redo it unconditionally (and in that case it's
> hard to see an alternative).
>
>> It'd not be impossible to add more state to the relcache entry for the
>> relation. Whether it's likely that we'd find all the places that'd need
>> updating that state, I'm not sure.
>
> Yeah, the sticking point is mainly being sure that the state is correctly
> tracked, both now and after future changes.  We'd need to identify a state
> invariant that we could be pretty confident we'd not break.
>
> One idea I had was to allow the COPY optimization only if the heap file is
> physically zero-length at the time the COPY starts.

This doesn't seem to help in the case where TRUNCATE is executed
before COPY, does it?
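
For illustration, a sequence like the following (a sketch; it assumes
wal_level = minimal, and that truncating a table created in the same
transaction truncates the file in place and writes a truncate WAL
record) would defeat the zero-length check:

   BEGIN;
   CREATE TABLE t (x int);
   INSERT INTO t VALUES (1);    -- WAL-logged as INSERT+INIT
   TRUNCATE t;                  -- in-place truncate, WAL-logged
   COPY t FROM '/tmp/t.dat';    -- file is zero-length here, so the
                                -- proposed check would let COPY skip WAL
   COMMIT;

After a crash, redo replays the truncate record unconditionally and
empties the file again, discarding the rows loaded by COPY.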

> That would still be
> able to optimize in all the cases we care about making COPY fast for.
> Rather than reverting cab9a0656c36739f, which would re-introduce a
> different performance problem, perhaps we could have COPY create a new
> relfilenode when it does this.  That should be safe if the table was
> previously empty.

So, if COPY is executed multiple times in the same transaction,
only the first COPY can be optimized?
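
For example:

   BEGIN   CREATE TABLE   COPY   COPY

Here the first COPY starts on a zero-length file and can skip WAL,
but the second COPY finds a non-empty file and would have to be
WAL-logged.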


On second thought, I'm thinking that we can safely optimize
COPY if no problematic WAL records like INSERT+INIT or TRUNCATE
have been generated since the current REDO location, or since the
table was created in the same transaction. That is, even if INSERT
or TRUNCATE is executed after the table creation, if a CHECKPOINT
happens subsequently, we don't need to WAL-log the COPY; the
subsequent crash recovery will not replay such problematic WAL
records. So example cases where we can optimize COPY are:
   BEGIN   CREATE TABLE   COPY   COPY    -- the subsequent COPY can also be optimized
   BEGIN   CREATE TABLE   TRUNCATE   CHECKPOINT   COPY
   BEGIN   CREATE TABLE   INSERT   CHECKPOINT   COPY

Crash recovery can also start from the previous REDO location
(i.e., the REDO location of the checkpoint before the last one,
which recovery falls back to when the last checkpoint record is
unreadable). So we might need to check whether such problematic
WAL records have been generated since the previous REDO location
rather than the current one.
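
For example (a sketch; the fallback to the previous checkpoint is
the assumption here):

   checkpoint A   BEGIN   CREATE TABLE   INSERT   checkpoint B   COPY   COMMIT   (crash)
   ^ previous REDO                                ^ current REDO

If recovery cannot read checkpoint B's record and falls back to
checkpoint A, the INSERT+INIT record gets replayed and clobbers the
pages filled by the COPY, even though no problematic WAL record was
generated since the current REDO location.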

Regards,

-- 
Fujii Masao


