Re: WAL logging problem in 9.4.3?

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: WAL logging problem in 9.4.3?
Дата
Msg-id 559F8759.2090401@iki.fi
обсуждение исходный текст
Ответ на Re: WAL logging problem in 9.4.3?  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: WAL logging problem in 9.4.3?  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On 07/10/2015 02:06 AM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
>>> Rather than reverting cab9a0656c36739f, which would re-introduce a
>>> different performance problem, perhaps we could have COPY create a new
>>> relfilenode when it does this.  That should be safe if the table was
>>> previously empty.
>
>> I'm not convinced that cab9a0656c36739f needs to survive in that
>> form. To me only allowing one COPY to benefit from the wal_level =
>> minimal optimization has a significantly higher cost than
>> cab9a0656c36739f.
>
> What evidence have you got to base that value judgement on?
>
> cab9a0656c36739f was based on an actual user complaint, so we have good
> evidence that there are people out there who care about the cost of
> truncating a table many times in one transaction.

Yeah, if we specifically made that case cheap, in response to a 
complaint, it would be a regression to make it expensive again. We might 
get away with it in a major version, but would hate to backpatch that.

>> My tentative guess is that the best course is to
>> a) Make heap_truncate_one_rel() create a new relfeilnode. That fixes the
>>     truncation replay issue.
>> b) Force new pages to be used when using the heap_sync mode in
>>     COPY. That avoids the INIT danger you found. It seems rather
>>     reasonable to avoid using pages that have already been the target of
>>     WAL logging here in general.
>
> And what reason is there to think that this would fix all the problems?
> We know of those two, but we've not exactly looked hard for other cases.

Hmm. Perhaps that could be made to work, but it feels pretty fragile. 
For example, you could have an insert trigger on the table that inserts 
additional rows to the same table, and those inserts would be intermixed 
with the rows inserted by COPY. You'll have to avoid that somehow. 
Full-page images in general are a problem. If a checkpoint happens, and 
a trigger modifies the page we're COPYing to in any way, you have the 
same issue. Even reading a page can cause a full-page image of it to be 
written: If you update a hint bit on the page while reading it, and 
checksums are enabled, and a checkpoint happened since the page was last 
updated, bang. I don't think that's a problem in this case because there 
are no hint bits to be set on pages that we're COPYing to, but it's a 
whole new subtle assumption.

I think we should
1. reliably and explicitly keep track of whether we've WAL-logged any 
TRUNCATE, INSERT/UPDATE+INIT, or any other full-page-logging operations 
on the relation, and
2. make sure we never skip WAL-logging again if we have.

Let's add a flag, rd_skip_wal_safe, to RelationData that's initially set 
when a new relfilenode is created, i.e. whenever rd_createSubid or 
rd_newRelfilenodeSubid is set. Whenever a TRUNCATE or a full-page image 
(including INSERT/UPDATE+INIT) is WAL-logged, clear the flag. In copy.c, 
only skip WAL-logging if the flag is still set. To deal with the case 
that the flag gets cleared in the middle of COPY, also check the flag 
whenever we're about to skip WAL-logging in heap_insert, and if it's 
been cleared, ignore the HEAP_INSERT_SKIP_WAL option and WAL-log anyway.

Compared to the status quo, that disables the WAL-skipping optimization 
in the scenario where you CREATE, INSERT, then COPY to a table in the 
same transaction. I think that's acceptable.

(Alternatively, to handle the case that the flag gets cleared in the 
middle of COPY, add another flag to RelationData indicating that a 
WAL-skipping COPY is in-progress, and refrain from WAL-logging any 
FPW-writing operations on the table when it's set (or any operations 
whatsoever). That'd be more efficient, but it's such a rare corner case 
that it hardly matters.)

- Heikki




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sawada Masahiko
Дата:
Сообщение: Re: Freeze avoidance of very large table.
Следующее
От: Andres Freund
Дата:
Сообщение: Re: WAL logging problem in 9.4.3?