Re: WAL logging problem in 9.4.3?
From | Andres Freund |
---|---|
Subject | Re: WAL logging problem in 9.4.3? |
Date | |
Msg-id | 20150710091420.GK340@alap3.anarazel.de |
In reply to | Re: WAL logging problem in 9.4.3? (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses | Re: WAL logging problem in 9.4.3? |
List | pgsql-hackers |
On 2015-07-10 11:50:33 +0300, Heikki Linnakangas wrote:
> On 07/10/2015 02:06 AM, Tom Lane wrote:
> >Andres Freund <andres@anarazel.de> writes:
> >>On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
> >>>Rather than reverting cab9a0656c36739f, which would re-introduce a
> >>>different performance problem, perhaps we could have COPY create a new
> >>>relfilenode when it does this. That should be safe if the table was
> >>>previously empty.
> >
> >>I'm not convinced that cab9a0656c36739f needs to survive in that
> >>form. To me only allowing one COPY to benefit from the wal_level =
> >>minimal optimization has a significantly higher cost than
> >>cab9a0656c36739f.
> >
> >What evidence have you got to base that value judgement on?
> >
> >cab9a0656c36739f was based on an actual user complaint, so we have good
> >evidence that there are people out there who care about the cost of
> >truncating a table many times in one transaction.
>
> Yeah, if we specifically made that case cheap, in response to a complaint,
> it would be a regression to make it expensive again. We might get away with
> it in a major version, but would hate to backpatch that.

Sure. But making COPY slower would also be one. And a regression of
longer-standing behaviour, with a massively bigger impact if somebody
relies on it. I mean a new relfilenode includes a couple heap and storage
options. Missing the skip-WAL optimization can easily double or triple
COPY durations.

I generally find it very dubious to re-use a relfilenode after a
truncation. I bet most hackers never knew we did that, and the rest
probably forgot it.

We can still retain a portion of the optimizations from
cab9a0656c36739f - there's no need to keep the old relfilenode's contents
around after all.

> >>My tentative guess is that the best course is to
> >>a) Make heap_truncate_one_rel() create a new relfilenode. That fixes the
> >>   truncation replay issue.
> >>b) Force new pages to be used when using the heap_sync mode in
> >>   COPY. That avoids the INIT danger you found. It seems rather
> >>   reasonable to avoid using pages that have already been the target of
> >>   WAL logging here in general.
> >
> >And what reason is there to think that this would fix all the problems?
> >We know of those two, but we've not exactly looked hard for other cases.
>
> Hmm. Perhaps that could be made to work, but it feels pretty fragile.

It does. I'm not very happy about this mess.

> For
> example, you could have an insert trigger on the table that inserts
> additional rows to the same table, and those inserts would be intermixed
> with the rows inserted by COPY.

That should be fine? As long as COPY only uses new pages, INSERT can use
the same ones without problem. I think...

> Full-page images in general are a problem.

With the above rules I don't think they'd be. They'd contain the previous
contents, and we won't target those pages again with COPY.

> I think we should
> 1. reliably and explicitly keep track of whether we've WAL-logged any
> TRUNCATE, INSERT/UPDATE+INIT, or any other full-page-logging operations on
> the relation, and
> 2. make sure we never skip WAL-logging again if we have.
>
> Let's add a flag, rd_skip_wal_safe, to RelationData that's initially set
> when a new relfilenode is created, i.e. whenever rd_createSubid or
> rd_newRelfilenodeSubid is set. Whenever a TRUNCATE or a full-page image
> (including INSERT/UPDATE+INIT) is WAL-logged, clear the flag. In copy.c,
> only skip WAL-logging if the flag is still set. To deal with the case that
> the flag gets cleared in the middle of COPY, also check the flag whenever
> we're about to skip WAL-logging in heap_insert, and if it's been cleared,
> ignore the HEAP_INSERT_SKIP_WAL option and WAL-log anyway.

Am I missing something, or will this break the BEGIN; TRUNCATE; COPY;
pattern we use ourselves and have suggested a number of times?

Andres