Обсуждение: Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are
Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are
От
Andres Freund
Дата:
Hi, On 2014-07-06 14:01:11 +0000, Andres Freund wrote: > Fix decoding of MULTI_INSERTs when rows other than the last are toasted. > > When decoding the results of a HEAP2_MULTI_INSERT (currently only > generated by COPY FROM) toast columns for all but the last tuple > weren't replaced by their actual contents before being handed to the > output plugin. The reassembled toast datums where disregarded after > every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct > for plain inserts, updates, deletes, but not multi inserts - there we > generate several REORDER_BUFFER_CHANGE_INSERTs for a single > xl_heap_multi_insert record. > > To solve the problem add a clear_toast_afterwards boolean to > ReorderBufferChange's union member that's used by modifications. All > row changes but multi_inserts always set that to true, but > multi_insert sets it only for the last change generated. > > Add a regression test covering decoding of multi_inserts - there was > none at all before. > > Backpatch to 9.4 where logical decoding was introduced. > > Bug found by Petr Jelinek. Further testing unfortuantely shows that this isn't sufficient: Commit 1b86c81d2d fixed the decoding of toasted columns for the rows contained in one xl_heap_multi_insert record. But that's not actually enough because heap_multi_insert() will actually first toast all passed in rows and then emit several *_multi_insert records; one for each page it fills with tuples. I've attached a preliminary patch that: Add a XLOG_HEAP_LAST_MULTI_INSERT flag which is set in xl_heap_multi_insert->flag denoting that this multi_insert record is the last emitted by one heap_multi_insert() call. Then use that flag in decode.c to only set clear_toast_afterwards in the right situation. But since I am not exactly happy about that solution I'm wondering if somebody can think of a better approach. Alternatives I though of included: * Add a boolean to xl_heap_multi_insert instead of adding the XLOG_HEAP_LAST_MULTI_INSERT flag. I don't have an opinion either way about that one. * Only toast rows in heap_multi_insert for every page. That doesn't really work without major reworks though as the loop over all the tuples is done in a critical section. * Don't clean up toast chunks for multi insert at all. Instead do so after the next !multi insert record. I think that's out because of the amount of multi_insert calls COPY can produce. The patch also adds 200 lines of COPY FROM STDIN in the regression tests to create a long COPY that does a multi insert with toast covering two pages. Does anybody object to that? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services