Обсуждение: pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

Поиск
Список
Период
Сортировка

pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

От
Andres Freund
Дата:
Fix decoding of MULTI_INSERTs when rows other than the last are toasted.

When decoding the results of a HEAP2_MULTI_INSERT (currently only
generated by COPY FROM) toast columns for all but the last tuple
weren't replaced by their actual contents before being handed to the
output plugin. The reassembled toast datums where disregarded after
every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct
for plain inserts, updates, deletes, but not multi inserts - there we
generate several REORDER_BUFFER_CHANGE_INSERTs for a single
xl_heap_multi_insert record.

To solve the problem add a clear_toast_afterwards boolean to
ReorderBufferChange's union member that's used by modifications. All
row changes but multi_inserts always set that to true, but
multi_insert sets it only for the last change generated.

Add a regression test covering decoding of multi_inserts - there was
none at all before.

Backpatch to 9.4 where logical decoding was introduced.

Bug found by Petr Jelinek.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/1b86c81d2d255d3fb665ddc77c2bc3dfd751a1df

Modified Files
--------------
contrib/test_decoding/expected/toast.out        |   20 +++++++++++++++++++-
contrib/test_decoding/sql/toast.sql             |   13 +++++++++++++
src/backend/replication/logical/decode.c        |   10 ++++++++++
src/backend/replication/logical/reorderbuffer.c |    9 ++++++++-
src/include/replication/reorderbuffer.h         |    4 ++++
5 files changed, 54 insertions(+), 2 deletions(-)


Re: pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

От
Andres Freund
Дата:
Hi,

On 2014-07-06 14:01:11 +0000, Andres Freund wrote:
> Fix decoding of MULTI_INSERTs when rows other than the last are toasted.
>
> When decoding the results of a HEAP2_MULTI_INSERT (currently only
> generated by COPY FROM) toast columns for all but the last tuple
> weren't replaced by their actual contents before being handed to the
> output plugin. The reassembled toast datums where disregarded after
> every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct
> for plain inserts, updates, deletes, but not multi inserts - there we
> generate several REORDER_BUFFER_CHANGE_INSERTs for a single
> xl_heap_multi_insert record.
>
> To solve the problem add a clear_toast_afterwards boolean to
> ReorderBufferChange's union member that's used by modifications. All
> row changes but multi_inserts always set that to true, but
> multi_insert sets it only for the last change generated.
>
> Add a regression test covering decoding of multi_inserts - there was
> none at all before.
>
> Backpatch to 9.4 where logical decoding was introduced.
>
> Bug found by Petr Jelinek.

Further testing unfortuantely shows that this isn't sufficient:

    Commit 1b86c81d2d fixed the decoding of toasted columns for the rows
    contained in one xl_heap_multi_insert record. But that's not actually
    enough because heap_multi_insert() will actually first toast all
    passed in rows and then emit several *_multi_insert records; one for
    each page it fills with tuples.

I've attached a preliminary patch that:
    Add a XLOG_HEAP_LAST_MULTI_INSERT flag which is set in
    xl_heap_multi_insert->flag denoting that this multi_insert record is
    the last emitted by one heap_multi_insert() call. Then use that flag
    in decode.c to only set clear_toast_afterwards in the right situation.

But since I am not exactly happy about that solution I'm wondering if
somebody can think of a better approach. Alternatives I though of
included:
* Add a boolean to xl_heap_multi_insert instead of adding the
  XLOG_HEAP_LAST_MULTI_INSERT flag. I don't have an opinion either way
  about that one.
* Only toast rows in heap_multi_insert for every page. That doesn't
  really work without major reworks though as the loop over all the
  tuples is done in a critical section.
* Don't clean up toast chunks for multi insert at all. Instead do so after
  the next !multi insert record. I think that's out because of the
  amount of multi_insert calls COPY can produce.

The patch also adds 200 lines of COPY FROM STDIN in the regression tests
to create a long COPY that does a multi insert with toast covering two
pages. Does anybody object to that?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения