Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [COMMITTERS] pgsql: Fix decoding of MULTI_INSERTs when rows other than the last are
Дата
Msg-id 20140706165052.GA28604@awork2.anarazel.de
обсуждение исходный текст
Список pgsql-hackers
Hi,

On 2014-07-06 14:01:11 +0000, Andres Freund wrote:
> Fix decoding of MULTI_INSERTs when rows other than the last are toasted.
>
> When decoding the results of a HEAP2_MULTI_INSERT (currently only
> generated by COPY FROM) toast columns for all but the last tuple
> weren't replaced by their actual contents before being handed to the
> output plugin. The reassembled toast datums where disregarded after
> every REORDER_BUFFER_CHANGE_(INSERT|UPDATE|DELETE) which is correct
> for plain inserts, updates, deletes, but not multi inserts - there we
> generate several REORDER_BUFFER_CHANGE_INSERTs for a single
> xl_heap_multi_insert record.
>
> To solve the problem add a clear_toast_afterwards boolean to
> ReorderBufferChange's union member that's used by modifications. All
> row changes but multi_inserts always set that to true, but
> multi_insert sets it only for the last change generated.
>
> Add a regression test covering decoding of multi_inserts - there was
> none at all before.
>
> Backpatch to 9.4 where logical decoding was introduced.
>
> Bug found by Petr Jelinek.

Further testing unfortuantely shows that this isn't sufficient:

    Commit 1b86c81d2d fixed the decoding of toasted columns for the rows
    contained in one xl_heap_multi_insert record. But that's not actually
    enough because heap_multi_insert() will actually first toast all
    passed in rows and then emit several *_multi_insert records; one for
    each page it fills with tuples.

I've attached a preliminary patch that:
    Add a XLOG_HEAP_LAST_MULTI_INSERT flag which is set in
    xl_heap_multi_insert->flag denoting that this multi_insert record is
    the last emitted by one heap_multi_insert() call. Then use that flag
    in decode.c to only set clear_toast_afterwards in the right situation.

But since I am not exactly happy about that solution I'm wondering if
somebody can think of a better approach. Alternatives I though of
included:
* Add a boolean to xl_heap_multi_insert instead of adding the
  XLOG_HEAP_LAST_MULTI_INSERT flag. I don't have an opinion either way
  about that one.
* Only toast rows in heap_multi_insert for every page. That doesn't
  really work without major reworks though as the loop over all the
  tuples is done in a critical section.
* Don't clean up toast chunks for multi insert at all. Instead do so after
  the next !multi insert record. I think that's out because of the
  amount of multi_insert calls COPY can produce.

The patch also adds 200 lines of COPY FROM STDIN in the regression tests
to create a long COPY that does a multi insert with toast covering two
pages. Does anybody object to that?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: tweaking NTUP_PER_BUCKET
Следующее
От: Sawada Masahiko
Дата:
Сообщение: Re: add line number as prompt option to psql