Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date:
Msg-id: CAA4eK1LnO=Zf_wb0TbfJzXidxbijOtGoVb2FwJ4DH+Bt1JCbVg@mail.gmail.com
In response to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
On Thu, Oct 3, 2019 at 4:03 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, Oct 02, 2019 at 04:27:30AM +0530, Amit Kapila wrote:
>On Tue, Oct 1, 2019 at 7:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>> On Tue, Oct 01, 2019 at 06:55:52PM +0530, Amit Kapila wrote:
>> >
>> >On further testing, I found that the patch seems to have problems with
>> >toast.  Consider below scenario:
>> >Session-1
>> >Create table large_text(t1 text);
>> >INSERT INTO large_text
>> >SELECT (SELECT string_agg('x', ',')
>> >FROM generate_series(1, 1000000)) FROM generate_series(1, 1000);
>> >
>> >Session-2
>> >SELECT * FROM pg_create_logical_replication_slot('regression_slot',
>> >'test_decoding');
>> >SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);
>> >*--kaboom*
>> >
>> >The second statement in Session-2 leads to a crash.
>> >
>>
>> OK, thanks for the report - will investigate.
>>
>
>It was an assertion failure in ReorderBufferCleanupTXN at below line:
>+ /* Check we're not mixing changes from different transactions. */
>+ Assert(change->txn == txn);
>

> Can you still reproduce this issue with the patch I sent on 28/9? I have
> been unable to trigger the failure, and it seems pretty similar to the
> failure you reported (and I fixed) on 28/9.

Yes, it seems we need a similar change in ReorderBufferAddNewTupleCids.  I think you need to create the replication slot in session-2 before creating the table in session-1 to see this problem.

--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2196,6 +2196,7 @@ ReorderBufferAddNewTupleCids(ReorderBuffer *rb, TransactionId xid,
        change->data.tuplecid.cmax = cmax;
        change->data.tuplecid.combocid = combocid;
        change->lsn = lsn;
+       change->txn = txn;
        change->action = REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID;
        dlist_push_tail(&txn->tuplecids, &change->node);
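
For context, the assertion fires from the cleanup loop over txn->tuplecids in ReorderBufferCleanupTXN.  A simplified sketch of that loop (not verbatim from the patch; without the one-line fix above, change->txn is never assigned for tuplecid changes and remains NULL from the initial memset in ReorderBufferGetChange):

dlist_foreach_modify(iter, &txn->tuplecids)
{
    ReorderBufferChange *change;

    change = dlist_container(ReorderBufferChange, node, iter.cur);

    /* Check we're not mixing changes from different transactions. */
    Assert(change->txn == txn);     /* trips without the fix above */

    ReorderBufferReturnChange(rb, change);
}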

A few more comments:
-----------------------------------
1.
+static bool
+check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
+{
+ /*
+ * -1 indicates fallback.
+ *
+ * If we haven't yet changed the boot_val default of -1, just let it be.
+ * logical decoding will look to maintenance_work_mem instead.
+ */
+ if (*newval == -1)
+ return true;
+
+ /*
+ * We clamp manually-set values to at least 64kB. The maintenance_work_mem
+ * uses a higher minimum value (1MB), so this is OK.
+ */
+ if (*newval < 64)
+ *newval = 64;

I think this needs to be changed now that we no longer rely on maintenance_work_mem.  A related point is that the default value of logical_decoding_work_mem still seems to be -1; we need to make it 64MB.  I noticed this while debugging the memory accounting changes.  I think this is why I was not seeing toast-related changes being serialized: in that test, I hadn't changed the default value of logical_decoding_work_mem.
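
With a hard 64MB default, the -1 fallback (and the manual clamping in the check hook) could go away entirely, and the guc.c entry could look roughly like the following.  This is just a sketch assuming the 64kB floor is enforced via the min field of the GUC machinery; the actual patch may differ:

{
    {"logical_decoding_work_mem", PGC_USERSET, RESOURCES_MEM,
        gettext_noop("Sets the maximum memory to be used for logical decoding."),
        gettext_noop("This much memory can be used by each internal "
                     "reorder buffer before spilling to disk."),
        GUC_UNIT_KB
    },
    &logical_decoding_work_mem,
    65536, 64, MAX_KILOBYTES,       /* default 64MB, minimum 64kB */
    NULL, NULL, NULL
},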

2.
+ /*
+ * We're going modify the size of the change, so to make sure the
+ * accounting is correct we'll make it look like we're removing the
+ * change now (with the old size), and then re-add it at the end.
+ */


/going modify/going to modify/

3.
+ *
+ * While updating the existing change with detoasted tuple data, we need to
+ * update the memory accounting info, because the change size will differ.
+ * Otherwise the accounting may get out of sync, triggering serialization
+ * at unexpected times.
+ *
+ * We simply subtract size of the change before rejiggering the tuple, and
+ * then adding the new size. This makes it look like the change was removed
+ * and then added back, except it only tweaks the accounting info.
+ *
+ * In particular it can't trigger serialization, which would be pointless
+ * anyway as it happens during commit processing right before handing
+ * the change to the output plugin.
  */
 static void
 ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
@@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
  if (txn->toast_hash == NULL)
  return;
 
+ /*
+ * We're going modify the size of the change, so to make sure the
+ * accounting is correct we'll make it look like we're removing the
+ * change now (with the old size), and then re-add it at the end.
+ */
+ ReorderBufferChangeMemoryUpdate(rb, change, false);

It is not very clear why this change is required.  Basically, this is done at commit time, after which we shouldn't attempt to spill these changes.  The comments mention this as well, but if that is the case, it is not clear how and when the accounting could create a problem.  If possible, can you explain it with an example?
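
For reference, my understanding of the pattern in question is roughly the following (a simplified sketch using the patch's function names, not verbatim code):

/* Make the accounting forget the old size of the change. */
ReorderBufferChangeMemoryUpdate(rb, change, false);

/* ... replace toasted datums in change->data.tp.newtuple ... */

/* Re-add the change with its new, detoasted size. */
ReorderBufferChangeMemoryUpdate(rb, change, true);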

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
