Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions

Поиск

Список

Период

Сортировка

От	Dilip Kumar
Тема	Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Дата	21 октября 2019 г. 08:18:33
Msg-id	CAFiTN-szFoN-wuZ8SXV4Xed+ekyvBM9WZDm9ocp=iauDf6CWJQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Список	pgsql-hackers

Дерево обсуждения

On Fri, Oct 18, 2019 at 5:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

I have replied to some of your questions inline.  I will work on the
remaining comments and post the patch for the same.

> > >
> > > Sure, I wasn't really proposing to adding all stats from that patch,
> > > including those related to streaming.  We need to extract just those
> > > related to spilling. And yes, it needs to be moved right after 0001.
> > >
> > I have extracted the spilling related code to a separate patch on top
> > of 0001.  I have also fixed some bugs and review comments and attached
> > as a separate patch.  Later I can merge it to the main patch if you
> > agree with the changes.
> >
>
> Few comments
> -------------------------
> 0001-Add-logical_decoding_work_mem-to-limit-ReorderBuffer
> 1.
> + {
> + {"logical_decoding_work_mem", PGC_USERSET, RESOURCES_MEM,
> + gettext_noop("Sets the maximum memory to be used for logical decoding."),
> + gettext_noop("This much memory can be used by each internal "
> + "reorder buffer before spilling to disk or streaming."),
> + GUC_UNIT_KB
> + },
>
> I think we can remove 'or streaming' from above sentence for now.  We
> can add it later with later patch where streaming will be allowed.
>
> 2.
> @@ -206,6 +206,18 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
>           </para>
>          </listitem>
>         </varlistentry>
> +
> +       <varlistentry>
> +        <term><literal>work_mem</literal> (<type>integer</type>)</term>
> +        <listitem>
> +         <para>
> +          Limits the amount of memory used to decode changes on the
> +          publisher.  If not specified, the publisher will use the default
> +          specified by <varname>logical_decoding_work_mem</varname>. When
> +          needed, additional data are spilled to disk.
> +         </para>
> +        </listitem>
> +       </varlistentry>
>
> It is not clear why we need this parameter at least with this patch?
> I have raised this multiple times [1][2].
>
> bugs_and_review_comments_fix
> 1.
> },
>   &logical_decoding_work_mem,
> - -1, -1, MAX_KILOBYTES,
> - check_logical_decoding_work_mem, NULL, NULL
> + 65536, 64, MAX_KILOBYTES,
> + NULL, NULL, NULL
>
> I think the default value should be 1MB similar to
> maintenance_work_mem.  The same was true before this change.
>
> 2. -#logical_decoding_work_mem = 64MB # min 1MB, or -1 to use
> maintenance_work_mem
> +i#logical_decoding_work_mem = 64MB # min 64kB
>
> It seems the 'i' is a leftover character in the above change.  Also,
> change the default value considering the previous point.
>
> 3.
> @@ -2479,7 +2480,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb,
> ReorderBufferTXN *txn)
>
>   /* update the statistics */
>   rb->spillCount += 1;
> - rb->spillTxns += txn->serialized ? 1 : 0;
> + rb->spillTxns += txn->serialized ? 0 : 1;
>   rb->spillBytes += size;
>
> Why is this change required?  Shouldn't we increase the spillTxns
> count only when the txn is serialized?

Prior to this change it was increasing the rb->spillTxns, every time
we try to serialize the changes of the transaction.  Now, only we
increase first time when it is not yet serialized.

> 0002-Track-statistics-for-spilling
> 1.
> +    <row>
> +     <entry><structfield>spill_txns</structfield></entry>
> +     <entry><type>integer</type></entry>
> +     <entry>Number of transactions spilled to disk after the memory used by
> +      logical decoding exceeds <literal>logical_work_mem</literal>. The
> +      counter gets incremented both for toplevel transactions and
> +      subtransactions.
> +      </entry>
> +    </row>
>
> The parameter name is wrong here. /logical_work_mem/logical_decoding_work_mem
>
> 2.
> +    <row>
> +     <entry><structfield>spill_txns</structfield></entry>
> +     <entry><type>integer</type></entry>
> +     <entry>Number of transactions spilled to disk after the memory used by
> +      logical decoding exceeds <literal>logical_work_mem</literal>. The
> +      counter gets incremented both for toplevel transactions and
> +      subtransactions.
> +      </entry>
> +    </row>
> +    <row>
> +     <entry><structfield>spill_count</structfield></entry>
> +     <entry><type>integer</type></entry>
> +     <entry>Number of times transactions were spilled to disk. Transactions
> +      may get spilled repeatedly, and this counter gets incremented on every
> +      such invocation.
> +      </entry>
> +    </row>
> +    <row>
> +     <entry><structfield>spill_bytes</structfield></entry>
> +     <entry><type>integer</type></entry>
> +     <entry>Amount of decoded transaction data spilled to disk.
> +      </entry>
> +    </row>
>
> In all the above cases, the explanation text starts immediately after
> <entry> tag, but the general coding practice is to start from the next
> line, see the explanation of nearby parameters.
>
> It seems these parameters are added in pg-stat-wal-receiver-view in
> the docs, but in code, it is present as part of pg_stat_replication.
> It seems doc needs to be updated.  Am, I missing something?
>
> 3.
> ReorderBufferSerializeTXN()
> {
> ..
> /* update the statistics */
> rb->spillCount += 1;
> rb->spillTxns += txn->serialized ? 0 : 1;
> rb->spillBytes += size;
>
> Assert(spilled == txn->nentries_mem);
> Assert(dlist_is_empty(&txn->changes));
> txn->nentries_mem = 0;
> txn->serialized = true;
> ..
> }
>
> I am not able to understand the above code.  We are setting the
> serialized parameter a few lines after we check it and increment the
> spillTxns count. Can you please explain it?

Basically, when the first time we attempt to serialize a transaction,
txn->serialized will be false, that time we will increment the
rb->spillTxns and after that set txn->serialized to true.  From next
time onwards if we try to serialize the same transaction we will not
increment the rb->spillTxns so that we count each transaction only
once.

>
> Also, isn't spillTxns count bit confusing, because in some cases it
> will include subtransactions and other cases (where the largest picked
> transaction is a subtransaction) it won't include it?

I did not understand your comment completely.  Basically,  every
transaction which we are serializing we will increase the count first
time right? whether it is the main transaction or the sub-transaction.
Am I missing something?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Amit Kapila
Дата: 21 октября 2019 г., 08:11:20
Сообщение: Re: dropdb --force

Следующее

От: Michael Paquier
Дата: 21 октября 2019 г., 08:34:01
Сообщение: Re: Remove obsolete information schema tables

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions

Предыдущее

Следующее