Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Дата
Msg-id CAFiTN-tv=Ma=8SN7Nhtkxtx_g14Q7QBU0mXeNJGsgcSObh3eHA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Wed, Jul 8, 2020 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 8, 2020 at 9:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Jul 5, 2020 at 8:37 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I have compared the changes logged at command end vs logged at commit
> > > time.  I have ignored the invalidation for the transaction which has
> > > any aborted subtransaction in it.  While testing this I found one
> > > issue, the issue is that if there are some invalidation generated
> > > between last command counter increment and the commit transaction then
> > > those were not logged.  I have fixed the issue by logging the pending
> > > invalidation in RecordTransactionCommit.  I will include the changes
> > > in the next patch set.
> > >
> >
> > I think it would have been better if you could have given examples for
> > such cases where you need this extra logging.  Anyway, below are few
> > minor comments on this patch:
> >
> > 1.
> > + /*
> > + * Log any pending invalidations which are adding between the last
> > + * command counter increment and the commit.
> > + */
> > + if (XLogLogicalInfoActive())
> > + LogLogicalInvalidations();
> >
> > I think we can change this comment slightly and extend a bit to say
> > for which kind of special cases we are adding this. "Log any pending
> > invalidations which are added between the last CommandCounterIncrement
> > and the commit.  Normally for DDLs, we log this at each command end,
> > however for certain cases where we directly update the system table
> > the invalidations were not logged at command end."
> >
> > Something like above based on cases that are not covered by command
> > end WAL logging.
> >
> > 2.
> > + * Emit WAL for invalidations.  This is currently only used for logging
> > + * invalidations at the command end.
> > + */
> > +void
> > +LogLogicalInvalidations()
> >
> > After this is getting used at a new place, it is better to modify the
> > above comment to something like: "Emit WAL for invalidations.  This is
> > currently only used for logging invalidations at the command end or at
> > commit time if any invalidations are pending."
> >
>
> I have done some more review and below are my comments:
>
> Review-v31-0010-Provide-new-api-to-get-the-streaming-changes
> ----------------------------------------------------------------------------------------------
> 1.
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1240,6 +1240,14 @@ LANGUAGE INTERNAL
>  VOLATILE ROWS 1000 COST 1000
>  AS 'pg_logical_slot_get_changes';
>
> +CREATE OR REPLACE FUNCTION pg_logical_slot_get_streaming_changes(
> +    IN slot_name name, IN upto_lsn pg_lsn, IN upto_nchanges int,
> VARIADIC options text[] DEFAULT '{}',
> +    OUT lsn pg_lsn, OUT xid xid, OUT data text)
> +RETURNS SETOF RECORD
> +LANGUAGE INTERNAL
> +VOLATILE ROWS 1000 COST 1000
> +AS 'pg_logical_slot_get_streaming_changes';
>
> If we are going to add a new streaming API for get_changes, don't we
> need for pg_logical_slot_get_binary_changes,
> pg_logical_slot_peek_changes and pg_logical_slot_peek_binary_changes
> as well?  I was thinking why not add a new parameter (streaming
> boolean) instead of adding the new APIs.  This could be an optional
> parameter which if user doesn't specify will be considered as false.
> We already have optional parameters for APIs like
> pg_create_logical_replication_slot.
>
> 2. You forgot to update sgml/func.sgml.  This will be required even if
> we decide to add a new parameter instead of a new API.
>
> 3.
> + /* If called has not asked for streaming changes then disable it. */
> + ctx->streaming &= streaming;
>
> /If called/If the caller
>
> 4.
> diff --git a/.gitignore b/.gitignore
> index 794e35b..6083744 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -42,3 +42,4 @@ lib*.pc
>  /Debug/
>  /Release/
>  /tmp_install/
> +/build/
>
> Why the patch contains this change?
>
> 5. If I apply the first six patches and run the regressions, it fails
> primarily because streaming got enabled by default.  And then when I
> applied this patch, the tests passed because it disables streaming by
> default.  I think this should be patch 0007.

Only replying to the replication origin point, other comment looks
fine to me so I will work on those.

> Replication Origins
> ------------------------------
> I think we also need to conclude on origins related discussion [1].
> As far as I can see, the origin_id can be sent with the first startup
> message. The origin_lsn and origin_commit can be sent with the last
> start of streaming commit if we want but not sure if that is of use.
> If we need to send it earlier then we need to record it with other WAL
> records.  The point is that those are set with
> pg_replication_origin_xact_setup but not sure how and when that
> function is called.

pg_replication_origin_xact_setup is exposed function so this will
allow a user to set an origin for their session so that all the
operation done from that session will be marked by that origin id.
And the clear use case for this is to avoid sending such transactions
by suing FilterByOrigin.   But I am not sure about the point that we
discussed at [1] that what is the use of the origin and origin_lsn we
send at pgoutput_begin_txn.

 The other alternative is that we can ignore that
> for now and once the usage is clear we can enhance it.  What do you
> think?

That seems like a sensible option to me.

> [1] - https://www.postgresql.org/message-id/CAA4eK1JwXaCezFw%2BkZwoxbLKYD0nWpC2rPgx7vUsaDAc0AZaow%40mail.gmail.com


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Binary support for pgoutput plugin
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Log the location field before any backtrace