Re: Movement of restart_lsn position movement of logical replication slots is very slow

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Movement of restart_lsn position movement of logical replication slots is very slow
Дата
Msg-id CAA4eK1LY8fLspE3GeU86MDOKa0KDHVmQ=6csiw7KuZW_nOGwCg@mail.gmail.com
обсуждение исходный текст
Ответ на Movement of restart_lsn position movement of logical replication slots is very slow  (Jammie <shailesh.jamloki@gmail.com>)
Ответы Re: Movement of restart_lsn position movement of logical replication slots is very slow  (Jammie <shailesh.jamloki@gmail.com>)
Список pgsql-hackers
On Mon, Dec 14, 2020 at 9:30 AM Jammie <shailesh.jamloki@gmail.com> wrote:
>
> Hello,
>
> We have two logical replication slots in our postgresql database (version-11) instance and we are using pgJDBC to
streamdata from these two slots. 
>

IIUC, you are using some out-of-core outputplugin to stream the data?
Are you using in  walsender mechanism to decode the changes from slots
or via SQL APIs?

> We are ensuring that when we regularly send feedback and update the confirmed_flush_lsn (every 10 minutes) for both
theslots to the same position. However From our data we have seen that the restart_lsn movement of the two are not in
syncand most of the time one of them lags too far behind to hold the WAL files unnecessarily. Here are some data points
toindicate the problem 
>
> Thu Dec 10 05:37:13 CET 2020
>                       slot_name       |  restart_lsn  | confirmed_flush_lsn
> --------------------------------------+---------------+---------------------
>  db_dsn_metadata_src_private          | 48FB/F3000208 | 48FB/F3000208
>  db_dsn_metadata_src_shared           | 48FB/F3000208 | 48FB/F3000208
> (2 rows)
>
>
>
> Thu Dec 10 13:53:46 CET 2020
>                       slot_name      |  restart_lsn  | confirmed_flush_lsn
> -------------------------------------+---------------+---------------------
>  db_dsn_metadata_src_private         | 48FC/2309B150 | 48FC/233AA1D0
>  db_dsn_metadata_src_shared          | 48FC/233AA1D0 | 48FC/233AA1D0
> (2 rows)
>
>
> Thu Dec 10 17:13:51 CET 2020
>                       slot_name      |  restart_lsn  | confirmed_flush_lsn
> -------------------------------------+---------------+---------------------
>  db_dsn_metadata_src_private         | 4900/B4C3AE8  | 4900/94FDF908
>  db_dsn_metadata_src_shared          | 48FD/D2F66F10 | 4900/94FDF908
> (2 rows)
>
> Though we are using setFlushLsn() and forceStatusUpdate for both the slot's stream regularly still the slot with name
privateis far behind the confirmed_flush_lsn and slot with name shared is also behind with confirmed_flush_lsn but not
toofar. Since the restart_lsn is not moving fast enough, causing lot of issues with WAL log file management and not
allowingto delete them to free up disk space 
>

What is this setFlushLsn? I am not able to find in the PG-code. If it
is some outside code reference then please provide the link to code.
In general, the restart_lsn and confirmed_flush_lsn are advanced in
different ways so you might see some difference but it should not be
this much. The confirmed_flush_lsn is updated when we get confirmation
from the downstream node about the flush_lsn but restart_lsn is only
incremented based on the LSN required by the oldest in-progress
transaction.

>
> Please note that for the second slot we are not doing reading from the stream rather just sending the feedback.
>

Here does the second slot refers to 'shared' or 'private'? It is not
very clear what you mean by "we are not doing reading from the
stream', do you mean to say that decoding happens in the slot but the
output plugin just throws away the streamed data and in the end just
send the feedback?

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Parallel Inserts in CREATE TABLE AS
Следующее
От: "Hou, Zhijie"
Дата:
Сообщение: RE: Parallel Inserts in CREATE TABLE AS