Logical replication keepalive flood

Поиск
Список
Период
Сортировка
От Abbas Butt
Тема Logical replication keepalive flood
Дата
Msg-id CALtH27cip5uQNJb4uHjLXtx1R52ELqXVfcP9fhHr=AvFo1dtqw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Logical replication keepalive flood  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Список pgsql-hackers
Hi,
I have observed the following behavior with PostgreSQL 13.3.

The WAL sender process sends approximately 500 keepalive messages per second to pg_recvlogical.
These keepalive messages are totally un-necessary.
Keepalives should be sent only if there is no network traffic and a certain time (half of wal_sender_timeout) passes.
These keepalive messages not only choke the network but also impact the performance of the receiver,
because the receiver has to process the received message and then decide whether to reply to it or not.
The receiver remains busy doing this activity 500 times a second.

On investigation it is revealed that the following code fragment in function WalSndWaitForWal in file walsender.c is responsible for sending these frequent keepalives:

if (MyWalSnd->flush < sentPtr &&
    MyWalSnd->write < sentPtr &&
    !waiting_for_ping_response)
        WalSndKeepalive(false);

waiting_for_ping_response is normally false, and flush and write will always be less than sentPtr (Receiver's LSNs cannot advance server's LSNs)

Here are the steps to reproduce:
1. Start the database server.
2. Setup pgbench tables.
  ./pgbench -i -s 50 -h 192.168.5.140 -p 7654 -U abbas postgres
3. Create a logical replication slot.
   SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');
4. Start pg_recvlogical.
  ./pg_recvlogical --slot=my_slot --verbose -d postgres -h 192.168.5.140 -p 7654 -U abbas --start -f -
5. Run pgbench
  ./pgbench -U abbas -h 192.168.5.140 -p 7654  -c 2 -j 2 -T 1200 -n postgres
6. Observer network traffic to find the keepalive flood.

Alternately modify the above code fragment to see approx 500 keepalive log messages a second

if (MyWalSnd->flush < sentPtr &&
    MyWalSnd->write < sentPtr &&
    !waiting_for_ping_response)
{
    elog(LOG, "[Keepalive]  wrt ptr %X/%X  snt ptr %X/%X ",
               (uint32) (MyWalSnd->write >> 32),
               (uint32) MyWalSnd->write,
               (uint32) (sentPtr >> 32),
               (uint32) sentPtr);
    WalSndKeepalive(false);
}

Opinions?

--
--
Abbas
Senior Architect


Skype ID: gabbasb

Follow us on Twitter
@EnterpriseDB

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: Are we missing (void) when return value of fsm_set_and_search is ignored?
Следующее
От: David Christensen
Дата:
Сообщение: Re: DELETE CASCADE