Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

From: Amit Kapila
Subject: Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Msg-id: CAA4eK1Lb3sY8TEfQrtZ8ceeHy3=Z-H=dsYcbjWnYonD=e8EvHA@mail.gmail.com
In reply to: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables (Keisuke Kuroda <keisuke.kuroda.3862@gmail.com>)
Responses: Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
List: pgsql-hackers
On Wed, Sep 23, 2020 at 1:09 PM Keisuke Kuroda
<keisuke.kuroda.3862@gmail.com> wrote:
>
> Hi hackers,
>
> I found a problem in logical replication.
> It seems to have the same cause as the following problem.
>
>   Creating many tables gets logical replication stuck
>   https://www.postgresql.org/message-id/flat/20f3de7675f83176253f607b5e199b228406c21c.camel%40cybertec.at
>
>   Logical decoding CPU-bound w/ large number of tables
>   https://www.postgresql.org/message-id/flat/CAHoiPjzea6N0zuCi%3D%2Bf9v_j94nfsy6y8SU7-%3Dbp4%3D7qw6_i%3DRg%40mail.gmail.com
>
> # problem
>
> * logical replication enabled
> * walsender process has a RelfilenodeMap cache (2000 relations in this case)
> * TRUNCATE or DROP or CREATE many tables in the same transaction
>
> At this point, the walsender process keeps using 100% of one CPU core.
>
...
...
>
> ./src/backend/replication/logical/reorderbuffer.c
>     case REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
>       Assert(change->data.command_id != InvalidCommandId);
>
>       if (command_id < change->data.command_id)
>       {
>         command_id = change->data.command_id;
>
>         if (!snapshot_now->copied)
>         {
>           /* we don't use the global one anymore */
>           snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
>                              txn, command_id);
>         }
>
>         snapshot_now->curcid = command_id;
>
>         TeardownHistoricSnapshot(false);
>         SetupHistoricSnapshot(snapshot_now, txn->tuplecid_hash);
>
>         /*
>          * Every time the CommandId is incremented, we could
>          * see new catalog contents, so execute all
>          * invalidations.
>          */
>         ReorderBufferExecuteInvalidations(rb, txn);
>       }
>
>       break;
>
> Do you have any solutions?
>

Yeah, I have an idea on how to solve this problem. The problem arises
primarily because we used to receive invalidations only at commit time,
so we had to execute all of them after each command id change. However,
after commit c55040ccd0 (When wal_level=logical, write invalidations at
command end into WAL so that decoding can use this information.) we
know exactly when we need to execute each invalidation. The idea is
that instead of collecting invalidations only in ReorderBufferTXN, we
also collect them in the form of a ReorderBufferChange, similar to what
we do for other changes (for example,
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID). We still need to collect
them in ReorderBufferTXN as well, because if the transaction is aborted
or an exception occurs while executing the changes, we need to perform
all the invalidations.
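
To make the cost difference concrete, here is a rough, standalone toy
model (not reorderbuffer.c code). The types and function names below
(Change, build_stream, replay_all_on_each_command,
replay_queued_in_order) are hypothetical and exist only for this
illustration. It just counts how many invalidation messages would be
executed under each scheme, assuming each command is followed by one
batch of invalidations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified change kinds; not PostgreSQL's real enum. */
typedef enum { CHANGE_COMMAND_ID, CHANGE_INVALIDATION } ChangeKind;

typedef struct
{
    ChangeKind kind;
    int        ninval;  /* messages carried by a CHANGE_INVALIDATION */
} Change;

/*
 * Fill 'out' with ncommands pairs of (command id change, invalidation
 * batch). Returns the number of changes written.
 */
size_t
build_stream(Change *out, size_t cap, int ncommands, int inval_per_command)
{
    size_t n = 0;

    assert(ncommands >= 0 && cap >= 2 * (size_t) ncommands);
    for (int i = 0; i < ncommands; i++)
    {
        out[n++] = (Change) {CHANGE_COMMAND_ID, 0};
        out[n++] = (Change) {CHANGE_INVALIDATION, inval_per_command};
    }
    return n;
}

/*
 * Model of the current behaviour: the transaction only knows the full
 * set of invalidations, so every command id change executes all of them.
 */
long
replay_all_on_each_command(const Change *changes, size_t n)
{
    long total = 0;
    long executed = 0;

    for (size_t i = 0; i < n; i++)
        if (changes[i].kind == CHANGE_INVALIDATION)
            total += changes[i].ninval;

    for (size_t i = 0; i < n; i++)
        if (changes[i].kind == CHANGE_COMMAND_ID)
            executed += total;

    return executed;
}

/*
 * Model of the proposed behaviour: invalidations are queued as changes
 * and each batch is executed once, at its position in the stream.
 */
long
replay_queued_in_order(const Change *changes, size_t n)
{
    long executed = 0;

    for (size_t i = 0; i < n; i++)
        if (changes[i].kind == CHANGE_INVALIDATION)
            executed += changes[i].ninval;

    return executed;
}
```

In this model the first function grows with (commands x total
invalidations) while the second grows only with total invalidations.
The per-transaction total is still what we would execute on abort or
error, which is why the list in ReorderBufferTXN has to stay.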

-- 
With Regards,
Amit Kapila.


