Re: logical decoding and replication of sequences, take 2

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: logical decoding and replication of sequences, take 2
Дата
Msg-id c6a08e9d-7b08-76d4-197d-fab54039ed91@enterprisedb.com
обсуждение исходный текст
Ответ на RE: logical decoding and replication of sequences, take 2  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Ответы RE: logical decoding and replication of sequences, take 2  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Список pgsql-hackers

On 12/1/23 12:08, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas,
> 
>> I did some micro-benchmarking today, trying to identify cases where this
>> would cause unexpected problems, either due to having to maintain all
>> the relfilenodes, or due to having to do hash lookups for every sequence
>> change. But I think it's fine, mostly ...
>>
> 
> I did also performance tests (especially case 3). First of all, there are some
> variants from yours.
> 
> 1. patch 0002 was reverted because it has an issue. So this test checks whether
>    refactoring around ReorderBufferSequenceIsTransactional seems really needed.

FWIW I also did the benchmarks without the 0002 patch, for the same
reason. I forgot to mention that.

> 2. per comments from Amit, I also measured the abort case. In this case, the
>    alter_sequence() is called but the transaction is aborted.
> 3. I measured with changing number of clients {8, 16, 32, 64, 128}. In any cases,
>    clients executed 1000 transactions. The performance machine has 128 core so that
>    result for 128 clients might be saturated.
> 4. a short sleep (0.1s) was added in alter_sequence(), especially between
>    "alter sequence" and nextval(). Because while testing, I found that the
>    transaction is too short to execute in parallel. I think it is reasonable
>    because ReorderBufferSequenceIsTransactional() might be worse when the parallelism
>    is increased.
> 
> I attached one backend process via perf and executed pg_slot_logical_get_changes().
> Attached txt file shows which function occupied CPU time, especially from
> pg_logical_slot_get_changes_guts() and ReorderBufferSequenceIsTransactional().
> Here are my observations about them.
> 
> * In case of commit, as you said, SnapBuildCommitTxn() seems dominant for 8-64
>   clients case.
> * For (commit, 128 clients) case, however, ReorderBufferRestoreChanges() waste
>   many times. I think this is because changes exceed logical_decoding_work_mem,
>   so we do not have to analyze anymore.
> * In case of abort, CPU time used by ReorderBufferSequenceIsTransactional() is linearly
>   longer. This means that we need to think some solution to avoid the overhead by
>   ReorderBufferSequenceIsTransactional().
> 
> ```
> 8 clients  3.73% occupied time
> 16 7.26%
> 32 15.82%
> 64 29.14%
> 128 46.27%
> ```

Interesting, so what exactly does the transaction do? Anyway, I don't
think this is very surprising - I believe it behaves like this because
of having to search in many hash tables (one in each toplevel xact). And
I think the solution I explained before (maintaining a single toplevel
hash, instead of many per-top-level hashes).

FWIW I find this case interesting, but not very practical, because no
practical workload has that many aborts.

> 
> * In case of abort, I also checked CPU time used by ReorderBufferAddRelFileLocator(), but
>   it seems not so depends on the number of clients.
> 
> ```
> 8 clients 3.66% occupied time
> 16 6.94%
> 32 4.65%
> 64 5.39%
> 128 3.06%
> ```
> 
> As next step, I've planned to run the case which uses setval() function, because it
> generates more WALs than normal nextval();
> How do you think?
> 

Sure, although I don't think it's much different from the test selecting
40 values from the sequence (in each transaction). That generates about
the same amount of WAL.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: logical decoding and replication of sequences, take 2
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] plpython function causes server panic