Re: Transactions involving multiple postgres foreign servers, take 2

From Masahiro Ikeda
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id 5b80c9a3-2ce8-1c2b-65a3-e2b82b95331e@oss.nttdata.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2
List pgsql-hackers

On 2021/05/21 13:45, Masahiko Sawada wrote:
> On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2021/05/21 10:39, Masahiko Sawada wrote:
>>> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>>>
>>>>
>>>> On 2021/05/11 13:37, Masahiko Sawada wrote:
>>>>> I've attached the updated patches that incorporated comments from
>>>>> Zhihong and Ikeda-san.
>>>>
>>>> Thanks for updating the patches!
>>>>
>>>>
>>>> I have other comments including trivial things.
>>>>
>>>>
>>>> a. about "foreign_transaction_resolver_timeout" parameter
>>>>
>>>> Currently, the default value of "foreign_transaction_resolver_timeout" is
>>>> 60 seconds. Is there a reason for that? The following is a minor case, but
>>>> it may confuse some users.
>>>>
>>>> An example case:
>>>>
>>>> 1. A client executes a transaction with 2PC while the resolver is processing
>>>> FdwXactResolverProcessInDoubtXacts().
>>>>
>>>> 2. The resolution of the 1st transaction has to wait until other 2PC
>>>> transactions are executed or the timeout expires.
>>>>
>>>> 3. If the client checks the result of the 1st transaction, it has to wait
>>>> until the resolution is finished for atomic visibility (although this depends
>>>> on how atomic visibility is realized). The client may end up waiting for
>>>> foreign_transaction_resolver_timeout. Users may think it has stalled.
>>>>
>>>> A situation like this can be observed after testing with pgbench: some
>>>> unresolved transactions remain after benchmarking.
>>>>
>>>> I assume that this default value follows those of wal_sender, archiver, and
>>>> so on. But I think this parameter is more like "commit_delay". If so,
>>>> 60 seconds seems too large.
>>>
>>> IIUC this situation seems like the foreign transaction resolution is a
>>> bottleneck and doesn't catch up with incoming resolution requests. But
>>> how does foreign_transaction_resolver_timeout relate to this situation?
>>> foreign_transaction_resolver_timeout controls when to terminate a
>>> resolver process that doesn't have any foreign transactions to
>>> resolve. So if we set it to several milliseconds, resolver processes are
>>> terminated immediately after each resolution, imposing the cost of
>>> launching resolver processes on the next resolution.
>>
>> Thanks for your comments!
>>
>> No, this situation is not related to whether the foreign transaction
>> resolution is a bottleneck or not. This issue may happen when the workload
>> has very few foreign transactions.
>>
>> If a new foreign transaction comes in while the transaction resolver is
>> processing resolutions via FdwXactResolverProcessInDoubtXacts(), that foreign
>> transaction waits until the next round of resolution starts. If no further
>> foreign transaction comes in, it must wait for its resolution to start until
>> the timeout expires. This is the situation I meant.
> 
> Thanks for your explanation. I think that in this case we should set
> the latch of the resolver after preparing all foreign transactions so
> that the resolver processes those transactions without sleeping.

Yes, your idea is much better. Thanks!
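
To illustrate the difference between waking the resolver through its latch and leaving it asleep until the timeout, here is a minimal toy model in Python. It is only a sketch, not the patch's code: `ToyResolver`, `enqueue`, and `wait_for_work` are hypothetical names, and `threading.Event` merely stands in for a PostgreSQL latch.

```python
import threading


class ToyResolver:
    """Toy stand-in for a resolver process (hypothetical, not the patch's code)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s      # plays the role of the resolver timeout
        self.latch = threading.Event()  # stand-in for the resolver's latch
        self.pending = []               # prepared foreign transactions to resolve

    def enqueue(self, xact, set_latch):
        """Register a prepared foreign transaction, optionally waking the resolver."""
        self.pending.append(xact)
        if set_latch:
            self.latch.set()

    def wait_for_work(self):
        """Sleep until the latch is set or the timeout expires.

        Returns True if woken by the latch, False if the sleep timed out.
        """
        woken = self.latch.wait(self.timeout_s)
        self.latch.clear()
        return woken


resolver = ToyResolver(timeout_s=0.05)

# With the latch set after preparing, the resolver wakes up right away.
resolver.enqueue("xact-1", set_latch=True)
woken_by_latch = resolver.wait_for_work()

# Without it, the transaction sits in the queue for the whole timeout.
resolver.enqueue("xact-2", set_latch=False)
woken_without_latch = resolver.wait_for_work()
```

In this model the second transaction is picked up only once the sleep times out, which corresponds to the delay described above when no further foreign transaction arrives.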


>>
>> Thanks for letting me know the side effect of setting the resolution timeout
>> to several milliseconds. I agree. But why is termination needed? Is there a
>> possibility that it stalls like walsender?
> 
> The purpose of this timeout is to terminate resolvers that are idle
> for a long time. The resolver processes don't necessarily need to keep
> running all the time for every database. On the other hand, launching
> a resolver process per commit would be a high cost. So we have
> resolver processes keep running at least for
> foreign_transaction_resolver_timeout.

Understood. I think it's reasonable.
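
The trade-off between idle termination and launch cost can be sketched with a small calculation. This is a simplified model under stated assumptions, not the patch's behaviour: `count_resolver_launches` is a hypothetical helper, resolution itself is treated as instantaneous, and the arrival times are made-up numbers in seconds.

```python
def count_resolver_launches(arrival_times, idle_timeout):
    """Count how often a resolver must be launched for the given arrivals.

    Assumption: a resolver exits once it has been idle for idle_timeout
    seconds, and resolving a transaction takes no time.
    """
    launches = 0
    alive_until = None  # time at which the current resolver would exit if idle
    for t in sorted(arrival_times):
        if alive_until is None or t > alive_until:
            launches += 1  # no live resolver, so one has to be started
        alive_until = t + idle_timeout  # the idle timer restarts after work
    return launches


arrivals = [0, 10, 20, 200]  # hypothetical commit times in seconds

launches_60s = count_resolver_launches(arrivals, idle_timeout=60)     # 2 launches
launches_5ms = count_resolver_launches(arrivals, idle_timeout=0.005)  # 4 launches
```

With a timeout of a few milliseconds every commit in this example pays the launch cost, while a 60-second timeout only relaunches after the long idle gap.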


>>>>
>>>>
>>>> b. about the performance bottleneck (just sharing my simple benchmark results)
>>>>
>>>> The resolver process can easily become a performance bottleneck, although I
>>>> think some users want this feature even if the performance is not so good.
>>>>
>>>> I tested with a very simple workload on my laptop.
>>>>
>>>> The test conditions are:
>>>> * two remote foreign partitions, and one transaction inserts an entry into
>>>> each partition.
>>>> * local connection only. If the network latency were higher, the performance
>>>> would be worse.
>>>> * pgbench with 8 clients.
>>>>
>>>> The test results are as follows. The performance with 2PC is only 10% of
>>>> that without 2PC.
>>>>
>>>> * with foreign_twophase_commit = required
>>>> -> If the load is more than 10 TPS, the number of unresolved foreign
>>>> transactions keeps increasing and the run stops with the warning "Increase
>>>> max_prepared_foreign_transactions".
>>>
>>> What was the value of max_prepared_foreign_transactions?
>>
>> I tested with 200.
>>
>> If each resolution finished very quickly, I thought that would be enough
>> because 8 clients x 2 partitions = 16, though... But it's difficult to know
>> what a stable value would be.
> 
> While resolving one distributed transaction, the resolver needs both
> one round trip and fsync-ing a WAL record for each foreign transaction.
> Since the client doesn't wait for the distributed transaction to be
> resolved, the resolver process can easily become a bottleneck given there
> are 8 clients.
>
> If foreign transactions were resolved synchronously, 16 would suffice.

OK, thanks.
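
The arithmetic behind these numbers can be written down as a rough sketch. The helper names are hypothetical, and the resolver throughput of 20 foreign transactions per second is an assumption chosen only to match the observation that the backlog grows above roughly 10 TPS with 2 foreign transactions each; it is not a measured value.

```python
import math


def slots_needed_if_synchronous(clients, foreign_xacts_per_txn):
    """Upper bound on in-flight prepared foreign transactions when each
    client waits for resolution before starting its next transaction."""
    return clients * foreign_xacts_per_txn


def seconds_until_slots_exhausted(tps, foreign_xacts_per_txn,
                                  resolver_rate, max_slots):
    """Time until max_slots entries pile up when clients do not wait.

    resolver_rate is the assumed number of foreign transactions the
    resolver can finish per second.
    """
    growth = tps * foreign_xacts_per_txn - resolver_rate  # backlog per second
    if growth <= 0:
        return math.inf  # the resolver keeps up, so the backlog never fills
    return max_slots / growth


sync_slots = slots_needed_if_synchronous(clients=8, foreign_xacts_per_txn=2)

# Hypothetical load of 15 TPS against an assumed resolver rate of 20/s.
time_to_full = seconds_until_slots_exhausted(
    tps=15, foreign_xacts_per_txn=2, resolver_rate=20, max_slots=200)

# At 10 TPS the assumed resolver rate is just enough.
time_at_10tps = seconds_until_slots_exhausted(
    tps=10, foreign_xacts_per_txn=2, resolver_rate=20, max_slots=200)
```

Under these assumptions 200 slots fill up in 20 seconds at 15 TPS, so raising max_prepared_foreign_transactions only postpones the warning as long as the load exceeds what the resolver can process.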


>>
>>
>>> To speed up the foreign transaction resolution, some ideas have been
>>> discussed. As another idea, how about launching resolvers for each
>>> foreign server? That way, we resolve foreign transactions on each
>>> foreign server in parallel. If foreign transactions are concentrated
>>> on a particular server, we can have multiple resolvers for that one
>>> foreign server. It doesn't change the fact that all foreign
>>> transaction resolutions are processed by resolver processes.
>>
>> Awesome! There seems to be another advantage: even if a foreign server is
>> temporarily busy or stopped due to failover, the other foreign servers'
>> transactions can still be resolved.
> 
> Yes. We also might need to be careful about the order of foreign
> transaction resolution. I think we need to resolve foreign
> transactions in arrival order at least within a foreign server.

I agree it's better.

(Although this is just my own curiosity...)
Is it necessary? This idea seems to be aimed at atomic visibility, but
2PC can't realize that, as you know. So I was wondering about it.
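
For reference, the per-server idea with arrival order preserved within each server can be sketched as follows. This is only an illustration in Python with hypothetical names (`partition_by_server`, `resolve_available`) and made-up transaction IDs; it says nothing about how the patch would actually schedule resolvers.

```python
from collections import defaultdict, deque


def partition_by_server(pending):
    """Split (xid, server) pairs into one FIFO queue per foreign server,
    keeping the arrival order within each server."""
    queues = defaultdict(deque)
    for xid, server in pending:
        queues[server].append(xid)
    return queues


def resolve_available(queues, unavailable=frozenset()):
    """Drain the queues of reachable servers in arrival order.

    Queues of unavailable servers are left untouched, so a busy or
    failed-over server does not block the others.
    """
    resolved = []
    for server, queue in queues.items():
        if server in unavailable:
            continue
        while queue:
            resolved.append((server, queue.popleft()))
    return resolved


# Hypothetical pending foreign transactions in arrival order.
pending = [(1, "s1"), (2, "s2"), (3, "s1"), (4, "s2")]
queues = partition_by_server(pending)

# "s2" is temporarily down: only the transactions on "s1" are resolved.
done = resolve_available(queues, unavailable={"s2"})
```

Here the transactions on "s1" are resolved in the order 1, 3 while those on "s2" stay queued in the order 2, 4 until that server is reachable again.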

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION


