Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
От | Masahiko Sawada |
---|---|
Тема | Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2 |
Дата | |
Msg-id | CAD21AoDn98axH1bEoMnte+S7WWR=nsmOpjz1WGH-NvJi4aLu3Q@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2 (Masahiko Sawada <sawada.mshk@gmail.com>) |
Ответы |
Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2
(Thomas Munro <thomas.munro@gmail.com>)
|
Список | pgsql-hackers |
On Thu, Jan 31, 2019 at 7:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Jan 29, 2019 at 5:47 PM Ildar Musin <ildar@adjust.com> wrote: > > > > Hello, > > > > The patch needs rebase as it doesn't apply to the current master. I applied it > > to the older commit to test it. It worked fine so far. > > Thank you for testing the patch! > > > > > I found one bug though which would cause resolver to finish by timeout even > > though there are unresolved foreign transactions in the list. The > > `fdw_xact_exists()` function expects database id as the first argument and xid > > as the second. But everywhere it is called arguments specified in the different > > order (xid first, then dbid). Also function declaration in header doesn't > > match its definition. > > Will fix. > > > > > There are some other things I found. > > * In `FdwXactResolveAllDanglingTransactions()` variable `n_resolved` is > > declared as bool but used as integer. > > * In fdwxact.c's module comment there are `FdwXactRegisterForeignTransaction()` > > and `FdwXactMarkForeignTransactionModified()` functions mentioned that are > > not there anymore. > > * In documentation (storage.sgml) there is no mention of `pg_fdw_xact` > > directory. > > > > Couple of stylistic notes. > > * In `FdwXactCtlData struct` there are both camel case and snake case naming > > used. > > * In `get_fdw_xacts()` `xid != InvalidTransactionId` can be replaced with > > `TransactionIdIsValid(xid)`. > > * In `generate_fdw_xact_identifier()` the `fx` prefix could be a part of format > > string instead of being processed by `sprintf` as an extra argument. > > > > I'll incorporate them at the next patch set. > > > I'll continue looking into the patch. Thanks! > > Thanks. Actually I'm updating the patch set, changing API interface as > I proposed before and improving the document and README. I'll submit > the latest patch next week. > Sorry for the very late. Attached updated version patches. The basic mechanism has not been changed since the previous version. But the updated version patch uses the single wait queue instead of two queues (active and retry) which were used in the previous version. Every backends processes has a timestamp in PGPROC (fdwXactNextResolutionTs), that is the time when they expect to be processed by foreign resolver process at. Entries in the wait queue is ordered by theirs timestamps. The wait queue and timestamp are used after a backend process prepared all transactions on foreign servers and wait for all of them to be resolved. Backend processes who are committing/aborting the distributed transaction insert itself to the wait queue (FdwXactRslvCtl->fdwxact_queue) with the current timestamp, and then request to launch a new resolver process if not launched yet. If there is resolver connecting to the same database they just set its latch. The wait queue is protected by LWLock FdwXactResolutionLock. Then the backend sleep until either user requests to cancel (press ctrl-c) or waken up by resolver process. Foreign resolver process continue to poll the wait queue, checking if there is any waiter on the database that the resolver process connects to. If there is a waiter, fetches it and check its timestamp. If the current timestamp goes over its timestamp, the resolver process start to resolve all foreign transactions. Usually backends processes insert itself to wait queue first then wake up the resolver and they use the same wall-clock, so the resolver can fetch the waiter just inserted. Once all foreign transactions are resolved, the resolver process delete the backend entry from the wait queue, and then wake up the waiting backend. On failure during foreign transaction resolution, while the backend is still sleeping, the resolver process removes and inserts the backend with the new timestamp (its timestamp foreign_transaction_resolution_interval) to appropriate position in the wait queue. This mechanism ensures that a distributed transaction is resolved as soon as the waiter inserted while ensuring that the resolver can retry to resolve the failed foreign transactions at a interval of foreign_transaction_resolution_interval time. For handling in-doubt transactions, I've removed the automatically foreign transaction resolution code from the first version patch since it's not essential feature and we can add it later. Therefore user needs to resolve unresolved foreign transactions manually using by pg_resolve_fdwxacts() function in three cases: where the foreign server crashed or we lost connectibility to it during preparing foreign transaction, where the coordinator node crashed during preparing/resolving the foreign transaction and where user canceled to resolve the foreign transaction. For foreign transaction resolver processes, they exit if they don't have any foreign transaction to resolve longer than foreign_transaction_resolver_timeout. Since we cannot drop a database while a resolver process is connecting to we can stop it call by pg_stop_fdwxact_resolver() function. The comment at top of fdwxact.c file describes about locking mechanism and recovery, and src/backend/fdwxact/README descries about status transition of FdwXact. Also the wiki page[1] describes how to use this feature with some examples. [1] https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Amit KapilaДата:
Сообщение: Re: Unhappy about API changes in the no-fsm-for-small-rels patch