Prepared transaction releasing locks before deregistering its GID

Поиск
Список
Период
Сортировка
От Oleksii Kliukin
Тема Prepared transaction releasing locks before deregistering its GID
Дата
Msg-id BF9B38A4-2BFF-46E8-BA87-A2D00A8047A6@hintbits.com
обсуждение исходный текст
Ответы Re: Prepared transaction releasing locks before deregistering its GID  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hello,

We have an app that copies some metrics between predefined tables on the
source and destination clusters, truncating the table at the source
afterward.

The app locks both the source and the destination table at the beginning and
then, once copy concludes, prepares a transaction on each of the affected
hosts and commits it at last. The naming schema for the prepared transaction
involves the source and the destination table; while it doesn’t guarantee
the uniqueness if multiple copy processes run on the same host, the
possibility of that happening should be prevented by the locks held on
tables involved.

Or so we thought. From time to time, our cron jobs reported a transaction
identifier already in use. After digging into this, my colleague Ildar Musin
has produced a simple pgbench script to reproduce the issue:

----------------------------------------------------------------
$ cat custom.sql
begin;
lock abc in exclusive mode;
insert into abc values ((random() * 1000)::int);

prepare transaction 'test transaction';
commit prepared 'test transaction’;
----------------------------------------------------------------

pgbench launched with:

pgbench -T 20 -c 10 -f custom.sql postgres

(after creating the table abc) produces a number of complaints:

ERROR: transaction identifier "test transaction" is already in use.

I could easily reproduce that on Linux, but not Mac OS, with both Postgres
10.7 and 11.2.

That looks like a race condition to me. What happens is that another
transaction with the name identical to the running one can start and proceed
to the prepare phase while the original one commits, failing at last instead
of waiting for the original one to finish.

By looking at the source code of FinishPreparedTransaction() I can see the
RemoveGXact() call, which removes the prepared transaction from
TwoPhaseState->prepXacts. It is placed at the very end of the function,
after the post-commit callbacks that clear out the locks held by the
transaction. Those callbacks are not guarded by the TwoPhaseStateLock,
resulting in a possibility for a concurrent session to proceed will
MarkAsPreparing after acquiring the locks released by them.

I couldn’t find any documentation on the expected outcome in cases like
this, so I assume it might not be a bug, but an undocumented behavior.

Should I go about and produce a patch to put a note in the description of
commit prepared, or is there any interest in changing the behavior to avoid
such conflicts altogether (perhaps by holding the lock throughout the
cleanup phase)?

Regards,
Oleksii Kliukin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bernd Helmle
Дата:
Сообщение: Re: Progress reporting for pg_verify_checksums
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Reporting script runtimes in pg_regress