Обсуждение: Race condition in GetOldestActiveTransactionId()

Поиск
Список
Период
Сортировка

Race condition in GetOldestActiveTransactionId()

От
Heikki Linnakangas
Дата:
While hacking on the CSN patch, I spotted a race condition between
GetOldestActiveTransactionId() and GetNewTransactionId().
GetOldestActiveTransactionId() calculates the oldest XID that's still
running, by doing:

1. Read nextXid, without a lock. This is the upper bound, if there are
no active XIDs in the proc array.
2. Loop through the proc array, making note of the oldest XID.

While GetNewTransactionId() does:

1. Read and Increment nextXid
2. Set MyPgXact->xid.

It seems possible that if you call GetNewTransactionId() concurrently
with GetOldestActiveTransactionId(), GetOldestActiveTransactionId() sees
the new nextXid value that the concurrent GetNewTransactionId() set, but
doesn't see the old XID in the proc array. It will return a value that
doesn't cover the old XID, i.e. it won't consider the just-assigned XID
as in-progress.

Am I missing something? Commit c6d76d7c added a comment to
GetOldestActiveTransactionId() explaining why it's not necessary to
acquire XidGenLock there, but I think it missed the above race condition.

GetOldestActiveTransactionId() is not performance-critical, it's only
called when performing a checkpoint, so I think we should just bite the
bullet and grab the lock. Per attached patch.

- Heikki

Вложения

Re: [HACKERS] Race condition in GetOldestActiveTransactionId()

От
Heikki Linnakangas
Дата:
On 08/22/2016 01:46 PM, Heikki Linnakangas wrote:
> While hacking on the CSN patch, I spotted a race condition between
> GetOldestActiveTransactionId() and GetNewTransactionId().
> GetOldestActiveTransactionId() calculates the oldest XID that's still
> running, by doing:
>
> 1. Read nextXid, without a lock. This is the upper bound, if there are
> no active XIDs in the proc array.
> 2. Loop through the proc array, making note of the oldest XID.
>
> While GetNewTransactionId() does:
>
> 1. Read and Increment nextXid
> 2. Set MyPgXact->xid.
>
> It seems possible that if you call GetNewTransactionId() concurrently
> with GetOldestActiveTransactionId(), GetOldestActiveTransactionId() sees
> the new nextXid value that the concurrent GetNewTransactionId() set, but
> doesn't see the old XID in the proc array. It will return a value that
> doesn't cover the old XID, i.e. it won't consider the just-assigned XID
> as in-progress.
>
> Am I missing something? Commit c6d76d7c added a comment to
> GetOldestActiveTransactionId() explaining why it's not necessary to
> acquire XidGenLock there, but I think it missed the above race condition.
>
> GetOldestActiveTransactionId() is not performance-critical, it's only
> called when performing a checkpoint, so I think we should just bite the
> bullet and grab the lock. Per attached patch.

I did some testing, and was able to indeed construct a case where 
oldestActiveXid was off-by-one in an online checkpoint record. However, 
looking at how it's used, I think it happened to not have any 
user-visible effect. The oldestActiveXid value of an online checkpoint 
is only used to determine the point where pg_subtrans is initialized, 
and the XID missed in the computation could not be a subtransaction.

Nevertheless, it's clearly a bug and the fix is simple, so I committed a 
fix.

- Heikki