Re: Usage of epoch in txid_current

From: Andres Freund
Subject: Re: Usage of epoch in txid_current
Msg-id: 20171205174501.xleypltbtjcgf5iu@alap3.anarazel.de
In reply to: Re: Usage of epoch in txid_current (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Usage of epoch in txid_current
Re: Usage of epoch in txid_current
List: pgsql-hackers
On 2017-12-05 16:21:27 +0530, Amit Kapila wrote:
> On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> Currently, txid_current and friends export a 64-bit format of
> >> transaction id that is extended with an “epoch” counter so that it
> >> will not wrap around during the life of an installation.   The epoch
> >> value it uses is based on the epoch that is maintained by checkpoint
> >> (aka only checkpoint increments it).
> >>
> >> Now if epoch changes multiple times between two checkpoints
> >> (practically the chances of this are bleak, but there is a theoretical
> >> possibility), then won't the computation of xids go wrong?
> >> Basically, it can give the same value of txid after wraparound if the
> >> checkpoint doesn't occur between the two calls to txid_current.
> >
> >
> > AFAICS, yes, if epoch changes multiple times between two checkpoints, then
> > computation will go wrong.  And it doesn't look like a purely theoretical
> > possibility to me, because I think I know a couple of instances on the edge
> > of this...

I think it's not terribly likely in practice, due to the required WAL
volume. You need at least a commit record for each of 4 billion
transactions. Each commit record is at least 24 bytes long, and in a
non-artificial scenario you would additionally have a few hundred bytes
of actual WAL content per transaction. So we're talking about a distance
of at least 0.5-2TB within a single checkpoint here. Not impossible, but
not likely either.


> Okay, it is quite strange that we haven't discovered this problem till
> now.  I think we should do something to fix it.  One idea is that we
> track epoch change in shared memory (probably in the same data
> structure (VariableCacheData) where we track nextXid).  We need to
> increment it when the xid wraps around during xid allocation (in
> GetNewTransactionId).  Also, we need to make it persistent, which
> means we need to log it in the checkpoint xlog record and we need to write
> a separate xlog record for the epoch change.

I think it makes a fair bit of sense to not do the current crufty
tracking of xid epochs. I don't really know how we got there, but it doesn't
make terribly much sense. Don't think we need additional WAL logging
though - we should be able to piggyback this onto the already existing
clog logging.

I kinda wonder if we shouldn't just track nextXid as a 64-bit integer
internally, instead of bothering with tracking the epoch
separately. Then we can "just" truncate it in the cases where it's
stored in space constrained places etc.

Greetings,

Andres Freund

