Обсуждение: Usage of epoch in txid_current

Поиск

Список

Период

Сортировка

Usage of epoch in txid_current

От

Amit Kapila

Дата:

05 декабря 2017 г., 09:19:10

Hi,

Currently, txid_current and friends export a 64-bit format of
transaction id that is extended with an “epoch” counter so that it
will not wrap around during the life of an installation.   The epoch
value it uses is based on the epoch that is maintained by checkpoint
(aka only checkpoint increments it).

Now if epoch changes multiple times between two checkpoints
(practically the chances of this are bleak, but there is a theoretical
possibility), then won't the computation of xids will go wrong?
Basically, it can give the same value of txid after wraparound if the
checkpoint doesn't occur between the two calls to txid_current.

Am I missing something which ensures that epoch gets incremented at or
after wraparound?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Usage of epoch in txid_current

От

Alexander Korotkov

Дата:

05 декабря 2017 г., 15:19:49

Hi! On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila wrote: > Currently, txid_current and friends export a 64-bit format of > transaction id that is extended with an “epoch” counter so that it > will not wrap around during the life of an installation. The epoch > value it uses is based on the epoch that is maintained by checkpoint > (aka only checkpoint increments it). > > Now if epoch changes multiple times between two checkpoints > (practically the chances of this are bleak, but there is a theoretical > possibility), then won't the computation of xids will go wrong? > Basically, it can give the same value of txid after wraparound if the > checkpoint doesn't occur between the two calls to txid_current. > AFAICS, yes, if epoch changes multiple times between two checkpoints, then computation will go wrong. And it doesn't look like purely theoretical possibility for me, because I think I know couple of instances of the edge of this... Am I missing something which ensures that epoch gets incremented at or > after wraparound? > I've checked the code, and it doesn't look for me that there is something like this. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

Re: Usage of epoch in txid_current

От

Amit Kapila

Дата:

05 декабря 2017 г., 16:51:27

On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Currently, txid_current and friends export a 64-bit format of
>> transaction id that is extended with an “epoch” counter so that it
>> will not wrap around during the life of an installation.   The epoch
>> value it uses is based on the epoch that is maintained by checkpoint
>> (aka only checkpoint increments it).
>>
>> Now if epoch changes multiple times between two checkpoints
>> (practically the chances of this are bleak, but there is a theoretical
>> possibility), then won't the computation of xids will go wrong?
>> Basically, it can give the same value of txid after wraparound if the
>> checkpoint doesn't occur between the two calls to txid_current.
>
>
> AFAICS, yes, if epoch changes multiple times between two checkpoints, then
> computation will go wrong.  And it doesn't look like purely theoretical
> possibility for me, because I think I know couple of instances of the edge
> of this...
>

Okay, it is quite strange that we haven't discovered this problem till
now.  I think we should do something to fix it.  One idea is that we
track epoch change in shared memory (probably in the same data
structure (VariableCacheData) where we track nextXid).  We need to
increment it when the xid wraparound during xid allocation (in
GetNewTransactionId).  Also, we need to make it persistent as which
means we need to log it in checkpoint xlog record and we need to write
a separate xlog record for the epoch change.

Thoughts?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

05 декабря 2017 г., 23:45:01

On 2017-12-05 16:21:27 +0530, Amit Kapila wrote:
> On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> Currently, txid_current and friends export a 64-bit format of
> >> transaction id that is extended with an “epoch” counter so that it
> >> will not wrap around during the life of an installation.   The epoch
> >> value it uses is based on the epoch that is maintained by checkpoint
> >> (aka only checkpoint increments it).
> >>
> >> Now if epoch changes multiple times between two checkpoints
> >> (practically the chances of this are bleak, but there is a theoretical
> >> possibility), then won't the computation of xids will go wrong?
> >> Basically, it can give the same value of txid after wraparound if the
> >> checkpoint doesn't occur between the two calls to txid_current.
> >
> >
> > AFAICS, yes, if epoch changes multiple times between two checkpoints, then
> > computation will go wrong.  And it doesn't look like purely theoretical
> > possibility for me, because I think I know couple of instances of the edge
> > of this...

I think it's not terribly likely principle, due to the required WAL
size. You need at least a commit record for each of 4 billion
transactions. Each commit record is at least 24bytes long, and in a
non-artificial scenario you additionally would have a few hundred bytes
of actual content of WAL. So we're talking about a distance of at least
0.5-2TB within a single checkpoint here. Not impossible, but not likely
either.

> Okay, it is quite strange that we haven't discovered this problem till
> now.  I think we should do something to fix it.  One idea is that we
> track epoch change in shared memory (probably in the same data
> structure (VariableCacheData) where we track nextXid).  We need to
> increment it when the xid wraparound during xid allocation (in
> GetNewTransactionId).  Also, we need to make it persistent as which
> means we need to log it in checkpoint xlog record and we need to write
> a separate xlog record for the epoch change.

I think it makes a fair bit of sense to not do the current crufty
tracking of xid epochs. I don't really how we got there, but it doesn't
make terribly much sense. Don't think we need additional WAL logging
though - we should be able to piggyback this onto the already existing
clog logging.

I kinda wonder if we shouldn't just track nextXid as a 64bit integer
internally, instead of bothering with tracking the epoch
separately. Then we can "just" truncate it in the cases where it's
stored in space constrained places etc.

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Stephen Frost

Дата:

06 декабря 2017 г., 00:01:43

Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2017-12-05 16:21:27 +0530, Amit Kapila wrote:
> > On Tue, Dec 5, 2017 at 2:49 PM, Alexander Korotkov
> > <a.korotkov@postgrespro.ru> wrote:
> > > On Tue, Dec 5, 2017 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >>
> > >> Currently, txid_current and friends export a 64-bit format of
> > >> transaction id that is extended with an “epoch” counter so that it
> > >> will not wrap around during the life of an installation.   The epoch
> > >> value it uses is based on the epoch that is maintained by checkpoint
> > >> (aka only checkpoint increments it).
> > >>
> > >> Now if epoch changes multiple times between two checkpoints
> > >> (practically the chances of this are bleak, but there is a theoretical
> > >> possibility), then won't the computation of xids will go wrong?
> > >> Basically, it can give the same value of txid after wraparound if the
> > >> checkpoint doesn't occur between the two calls to txid_current.
> > >
> > >
> > > AFAICS, yes, if epoch changes multiple times between two checkpoints, then
> > > computation will go wrong.  And it doesn't look like purely theoretical
> > > possibility for me, because I think I know couple of instances of the edge
> > > of this...
>
> I think it's not terribly likely principle, due to the required WAL
> size. You need at least a commit record for each of 4 billion
> transactions. Each commit record is at least 24bytes long, and in a
> non-artificial scenario you additionally would have a few hundred bytes
> of actual content of WAL. So we're talking about a distance of at least
> 0.5-2TB within a single checkpoint here. Not impossible, but not likely
> either.

At the bottom end, with a 30-minute checkpoint, that's about 300MB/s.
Certainly quite a bit and we might have trouble getting there for other
reasons, but definitely something that can be accomplished with even a
single SSD these days.

> > Okay, it is quite strange that we haven't discovered this problem till
> > now.  I think we should do something to fix it.  One idea is that we
> > track epoch change in shared memory (probably in the same data
> > structure (VariableCacheData) where we track nextXid).  We need to
> > increment it when the xid wraparound during xid allocation (in
> > GetNewTransactionId).  Also, we need to make it persistent as which
> > means we need to log it in checkpoint xlog record and we need to write
> > a separate xlog record for the epoch change.
>
> I think it makes a fair bit of sense to not do the current crufty
> tracking of xid epochs. I don't really how we got there, but it doesn't
> make terribly much sense. Don't think we need additional WAL logging
> though - we should be able to piggyback this onto the already existing
> clog logging.

Don't you mean xact logging? ;)

> I kinda wonder if we shouldn't just track nextXid as a 64bit integer
> internally, instead of bothering with tracking the epoch
> separately. Then we can "just" truncate it in the cases where it's
> stored in space constrained places etc.

This sounds reasonable to me, at least, but I've not been in these
depths much.

Thanks!

Stephen

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

06 декабря 2017 г., 00:07:28


On December 5, 2017 10:01:43 AM PST, Stephen Frost <sfrost@snowman.net> wrote:
>Andres,
>
>* Andres Freund (andres@anarazel.de) wrote:
>> I think it makes a fair bit of sense to not do the current crufty
>> tracking of xid epochs. I don't really how we got there, but it
>doesn't
>> make terribly much sense. Don't think we need additional WAL logging
>> though - we should be able to piggyback this onto the already
>existing
>> clog logging.

>
>Don't you mean xact logging? ;)

No. We log a WAL record at clog boundaries. Wraparounds have to be at one. We could just include the 64 bit xid there
andwould have reliable tracking. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Usage of epoch in txid_current

От

Tom Lane

Дата:

06 декабря 2017 г., 00:55:36

Stephen Frost <sfrost@snowman.net> writes:
> * Andres Freund (andres@anarazel.de) wrote:
>> I kinda wonder if we shouldn't just track nextXid as a 64bit integer
>> internally, instead of bothering with tracking the epoch
>> separately. Then we can "just" truncate it in the cases where it's
>> stored in space constrained places etc.

> This sounds reasonable to me, at least, but I've not been in these
> depths much.

+1 ... I think the reason it's like that is simply that nobody's revisited
the XID generator since we decided to require 64-bit integer support.

We'd need this for support of true 64-bit XIDs, too, though I'm unsure
whether that project is going anywhere anytime soon.  In any case it
seems like a separable subset of that work.

            regards, tom lane

Re: Usage of epoch in txid_current

От

Amit Kapila

Дата:

06 декабря 2017 г., 18:09:09

On Tue, Dec 5, 2017 at 11:15 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-12-05 16:21:27 +0530, Amit Kapila wrote:
>
>> Okay, it is quite strange that we haven't discovered this problem till
>> now.  I think we should do something to fix it.  One idea is that we
>> track epoch change in shared memory (probably in the same data
>> structure (VariableCacheData) where we track nextXid).  We need to
>> increment it when the xid wraparound during xid allocation (in
>> GetNewTransactionId).  Also, we need to make it persistent as which
>> means we need to log it in checkpoint xlog record and we need to write
>> a separate xlog record for the epoch change.
>
> I think it makes a fair bit of sense to not do the current crufty
> tracking of xid epochs. I don't really how we got there, but it doesn't
> make terribly much sense. Don't think we need additional WAL logging
> though - we should be able to piggyback this onto the already existing
> clog logging.
>
> I kinda wonder if we shouldn't just track nextXid as a 64bit integer
> internally, instead of bothering with tracking the epoch
> separately. Then we can "just" truncate it in the cases where it's
> stored in space constrained places etc.
>

We are using ShmemVariableCache->nextXid at many places so always
converting/truncating it to 32-bit number before using seems slightly
awkward, so we can think of using a separate nextBigXid 64bit number
as well.  Either way, it is not clear to me how we will keep it
updated after recovery.  Right now, the mechanism is quite simple, at
the beginning of a recovery we take the value of nextXid from
checkpoint record and then if any xlog record indicates xid that
follows nextXid, we advance it.  Here, the point to note is that we
take the xid from the WAL record (which means that it assumes xids are
non-contiguous or some xids are consumed without being logged) and
increment it.  Unless we plan to change something in that logic (like
storing 64-bit xids in WAL records), it is not clear to me how to make
it work.  OTOH, recovering value of epoch which increments only at
wraparound seems fairly straightforward as described in my previous
email.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

06 декабря 2017 г., 23:56:53

Hi,

On 2017-12-06 17:39:09 +0530, Amit Kapila wrote:
> We are using ShmemVariableCache->nextXid at many places so always
> converting/truncating it to 32-bit number before using seems slightly
> awkward, so we can think of using a separate nextBigXid 64bit number
> as well.

-many

It's not actually that many places. And a lot of them would should be
updated anyway, to be epoch aware. Let's not add this kind of crummyness
to avoid a few truncating casts here and there.


> Either way, it is not clear to me how we will keep it
> updated after recovery.  Right now, the mechanism is quite simple, at
> the beginning of a recovery we take the value of nextXid from
> checkpoint record and then if any xlog record indicates xid that
> follows nextXid, we advance it.  Here, the point to note is that we
> take the xid from the WAL record (which means that it assumes xids are
> non-contiguous or some xids are consumed without being logged) and
> increment it.  Unless we plan to change something in that logic (like
> storing 64-bit xids in WAL records), it is not clear to me how to make
> it work.  OTOH, recovering value of epoch which increments only at
> wraparound seems fairly straightforward as described in my previous
> email.

I think it should be fairly simple if simply add the 64bit xid to the
existing clog extension WAL records.

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Amit Kapila

Дата:

08 декабря 2017 г., 08:36:33

On Wed, Dec 6, 2017 at 11:26 PM, Andres Freund <andres@anarazel.de> wrote:
>
>> Either way, it is not clear to me how we will keep it
>> updated after recovery.  Right now, the mechanism is quite simple, at
>> the beginning of a recovery we take the value of nextXid from
>> checkpoint record and then if any xlog record indicates xid that
>> follows nextXid, we advance it.  Here, the point to note is that we
>> take the xid from the WAL record (which means that it assumes xids are
>> non-contiguous or some xids are consumed without being logged) and
>> increment it.  Unless we plan to change something in that logic (like
>> storing 64-bit xids in WAL records), it is not clear to me how to make
>> it work.  OTOH, recovering value of epoch which increments only at
>> wraparound seems fairly straightforward as described in my previous
>> email.
>
> I think it should be fairly simple if simply add the 64bit xid to the
> existing clog extension WAL records.
>

IIUC, you mean to say that we should log the 64bit xid value in
CLOG_ZEROPAGE record while extending clog and that too we can do only
at wraparound.  Now, maybe doing it every time also doesn't hurt, but
I think doing it at wraparound should be sufficient.

Just to be clear, I am not planning to pursue writing a patch for this
at the moment.  So, if anybody else is interested or if Andres wants
to write it, I can help in the review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

10 июля 2018 г., 05:35:59

On Fri, Dec 8, 2017 at 3:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Dec 6, 2017 at 11:26 PM, Andres Freund <andres@anarazel.de> wrote:
>>> Either way, it is not clear to me how we will keep it
>>> updated after recovery.  Right now, the mechanism is quite simple, at
>>> the beginning of a recovery we take the value of nextXid from
>>> checkpoint record and then if any xlog record indicates xid that
>>> follows nextXid, we advance it.  Here, the point to note is that we
>>> take the xid from the WAL record (which means that it assumes xids are
>>> non-contiguous or some xids are consumed without being logged) and
>>> increment it.  Unless we plan to change something in that logic (like
>>> storing 64-bit xids in WAL records), it is not clear to me how to make
>>> it work.  OTOH, recovering value of epoch which increments only at
>>> wraparound seems fairly straightforward as described in my previous
>>> email.
>>
>> I think it should be fairly simple if simply add the 64bit xid to the
>> existing clog extension WAL records.
>
> IIUC, you mean to say that we should log the 64bit xid value in
> CLOG_ZEROPAGE record while extending clog and that too we can do only
> at wraparound.  Now, maybe doing it every time also doesn't hurt, but
> I think doing it at wraparound should be sufficient.

Can you please elaborate on why clog redo in particular would need to
use 64 bit xids?  Is the 64 bit xid not derivable from the 32 bit xid
in the WAL + the current value of a new 64 bit next xid?

> Just to be clear, I am not planning to pursue writing a patch for this
> at the moment.  So, if anybody else is interested or if Andres wants
> to write it, I can help in the review.

I played around with this idea yesterday.  Experiment-grade patch
attached.  Approach:

1.  Introduce a new type BigTransactionId (better names welcome).
2.  Change ShmemVariableCache->nextXid to ShmemVariableCache->nextBigXid.
3.  Change checkpoints to use nextBigXid too.
4.  Change ReadNewTransactionId() to ReadNextBigTransactionId().
5.  Remove GetNextXidAndEpoch() as it's now redundant.
6.  Everywhere that was reading ShmemVariableCache->nextXid or calling
ReadNewTransactionId() but actually needs an xid now uses an explicit
conversion macro XidFromBigTransactionId().
7.  Everywhere that was writing ShmemVariableCache->nextXid but only
has an xid now goes through a new function
AdvanceNextBigTransactionIdPast(xid), to handle recovery (since WAL
records have xids in them as mentioned by Amit).

I think it's probably a good idea to make it very explicit when moving
between big and small transaction IDs, hence the including of the word
'big' in variable and function names and the use of a function-like
macro (rather than implicit conversion, which C doesn't give me a good
way to prevent).  Otherwise there is a class of bug that is hidden for
the first 2^32 transactions.

The logic in CreateCheckPoint() that assumed that there could be only
one wraparound per checkpoint is gone (the problem reported in this
thread).  Note that AdvanceNextBigTransactionIdPast() contains logic
that infers an epoch, which is in some ways similar to what the
removed code was doing, but in this case I think all callers must have
an xid from the same or next epoch.

I'm probably missing a few details... this is only a first swing at
the problem.  It does pass check-world and various xid wraparound
tests I came up with.  Clearly big xids could spread to more places
than I show here.  Do you see problems with this approach or have
better ideas?

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

0001-Track-the-next-xid-using-64-bits.patch

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

10 июля 2018 г., 05:43:16

On 2018-07-10 11:35:59 +1200, Thomas Munro wrote:
> I played around with this idea yesterday.  Experiment-grade patch
> attached.

Cool!


> I think it's probably a good idea to make it very explicit when moving
> between big and small transaction IDs, hence the including of the word
> 'big' in variable and function names and the use of a function-like
> macro (rather than implicit conversion, which C doesn't give me a good
> way to prevent).  Otherwise there is a class of bug that is hidden for
> the first 2^32 transactions.

You could have BigTransactionId (maybe renamed to FullTransactionId?) be
a struct type. That'd prevent such issues. Most compilers these days
should be more than good enough to optimize passing around an 8byte
struct by value...

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Tom Lane

Дата:

10 июля 2018 г., 05:56:25

Andres Freund <andres@anarazel.de> writes:
> On 2018-07-10 11:35:59 +1200, Thomas Munro wrote:
>> I think it's probably a good idea to make it very explicit when moving
>> between big and small transaction IDs, hence the including of the word
>> 'big' in variable and function names and the use of a function-like
>> macro (rather than implicit conversion, which C doesn't give me a good
>> way to prevent).  Otherwise there is a class of bug that is hidden for
>> the first 2^32 transactions.

> You could have BigTransactionId (maybe renamed to FullTransactionId?) be
> a struct type. That'd prevent such issues. Most compilers these days
> should be more than good enough to optimize passing around an 8byte
> struct by value...

Or, perhaps, use a struct in assert builds and int64 otherwise?
You could hide the ensuing notational differences in macros.

            regards, tom lane

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

10 июля 2018 г., 06:08:34

Hi,

On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote:
> >> I think it's probably a good idea to make it very explicit when moving
> >> between big and small transaction IDs, hence the including of the word
> >> 'big' in variable and function names and the use of a function-like
> >> macro (rather than implicit conversion, which C doesn't give me a good
> >> way to prevent).  Otherwise there is a class of bug that is hidden for
> >> the first 2^32 transactions.
> 
> > You could have BigTransactionId (maybe renamed to FullTransactionId?) be
> > a struct type. That'd prevent such issues. Most compilers these days
> > should be more than good enough to optimize passing around an 8byte
> > struct by value...
> 
> Or, perhaps, use a struct in assert builds and int64 otherwise?
> You could hide the ensuing notational differences in macros.

That should be doable. But I'd like to check if it's necessary
first. Optimizing passing an 8 byte struct shouldn't be hard for
compilers these days - especially when most things dealing with them are
inline functions.  If we find that it's not a problem on contemporary
compilers, it might be worthwhile to use a bit more type safety in other
places too.

Here's what gcc does on O1:

#include <stdint.h>

typedef struct foo
{
  uint64_t id;
} foo;

extern foo take_foo_struct(foo f, int i);
extern uint64_t take_foo_int(uint64_t id, int i);

foo take_foo_struct(foo f, int i)
{
  f.id += i;
  return f;
}

uint64_t take_foo_int(uint64_t id, int i)
{
  id += i;
  return id;
}

results in:

    .file    "test.c"
    .text
    .globl    take_foo_struct
    .type    take_foo_struct, @function
take_foo_struct:
.LFB0:
    .cfi_startproc
    movslq    %esi, %rax
    addq    %rdi, %rax
    ret
    .cfi_endproc
.LFE0:
    .size    take_foo_struct, .-take_foo_struct
    .globl    take_foo_int
    .type    take_foo_int, @function
take_foo_int:
.LFB1:
    .cfi_startproc
    movslq    %esi, %rax
    addq    %rdi, %rax
    ret
    .cfi_endproc
.LFE1:
    .size    take_foo_int, .-take_foo_int
    .ident    "GCC: (Debian 7.3.0-24) 7.3.0"
    .section    .note.GNU-stack,"",@progbits

IOW, exactly the same code generated.  Note that the compiler does *not*
see the callsites in this case, i.e. this is platform ABI conformant.

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

10 июля 2018 г., 06:52:16

On 2018-07-09 17:08:34 -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote:
> > >> I think it's probably a good idea to make it very explicit when moving
> > >> between big and small transaction IDs, hence the including of the word
> > >> 'big' in variable and function names and the use of a function-like
> > >> macro (rather than implicit conversion, which C doesn't give me a good
> > >> way to prevent).  Otherwise there is a class of bug that is hidden for
> > >> the first 2^32 transactions.
> > 
> > > You could have BigTransactionId (maybe renamed to FullTransactionId?) be
> > > a struct type. That'd prevent such issues. Most compilers these days
> > > should be more than good enough to optimize passing around an 8byte
> > > struct by value...
> > 
> > Or, perhaps, use a struct in assert builds and int64 otherwise?
> > You could hide the ensuing notational differences in macros.
> 
> That should be doable. But I'd like to check if it's necessary
> first. Optimizing passing an 8 byte struct shouldn't be hard for
> compilers these days - especially when most things dealing with them are
> inline functions.  If we find that it's not a problem on contemporary
> compilers, it might be worthwhile to use a bit more type safety in other
> places too.
> 
> Here's what gcc does on O1:
> 
> #include <stdint.h>
> 
> typedef struct foo
> {
>   uint64_t id;
> } foo;
> 
> extern foo take_foo_struct(foo f, int i);
> extern uint64_t take_foo_int(uint64_t id, int i);
> 
> foo take_foo_struct(foo f, int i)
> {
>   f.id += i;
>   return f;
> }
> 
> uint64_t take_foo_int(uint64_t id, int i)
> {
>   id += i;
>   return id;
> }
> 
> results in:
> 
>     .file    "test.c"
>     .text
>     .globl    take_foo_struct
>     .type    take_foo_struct, @function
> take_foo_struct:
> .LFB0:
>     .cfi_startproc
>     movslq    %esi, %rax
>     addq    %rdi, %rax
>     ret
>     .cfi_endproc
> .LFE0:
>     .size    take_foo_struct, .-take_foo_struct
>     .globl    take_foo_int
>     .type    take_foo_int, @function
> take_foo_int:
> .LFB1:
>     .cfi_startproc
>     movslq    %esi, %rax
>     addq    %rdi, %rax
>     ret
>     .cfi_endproc
> .LFE1:
>     .size    take_foo_int, .-take_foo_int
>     .ident    "GCC: (Debian 7.3.0-24) 7.3.0"
>     .section    .note.GNU-stack,"",@progbits
> 
> IOW, exactly the same code generated.  Note that the compiler does *not*
> see the callsites in this case, i.e. this is platform ABI conformant.

FWIW, this is required by the x86-64 SYSV ABI.  See
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
3.2.3 Parameter Passing.  Aggregates with scalar types up to "two
eightbytes" are passed via registers.


It's also the case for MSVC / windows
https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions
https://docs.microsoft.com/en-us/cpp/build/parameter-passing

Small aggregates (8, 16, 32, or 64 bits) are passed in registers.


Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

10 июля 2018 г., 07:20:52

On Tue, Jul 10, 2018 at 12:08 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > On 2018-07-10 11:35:59 +1200, Thomas Munro wrote:
>> >> I think it's probably a good idea to make it very explicit when moving
>> >> between big and small transaction IDs, hence the including of the word
>> >> 'big' in variable and function names and the use of a function-like
>> >> macro (rather than implicit conversion, which C doesn't give me a good
>> >> way to prevent).  Otherwise there is a class of bug that is hidden for
>> >> the first 2^32 transactions.
>>
>> > You could have BigTransactionId (maybe renamed to FullTransactionId?) be
>> > a struct type. That'd prevent such issues. Most compilers these days
>> > should be more than good enough to optimize passing around an 8byte
>> > struct by value...
>>
>> Or, perhaps, use a struct in assert builds and int64 otherwise?
>> You could hide the ensuing notational differences in macros.
>
> That should be doable. But I'd like to check if it's necessary
> first. Optimizing passing an 8 byte struct shouldn't be hard for
> compilers these days - especially when most things dealing with them are
> inline functions.  If we find that it's not a problem on contemporary
> compilers, it might be worthwhile to use a bit more type safety in other
> places too.
>
> ...
>
> IOW, exactly the same code generated.  Note that the compiler does *not*
> see the callsites in this case, i.e. this is platform ABI conformant.

I like it.  Here's a version that uses a struct named
FullTransactionId (yeah, that's a better name, thanks), defined in
transam.h because c.h didn't feel right.

Client code lost the ability to use operator < directly.  I needed to
use a static inline function as a constructor.  I lost the
interchangeability with the wide xids in txid.c, so I provided
U64FromFullTransactionId() (I think that'll be useful for
serialisation over the write too).  I don't know what to think about
the encoding or meaning of non-normal xids in this thing.

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

0001-Track-the-next-xid-using-64-bits-v2.patch

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

10 июля 2018 г., 07:30:57

On 2018-07-10 13:20:52 +1200, Thomas Munro wrote:
> defined in transam.h because c.h didn't feel right.

Yea, that looks better.


> Client code lost the ability to use operator < directly.  I needed to
> use a static inline function as a constructor.  I lost the
> interchangeability with the wide xids in txid.c, so I provided
> U64FromFullTransactionId() (I think that'll be useful for
> serialisation over the write too).

Should probably, at a later point, introduce a SQL type for it too.


> I don't know what to think about the encoding or meaning of non-normal
> xids in this thing.

I'm not following what you mean by this?



> diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
> index 4faa21f5aef..fa0847afc81 100644
> --- a/src/backend/access/transam/subtrans.c
> +++ b/src/backend/access/transam/subtrans.c
> @@ -261,7 +261,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
>      LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
>
>      startPage = TransactionIdToPage(oldestActiveXID);
> -    endPage = TransactionIdToPage(ShmemVariableCache->nextXid);
> +    endPage = TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextFullXid));

These probably need an intermediate variable.


> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> index 394843f7e91..13020f54d98 100644
> --- a/src/backend/access/transam/varsup.c
> +++ b/src/backend/access/transam/varsup.c
> @@ -73,7 +73,7 @@ GetNewTransactionId(bool isSubXact)
>
>      LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
>
> -    xid = ShmemVariableCache->nextXid;
> +    xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);

It's a bit over the top, I know, but I'd move the conversion to outside
the lock.  Less because it makes a meaningful efficiency difference, and
more because I find it clearer, and easier to reason about if we ever go
to atomically incrementing nextFullXid.

> @@ -6700,7 +6698,8 @@ StartupXLOG(void)
>                               wasShutdown ? "true" : "false")));
>      ereport(DEBUG1,
>              (errmsg_internal("next transaction ID: %u:%u; next OID: %u",
> -                             checkPoint.nextXidEpoch, checkPoint.nextXid,
> +                             EpochFromFullTransactionId(checkPoint.nextFullXid),
> +                             XidFromFullTransactionId(checkPoint.nextFullXid),
>                               checkPoint.nextOid)));

I don't think we should continue to expose epochs, but rather go to full
xids where appropriate.


> diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
> index 895a51f89d5..f6731bfd28d 100644
> --- a/src/bin/pg_controldata/pg_controldata.c
> +++ b/src/bin/pg_controldata/pg_controldata.c
> @@ -20,6 +20,7 @@
>
>  #include <time.h>
>
> +#include "access/transam.h"
>  #include "access/xlog.h"
>  #include "access/xlog_internal.h"
>  #include "catalog/pg_control.h"
> @@ -256,8 +257,8 @@ main(int argc, char *argv[])
>      printf(_("Latest checkpoint's full_page_writes: %s\n"),
>             ControlFile->checkPointCopy.fullPageWrites ? _("on") : _("off"));
>      printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
> -           ControlFile->checkPointCopy.nextXidEpoch,
> -           ControlFile->checkPointCopy.nextXid);
> +           EpochFromFullTransactionId(ControlFile->checkPointCopy.nextFullXid),
> +           XidFromFullTransactionId(ControlFile->checkPointCopy.nextFullXid));
>      printf(_("Latest checkpoint's NextOID:          %u\n"),
>             ControlFile->checkPointCopy.nextOid);
>      printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),

See above re exposing epoch.



> --- a/src/include/access/transam.h
> +++ b/src/include/access/transam.h
> @@ -44,6 +44,32 @@
>  #define TransactionIdStore(xid, dest)    (*(dest) = (xid))
>  #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)
>
> +#define EpochFromFullTransactionId(x)    ((uint32) ((x).value >> 32))
> +#define XidFromFullTransactionId(x)        ((uint32) (x).value)
> +#define U64FromFullTransactionId(x)        ((x).value)
> +#define FullTransactionIdPrecedes(a, b)    ((a).value < (b).value)
> +#define FullTransactionIdPrecedesOrEquals(a, b)    ((a).value <= (b).value)
> +
> +/*
> + * A 64 bit value that contains an epoch and a TransactionId.  This is
> + * wrapped in a struct to prevent implicit conversion to/from TransactionId.
> + * Allowing such conversions seems likely to be error-prone.
> + */
> +typedef struct FullTransactionId
> +{
> +    uint64  value;
> +} FullTransactionId;
> +
> +static inline FullTransactionId
> +MakeFullTransactionId(uint32 epoch, TransactionId xid)
> +{
> +    FullTransactionId result;
> +
> +    result.value = ((uint64) epoch) << 32 | xid;
> +
> +    return result;
> +}
> +
>  /* advance a transaction ID variable, handling wraparound correctly */
>  #define TransactionIdAdvance(dest)    \
>      do { \
> @@ -52,6 +78,15 @@
>              (dest) = FirstNormalTransactionId; \
>      } while(0)
>
> +/* advance a FullTransactionId lvalue, handling wraparound correctly */
> +#define FullTransactionIdAdvance(dest) \
> +    do { \
> +        (dest).value++; \
> +        if (XidFromFullTransactionId(dest) < FirstNormalTransactionId) \
> +            (dest) = MakeFullTransactionId(EpochFromFullTransactionId(dest), \
> +                                           FirstNormalTransactionId); \
> +    } while (0)

Yikes. Why isn't this an inline function? Because it assigns to dest?

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

10 июля 2018 г., 08:15:39

On Tue, Jul 10, 2018 at 1:30 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-07-10 13:20:52 +1200, Thomas Munro wrote:
>> I don't know what to think about the encoding or meaning of non-normal
>> xids in this thing.
>
> I'm not following what you mean by this?

While defining FullTransactionIdPrecedes() in the image of
TransactionIdPrecedes(), I wondered whether the xid component of a
FullTransactionId would ever be one of the special values below
FirstNormalTransactionId that require special treatment.  The question
doesn't actually arise in this code, however, so I ignored that
thought and simply used x < y.

>> diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
>> index 4faa21f5aef..fa0847afc81 100644
>> --- a/src/backend/access/transam/subtrans.c
>> +++ b/src/backend/access/transam/subtrans.c
>> @@ -261,7 +261,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
>>       LWLockAcquire(SubtransControlLock, LW_EXCLUSIVE);
>>
>>       startPage = TransactionIdToPage(oldestActiveXID);
>> -     endPage = TransactionIdToPage(ShmemVariableCache->nextXid);
>> +     endPage = TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextFullXid));
>
> These probably need an intermediate variable.

Ok, did that in a couple of places.

>> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
>> index 394843f7e91..13020f54d98 100644
>> --- a/src/backend/access/transam/varsup.c
>> +++ b/src/backend/access/transam/varsup.c
>> @@ -73,7 +73,7 @@ GetNewTransactionId(bool isSubXact)
>>
>>       LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
>>
>> -     xid = ShmemVariableCache->nextXid;
>> +     xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
>
> It's a bit over the top, I know, but I'd move the conversion to outside
> the lock.  Less because it makes a meaningful efficiency difference, and
> more because I find it clearer, and easier to reason about if we ever go
> to atomically incrementing nextFullXid.

Erm -- I can't read nextFullXid until I have the lock, and then I need
it in a 32 bit format for the code that follows immediately (I don't
currently have xidVacLimit in FullTransactionId format).  I'm not sure
what you mean.

>> @@ -6700,7 +6698,8 @@ StartupXLOG(void)
>>                                                        wasShutdown ? "true" : "false")));
>>       ereport(DEBUG1,
>>                       (errmsg_internal("next transaction ID: %u:%u; next OID: %u",
>> -                                                      checkPoint.nextXidEpoch, checkPoint.nextXid,
>> +                                                      EpochFromFullTransactionId(checkPoint.nextFullXid),
>> +                                                      XidFromFullTransactionId(checkPoint.nextFullXid),
>>                                                        checkPoint.nextOid)));
>
> I don't think we should continue to expose epochs, but rather go to full
> xids where appropriate.

OK, done.

Hmm, that's going to hurt some eyeballs, because other
fields are shown in 32 bit xid format.  Should we also go to hex
everywhere so that we feeble humans can see the epoch component and
compare the xid component by eye?

Here's what I see on my test cluster:

Latest checkpoint's NextXID:          184683724519
Latest checkpoint's oldestXID:        4294901760

In hexadecimal money that'd be (with extra spaces, to line up with a
monospace font):

Latest checkpoint's NextXID:          0000002b0001fee7
Latest checkpoint's oldestXID:                ffff0000

I left pg_resetwal's arguments -x and -e alone, but I suppose someone
might want a new switch that does both together...

>> +/* advance a FullTransactionId lvalue, handling wraparound correctly */
>> +#define FullTransactionIdAdvance(dest) \
>> +     do { \
>> +             (dest).value++; \
>> +             if (XidFromFullTransactionId(dest) < FirstNormalTransactionId) \
>> +                     (dest) = MakeFullTransactionId(EpochFromFullTransactionId(dest), \
>> +                                                                                FirstNormalTransactionId); \
>> +     } while (0)
>
> Yikes. Why isn't this an inline function? Because it assigns to dest?

Following existing malpractice, and yeah because it requires an lvalue
so can't be changed without a different convention at the call site.
Fixed.

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

0001-Track-the-next-xid-using-64-bits-v3.patch

Re: Usage of epoch in txid_current

От

Craig Ringer

Дата:

10 июля 2018 г., 08:32:44

On 10 July 2018 at 07:35, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

I played around with this idea yesterday. Experiment-grade patch
attached. Approach:

1. Introduce a new type BigTransactionId (better names welcome).

txid_current() should be changed to BigTransactionId too. It's currently bigint.

Users are currently left in a real mess when it comes to pg_stat_activity xids, etc, which are epoch-unqualified. txid_current() reports an epoch-qualified xid, but there's no sensible and safe conversion from txid_current()'s bigint to an epoch-qualified ID. age() takes 'xid', everything takes 'xid', but is completely oblivious to epoch.

IMO all user-facing xids should be moved to BigTransactionId, with the 'xid' type ceasing to appear in any user facing role.

I think it's probably a good idea to make it very explicit when moving
between big and small transaction IDs, hence the including of the word
'big' in variable and function names and the use of a function-like
macro (rather than implicit conversion, which C doesn't give me a good
way to prevent).

The only way I know of to prevent it is to use a wrapper by-value struct. I've used this technique in the past where it's critical that implicit conversions don't occurr, but it sure isn't pretty.

typedef struct BigTransactionId

{

uint64 value;

} BigTransactionId;

instead of

typedef uint64 BigTransactionId;

It's annoying having to get the value member, but it's definitely highly protective against errors. Pass-by-value isn't a problem, neither is return-by-value.

BigTransactionId

add_one(BigTransactionId xid)

{

BigTransactionId ret;

ret.value = xid.value + 1;

return ret;

}

Personally I think it's potentially worth the required gymnastics if older compilers do a sane job of code generation with this.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Usage of epoch in txid_current

От

Craig Ringer

Дата:

10 июля 2018 г., 08:33:45

On 10 July 2018 at 10:32, Craig Ringer <craig@2ndquadrant.com> wrote:

I think it's probably a good idea to make it very explicit when moving
between big and small transaction IDs, hence the including of the word
'big' in variable and function names and the use of a function-like
macro (rather than implicit conversion, which C doesn't give me a good
way to prevent).

The only way I know of to prevent it is to use a wrapper by-value struct.

... and that's what I get for not finishing the thread before replying.

Anyway, please consider the other point re txid_current() and the user facing xid type.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

10 июля 2018 г., 08:40:14

On 2018-07-10 10:32:44 +0800, Craig Ringer wrote:
> On 10 July 2018 at 07:35, Thomas Munro <thomas.munro@enterprisedb.com>
> wrote:
> 
> >
> > I played around with this idea yesterday.  Experiment-grade patch
> > attached.  Approach:
> >
> > 1.  Introduce a new type BigTransactionId (better names welcome).
> >
> 
> txid_current() should be changed to BigTransactionId too. It's currently
> bigint.

That's quite a bit more work though, no? We'd have to introduce a new
SQL level type (including operators).  As I said nearby, I think that's
a good idea, but I don't see any sort of benefit of forcing those
patches to be done at once.

> Users are currently left in a real mess when it comes to pg_stat_activity
> xids, etc, which are epoch-unqualified. txid_current() reports an
> epoch-qualified xid, but there's no sensible and safe conversion from
> txid_current()'s bigint to an epoch-qualified ID.

I am confused what you mean by "to an epoch-qualified ID". You want to
split txid_current()'s return value into epoch and xid? Isn't that
fairly trivial?

SELECT bigxid & x'ffffff'::int8 AS xid, bigxid >> 32 epoch FROM txid_current() bigxid;

Now I'm not saying that's pretty and couldn't be made easier.

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Craig Ringer

Дата:

10 июля 2018 г., 09:32:34

On 10 July 2018 at 10:40, Andres Freund <andres@anarazel.de> wrote:

On 2018-07-10 10:32:44 +0800, Craig Ringer wrote:
> On 10 July 2018 at 07:35, Thomas Munro <thomas.munro@enterprisedb.com>
> wrote:
>
> >
> > I played around with this idea yesterday. Experiment-grade patch
> > attached. Approach:
> >
> > 1. Introduce a new type BigTransactionId (better names welcome).
> >
>
> txid_current() should be changed to BigTransactionId too. It's currently
> bigint.

That's quite a bit more work though, no? We'd have to introduce a new
SQL level type (including operators). As I said nearby, I think that's
a good idea, but I don't see any sort of benefit of forcing those
patches to be done at once.

Yeah. You're right. One step at a time.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

10 июля 2018 г., 10:20:18

On Tue, Jul 10, 2018 at 2:15 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Jul 10, 2018 at 1:30 PM, Andres Freund <andres@anarazel.de> wrote:,
>>>                       (errmsg_internal("next transaction ID: %u:%u; next OID: %u",
>>> -                                                      checkPoint.nextXidEpoch, checkPoint.nextXid,
>>> +                                                      EpochFromFullTransactionId(checkPoint.nextFullXid),
>>> +                                                      XidFromFullTransactionId(checkPoint.nextFullXid),
>>>                                                        checkPoint.nextOid)));
>>
>> I don't think we should continue to expose epochs, but rather go to full
>> xids where appropriate.
>
> OK, done.

Ugh.  When I changed pg_resetwal.c to print out the 64 bit value, I
broke controldata.c's code that reads those strings back in, as
revealed by a failing pg_upgrade test in check-world that I should
have waited for.  Perhaps we shouldn't change that output format...
though I think it does make sense to move forwards here (and probably
eventually for the other values too).  So here is a version that fixes
the parsing problem.  Unfortunately pg_upgrade is not allowed to use
pg_strtouint64() so for now I jammed the same macrology into
controldata.c, but I assume that could be sorted out.

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

0001-Track-the-next-xid-using-64-bits-v4.patch

Re: Usage of epoch in txid_current

От

Tom Lane

Дата:

16 июля 2018 г., 02:41:35

Andres Freund <andres@anarazel.de> writes:
> On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
>> Or, perhaps, use a struct in assert builds and int64 otherwise?
>> You could hide the ensuing notational differences in macros.

> That should be doable. But I'd like to check if it's necessary
> first. Optimizing passing an 8 byte struct shouldn't be hard for
> compilers these days - especially when most things dealing with them are
> inline functions.  If we find that it's not a problem on contemporary
> compilers, it might be worthwhile to use a bit more type safety in other
> places too.

I checked your example program on hardware I have laying around:

x86_64, gcc 4.4.7 (RHEL6): identical code, confirms your result

x86_64, LLVM 9.1.0 (macOS High Sierra): also identical

x86, gcc 4.2.1 (old macOS --- dromedary's host): also identical code;
this surprised me a bit.  It looks like the ABI convention is that
64-bit values must be passed on the stack but can be returned in a
register pair, and it doesn't matter whether scalar or struct.

PPC, gcc 4.0.1 (ancient macOS --- prairiedog's host): *not* identical.
It looks like 64-bit arguments are passed in registers either way, but
a struct function result is returned in memory not a register.

ARM64, gcc 8.1.1 (Fedora 28): identical code

ARM64, clang 6.0.0 (FreeBSD 12): identical code

ARMv7, gcc 6.3.0 (Raspbian): *not* identical.
Looks like both pass and return conventions are memory-based for structs.

Offhand it would seem that we can get away with struct wrappers
on any platform where performance is really of concern today.

            regards, tom lane

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

16 июля 2018 г., 19:03:39

Hi,

On 2018-07-15 16:41:35 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
> >> Or, perhaps, use a struct in assert builds and int64 otherwise?
> >> You could hide the ensuing notational differences in macros.
> 
> > That should be doable. But I'd like to check if it's necessary
> > first. Optimizing passing an 8 byte struct shouldn't be hard for
> > compilers these days - especially when most things dealing with them are
> > inline functions.  If we find that it's not a problem on contemporary
> > compilers, it might be worthwhile to use a bit more type safety in other
> > places too.
> 
> [ bunch of test results ]
> Offhand it would seem that we can get away with struct wrappers
> on any platform where performance is really of concern today.

Cool, thanks for checking!

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Tom Lane

Дата:

16 июля 2018 г., 19:55:48

Andres Freund <andres@anarazel.de> writes:
> On 2018-07-15 16:41:35 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
>>>> Or, perhaps, use a struct in assert builds and int64 otherwise?
>>>> You could hide the ensuing notational differences in macros.

>> [ bunch of test results ]
>> Offhand it would seem that we can get away with struct wrappers
>> on any platform where performance is really of concern today.

> Cool, thanks for checking!

BTW, independently of any performance questions, these results show
that my idea above was untenable anyway.  On those platforms where
there is a codegen difference, doing it like that would have resulted
in an ABI difference between regular and assert builds.  That's something
to avoid, at least in any API that's visible to extension modules ---
we've had project policy for some time that it should be possible to
use non-assert extensions with assert-enabled core and vice versa.

Conceivably we could have used the struct API only under a special
devel flag that few people use except a buildfarm animal or two.
But that's just a pain in the rear.

            regards, tom lane

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

24 июля 2018 г., 11:24:32

On Tue, Jul 17, 2018 at 1:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-07-15 16:41:35 -0400, Tom Lane wrote:
>>> Andres Freund <andres@anarazel.de> writes:
>>>> On 2018-07-09 19:56:25 -0400, Tom Lane wrote:
>>>>> Or, perhaps, use a struct in assert builds and int64 otherwise?
>>>>> You could hide the ensuing notational differences in macros.
>
>>> [ bunch of test results ]
>>> Offhand it would seem that we can get away with struct wrappers
>>> on any platform where performance is really of concern today.
>
>> Cool, thanks for checking!

+1

Here's a new version.  I did some cosmetic clean-up, and I dropped the
changes to pg_controldata output, replication epoch/xid processing
code and various similar non-essential changes.  For this patch I want
just the new type + next xid generator + appropriate conversions.

I propose that we get this committed early in the cycle.  That'd
maximise testing time in the tree, fix the bug identified by Amit, and
leave plenty of time for later patches to use FullTransactionId in
more places as appropriate.

Does anyone have specific kinds of validation or testing they'd like to see?

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

0001-Track-the-next-xid-using-64-bits-v5.patch

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

19 сентября 2018 г., 07:58:36

On Tue, Jul 24, 2018 at 5:24 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a new version.

Bitrot, rebased.

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

0001-Track-the-next-xid-using-64-bits-v6.patch

Re: Usage of epoch in txid_current

От

Dmitry Dolgov

Дата:

29 ноября 2018 г., 16:13:30

> On Tue, Jul 24, 2018 at 7:25 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> Here's a new version.  I did some cosmetic clean-up, and I dropped the
> changes to pg_controldata output, replication epoch/xid processing
> code and various similar non-essential changes.  For this patch I want
> just the new type + next xid generator + appropriate conversions.
>
> I propose that we get this committed early in the cycle.  That'd
> maximise testing time in the tree, fix the bug identified by Amit, and
> leave plenty of time for later patches to use FullTransactionId in
> more places as appropriate.

Then probably it's the good time to do so. Any opinions or more reviews here?
I'll move it to the next CF for now.

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

03 февраля 2019 г., 14:58:02

On Fri, Nov 30, 2018 at 12:12 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Tue, Jul 24, 2018 at 7:25 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> > Here's a new version.  I did some cosmetic clean-up, and I dropped the
> > changes to pg_controldata output, replication epoch/xid processing
> > code and various similar non-essential changes.  For this patch I want
> > just the new type + next xid generator + appropriate conversions.
> >
> > I propose that we get this committed early in the cycle.  That'd
> > maximise testing time in the tree, fix the bug identified by Amit, and
> > leave plenty of time for later patches to use FullTransactionId in
> > more places as appropriate.
>
> Then probably it's the good time to do so. Any opinions or more reviews here?
> I'll move it to the next CF for now.

If there are no objections, I'm planning to do a round of testing and
commit this shortly.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Usage of epoch in txid_current

От

Michael Paquier

Дата:

04 февраля 2019 г., 08:43:44

On Sun, Feb 03, 2019 at 10:58:02PM +1100, Thomas Munro wrote:
> If there are no objections, I'm planning to do a round of testing and
> commit this shortly.

Hm.  That looks sane to me at quick glance.  I am a bit on the edge
regaring the naming "FullTransactionId", which is actually a 64-bit
value with a 32-bit XID and a 32-bit epoch.  Something like
TransactionIdWithEpoch or EpochTransactionId sounds a bit better to
me.  My point is that "Full" is too generic for that.

Moved to next CF for now.
--
Michael

Вложения

signature.asc

Re: Usage of epoch in txid_current

От

Tom Lane

Дата:

04 февраля 2019 г., 08:48:24

Michael Paquier <michael@paquier.xyz> writes:
> Hm.  That looks sane to me at quick glance.  I am a bit on the edge
> regaring the naming "FullTransactionId", which is actually a 64-bit
> value with a 32-bit XID and a 32-bit epoch.  Something like
> TransactionIdWithEpoch or EpochTransactionId sounds a bit better to
> me.  My point is that "Full" is too generic for that.

WideTransactionId, maybe?  I agree that "Full" seems like a poor
adjective here.

            regards, tom lane

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

04 февраля 2019 г., 09:21:50

On February 4, 2019 6:43:44 AM GMT+01:00, Michael Paquier <michael@paquier.xyz> wrote:
>On Sun, Feb 03, 2019 at 10:58:02PM +1100, Thomas Munro wrote:
>> If there are no objections, I'm planning to do a round of testing and
>> commit this shortly.
>
>Hm.  That looks sane to me at quick glance.  I am a bit on the edge
>regaring the naming "FullTransactionId", which is actually a 64-bit
>value with a 32-bit XID and a 32-bit epoch.  Something like
>TransactionIdWithEpoch or EpochTransactionId sounds a bit better to
>me.  My point is that "Full" is too generic for that.

I'm not a fan of names with epoch in it - these are the real transaction IDs now. Conflating them with the until-now
inferredepochs sounds like a bad idea to me. We IMO should just treat the new type as a 64bit uint, and the 32bit as a
truncatedversion. Like, we could just add 64 as a prefix. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

04 февраля 2019 г., 10:41:26

Hi,

On 2018-09-19 13:58:36 +1200, Thomas Munro wrote:

> +/*
> + * Advance nextFullXid to the value after a given xid.  The epoch is inferred.
> + * If lock_free_check is true, then the caller must be sure that it's safe to
> + * read nextFullXid without holding XidGenLock (ie during recovery).
> + */
> +void
> +AdvanceNextFullTransactionIdPastXid(TransactionId xid, bool lock_free_check)
> +{
> +    TransactionId current_xid;
> +    uint32 epoch;
> +
> +    if (lock_free_check &&
> +        !TransactionIdFollowsOrEquals(xid,
> +                                      XidFromFullTransactionId(ShmemVariableCache->nextFullXid)))
> +        return;
> +
> +    LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
> +    current_xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
> +    if (TransactionIdFollowsOrEquals(xid, current_xid))
> +    {
> +        epoch = EpochFromFullTransactionId(ShmemVariableCache->nextFullXid);
> +        if (xid < current_xid)
> +            ++epoch; /* epoch wrapped */
> +        ShmemVariableCache->nextFullXid = MakeFullTransactionId(epoch, xid);
> +        FullTransactionIdAdvance(&ShmemVariableCache->nextFullXid);
> +    }
> +    LWLockRelease(XidGenLock);
>  }

Is this really a good idea? Seems like it's going to perpetuate the
problematic epoch logic we had before?

> --- a/src/bin/pg_resetwal/pg_resetwal.c
> +++ b/src/bin/pg_resetwal/pg_resetwal.c
> @@ -431,11 +431,15 @@ main(int argc, char *argv[])
>       * if any, includes these values.)
>       */
>      if (set_xid_epoch != -1)
> -        ControlFile.checkPointCopy.nextXidEpoch = set_xid_epoch;
> +        ControlFile.checkPointCopy.nextFullXid =
> +            MakeFullTransactionId(set_xid_epoch,
> +                                  XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid));
>  
>      if (set_xid != 0)
>      {
> -        ControlFile.checkPointCopy.nextXid = set_xid;
> +        ControlFile.checkPointCopy.nextFullXid =
> +            MakeFullTransactionId(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid),
> +                                  set_xid);
>  

>          /*
>           * For the moment, just set oldestXid to a value that will force
> @@ -705,8 +709,7 @@ GuessControlValues(void)
>      ControlFile.checkPointCopy.ThisTimeLineID = 1;
>      ControlFile.checkPointCopy.PrevTimeLineID = 1;
>      ControlFile.checkPointCopy.fullPageWrites = false;
> -    ControlFile.checkPointCopy.nextXidEpoch = 0;
> -    ControlFile.checkPointCopy.nextXid = FirstNormalTransactionId;
> +    ControlFile.checkPointCopy.nextFullXid = MakeFullTransactionId(0, FirstNormalTransactionId);
>      ControlFile.checkPointCopy.nextOid = FirstBootstrapObjectId;
>      ControlFile.checkPointCopy.nextMulti = FirstMultiXactId;
>      ControlFile.checkPointCopy.nextMultiOffset = 0;
> @@ -786,8 +789,8 @@ PrintControlValues(bool guessed)
>      printf(_("Latest checkpoint's full_page_writes: %s\n"),
>             ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
>      printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
> -           ControlFile.checkPointCopy.nextXidEpoch,
> -           ControlFile.checkPointCopy.nextXid);
> +           EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid),
> +           XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid));
>      printf(_("Latest checkpoint's NextOID:          %u\n"),
>             ControlFile.checkPointCopy.nextOid);
>      printf(_("Latest checkpoint's NextMultiXactId:  %u\n"),
> @@ -879,7 +882,7 @@ PrintNewControlValues(void)
>      if (set_xid != 0)
>      {
>          printf(_("NextXID:                              %u\n"),
> -               ControlFile.checkPointCopy.nextXid);
> +               XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid));
>          printf(_("OldestXID:                            %u\n"),
>                 ControlFile.checkPointCopy.oldestXid);
>          printf(_("OldestXID's DB:                       %u\n"),
> @@ -889,7 +892,7 @@ PrintNewControlValues(void)
>      if (set_xid_epoch != -1)
>      {
>          printf(_("NextXID epoch:                        %u\n"),
> -               ControlFile.checkPointCopy.nextXidEpoch);
> +               EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid));
>      }

I still think it's a mistake to not display the full xid here, and
rather perpetuate the epoch stuff. I'm ok with splitting that into a
separate commit, but this ought to be fixed in the same release imo.


> diff --git a/src/include/access/transam.h b/src/include/access/transam.h
> index 83ec3f19797..814becf96d7 100644
> --- a/src/include/access/transam.h
> +++ b/src/include/access/transam.h
> @@ -44,6 +44,32 @@
>  #define TransactionIdStore(xid, dest)    (*(dest) = (xid))
>  #define StoreInvalidTransactionId(dest) (*(dest) = InvalidTransactionId)
>  
> +#define EpochFromFullTransactionId(x)    ((uint32) ((x).value >> 32))
> +#define XidFromFullTransactionId(x)        ((uint32) (x).value)
> +#define U64FromFullTransactionId(x)        ((x).value)
> +#define FullTransactionIdPrecedes(a, b)    ((a).value < (b).value)
> +#define FullTransactionIdPrecedesOrEquals(a, b)    ((a).value <= (b).value)
> +
> +/*
> + * A 64 bit value that contains an epoch and a TransactionId.  This is
> + * wrapped in a struct to prevent implicit conversion to/from TransactionId.
> + * Allowing such conversions seems likely to be error-prone.
> + */
> +typedef struct FullTransactionId
> +{
> +    uint64  value;
> +} FullTransactionId;

Probably good to note here that not all values are valid normal xids.


> +static inline FullTransactionId
> +MakeFullTransactionId(uint32 epoch, TransactionId xid)
> +{
> +    FullTransactionId result;
> +
> +    result.value = ((uint64) epoch) << 32 | xid;
> +
> +    return result;
> +}

Make sounds a bit like it's allocating...


>  /* advance a transaction ID variable, handling wraparound correctly */
>  #define TransactionIdAdvance(dest)    \
>      do { \
> @@ -52,6 +78,16 @@
>              (dest) = FirstNormalTransactionId; \
>      } while(0)
>  
> +/* advance a FullTransactionId variable, stepping over special XIDs */
> +static inline void
> +FullTransactionIdAdvance(FullTransactionId *dest)
> +{
> +    dest->value++;
> +    if (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
> +        *dest = MakeFullTransactionId(EpochFromFullTransactionId(*dest),
> +                                      FirstNormalTransactionId);
> +}

Hm, this seems pretty odd coding to me. Why not do something like

dest->value++;
while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
    dest->value++;

That'd a) be a bit more readable, b) possible to do in a lockfree way at
a later point.


> diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
> index 1fcd8cf1b59..d1116454095 100644
> --- a/src/include/storage/standby.h
> +++ b/src/include/storage/standby.h
> @@ -72,7 +72,7 @@ typedef struct RunningTransactionsData
>      int            xcnt;            /* # of xact ids in xids[] */
>      int            subxcnt;        /* # of subxact ids in xids[] */
>      bool        subxid_overflow;    /* snapshot overflowed, subxids missing */
> -    TransactionId nextXid;        /* copy of ShmemVariableCache->nextXid */
> +    TransactionId nextXid;    /* xid from ShmemVariableCache->nextFullXid */

Probably worthwhile to pgindent partially.

Greetings,

Andres Freund

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

25 марта 2019 г., 07:01:07

On Mon, Feb 4, 2019 at 8:41 PM Andres Freund <andres@anarazel.de> wrote:
> On 2018-09-19 13:58:36 +1200, Thomas Munro wrote:
>
> > +/*
> > + * Advance nextFullXid to the value after a given xid.  The epoch is inferred.
> > + * If lock_free_check is true, then the caller must be sure that it's safe to
> > + * read nextFullXid without holding XidGenLock (ie during recovery).
> > + */
> > +void
> > +AdvanceNextFullTransactionIdPastXid(TransactionId xid, bool lock_free_check)
> > +{
> > +     TransactionId current_xid;
> > +     uint32 epoch;
> > +
> > +     if (lock_free_check &&
> > +             !TransactionIdFollowsOrEquals(xid,
> > +
XidFromFullTransactionId(ShmemVariableCache->nextFullXid)))
> > +             return;
> > +
> > +     LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
> > +     current_xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
> > +     if (TransactionIdFollowsOrEquals(xid, current_xid))
> > +     {
> > +             epoch = EpochFromFullTransactionId(ShmemVariableCache->nextFullXid);
> > +             if (xid < current_xid)
> > +                     ++epoch; /* epoch wrapped */
> > +             ShmemVariableCache->nextFullXid = MakeFullTransactionId(epoch, xid);
> > +             FullTransactionIdAdvance(&ShmemVariableCache->nextFullXid);
> > +     }
> > +     LWLockRelease(XidGenLock);
> >  }
>
> Is this really a good idea? Seems like it's going to perpetuate the
> problematic epoch logic we had before?

We could get rid of this in future, if certain WAL records and 2PC
state start carrying full xids.  That would be much bigger surgery
than I wanted to do in this patch.  This is the only place that
converts 32 bit -> 64 bit with an inferred epoch component.  I have
explained why I think that's correct in new comments along with a new
assertion.

The theory is that the xids encountered in recovery and 2PC startup
can never be too far out of phase with the current nextFullXid.  In
contrast, the original epoch tracking from commit 35af5422 wasn't
bounded in the same sort of way.  Certainly no other code should be
allowed to do this sort of thing without very careful consideration of
how the epoch is bounded.  The patch deliberately provides no general
purpose make-me-a-FullTransactionId-by-guessing-the-epoch facility.

While reviewing this, I also realised that the "lock_free_check"
function argument was unnecessary.  The only place that was setting
that to false, namely RecordKnownAssignedTransactionIds(), might as
well just use the same behaviour.  I've now rewritten
AdvanceNextFullTransactionIdPastXid() completely in light of all that;
please review.

> >       printf(_("Latest checkpoint's full_page_writes: %s\n"),
> >                  ControlFile.checkPointCopy.fullPageWrites ? _("on") : _("off"));
> >       printf(_("Latest checkpoint's NextXID:          %u:%u\n"),
> > -                ControlFile.checkPointCopy.nextXidEpoch,
> > -                ControlFile.checkPointCopy.nextXid);
> > +                EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid),
> > +                XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid));

> I still think it's a mistake to not display the full xid here, and
> rather perpetuate the epoch stuff. I'm ok with splitting that into a
> separate commit, but this ought to be fixed in the same release imo.

Ok.

> > +/*
> > + * A 64 bit value that contains an epoch and a TransactionId.  This is
> > + * wrapped in a struct to prevent implicit conversion to/from TransactionId.
> > + * Allowing such conversions seems likely to be error-prone.
> > + */
> > +typedef struct FullTransactionId
> > +{
> > +     uint64  value;
> > +} FullTransactionId;
>
> Probably good to note here that not all values are valid normal xids.

Done.

> > +static inline FullTransactionId
> > +MakeFullTransactionId(uint32 epoch, TransactionId xid)
> > +{
> > +     FullTransactionId result;
> > +
> > +     result.value = ((uint64) epoch) << 32 | xid;
> > +
> > +     return result;
> > +}
>
> Make sounds a bit like it's allocating...

Changed to FullTransactionIdFromEpochAndXid().

> > +     dest->value++;
> > +     if (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
> > +             *dest = MakeFullTransactionId(EpochFromFullTransactionId(*dest),

> Hm, this seems pretty odd coding to me. Why not do something like
>
> dest->value++;
> while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
>     dest->value++;
>
> That'd a) be a bit more readable, b) possible to do in a lockfree way at
> a later point.

Done.

> > -     TransactionId nextXid;          /* copy of ShmemVariableCache->nextXid */
> > +     TransactionId nextXid;  /* xid from ShmemVariableCache->nextFullXid */
>
> Probably worthwhile to pgindent partially.

Done.

I also added FullTransactionId to typedefs.list, bumped
PG_CONTROL_VERSION and did some peformance testing of tight loops of
GetNewTransactionId(), which showed no measurable change (~10 million
single-threaded calls per second either way on the machine I tested).

New version attached.  I'd like to commit this for PG12.

-- 
Thomas Munro
https://enterprisedb.com

Вложения

0001-Track-the-next-transaction-ID-using-64-bits-v7.patch

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

25 марта 2019 г., 13:49:42

On Mon, Mar 25, 2019 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> New version attached.  I'd like to commit this for PG12.

Here is a follow-up sketch patch that shows FullTransactionId being
used in the transaction stack, so you can call eg
GetCurrentFullTransactionId().  A table access method could use this
to avoid the need to freeze stuff later (eg zheap).

I suppose it's not strictly necessary, since you could use
GetCurrentTransactionId() and infer the epoch by comparing with
ReadNextFullTransactionId() (now that the epoch counting is reliable,
due to patch 0001 which I'm repeating again here just for cfbot).  But
I suppose we want to get away from that sort of thing.  Thoughts?


--
Thomas Munro
https://enterprisedb.com

Вложения

Re: Usage of epoch in txid_current

От

Heikki Linnakangas

Дата:

25 марта 2019 г., 17:23:49

On 25/03/2019 12:49, Thomas Munro wrote:
> On Mon, Mar 25, 2019 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> New version attached.  I'd like to commit this for PG12.
> 
> Here is a follow-up sketch patch that shows FullTransactionId being
> used in the transaction stack, so you can call eg
> GetCurrentFullTransactionId().  A table access method could use this
> to avoid the need to freeze stuff later (eg zheap).
> 
> I suppose it's not strictly necessary, since you could use
> GetCurrentTransactionId() and infer the epoch by comparing with
> ReadNextFullTransactionId() (now that the epoch counting is reliable,
> due to patch 0001 which I'm repeating again here just for cfbot).  But
> I suppose we want to get away from that sort of thing.  Thoughts?

Looks good.

I started to write a patch to use XID & epoch in dealing with GiST page 
deletions [1], and I really could've used an epoch to go with 
RecentGlobalXmin. I presume that would be pretty straightforward to have 
with this, too.

[1] 
https://www.postgresql.org/message-id/5f7ed675-d1fc-66ef-f316-645080ff9625@iki.fi

- Heikki

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

26 марта 2019 г., 02:58:48

On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Looks good.
>
> I started to write a patch to use XID & epoch in dealing with GiST page
> deletions [1], and I really could've used an epoch to go with
> RecentGlobalXmin. I presume that would be pretty straightforward to have
> with this, too.

Yeah.  A simple way to compute that would be to use FullTransactionId
in PGXACT, so that GetSnapshotData() could find the minimum value and
put that into GlobalRecentFullXmin, and set GlobalRecentXmin with the
truncated version (or perhaps #define it as
(XidFromFullTransactionId(GlobalRecentFullXmin))).  Increasing the
number of cachelines that GetSnapshotData() touches may not be
popular, though.  I think you could probably reclaim that space by
using a more compact representation of vacuumFlags, overflowed,
delayChkpt, nxids (it's funny, the comment says "as tightly as
possible", which clearly isn't the case).  You'd probably also need
atomic 64 bit reads for the FullTransactionIds, which would be ugly on
ancient systems but otherwise no problem.

Instead of all that you might want to just infer the epoch instead.
I'm not sure of the exact logic required for that off-hand.  You'd
probably want nextFullXid as a reference to compute the epoch, but
you'd either need to acquire a lock to read it while you already hold
ProcArrayLock (!), or read it atomically after releasing ProcArrayLock
... but then a Deathstation 9000 might allocate 8 billion more xids
with all the necessary auto-vacuuming to allow that before scheduling
you back onto the CPU.  Admittedly, I haven't though about this very
deeply at all, there may be a simple and correct way to do it.

--
Thomas Munro
https://enterprisedb.com

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

27 марта 2019 г., 14:44:17

On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Looks good.

I did some testing and proof-reading and made a few minor changes:

* I tidied up the code that serialises transaction state.  It was
already hammering round pegs into square holes, and the previous patch
made that even worse, so I added a new struct
SerializedTransactionState to do this properly.

* I open-coded Get{Current,Top}TransactionId[IfAny](), rather than
having them call the "Full" variants, so that nobody could accuse me
of adding an extra function call that might not be inlined.  It's just
a couple of lines anyway.

* I kept the name GetNewTransactionId(), since it's referred to in
many places in comments etc.  Previously I had called it
GetNewFullTransactionId() and had GetNewTransactionId() just call that
and truncate to 32 bits, but there wasn't much point without an
in-tree caller for the narrow version.  If there is any out-of-tree
code calling this, it will now fail to compile thanks to our
non-convertible return type.

These are the patches I'm planning to push tomorrow.

I still need to look into Andres's suggestion about getting rid of
epoch from various user interfaces and showing 64 bit numbers.  I
should probably also find a place in the relevant README to explain
this new scheme.  I will post follow-up patches for those.

-- 
Thomas Munro
https://enterprisedb.com

Вложения

Re: Usage of epoch in txid_current

От

Heikki Linnakangas

Дата:

27 марта 2019 г., 15:48:20

On 27/03/2019 13:44, Thomas Munro wrote:
> * I tidied up the code that serialises transaction state.  It was
> already hammering round pegs into square holes, and the previous patch
> made that even worse, so I added a new struct
> SerializedTransactionState to do this properly.

Ah, good, it used to be confusing.

> I still need to look into Andres's suggestion about getting rid of
> epoch from various user interfaces and showing 64 bit numbers.  I
> should probably also find a place in the relevant README to explain
> this new scheme.  I will post follow-up patches for those.

Once we have the FullTransactionId type and basic macros in place, I'm 
sure we could tidy up a bunch of code by using them. For example, 
TransactionIdInRecentPast() in walsender.c would be simpler, if the 
caller dealt with FullTransactionIds rather than xid+epoch. But it makes 
sense to do that separately.

> +/*
> + * Advance nextFullXid to the value after a given xid.  The epoch is inferred.
> + * This must only be called during recovery or from two-phase start-up code.
> + */
> +void
> +AdvanceNextFullTransactionIdPastXid(TransactionId xid)
> +{
> +    FullTransactionId newNextFullXid;
> +    TransactionId next_xid;
> +    uint32        epoch;
> +
> +    /*
> +     * It is safe to read nextFullXid without a lock, because this is only
> +     * called from the startup process, meaning that no other process can
> +     * modify it.
> +     */
> +    Assert(AmStartupProcess());
> +

This assertion fails on WAL replay in single-user mode:

$ bin/postgres --single -D data postgres
2019-03-27 14:32:35.058 EET [32359] LOG:  database system was 
interrupted; last known up at 2019-03-27 14:32:18 EET
2019-03-27 14:32:35.144 EET [32359] LOG:  database system was not 
properly shut down; automatic recovery in progress
2019-03-27 14:32:35.148 EET [32359] LOG:  redo starts at 0/15BB7B0
TRAP: FailedAssertion("!((MyAuxProcType == StartupProcess))", File: 
"varsup.c", Line: 269)
Aborted

- Heikki

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

28 марта 2019 г., 08:29:47

On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Once we have the FullTransactionId type and basic macros in place, I'm
> sure we could tidy up a bunch of code by using them. For example,
> TransactionIdInRecentPast() in walsender.c would be simpler, if the
> caller dealt with FullTransactionIds rather than xid+epoch. But it makes
> sense to do that separately.

+1

> > +     /*
> > +      * It is safe to read nextFullXid without a lock, because this is only
> > +      * called from the startup process, meaning that no other process can
> > +      * modify it.
> > +      */
> > +     Assert(AmStartupProcess());
> > +
>
> This assertion fails on WAL replay in single-user mode:

Fixed.  (Embarrassingly I had that working in v7 but broke it in v8).

I decided to do some testing on a 32 bit system, and ran into weird
new problem in heap_compute_xid_horizon_for_tuples() which I assumed
to be somehow my fault due to the mention of xid horizons, but I
eventually realised that master was broken on that machine and
followed that up elsewhere.  Phew.

Thanks for the reviews!  Pushed.

-- 
Thomas Munro
https://enterprisedb.com

Re: Usage of epoch in txid_current

От

Jeff Janes

Дата:

04 мая 2019 г., 20:34:06

On Thu, Mar 28, 2019 at 1:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Once we have the FullTransactionId type and basic macros in place, I'm
> sure we could tidy up a bunch of code by using them.

Thanks for the reviews! Pushed.

I think that this might be broken.

We have this change:

@@ -73,7 +75,8 @@ GetNewTransactionId(bool isSubXact)

LWLockAcquire(XidGenLock, LW_EXCLUSIVE);

- xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);

+ full_xid = ShmemVariableCache->nextFullXid;

+ xid = XidFromFullTransactionId(full_xid);

But then later on in an little-used code path around line 164:

/* Re-acquire lock and start over */

LWLockAcquire(XidGenLock, LW_EXCLUSIVE);

xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);

full_xid does not get updated, but then later on full_xid gets returned in lieu of xid.

Is there a reason that this is OK?

Cheers,

Jeff

Re: Usage of epoch in txid_current

От

Jeff Janes

Дата:

04 мая 2019 г., 20:45:18

On Sat, May 4, 2019 at 1:34 PM Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Mar 28, 2019 at 1:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Mar 28, 2019 at 1:48 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Once we have the FullTransactionId type and basic macros in place, I'm
> sure we could tidy up a bunch of code by using them.

Thanks for the reviews! Pushed.

I think that this might be broken.

We have this change:

@@ -73,7 +75,8 @@ GetNewTransactionId(bool isSubXact)

LWLockAcquire(XidGenLock, LW_EXCLUSIVE);

- xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+ full_xid = ShmemVariableCache->nextFullXid;
+ xid = XidFromFullTransactionId(full_xid);

But then later on in an little-used code path around line 164:

/* Re-acquire lock and start over */
LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);

full_xid does not get updated, but then later on full_xid gets returned in lieu of xid.

Is there a reason that this is OK?

Ah, sorry for the noise. I see that this has been fixed already. I wasn't looking at HEAD, or at the other email thread, when I "discovered" this.

Sorry for the noise.

Cheers,

Jeff

Re: Usage of epoch in txid_current

От

EMMA Jade ANDERSON

Дата:

08 июня 2019 г., 13:35:37

Round pegs fit into square holes when the peg is smaller.,..

Hey why did the chicken follow Simon off the ledge?

Just because he told him to?!! Was he really trying to get home, did he find out the hard way things no one else knows but what he had learnt on the other side, just to let you know.... you don't ever really die. Plan to stay. I know enough to now....

The grass is not greener. There are not ten virgins for each man. The women are equals and love to suck Dick just as much as ya mother. Don't laugh just yet,,.. at this 13th generation level we are all pretty much related. And i am the original original, along with my brothers and your God, a princess to a King, stolen long long ago, Now doing a shit tonne better thank you, for asking.

Thanks for the flowers,,,,, I never got. Thanks for the love though... I did feel that.

I'm one month sober.

It was only when I died again did I get all the secrets that only the dead get and forget before living again just to tell you.,, im alive and i have insider knowledge, you'd pay zillions for!

I jumped into that portal and landed right back where you put me. Now not so glitchy,,, though money alludes me, I see all the numbers in a matrix like Neo, only on my computer screen,,, this world is real, and so am I. Flowers smell beautiful, and look precious.... crystals are purposeful and magic... the sky is an ever changing scenescape that words can only ever scrape the surface of in trying to describe it's magnificence, whatever the weather.

I'm solo. In need of some close up cuddling with Jonathan aka Andrew or whatever you chose, you know who you are. Hurry up, cos I miss you like crazy.... in this secured medicated snow globe I'm safe in right now. All I need is you.

Love your guts,

Em XO

Aka: Surety

On Wed, 27 Mar. 2019, 22:45 Thomas Munro, <thomas.munro@gmail.com> wrote:

On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Mar 26, 2019 at 3:23 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Looks good.

I did some testing and proof-reading and made a few minor changes:

* I tidied up the code that serialises transaction state. It was
already hammering round pegs into square holes, and the previous patch
made that even worse, so I added a new struct
SerializedTransactionState to do this properly.

* I open-coded Get{Current,Top}TransactionId[IfAny](), rather than
having them call the "Full" variants, so that nobody could accuse me
of adding an extra function call that might not be inlined. It's just
a couple of lines anyway.

* I kept the name GetNewTransactionId(), since it's referred to in
many places in comments etc. Previously I had called it
GetNewFullTransactionId() and had GetNewTransactionId() just call that
and truncate to 32 bits, but there wasn't much point without an
in-tree caller for the narrow version. If there is any out-of-tree
code calling this, it will now fail to compile thanks to our
non-convertible return type.

These are the patches I'm planning to push tomorrow.

I still need to look into Andres's suggestion about getting rid of
epoch from various user interfaces and showing 64 bit numbers. I
should probably also find a place in the relevant README to explain
this new scheme. I will post follow-up patches for those.

--
Thomas Munro
https://enterprisedb.com

Re: Usage of epoch in txid_current

От

Alexander Korotkov

Дата:

22 июня 2019 г., 02:01:04

On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks for the reviews!  Pushed.

Any ideas we should move towards 64-bit xids in more places?  That has
been discussed several times already.  I think last time it was
discussed in person during FOSDEM PGDay 2018 Developer Meeting [1].
There we've discussed that it probably doesn't worth it to change
32-bit on-disk xids in heap.  It's better to leave existing heap "as
is", but allow other pluggable table access methods to support 64-bit
xids.  Given now we have pluggable table access methods, we may build
a plan on full support of 64-bit xids in core.

In my vision sketchy plan may look like this.

1. Change all non-heap types of xids from TransactionId to
FullTransactionId.  But in shared memory change TransactionId to
pg_atomic_uint64 in order to guarantee atomicity of access, which we
require in multiple places.
2. Also introduce FullMultixactId, and apply to MultixactId the
similar change as #1.
3. Change SLRU on-disk format to handle 64-bit xids and multixacts.
In particular make SLRU page numbers 64-bit too.  Add SLRU upgrade
procedure to pg_upgrade.
4. Change relfrozenxid and relminmxid to something like rellimitxid
and rellimitmxid.  So, we don't imply there is restriction of 32-bit
xids.  Instead, we let table AM place (or don't place) a limit to
advancing nextXid, nextMultixact.
5. Table AM API would be switched to 64-bit xids/multixacts, while
heap will remain using 32-bit.  So, heap should convert 32-bit xid
value to 64-bit basing on nextXid/nextMultixact.  Assuming we set
rellimitxid and rellimitmxid for relation as oldestxid + 2^32 and
oldestmxid + 2^32, we may safely assume the 32-bit values of
xids/multixacts are in 2^32 range before nextXid/nextMultixact.
Thanks to this even in heap we would be able to operate 2^32
xids/multixacts simultaneously instead of 2^31 we have now.

So, this is how I see the possible plan.  I would be glad to see a feedback.

Unfortunately, I don't have enough time to implement all of this.  But
I think there are many interested parties in finally having 64-bit
xids in core.  Especially assuming we now have pluggable table AMs,
and it would be ridiculous to spear limitation of 32-bit xids to new
table AMs.

Links

1. https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2018_Developer_Meeting

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Usage of epoch in txid_current

От

Robert Haas

Дата:

24 июня 2019 г., 18:19:13

On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Thanks for the reviews!  Pushed.
>
> Any ideas we should move towards 64-bit xids in more places?  That has
> been discussed several times already.  I think last time it was
> discussed in person during FOSDEM PGDay 2018 Developer Meeting [1].
> There we've discussed that it probably doesn't worth it to change
> 32-bit on-disk xids in heap.  It's better to leave existing heap "as
> is", but allow other pluggable table access methods to support 64-bit
> xids.  Given now we have pluggable table access methods, we may build
> a plan on full support of 64-bit xids in core.
>
> In my vision sketchy plan may look like this.
>
> 1. Change all non-heap types of xids from TransactionId to
> FullTransactionId.

I think it's fine to replace TransactionId with FullTransactionId
without stressing about it too much in code that's not that heavily
trafficked. However, I'm not sure we can do that across the board. For
example, converting snapshots to use 64-bit XIDs would mean that in
the worst case a snapshot will use twice as many cache lines, and that
might have performance implications on some workloads.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Usage of epoch in txid_current

От

Andres Freund

Дата:

24 июня 2019 г., 18:26:49

Hi,

On June 24, 2019 8:19:13 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
>On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov
><a.korotkov@postgrespro.ru> wrote:
>> On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro <thomas.munro@gmail.com>
>wrote:
>> > Thanks for the reviews!  Pushed.
>>
>> Any ideas we should move towards 64-bit xids in more places?  That
>has
>> been discussed several times already.  I think last time it was
>> discussed in person during FOSDEM PGDay 2018 Developer Meeting [1].
>> There we've discussed that it probably doesn't worth it to change
>> 32-bit on-disk xids in heap.  It's better to leave existing heap "as
>> is", but allow other pluggable table access methods to support 64-bit
>> xids.  Given now we have pluggable table access methods, we may build
>> a plan on full support of 64-bit xids in core.
>>
>> In my vision sketchy plan may look like this.
>>
>> 1. Change all non-heap types of xids from TransactionId to
>> FullTransactionId.
>
>I think it's fine to replace TransactionId with FullTransactionId
>without stressing about it too much in code that's not that heavily
>trafficked. However, I'm not sure we can do that across the board. For
>example, converting snapshots to use 64-bit XIDs would mean that in
>the worst case a snapshot will use twice as many cache lines, and that
>might have performance implications on some workloads.

We could probably expand the transaction IDs on access or when computing them for most of these, as usually they'll
largely be about currently running transactions. It e.g. seems sensible to keep the procarray at 32 bit xids, expand
xmin/xmaxto 64 when computing snapshots, and leave the snapshot's transaction ID array at 32xids. That ought to be an
negligibleoverhead. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Usage of epoch in txid_current

От

Alvaro Herrera

Дата:

24 июня 2019 г., 20:43:03

On 2019-Jun-22, Alexander Korotkov wrote:

> 2. Also introduce FullMultixactId, and apply to MultixactId the
> similar change as #1.
> 3. Change SLRU on-disk format to handle 64-bit xids and multixacts.
> In particular make SLRU page numbers 64-bit too.  Add SLRU upgrade
> procedure to pg_upgrade.

I think enlarging multixacts to 64 bits is a terrible idea.  I would
rather get rid of multixacts completely; zheap is proposing not to use
multixacts at all, for example.  The amount of bloat caused by
pg_multixact data is already pretty bad ... because of which requiring
pg_multixact to be rewritten by pg_upgrade would cause a severe slowdown
for upgrades.  (It worked for FSM because the volume is tiny, but that's
not the case for multixact.)

I think the pg_upgrade problem can be worked around by creating a new
dir pg_multixact64 (an example) which is populated from the upgrade
point onwards; so you keep the old data unchanged, and new multixacts
use the new location, but the system knows to read the old one when
reading old tuples.  But, as I said above, I would rather not have
multixacts at all.

Another idea: create a new table AM that mimics heapam (I think ß-heap
"eszett-heap" is a good name), except that it reuses zheap's idea of
keeping "transaction locks" separately for tuple locks rather than
multixacts; heapam continues to use 32bits multixact.  Tables can be
migrated from heapam to ß-heap (alter table ... set access method) to
incrementally reduce reliance on multixacts going forward.  No need for
pg_upgrade-time disruption.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Usage of epoch in txid_current

От

Alexander Korotkov

Дата:

28 июня 2019 г., 04:12:12

Hi!

On Mon, Jun 24, 2019 at 6:27 PM Andres Freund <andres@anarazel.de> wrote:
> On June 24, 2019 8:19:13 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
> >On Fri, Jun 21, 2019 at 7:01 PM Alexander Korotkov
> ><a.korotkov@postgrespro.ru> wrote:
> >> On Thu, Mar 28, 2019 at 8:30 AM Thomas Munro <thomas.munro@gmail.com>
> >wrote:
> >> > Thanks for the reviews!  Pushed.
> >>
> >> Any ideas we should move towards 64-bit xids in more places?  That
> >has
> >> been discussed several times already.  I think last time it was
> >> discussed in person during FOSDEM PGDay 2018 Developer Meeting [1].
> >> There we've discussed that it probably doesn't worth it to change
> >> 32-bit on-disk xids in heap.  It's better to leave existing heap "as
> >> is", but allow other pluggable table access methods to support 64-bit
> >> xids.  Given now we have pluggable table access methods, we may build
> >> a plan on full support of 64-bit xids in core.
> >>
> >> In my vision sketchy plan may look like this.
> >>
> >> 1. Change all non-heap types of xids from TransactionId to
> >> FullTransactionId.
> >
> >I think it's fine to replace TransactionId with FullTransactionId
> >without stressing about it too much in code that's not that heavily
> >trafficked. However, I'm not sure we can do that across the board. For
> >example, converting snapshots to use 64-bit XIDs would mean that in
> >the worst case a snapshot will use twice as many cache lines, and that
> >might have performance implications on some workloads.
>
> We could probably expand the transaction IDs on access or when computing them for most of these, as usually they'll
largely be about currently running transactions. It e.g. seems sensible to keep the procarray at 32 bit xids, expand
xmin/xmaxto 64 when computing snapshots, and leave the snapshot's transaction ID array at 32xids. That ought to be an
negligibleoverhead. 

I see, replace TransactionId with FullTransactionId just everywhere
doesn't look like good idea.

Given now we have pluggable table AMs, new table AMs may not have data
wraparound problem.  For instance, table AM could store xids 64-bit
internally, and convert them to 32-bit on-the-fly.  If xid is too old
for conversion, just replace it with FrozenTransactionId.  So, the
restriction we really have now is that running xacts and active
snapshots should fit 2^31 range. Turning Snapshot xmin/xmas into
64-bit will soften this restriction, then just running xacts should
fit 2^31 range while snapshots could be older.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Usage of epoch in txid_current

От

Alexander Korotkov

Дата:

28 июня 2019 г., 05:15:31

On Mon, Jun 24, 2019 at 8:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Jun-22, Alexander Korotkov wrote:
>
> > 2. Also introduce FullMultixactId, and apply to MultixactId the
> > similar change as #1.
> > 3. Change SLRU on-disk format to handle 64-bit xids and multixacts.
> > In particular make SLRU page numbers 64-bit too.  Add SLRU upgrade
> > procedure to pg_upgrade.
>
> I think enlarging multixacts to 64 bits is a terrible idea.  I would
> rather get rid of multixacts completely; zheap is proposing not to use
> multixacts at all, for example.  The amount of bloat caused by
> pg_multixact data is already pretty bad ... because of which requiring
> pg_multixact to be rewritten by pg_upgrade would cause a severe slowdown
> for upgrades.  (It worked for FSM because the volume is tiny, but that's
> not the case for multixact.)
>
> I think the pg_upgrade problem can be worked around by creating a new
> dir pg_multixact64 (an example) which is populated from the upgrade
> point onwards; so you keep the old data unchanged, and new multixacts
> use the new location, but the system knows to read the old one when
> reading old tuples.  But, as I said above, I would rather not have
> multixacts at all.
>
> Another idea: create a new table AM that mimics heapam (I think ß-heap
> "eszett-heap" is a good name), except that it reuses zheap's idea of
> keeping "transaction locks" separately for tuple locks rather than
> multixacts; heapam continues to use 32bits multixact.  Tables can be
> migrated from heapam to ß-heap (alter table ... set access method) to
> incrementally reduce reliance on multixacts going forward.  No need for
> pg_upgrade-time disruption.

We need multixacts to store row-level locks information.  I remember
they weren't crash-safe some time ago; because we basically don't need
lock information about previous server run: all that locks are for
sure released.  Due to some difficulties we finally made them
crash-safe (I didn't follow that in much details).  But what about
discarding mulixacts on pg_upgrade?  Is it feasible goal?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Usage of epoch in txid_current

От

Robert Haas

Дата:

30 июня 2019 г., 00:06:58

On Mon, Jun 24, 2019 at 1:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I think enlarging multixacts to 64 bits is a terrible idea.  I would
> rather get rid of multixacts completely; zheap is proposing not to use
> multixacts at all, for example.

But zedstore, at least as of the Saturday after PGCon, is proposing to
keep using them, after first widening them to 64 bits:

https://www.postgresql.org/message-id/CA+TgmoYeTeQSmALox0PmSm5Gh03oe=UNjhmL+K+btofY_U2jFQ@mail.gmail.com

I think we all have a visceral reaction against MultiXacts at this
point; they've just caused us too much pain, and the SLRU
infrastructure is a bunch of crap.[1] However, the case where N
sessions all do "SELECT * FROM sometab FOR SHARE" is pretty wretched
without multixacts.  You'll just have to keep reducing the tuple
density per page to fit all the locks, whereas the current heap is
totally fine and neither the heap nor the multixact space bloats all
that terribly much.

I currently think it's likely that zheap will use something
multixact-like rather than actually using multixacts per se.  I am
having trouble seeing how we can have some sort of fixed-bit-width ID
number that represents as set of XIDs for purposes of lock-tracking
without creating some nasty degenerate cases that don't exist
today.[2]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] At least in comparison to other parts of the PostgreSQL
infrastructure which are more awesome.
[2] I'd like to credit Andres Freund for helping me understand this
issue better.

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

02 июля 2019 г., 02:33:51

On Sat, Jun 22, 2019 at 11:01 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> 3. Change SLRU on-disk format to handle 64-bit xids and multixacts.
> In particular make SLRU page numbers 64-bit too.  Add SLRU upgrade
> procedure to pg_upgrade.

+1 for doing this for the xid-indexed SLRUs so the segment file names
are never recycled.  Having yet another level of wraparound code to
feed and water and debug is not nice:

https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com

-- 
Thomas Munro
https://enterprisedb.com

Re: Usage of epoch in txid_current

От

Thomas Munro

Дата:

02 июля 2019 г., 02:57:03

On Sun, Jun 30, 2019 at 9:07 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 24, 2019 at 1:43 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > I think enlarging multixacts to 64 bits is a terrible idea.  I would
> > rather get rid of multixacts completely; zheap is proposing not to use
> > multixacts at all, for example.
>
> But zedstore, at least as of the Saturday after PGCon, is proposing to
> keep using them, after first widening them to 64 bits:
>
> https://www.postgresql.org/message-id/CA+TgmoYeTeQSmALox0PmSm5Gh03oe=UNjhmL+K+btofY_U2jFQ@mail.gmail.com
>
> I think we all have a visceral reaction against MultiXacts at this
> point; they've just caused us too much pain, and the SLRU
> infrastructure is a bunch of crap.[1] However, the case where N
> sessions all do "SELECT * FROM sometab FOR SHARE" is pretty wretched
> without multixacts.  You'll just have to keep reducing the tuple
> density per page to fit all the locks, whereas the current heap is
> totally fine and neither the heap nor the multixact space bloats all
> that terribly much.
>
> I currently think it's likely that zheap will use something
> multixact-like rather than actually using multixacts per se.  I am
> having trouble seeing how we can have some sort of fixed-bit-width ID
> number that represents as set of XIDs for purposes of lock-tracking
> without creating some nasty degenerate cases that don't exist
> today.[2]

The new thing described over here is intended to support something a
bit like multixacts:

https://www.postgresql.org/message-id/CA%2BhUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ%40mail.gmail.com

For example, here is some throw-away demo code that puts arbitrary
data into an undo log, in this case a list of xids given with SELECT
test_multixact(ARRAY[1234, 2345, 3456]), and provides a callback to
say when the data can be discarded, in this case when all of those
xids are no longer running.  You can see the record with SELECT * FROM
undoinspect('shared'), until it gets eaten by a background worker.
The reason it has to be an in-core demo is just because we don't have
a way to do extensions that have an RMGR ID and callback functions
yet.  Although this demo throws the return value away, the function
PrepareUndoInsert() returns a 64 bit UndoRecPtr which could be stored
in any page and can be used to retrieve the records (via the buffer
pool) including the binary payload which can be whatever struct you
like (though there is a size limit so you might need more than one
record for a huge list of multi-lockers).  When you try to load the
record, you might be told that it's gone, which means that the lockers
are gone, whcih means that your callback must have said that was OK.
This probably makes most sense for a system that is already planning
to use UndoRecPtr for other things, like MVCC, since it probably
already has per page or per tuple space to store UndoRecPtr, and
updates and explicit locks are so closely related.

https://github.com/EnterpriseDB/zheap/commit/c92fdfd1f1178cbb44557a7c630b366d1539c332

-- 
Thomas Munro
https://enterprisedb.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Usage of epoch in txid_current

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения