Обсуждение: Portability issues in shm_mq

Поиск

Список

Период

Сортировка

Portability issues in shm_mq

От

Tom Lane

Дата:

14 марта 2014 г., 23:43:46

Whilst setting up a buildfarm member on an old, now-spare Mac, I was
somewhat astonished to discover that contrib/test_shm_mq crashes thus:
TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429)
but only in UTF8 locales, not in C locale.  You'd have bet your last
dollar that that code was locale-independent, right?

The reason appears to be that in the payload string generated with
(select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,400))
the chr() argument rounds up to 128 every so often.  In UTF8 encoding,
that causes chr() to return a multibyte character instead of a single
byte.  So, instead of always having a fixed payload string length of
400 bytes, the payload length moves around a bit --- in a few trials
I see anywhere from 400 to 409 bytes.

How is that leading to a crash?  Well, this machine is 32-bit, so MAXALIGN
is only 4.  This means it is possible for an odd-length message cum
message length word to not exactly divide the size of the shared memory
ring buffer, resulting in cases where an 8-byte message length word is
wrapped around the end of the buffer.  shm_mq_receive_bytes makes no
attempt to hide that situation from its caller, and happily returns just
4 bytes with SHM_MQ_SUCCESS.  shm_mq_receive, on the other hand, is so
confident that it will always get an indivisible length word that it just
Asserts that that's the case.

Recommendations:

1. Reduce the random() multiplier from 96 to 95.  In multibyte encodings
other than UTF8, chr() would flat out reject values of 128, so this test
case is unportable.

2. Why in the world is the test case testing exactly one message length
that happens to be a multiple of 8?  Put some randomness into that,
instead.

3. Either you need to work a bit harder at forcing alignment, or you need
to fix shm_mq_receive to cope with split message length words.

4. The header comment for shm_mq_receive_bytes may once have described its
API accurately, but that appears to have been a long long time ago in a
galaxy far far away.  Please fix.


Also, while this is not directly your problem, it's becoming clear that we
don't have enough buildfarm coverage of not-64-bit platforms; this problem
would have been spotted awhile ago if we did.  I'm going to spin up a
couple of critters on old machines lying around my office.  We should
probably also encourage owners of existing critters to expand their test
coverage a bit, eg try locales other than C.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

16 марта 2014 г., 04:24:47

On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Whilst setting up a buildfarm member on an old, now-spare Mac, I was
> somewhat astonished to discover that contrib/test_shm_mq crashes thus:
> TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429)
> but only in UTF8 locales, not in C locale.  You'd have bet your last
> dollar that that code was locale-independent, right?
>
> The reason appears to be that in the payload string generated with
> (select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,400))
> the chr() argument rounds up to 128 every so often.  In UTF8 encoding,
> that causes chr() to return a multibyte character instead of a single
> byte.  So, instead of always having a fixed payload string length of
> 400 bytes, the payload length moves around a bit --- in a few trials
> I see anywhere from 400 to 409 bytes.
>
> How is that leading to a crash?  Well, this machine is 32-bit, so MAXALIGN
> is only 4.  This means it is possible for an odd-length message cum
> message length word to not exactly divide the size of the shared memory
> ring buffer, resulting in cases where an 8-byte message length word is
> wrapped around the end of the buffer.  shm_mq_receive_bytes makes no
> attempt to hide that situation from its caller, and happily returns just
> 4 bytes with SHM_MQ_SUCCESS.  shm_mq_receive, on the other hand, is so
> confident that it will always get an indivisible length word that it just
> Asserts that that's the case.

Argh.  I think I forced the size of the buffer to be MAXALIGN'd, but
what it really needs is to be a multiple of the size of uint64.

> Recommendations:
>
> 1. Reduce the random() multiplier from 96 to 95.  In multibyte encodings
> other than UTF8, chr() would flat out reject values of 128, so this test
> case is unportable.

Agreed.

> 2. Why in the world is the test case testing exactly one message length
> that happens to be a multiple of 8?  Put some randomness into that,
> instead.

Good idea.  I think that started out as a performance test rather than
an integrity test, and I didn't think hard enough when revising it
about what would make a good integrity test.

> 3. Either you need to work a bit harder at forcing alignment, or you need
> to fix shm_mq_receive to cope with split message length words.

The first one is what is intended.  I will look at it.

> 4. The header comment for shm_mq_receive_bytes may once have described its
> API accurately, but that appears to have been a long long time ago in a
> galaxy far far away.  Please fix.

Ugh, looks like I forgot to update that when I introduced the
shm_mq_result return type.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

16 марта 2014 г., 18:26:22

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How is that leading to a crash?  Well, this machine is 32-bit, so MAXALIGN
>> is only 4.  This means it is possible for an odd-length message cum
>> message length word to not exactly divide the size of the shared memory
>> ring buffer, resulting in cases where an 8-byte message length word is
>> wrapped around the end of the buffer.

> Argh.  I think I forced the size of the buffer to be MAXALIGN'd, but
> what it really needs is to be a multiple of the size of uint64.

After sleeping on it, I think what you're proposing here is to double down
on a wrong design decision.  ISTM you should change the message length
words to be size_t (or possibly ssize_t, if you're depending on signed
arithmetic), which would let you keep using MAXALIGN as the alignment
macro.  There is absolutely no benefit, either for performance or code
readability, in forcing 32-bit machines to use 64-bit message length
words.  Indeed, by not using the same alignment macros as everywhere else
and not being able to use %zu for debug printouts, I think the only real
effect you're producing is to make the DSM/MQ stuff more and more randomly
unlike the rest of Postgres.  Please reconsider while it's still not too
late to change those APIs.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

17 марта 2014 г., 04:38:20

On Sun, Mar 16, 2014 at 11:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> How is that leading to a crash?  Well, this machine is 32-bit, so MAXALIGN
>>> is only 4.  This means it is possible for an odd-length message cum
>>> message length word to not exactly divide the size of the shared memory
>>> ring buffer, resulting in cases where an 8-byte message length word is
>>> wrapped around the end of the buffer.
>
>> Argh.  I think I forced the size of the buffer to be MAXALIGN'd, but
>> what it really needs is to be a multiple of the size of uint64.
>
> After sleeping on it, I think what you're proposing here is to double down
> on a wrong design decision.  ISTM you should change the message length
> words to be size_t (or possibly ssize_t, if you're depending on signed
> arithmetic), which would let you keep using MAXALIGN as the alignment
> macro.  There is absolutely no benefit, either for performance or code
> readability, in forcing 32-bit machines to use 64-bit message length
> words.  Indeed, by not using the same alignment macros as everywhere else
> and not being able to use %zu for debug printouts, I think the only real
> effect you're producing is to make the DSM/MQ stuff more and more randomly
> unlike the rest of Postgres.  Please reconsider while it's still not too
> late to change those APIs.

Hmm.  That's not a terrible idea.  I think part of the reason I did it
this way because, although the size of an individual message can be
limited to size_t, the queue maintains a counter of the total number
of bytes ever sent and received, and that has to use 64-bit arithmetic
so it doesn't overflow (much as we do for LSNs).  It seemed simpler to
make all of the lengths uint64 rather than the lengths of individual
messages Size and the total number of bytes sent and received uint64.
However, that could probably be worked out.

But I think there's another possible problem here.  In order for reads
from the buffer not to suffer alignment problems, the chunk size for
reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some
multiple of it).  And in order to avoid a great deal of additional and
unwarranted complexity, the size of the message word also needs to be
MAXIMUM_ALIGNOF (or some multiple of it).  So the message word can
only be of size 4 if MAXIMUM_ALIGNOF is also 4.  IOW, I think your
approach is going to run into trouble on any system where
sizeof(Size)==4 but MAXIMUM_ALIGNOF==8.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

17 марта 2014 г., 05:45:16

Robert Haas <robertmhaas@gmail.com> writes:
> But I think there's another possible problem here.  In order for reads
> from the buffer not to suffer alignment problems, the chunk size for
> reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some
> multiple of it).  And in order to avoid a great deal of additional and
> unwarranted complexity, the size of the message word also needs to be
> MAXIMUM_ALIGNOF (or some multiple of it).  So the message word can
> only be of size 4 if MAXIMUM_ALIGNOF is also 4.  IOW, I think your
> approach is going to run into trouble on any system where
> sizeof(Size)==4 but MAXIMUM_ALIGNOF==8.

Well, it will result in padding space when you maxalign the length word,
but I don't see why it wouldn't work; and it would certainly be no less
efficient than what's there today.

I'll be quite happy to test the results on my old HPPA box, which has
exactly those properties, if you're worried about it.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

17 марта 2014 г., 18:59:16

On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> But I think there's another possible problem here.  In order for reads
>> from the buffer not to suffer alignment problems, the chunk size for
>> reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some
>> multiple of it).  And in order to avoid a great deal of additional and
>> unwarranted complexity, the size of the message word also needs to be
>> MAXIMUM_ALIGNOF (or some multiple of it).  So the message word can
>> only be of size 4 if MAXIMUM_ALIGNOF is also 4.  IOW, I think your
>> approach is going to run into trouble on any system where
>> sizeof(Size)==4 but MAXIMUM_ALIGNOF==8.
>
> Well, it will result in padding space when you maxalign the length word,
> but I don't see why it wouldn't work; and it would certainly be no less
> efficient than what's there today.

Well, the problem is with this:
   /* Write the message length into the buffer. */   if (!mqh->mqh_did_length_word)   {       res =
shm_mq_send_bytes(mqh,sizeof(uint64), &nbytes, nowait,                               &bytes_written);

If I change nbytes to be of type Size, and the second argument to
sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF
!= 0.  I could do something like:

union
{   char pad[MAXIMUM_ALIGNOF];   Size val;
} padded_size;

padded_size.val = nbytes;
res = shm_mq_send_bytes(mqh, sizeof(padded_size), &padded_size,
nowait, &bytes_written);

...but that probably *is* less efficient, and it's certainly a lot
uglier; a similar hack will be needed when extracting the length word,
and assertions elsewhere will need adjustment.  I wonder if it
wouldn't be better to adjust the external API to use Size just for
consistency but, internally to the module, keep using 8 byte sizes
within the buffer.  Really, I think that would boil down to going
through and making sure that we use TYPEALIGN(...,sizeof(uint64))
everywhere instead of MAXALIGN(), which doesn't seem like a big deal.
On the third hand, maybe there are or will be platforms out there
where MAXIMUM_ALIGNOF > 8.  If so, it's probably best to bite the
bullet and allow for padding now, so we don't have to monkey with this
again later.

Sorry for belaboring this point, but I want to make sure I only need
to fix this once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

17 марта 2014 г., 19:03:55

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, it will result in padding space when you maxalign the length word,
>> but I don't see why it wouldn't work; and it would certainly be no less
>> efficient than what's there today.

> Well, the problem is with this:

>     /* Write the message length into the buffer. */
>     if (!mqh->mqh_did_length_word)
>     {
>         res = shm_mq_send_bytes(mqh, sizeof(uint64), &nbytes, nowait,
>                                 &bytes_written);

> If I change nbytes to be of type Size, and the second argument to
> sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF
> != 0.

Well, you need to maxalign the number of bytes physically inserted into
the queue.  Doesn't shm_mq_send_bytes do that?  Where do you do the
maxaligning of the message payload data, when the payload is odd-length?
I would have expected some logic like "copy N bytes but then advance
the pointer by maxalign(N)".
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

17 марта 2014 г., 20:26:31

On Mon, Mar 17, 2014 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, it will result in padding space when you maxalign the length word,
>>> but I don't see why it wouldn't work; and it would certainly be no less
>>> efficient than what's there today.
>
>> Well, the problem is with this:
>
>>     /* Write the message length into the buffer. */
>>     if (!mqh->mqh_did_length_word)
>>     {
>>         res = shm_mq_send_bytes(mqh, sizeof(uint64), &nbytes, nowait,
>>                                 &bytes_written);
>
>> If I change nbytes to be of type Size, and the second argument to
>> sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF
>> != 0.
>
> Well, you need to maxalign the number of bytes physically inserted into
> the queue.  Doesn't shm_mq_send_bytes do that?  Where do you do the
> maxaligning of the message payload data, when the payload is odd-length?
> I would have expected some logic like "copy N bytes but then advance
> the pointer by maxalign(N)".

Oh, yeah.  Duh.  Clearly my brain isn't working today.  Hmm, so maybe
this will be fairly simple... will try it out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

18 марта 2014 г., 05:50:01

On Mon, Mar 17, 2014 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Oh, yeah.  Duh.  Clearly my brain isn't working today.  Hmm, so maybe
> this will be fairly simple... will try it out.

OK, I tried this out.  The major complication that cropped up was
that, if we make the length word always a Size but align the buffer to
MAXIMUM_ALIGNOF, then the length word might get split if sizeof(Size)
> MAXIMUM_ALIGNOF.  That doesn't look too bad, but required changing a
couple of if statements into while loops, and changing around the
structure of a shm_mq_handle a bit.  See attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

shm-mq-size.patch

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 06:09:34

Robert Haas <robertmhaas@gmail.com> writes:
> OK, I tried this out.  The major complication that cropped up was
> that, if we make the length word always a Size but align the buffer to
> MAXIMUM_ALIGNOF, then the length word might get split if sizeof(Size)
> > MAXIMUM_ALIGNOF.

Hmm ... do we support any platforms where that's actually the case?
It's possible I guess but I don't know of any offhand.  The reverse
case is real, but I'm not sure about this one.

> That doesn't look too bad, but required changing a
> couple of if statements into while loops, and changing around the
> structure of a shm_mq_handle a bit.  See attached.

Would it get noticeably simpler or faster if you omitted support for
the sizeof(Size) > MAXIMUM_ALIGNOF case?  It looks like perhaps not,
but if we were paying anything much I'd be tempted to just put in
a static assert to the contrary and see if anyone complains.

BTW, can't we drop the MAXALIGN64 stuff again?
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

18 марта 2014 г., 17:35:23

On Mon, Mar 17, 2014 at 11:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> OK, I tried this out.  The major complication that cropped up was
>> that, if we make the length word always a Size but align the buffer to
>> MAXIMUM_ALIGNOF, then the length word might get split if sizeof(Size)
>> > MAXIMUM_ALIGNOF.
>
> Hmm ... do we support any platforms where that's actually the case?
> It's possible I guess but I don't know of any offhand.  The reverse
> case is real, but I'm not sure about this one.

Man, I don't have a clue what exists.  I certainly wouldn't want to
discourage someone from making a 64-bit machine that only requires
32-bit alignment for 8-bit values, but color me clueless as to whether
such things are really out there.

>> That doesn't look too bad, but required changing a
>> couple of if statements into while loops, and changing around the
>> structure of a shm_mq_handle a bit.  See attached.
>
> Would it get noticeably simpler or faster if you omitted support for
> the sizeof(Size) > MAXIMUM_ALIGNOF case?  It looks like perhaps not,
> but if we were paying anything much I'd be tempted to just put in
> a static assert to the contrary and see if anyone complains.

Not really.  I installed a fast path into the receive code for the
common case where the length word isn't split, which will always be
true on platforms where sizeof(Size) <= MAXIMUM_ALIGNOF and usually
true otherwise.  We could ditch the slow path completely by ignoring
that case, but it's not all that much code.  On the sending side, the
logic is pretty trivial, so I definitely don't feel bad about carrying
that.

The thing I kind of like about this approach is that it makes the code
fully independent of the relationship between MAXIMUM_ALIGNOF and
sizeof(Size).  If the former is smaller, we'll write the length word
in chunks if needed; if the latter is smaller, we'll insert useless
padding bytes.  In the perhaps-common case where they're identical, it
all works as before, except for minor space savings on 32-bit
platforms.  I was a little annoyed by having to write the extra code
and thought about objecting to this whole line of attack yet again,
but I think it's actually likely for the best.  If we start persuading
ourselves that certain cases don't need to work, and rely on that
throughout the backend, and then such machines crop up and we want to
support them, we'll have a deep hole to climb out of.  With this
approach, there might be bugs, of course, but it's a lot easier to fix
a bug that only occurs on a new platform than it is to reconsider the
whole design in light of a new platform.

> BTW, can't we drop the MAXALIGN64 stuff again?

It's still used by xlog.c.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 17:44:15

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 17, 2014 at 11:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Would it get noticeably simpler or faster if you omitted support for
>> the sizeof(Size) > MAXIMUM_ALIGNOF case?  It looks like perhaps not,
>> but if we were paying anything much I'd be tempted to just put in
>> a static assert to the contrary and see if anyone complains.

> Not really.  I installed a fast path into the receive code for the
> common case where the length word isn't split, which will always be
> true on platforms where sizeof(Size) <= MAXIMUM_ALIGNOF and usually
> true otherwise.  We could ditch the slow path completely by ignoring
> that case, but it's not all that much code.  On the sending side, the
> logic is pretty trivial, so I definitely don't feel bad about carrying
> that.

Works for me.

> The thing I kind of like about this approach is that it makes the code
> fully independent of the relationship between MAXIMUM_ALIGNOF and
> sizeof(Size).

Yeah.  If it's not costing us much to support both cases, let's do so.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

18 марта 2014 г., 18:29:46

On Tue, Mar 18, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The thing I kind of like about this approach is that it makes the code
>> fully independent of the relationship between MAXIMUM_ALIGNOF and
>> sizeof(Size).
>
> Yeah.  If it's not costing us much to support both cases, let's do so.

OK, committed as posted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

18 марта 2014 г., 19:04:56

All right.

On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Whilst setting up a buildfarm member on an old, now-spare Mac, I was
> somewhat astonished to discover that contrib/test_shm_mq crashes thus:
> TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429)
> but only in UTF8 locales, not in C locale.  You'd have bet your last
> dollar that that code was locale-independent, right?

First, can you retest this with the latest code?

> Recommendations:
>
> 1. Reduce the random() multiplier from 96 to 95.  In multibyte encodings
> other than UTF8, chr() would flat out reject values of 128, so this test
> case is unportable.
>
> 2. Why in the world is the test case testing exactly one message length
> that happens to be a multiple of 8?  Put some randomness into that,
> instead.
>
> 3. Either you need to work a bit harder at forcing alignment, or you need
> to fix shm_mq_receive to cope with split message length words.
>
> 4. The header comment for shm_mq_receive_bytes may once have described its
> API accurately, but that appears to have been a long long time ago in a
> galaxy far far away.  Please fix.

On these recommendations, I believe that #3 and #4 are now dealt with.That leaves #1 and #2.  #1 is of course easy, but
Ithink we should

do them both together.

If we want to inject some randomness into the test, which parameters
do we want to randomize and over what ranges?  Also, if a buildfarm
critter falls over, how will we know what value triggered the failure?It's tempting to instead add one or more tests
thatwe specifically

choose to have values we think are likely to exercise
platform-specific differences or otherwise find bugs - e.g. just add a
second test where the queue size and message length are both odd.  And
maybe at a test where the queue is smaller than the message size, so
that every message wraps (multiple times?).

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 19:15:41

Robert Haas <robertmhaas@gmail.com> writes:
> First, can you retest this with the latest code?

Yeah, on it now.

> If we want to inject some randomness into the test, which parameters
> do we want to randomize and over what ranges?

I think the message length is the only particularly interesting
parameter.  It'd be nice if the length varied *within* a test, but
that would take rather considerable restructuring, so maybe it's
not worth the trouble.

> Also, if a buildfarm
> critter falls over, how will we know what value triggered the failure?

Maybe we won't, but I think knowing that it does fail on platform X is
likely to be enough to find the problem.

>  It's tempting to instead add one or more tests that we specifically
> choose to have values we think are likely to exercise
> platform-specific differences or otherwise find bugs - e.g. just add a
> second test where the queue size and message length are both odd.

Meh.  I think you're putting a bit too much faith in your ability to
predict the locus of bugs that you think aren't there.

> maybe at a test where the queue is smaller than the message size, so
> that every message wraps (multiple times?).

Does the code support messages larger than the queue size?  If so, yes,
that case clearly oughta be tested.
        regards, tom lane

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 19:21:36

I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> First, can you retest this with the latest code?

> Yeah, on it now.

Early returns not good:

*** /Users/buildfarm/bf-data/HEAD/pgsql.93630/contrib/test_shm_mq/expected/test_shm_mq.out      Tue Mar 18 12:00:18
2014
--- /Users/buildfarm/bf-data/HEAD/pgsql.93630/contrib/test_shm_mq/results/test_shm_mq.out       Tue Mar 18 12:17:04
2014
***************
*** 5,18 **** -- internal sanity tests fail. -- SELECT test_shm_mq(32768, (select
string_agg(chr(32+(random()*96)::int),'') from generate_series(1,400)), 10000, 1);

!  test_shm_mq 
! -------------
!  
! (1 row)
!  SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*96)::int), '') from
generate_series(1,270000)),200, 3);

!  test_shm_mq_pipelined 
! -----------------------
!  
! (1 row)
! 
--- 5,12 ---- -- internal sanity tests fail. -- SELECT test_shm_mq(32768, (select
string_agg(chr(32+(random()*96)::int),'') from generate_series(1,400)), 10000, 1);

! ERROR:  message corrupted
! DETAIL:  The original message was 400 bytes but the final message is 7492059346764176 bytes. SELECT
test_shm_mq_pipelined(16384,(select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,270000)), 200,
3);
! ERROR:  message corrupted
! DETAIL:  The original message was 270000 bytes but the final message is 7492059347033776 bytes.

This is C locale on a 32-bit machine, so you'll likely be seeing the same
complaint in already-online buildfarm members.

Note that size_t is definitely not int64 on this machine, so it looks to
me like your int64-ectomy was incomplete.  Those message lengths should
be impossible no matter what on this hardware.
        regards, tom lane

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 20:16:58

I wrote:
> Early returns not good:

Also, these compiler messages are probably relevant:

ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o test.o test.c
 
test.c: In function 'test_shm_mq':
test.c:89:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default]
In file included from test_shm_mq.h:18:0,                from test.c:19:
../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *'
test.c: In function 'test_shm_mq_pipelined':
test.c:198:4: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default]
In file included from test_shm_mq.h:18:0,                from test.c:19:
../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *'
ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o setup.o setup.c
 
ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o worker.o worker.c
 
worker.c: In function 'copy_messages':
worker.c:193:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default]
In file included from worker.c:25:0:
../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *'

I'm thinking maybe you just forgot to update the contrib module.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

18 марта 2014 г., 20:31:52

On Tue, Mar 18, 2014 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Early returns not good:
>
> Also, these compiler messages are probably relevant:
>
> ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o test.o test.c 
> test.c: In function 'test_shm_mq':
> test.c:89:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default]
> In file included from test_shm_mq.h:18:0,
>                  from test.c:19:
> ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *'
> test.c: In function 'test_shm_mq_pipelined':
> test.c:198:4: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default]
> In file included from test_shm_mq.h:18:0,
>                  from test.c:19:
> ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *'
> ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o setup.o setup.c 
> ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/et  -c -o worker.o worker.c 
> worker.c: In function 'copy_messages':
> worker.c:193:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default]
> In file included from worker.c:25:0:
> ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *'
>
> I'm thinking maybe you just forgot to update the contrib module.

Well, I definitely forgot that.  I'll count myself lucky if that's the
only problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

andres@anarazel.de (Andres Freund)

Дата:

18 марта 2014 г., 20:36:46

On 2014-03-18 13:31:47 -0400, Robert Haas wrote:
> Well, I definitely forgot that.  I'll count myself lucky if that's the
> only problem.

One minor thing missing, patch attached.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

zu.patch

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 21:27:40

andres@anarazel.de (Andres Freund) writes:
> On 2014-03-18 13:31:47 -0400, Robert Haas wrote:
>> Well, I definitely forgot that.  I'll count myself lucky if that's the
>> only problem.

> One minor thing missing, patch attached.

setup_dynamic_shared_memory needed some more hacking too.  Committed.
        regards, tom lane

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 21:58:51

After the last round of changes, I can confirm that my original test with
UTF8 locale works, and my HPPA box is happy too.  We could still stand
to improve the regression test though.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

18 марта 2014 г., 22:10:32

On Tue, Mar 18, 2014 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>  It's tempting to instead add one or more tests that we specifically
>> choose to have values we think are likely to exercise
>> platform-specific differences or otherwise find bugs - e.g. just add a
>> second test where the queue size and message length are both odd.
>
> Meh.  I think you're putting a bit too much faith in your ability to
> predict the locus of bugs that you think aren't there.

Well, I'm open to suggestions.

>> maybe at a test where the queue is smaller than the message size, so
>> that every message wraps (multiple times?).
>
> Does the code support messages larger than the queue size?  If so, yes,
> that case clearly oughta be tested.

Yep.  You should be able to send and receive any message that fits
within MaxAllocSize on even the smallest possible queue. Performance
may suck, but if that's an issue for you then don't use such a blasted
small queue.  The intended use of this is to stream (potentially long)
error messages or (potentially long and numerous) tuples between
cooperating backends without having to preallocate space for all the
data you want to send (which won't be any more feasible in a DSM than
it would be in the main segment).

Actually, you should be able to send or receive arbitrarily large
messages if you change the MemoryContextAlloc call in shm_mq.c to
MemoryContextAllocHuge, but I can't see any compelling reason to do
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Portability issues in shm_mq

От

Tom Lane

Дата:

18 марта 2014 г., 23:14:43

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 18, 2014 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Meh.  I think you're putting a bit too much faith in your ability to
>> predict the locus of bugs that you think aren't there.

> Well, I'm open to suggestions.

As a suggestion: it'd be worth explicitly testing zero-byte and one-byte
messages, those being obvious edge cases.  Then, say, randomly chosen
lengths in the range 100-1000; this would help ferret out odd-length
issues.  And something with message sizes larger than the queue size.
        regards, tom lane

Re: Portability issues in shm_mq

От

Robert Haas

Дата:

20 марта 2014 г., 16:35:44

On Tue, Mar 18, 2014 at 4:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Mar 18, 2014 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Meh.  I think you're putting a bit too much faith in your ability to
>>> predict the locus of bugs that you think aren't there.
>
>> Well, I'm open to suggestions.
>
> As a suggestion: it'd be worth explicitly testing zero-byte and one-byte
> messages, those being obvious edge cases.  Then, say, randomly chosen
> lengths in the range 100-1000; this would help ferret out odd-length
> issues.  And something with message sizes larger than the queue size.

All right, done.  Let's see if that tickles any edge cases we haven't
hit before.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Portability issues in shm_mq

Вложения

Вложения