Обсуждение: Portability issues in shm_mq
Whilst setting up a buildfarm member on an old, now-spare Mac, I was somewhat astonished to discover that contrib/test_shm_mq crashes thus: TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429) but only in UTF8 locales, not in C locale. You'd have bet your last dollar that that code was locale-independent, right? The reason appears to be that in the payload string generated with (select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,400)) the chr() argument rounds up to 128 every so often. In UTF8 encoding, that causes chr() to return a multibyte character instead of a single byte. So, instead of always having a fixed payload string length of 400 bytes, the payload length moves around a bit --- in a few trials I see anywhere from 400 to 409 bytes. How is that leading to a crash? Well, this machine is 32-bit, so MAXALIGN is only 4. This means it is possible for an odd-length message cum message length word to not exactly divide the size of the shared memory ring buffer, resulting in cases where an 8-byte message length word is wrapped around the end of the buffer. shm_mq_receive_bytes makes no attempt to hide that situation from its caller, and happily returns just 4 bytes with SHM_MQ_SUCCESS. shm_mq_receive, on the other hand, is so confident that it will always get an indivisible length word that it just Asserts that that's the case. Recommendations: 1. Reduce the random() multiplier from 96 to 95. In multibyte encodings other than UTF8, chr() would flat out reject values of 128, so this test case is unportable. 2. Why in the world is the test case testing exactly one message length that happens to be a multiple of 8? Put some randomness into that, instead. 3. Either you need to work a bit harder at forcing alignment, or you need to fix shm_mq_receive to cope with split message length words. 4. The header comment for shm_mq_receive_bytes may once have described its API accurately, but that appears to have been a long long time ago in a galaxy far far away. Please fix. Also, while this is not directly your problem, it's becoming clear that we don't have enough buildfarm coverage of not-64-bit platforms; this problem would have been spotted awhile ago if we did. I'm going to spin up a couple of critters on old machines lying around my office. We should probably also encourage owners of existing critters to expand their test coverage a bit, eg try locales other than C. regards, tom lane
On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Whilst setting up a buildfarm member on an old, now-spare Mac, I was > somewhat astonished to discover that contrib/test_shm_mq crashes thus: > TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429) > but only in UTF8 locales, not in C locale. You'd have bet your last > dollar that that code was locale-independent, right? > > The reason appears to be that in the payload string generated with > (select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,400)) > the chr() argument rounds up to 128 every so often. In UTF8 encoding, > that causes chr() to return a multibyte character instead of a single > byte. So, instead of always having a fixed payload string length of > 400 bytes, the payload length moves around a bit --- in a few trials > I see anywhere from 400 to 409 bytes. > > How is that leading to a crash? Well, this machine is 32-bit, so MAXALIGN > is only 4. This means it is possible for an odd-length message cum > message length word to not exactly divide the size of the shared memory > ring buffer, resulting in cases where an 8-byte message length word is > wrapped around the end of the buffer. shm_mq_receive_bytes makes no > attempt to hide that situation from its caller, and happily returns just > 4 bytes with SHM_MQ_SUCCESS. shm_mq_receive, on the other hand, is so > confident that it will always get an indivisible length word that it just > Asserts that that's the case. Argh. I think I forced the size of the buffer to be MAXALIGN'd, but what it really needs is to be a multiple of the size of uint64. > Recommendations: > > 1. Reduce the random() multiplier from 96 to 95. In multibyte encodings > other than UTF8, chr() would flat out reject values of 128, so this test > case is unportable. Agreed. > 2. Why in the world is the test case testing exactly one message length > that happens to be a multiple of 8? Put some randomness into that, > instead. Good idea. I think that started out as a performance test rather than an integrity test, and I didn't think hard enough when revising it about what would make a good integrity test. > 3. Either you need to work a bit harder at forcing alignment, or you need > to fix shm_mq_receive to cope with split message length words. The first one is what is intended. I will look at it. > 4. The header comment for shm_mq_receive_bytes may once have described its > API accurately, but that appears to have been a long long time ago in a > galaxy far far away. Please fix. Ugh, looks like I forgot to update that when I introduced the shm_mq_result return type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> How is that leading to a crash? Well, this machine is 32-bit, so MAXALIGN >> is only 4. This means it is possible for an odd-length message cum >> message length word to not exactly divide the size of the shared memory >> ring buffer, resulting in cases where an 8-byte message length word is >> wrapped around the end of the buffer. > Argh. I think I forced the size of the buffer to be MAXALIGN'd, but > what it really needs is to be a multiple of the size of uint64. After sleeping on it, I think what you're proposing here is to double down on a wrong design decision. ISTM you should change the message length words to be size_t (or possibly ssize_t, if you're depending on signed arithmetic), which would let you keep using MAXALIGN as the alignment macro. There is absolutely no benefit, either for performance or code readability, in forcing 32-bit machines to use 64-bit message length words. Indeed, by not using the same alignment macros as everywhere else and not being able to use %zu for debug printouts, I think the only real effect you're producing is to make the DSM/MQ stuff more and more randomly unlike the rest of Postgres. Please reconsider while it's still not too late to change those APIs. regards, tom lane
On Sun, Mar 16, 2014 at 11:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> How is that leading to a crash? Well, this machine is 32-bit, so MAXALIGN >>> is only 4. This means it is possible for an odd-length message cum >>> message length word to not exactly divide the size of the shared memory >>> ring buffer, resulting in cases where an 8-byte message length word is >>> wrapped around the end of the buffer. > >> Argh. I think I forced the size of the buffer to be MAXALIGN'd, but >> what it really needs is to be a multiple of the size of uint64. > > After sleeping on it, I think what you're proposing here is to double down > on a wrong design decision. ISTM you should change the message length > words to be size_t (or possibly ssize_t, if you're depending on signed > arithmetic), which would let you keep using MAXALIGN as the alignment > macro. There is absolutely no benefit, either for performance or code > readability, in forcing 32-bit machines to use 64-bit message length > words. Indeed, by not using the same alignment macros as everywhere else > and not being able to use %zu for debug printouts, I think the only real > effect you're producing is to make the DSM/MQ stuff more and more randomly > unlike the rest of Postgres. Please reconsider while it's still not too > late to change those APIs. Hmm. That's not a terrible idea. I think part of the reason I did it this way because, although the size of an individual message can be limited to size_t, the queue maintains a counter of the total number of bytes ever sent and received, and that has to use 64-bit arithmetic so it doesn't overflow (much as we do for LSNs). It seemed simpler to make all of the lengths uint64 rather than the lengths of individual messages Size and the total number of bytes sent and received uint64. However, that could probably be worked out. But I think there's another possible problem here. In order for reads from the buffer not to suffer alignment problems, the chunk size for reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some multiple of it). And in order to avoid a great deal of additional and unwarranted complexity, the size of the message word also needs to be MAXIMUM_ALIGNOF (or some multiple of it). So the message word can only be of size 4 if MAXIMUM_ALIGNOF is also 4. IOW, I think your approach is going to run into trouble on any system where sizeof(Size)==4 but MAXIMUM_ALIGNOF==8. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > But I think there's another possible problem here. In order for reads > from the buffer not to suffer alignment problems, the chunk size for > reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some > multiple of it). And in order to avoid a great deal of additional and > unwarranted complexity, the size of the message word also needs to be > MAXIMUM_ALIGNOF (or some multiple of it). So the message word can > only be of size 4 if MAXIMUM_ALIGNOF is also 4. IOW, I think your > approach is going to run into trouble on any system where > sizeof(Size)==4 but MAXIMUM_ALIGNOF==8. Well, it will result in padding space when you maxalign the length word, but I don't see why it wouldn't work; and it would certainly be no less efficient than what's there today. I'll be quite happy to test the results on my old HPPA box, which has exactly those properties, if you're worried about it. regards, tom lane
On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> But I think there's another possible problem here. In order for reads >> from the buffer not to suffer alignment problems, the chunk size for >> reads and writes from the buffer needs to be MAXIMUM_ALIGNOF (or some >> multiple of it). And in order to avoid a great deal of additional and >> unwarranted complexity, the size of the message word also needs to be >> MAXIMUM_ALIGNOF (or some multiple of it). So the message word can >> only be of size 4 if MAXIMUM_ALIGNOF is also 4. IOW, I think your >> approach is going to run into trouble on any system where >> sizeof(Size)==4 but MAXIMUM_ALIGNOF==8. > > Well, it will result in padding space when you maxalign the length word, > but I don't see why it wouldn't work; and it would certainly be no less > efficient than what's there today. Well, the problem is with this: /* Write the message length into the buffer. */ if (!mqh->mqh_did_length_word) { res = shm_mq_send_bytes(mqh,sizeof(uint64), &nbytes, nowait, &bytes_written); If I change nbytes to be of type Size, and the second argument to sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF != 0. I could do something like: union { char pad[MAXIMUM_ALIGNOF]; Size val; } padded_size; padded_size.val = nbytes; res = shm_mq_send_bytes(mqh, sizeof(padded_size), &padded_size, nowait, &bytes_written); ...but that probably *is* less efficient, and it's certainly a lot uglier; a similar hack will be needed when extracting the length word, and assertions elsewhere will need adjustment. I wonder if it wouldn't be better to adjust the external API to use Size just for consistency but, internally to the module, keep using 8 byte sizes within the buffer. Really, I think that would boil down to going through and making sure that we use TYPEALIGN(...,sizeof(uint64)) everywhere instead of MAXALIGN(), which doesn't seem like a big deal. On the third hand, maybe there are or will be platforms out there where MAXIMUM_ALIGNOF > 8. If so, it's probably best to bite the bullet and allow for padding now, so we don't have to monkey with this again later. Sorry for belaboring this point, but I want to make sure I only need to fix this once. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, it will result in padding space when you maxalign the length word, >> but I don't see why it wouldn't work; and it would certainly be no less >> efficient than what's there today. > Well, the problem is with this: > /* Write the message length into the buffer. */ > if (!mqh->mqh_did_length_word) > { > res = shm_mq_send_bytes(mqh, sizeof(uint64), &nbytes, nowait, > &bytes_written); > If I change nbytes to be of type Size, and the second argument to > sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF > != 0. Well, you need to maxalign the number of bytes physically inserted into the queue. Doesn't shm_mq_send_bytes do that? Where do you do the maxaligning of the message payload data, when the payload is odd-length? I would have expected some logic like "copy N bytes but then advance the pointer by maxalign(N)". regards, tom lane
On Mon, Mar 17, 2014 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Mar 16, 2014 at 10:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Well, it will result in padding space when you maxalign the length word, >>> but I don't see why it wouldn't work; and it would certainly be no less >>> efficient than what's there today. > >> Well, the problem is with this: > >> /* Write the message length into the buffer. */ >> if (!mqh->mqh_did_length_word) >> { >> res = shm_mq_send_bytes(mqh, sizeof(uint64), &nbytes, nowait, >> &bytes_written); > >> If I change nbytes to be of type Size, and the second argument to >> sizeof(Size), then it's wrong whenever sizeof(Size) % MAXIMUM_ALIGNOF >> != 0. > > Well, you need to maxalign the number of bytes physically inserted into > the queue. Doesn't shm_mq_send_bytes do that? Where do you do the > maxaligning of the message payload data, when the payload is odd-length? > I would have expected some logic like "copy N bytes but then advance > the pointer by maxalign(N)". Oh, yeah. Duh. Clearly my brain isn't working today. Hmm, so maybe this will be fairly simple... will try it out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Mar 17, 2014 at 1:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Oh, yeah. Duh. Clearly my brain isn't working today. Hmm, so maybe > this will be fairly simple... will try it out. OK, I tried this out. The major complication that cropped up was that, if we make the length word always a Size but align the buffer to MAXIMUM_ALIGNOF, then the length word might get split if sizeof(Size) > MAXIMUM_ALIGNOF. That doesn't look too bad, but required changing a couple of if statements into while loops, and changing around the structure of a shm_mq_handle a bit. See attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
Robert Haas <robertmhaas@gmail.com> writes: > OK, I tried this out. The major complication that cropped up was > that, if we make the length word always a Size but align the buffer to > MAXIMUM_ALIGNOF, then the length word might get split if sizeof(Size) > > MAXIMUM_ALIGNOF. Hmm ... do we support any platforms where that's actually the case? It's possible I guess but I don't know of any offhand. The reverse case is real, but I'm not sure about this one. > That doesn't look too bad, but required changing a > couple of if statements into while loops, and changing around the > structure of a shm_mq_handle a bit. See attached. Would it get noticeably simpler or faster if you omitted support for the sizeof(Size) > MAXIMUM_ALIGNOF case? It looks like perhaps not, but if we were paying anything much I'd be tempted to just put in a static assert to the contrary and see if anyone complains. BTW, can't we drop the MAXALIGN64 stuff again? regards, tom lane
On Mon, Mar 17, 2014 at 11:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> OK, I tried this out. The major complication that cropped up was >> that, if we make the length word always a Size but align the buffer to >> MAXIMUM_ALIGNOF, then the length word might get split if sizeof(Size) >> > MAXIMUM_ALIGNOF. > > Hmm ... do we support any platforms where that's actually the case? > It's possible I guess but I don't know of any offhand. The reverse > case is real, but I'm not sure about this one. Man, I don't have a clue what exists. I certainly wouldn't want to discourage someone from making a 64-bit machine that only requires 32-bit alignment for 8-bit values, but color me clueless as to whether such things are really out there. >> That doesn't look too bad, but required changing a >> couple of if statements into while loops, and changing around the >> structure of a shm_mq_handle a bit. See attached. > > Would it get noticeably simpler or faster if you omitted support for > the sizeof(Size) > MAXIMUM_ALIGNOF case? It looks like perhaps not, > but if we were paying anything much I'd be tempted to just put in > a static assert to the contrary and see if anyone complains. Not really. I installed a fast path into the receive code for the common case where the length word isn't split, which will always be true on platforms where sizeof(Size) <= MAXIMUM_ALIGNOF and usually true otherwise. We could ditch the slow path completely by ignoring that case, but it's not all that much code. On the sending side, the logic is pretty trivial, so I definitely don't feel bad about carrying that. The thing I kind of like about this approach is that it makes the code fully independent of the relationship between MAXIMUM_ALIGNOF and sizeof(Size). If the former is smaller, we'll write the length word in chunks if needed; if the latter is smaller, we'll insert useless padding bytes. In the perhaps-common case where they're identical, it all works as before, except for minor space savings on 32-bit platforms. I was a little annoyed by having to write the extra code and thought about objecting to this whole line of attack yet again, but I think it's actually likely for the best. If we start persuading ourselves that certain cases don't need to work, and rely on that throughout the backend, and then such machines crop up and we want to support them, we'll have a deep hole to climb out of. With this approach, there might be bugs, of course, but it's a lot easier to fix a bug that only occurs on a new platform than it is to reconsider the whole design in light of a new platform. > BTW, can't we drop the MAXALIGN64 stuff again? It's still used by xlog.c. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Mar 17, 2014 at 11:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Would it get noticeably simpler or faster if you omitted support for >> the sizeof(Size) > MAXIMUM_ALIGNOF case? It looks like perhaps not, >> but if we were paying anything much I'd be tempted to just put in >> a static assert to the contrary and see if anyone complains. > Not really. I installed a fast path into the receive code for the > common case where the length word isn't split, which will always be > true on platforms where sizeof(Size) <= MAXIMUM_ALIGNOF and usually > true otherwise. We could ditch the slow path completely by ignoring > that case, but it's not all that much code. On the sending side, the > logic is pretty trivial, so I definitely don't feel bad about carrying > that. Works for me. > The thing I kind of like about this approach is that it makes the code > fully independent of the relationship between MAXIMUM_ALIGNOF and > sizeof(Size). Yeah. If it's not costing us much to support both cases, let's do so. regards, tom lane
On Tue, Mar 18, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The thing I kind of like about this approach is that it makes the code >> fully independent of the relationship between MAXIMUM_ALIGNOF and >> sizeof(Size). > > Yeah. If it's not costing us much to support both cases, let's do so. OK, committed as posted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
All right. On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Whilst setting up a buildfarm member on an old, now-spare Mac, I was > somewhat astonished to discover that contrib/test_shm_mq crashes thus: > TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429) > but only in UTF8 locales, not in C locale. You'd have bet your last > dollar that that code was locale-independent, right? First, can you retest this with the latest code? > Recommendations: > > 1. Reduce the random() multiplier from 96 to 95. In multibyte encodings > other than UTF8, chr() would flat out reject values of 128, so this test > case is unportable. > > 2. Why in the world is the test case testing exactly one message length > that happens to be a multiple of 8? Put some randomness into that, > instead. > > 3. Either you need to work a bit harder at forcing alignment, or you need > to fix shm_mq_receive to cope with split message length words. > > 4. The header comment for shm_mq_receive_bytes may once have described its > API accurately, but that appears to have been a long long time ago in a > galaxy far far away. Please fix. On these recommendations, I believe that #3 and #4 are now dealt with.That leaves #1 and #2. #1 is of course easy, but Ithink we should do them both together. If we want to inject some randomness into the test, which parameters do we want to randomize and over what ranges? Also, if a buildfarm critter falls over, how will we know what value triggered the failure?It's tempting to instead add one or more tests thatwe specifically choose to have values we think are likely to exercise platform-specific differences or otherwise find bugs - e.g. just add a second test where the queue size and message length are both odd. And maybe at a test where the queue is smaller than the message size, so that every message wraps (multiple times?). Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > First, can you retest this with the latest code? Yeah, on it now. > If we want to inject some randomness into the test, which parameters > do we want to randomize and over what ranges? I think the message length is the only particularly interesting parameter. It'd be nice if the length varied *within* a test, but that would take rather considerable restructuring, so maybe it's not worth the trouble. > Also, if a buildfarm > critter falls over, how will we know what value triggered the failure? Maybe we won't, but I think knowing that it does fail on platform X is likely to be enough to find the problem. > It's tempting to instead add one or more tests that we specifically > choose to have values we think are likely to exercise > platform-specific differences or otherwise find bugs - e.g. just add a > second test where the queue size and message length are both odd. Meh. I think you're putting a bit too much faith in your ability to predict the locus of bugs that you think aren't there. > maybe at a test where the queue is smaller than the message size, so > that every message wraps (multiple times?). Does the code support messages larger than the queue size? If so, yes, that case clearly oughta be tested. regards, tom lane
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> First, can you retest this with the latest code? > Yeah, on it now. Early returns not good: *** /Users/buildfarm/bf-data/HEAD/pgsql.93630/contrib/test_shm_mq/expected/test_shm_mq.out Tue Mar 18 12:00:18 2014 --- /Users/buildfarm/bf-data/HEAD/pgsql.93630/contrib/test_shm_mq/results/test_shm_mq.out Tue Mar 18 12:17:04 2014 *************** *** 5,18 **** -- internal sanity tests fail. -- SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int),'') from generate_series(1,400)), 10000, 1); ! test_shm_mq ! ------------- ! ! (1 row) ! SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,270000)),200, 3); ! test_shm_mq_pipelined ! ----------------------- ! ! (1 row) ! --- 5,12 ---- -- internal sanity tests fail. -- SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int),'') from generate_series(1,400)), 10000, 1); ! ERROR: message corrupted ! DETAIL: The original message was 400 bytes but the final message is 7492059346764176 bytes. SELECT test_shm_mq_pipelined(16384,(select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,270000)), 200, 3); ! ERROR: message corrupted ! DETAIL: The original message was 270000 bytes but the final message is 7492059347033776 bytes. This is C locale on a 32-bit machine, so you'll likely be seeing the same complaint in already-online buildfarm members. Note that size_t is definitely not int64 on this machine, so it looks to me like your int64-ectomy was incomplete. Those message lengths should be impossible no matter what on this hardware. regards, tom lane
I wrote: > Early returns not good: Also, these compiler messages are probably relevant: ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o test.o test.c test.c: In function 'test_shm_mq': test.c:89:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default] In file included from test_shm_mq.h:18:0, from test.c:19: ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *' test.c: In function 'test_shm_mq_pipelined': test.c:198:4: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default] In file included from test_shm_mq.h:18:0, from test.c:19: ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *' ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o setup.o setup.c ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o worker.o worker.c worker.c: In function 'copy_messages': worker.c:193:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default] In file included from worker.c:25:0: ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *' I'm thinking maybe you just forgot to update the contrib module. regards, tom lane
On Tue, Mar 18, 2014 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Early returns not good: > > Also, these compiler messages are probably relevant: > > ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o test.o test.c > test.c: In function 'test_shm_mq': > test.c:89:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default] > In file included from test_shm_mq.h:18:0, > from test.c:19: > ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *' > test.c: In function 'test_shm_mq_pipelined': > test.c:198:4: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default] > In file included from test_shm_mq.h:18:0, > from test.c:19: > ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *' > ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o setup.o setup.c > ccache gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include-D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o worker.o worker.c > worker.c: In function 'copy_messages': > worker.c:193:3: warning: passing argument 2 of 'shm_mq_receive' from incompatible pointer type [enabled by default] > In file included from worker.c:25:0: > ../../src/include/storage/shm_mq.h:61:22: note: expected 'Size *' but argument is of type 'uint64 *' > > I'm thinking maybe you just forgot to update the contrib module. Well, I definitely forgot that. I'll count myself lucky if that's the only problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-03-18 13:31:47 -0400, Robert Haas wrote: > Well, I definitely forgot that. I'll count myself lucky if that's the > only problem. One minor thing missing, patch attached. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
andres@anarazel.de (Andres Freund) writes: > On 2014-03-18 13:31:47 -0400, Robert Haas wrote: >> Well, I definitely forgot that. I'll count myself lucky if that's the >> only problem. > One minor thing missing, patch attached. setup_dynamic_shared_memory needed some more hacking too. Committed. regards, tom lane
After the last round of changes, I can confirm that my original test with UTF8 locale works, and my HPPA box is happy too. We could still stand to improve the regression test though. regards, tom lane
On Tue, Mar 18, 2014 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's tempting to instead add one or more tests that we specifically >> choose to have values we think are likely to exercise >> platform-specific differences or otherwise find bugs - e.g. just add a >> second test where the queue size and message length are both odd. > > Meh. I think you're putting a bit too much faith in your ability to > predict the locus of bugs that you think aren't there. Well, I'm open to suggestions. >> maybe at a test where the queue is smaller than the message size, so >> that every message wraps (multiple times?). > > Does the code support messages larger than the queue size? If so, yes, > that case clearly oughta be tested. Yep. You should be able to send and receive any message that fits within MaxAllocSize on even the smallest possible queue. Performance may suck, but if that's an issue for you then don't use such a blasted small queue. The intended use of this is to stream (potentially long) error messages or (potentially long and numerous) tuples between cooperating backends without having to preallocate space for all the data you want to send (which won't be any more feasible in a DSM than it would be in the main segment). Actually, you should be able to send or receive arbitrarily large messages if you change the MemoryContextAlloc call in shm_mq.c to MemoryContextAllocHuge, but I can't see any compelling reason to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Mar 18, 2014 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Meh. I think you're putting a bit too much faith in your ability to >> predict the locus of bugs that you think aren't there. > Well, I'm open to suggestions. As a suggestion: it'd be worth explicitly testing zero-byte and one-byte messages, those being obvious edge cases. Then, say, randomly chosen lengths in the range 100-1000; this would help ferret out odd-length issues. And something with message sizes larger than the queue size. regards, tom lane
On Tue, Mar 18, 2014 at 4:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Mar 18, 2014 at 12:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Meh. I think you're putting a bit too much faith in your ability to >>> predict the locus of bugs that you think aren't there. > >> Well, I'm open to suggestions. > > As a suggestion: it'd be worth explicitly testing zero-byte and one-byte > messages, those being obvious edge cases. Then, say, randomly chosen > lengths in the range 100-1000; this would help ferret out odd-length > issues. And something with message sizes larger than the queue size. All right, done. Let's see if that tickles any edge cases we haven't hit before. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company