Discussion: can we optimize STACK_DEPTH_SLOP
Poking at NetBSD kernel source, it looks like the default ulimit -s depends on the architecture and ranges from 512kB to 16MB. Postgres insists on max_stack_depth being STACK_DEPTH_SLOP -- i.e. 512kB -- less than the ulimit setting, making it impossible to start up on architectures with a default of 512kB without raising the ulimit. If we could just lower it to 384kB then Postgres would start up, but I wonder if we should just use MIN(stack_rlimit/2, STACK_DEPTH_SLOP) so that there's always a setting of max_stack_depth that would allow Postgres to start.

./arch/sun2/include/vmparam.h:73:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/arm/include/arm32/vmparam.h:66:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sun3/include/vmparam3.h:109:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/sun3/include/vmparam3x.h:58:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/luna68k/include/vmparam.h:70:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/hppa/include/vmparam.h:62:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/hp300/include/vmparam.h:82:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/alpha/include/vmparam.h:79:#define DFLSSIZ (1<<21) /* initial stack size (2M) */
./arch/acorn26/include/vmparam.h:55:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/amd64/include/vmparam.h:83:#define DFLSSIZ (4*1024*1024) /* initial stack size limit */
./arch/amd64/include/vmparam.h:101:#define DFLSSIZ32 (2*1024*1024) /* initial stack size limit */
./arch/ia64/include/vmparam.h:57:#define DFLSSIZ (1<<21) /* initial stack size (2M) */
./arch/mvme68k/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/i386/include/vmparam.h:74:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/amiga/include/vmparam.h:82:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc/include/vmparam.h:94:#define DFLSSIZ (8*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:95:#define DFLSSIZ (4*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:114:#define DFLSSIZ (16*1024*1024) /* initial stack size limit */
./arch/mips/include/vmparam.h:134:#define DFLSSIZ32 DFLTSIZ /* initial stack size limit */
./arch/sh3/include/vmparam.h:69:#define DFLSSIZ (2 * 1024 * 1024)
./arch/mac68k/include/vmparam.h:115:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/next68k/include/vmparam.h:89:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/news68k/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/x68k/include/vmparam.h:74:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/cesfic/include/vmparam.h:82:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/usermode/include/vmparam.h:69:#define DFLSSIZ (2 * 1024 * 1024)
./arch/usermode/include/vmparam.h:78:#define DFLSSIZ (4 * 1024 * 1024)
./arch/powerpc/include/oea/vmparam.h:74:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/powerpc/include/ibm4xx/vmparam.h:60:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/powerpc/include/booke/vmparam.h:75:#define DFLSSIZ (2*1024*1024) /* default stack size */
./arch/vax/include/vmparam.h:74:#define DFLSSIZ (512*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:100:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:125:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */
./arch/sparc64/include/vmparam.h:145:#define DFLSSIZ32 (2*1024*1024) /* initial stack size limit */
./arch/atari/include/vmparam.h:81:#define DFLSSIZ (2*1024*1024) /* initial stack size limit */

-- greg
Greg Stark <stark@mit.edu> writes:
> Poking at NetBSD kernel source it looks like the default ulimit -s
> depends on the architecture and ranges from 512k to 16M. Postgres
> insists on max_stack_depth being STACK_DEPTH_SLOP -- ie 512kB -- less
> than the ulimit setting making it impossible to start up on
> architectures with a default of 512kB without raising the ulimit.

> If we could just lower it to 384kB then Postgres would start up but I
> wonder if we should just use MIN(stack_rlimit/2, STACK_DEPTH_SLOP)
> so that there's always a setting of max_stack_depth that would allow
> Postgres to start.

I'm pretty nervous about reducing that materially without any investigation into how much of the slop we actually use. Our assumption so far has generally been that only recursive routines need to have any stack depth check; but there are plenty of very deep non-recursive call paths. I do not think we're doing people any favors by letting them skip fooling with "ulimit -s" if the result is that their database crashes under stress. For that matter, even if we were sure we'd produce a "stack too deep" error rather than crashing, that's still not very nice if it happens on run-of-the-mill queries.

			regards, tom lane
On Tue, Jul 5, 2016 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> Poking at NetBSD kernel source it looks like the default ulimit -s
>> depends on the architecture and ranges from 512k to 16M. Postgres
>> insists on max_stack_depth being STACK_DEPTH_SLOP -- ie 512kB -- less
>> than the ulimit setting making it impossible to start up on
>> architectures with a default of 512kB without raising the ulimit.
>
>> If we could just lower it to 384kB then Postgres would start up but I
>> wonder if we should just use MIN(stack_rlimit/2, STACK_DEPTH_SLOP)
>> so that there's always a setting of max_stack_depth that would allow
>> Postgres to start.
>
> I'm pretty nervous about reducing that materially without any
> investigation into how much of the slop we actually use. Our assumption
> so far has generally been that only recursive routines need to have any
> stack depth check; but there are plenty of very deep non-recursive call
> paths. I do not think we're doing people any favors by letting them skip
> fooling with "ulimit -s" if the result is that their database crashes
> under stress. For that matter, even if we were sure we'd produce a
> "stack too deep" error rather than crashing, that's still not very nice
> if it happens on run-of-the-mill queries.

To me it seems like using anything based on stack_rlimit/2 is pretty risky for the reason that you state, but I also think that not being able to start the database at all on some platforms with small stacks is bad. If I had to guess, I'd bet that most functions in the backend use a few hundred bytes of stack space or less, so that even 100kB of stack space is enough for hundreds of stack frames. If we're putting that kind of depth on the stack without ever checking the stack depth, we deserve what we get.
That having been said, it wouldn't surprise me to find that we have functions here and there which put objects that are many kB in size on the stack, making it much easier to overrun the available stack space in only a few frames. It would be nice if there were a tool that you could run over your binaries and have it dump out the names of all functions that create large stack frames, but I don't know of one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jul 5, 2016 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm pretty nervous about reducing that materially without any
>> investigation into how much of the slop we actually use.

> To me it seems like using anything based on stack_rlimit/2 is pretty
> risky for the reason that you state, but I also think that not being
> able to start the database at all on some platforms with small stacks
> is bad.

My point was that this is something we should investigate, not just guess about. I did some experimentation using the attached quick-kluge patch, which (1) causes each exiting server process to report its actual ending stack size, and (2) hacks the STACK_DEPTH_SLOP test so that you can set max_stack_depth considerably higher than what rlimit(2) claims. Unfortunately the way I did (1) only works on systems with pmap; I'm not sure how to make it more portable.

My results on an x86_64 RHEL6 system were pretty interesting:

1. All but two of the regression test scripts have ending stack sizes of 188K to 196K. There is one outlier at 296K (most likely the regex test, though I did not stop to confirm that) and then there's the errors.sql test, which intentionally provokes a "stack too deep" failure and will therefore consume approximately max_stack_depth stack if it can.

2. With the RHEL6 default "ulimit -s" setting of 10240kB, you actually have to increase max_stack_depth to 12275kB before you get a crash in errors.sql. At the highest passing value, 12274kB, pmap says we end with

      1 00007ffc51f6e000   12284K rw---    [ stack ]

which is just shy of 2MB more than the alleged limit. I conclude that at least in this kernel version, the kernel doesn't complain until your stack would be 2MB *more* than the ulimit -s value.

That result also says that at least for that particular test, the value of STACK_DEPTH_SLOP could be as little as 10K without a crash, even without this surprising kernel forgiveness.
But of course that test isn't really pushing the slop factor, since it's only compiling a trivial expression at each recursion depth.

Given these results I definitely wouldn't have a problem with reducing STACK_DEPTH_SLOP to 200K, and you could possibly talk me down to less. On x86_64. Other architectures might be more stack-hungry, though. I'm particularly worried about IA64 --- I wonder if anyone can perform these same experiments on that?

			regards, tom lane

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index cc36b80..7740120 100644
*** a/src/backend/storage/ipc/ipc.c
--- b/src/backend/storage/ipc/ipc.c
*************** static int	on_proc_exit_index,
*** 98,106 ****
--- 98,113 ----
  void
  proc_exit(int code)
  {
+ 	char		sysbuf[256];
+ 
  	/* Clean up everything that must be cleaned up */
  	proc_exit_prepare(code);
  
+ 	/* report stack size to stderr */
+ 	snprintf(sysbuf, sizeof(sysbuf), "pmap %d | grep stack 1>&2",
+ 			 (int) getpid());
+ 	system(sysbuf);
+ 
  #ifdef PROFILE_PID_DIR
  	{
  		/*
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 7254355..009bec2 100644
*** a/src/include/tcop/tcopprot.h
--- b/src/include/tcop/tcopprot.h
***************
*** 27,33 ****
  /* Required daylight between max_stack_depth and the kernel limit, in bytes */
! #define STACK_DEPTH_SLOP	(512 * 1024L)
  
  extern CommandDest whereToSendOutput;
  extern PGDLLIMPORT const char *debug_query_string;
--- 27,33 ----
  /* Required daylight between max_stack_depth and the kernel limit, in bytes */
! #define STACK_DEPTH_SLOP	(-100 * 1024L * 1024L)
  
  extern CommandDest whereToSendOutput;
  extern PGDLLIMPORT const char *debug_query_string;
On Tue, Jul 5, 2016 at 8:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Unfortunately the way I did (1) only works on systems with pmap; I'm not
> sure how to make it more portable.

I did a similar(ish) test which is admittedly not as exhaustive as using pmap. I instrumented check_stack_depth() itself to keep track of a high water mark (and, based on Robert's thought process, to keep track of the largest increment over the previous checked stack depth). This doesn't cover any cases where there's no check_stack_depth() call in the call stack at all (but then if there's no check_stack_depth call at all it's hard to see how any setting of STACK_DEPTH_SLOP is necessarily going to help).

I see similar results to you. The regexp test shows:

LOG: disconnection: highest stack depth: 392256 largest stack increment: 35584

And the:

STATEMENT: select infinite_recurse();
LOG: disconnection: highest stack depth: 2097584 largest stack increment: 1936

There were a couple other tests with similar stack increase increments to the regular expression test:

STATEMENT: alter table atacc2 add constraint foo check (test>0) no inherit;
LOG: disconnection: highest stack depth: 39232 largest stack increment: 34224

STATEMENT: SELECT chr(0);
LOG: disconnection: highest stack depth: 44144 largest stack increment: 34512

But aside from those two, the next largest increment between two successive check_stack_depth calls was about 12kB:

STATEMENT: select array_elem_check(121.00);
LOG: disconnection: highest stack depth: 24256 largest stack increment: 12896

This was all on x86_64 too.

-- greg
Attachments
Greg Stark <stark@mit.edu> writes:
> I did a similar(ish) test which is admittedly not as exhaustive as
> using pmap. I instrumented check_stack_depth() itself to keep track of
> a high water mark (and based on Robert's thought process) to keep
> track of the largest increment over the previous checked stack depth.
> This doesn't cover any cases where there's no check_stack_depth() call
> in the call stack at all (but then if there's no check_stack_depth
> call at all it's hard to see how any setting of STACK_DEPTH_SLOP is
> necessarily going to help).

Well, the point of STACK_DEPTH_SLOP is that we don't want to have to put check_stack_depth calls in every function in the backend, especially not otherwise-inexpensive leaf functions. So the idea is for the slop number to cover the worst-case call graph after the last function with a check.

Your numbers are pretty interesting, in that they clearly prove we need a slop value of at least 40-50K, but they don't really show that that's adequate.

I'm a bit disturbed by the fact that you seem to be showing maximum measured depth for the non-outlier tests as only around 40K-ish. That doesn't match up very well with my pmap results, since in no case did I see a physical stack size below 188K.

[ pokes around for a little bit... ] Oh, this is interesting: it looks like the *postmaster*'s stack size is 188K, and of course every forked child is going to inherit that as a minimum stack depth. What's more, pmap shows stack sizes near that for all my running postmasters going back to 8.4. But 8.3 and before show a stack size of 84K, which seems to be some sort of minimum on this machine; even a trivial "cat" process has that stack size according to pmap.

Conclusion: something we did in 8.4 greatly bloated the postmaster's stack space consumption, to the point that it's significantly more than anything a normal backend does.
That's surprising and scary, because it means the postmaster is *more* exposed to stack SIGSEGV than most backends. We need to find the cause, IMO.

			regards, tom lane
On Wed, Jul 6, 2016 at 2:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Conclusion: something we did in 8.4 greatly bloated the postmaster's
> stack space consumption, to the point that it's significantly more than
> anything a normal backend does. That's surprising and scary, because
> it means the postmaster is *more* exposed to stack SIGSEGV than most
> backends. We need to find the cause, IMO.

Hm. I did something based on your test where I built a .so and started the postmaster with -c shared_preload_libraries to load it. I tried to run it on every revision I have built for the historic benchmarks. That only worked as far back as 8.4.0 -- which makes me suspect it's possibly because of precisely shared_preload_libraries and the dynamic linker that the stack size grew.... The only thing it actually revealed was a *drop* of 50kB between REL9_2_0~1610 and REL9_2_0~1396.

REL8_4_0~1702 188K
REL8_4_0~1603 192K
REL8_4_0~1498 188K
REL8_4_0~1358 192K
REL8_4_0~1218 184K
REL8_4_0~1013 188K
REL8_4_0~996 192K
REL8_4_0~856 192K
REL8_4_0~775 192K
REL8_4_0~567 192K
REL8_4_0~480 188K
REL8_4_0~360 188K
REL8_4_0~151 188K
REL9_0_0~1855 188K
REL9_0_0~1654 188K
REL9_0_0~1538 192K
REL9_0_0~1454 184K
REL9_0_0~1351 184K
REL9_0_0~1249 188K
REL9_0_0~1107 184K
REL9_0_0~938 184K
REL9_0_0~627 184K
REL9_0_0~414 184K
REL9_0_0~202 184K
REL9_1_0~1867 188K
REL9_1_0~1695 184K
REL9_1_0~1511 188K
REL9_1_0~1328 192K
REL9_1_0~978 192K
REL9_1_0~948 188K
REL9_1_0~628 188K
REL9_1_0~382 192K
REL9_2_0~1825 184K
REL9_2_0~1610 192K   <--------------- here
REL9_2_0~1396 148K
REL9_2_0~1226 148K
REL9_2_0~1190 148K
REL9_2_0~1072 140K
REL9_2_0~1071 144K
REL9_2_0~984 144K
REL9_2_0~777 144K
REL9_2_0~767 148K
REL9_2_0~551 148K
REL9_2_0~309 144K
REL9_3_0~1509 148K
REL9_3_0~1304 148K
REL9_3_0~1099 144K
REL9_3_0~1030 144K
REL9_3_0~944 140K
REL9_3_0~789 144K
REL9_3_0~735 148K
REL9_3_0~589 144K
REL9_3_0~390 148K
REL9_3_0~223 144K
REL9_4_0~1923 148K
REL9_4_0~1894 148K
REL9_4_0~1755 144K
REL9_4_0~1688 144K
REL9_4_0~1617 144K
REL9_4_0~1431 144K
REL9_4_0~1246 144K
REL9_4_0~1142 148K
REL9_4_0~995 148K
REL9_4_0~744 140K
REL9_4_0~462 148K
REL9_5_0~2370 148K
REL8_4_22 192K
REL9_5_0~2183 148K
REL9_5_0~1996 148K
REL9_5_0~1782 144K
REL9_5_0~1569 148K
REL9_5_0~1557 144K
REL9_5_ALPHA1-20-g7b156c1 144K
REL9_5_ALPHA1-299-g47ebbdc 144K
REL9_5_ALPHA1-489-ge06b2e1 144K
REL9_0_23 188K
REL9_1_19 192K
REL9_2_14 144K
REL9_3_10 148K
REL9_4_5 148K
REL9_5_ALPHA1-683-ge073490 144K
REL9_5_ALPHA1-844-gdfcd9cb 148K
REL9_5_0 148K
REL9_5_ALPHA1-972-g7dc09c1 144K
REL9_5_ALPHA1-1114-g57a6a72 148K

-- greg
Greg Stark <stark@mit.edu> writes:
> Ok, I managed to get __attribute__((destructor)) working and captured
> the attached pmap output for all the revisions. You can see the git
> revision in the binary name along with a putative date though in the
> case of branches the date can be deceptive. It looks to me like REL8_4
> is already bloated by REL8_4_0~2268 (which means 2268 commits *before*
> the REL8_4_0 tag -- i.e. soon after it branched).

I traced through this by dint of inserting a lot of system("pmap") calls, and what I found out is that it's the act of setting one of the timezone variables that does it. This is because tzload() allocates a local variable "union local_storage ls", which sounds harmless enough, but in point of fact the darn thing is 78K! And to add insult to injury, with my setting (US/Eastern) there is a recursive call to parse the "posixrules" timezone file. So that's 150K worth of stack right there, although possibly it's only half that for some zone settings. (And if you use "GMT" you escape all of this, since that's hard coded.)

So now I understand why the IANA code has provisions for malloc'ing that storage rather than just using the stack. We should do likewise.

			regards, tom lane
Ok, I managed to get __attribute__((destructor)) working and captured the attached pmap output for all the revisions. You can see the git revision in the binary name along with a putative date, though in the case of branches the date can be deceptive. It looks to me like REL8_4 is already bloated by REL8_4_0~2268 (which means 2268 commits *before* the REL8_4_0 tag -- i.e. soon after it branched). I can't really make heads or tails of this. I don't see any commits in the early days of 8.4 that could change the stack depth in the postmaster.
Attachments
I found out that pmap can give much more fine-grained results than I was getting before, if you give it the -x flag and then pay attention to the "dirty" column rather than the "nominal size" column. That gives a reliable indication of how much stack space the process ever actually touched, with resolution apparently 4KB on my machine. I redid my measurements with commit 62c8421e8 applied, and now get results like this for one run of the standard regression tests:

$ grep '\[ stack \]' postmaster.log | sort -k 4n | uniq -c
    137 00007fff0f615000      84      36      36 rw---    [ stack ]
     21 00007fff0f615000      84      40      40 rw---    [ stack ]
      4 00007fff0f615000      84      44      44 rw---    [ stack ]
     20 00007fff0f615000      84      48      48 rw---    [ stack ]
      8 00007fff0f615000      84      52      52 rw---    [ stack ]
      2 00007fff0f615000      84      56      56 rw---    [ stack ]
     10 00007fff0f615000      84      60      60 rw---    [ stack ]
      3 00007fff0f615000      84      64      64 rw---    [ stack ]
      3 00007fff0f615000      84      68      68 rw---    [ stack ]
      2 00007fff0f615000      84      72      72 rw---    [ stack ]
      1 00007fff0f612000      96      76      76 rw---    [ stack ]
      2 00007fff0f60e000     112     112     112 rw---    [ stack ]
      1 00007fff0f5e0000     296     296     296 rw---    [ stack ]
      1 00007fff0f427000    2060    2060    2060 rw---    [ stack ]

The rightmost numeric column is the "dirty KB in region" column, and 36KB is the floor established by the postmaster. (It looks like selecting timezone is still the largest stack-space hog in that, but it's no longer enough to make me want to do something about it.) So now we're seeing some cases that exceed that floor, which is good. regex and errors are still the outliers, as expected.
Also, I found that on OS X "vmmap -dirty" could produce results comparable to pmap, so here's the numbers for the same test case on current OS X:

    154  Stack  8192K    36K  2
      5  Stack  8192K    40K  2
     11  Stack  8192K    44K  2
      6  Stack  8192K    48K  2
     11  Stack  8192K    52K  2
      7  Stack  8192K    56K  2
      8  Stack  8192K    60K  2
      2  Stack  8192K    64K  2
      2  Stack  8192K    68K  2
      4  Stack  8192K    72K  2
      1  Stack  8192K    76K  2
      2  Stack  8192K   108K  2
      1  Stack  8192K   384K  2
      1  Stack  8192K  2056K  2

(The "virtual" stack size seems to always be the same as ulimit -s, ie 8MB by default, on this platform.) This is good confirmation that the actual stack consumption is pretty stable across different compilers, though it looks like OS X's version of clang is a bit more stack-wasteful for the regex recursion.

Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP to 256KB on x86_64. It would sure be good to check things on some other architectures, though ...

			regards, tom lane
I wrote:
> Based on these numbers, I'd have no fear of reducing STACK_DEPTH_SLOP
> to 256KB on x86_64. It would sure be good to check things on some
> other architectures, though ...

I went to the work of doing the same test on a PPC Mac:

    182  Stack  [ 8192K/   40K]
     25  Stack  [ 8192K/   48K]
      2  Stack  [ 8192K/   56K]
     11  Stack  [ 8192K/   60K]
      5  Stack  [ 8192K/   64K]
      2  Stack  [ 8192K/  108K]
      1  Stack  [ 8192K/  576K]
      1  Stack  [ 8192K/ 2056K]

The last number here is "resident pages", not "dirty pages", because this older version of OS X doesn't provide the latter. Still, the numbers seem to track pretty well with the ones I got on x86_64. Which is a bit odd when you think about it: a 32-bit machine ought to consume less stack space because pointers are narrower.

Also on my old HPPA dinosaur:

     40 addr 0x7b03a000, length 8, physical pages 8, type STACK
    166 addr 0x7b03a000, length 10, physical pages 9, type STACK
     26 addr 0x7b03a000, length 12, physical pages 11, type STACK
     16 addr 0x7b03a000, length 14, physical pages 13, type STACK
      1 addr 0x7b03a000, length 15, physical pages 13, type STACK
      1 addr 0x7b03a000, length 16, physical pages 15, type STACK
      2 addr 0x7b03a000, length 28, physical pages 27, type STACK
      1 addr 0x7b03a000, length 190, physical pages 190, type STACK
      1 addr 0x7b03a000, length 514, physical pages 514, type STACK

As best I can tell, "length" is the nominal virtual space for the stack, and "physical pages" is the actually allocated/resident space, both measured in 4K pages. So that again matches pretty well, although the stack-efficiency of the recursive regex functions seems to get worse with each new case I look at.

However ... the thread here

https://www.postgresql.org/message-id/flat/21563.1289064886%40sss.pgh.pa.us

says that depending on your choice of compiler and optimization level, IA64 can be 4x to 5x worse for stack space than x86_64, even after spotting it double the memory allocation to handle its two separate stacks.
I don't currently have access to an IA64 machine to check. Based on what I'm seeing so far, really 100K ought to be more than plenty of slop for most architectures, but I'm afraid to go there for IA64. Also, there might be some more places like tzload() that are putting unreasonably large variables on the stack, but that the regression tests don't exercise (I've not tested anything replication-related, for example). Bottom line: I propose that we keep STACK_DEPTH_SLOP at 512K for IA64 but reduce it to 256K for everything else. regards, tom lane
On Fri, Jul 8, 2016 at 4:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Based on what I'm seeing so far, really 100K ought to be more than plenty
> of slop for most architectures, but I'm afraid to go there for IA64.

Searching for info on ia64 turned up this interesting thread:

https://www.postgresql.org/message-id/21563.1289064886%40sss.pgh.pa.us

From that discussion it seems we should probably run these tests with -O0 because the stack usage can be substantially higher without optimizations. And it doesn't sound like ia64 uses much more *normal* stack, just that there's the additional register stack.

It might not be unreasonable to commit the pmap hack, gather the data from the build farm, then later add an #ifdef around it. (Or just make it #ifdef USE_ASSERTIONS, which I assume most build farm members are running with anyway.)

Alternatively it wouldn't be very hard to use mincore(2) to implement it natively. I believe mincore is nonstandard but present in Linux and BSD.

-- greg
Greg Stark <stark@mit.edu> writes:
> Searching for info on ia64 turned up this interesting thread:
> https://www.postgresql.org/message-id/21563.1289064886%40sss.pgh.pa.us

Yeah, that's the same one I referenced upthread ;-)

> From that discussion it seems we should probably run these tests with
> -O0 because the stack usage can be substantially higher without
> optimizations. And it doesn't sound like ia64 uses much more *normal*
> stack, just that there's the additional register stack.

> It might not be unreasonable to commit the pmap hack, gather the data
> from the build farm then later add an #ifdef around it. (or just make
> it #ifdef USE_ASSERTIONS which I assume most build farm members are
> running with anyways).

Hmm. The two IA64 critters in the farm are running HPUX, which means they likely don't have pmap. But I could clean up the hack I used to gather stack size data on gaur's host and commit it temporarily. On non-HPUX platforms we could just try system("pmap -x") and see what happens; as long as we're ignoring the result it shouldn't cause anything really bad.

I was going to object that this would probably not tell us anything about the worst-case IA64 stack usage, but I see that neither of those critters are selecting any optimization, so actually it would. So, agreed, let's commit some temporary debug code and see what the buildfarm can teach us. Will go work on that in a bit.

> Alternatively it wouldn't be very hard to use mincore(2) to implement
> it natively. I believe mincore is nonstandard but present in Linux and
> BSD.

Hm, after reading the man page I don't quite see how that would help? You'd have to already know the mapped stack address range in order to call the function without getting ENOMEM.

			regards, tom lane
On Fri, Jul 8, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hm, after reading the man page I don't quite see how that would help?
> You'd have to already know the mapped stack address range in order to
> call the function without getting ENOMEM.

I had assumed unmapped pages would just return a 0 in the bitmap. I suppose you could still do it by just probing one page at a time until you find an unmapped page. In a way that's better since you can count stack pages even if they're paged out.

Fwiw here's the pmap info from burbot (Linux Sparc64):

    136      48      48 rw---    [ stack ]
    136      48      48 rw---    [ stack ]
    136      48      48 rw---    [ stack ]
    136      48      48 rw---    [ stack ]
    136      56      56 rw---    [ stack ]
    136      80      80 rw---    [ stack ]
    136      96      96 rw---    [ stack ]
    136     112     112 rw---    [ stack ]
    136     112     112 rw---    [ stack ]
    576     576     576 rw---    [ stack ]
   2056    2056    2056 rw---    [ stack ]

I'm actually a bit confused how to interpret these numbers. This appears to be an 8kB pagesize architecture so is that 576*8kB or over 5MB of stack for the regexp test? But we don't know if there are any check_stack_depth calls in that call tree?

-- greg
Greg Stark <stark@mit.edu> writes:
> Fwiw here's the pmap info from burbot (Linux Sparc64):
>     136      48      48 rw---    [ stack ]
>     136      48      48 rw---    [ stack ]
>     136      48      48 rw---    [ stack ]
>     136      48      48 rw---    [ stack ]
>     136      56      56 rw---    [ stack ]
>     136      80      80 rw---    [ stack ]
>     136      96      96 rw---    [ stack ]
>     136     112     112 rw---    [ stack ]
>     136     112     112 rw---    [ stack ]
>     576     576     576 rw---    [ stack ]
>    2056    2056    2056 rw---    [ stack ]

> I'm actually a bit confused how to interpret these numbers. This
> appears to be an 8kB pagesize architecture so is that 576*8kB or over
> 5MB of stack for the regexp test?

No, pmap specifies that its outputs are measured in kilobytes. So this is by and large the same as what I'm seeing on x86_64, again with the caveat that the recursive regex routines seem to vary all over the place in terms of stack consumption.

> But we don't know if there are any
> check_stack_depth calls in that call tree?

The regex recursion definitely does have check_stack_depth calls in it (since commit b63fc2877). But what we're trying to measure here is the worst-case stack depth regardless of any check_stack_depth calls. That's a ceiling on what we might need to set STACK_DEPTH_SLOP to --- probably a very loose ceiling, but I don't want to err on the side of underestimating it.

I wouldn't consider either the regex or errors tests as needing to bound STACK_DEPTH_SLOP, since we know that most of their consumption is from recursive code that contains check_stack_depth calls. But it's useful to look at those depths just as a sanity check that we're getting valid numbers.

			regards, tom lane
I wrote:
> So, agreed, let's commit some temporary debug code and see what the
> buildfarm can teach us. Will go work on that in a bit.

After reviewing the buildfarm results, I'm feeling nervous about this whole idea again. For the most part, the unaccounted-for daylight between the maximum stack depth measured by check_stack_depth and the actually dirtied stack space reported by pmap is under 100K. But there are a pretty fair number of exceptions. The worst cases I found were on "dunlin", which approached 200K extra space in a couple of places:

dunlin | 2016-07-09 22:05:09 | check.log           | 00007ffff2667000     268     208     208 rw---   [ stack ]
dunlin | 2016-07-09 22:05:09 | check.log           | max measured stack depth 14kB
dunlin | 2016-07-09 22:05:09 | install-check-C.log | 00007fffee650000     268     200     200 rw---   [ stack ]
dunlin | 2016-07-09 22:05:09 | install-check-C.log | max measured stack depth 14kB

This appears to be happening in the tsdicts test script. Other machines also show a significant discrepancy between pmap and check_stack_depth results for that test, which suggests that maybe the tsearch code is being overly reliant on large local variables. But I haven't dug through it.

Another area of concern is PLs.
For instance, on capybara, a machine otherwise pretty unexceptional in stack-space appetite, quite a few of the PL tests ate ~100K of unaccounted-for space:

capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000     132     104     104 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000     132       0       0 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 8kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000     136     136     136 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbd000     136       0       0 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 0kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000     132     104     104 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000     132       0       0 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 5kB
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000     132     116     116 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | 00007ffc61bbe000     132       0       0 rw---   [ stack ]
capybara | 2016-07-09 21:15:56 | pl-install-check-C.log | max measured stack depth 7kB

Presumably that reflects some oddity of the local version of perl or python, but I have no idea what.

So while we could possibly get away with reducing STACK_DEPTH_SLOP to 256K, there is good reason to think that that would be leaving little or no safety margin. At this point I'm inclined to think we should leave well enough alone. At the very least, if we were to try to reduce that number, I'd want to have some plan for tracking our stack space consumption better than we have done in the past.

			regards, tom lane

PS: for amusement's sake, here are some numbers I extracted concerning the relative stack-hungriness of different buildfarm members.
First, the number of recursion levels each machine could accomplish before hitting "stack too deep" in the errors.sql regression test (measured by counting the number of CONTEXT lines in the relevant error message):

   sysname    |      snapshot       | count
--------------+---------------------+-------
 protosciurus | 2016-07-10 12:03:06 |   731
 chub         | 2016-07-10 15:10:01 |  1033
 quokka       | 2016-07-10 02:17:31 |  1033
 hornet       | 2016-07-09 23:42:32 |  1156
 clam         | 2016-07-09 22:00:01 |  1265
 anole        | 2016-07-09 22:41:40 |  1413
 spoonbill    | 2016-07-09 23:00:05 |  1535
 sungazer     | 2016-07-09 23:51:33 |  1618
 gaur         | 2016-07-09 04:53:13 |  1634
 kouprey      | 2016-07-10 04:58:00 |  1653
 nudibranch   | 2016-07-10 09:18:10 |  1664
 grouse       | 2016-07-10 08:43:02 |  1708
 sprat        | 2016-07-10 08:43:55 |  1717
 pademelon    | 2016-07-09 06:12:10 |  1814
 mandrill     | 2016-07-10 00:10:02 |  2093
 gharial      | 2016-07-10 01:15:50 |  2248
 francolin    | 2016-07-10 13:00:01 |  2379
 piculet      | 2016-07-10 13:00:01 |  2379
 lorikeet     | 2016-07-10 08:04:19 |  2422
 caecilian    | 2016-07-09 19:31:50 |  2423
 jacana       | 2016-07-09 22:36:38 |  2515
 bowerbird    | 2016-07-10 02:13:47 |  2617
 locust       | 2016-07-09 21:50:26 |  2838
 prairiedog   | 2016-07-09 22:44:58 |  2838
 dromedary    | 2016-07-09 20:48:06 |  2840
 damselfly    | 2016-07-10 10:27:09 |  2880
 curculio     | 2016-07-09 21:30:01 |  2905
 mylodon      | 2016-07-09 20:50:01 |  2974
 tern         | 2016-07-09 23:51:23 |  3015
 burbot       | 2016-07-10 03:30:45 |  3042
 magpie       | 2016-07-09 21:38:02 |  3043
 reindeer     | 2016-07-10 04:00:05 |  3043
 friarbird    | 2016-07-10 04:20:01 |  3187
 nightjar     | 2016-07-09 21:17:52 |  3187
 sittella     | 2016-07-09 21:46:29 |  3188
 crake        | 2016-07-09 22:06:09 |  3267
 guaibasaurus | 2016-07-10 00:17:01 |  3267
 ibex         | 2016-07-09 20:59:06 |  3267
 mule         | 2016-07-09 23:30:02 |  3267
 spurfowl     | 2016-07-09 21:06:39 |  3267
 anchovy      | 2016-07-09 21:41:04 |  3268
 blesbok      | 2016-07-09 21:17:46 |  3268
 capybara     | 2016-07-09 21:15:56 |  3268
 conchuela    | 2016-07-09 21:00:01 |  3268
 handfish     | 2016-07-09 04:37:57 |  3268
 macaque      | 2016-07-08 21:25:06 |  3268
 minisauripus | 2016-07-10 03:19:42 |  3268
 rhinoceros   | 2016-07-09 21:45:01 |  3268
 sidewinder   | 2016-07-09 21:45:00 |  3272
 jaguarundi   | 2016-07-10 06:52:05 |  3355
 loach        | 2016-07-09 21:15:00 |  3355
 okapi        | 2016-07-10 06:15:02 |  3425
 fulmar       | 2016-07-09 23:47:57 |  3436
 longfin      | 2016-07-09 21:10:17 |  3444
 brolga       | 2016-07-10 09:40:46 |  3537
 dunlin       | 2016-07-09 22:05:09 |  3616
 coypu        | 2016-07-09 22:20:46 |  3626
 hyrax        | 2016-07-09 19:52:03 |  3635
 treepie      | 2016-07-09 22:41:37 |  3635
 frogmouth    | 2016-07-10 02:00:09 |  3636
 narwhal      | 2016-07-10 10:00:05 |  3966
 rover_firefly| 2016-07-10 15:01:45 |  4084
 lapwing      | 2016-07-09 21:15:01 |  4085
 cockatiel    | 2016-07-10 13:40:47 |  4362
 currawong    | 2016-07-10 05:16:03 |  5136
 mastodon     | 2016-07-10 11:00:01 |  5136
 termite      | 2016-07-09 21:01:30 |  5452
 hamster      | 2016-07-09 16:00:06 |  5685
 dangomushi   | 2016-07-09 18:00:27 |  5692
 gull         | 2016-07-10 04:48:28 |  5692
 mereswine    | 2016-07-10 10:40:57 |  5810
 axolotl      | 2016-07-09 22:12:12 |  5811
 chipmunk     | 2016-07-10 08:18:07 |  5949
 grison       | 2016-07-09 21:00:02 |  5949
(74 rows)

(coypu gets a gold star for this one, since it makes a good showing despite having max_stack_depth set to 1536kB --- everyone else seems to be using 2MB.)
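As an aside on methodology: the recursion counts above come from counting CONTEXT lines in the "stack too deep" error report. A minimal sketch of that extraction, run against a fabricated three-frame sample (a real errors.sql log would of course contain thousands of frames):

```shell
# Fabricated sample of a "stack too deep" error report, for illustration;
# each recursion level contributes one CONTEXT line.
cat > /tmp/errors_sample.log <<'EOF'
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth".
CONTEXT:  SQL function "infinite_recurse" statement 1
CONTEXT:  SQL function "infinite_recurse" statement 1
CONTEXT:  SQL function "infinite_recurse" statement 1
EOF

# Count recursion levels reached before the error
grep -c '^CONTEXT' /tmp/errors_sample.log
```

Run against the sample above, this prints 3.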
Second, the stack space consumed for the regex regression test --- here, smaller is better:

currawong     | 2016-07-10 05:16:03 | max measured stack depth 213kB
mastodon      | 2016-07-10 11:00:01 | max measured stack depth 213kB
axolotl       | 2016-07-09 22:12:12 | max measured stack depth 240kB
hamster       | 2016-07-09 16:00:06 | max measured stack depth 240kB
mereswine     | 2016-07-10 10:40:57 | max measured stack depth 240kB
brolga        | 2016-07-10 09:40:46 | max measured stack depth 284kB
narwhal       | 2016-07-10 10:00:05 | max measured stack depth 284kB
cockatiel     | 2016-07-10 13:40:47 | max measured stack depth 285kB
francolin     | 2016-07-10 13:00:01 | max measured stack depth 285kB
hyrax         | 2016-07-09 19:52:03 | max measured stack depth 285kB
magpie        | 2016-07-09 21:38:02 | max measured stack depth 285kB
piculet       | 2016-07-10 13:00:01 | max measured stack depth 285kB
reindeer      | 2016-07-10 04:00:05 | max measured stack depth 285kB
treepie       | 2016-07-09 22:41:37 | max measured stack depth 285kB
lapwing       | 2016-07-09 21:15:01 | max measured stack depth 287kB
rover_firefly | 2016-07-10 15:01:45 | max measured stack depth 287kB
coypu         | 2016-07-09 22:20:46 | max measured stack depth 288kB
friarbird     | 2016-07-10 04:20:01 | max measured stack depth 289kB
nightjar      | 2016-07-09 21:17:52 | max measured stack depth 289kB
gharial       | 2016-07-10 01:15:50 | max measured stack depths 290kB, 384kB
bowerbird     | 2016-07-10 02:13:47 | max measured stack depth 378kB
caecilian     | 2016-07-09 19:31:50 | max measured stack depth 378kB
frogmouth     | 2016-07-10 02:00:09 | max measured stack depth 378kB
mylodon       | 2016-07-09 20:50:01 | max measured stack depth 378kB
jaguarundi    | 2016-07-10 06:52:05 | max measured stack depth 379kB
loach         | 2016-07-09 21:15:00 | max measured stack depth 379kB
longfin       | 2016-07-09 21:10:17 | max measured stack depth 379kB
sidewinder    | 2016-07-09 21:45:00 | max measured stack depth 379kB
anchovy       | 2016-07-09 21:41:04 | max measured stack depth 381kB
blesbok       | 2016-07-09 21:17:46 | max measured stack depth 381kB
capybara      | 2016-07-09 21:15:56 | max measured stack depth 381kB
conchuela     | 2016-07-09 21:00:01 | max measured stack depth 381kB
crake         | 2016-07-09 22:06:09 | max measured stack depth 381kB
curculio      | 2016-07-09 21:30:01 | max measured stack depth 381kB
guaibasaurus  | 2016-07-10 00:17:01 | max measured stack depth 381kB
handfish      | 2016-07-09 04:37:57 | max measured stack depth 381kB
ibex          | 2016-07-09 20:59:06 | max measured stack depth 381kB
macaque       | 2016-07-08 21:25:06 | max measured stack depth 381kB
minisauripus  | 2016-07-10 03:19:42 | max measured stack depth 381kB
mule          | 2016-07-09 23:30:02 | max measured stack depth 381kB
rhinoceros    | 2016-07-09 21:45:01 | max measured stack depth 381kB
sittella      | 2016-07-09 21:46:29 | max measured stack depth 381kB
spurfowl      | 2016-07-09 21:06:39 | max measured stack depth 381kB
dromedary     | 2016-07-09 20:48:06 | max measured stack depth 382kB
pademelon     | 2016-07-09 06:12:10 | max measured stack depth 382kB
fulmar        | 2016-07-09 23:47:57 | max measured stack depth 383kB
dunlin        | 2016-07-09 22:05:09 | max measured stack depth 388kB
okapi         | 2016-07-10 06:15:02 | max measured stack depth 389kB
mandrill      | 2016-07-10 00:10:02 | max measured stack depth 489kB
tern          | 2016-07-09 23:51:23 | max measured stack depth 491kB
damselfly     | 2016-07-10 10:27:09 | max measured stack depth 492kB
burbot        | 2016-07-10 03:30:45 | max measured stack depth 567kB
locust        | 2016-07-09 21:50:26 | max measured stack depth 571kB
prairiedog    | 2016-07-09 22:44:58 | max measured stack depth 571kB
clam          | 2016-07-09 22:00:01 | max measured stack depth 573kB
jacana        | 2016-07-09 22:36:38 | max measured stack depth 661kB
lorikeet      | 2016-07-10 08:04:19 | max measured stack depth 662kB
gaur          | 2016-07-09 04:53:13 | max measured stack depth 756kB
chub          | 2016-07-10 15:10:01 | max measured stack depth 856kB
quokka        | 2016-07-10 02:17:31 | max measured stack depth 856kB
hornet        | 2016-07-09 23:42:32 | max measured stack depth 868kB
grouse        | 2016-07-10 08:43:02 | max measured stack depth 944kB
kouprey       | 2016-07-10 04:58:00 | max measured stack depth 944kB
nudibranch    | 2016-07-10 09:18:10 | max measured stack depth 945kB
sprat         | 2016-07-10 08:43:55 | max measured stack depth 946kB
sungazer      | 2016-07-09 23:51:33 | max measured stack depth 963kB
protosciurus  | 2016-07-10 12:03:06 | max measured stack depth 1432kB

The second list omits a couple of machines whose reports got garbled by concurrent insertions into the log file.