On my workstation today (running vanilla 9.4.0) I was testing some new
code that does aggressive parallel loading to a couple of tables. It
ran OK several dozen times and then froze up with no external trigger.
At most 8 active backends were stuck (the loader is threaded to a
cap). Each query typically resolves in a few seconds, but they were
hung for 30+ minutes. I had to do an immediate restart because the
backends were not responding to cancel, but I snapped a 'perf top'
before I did so. The results were interesting, so I'm posting them
here. So far I have not been able to reproduce the problem. FYI:

 61.03%  postgres            [.] s_lock
 13.56%  postgres            [.] LWLockRelease
 10.11%  postgres            [.] LWLockAcquire
  4.02%  perf                [.] 0x00000000000526d3
  1.65%  postgres            [.] _bt_compare
  1.60%  libc-2.17.so        [.] 0x0000000000081069
  0.66%  [kernel]            [k] kallsyms_expand_symbol.constprop.1
  0.60%  [kernel]            [k] format_decode
  0.57%  [kernel]            [k] number.isra.1
  0.47%  [kernel]            [k] memcpy
  0.44%  postgres            [.] ReleaseAndReadBuffer
  0.44%  postgres            [.] FunctionCall2Coll
  0.41%  [kernel]            [k] vsnprintf
  0.41%  [kernel]            [k] module_get_kallsym
  0.32%  postgres            [.] _bt_relandgetbuf
  0.31%  [kernel]            [k] string.isra.5
  0.31%  [kernel]            [k] strnlen
  0.31%  postgres            [.] _bt_moveright
  0.28%  libc-2.17.so        [.] getdelim
  0.22%  postgres            [.] LockBuffer
  0.16%  [kernel]            [k] seq_read
  0.16%  libc-2.17.so        [.] __libc_calloc
  0.13%  postgres            [.] _bt_checkpage
  0.09%  [kernel]            [k] pointer.isra.15
  0.09%  [kernel]            [k] update_iter
  0.08%  plugin_host         [.] PyObject_GetAttr
  0.06%  [kernel]            [k] strlcpy
  0.06%  [kernel]            [k] seq_vprintf
  0.06%  [kernel]            [k] copy_user_enhanced_fast_string
  0.06%  libc-2.17.so        [.] _IO_feof
  0.06%  postgres            [.] btoidcmp
  0.06%  [kernel]            [k] page_fault
  0.06%  libc-2.17.so        [.] free
  0.06%  libc-2.17.so        [.] memchr
  0.06%  libpthread-2.17.so  [.] __pthread_mutex_unlock_usercnt

merlin