Re: stress test for parallel workers

From: Justin Pryzby
Subject: Re: stress test for parallel workers
Msg-id: 20190723230440.GU22387@telsasoft.com
In reply to: Re: stress test for parallel workers  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: stress test for parallel workers  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: stress test for parallel workers  (Thomas Munro <thomas.munro@gmail.com>)
Re: stress test for parallel workers  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Jul 24, 2019 at 10:46:42AM +1200, Thomas Munro wrote:
> On Wed, Jul 24, 2019 at 10:42 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Wed, Jul 24, 2019 at 10:03:25AM +1200, Thomas Munro wrote:
> > > On Wed, Jul 24, 2019 at 5:42 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > #2  0x000000000085ddff in errfinish (dummy=<value optimized out>) at elog.c:555
> > > >         edata = <value optimized out>
> > >
> > > If you have that core, it might be interesting to go to frame 2 and
> > > print *edata or edata->saved_errno.
> >
> > As you saw, unless someone knows a trick, it's "optimized out".
> 
> How about something like this:
> 
> print errordata[errordata_stack_depth]

Clever.

(gdb) p errordata[errordata_stack_depth]
$2 = {elevel = 13986192, output_to_server = 254, output_to_client = 127, show_funcname = false, hide_stmt = false, hide_ctx = false, filename = 0x27b3790 "< %m %u >", lineno = 41745456,
  funcname = 0x3030313335 <Address 0x3030313335 out of bounds>, domain = 0x0, context_domain = 0x27cff90 "postgres", sqlerrcode = 0, message = 0xe8800000001 <Address 0xe8800000001 out of bounds>,
  detail = 0x297a <Address 0x297a out of bounds>, detail_log = 0x0, hint = 0xe88 <Address 0xe88 out of bounds>, context = 0x297a <Address 0x297a out of bounds>, message_id = 0x0, schema_name = 0x0,
  table_name = 0x0, column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, internalquery = 0x0, saved_errno = 0, assoc_context = 0x0}
 
(gdb) p errordata
$3 = {{elevel = 22, output_to_server = true, output_to_client = false, show_funcname = false, hide_stmt = false, hide_ctx = false, filename = 0x9c4030 "origin.c", lineno = 591,
    funcname = 0x9c46e0 "CheckPointReplicationOrigin", domain = 0x9ac810 "postgres-11", context_domain = 0x9ac810 "postgres-11", sqlerrcode = 4293,
    message = 0x27b0e40 "could not write to file \"pg_logical/replorigin_checkpoint.tmp\": No space left on device", detail = 0x0, detail_log = 0x0, hint = 0x0, context = 0x0,
    message_id = 0x8a7aa8 "could not write to file \"%s\": %m", ...
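
For readers comparing the message_id and message fields above: the template ends in %m, which PostgreSQL's error reporting replaces with the operating system's text for the saved errno, and "No space left on device" is the usual Linux text for ENOSPC. Below is a minimal Python sketch of that substitution. It is only an imitation for illustration, not the code in elog.c; the helper names expand_percent_m and format_message are hypothetical.

```python
import errno
import os


def expand_percent_m(template, saved_errno):
    """Replace each %m with the OS error text for saved_errno.

    Hypothetical helper: a simplified imitation of the %m convention,
    not PostgreSQL's implementation. A literal %% is passed through.
    """
    out = []
    i = 0
    while i < len(template):
        if template.startswith("%%", i):
            out.append("%%")
            i += 2
        elif template.startswith("%m", i):
            # Double any % in the error text so later %-formatting is safe.
            out.append(os.strerror(saved_errno).replace("%", "%%"))
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


def format_message(template, saved_errno, *args):
    """Expand %m first, then apply ordinary %-style formatting to args."""
    return expand_percent_m(template, saved_errno) % args


if __name__ == "__main__":
    print(format_message('could not write to file "%s": %m',
                         errno.ENOSPC,
                         "pg_logical/replorigin_checkpoint.tmp"))
```

The exact wording of the error text comes from the platform's strerror, so it can differ between systems.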

I ought to have remembered that it *was* in fact out of space this morning when
this core was dumped (because I hadn't touched it since scheduling the
transition to this VM last week).

I want to say I'm almost certain it wasn't ENOSPC in other cases, since,
failing to find log output, I ran df right after the failure.

But that gives me an idea: is it possible there's an issue with files being
held open by worker processes, including parallel workers?  Probably WALs,
even after they're rotated?  If worker processes were holding lots of rotated
WALs open, that could cause ENOSPC, but it wouldn't be obvious after they
die, since the space would then be freed.
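
The mechanism behind that question can be reproduced in isolation. The following is a minimal Python sketch, assuming a POSIX filesystem where an open file can be unlinked; the function name unlinked_file_usage is hypothetical and nothing here touches PostgreSQL or WAL files. It creates a file, removes its name while keeping the descriptor open, and reports what fstat still sees.

```python
import os
import tempfile


def unlinked_file_usage(size_bytes=1 << 20):
    """Write a file, unlink it while it is still open, and inspect it.

    Hypothetical demo helper. Assumes POSIX semantics: the name goes
    away at unlink, but the data stays allocated until the last open
    descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size_bytes)
        os.fsync(fd)
        os.unlink(path)          # directory entry is gone from here on
        st = os.fstat(fd)        # the open descriptor still reaches the file
        return {
            "path_exists": os.path.exists(path),
            "nlink": st.st_nlink,
            "size": st.st_size,
        }
    finally:
        os.close(fd)             # only now can the filesystem reclaim the space


if __name__ == "__main__":
    print(unlinked_file_usage())
```

While the descriptor is open, df counts the space as used even though ls no longer shows the file, which is why the evidence disappears once the process exits. On a live system, lsof +L1 lists open files whose link count is below one and is one way to look for this while the processes are still running.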

Justin


