Re: margay fails assertion in stats/dsa/dsm code

From: Thomas Munro
Subject: Re: margay fails assertion in stats/dsa/dsm code
Date:
Msg-id CA+hUKGKbpjPUXTWUCK77w9CSg-CBMJZYH8aMBOO5e7jJ+1ef=w@mail.gmail.com
In reply to: Re: margay fails assertion in stats/dsa/dsm code (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: margay fails assertion in stats/dsa/dsm code (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Jul 1, 2022 at 4:02 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 29, 2022 at 12:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > -               if (errno != EEXIST)
> > +               if (op == DSM_OP_ATTACH || errno != EEXIST)
> >                         ereport(elevel,
> >                                         (errcode_for_dynamic_shared_memory(),
> >                                          errmsg("could not open shared memory segment \"%s\": %m",
> >
> > margay would probably still fail until that underlying problem is
> > addressed, but less mysteriously on our side at least.
>
> That seems like a correct fix, but maybe we should also be checking
> the return value of dsm_impl_op() e.g. define dsm_impl_op_error() as
> an inline function that does if (!dsm_impl_op(..., ERROR)) elog(ERROR,
> "the author of dsm.c is not as clever as he thinks he is").

Thanks.  Also the mmap and sysv paths do something similar, so I also
made the same change there just on principle.  I didn't make the extra
belt-and-braces check you suggested for now, preferring minimalism.  I
think the author of dsm.c was pretty clever, it's just that the world
turned out to be more hostile than expected, in one very specific way.
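For illustration only, here is a minimal sketch of the kind of belt-and-braces wrapper Robert suggested. The dsm_op values are modelled on dsm.c, but the reduced backend signature, ELEVEL_ERROR, dsm_impl_fn and fake_impl_ok are hypothetical, and abort() stands in for elog(ERROR). The idea: when the backend is called with an ERROR elevel it should never return false (ereport(ERROR) does not return), so a false return means some path skipped its ereport(), which is exactly what the EEXIST-on-attach case did.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Modelled on dsm.c's operations; everything else here is hypothetical. */
typedef enum { DSM_OP_CREATE, DSM_OP_ATTACH, DSM_OP_DETACH, DSM_OP_DESTROY } dsm_op;
enum { ELEVEL_ERROR = 1 };  /* hypothetical stand-in for elog's ERROR level */

/* Hypothetical, simplified backend signature: the real dsm_impl_op() also
 * takes a handle, request size and mapping pointers. Returns false on
 * failure, having reported at elevel unless it chose to stay silent. */
typedef bool (*dsm_impl_fn)(dsm_op op, int elevel);

/* Belt-and-braces wrapper: at ERROR level a false return should be
 * impossible, so treat it as a bug. abort() stands in for elog(ERROR). */
static inline void
dsm_impl_op_error(dsm_impl_fn impl, dsm_op op)
{
    if (!impl(op, ELEVEL_ERROR))
    {
        fprintf(stderr, "dsm backend returned false at ERROR level\n");
        abort();
    }
}

/* Example backend that always succeeds and counts its invocations. */
static int fake_calls = 0;

static bool
fake_impl_ok(dsm_op op, int elevel)
{
    (void) op;
    (void) elevel;
    fake_calls++;
    return true;
}
```

A backend that silently returned false for DSM_OP_ATTACH would make the wrapper abort instead of letting the caller carry on with an unmapped segment.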

Pushed.

So that should get us to a state where margay still fails
occasionally, but now with an ERROR rather than a crash.
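To make the effect of the pushed change concrete, here is the reporting condition from the quoted patch pulled out into a standalone predicate. The name should_report_open_failure() is hypothetical; the condition is the one in the diff. In this model, EEXIST on DSM_OP_CREATE is the expected "handle already in use" case, which as far as I can tell dsm.c handles by trying another handle, so it stays silent; on DSM_OP_ATTACH it can only be the bogus error from shm_open(), so it is now reported.

```c
#include <errno.h>
#include <stdbool.h>

typedef enum { DSM_OP_CREATE, DSM_OP_ATTACH, DSM_OP_DETACH, DSM_OP_DESTROY } dsm_op;

/* Same condition as the patch:
 *   if (op == DSM_OP_ATTACH || errno != EEXIST) ereport(elevel, ...);
 * Before the change, an EEXIST while attaching was swallowed and the
 * caller saw a bare "false" with nothing reported. */
static bool
should_report_open_failure(dsm_op op, int err)
{
    return op == DSM_OP_ATTACH || err != EEXIST;
}
```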

Next up, I confirmed my theory about what's happening on closed
Solaris by tracing syscalls.  It is indeed that clunky sleep(1) code
that gives up after 64 tries.  Even in pre-shmem-stats releases that
don't contend enough to reach the bogus EEXIST error, I'm pretty sure
people must be getting random sleeps injected into their parallel
queries in the wild by this code.
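Solaris libc is closed source, so the following is only a model of the behaviour described above, not its implementation: the internal lock, the hook types, model_ctx, busy_lock and record_sleep are all hypothetical, and the sleep is injected so the model runs instantly. It shows why a contended shm_open() can add whole seconds of latency and, after 64 tries, fail with EEXIST. Whether the real code sleeps once more after the last try is unknown; this model does not.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_TRIES 64  /* "gives up after 64 tries", as described above */

/* Hypothetical hooks: try_lock() attempts the internal lock shm_open() is
 * modelled as taking; do_sleep() stands in for sleep(1). */
typedef bool (*try_lock_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned seconds, void *ctx);

/* Returns 0 on success, or EEXIST after giving up: the misleading errno
 * that dsm_impl_posix() then sees. */
static int
model_shm_open_lock(try_lock_fn try_lock, sleep_fn do_sleep, void *ctx)
{
    for (int attempt = 1; attempt <= MODEL_MAX_TRIES; attempt++)
    {
        if (try_lock(ctx))
            return 0;
        if (attempt < MODEL_MAX_TRIES)
            do_sleep(1, ctx);  /* whole-second sleep while contended */
    }
    return EEXIST;
}

/* Example hooks: a lock that frees up on a given attempt (0 = never),
 * and a sleep that only records how long it would have slept. */
struct model_ctx
{
    unsigned slept_seconds;
    int free_on_attempt;
    int attempts;
};

static bool
busy_lock(void *ctx)
{
    struct model_ctx *c = ctx;

    c->attempts++;
    return c->free_on_attempt != 0 && c->attempts >= c->free_on_attempt;
}

static void
record_sleep(unsigned seconds, void *ctx)
{
    struct model_ctx *c = ctx;

    c->slept_seconds += seconds;
}
```

Under this model, a lock that never frees up costs 63 seconds of sleeping before the bogus EEXIST, and even mild contention that clears on the third attempt injects 2 seconds into the caller.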

I have concluded that that implementation of shm_open() is not really
usable for our purposes.  We'll have to change *something* to turn
margay reliably green, not to mention the bogus error reports we can
expect from 15 in the wild and the performance woes that I cannot now
unsee.

So... I think we should select a different default
dynamic_shared_memory_type in initdb.c if defined(__sun__).  Which is
the least terrible?  For sysv, it looks like all the relevant sysctls
that used to be required to use sysv memory became obsolete/automatic
in Sol 10 (note: Sol 9 is long EOL'd), so it should just work AFAICT,
whereas for mmap mode your shared memory data is likely to cause file
I/O because we put the temporary files in your data directory.  I'm
thinking perhaps we should default to dynamic_shared_memory_type=sysv
for 15+.  I don't really want to change it in the back branches, since
nobody has actually complained about "posix" performance and it might
upset someone if we change it for newly initdb'd DBs in a major
release series.  But I'm not an expert or even user of this OS, I'm
just trying to fix the build farm; better ideas welcome.
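A compile-time sketch of the proposed default, using a hypothetical helper (sketch_default_dsm_type) rather than the actual initdb.c code. It ignores any runtime probing initdb may do and only shows the preprocessor branch that would pick sysv when __sun__ is defined.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: choose the default dynamic_shared_memory_type at
 * compile time. Only the __sun__ branch reflects the proposal. */
static const char *
sketch_default_dsm_type(void)
{
#if defined(_WIN32)
    return "windows";
#elif defined(__sun__)
    return "sysv";   /* avoid the sleep/retry-prone shm_open() */
#else
    return "posix";
#endif
}

static bool
sketch_default_is(const char *name)
{
    return strcmp(sketch_default_dsm_type(), name) == 0;
}
```

On a Solaris build the chosen value would presumably end up in the generated postgresql.conf as dynamic_shared_memory_type = sysv, leaving existing clusters untouched.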

Thoughts?


