Re: pg11.1: dsa_area could not attach to segment

From Thomas Munro
Subject Re: pg11.1: dsa_area could not attach to segment
Date
Msg-id CAEepm=20TBrkCZmK9Vi-5r-OAHdygAN0NqHn-uCb51hiZP+9rA@mail.gmail.com
In response to Re: pg11.1: dsa_area could not attach to segment  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Thu, Feb 7, 2019 at 12:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> However I *did* reproduce the error in an isolated, non-production postgres
> instance.  It's a totally empty, untuned v11.1 initdb just for this, running ONLY
> a few simultaneous loops around just one query.  It looks like the simultaneous
> loops sometimes (but not always) fail together.  This has happened a couple
> times.
>
> It looks like one query failed due to "could not attach" in leader, one failed
> due to same in worker, and one failed with "not pinned", which I hadn't seen
> before and appears to be related to DSM, not DSA...

Hmm.  I hadn't considered that angle...  Some kind of interference
between unrelated DSA areas, or other DSM activity?  I will also try
to repro that here...

> I'm also trying to reproduce on other production servers.  But so far nothing
> else has shown the bug, including the other server which hit our original
> (other) DSA error with the queued_alters query.  So I tentatively think there
> really may be something specific to the server (not the hypervisor so maybe the
> OS, libraries, kernel, scheduler, ??).

Initially I thought these might be two symptoms of the same corruption
but I'm now starting to wonder if there are two bugs here: "could not
allocate %d pages" (rare) might be a logic bug in the computation of
contiguous_pages that requires a particular allocation pattern to hit,
and "dsa_area could not attach to segment" (rarissimo) might be
something else requiring concurrency/a race.

One thing that might be useful would be to add a call to
dsa_dump(area) just before the errors are raised, which will write a
bunch of stuff out to stderr and might give us some clues.  And to
print out the variable "index" from get_segment_by_index() when it
fails.  I'm also going to try to work up some better assertions.
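
The instrumentation idea above (dump the area's state, then report the failing index, just before the error path) can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_area`, `toy_dump()` and `toy_get_segment_by_index()` are hypothetical stand-ins invented for illustration, not PostgreSQL's `dsa_area`, `dsa_dump()` or `get_segment_by_index()`, and the "attach" step is faked with a static buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_SEGMENTS 4

/* Hypothetical stand-in for a DSA area; NOT PostgreSQL's real struct. */
typedef struct toy_area
{
    int   handles[TOY_MAX_SEGMENTS];  /* 0 means "no segment at this index" */
    void *mapped[TOY_MAX_SEGMENTS];   /* NULL until the segment is attached */
} toy_area;

/* Toy analogue of dsa_dump(): write the whole segment table to 'out'. */
void
toy_dump(const toy_area *area, FILE *out)
{
    for (size_t i = 0; i < TOY_MAX_SEGMENTS; i++)
        fprintf(out, "segment %zu: handle=%d mapped=%s\n",
                i, area->handles[i], area->mapped[i] ? "yes" : "no");
}

/*
 * Toy analogue of get_segment_by_index().  On failure it dumps the area
 * and logs the offending index before returning NULL, which is the extra
 * diagnostic output suggested in the message.
 */
void *
toy_get_segment_by_index(toy_area *area, size_t index, FILE *log)
{
    /* Fake backing store standing in for real shared memory segments. */
    static char backing[TOY_MAX_SEGMENTS][16];

    assert(area != NULL && log != NULL);

    /* Already attached: return the cached mapping. */
    if (index < TOY_MAX_SEGMENTS && area->mapped[index] != NULL)
        return area->mapped[index];

    /* Failure path: out-of-range index or no handle to attach to. */
    if (index >= TOY_MAX_SEGMENTS || area->handles[index] == 0)
    {
        toy_dump(area, log);
        fprintf(log, "could not attach to segment, index = %zu\n", index);
        return NULL;
    }

    /* "Attach" by pointing at the fake backing store. */
    area->mapped[index] = backing[index];
    return area->mapped[index];
}
```

Calling `toy_get_segment_by_index(&area, 1, stderr)` on an area whose slot 1 has no handle prints the full table followed by the index, so the log shows both the surrounding state and which lookup failed. In PostgreSQL itself the corresponding change would be made inside dsa.c, next to the existing error reports.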
--
Thomas Munro
http://www.enterprisedb.com

