Re: dsa_allocate() faliure

From: Thomas Munro
Subject: Re: dsa_allocate() faliure
Msg-id: CAEepm=1C3t0B9yXDFtNgPDS0c--RZjDQuaCpFCaCaFUbPb6AFQ@mail.gmail.com
In reply to: Re: dsa_allocate() faliure (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: dsa_allocate() faliure (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: dsa_allocate() faliure (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
On Sun, Feb 10, 2019 at 5:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
I couldn't reproduce the problem after 90 minutes with a test case
that otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change.

It also removes an obsolete comment, which is in fact related.  The
comment refers to an output parameter internal_pages_used, which must
have been used to report this exact phenomenon in an earlier
development version.  But there is no such parameter in the committed
version, and instead there is the soft flag to prevent internal
allocation.  I have no view on which approach is best, but yeah, if
we're using a soft flag, it has to work reliably.
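The failure Robert describes, and the contract stated above (a soft flag has to work reliably), can be illustrated with a toy model in C. This is a minimal sketch under simplifying assumptions, not the attached patch and not PostgreSQL's freepage.c: `ToyFpm`, `toy_put_internal()` and its return convention are hypothetical, a single free range stands in for the singleton format, and "initializing the btree" is reduced to counting one internal page. What it shows is where the `soft` check belongs: in the branch that has to turn the singleton into a btree because the newly freed range is not contiguous with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy free page manager: it keeps a single free range
 * ("singleton") until it sees a second, non-contiguous range, at which
 * point it must build a btree.  Building the btree costs one internal
 * page.  This is an illustration only, not freepage.c.
 */
typedef struct
{
    size_t btree_depth;       /* 0 means no btree exists yet */
    size_t singleton_first;   /* first page of the singleton range */
    size_t singleton_npages;  /* length of singleton range; 0 if empty */
    size_t internal_pages;    /* internal pages consumed so far */
} ToyFpm;

/*
 * Record pages [first_page, first_page + npages) as free.  Returns npages
 * on success, or 0 if the put was refused.  A caller passing soft = true
 * promises it cannot tolerate an internal page allocation, so every path
 * that would allocate one must refuse instead.
 */
static size_t
toy_put_internal(ToyFpm *fpm, size_t first_page, size_t npages, bool soft)
{
    assert(npages > 0);

    if (fpm->btree_depth == 0)
    {
        if (fpm->singleton_npages == 0)
        {
            /* Nothing free yet: the new range becomes the singleton. */
            fpm->singleton_first = first_page;
            fpm->singleton_npages = npages;
            return npages;
        }
        if (first_page + npages == fpm->singleton_first)
        {
            /* New range ends where the singleton starts: merge. */
            fpm->singleton_first = first_page;
            fpm->singleton_npages += npages;
            return npages;
        }
        if (fpm->singleton_first + fpm->singleton_npages == first_page)
        {
            /* New range starts where the singleton ends: merge. */
            fpm->singleton_npages += npages;
            return npages;
        }

        /*
         * Not contiguous; we need to initialize the btree, which costs an
         * internal page.  Without this check a soft caller would trigger
         * an allocation anyway, which is the bug described above.
         */
        if (soft)
            return 0;
        fpm->internal_pages++;
        fpm->btree_depth = 1;
        /* A real implementation would now insert both ranges. */
        return npages;
    }

    /* Inserting into an existing btree is not modelled by this toy. */
    return npages;
}
```

In this toy, a soft caller that gets 0 back leaves the manager's state untouched and simply keeps its pages (for example, leaving them on a recycle list, as in the cleanup path mentioned earlier in the thread). A single missed `if (soft)` on any allocating path is enough to break that promise.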

This brings us to a difficult choice: we're about to cut a new
release, and this could in theory be included.  Even though the fix is
quite convincing, it doesn't seem wise to change such complicated code
at the last minute, and I know from an off-list chat that that is also
Robert's view.  So I'll wait until after the release, and we'll have
to live with the bug for another 3 months.

Note that this patch addresses the error "dsa_allocate could not find
%zu free pages".  (The error "dsa_area could not attach to segment" is
something else and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah.  And I love reproducible bugs.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments
