Re: _bt_split(), and the risk of OOM before its critical section

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: _bt_split(), and the risk of OOM before its critical section
Дата
Msg-id CAH2-Wz=r2ae=mLvD-o5hjzcbARBXJSXDZcL_v1J=FCzCb0Vqaw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: _bt_split(), and the risk of OOM before its critical section  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: _bt_split(), and the risk of OOM before its critical section  (Peter Geoghegan <pg@bowt.ie>)
Re: _bt_split(), and the risk of OOM before its critical section  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Mon, May 6, 2019 at 3:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, as _bt_split is currently coded, _bt_truncate has to be a "no
> errors" function, which it isn't.  The pfree for its result is being
> done in an ill-chosen place, too.

I am tempted to move the call to _bt_truncate() out of _bt_split()
entirely on HEAD, possibly relocating it to
nbtsplitloc.c/_bt_findsplitloc(). That way, there is a clearer
separation between how split points are chosen, suffix truncation, and
the mechanical process of executing a legal page split.

> Another problem now that I look at it is that the _bt_getbuf for the right
> sibling is probably not too safe.  And the _bt_vacuum_cycleid() call seems
> a bit dangerous from this standpoint as well.

Yeah, we can tighten those up without much difficulty.

> I'm not really concerned about that one because at that point the
> right page is still in a freshly-pageinit'd state.  It's perhaps
> not quite as nice as having it be zeroes, but it won't look like
> it has any interesting data.

The important question is how VACUUM will recognize it. It's clearly
not as bad as something that causes "failed to re-find parent key"
errors, but I think that VACUUM might not be reclaiming it for the FSM
(haven't checked). Note that _bt_unlink_halfdead_page() is perfectly
happy to ignore the fact that the left sibling of a half-dead page has
a rightlink that doesn't point back to the target. Because, uh, there
might have been a concurrent page deletion, somehow.

We have heard a lot about "failed to re-find parent key" errors from
VACUUM before now because that is about the only strong cross-check
that it does. (Not that I'm arguing that we need more of that.)

> In any case, once we've started to fill the ropaque area, it would really
> be better if we don't call anything that could throw errors.
>
> Maybe we should bite the bullet and use two temp pages, so that none
> of the data ends up in the shared buffer arena until we reach the
> critical section?  The extra copying is slightly annoying, but
> it certainly seems like enforcing this invariant over such a
> long stretch of code is not very maintainable.

While I think that the smarts we have around deciding a split point
will probably improve in future releases, and that we'll probably make
_bt_truncate() itself do more, the actual business of performing a
split has no reason to change that I can think of. I would like to
keep _bt_split() as simple as possible anyway -- it should only be
copying tuples using simple primitives like the bufpage.c routines.
Living with what we have now (not using a temp buffer for the right
page) seems fine.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: _bt_split(), and the risk of OOM before its critical section
Следующее
От: Paul Jungwirth
Дата:
Сообщение: Re: range_agg