Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Дата
Msg-id CA+TgmoaAnNLS-cM7UhGMU=aaDw_LpCGdXKajP1MQApUvvpDG8A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1  (Andres Freund <andres@anarazel.de>)
Список pgsql-general
On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund <andres@anarazel.de> wrote:
> But what *definitely* looks wrong to me is that a TruncateMultiXact() in
> this scenario now (since a couple weeks ago) does a
> SimpleLruReadPage_ReadOnly() in the members slru via
> find_multixact_start(). That just won't work acceptably when we're not
> yet consistent. There very well could not be a valid members segment at
> that point?  Am I missing something?

Yes: that code isn't new.

TruncateMultiXact() called SimpleLruReadPage_ReadOnly() directly in
9.3.0 and every subsequent release until 9.3.7/9.4.2.  The only thing
that's changed is that we've moved that logic into a function called
find_multixact_start() instead of having it directly inside that
function.  We did that because we needed to use the same logic in some
other places.  The reason why 9.3.7/9.4.2 are causing problems for
people that they didn't have previously is because those new,
additional call sites were poorly chosen and didn't include adequate
protection against calling that function with an invalid input value.
What this patch is about is getting back to the situation that we were
in from 9.3.0 - 9.3.6 and 9.4.0 - 9.4.1, where TruncateMultiXact() did
the thing that you're complaining about here but no one else did.

From my point of view, I think that you are absolutely right to
question what's going on in TruncateMultiXact().  It's kooky, and
there may well be bugs buried there.  But I don't think fixing that
should be the priority right now, because we have zero reports of
problems attributable to that logic.  I think the priority should be
on undoing the damage that we did in 9.3.7/9.4.2, when we made other
places to do the same thing.  We started getting trouble reports
attributable to those changes *almost immediately*, which means that
whether or not TruncateMultiXact() is broken, these new call sites
definitely are.  I think we really need to fix those new places ASAP.

> I think at the very least we'll have to skip this step while not yet
> consistent. That really sucks, because we'll possibly end up with
> multixacts that are completely filled by the time we've reached
> consistency.

That would be a departure from the behavior of every existing release
that includes this code based on, to my knowledge, zero trouble
reports.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-general по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Следующее
От: William Dunn
Дата:
Сообщение: Re: Database designpattern - product feature