Re: [BUGS] Old row version in hot chain become visible after a freeze

From Alvaro Herrera
Subject Re: [BUGS] Old row version in hot chain become visible after a freeze
Date
Msg-id 20170912082203.gq5hutuveqlxpvuc@alvherre.pgsql
In response to Re: [BUGS] Old row version in hot chain become visible after a freeze  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [BUGS] Old row version in hot chain become visible after a freeze  (Michael Paquier <michael.paquier@gmail.com>)
Re: [BUGS] Old row version in hot chain become visible after a freeze  ("Wong, Yi Wen" <yiwong@amazon.com>)
List pgsql-bugs
Michael Paquier wrote:
> On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
> > (I also threw in a small sleep between heap_page_prune and
> > HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> > to hopefully make any remaining problems more evident.)
> 
> I am understanding that you mean heap_prepare_freeze_tuple here
> instead of heap_page_prune.

Hmm ... no, I meant adding a sleep after the page is pruned, before the
HeapTupleSatisfiesVacuum call that determines what to do with regard to
freezing.

> > This turned up a few different failure modes, which I fixed until no
> > further problems arose.  With the attached patch, I no longer see any
> > failures (assertion failures) or misbehavior (additional rows), in a few
> > dozen runs, which were easy to come by with the original code.
> 
> Well, you simply removed the assertion ;), and my tests don't show
> additional rows as well, which is nice.

Yeah, the assertion was wrong -- it was essentially assuming that the
window above (between page pruning and HTSV) was of zero size, which is
evidently what caused this whole disaster.

> > The
> > resulting patch, which I like better than the previously proposed idea
> > of skipping the freeze, takes the approach of handling freeze correctly
> > for the cases where the tuple still exists after pruning.
> 
> That's also something I was wondering when looking at the first patch.
> I am unfortunately not as skilled as you are with this area of the
> code (this thread has brought its quantity of study!), so I was not
> able to draw a clear line with what needs to be done. But I am clearly
> +1 with this approach.

Great, thanks.

> > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > cannot be recorded, as observed by Yi Wen.  AFAICS that's not reachable
> > because of the way the array is allocated, so an elog(ERROR) is
> > sufficient.
> >
> > I regret my inability to turn the oneliner into a committable test case,
> > but I think that's beyond what I can do for now.
> 
> Here are some comments about your last patch.
> 
> heap_tuple_needs_freeze looks to be still consistent with
> heap_prepare_freeze_tuple even after what you have changed, which is
> good.

Thanks for confirming.

> Using again the test of Dan at the top of the thread, I am seeing from
> time to time what looks like garbage data in xmax, like that:
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  620 |    0 |  1
>  (0,7) |  625 |   84 |  3
> (2 rows)
> [...]
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  656 |    0 |  1
>  (0,6) |  661 |  128 |  3
> (2 rows)

I bet those are multixact values rather than garbage.  You should see
HEAP_XMAX_IS_MULTI set in t_infomask in those cases, and the values should
be near NextMultiXactId.  (Make sure to checkpoint first if you examine it
with pg_controldata; I think it's easier to obtain it from shmem.  Or you
could just use pg_get_multixact_members().)
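
As a rough illustration of the t_infomask check described above, here is a
minimal standalone sketch in C.  The two bit values are the ones defined in
PostgreSQL's src/include/access/htup_details.h; the enum and the helper
functions xmax_is_multi() and xmax_kind() are hypothetical names made up
for this example and are not part of PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in src/include/access/htup_details.h. */
#define HEAP_XMAX_INVALID   0x0800  /* t_xmax is invalid or aborted */
#define HEAP_XMAX_IS_MULTI  0x1000  /* t_xmax is a MultiXactId */

/* Hypothetical classification, for illustration only. */
typedef enum
{
    XMAX_KIND_INVALID,      /* xmax should be ignored */
    XMAX_KIND_MULTI,        /* xmax is a MultiXactId, not a plain xid */
    XMAX_KIND_PLAIN_XID     /* xmax is an ordinary transaction id */
} XmaxKind;

/* Hypothetical helper: true if the raw xmax is a MultiXactId. */
static bool
xmax_is_multi(uint16_t t_infomask)
{
    return (t_infomask & HEAP_XMAX_IS_MULTI) != 0;
}

/* Hypothetical helper: how to interpret xmax for a given t_infomask. */
static XmaxKind
xmax_kind(uint16_t t_infomask)
{
    if (t_infomask & HEAP_XMAX_INVALID)
        return XMAX_KIND_INVALID;
    if (xmax_is_multi(t_infomask))
        return XMAX_KIND_MULTI;
    return XMAX_KIND_PLAIN_XID;
}
```

The t_infomask value to feed into this can be read from a page with, for
example, heap_page_items() from the pageinspect extension.  If the result
is XMAX_KIND_MULTI, a small xmax such as 84 or 128 would be a multixact id
rather than a transaction id.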

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
