Re: [BUGS] Old row version in hot chain become visible after a freeze
From | Alvaro Herrera
---|---
Subject | Re: [BUGS] Old row version in hot chain become visible after a freeze
Date |
Msg-id | 20170912082203.gq5hutuveqlxpvuc@alvherre.pgsql
In reply to | Re: [BUGS] Old row version in hot chain become visible after a freeze (Michael Paquier <michael.paquier@gmail.com>)
Responses | Re: [BUGS] Old row version in hot chain become visible after a freeze (Michael Paquier <michael.paquier@gmail.com>); Re: [BUGS] Old row version in hot chain become visible after a freeze ("Wong, Yi Wen" <yiwong@amazon.com>)
List | pgsql-bugs
Michael Paquier wrote:

> On Mon, Sep 11, 2017 at 11:01 PM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
> > (I also threw in a small sleep between heap_page_prune and
> > HeapTupleSatisfiesVacuum while testing, just to widen the problem window
> > to hopefully make any remaining problems more evident.)
>
> I am understanding that you mean heap_prepare_freeze_tuple here
> instead of heap_page_prune.

Hmm ... no, I meant adding a sleep after the page is pruned, before the
HeapTupleSatisfiesVacuum call that determines the action to take with
regard to freezing.

> > This turned up a few different failure modes, which I fixed until no
> > further problems arose. With the attached patch, I no longer see any
> > failures (assertion failures) or misbehavior (additional rows) in a few
> > dozen runs, which were easy to come by with the original code.
>
> Well, you simply removed the assertion ;), and my tests don't show
> additional rows either, which is nice.

Yeah, the assertion was wrong -- it was essentially assuming that the
window above (between page pruning and HTSV) was of zero size, which is
evidently what caused this whole disaster.

> > The resulting patch, which I like better than the previously proposed
> > idea of skipping the freeze, takes the approach of handling freeze
> > correctly for the cases where the tuple still exists after pruning.
>
> That's also something I was wondering when looking at the first patch.
> I am unfortunately not as skilled as you are in this area of the
> code (this thread has brought its quantity of study!), so I was not
> able to draw a clear line around what needs to be done. But I am clearly
> +1 on this approach.

Great, thanks.

> > I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
> > cannot be recorded, as observed by Yi Wen. AFAICS that's not reachable
> > because of the way the array is allocated, so an elog(ERROR) is
> > sufficient.
> > I regret my inability to turn the oneliner into a committable test case,
> > but I think that's beyond what I can do for now.
>
> Here are some comments about your last patch.
>
> heap_tuple_needs_freeze looks to be still consistent with
> heap_prepare_freeze_tuple even after what you have changed, which is
> good.

Thanks for confirming.

> Using again the test of Dan at the top of the thread, I am seeing from
> time to time what looks like garbage data in xmax, like this:
>
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  620 |    0 |  1
>  (0,7) |  625 |   84 |  3
> (2 rows)
> [...]
>  ctid  | xmin | xmax | id
> -------+------+------+----
>  (0,1) |  656 |    0 |  1
>  (0,6) |  661 |  128 |  3
> (2 rows)

I bet those are multixact values rather than garbage. You should see
HEAP_XMAX_IS_MULTI in t_infomask in those cases, and the values should be
near NextMultiXactId (make sure to checkpoint if you examine it with
pg_controldata; I think it's easier to obtain it from shmem, or you could
just use pg_get_multixact_members()).

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
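[Editor's note: a minimal sketch of the inspection Alvaro suggests, assuming the pageinspect extension is installed and using `tab` as a stand-in name for the table from Dan's test case. The 0x1000 bit is HEAP_XMAX_IS_MULTI as defined in htup_details.h.]

```sql
-- Sketch only: requires the pageinspect contrib extension.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- If t_infomask bit 0x1000 (HEAP_XMAX_IS_MULTI) is set, t_xmax holds a
-- MultiXactId rather than a plain transaction id.
SELECT lp, t_ctid, t_xmin, t_xmax,
       (t_infomask & x'1000'::int) <> 0 AS xmax_is_multi
FROM heap_page_items(get_raw_page('tab', 0));

-- For a tuple whose xmax is a multixact, list its member xids and lock
-- modes (here 128 stands for an xmax value observed above):
SELECT * FROM pg_get_multixact_members('128');
```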
In the pgsql-bugs list, by sending date:

Previous
From: Michael Paquier
Date:
Message: Re: [BUGS] Old row version in hot chain become visible after a freeze

Next
From: Maksim Karaba
Date:
Message: Re: [BUGS] BUG #14781: server process was terminated by signal 11: Segmentation fault