Re: [BUGS] Old row version in hot chain become visible after a freeze
От | Alvaro Herrera |
---|---|
Тема | Re: [BUGS] Old row version in hot chain become visible after a freeze |
Дата | |
Msg-id | 20170911140103.5akxptyrwgpc25bw@alvherre.pgsql обсуждение исходный текст |
Ответ на | Re: [BUGS] Old row version in hot chain become visible after a freeze (Michael Paquier <michael.paquier@gmail.com>) |
Ответы |
Re: [BUGS] Old row version in hot chain become visible after a freeze
(Michael Paquier <michael.paquier@gmail.com>)
Re: [BUGS] Old row version in hot chain become visible after a freeze (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Список | pgsql-bugs |
Michael Paquier wrote: > On Wed, Sep 6, 2017 at 7:40 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Otherwise, we risk marking the page as all-frozen, and it would be > > skipped by vacuum. If we never come around to HOT-pruning the page, a > > non-permanent xid (or a multixact? not sure that that can happen; > > probably not) would linger unnoticed and cause a DoS condition later > > ("cannot open file pg_clog/1234") when the tuple header is read. > > Yes, you are definitely right here. > > On Wed, Sep 6, 2017 at 10:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Another thing is that if you're going through heap_copy_data, the tuple > > is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here > > but I'm not sure yet. > > Not sure to what you are referring here. Perhaps heap_prune_chain()? > It seems to me that it does the right thing. I meant copy_heap_data, which also "freezes" tuples while copying the heap. In the end, I concluded the fact that it runs with access exclusive lock is sufficient protection. I was worried that in the future we might improve that code and enable it to run without AEL, but after looking at it a little more, it sounds like a big enough project that this is going to be the least of its troubles. Anyway, in order to test this, I tried running this one-liner (files attached): psql -f /tmp/repro.sql ; for i in `seq 1 50`; do psql --no-psqlrc -f /tmp/lock.sql & done (I also threw in a small sleep between heap_page_prune and HeapTupleSatisfiesVacuum while testing, just to widen the problem window to hopefully make any remaining problems more evident.) This turned up a few different failure modes, which I fixed until no further problems arose. With the attached patch, I no longer see any failures (assertion failures) or misbehavior (additional rows), in a few dozen runs, which were easy to come by with the original code. The resulting patch, which I like better than the previously proposed idea of skipping the freeze, takes the approach of handling freeze correctly for the cases where the tuple still exists after pruning. I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple cannot be recorded, as observed by Yi Wen. AFAICS that's not reachable because of the way the array is allocated, so an elog(ERROR) is sufficient. I regret my inability to turn the oneliner into a committable test case, but I think that's beyond what I can do for now. FWIW running this against 9.2 I couldn't detect any problems, which I suppose was expected. Changing the lock mode in the test file to FOR SHARE I only see an enormous amount of deadlocks reported (which makes me confident that doing multixact stuff in 9.3 was a good thing, despite what bug chasers may think.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Вложения
В списке pgsql-bugs по дате отправления: