Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Wood, Dan
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Msg-id: 835EB743-A57A-49AE-ADA7-A16ACCB737D7@amazon.com
In reply to: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List: pgsql-hackers

I’m unclear on what is being repro’d in 9.6.  Are you getting the duplicate rows problem or just the reindex problem?
Are you testing with asserts enabled (I’m not)?
 

If you are getting the dup rows, consider the code in the block in heapam.c that starts with the comment “replace multi
by update xid”.
 
When I repro this I find that MultiXactIdGetUpdateXid() returns 0.  There is an updater in the multixact array; however,
the status is MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I assume this is a preliminary status
before the following row in the HOT chain has its multixact set to NoKeyUpdate.
 

Since 0 is returned, it does precede cutoff_xid, and TransactionIdDidCommit(0) returns false.  This ends up
aborting the multixact on the row even though the real xid is committed.  This sets XMAX to 0, and that row becomes
visible as one of the dups.  Interestingly, the real xid of the updater is 122944 and the cutoff_xid is 122945.
 

I’m still debugging but I start late so I’m passing this incomplete info along now.

On 10/7/17, 4:25 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

    Peter Geoghegan wrote:
    > On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > >> As you must have seen, Alvaro said he has a variant of Dan's original
    > >> script that demonstrates that a problem remains, at least on 9.6+,
    > >> even with today's fix. I think it's the stress-test that plays with
    > >> fillfactor, many clients, etc [1].
    > >
    > > I just execute setup.sql once and then run this shell command,
    > >
    > > while :; do
    > >         psql -e -P pager=off -f ./repro.sql
    > >         for i in `seq 1 5`; do
    > >                 psql -P pager=off -e --no-psqlrc -f ./lock.sql &
    > >         done
    > >         wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
    > >         psql -P pager=off -e --no-psqlrc -f ./report.sql
    > >         echo "done"
    > > done
    >
    > I cannot reproduce the problem on my personal machine using this
    > script/stress-test. I tried to do so on the master branch git tip.
    > This reinforces the theory that there is some timing sensitivity,
    > because the remaining race condition is very narrow.

    Hmm, I think I added a random sleep (max. 100ms) right after the
    HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
    makes the race easier to hit.

    --
    Álvaro Herrera                https://www.2ndQuadrant.com/
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
 


