Possible duplicate release of buffer lock.
From | Kyotaro HORIGUCHI |
---|---|
Subject | Possible duplicate release of buffer lock. |
Date | |
Msg-id | 20160803.173116.111915228.horiguchi.kyotaro@lab.ntt.co.jp |
Responses | Re: Possible duplicate release of buffer lock. (Tom Lane <tgl@sss.pgh.pa.us>) Re: Possible duplicate release of buffer lock. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) Re: Possible duplicate release of buffer lock. (Peter Geoghegan <pg@heroku.com>) |
List | pgsql-hackers |
Hello,

I had an inquiry about the following log messages.

2016-07-20 10:16:58.294 JST,,,3240,,578ed102.ca8,1,,2016-07-20 10:16:50 JST,30/75,0,LOG,00000,"no left sibling (concurrent deletion?) in ""some_index_rel""",,,,,,,,"_bt_unlink_halfdead_page, nbtpage.c:1643",""
2016-07-20 10:16:58.294 JST,,,3240,,578ed102.ca8,2,,2016-07-20 10:16:50 JST,30/75,0,ERROR,XX000,"lock main 13879 is not held",,,,,"automatic vacuum of table ""db.nsp.tbl""",,,"LWLockRelease, lwlock.c:1137",""

These were obtained after pg_upgrade from 9.1.13 to 9.4.

The first line is emitted for simultaneous deletion of an index page, which is impossible by design in a consistent state, so the reported situation should be the result of index corruption before upgrading, specifically, inconsistent sibling pointers around a deleted page.

I noticed the following part in nbtpage.c related to this. It is still the same in master.

nbtpage.c:1635@9.4.8:
> while (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>     /* step right one page */
>     leftsib = opaque->btpo_next;
>     _bt_relbuf(rel, lbuf);
>     if (leftsib == P_NONE)
>     {
>         elog(LOG, "no left sibling (concurrent deletion?) in \"%s\"",
>              RelationGetRelationName(rel));
>         return false;

With this loop condition, if the immediate left sibling of the target is (mistakenly, of course) in the deleted state (and the target somehow points to the deleted page as its left sibling), lbuf eventually moves past the target to its right side. This seems to result in an unintended release of the lock on the target, and hence the second log message.

My point here is that if concurrent deletion cannot happen in the current implementation, this while loop could be removed, and we could immediately error out or log a message,

> if (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>     elog(ERROR, "no left sibling of page %d (concurrent deletion?) in \"%s\"",..

or the while loop should at least stop before overshooting the target.

> while (P_ISDELETED(opaque) || opaque->btpo_next != target)
> {
>     /* step right one page */
>     leftsib = opaque->btpo_next;
>     _bt_relbuf(rel, lbuf);
>     if (leftsib == target || leftsib == P_NONE)
>     {
>         elog(ERROR, "no left sibling of page %d (concurrent deletion?) in \"%s\"",..

I'd like to propose the former, since the latter is still not perfect for such situations anyway.

Any thoughts or opinions?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
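For illustration only, the second variant (stopping the walk before it overshoots the target) can be modeled outside PostgreSQL with a toy page chain. This is a minimal sketch, not the nbtpage.c code: `ToyPage`, `toy_find_left_sibling`, and the array-based chain are hypothetical stand-ins, and buffer locking is left out entirely. Only the loop condition and the `leftsib == target || leftsib == P_NONE` check follow the snippets quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy "no page" sentinel; slot 0 of the array is never a real page here. */
#define P_NONE 0

/* Hypothetical, simplified stand-in for a btree page's opaque area. */
typedef struct ToyPage
{
    unsigned next;      /* right-sibling link, plays the role of btpo_next */
    bool     deleted;   /* plays the role of P_ISDELETED() */
} ToyPage;

/*
 * Walk right from 'start' looking for the live page whose right link is
 * 'target'.  Returns that page's number, or P_NONE if the walk reaches
 * the target itself or runs off the end of the chain.
 */
static unsigned
toy_find_left_sibling(const ToyPage *pages, unsigned start, unsigned target)
{
    unsigned cur = start;

    assert(pages != NULL && start != P_NONE);

    while (pages[cur].deleted || pages[cur].next != target)
    {
        /* step right one page */
        cur = pages[cur].next;

        /* stop before overshooting the target (the proposed extra check) */
        if (cur == target || cur == P_NONE)
            return P_NONE;
    }
    return cur;
}
```

With a chain 1 -> 2 -> 3 -> 4 and target 3, starting from a page 2 that is wrongly marked deleted returns `P_NONE` as soon as the walk steps onto page 3, instead of continuing to page 4 and beyond. Starting from page 1 in an intact chain returns 2.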