Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
From: Amit Kapila
Subject: Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
Date:
Msg-id: CAA4eK1+27zVUa__1V9ysftw48Yn+QEKSikkLumPjbuRacmHaQQ@mail.gmail.com
In reply to: Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Responses: Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD (Thomas Munro <thomas.munro@enterprisedb.com>)
           Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
List: pgsql-hackers
On Tue, Dec 12, 2017 at 4:00 PM, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>> Thanks for looking into it. I will see if we can write some test. In
>>> the meantime, if possible, can you please ask Patrick Hemmer to
>>> verify the attached patch?
>>
>> Our discussion was on the #postgresql Freenode channel. I pointed him
>> at this thread, but I'm not sure if he'll see my message or be able to
>> test.
>
> After discussing with Amit, I'm able to reproduce the scenario in a
> master-standby setup. The issue occurs when we perform a parallel
> index(-only) scan on a page marked BTP_HALF_DEAD. (If a page is
> marked BTP_DELETED, it has already been unlinked from the index.)
>
> When a btree page is deleted during vacuum, it is first marked
> BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked BTP_DELETED
> in _bt_unlink_halfdead_page without releasing the cleanup lock on the
> buffer. Hence, no scan node can lock that buffer, and the issue can't
> be reproduced on the master.
>
> However, after replaying the XLOG_BTREE_MARK_PAGE_HALFDEAD record, the
> standby releases the lock on that buffer. If we force parallelism, an
> index scan on the same page will cause the standby server to hang.
> The following is an (unpleasant) way to reproduce the issue:
>
> On the master (with autovacuum = off):
> 1. create table t1(a int primary key);
> 2. insert into t1 select * from generate_series(1,1000); --generates 3
> leaf pages (block nos 1, 2, 4) and 1 root page (block no 3)
> 3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed to by leaf 2
> 4. analyze t1; --update the stats
> 5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
> that the next vacuum will consider leaf 2 for page deletion

What do you mean by the next vacuum, given that autovacuum is off here?
Are you missing a step that manually performs the vacuum?

> Now, put a breakpoint at _bt_unlink_halfdead_page, so that vacuum
> can't unlink the page.
>
> On the standby:
> 1. force parallelism.
> 2. explain analyze select * from t1 where a>=1 and a<=1000; -- and the
> parallel workers hang at the above-discussed place!
>
> The attached patch fixes the issue. One comment on the same:
>
> + else if (scan->parallel_scan != NULL)
> + {
> +     /* allow the next page to be processed by a parallel worker */
> +     _bt_parallel_release(scan, opaque->btpo_next);
> + }
>
>   /* nope, keep going */
>   if (scan->parallel_scan != NULL)
>
> IMHO, there is no need to add an extra if condition.
> _bt_parallel_release can be included in the next one.
>

I don't think so, because it is quite possible that _bt_readpage has
already released it (consider the case where _bt_readpage returns
false).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
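For convenience, here is the reproduction above as one consolidated
script. The manual VACUUM that the "next vacuum" question points at,
and the settings used to force a parallel plan on the standby, are not
spelled out in the report and are filled in here as assumptions:

    -- Session A (master, autovacuum = off):
    create table t1 (a int primary key);
    insert into t1 select * from generate_series(1, 1000);
    delete from t1 where a >= 367 and a <= 735;  -- empties leaf 2
    analyze t1;
    explain analyze select * from t1 where a >= 1 and a <= 1000;
    -- Attach gdb to this backend, break at _bt_unlink_halfdead_page,
    -- then run the (assumed) manual vacuum; it stops with leaf 2 still
    -- half-dead:
    vacuum t1;

    -- Session B (standby), after the XLOG_BTREE_MARK_PAGE_HALFDEAD
    -- record has been replayed; one plausible way to force a parallel
    -- index(-only) scan on such a small table:
    set max_parallel_workers_per_gather = 2;
    set parallel_setup_cost = 0;
    set parallel_tuple_cost = 0;
    set min_parallel_index_scan_size = 0;
    set enable_seqscan = off;
    explain analyze select * from t1 where a >= 1 and a <= 1000;  -- hangs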
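A minimal sketch of the loop in question may make the objection
concrete. This is abridged pseudocode modeled on the quoted hunk, not
the actual nbtsearch.c source: _bt_readpage() releases the parallel
scan itself as soon as it knows the next page, whether or not it finds
matches, so only the skipped half-dead/deleted path still needs a
release of its own.

    /*
     * Sketch of _bt_readnextpage() (abridged).  Invariant: whichever
     * worker has seized the parallel scan must release it exactly once
     * before looping, or every worker blocks in _bt_parallel_seize().
     */
    for (;;)
    {
        opaque = (BTPageOpaque) PageGetSpecialPointer(page);

        if (!P_IGNORE(opaque))
        {
            /*
             * _bt_readpage() calls _bt_parallel_release() internally
             * once it has btpo_next, regardless of whether it returns
             * true (matches found) or false (none).
             */
            if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
                break;          /* got tuples; scan already released */
        }
        else if (scan->parallel_scan != NULL)
        {
            /*
             * Half-dead or deleted page: _bt_readpage() never ran, so
             * nothing has released the scan yet.  This is the branch
             * the patch adds; folding it into the block below would
             * make the "_bt_readpage returned false" case release
             * twice.
             */
            _bt_parallel_release(scan, opaque->btpo_next);
        }

        /* nope, keep going */
        if (scan->parallel_scan != NULL)
        {
            if (!_bt_parallel_seize(scan, &blkno))
                return false;   /* scan is exhausted */
        }
        else
            blkno = opaque->btpo_next;
        /* ... step to blkno ... */
    }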