Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

Поиск

Список

Период

Сортировка

От	Kuntal Ghosh
Тема	Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
Дата	12 декабря 2017 г. 16:30:39
Msg-id	CAGz5QCLXjeTqZkW=s1iMUgnoOCCp9SPEPghxUffpkXWD=C-E_Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы	Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
Список	pgsql-hackers

Дерево обсуждения

On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> Thanks for looking into it.  I will see if we can write some test.  In
>> the meantime if possible, can you please request Patrick Hemmer to
>> verify the attached patch?
>
> Our discussion was on the #postgresql Freenode channel.  I pointed him
> at this thread, but I'm not sure if he'll see my message or be able to
> test.
After discussing with Amit, I'm able to reproduce the scenario in a
master-standby setup. The issue occurs when we perform parallel
index(-only) scan on a BTP_HALF_DEAD -marked page. (If a page is
marked as BTP_DELETED, it's already unlinked from the index).

When a btree page is deleted during vacuum, it's first marked as
BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked as BTP_DELETED
in _bt_unlink_halfdead_page without releasing cleanup lock on the
buffer. Hence, any scan node cannot lock the same buffer. So, the
issue can't be reproduced on master.

However, after replaying XLOG_BTREE_MARK_PAGE_HALFDEAD record, standby
releases the lock on the same buffer. If we force parallelism, an
index scan on the same page will cause hang the standby server.
Following is a (unpleasant)way to reproduce the issue:

In master (with autovacuum = off):
1. create table t1(a int primary key);
2. insert into t1 select * from generate_series(1,1000); --generates 3
leaf nodes (block no 1,2,4) and 1 root node (block no 3)
3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed by leaf 2
4. analyze t1; --update the stats
5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
that the next vacuum will consider leaf 2 for page deletion
Now, put a break point at _bt_unlink_halfdead_page, so that vacuum
can't unlink the page.

In standby,
1. force parallelism.
2. explain analyze select * from t1 where a>=1 and a<=1000; and the
parallel workers hang at the above-discussed place!

The attached patch fixes the issue. One comment on the same:
+ else if (scan->parallel_scan != NULL)
+ {
+ /* allow next page be processed by parallel worker */
+ _bt_parallel_release(scan, opaque->btpo_next);
+ }

  /* nope, keep going */
  if (scan->parallel_scan != NULL)

IMHO, There is no need to add an extra if condition.
_bt_parallel_release can be included in the next one.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Ashutosh Bapat
Дата: 12 декабря 2017 г., 16:13:11
Сообщение: Re: [HACKERS] Partition-wise aggregation/grouping

Следующее

От: Jesper Pedersen
Дата: 12 декабря 2017 г., 16:34:53
Сообщение: Re: [HACKERS] Runtime Partition Pruning

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

Предыдущее

Следующее