Re: [HACKERS] Parallel Index Scans

Поиск

Список

Период

Сортировка

От	Anastasia Lubennikova
Тема	Re: [HACKERS] Parallel Index Scans
Дата	23 декабря 2016 г. 16:12:22
Msg-id	4a379d09-6226-6445-21d9-68434b251441@postgrespro.ru обсуждение
Ответ на	Re: [HACKERS] Parallel Index Scans (Amit Kapila <amit.kapila16@gmail.com>)
Ответы	Re: [HACKERS] Parallel Index Scans
Список	pgsql-hackers

Дерево обсуждения

22.12.2016 07:19, Amit Kapila:
> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
> <lubennikovaav@gmail.com> wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:       tested, passed
>> Spec compliant:           tested, passed
>> Documentation:            tested, passed
>>
>> Hi, thank you for the patch.
>> Results are very promising. Do you see any drawbacks of this feature or something that requires more testing?
>>
> I think you can focus on the handling of array scan keys for testing.
> In general, one of my colleagues has shown interest in testing this
> patch and I think he has tested as well but never posted his findings.
> I will request him to share his findings and what kind of tests he has
> done, if any.


Check please code related to buffer locking and pinning once again.
I got the warning. Here are the steps to reproduce it:
Except "autovacuum = off" config is default.

pgbench -i -s 100 test
pgbench -c 10 -T 120 test
    SELECT count(aid) FROM pgbench_accounts    WHERE aid > 1000 AND aid < 900000 AND bid > 800 AND bid < 900;
WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459, 
blockNum=2469, flags=0x93800000, refcount=1 1) count
-------     0
(1 row)

postgres=# select 16459::regclass;       regclass
----------------------- pgbench_accounts_pkey


>> 2. Don't you mind to rename 'amestimateparallelscan' to let's say 'amparallelscan_spacerequired'
>> or something like this?
> Sure, I am open to other names, but IMHO, lets keep "estimate" in the
> name to keep it consistent with other parallel stuff. Refer
> execParallel.c to see how widely this word is used.
>
>> As far as I understand there is nothing to estimate, we know this size
>> for sure. I guess that you've chosen this name because of 'heap_parallelscan_estimate'.
>> But now it looks similar to 'amestimate' which refers to indexscan cost for optimizer.
>> That leads to the next question.
>>
> Do you mean 'amcostestimate'?  If you want we can rename it
> amparallelscanestimate to be consistent with amcostestimate.

I think that 'amparallelscanestimate' seems less ambiguous than 
amestimateparallelscan.
But it's up to you. There are enough comments to understand the purpose 
of this field.
>
>> And why do you call it pageStatus? What does it have to do with page?
>>
> During scan this tells us whether next page is available for scan.
> Another option could be to name it as scanStatus, but not sure if that
> is better.  Do you think if we add a comment like "indicates whether
> next page is available for scan" for this variable then it will be
> clear?

Yes, I think it describes the flag better.
>> 5. Comment for _bt_parallel_seize() says:
>> "False indicates that we have reached the end of scan for
>>   current scankeys and for that we return block number as P_NONE."
>>
>>   What is the reason to check (blkno == P_NONE) after checking (status == false)
>>   in _bt_first() (see code below)? If comment is correct
>>   we'll never reach _bt_parallel_done()
>>
>> +               blkno = _bt_parallel_seize(scan, &status);
>> +               if (status == false)
>> +               {
>> +                       BTScanPosInvalidate(so->currPos);
>> +                       return false;
>> +               }
>> +               else if (blkno == P_NONE)
>> +               {
>> +                       _bt_parallel_done(scan);
>> +                       BTScanPosInvalidate(so->currPos);
>> +                       return false;
>> +               }
>>
> The first time master backend or worker hits last page (calls this
> API), it will return P_NONE and after that any worker tries to fetch
> next page, it will return status as false.  I think we can expand a
> comment to explain it clearly.  Let me know, if you need more
> clarification, I can explain it in detail.
>
Got it,
I think you can add this explanation to the comment for 
_bt_parallel_seize().

>> 6. To avoid code duplication, I would wrap this into the function
>>
>> +       /* initialize moreLeft/moreRight appropriately for scan direction */
>> +       if (ScanDirectionIsForward(dir))
>> +       {
>> +               so->currPos.moreLeft = false;
>> +               so->currPos.moreRight = true;
>> +       }
>> +       else
>> +       {
>> +               so->currPos.moreLeft = true;
>> +               so->currPos.moreRight = false;
>> +       }
>> +       so->numKilled = 0;                      /* just paranoia */
>> +       so->markItemIndex = -1;         /* ditto */
>>
> Okay, I think we can write a separate function (probably inline
> function) for above.
>
>> And after that we can also get rid of _bt_parallel_readpage() which only
>> bring another level of indirection to the code.
>>
> See, this function is responsible for multiple actions like
> initializing moreLeft/moreRight positions, reading next page, dropping
> the lock/pin.  So replicating all these actions in the caller will
> make the code in caller less readable as compared to now.  Consider
> this point and let me know your view on same.

Thank you for clarification, now I agree with your implementation.
I've just missed that we also handle lock in this function.


Performance results with 2 parallel workers are about 1.5-3 times 
better, just like in your tests.
So, no doubt, this feature will be useful.
But I'm trying to find the worst cases for this feature. And I suppose 
we should test parallel index scans with
concurrent insertions. More parallel readers we have, higher the 
concurrency.
I doubt that it can significantly decrease performance, because number 
of parallel readers is not that big,
but it is worth testing.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Parallel Index Scans