Re: Block level parallel vacuum WIP

From: Masahiko Sawada
Subject: Re: Block level parallel vacuum WIP
Date:
Msg-id: CAD21AoB70E=wvj0bE5R7g11H3wxkZeODvVEEEd2NLToAtUv1Dg@mail.gmail.com
In reply to: Re: Block level parallel vacuum WIP  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses: Re: Block level parallel vacuum WIP  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sat, Sep 10, 2016 at 7:44 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
>
>
> On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>>
>> On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
>> > <michael.paquier@gmail.com> wrote:
>> >> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada
>> >> <sawada.mshk@gmail.com> wrote:
>> >>> As for the PoC, I implemented parallel vacuum so that each worker
>> >>> processes both phase 1 and phase 2 for a particular block range.
>> >>> Suppose we vacuum a 1000-block table with 4 workers: each worker
>> >>> processes 250 consecutive blocks in phase 1 and then reclaims dead
>> >>> tuples from the heap and indexes (phase 2).
>> >>
>> >> So each worker is assigned a range of blocks and processes them in
>> >> parallel? That does not sound good performance-wise. I recall emails
>> >> from Robert and Amit on this matter for sequential scan, saying that
>> >> this approach would hurt performance, particularly on rotating disks.
>> >>
>> >
>> > The implementation in the patch is the same as what we initially
>> > thought of for sequential scan, but it turned out not to be a good
>> > approach because it can lead to an unbalanced distribution of work
>> > among workers.  Suppose one worker finishes its assigned range early;
>> > it won't be able to take on more work.
>>
>> Ah, so that was the reason. Thanks for confirming my doubts about what
>> is proposed.
>> --
>
>
> I believe Sawada-san has got enough feedback on the design to work out the
> next steps. It seems natural that the vacuum workers are assigned a portion
> of the heap to scan and collect dead tuples (similar to what the patch does)
> and that the same workers are responsible for the second phase of the heap
> scan.

Yeah, thank you for the feedback.
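
To make the PoC's partitioning concrete, here is a minimal standalone
sketch, assuming a simple even split of consecutive blocks as described
above; the WorkerRange struct and all names are illustrative and are
not code from the patch:

#include <stdio.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* mirrors PostgreSQL's BlockNumber */

/* Hypothetical per-worker assignment, for this illustration only. */
typedef struct WorkerRange
{
    BlockNumber start;      /* first block this worker scans */
    BlockNumber nblocks;    /* number of consecutive blocks */
} WorkerRange;

int
main(void)
{
    BlockNumber rel_nblocks = 1000; /* table size from the example */
    int         nworkers = 4;
    BlockNumber per_worker = rel_nblocks / nworkers;
    BlockNumber remainder = rel_nblocks % nworkers;
    WorkerRange ranges[4];
    BlockNumber next = 0;

    /* Hand each worker one consecutive chunk; spread any remainder. */
    for (int i = 0; i < nworkers; i++)
    {
        ranges[i].start = next;
        ranges[i].nblocks = per_worker + (i < (int) remainder ? 1 : 0);
        next += ranges[i].nblocks;
        printf("worker %d: blocks %u..%u (phase 1 scan, phase 2 reclaim)\n",
               i, ranges[i].start, ranges[i].start + ranges[i].nblocks - 1);
    }
    return 0;
}

As Amit notes above, the weakness of this static split is load
imbalance: a worker that finishes its range early has no way to take on
more work.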

> But as far as index scans are concerned, I agree with Tom that the best
> strategy is to assign a different index to each worker process and let them
> vacuum indexes in parallel. That way the work for each worker process is
> clearly cut out and they don't contend for the same resources, which means
> the first patch, allowing multiple backends to wait for a cleanup buffer, is
> not required. Later we could extend it further such that multiple workers
> can vacuum a single index by splitting the work on physical boundaries, but
> even that will ensure clear demarcation of work and hence no contention on
> index blocks.

I also agree with this idea.
Each worker vacuums a different index, and the leader process then
updates all index statistics after parallel mode has exited.
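
A minimal standalone sketch of that division of labor, using
hypothetical stand-in types rather than the real Relation and index
vacuuming machinery:

#include <stdio.h>

/* Hypothetical stand-ins, for this illustration only. */
typedef struct IndexStats
{
    const char *index_name;
    long        num_pages;  /* filled in by the worker that vacuumed it */
} IndexStats;

static void
worker_vacuum_index(IndexStats *stats)
{
    /* Each worker bulk-deletes dead tuples from exactly one index,
     * so no two workers contend for the same index blocks. */
    stats->num_pages = 128;     /* placeholder result */
}

int
main(void)
{
    IndexStats  indexes[] = {{"idx_a", 0}, {"idx_b", 0}, {"idx_c", 0}};
    int         nindexes = 3;

    /* One index per worker (launched in parallel in the real design). */
    for (int i = 0; i < nindexes; i++)
        worker_vacuum_index(&indexes[i]);

    /* The leader updates the statistics only after parallel mode has
     * exited, as described above. */
    for (int i = 0; i < nindexes; i++)
        printf("leader: update stats for %s: %ld pages\n",
               indexes[i].index_name, indexes[i].num_pages);
    return 0;
}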

I'm implementing this patch, but I need to resolve a problem with the
relation extension lock being taken by multiple parallel workers.
In a parallel vacuum, multiple workers could try to acquire the
exclusive extension lock on the same relation.
Since the workers all belong to the same locking group, their extension
lock requests do not conflict with one another, so multiple workers can
extend the FSM or VM at the same time and end up with an error.
I think the same issue would affect parallel update operations, so I'd
like to discuss this in advance.
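
Roughly, the problematic call pattern looks like the following sketch.
This assumes the PostgreSQL backend environment rather than a
standalone program; LockRelationForExtension() and
UnlockRelationForExtension() are the existing lmgr functions, while
worker_extend_fsm() is a hypothetical stand-in for the worker's code
path:

#include "postgres.h"
#include "storage/lmgr.h"

/*
 * Hypothetical worker-side code path, for illustration only.
 *
 * Under group locking, a parallel worker's lock requests do not
 * conflict with locks already held by other members of the same
 * parallel group.  So if two workers reach this point at the same
 * time while extending the FSM or visibility map, both
 * LockRelationForExtension() calls succeed, both workers try to
 * extend the same fork concurrently, and vacuum ends up with an
 * error.
 */
static void
worker_extend_fsm(Relation onerel)
{
    LockRelationForExtension(onerel, ExclusiveLock);
    /* ... extend the free space map here ... */
    UnlockRelationForExtension(onerel, ExclusiveLock);
}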

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


