Re: [HACKERS] Block level parallel vacuum

From: Mahendra Singh
Subject: Re: [HACKERS] Block level parallel vacuum
Date:
Msg-id: CAKYtNAq8aJPy1ArMOqwQ-v6XHKWJAJ72hAdgkfOS5xvt2cp23Q@mail.gmail.com
In reply to: Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
Hi
I took all the attached patches (v32-01 to v32-4) and one of Dilip's patches from the "Questions/Observations related to Gist vacuum" mail thread. On top of all these patches, I created one more patch to functionally test parallel vacuum with the entire existing test suite.
For reference, I am attaching the patch.

What does this patch do?
As we know, vacuum uses parallel workers only if the PARALLEL option is given. So, to test it, I used the existing GUC force_parallel_mode and tested parallel vacuuming.

If force_parallel_mode is set to regress and the PARALLEL option is not given with VACUUM, I force vacuum to use parallel workers. If there is only one index, no parallel degree is given (or the PARALLEL option is not given) and force_parallel_mode = regress, I launch one parallel worker (the leader does not vacuum the index itself in this case). If there is more than one index, I use the leader as a worker for one index and launch workers for all the other indexes.

After applying this patch and setting force_parallel_mode = regress, all test cases pass (make check-world).

I have a question regarding my patch. Should we vacuum using parallel workers even when force_parallel_mode is set to on, or should we add a new GUC to test parallel vacuum with the existing test suite?

Please let me know your thoughts on this patch.

Thanks and Regards
Mahendra Thalor

On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > I am wondering whether we can write patches for both approaches (a.
> > > > > > compute shared costs and delay based on them; b. divide the I/O
> > > > > > cost among workers as described in the email above[1]) and run
> > > > > > some tests to see how the throttling behaves.  That might help us
> > > > > > decide the best strategy for solving this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested in quickly hacking on
> > > > > approach (a), we can do some testing and compare.  Sawada-san,
> > > > > would you by any chance be interested in writing a POC for approach
> > > > > (a)?  Otherwise, I will try to write it after finishing the first
> > > > > one (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time by
> > > the number of workers and add it to the heap's sleep time?  Then, it
> > > will be a bit easier to compare the data between the parallel and
> > > non-parallel cases.
> I have come up with a patch to compute the total delay during the
> vacuum.  The idea for computing the total cost delay is:
>
> Total cost delay = Total delay of heap scan + (Total delay of
> index / number of workers)
>
> The patch is attached.
>
> I have prepared this patch on top of the latest parallel vacuum
> patch[1].  I have also rebased the patch for approach (b), which
> divides the vacuum cost limit, and done some testing to measure the
> I/O throttling.  The attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased them on top of v31-0006, because v31-0006 implements the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit
> does the same with another approach.  However,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (with just a 1-2 line conflict).
>
> Testing:  I performed two tests, one with same-size indexes and the
> second with different-size indexes, and measured the total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (same-size indexes):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                  Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
> Total Delay      1784 ms         1398 ms           1938 ms
>
>
> Test2 (variable number of dead tuples per index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                  Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
> Total Delay      1438 ms         1029 ms           1529 ms
>
>
> Conclusion:
> 1. The tests show that the total I/O delay is significantly lower with
> parallel vacuum (about 22% lower in Test1 and 28% lower in Test2).
> 2. With the vacuum cost divide patch, the problem is solved, but the
> delay is a bit higher than with the non-parallel version.  The reason
> could be the problem discussed in [2], but it needs further
> investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that the v31-0006 patch doesn't work correctly, so I've
attached an updated version of the patch that also incorporates the
comments I have received so far. Sorry for the inconvenience. I'll
apply your 0001 patch and also test the total delay time.

Regards,

--
Masahiko Sawada
Attachments
