Discussion: Re: [HACKERS] Block level parallel vacuum

Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier
<> wrote:
> On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <> wrote:
>> Yeah, I was thinking the commit was related to this issue but, as
>> Amit mentioned, this error is emitted by DROP SCHEMA CASCADE.
>> I haven't found the cause of this issue yet. With the previous
>> version of the patch, autovacuum workers were working with one parallel
>> worker but never dropped relations. So it's possible that the error was
>> not related to the patch, but anyway I'll continue to work
>> on that.
>
> This depends on the extension lock patch from
> https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=/
> if I am following correctly. So I propose to mark this patch as
> returned with feedback for now, and come back to it once the root
> problems are addressed. Feel free to correct me if you think that's
> not adapted.

I've re-designed the parallel vacuum patch; the latest version is
attached. As discussed so far, this patch depends on the extension lock
patch[1]. However, I think we can discuss the design of parallel vacuum
independently from that patch, which is why I'm proposing the new patch.
In this patch, I restructured and refined lazy_scan_heap(), because it is
a single big function and not suitable for being made parallel.

The parallel vacuum worker processes wait for commands from the parallel
vacuum leader process. Before entering each phase of lazy vacuum, such as
scanning the heap, vacuuming indexes, and vacuuming the heap, the leader
process switches all workers to the next state. The vacuum worker
processes do the job according to their state and wait for the next
command after finishing. Also, before entering the next phase, the leader
process does some preparation work while the vacuum workers are sleeping,
for example clearing the shared dead tuple space before entering the
'scanning heap' phase. The status of the vacuum workers is stored in a
DSM area pointed to by WorkerState variables and is controlled by the
leader process. For the basic design and performance improvements, please
refer to my presentation at PGCon 2018[2].
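
To make the flow above a bit more concrete, here is a minimal sketch of the
worker-side loop, only as I understand it from the description; the phase
names, the WorkerState fields, and the latch handling are illustrative
assumptions rather than the actual identifiers used in the patch.

#include "postgres.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"

/* Hypothetical per-worker state kept in the DSM area (fields assumed). */
typedef enum VacWorkerPhase
{
    VACPHASE_IDLE,
    VACPHASE_SCAN_HEAP,
    VACPHASE_VACUUM_INDEX,
    VACPHASE_VACUUM_HEAP,
    VACPHASE_EXIT
} VacWorkerPhase;

typedef struct WorkerState
{
    VacWorkerPhase phase;       /* set by the leader, read by the worker */
} WorkerState;

static void
vacuum_worker_main_loop(WorkerState *ws)
{
    for (;;)
    {
        /* Sleep until the leader switches our phase and sets our latch. */
        WaitLatch(MyLatch, WL_LATCH_SET, -1L, PG_WAIT_IPC);
        ResetLatch(MyLatch);

        switch (ws->phase)
        {
            case VACPHASE_SCAN_HEAP:
                /* scan the assigned heap blocks and collect dead tuples */
                break;
            case VACPHASE_VACUUM_INDEX:
                /* bulk-delete dead tuples from the assigned index */
                break;
            case VACPHASE_VACUUM_HEAP:
                /* remove dead tuples from the assigned heap blocks */
                break;
            case VACPHASE_EXIT:
                return;
            default:
                break;          /* spurious wakeup; keep waiting */
        }
        ws->phase = VACPHASE_IDLE;      /* report completion to the leader */
    }
}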

The number of parallel vacuum workers is determined according to either
the table size or the PARALLEL option of the VACUUM command. The maximum
number of parallel workers is max_parallel_maintenance_workers.
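
Just to make that sizing rule concrete, a rough sketch follows; the
function name and especially the blocks-per-worker threshold are
placeholders I made up for illustration, not values taken from the patch.

#include "postgres.h"
#include "storage/block.h"

/* Assumed heuristic: roughly one worker per 1GB of heap (8kB blocks). */
#define VACUUM_BLOCKS_PER_WORKER ((BlockNumber) 131072)

extern int max_parallel_maintenance_workers;    /* existing GUC */

static int
choose_parallel_vacuum_workers(BlockNumber nblocks, int nrequested)
{
    int         nworkers;

    if (nrequested > 0)
        nworkers = nrequested;          /* explicit PARALLEL option */
    else
        nworkers = nblocks / VACUUM_BLOCKS_PER_WORKER;  /* table-size based */

    return Min(nworkers, max_parallel_maintenance_workers);
}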

I've separated the code for the vacuum worker process into
backends/commands/vacuumworker.c, and created the
includes/commands/vacuum_internal.h file to declare the definitions for
lazy vacuum.

For autovacuum, this patch allows the autovacuum worker process to use
the parallel option according to the relation size or the reloption.
Regarding the autovacuum delay, however, since there are no slots for
autovacuum's parallel workers in AutoVacuumShmem, this patch doesn't
support changing the autovacuum delay configuration while it is running.

Please apply this patch together with the extension lock patch[1] when
testing, as this patch can try to extend visibility map pages concurrently.

[1] https://www.postgresql.org/message-id/CAD21AoBn8WbOt21MFfj1mQmL2ZD8KVgMHYrOe1F5ozsQC4Z_hw%40mail.gmail.com
[2] https://www.pgcon.org/2018/schedule/events/1202.en.html

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Tue, Aug 14, 2018 at 9:31 AM Masahiko Sawada <> wrote:
>
> On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier
> <> wrote:
> > On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <> wrote:
> >> Yeah, I was thinking the commit was related to this issue but, as
> >> Amit mentioned, this error is emitted by DROP SCHEMA CASCADE.
> >> I haven't found the cause of this issue yet. With the previous
> >> version of the patch, autovacuum workers were working with one parallel
> >> worker but never dropped relations. So it's possible that the error was
> >> not related to the patch, but anyway I'll continue to work
> >> on that.
> >
> > This depends on the extension lock patch from
> > https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=/
> > if I am following correctly. So I propose to mark this patch as
> > returned with feedback for now, and come back to it once the root
> > problems are addressed. Feel free to correct me if you think that's
> > not adapted.
>
> I've re-designed the parallel vacuum patch. Attached the latest
> version patch. As the discussion so far, this patch depends on the
> extension lock patch[1]. However I think we can discuss the design
> part of parallel vacuum independently from that patch. That's way I'm
> proposing the new patch. In this patch, I structured and refined the
> lazy_scan_heap() because it's a single big function and not suitable
> for making it parallel.
>
> The parallel vacuum worker processes keep waiting for commands from
> the parallel vacuum leader process. Before entering each phase of lazy
> vacuum such as scanning heap, vacuum index and vacuum heap, the leader
> process changes the all workers state to the next state. Vacuum worker
> processes do the job according to the their state and wait for the
> next command after finished. Also in before entering the next phase,
> the leader process does some preparation works while vacuum workers is
> sleeping; for example, clearing shared dead tuple space before
> entering the 'scanning heap' phase. The status of vacuum workers are
> stored into a DSM area pointed by WorkerState variables, and
> controlled by the leader process. FOr the basic design and performance
> improvements please refer to my presentation at PGCon 2018[2].
>
> The number of parallel vacuum workers is determined according to
> either the table size or PARALLEL option in VACUUM command. The
> maximum of parallel workers is max_parallel_maintenance_workers.
>
> I've separated the code for vacuum worker process to
> backends/commands/vacuumworker.c, and created
> includes/commands/vacuum_internal.h file to declare the definitions
> for the lazy vacuum.
>
> For autovacuum, this patch allows autovacuum worker process to use the
> parallel option according to the relation size or the reloption. But
> autovacuum delay, since there is no slots for parallel worker of
> autovacuum in AutoVacuumShmem this patch doesn't support the change of
> the autovacuum delay configuration during running.
>

Attached is a rebased version of the patch against the current HEAD.

> Please apply this patch with the extension lock patch[1] when testing
> as this patch can try to extend visibility map pages concurrently.
>

Because the patch leads to a performance degradation in the case of
bulk-loading into a partitioned table, I think that the original
proposal, which makes group locking conflict for relation extension
locks, is the more realistic approach. So I worked on this with a simple
patch instead of [1]. Attached are three patches:

* The 0001 patch publishes some static functions such as
heap_parallelscan_startblock_init so that the parallel vacuum code can
use them.
* The 0002 patch makes group locking conflict for relation extension locks.
* The 0003 patch adds a parallel option to lazy vacuum.

Please review them.

[1] https://www.postgresql.org/message-id/CAD21AoBn8WbOt21MFfj1mQmL2ZD8KVgMHYrOe1F5ozsQC4Z_hw%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Tue, Oct 30, 2018 at 5:30 PM Masahiko Sawada <> wrote:
>
> On Tue, Aug 14, 2018 at 9:31 AM Masahiko Sawada <> wrote:
> >
> > On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier
> > <> wrote:
> > > On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <> wrote:
> > >> Yeah, I was thinking the commit was related to this issue but, as
> > >> Amit mentioned, this error is emitted by DROP SCHEMA CASCADE.
> > >> I haven't found the cause of this issue yet. With the previous
> > >> version of the patch, autovacuum workers were working with one parallel
> > >> worker but never dropped relations. So it's possible that the error was
> > >> not related to the patch, but anyway I'll continue to work
> > >> on that.
> > >
> > > This depends on the extension lock patch from
> > > https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=/
> > > if I am following correctly. So I propose to mark this patch as
> > > returned with feedback for now, and come back to it once the root
> > > problems are addressed. Feel free to correct me if you think that's
> > > not adapted.
> >
> > I've re-designed the parallel vacuum patch. Attached the latest
> > version patch. As the discussion so far, this patch depends on the
> > extension lock patch[1]. However I think we can discuss the design
> > part of parallel vacuum independently from that patch. That's way I'm
> > proposing the new patch. In this patch, I structured and refined the
> > lazy_scan_heap() because it's a single big function and not suitable
> > for making it parallel.
> >
> > The parallel vacuum worker processes keep waiting for commands from
> > the parallel vacuum leader process. Before entering each phase of lazy
> > vacuum such as scanning heap, vacuum index and vacuum heap, the leader
> > process changes the all workers state to the next state. Vacuum worker
> > processes do the job according to the their state and wait for the
> > next command after finished. Also in before entering the next phase,
> > the leader process does some preparation works while vacuum workers is
> > sleeping; for example, clearing shared dead tuple space before
> > entering the 'scanning heap' phase. The status of vacuum workers are
> > stored into a DSM area pointed by WorkerState variables, and
> > controlled by the leader process. FOr the basic design and performance
> > improvements please refer to my presentation at PGCon 2018[2].
> >
> > The number of parallel vacuum workers is determined according to
> > either the table size or PARALLEL option in VACUUM command. The
> > maximum of parallel workers is max_parallel_maintenance_workers.
> >
> > I've separated the code for vacuum worker process to
> > backends/commands/vacuumworker.c, and created
> > includes/commands/vacuum_internal.h file to declare the definitions
> > for the lazy vacuum.
> >
> > For autovacuum, this patch allows autovacuum worker process to use the
> > parallel option according to the relation size or the reloption. But
> > autovacuum delay, since there is no slots for parallel worker of
> > autovacuum in AutoVacuumShmem this patch doesn't support the change of
> > the autovacuum delay configuration during running.
> >
>
> Attached rebased version patch to the current HEAD.
>
> > Please apply this patch with the extension lock patch[1] when testing
> > as this patch can try to extend visibility map pages concurrently.
> >
>
> Because the patch leads performance degradation in the case where
> bulk-loading to a partitioned table I think that the original
> proposal, which makes group locking conflict when relation extension
> locks, is more realistic approach. So I worked on this with the simple
> patch instead of [1]. Attached three patches:
>
> * 0001 patch publishes some static functions such as
> heap_paralellscan_startblock_init so that the parallel vacuum code can
> use them.
> * 0002 patch makes the group locking conflict when relation extension locks.
> * 0003 patch add paralel option to lazy vacuum.
>
> Please review them.
>

Oops, forgot to attach patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Yura Sokolov
Date:
Excuse me for being noisy.

Increasing vacuum's ring buffer improves vacuum by up to 6 times:
https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
This is a one-line change.

How much improvement does parallel vacuum give?

On 31.10.2018 3:23, Masahiko Sawada wrote:
> On Tue, Oct 30, 2018 at 5:30 PM Masahiko Sawada <> wrote:
>>
>> On Tue, Aug 14, 2018 at 9:31 AM Masahiko Sawada <> wrote:
>>>
>>> On Thu, Nov 30, 2017 at 11:09 AM, Michael Paquier
>>> <> wrote:
>>>> On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <> wrote:
>>>>> Yeah, I was thinking the commit was related to this issue but, as
>>>>> Amit mentioned, this error is emitted by DROP SCHEMA CASCADE.
>>>>> I haven't found the cause of this issue yet. With the previous
>>>>> version of the patch, autovacuum workers were working with one parallel
>>>>> worker but never dropped relations. So it's possible that the error was
>>>>> not related to the patch, but anyway I'll continue to work
>>>>> on that.
>>>>
>>>> This depends on the extension lock patch from
>>>> https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=/
>>>> if I am following correctly. So I propose to mark this patch as
>>>> returned with feedback for now, and come back to it once the root
>>>> problems are addressed. Feel free to correct me if you think that's
>>>> not adapted.
>>>
>>> I've re-designed the parallel vacuum patch. Attached the latest
>>> version patch. As the discussion so far, this patch depends on the
>>> extension lock patch[1]. However I think we can discuss the design
>>> part of parallel vacuum independently from that patch. That's way I'm
>>> proposing the new patch. In this patch, I structured and refined the
>>> lazy_scan_heap() because it's a single big function and not suitable
>>> for making it parallel.
>>>
>>> The parallel vacuum worker processes keep waiting for commands from
>>> the parallel vacuum leader process. Before entering each phase of lazy
>>> vacuum such as scanning heap, vacuum index and vacuum heap, the leader
>>> process changes the all workers state to the next state. Vacuum worker
>>> processes do the job according to the their state and wait for the
>>> next command after finished. Also in before entering the next phase,
>>> the leader process does some preparation works while vacuum workers is
>>> sleeping; for example, clearing shared dead tuple space before
>>> entering the 'scanning heap' phase. The status of vacuum workers are
>>> stored into a DSM area pointed by WorkerState variables, and
>>> controlled by the leader process. FOr the basic design and performance
>>> improvements please refer to my presentation at PGCon 2018[2].
>>>
>>> The number of parallel vacuum workers is determined according to
>>> either the table size or PARALLEL option in VACUUM command. The
>>> maximum of parallel workers is max_parallel_maintenance_workers.
>>>
>>> I've separated the code for vacuum worker process to
>>> backends/commands/vacuumworker.c, and created
>>> includes/commands/vacuum_internal.h file to declare the definitions
>>> for the lazy vacuum.
>>>
>>> For autovacuum, this patch allows autovacuum worker process to use the
>>> parallel option according to the relation size or the reloption. But
>>> autovacuum delay, since there is no slots for parallel worker of
>>> autovacuum in AutoVacuumShmem this patch doesn't support the change of
>>> the autovacuum delay configuration during running.
>>>
>>
>> Attached rebased version patch to the current HEAD.
>>
>>> Please apply this patch with the extension lock patch[1] when testing
>>> as this patch can try to extend visibility map pages concurrently.
>>>
>>
>> Because the patch leads performance degradation in the case where
>> bulk-loading to a partitioned table I think that the original
>> proposal, which makes group locking conflict when relation extension
>> locks, is more realistic approach. So I worked on this with the simple
>> patch instead of [1]. Attached three patches:
>>
>> * 0001 patch publishes some static functions such as
>> heap_paralellscan_startblock_init so that the parallel vacuum code can
>> use them.
>> * 0002 patch makes the group locking conflict when relation extension locks.
>> * 0003 patch add paralel option to lazy vacuum.
>>
>> Please review them.
>>
> 
> Oops, forgot to attach patches.
> 
> Regards,
> 
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
> 



Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
Hi,

On Thu, Nov 1, 2018 at 2:28 PM Yura Sokolov <> wrote:
>
> Excuse me for being noisy.
>
> Increasing vacuum's ring buffer improves vacuum upto 6 times.
> https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
> This is one-line change.
>
> How much improvement parallel vacuum gives?

It depends on the hardware resources you can use.

In the current design, scanning the heap and vacuuming the heap are
processed by parallel workers at the block level (using a parallel
sequential scan), and vacuuming indexes is also processed by parallel
workers at the index level. So even if a table is not that large, the
more indexes it has, the better performance you can get. The performance
test result I did before (attached) shows that parallel vacuum is up to
almost 10 times faster than single-process vacuum in one case. That test
used a not-so-large table (a 4GB table) with many indexes, but it would
be interesting to test with a larger table.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <> wrote:
>
> Attached rebased version patch to the current HEAD.
>
> > Please apply this patch with the extension lock patch[1] when testing
> > as this patch can try to extend visibility map pages concurrently.
> >
>
> Because the patch leads performance degradation in the case where
> bulk-loading to a partitioned table I think that the original
> proposal, which makes group locking conflict when relation extension
> locks, is more realistic approach. So I worked on this with the simple
> patch instead of [1]. Attached three patches:
>
> * 0001 patch publishes some static functions such as
> heap_paralellscan_startblock_init so that the parallel vacuum code can
> use them.
> * 0002 patch makes the group locking conflict when relation extension locks.
> * 0003 patch add paralel option to lazy vacuum.
>
> Please review them.
>

I can see that you have put a lot of effort into this patch and still we
are not able to make much progress, mainly, I guess, because of the
relation extension lock problem.  I think we can park that problem for
some time (as we have already invested quite some time in it), discuss
the actual parallel vacuum patch a bit, and then come back to it.  I
don't know if that is right or not.  I am not sure we can make this
ready in the PG12 timeframe, but I feel this patch deserves some
attention.  I have started reading the main parallel vacuum patch and
below are some assorted comments.

+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable
class="parameter">N
+      </replaceable>a background workers. Collecting garbage on table
is processed
+      in block-level parallel. For tables with indexes, parallel
vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a
index are processed
+      by particular parallel vacuum worker. The maximum nunber of
parallel workers
+      is <xref linkend="guc-max-parallel-workers-maintenance"/>. This
option can not
+      use with <literal>FULL</literal> option.
+     </para>

There are a couple of mistakes in the above para:
(a) "..a background workers." The "a" seems redundant.
(b) "Collecting garbage on table is processed in block-level
parallel."/"Collecting garbage on the table is processed at the block
level in parallel."
(c) "For tables with indexes, parallel vacuum assigns each index to
each parallel vacuum worker and all garbages on a index are processed
by particular parallel vacuum worker."
We can rephrase it as:
"For tables with indexes, parallel vacuum assigns a worker to each
index, and all garbage on an index is processed by that particular
parallel vacuum worker."
(d) Typo: nunber/number
(e) Typo: can not/cannot

I have glanced through part of the patch, but didn't find any README or
doc describing the design of this patch. I think without having the
design in place, it is difficult to review a patch of this size and
complexity. To start with, at least explain how the work is distributed
among workers: say there are two workers which need to vacuum a table
with four indexes, how does it work?  How does the leader participate
and coordinate the work?  The other part you can explain is how the
state is maintained during parallel vacuum, something like you are
trying to do in the function below:

+ * lazy_prepare_next_state
+ *
+ * Before enter the next state prepare the next state. In parallel lazy vacuum,
+ * we must wait for the all vacuum workers to finish the previous state before
+ * preparation. Also, after prepared we change the state ot all vacuum workers
+ * and wake up them.
+ */
+static void
+lazy_prepare_next_state(LVState *lvstate, LVLeader *lvleader, int next_state)

Other things to explain are how the stats are shared between the leader
and the workers.  I can understand a few things in bits and pieces while
glancing through the patch, but it would be easier to understand if you
documented it in one place.  That would help reviewers to understand it.

Can you consider splitting the patch so that the refactoring you have
done in the current code to make it usable by parallel vacuum is a
separate patch?

+/*
+ * Vacuum all indexes. In parallel vacuum, each workers take indexes
+ * one by one. Also after vacuumed index they mark it as done. This marking
+ * is necessary to guarantee that all indexes are vacuumed based on
+ * the current collected dead tuples. The leader process continues to
+ * vacuum even if any indexes is not vacuumed completely due to failure of
+ * parallel worker for whatever reason. The mark will be checked
before entering
+ * the next state.
+ */
+void
+lazy_vacuum_all_indexes(LVState *lvstate)

I didn't understand what you want to say here.  Do you mean that the
leader can continue collecting more dead tuple TIDs while workers are
vacuuming the indexes?  How does it deal with errors, if any, during
index vacuum?

+ * plan_lazy_vacuum_workers_index_workers
+ * Use the planner to decide how many parallel worker processes
+ * VACUUM and autovacuum should request for use
+ *
+ * tableOid is the table begin vacuumed which must not be non-tables or
+ * special system tables.
..
+ plan_lazy_vacuum_workers(Oid tableOid, int nworkers_requested)

The comment starting from tableOid is not clear.  The actual function
name (plan_lazy_vacuum_workers) and the name in the comment
(plan_lazy_vacuum_workers_index_workers) don't match.  Can you take the
relation as an input parameter instead of taking tableOid, as that can
save a lot of code in this function?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <> wrote:
> On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <> wrote:
> >
>
> I could see that you have put a lot of effort on this patch and still
> we are not able to make much progress mainly I guess because of
> relation extension lock problem.  I think we can park that problem for
> some time (as already we have invested quite some time on it), discuss
> a bit about actual parallel vacuum patch and then come back to it.
>

Today, I was reading this and the previous related thread [1], and it
seems to me that multiple people, Andres [2] and Simon [3], have pointed
out that parallelization of the index portion is more valuable.  Also,
some of the results [4] indicate the same.  Now, when there are no
indexes, parallelizing heap scans also has a benefit, but I think in
practice we will see more cases where the user wants to vacuum tables
with indexes.  So how about if we break this problem up in the following
way, where each piece gives a benefit of its own:
(a) Parallelize index scans wherein the workers will be launched only
to vacuum indexes.  Only one worker per index will be spawned.
(b) Parallelize per-index vacuum.  Each index can be vacuumed by
multiple workers.
(c) Parallelize heap scans where multiple workers will scan the heap,
collect dead TIDs and then launch multiple workers for indexes.

I think if we break this problem into multiple patches, it will reduce
the scope of each patch and help us in making progress.  Now, it's been
more than 2 years that we have been trying to solve this problem, but we
still haven't made much progress.  I understand there are various
genuine reasons, and all of that work will help us in solving all the
problems in this area.  How about if we first target problem (a), and
once we are done with that we can see which of (b) or (c) we want to
do first?


[1] - https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/20160823164836.naody2ht6cutioiz%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CANP8%2BjKWOw6AAorFOjdynxUKqs6XRReOcNy-VXRFFU_4bBT8ww%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAGTBQpbU3R_VgyWk6jaD%3D6v-Wwrm8%2B6CbrzQxQocH0fmedWRkw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila <> wrote:
>
> On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <> wrote:
> > On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <> wrote:
> > >
> >

Thank you for the comment.

> > I could see that you have put a lot of effort on this patch and still
> > we are not able to make much progress mainly I guess because of
> > relation extension lock problem.  I think we can park that problem for
> > some time (as already we have invested quite some time on it), discuss
> > a bit about actual parallel vacuum patch and then come back to it.
> >
>
> Today, I was reading this and previous related thread [1] and it seems
> to me multiple people Andres [2], Simon [3] have pointed out that
> parallelization for index portion is more valuable.  Also, some of the
> results [4] indicate the same.  Now, when there are no indexes,
> parallelizing heap scans also have benefit, but I think in practice we
> will see more cases where the user wants to vacuum tables with
> indexes.  So how about if we break this problem in the following way
> where each piece give the benefit of its own:
> (a) Parallelize index scans wherein the workers will be launched only
> to vacuum indexes.  Only one worker per index will be spawned.
> (b) Parallelize per-index vacuum.  Each index can be vacuumed by
> multiple workers.
> (c) Parallelize heap scans where multiple workers will scan the heap,
> collect dead TIDs and then launch multiple workers for indexes.
>
> I think if we break this problem into multiple patches, it will reduce
> the scope of each patch and help us in making progress.   Now, it's
> been more than 2 years that we are trying to solve this problem, but
> still didn't make much progress.  I understand there are various
> genuine reasons and all of that work will help us in solving all the
> problems in this area.  How about if we first target problem (a) and
> once we are done with that we can see which of (b) or (c) we want to
> do first?

Thank you for the suggestion. It seems good to me. We would get nice
performance scalability even with only (a), and vacuum would become more
powerful with (b) or (c). Also, (a) would not require resolving the
relation extension lock issue, IIUC. I'll change the patch and submit it
to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Mon, Nov 26, 2018 at 2:08 PM Masahiko Sawada <> wrote:
>
> On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila <> wrote:
> >
> > On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <> wrote:
> > > On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <> wrote:
> > > >
> > >
>
> Thank you for the comment.
>
> > > I could see that you have put a lot of effort on this patch and still
> > > we are not able to make much progress mainly I guess because of
> > > relation extension lock problem.  I think we can park that problem for
> > > some time (as already we have invested quite some time on it), discuss
> > > a bit about actual parallel vacuum patch and then come back to it.
> > >
> >
> > Today, I was reading this and previous related thread [1] and it seems
> > to me multiple people Andres [2], Simon [3] have pointed out that
> > parallelization for index portion is more valuable.  Also, some of the
> > results [4] indicate the same.  Now, when there are no indexes,
> > parallelizing heap scans also have benefit, but I think in practice we
> > will see more cases where the user wants to vacuum tables with
> > indexes.  So how about if we break this problem in the following way
> > where each piece give the benefit of its own:
> > (a) Parallelize index scans wherein the workers will be launched only
> > to vacuum indexes.  Only one worker per index will be spawned.
> > (b) Parallelize per-index vacuum.  Each index can be vacuumed by
> > multiple workers.
> > (c) Parallelize heap scans where multiple workers will scan the heap,
> > collect dead TIDs and then launch multiple workers for indexes.
> >
> > I think if we break this problem into multiple patches, it will reduce
> > the scope of each patch and help us in making progress.   Now, it's
> > been more than 2 years that we are trying to solve this problem, but
> > still didn't make much progress.  I understand there are various
> > genuine reasons and all of that work will help us in solving all the
> > problems in this area.  How about if we first target problem (a) and
> > once we are done with that we can see which of (b) or (c) we want to
> > do first?
>
> Thank you for suggestion. It seems good to me. We would get a nice
> performance scalability even by only (a), and vacuum will get more
> powerful by (b) or (c). Also, (a) would not require to resovle the
> relation extension lock issue IIUC.
>

Yes, I also think so.  We do acquire the 'relation extension lock' during
index vacuum, but as part of (a) we are talking about one worker per
index, so there shouldn't be a problem with respect to deadlocks.

> I'll change the patch and submit
> to the next CF.
>

Okay.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Tue, Nov 27, 2018 at 11:26 AM Amit Kapila <> wrote:
>
> On Mon, Nov 26, 2018 at 2:08 PM Masahiko Sawada <> wrote:
> >
> > On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila <> wrote:
> > >
> > > On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <> wrote:
> > > > On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <> wrote:
> > > > >
> > > >
> >
> > Thank you for the comment.
> >
> > > > I could see that you have put a lot of effort on this patch and still
> > > > we are not able to make much progress mainly I guess because of
> > > > relation extension lock problem.  I think we can park that problem for
> > > > some time (as already we have invested quite some time on it), discuss
> > > > a bit about actual parallel vacuum patch and then come back to it.
> > > >
> > >
> > > Today, I was reading this and previous related thread [1] and it seems
> > > to me multiple people Andres [2], Simon [3] have pointed out that
> > > parallelization for index portion is more valuable.  Also, some of the
> > > results [4] indicate the same.  Now, when there are no indexes,
> > > parallelizing heap scans also have benefit, but I think in practice we
> > > will see more cases where the user wants to vacuum tables with
> > > indexes.  So how about if we break this problem in the following way
> > > where each piece give the benefit of its own:
> > > (a) Parallelize index scans wherein the workers will be launched only
> > > to vacuum indexes.  Only one worker per index will be spawned.
> > > (b) Parallelize per-index vacuum.  Each index can be vacuumed by
> > > multiple workers.
> > > (c) Parallelize heap scans where multiple workers will scan the heap,
> > > collect dead TIDs and then launch multiple workers for indexes.
> > >
> > > I think if we break this problem into multiple patches, it will reduce
> > > the scope of each patch and help us in making progress.   Now, it's
> > > been more than 2 years that we are trying to solve this problem, but
> > > still didn't make much progress.  I understand there are various
> > > genuine reasons and all of that work will help us in solving all the
> > > problems in this area.  How about if we first target problem (a) and
> > > once we are done with that we can see which of (b) or (c) we want to
> > > do first?
> >
> > Thank you for suggestion. It seems good to me. We would get a nice
> > performance scalability even by only (a), and vacuum will get more
> > powerful by (b) or (c). Also, (a) would not require to resovle the
> > relation extension lock issue IIUC.
> >
>
> Yes, I also think so.  We do acquire 'relation extension lock' during
> index vacuum, but as part of (a), we are talking one worker per-index,
> so there shouldn't be a problem with respect to deadlocks.
>
> > I'll change the patch and submit
> > to the next CF.
> >
>
> Okay.
>

Attached are the updated patches. I scaled back the scope of this patch.
The patch now includes only feature (a), that is, it executes both index
vacuum and index cleanup in parallel. It also doesn't include autovacuum
support for now.

The PARALLEL option works almost the same as in the previous patch. In
the VACUUM command, we can specify a 'PARALLEL n' option where n is the
number of parallel workers to request. If n is omitted, the number of
parallel workers is the number of indexes - 1. We can also specify the
parallel degree with the parallel_workers reloption. The number of
parallel workers is capped by Min(# of indexes - 1,
max_parallel_maintenance_workers). That is, parallel vacuum can be
executed for a table only if it has more than one index.
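
Read literally, that worker-count rule could be sketched like this; the
function name and the reloption lookup are illustrative only, and the real
patch may arrange it differently.

#include "postgres.h"
#include "utils/rel.h"

extern int max_parallel_maintenance_workers;    /* existing GUC */

static int
compute_parallel_vacuum_workers(Relation rel, int nindexes, int nrequested)
{
    int         parallel_workers;

    if (nrequested > 0)
        parallel_workers = nrequested;  /* VACUUM (PARALLEL n) */
    else if (rel->rd_options != NULL &&
             ((StdRdOptions *) rel->rd_options)->parallel_workers > 0)
        parallel_workers = ((StdRdOptions *) rel->rd_options)->parallel_workers;
    else
        parallel_workers = nindexes - 1;        /* one worker per index; the
                                                 * leader handles the last one */

    /* Cap by both the number of indexes and the GUC. */
    parallel_workers = Min(parallel_workers, nindexes - 1);
    return Min(parallel_workers, max_parallel_maintenance_workers);
}

Under this reading, a table with a single index gets no workers at all,
which matches "parallel vacuum can be executed for a table only if it has
more than one index."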

The details of the internal design are written in the comment at the top
of the vacuumlazy.c file. In parallel vacuum mode, we allocate a DSM
segment at the beginning of lazy vacuum, which stores shared information
as well as dead tuples. When starting either index vacuum or index
cleanup, we launch parallel workers. The parallel workers perform either
index vacuum or index cleanup for each index, and exit after all indexes
are done. The leader process then re-initializes the DSM and re-launches
workers the next time, rather than destroying the parallel context. After
lazy vacuum is done, the leader process exits parallel mode and updates
the index statistics, since we are not allowed to do any writes during
parallel mode.
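
For readers less familiar with the parallel machinery, the leader-side
lifecycle described above could be sketched roughly as below, using the
existing parallel-context API; the function name is made up and the
shared-memory setup is elided.

#include "postgres.h"
#include "access/parallel.h"

/*
 * Sketch only.  Assumes the caller has already called EnterParallelMode()
 * and CreateParallelContext(), and has estimated space for the shared
 * vacuum state and the dead-tuple array in pcxt->estimator.
 */
static void
lazy_parallel_vacuum_indexes(ParallelContext *pcxt)
{
    /* Create the DSM segment holding shared info and dead tuples. */
    InitializeParallelDSM(pcxt);

    /* Index vacuum pass: each worker takes indexes and bulk-deletes them. */
    LaunchParallelWorkers(pcxt);
    WaitForParallelWorkersToFinish(pcxt);

    /* Re-use the same DSM for the index cleanup pass instead of rebuilding it. */
    ReinitializeParallelDSM(pcxt);
    LaunchParallelWorkers(pcxt);
    WaitForParallelWorkersToFinish(pcxt);

    /* Done with workers; tear down and leave parallel mode. */
    DestroyParallelContext(pcxt);
    ExitParallelMode();

    /* Index statistics are updated after this point, outside parallel mode. */
}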

Also, I've attached a 0002 patch that adds support for parallel lazy
vacuum to the vacuumdb command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <> wrote:
>
> Attached the updated patches. I scaled back the scope of this patch.
> The patch now includes only feature (a), that is it execute both index
> vacuum and cleanup index in parallel. It also doesn't include
> autovacuum support for now.
>
> The PARALLEL option works alomst same as before patch. In VACUUM
> command, we can specify 'PARALLEL n' option where n is the number of
> parallel workers to request. If the n is omitted the number of
> parallel worekrs would be # of indexes -1.
>

I think for now this is okay, but I guess in future, when we make heap
scans also parallel or maybe allow more than one worker per index
vacuum, this won't hold good. So, I am not sure if the below text in the
docs is most appropriate.

+    <term><literal>PARALLEL <replaceable
class="parameter">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute index vacuum and cleanup index in parallel with
+      <replaceable class="parameter">N</replaceable> background
workers. If the parallel
+      degree <replaceable class="parameter">N</replaceable> is omitted,
+      <command>VACUUM</command> requests the number of indexes - 1
processes, which is the
+      maximum number of parallel vacuum workers since individual
indexes is processed by
+      one process. The actual number of parallel vacuum workers may
be less due to the
+      setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
+      This option can not use with  <literal>FULL</literal> option.

It might be better to use some generic language in the docs, something
like "If the parallel degree N is omitted, then vacuum decides the
number of workers based on the number of indexes on the relation, which
is further limited by max-parallel-workers-maintenance".  I think you
also need to mention in some way that you take the storage option
parallel_workers into account.

A few assorted comments:
1.
+lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
{
..
+
+ LaunchParallelWorkers(lvstate->pcxt);
+
+ /*
+ * if no workers launched, we vacuum all indexes by the leader process
+ * alone. Since there is hope that we can launch workers in the next
+ * execution time we don't want to end the parallel mode yet.
+ */
+ if (lvstate->pcxt->nworkers_launched == 0)
+ return;

It is quite possible that the workers are not launched because we fail
to allocate memory, basically when pcxt->nworkers is zero.  I think in
such cases there is no use for being in parallel mode.  You can even
detect that before calling lazy_begin_parallel_vacuum_index.
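
If it helps, the early bail-out being suggested might look something like
the fragment below (illustrative only; the do_parallel flag and the serial
fallback are placeholders, while lvstate->pcxt and the function call match
the snippet quoted above):

/* Sketch: give up on parallelism when no worker can ever be launched. */
if (lvstate->pcxt->nworkers == 0)
{
    DestroyParallelContext(lvstate->pcxt);
    ExitParallelMode();
    do_parallel = false;        /* assumed flag: vacuum indexes serially */
}
else
    lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);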

2.
static void
+lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
IndexBulkDeleteResult **stats,
+    LVTidMap *dead_tuples, bool do_parallel,
+    bool for_cleanup)
{
..
+ if (do_parallel)
+ lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
+
+ for (;;)
+ {
+ IndexBulkDeleteResult *r = NULL;
+
+ /*
+ * Get the next index number to vacuum and set index statistics. In parallel
+ * lazy vacuum, index bulk-deletion results are stored in the shared memory
+ * segment. If it's already updated we use it rather than setting it to NULL.
+ * In single vacuum, we can always use an element of the 'stats'.
+ */
+ if (do_parallel)
+ {
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ if (lvshared->indstats[idx].updated)
+ r = &(lvshared->indstats[idx].stats);
+ }

It is quite possible that we are not able to launch any workers in
lazy_begin_parallel_vacuum_index; in such cases, we should not use the
parallel-mode path. Basically, as written, we can't rely on the
'do_parallel' flag.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Thu, Dec 20, 2018 at 3:38 PM Amit Kapila <> wrote:
>
> On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <> wrote:
> >
> > Attached the updated patches. I scaled back the scope of this patch.
> > The patch now includes only feature (a), that is it execute both index
> > vacuum and cleanup index in parallel. It also doesn't include
> > autovacuum support for now.
> >
> > The PARALLEL option works alomst same as before patch. In VACUUM
> > command, we can specify 'PARALLEL n' option where n is the number of
> > parallel workers to request. If the n is omitted the number of
> > parallel worekrs would be # of indexes -1.
> >
>
> I think for now this is okay, but I guess in furture when we make
> heapscans also parallel or maybe allow more than one worker per-index
> vacuum, then this won't hold good. So, I am not sure if below text in
> docs is most appropriate.
>
> +    <term><literal>PARALLEL <replaceable
> class="parameter">N</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      Execute index vacuum and cleanup index in parallel with
> +      <replaceable class="parameter">N</replaceable> background
> workers. If the parallel
> +      degree <replaceable class="parameter">N</replaceable> is omitted,
> +      <command>VACUUM</command> requests the number of indexes - 1
> processes, which is the
> +      maximum number of parallel vacuum workers since individual
> indexes is processed by
> +      one process. The actual number of parallel vacuum workers may
> be less due to the
> +      setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
> +      This option can not use with  <literal>FULL</literal> option.
>
> It might be better to use some generic language in docs, something
> like "If the parallel degree N is omitted, then vacuum decides the
> number of workers based on number of indexes on the relation which is
> further limited by max-parallel-workers-maintenance".

Thank you for the review.

I agree with your concern and with the text you suggested.

>  I think you
> also need to mention in some way that you consider storage option
> parallel_workers.

Added.

>
> Few assorted comments:
> 1.
> +lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
> {
> ..
> +
> + LaunchParallelWorkers(lvstate->pcxt);
> +
> + /*
> + * if no workers launched, we vacuum all indexes by the leader process
> + * alone. Since there is hope that we can launch workers in the next
> + * execution time we don't want to end the parallel mode yet.
> + */
> + if (lvstate->pcxt->nworkers_launched == 0)
> + return;
>
> It is quite possible that the workers are not launched because we fail
> to allocate memory, basically when pcxt->nworkers is zero.  I think in
> such cases there is no use for being in parallel mode.  You can even
> detect that before calling lazy_begin_parallel_vacuum_index.

Agreed. We can stop the preparation and exit parallel mode if
pcxt->nworkers is 0 after InitializeParallelDSM().

>
> 2.
> static void
> +lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
> IndexBulkDeleteResult **stats,
> +    LVTidMap *dead_tuples, bool do_parallel,
> +    bool for_cleanup)
> {
> ..
> + if (do_parallel)
> + lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
> +
> + for (;;)
> + {
> + IndexBulkDeleteResult *r = NULL;
> +
> + /*
> + * Get the next index number to vacuum and set index statistics. In parallel
> + * lazy vacuum, index bulk-deletion results are stored in the shared memory
> + * segment. If it's already updated we use it rather than setting it to NULL.
> + * In single vacuum, we can always use an element of the 'stats'.
> + */
> + if (do_parallel)
> + {
> + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
> +
> + if (lvshared->indstats[idx].updated)
> + r = &(lvshared->indstats[idx].stats);
> + }
>
> It is quite possible that we are not able to launch any workers in
> lazy_begin_parallel_vacuum_index, in such cases, we should not use
> parallel mode path, basically as written we can't rely on
> 'do_parallel' flag.
>

Fixed.

Attached is a new version of the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Fri, Dec 28, 2018 at 11:43 AM Masahiko Sawada <> wrote:
>
> On Thu, Dec 20, 2018 at 3:38 PM Amit Kapila <> wrote:
> >
> > On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada <> wrote:
> > >
> > > Attached the updated patches. I scaled back the scope of this patch.
> > > The patch now includes only feature (a), that is it execute both index
> > > vacuum and cleanup index in parallel. It also doesn't include
> > > autovacuum support for now.
> > >
> > > The PARALLEL option works alomst same as before patch. In VACUUM
> > > command, we can specify 'PARALLEL n' option where n is the number of
> > > parallel workers to request. If the n is omitted the number of
> > > parallel worekrs would be # of indexes -1.
> > >
> >
> > I think for now this is okay, but I guess in furture when we make
> > heapscans also parallel or maybe allow more than one worker per-index
> > vacuum, then this won't hold good. So, I am not sure if below text in
> > docs is most appropriate.
> >
> > +    <term><literal>PARALLEL <replaceable
> > class="parameter">N</replaceable></literal></term>
> > +    <listitem>
> > +     <para>
> > +      Execute index vacuum and cleanup index in parallel with
> > +      <replaceable class="parameter">N</replaceable> background
> > workers. If the parallel
> > +      degree <replaceable class="parameter">N</replaceable> is omitted,
> > +      <command>VACUUM</command> requests the number of indexes - 1
> > processes, which is the
> > +      maximum number of parallel vacuum workers since individual
> > indexes is processed by
> > +      one process. The actual number of parallel vacuum workers may
> > be less due to the
> > +      setting of <xref linkend="guc-max-parallel-workers-maintenance"/>.
> > +      This option can not use with  <literal>FULL</literal> option.
> >
> > It might be better to use some generic language in docs, something
> > like "If the parallel degree N is omitted, then vacuum decides the
> > number of workers based on number of indexes on the relation which is
> > further limited by max-parallel-workers-maintenance".
>
> Thank you for the review.
>
> I agreed your concern and the text you suggested.
>
> >  I think you
> > also need to mention in some way that you consider storage option
> > parallel_workers.
>
> Added.
>
> >
> > Few assorted comments:
> > 1.
> > +lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
> > {
> > ..
> > +
> > + LaunchParallelWorkers(lvstate->pcxt);
> > +
> > + /*
> > + * if no workers launched, we vacuum all indexes by the leader process
> > + * alone. Since there is hope that we can launch workers in the next
> > + * execution time we don't want to end the parallel mode yet.
> > + */
> > + if (lvstate->pcxt->nworkers_launched == 0)
> > + return;
> >
> > It is quite possible that the workers are not launched because we fail
> > to allocate memory, basically when pcxt->nworkers is zero.  I think in
> > such cases there is no use for being in parallel mode.  You can even
> > detect that before calling lazy_begin_parallel_vacuum_index.
>
> Agreed. we can stop preparation and exit parallel mode if
> pcxt->nworkers got 0 after InitializeParallelDSM() .
>
> >
> > 2.
> > static void
> > +lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
> > IndexBulkDeleteResult **stats,
> > +    LVTidMap *dead_tuples, bool do_parallel,
> > +    bool for_cleanup)
> > {
> > ..
> > + if (do_parallel)
> > + lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
> > +
> > + for (;;)
> > + {
> > + IndexBulkDeleteResult *r = NULL;
> > +
> > + /*
> > + * Get the next index number to vacuum and set index statistics. In parallel
> > + * lazy vacuum, index bulk-deletion results are stored in the shared memory
> > + * segment. If it's already updated we use it rather than setting it to NULL.
> > + * In single vacuum, we can always use an element of the 'stats'.
> > + */
> > + if (do_parallel)
> > + {
> > + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
> > +
> > + if (lvshared->indstats[idx].updated)
> > + r = &(lvshared->indstats[idx].stats);
> > + }
> >
> > It is quite possible that we are not able to launch any workers in
> > lazy_begin_parallel_vacuum_index, in such cases, we should not use
> > parallel mode path, basically as written we can't rely on
> > 'do_parallel' flag.
> >
>
> Fixed.
>
> Attached new version patch.
>

Rebased.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Haribabu Kommi
Date:

On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <> wrote:

Rebased.

I started reviewing the patch; I haven't finished my review yet.
Following are some of the comments.

+    <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute index vacuum and cleanup index in parallel with

I doubt that users can understand the terms index vacuum and cleanup index.
Maybe it needs some more detailed information.


- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
+} VacuumOptionFlag;

Any specific reason behind not adding it as the last member of the enum?


-typedef enum VacuumOption
+typedef enum VacuumOptionFlag
 {

I don't find the new name very good; how about VacuumFlags?


+typedef struct VacuumOption
+{

How about VacuumOptions? Because this structure can contain all the
options provided to the vacuum operation.



+ vacopt1->flags |= vacopt2->flags;
+ if (vacopt2->flags == VACOPT_PARALLEL)
+ vacopt1->nworkers = vacopt2->nworkers;
+ pfree(vacopt2);
+ $$ = vacopt1;
+ }

As the above statement indicates, only the last specified number of parallel
workers is taken into account; can we explain that in the docs?


postgres=# vacuum (parallel 2, verbose) tbl;

With verbose, no information related to parallel workers is available.
I feel that giving that information is required even when it is not a
parallel vacuum.


Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi
<> wrote:
>
>
> On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <> wrote:
>>
>>
>> Rebased.
>
>
> I started reviewing the patch, I didn't finish my review yet.
> Following are some of the comments.

Thank you for reviewing the patch.

>
> +    <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      Execute index vacuum and cleanup index in parallel with
>
> I doubt that user can understand the terms index vacuum and cleanup index.
> May be it needs some more detailed information.
>

Agreed. Table 27.22 "Vacuum phases" has a good description of the vacuum
phases, so maybe adding a reference to it would work.

>
> - VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
> + VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */
> + VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */
> +} VacuumOptionFlag;
>
> Any specific reason behind not adding it as last member of the enum?
>

My mistake, fixed it.

>
> -typedef enum VacuumOption
> +typedef enum VacuumOptionFlag
>  {
>
> I don't find the new name quite good, how about VacuumFlags?
>

I agree with removing "Option" from the name, but I think VacuumFlag
would be better because this enum represents only one flag. Thoughts?

>
> +typedef struct VacuumOption
> +{
>
> How about VacuumOptions? Because this structure can contains all the
> options provided to vacuum operation.
>

Agreed.

>
>
> + vacopt1->flags |= vacopt2->flags;
> + if (vacopt2->flags == VACOPT_PARALLEL)
> + vacopt1->nworkers = vacopt2->nworkers;
> + pfree(vacopt2);
> + $$ = vacopt1;
> + }
>
> As the above statement indicates the the last parallel number of workers
> is considered into the account, can we explain it in docs?
>

Agreed.

>
> postgres=# vacuum (parallel 2, verbose) tbl;
>
> With verbose, no parallel workers related information is available.
> I feel giving that information is required even when it is not parallel
> vacuum also.
>

Agreed. How about the following verbose output? I've added the number
of launched, planned and requested vacuum workers and the purpose
(vacuum or cleanup).

postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table
'test' has 3 indexes
INFO:  vacuuming "public.test"
INFO:  launched 2 parallel vacuum workers for index vacuum (planned:
2, requested: 30)
INFO:  scanned index "test_idx1" to remove 2000 row versions
DETAIL:  CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
INFO:  scanned index "test_idx2" to remove 2000 row versions by
parallel vacuum worker
DETAIL:  CPU: user: 0.07 s, system: 0.05 s, elapsed: 0.12 s
INFO:  scanned index "test_idx3" to remove 2000 row versions by
parallel vacuum worker
DETAIL:  CPU: user: 0.09 s, system: 0.05 s, elapsed: 0.14 s
INFO:  "test": removed 2000 row versions in 10 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  launched 2 parallel vacuum workers for index cleanup (planned:
2, requested: 30)
INFO:  index "test_idx1" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "test_idx2" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "test_idx3" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "test": found 2000 removable, 367 nonremovable row versions in
41 out of 4425 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 500
There were 6849 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.12 s, system: 0.01 s, elapsed: 0.17 s.
VACUUM

Since the previous patch conflicts with 285d8e12, I've attached the
latest version of the patch, which incorporates the review comments I got.




Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Haribabu Kommi
Date:

On Fri, Jan 18, 2019 at 11:42 PM Masahiko Sawada <> wrote:
On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi
<> wrote:
>
>
> On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <> wrote:
>>
>>
>> Rebased.
>
>
> I started reviewing the patch, I didn't finish my review yet.
> Following are some of the comments.

Thank you for reviewing the patch.

>
> +    <term><literal>PARALLEL <replaceable class="parameter">N</replaceable></literal></term>
> +    <listitem>
> +     <para>
> +      Execute index vacuum and cleanup index in parallel with
>
> I doubt that user can understand the terms index vacuum and cleanup index.
> May be it needs some more detailed information.
>

Agreed. Table 27.22 "Vacuum phases" has a good description of vacuum
phases. So maybe adding the referencint to it would work.

OK.
 
>
> -typedef enum VacuumOption
> +typedef enum VacuumOptionFlag
>  {
>
> I don't find the new name quite good, how about VacuumFlags?
>

Agreed with removing "Option" from the name but I think VacuumFlag
would be better because this enum represents only one flag. Thoughts?

OK.
 

> postgres=# vacuum (parallel 2, verbose) tbl;
>
> With verbose, no parallel workers related information is available.
> I feel giving that information is required even when it is not parallel
> vacuum also.
>

Agreed. How about the folloiwng verbose output? I've added the number
of launched, planned and requested vacuum workers and purpose (vacuum
or cleanup).

postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table
'test' has 3 indexes
INFO:  vacuuming "public.test"
INFO:  launched 2 parallel vacuum workers for index vacuum (planned:
2, requested: 30)
INFO:  scanned index "test_idx1" to remove 2000 row versions
DETAIL:  CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
INFO:  scanned index "test_idx2" to remove 2000 row versions by
parallel vacuum worker
DETAIL:  CPU: user: 0.07 s, system: 0.05 s, elapsed: 0.12 s
INFO:  scanned index "test_idx3" to remove 2000 row versions by
parallel vacuum worker
DETAIL:  CPU: user: 0.09 s, system: 0.05 s, elapsed: 0.14 s
INFO:  "test": removed 2000 row versions in 10 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  launched 2 parallel vacuum workers for index cleanup (planned:
2, requested: 30)
INFO:  index "test_idx1" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "test_idx2" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "test_idx3" now contains 991151 row versions in 2745 pages
DETAIL:  2000 index row versions were removed.
24 index pages have been deleted, 18 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "test": found 2000 removable, 367 nonremovable row versions in
41 out of 4425 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 500
There were 6849 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.12 s, system: 0.01 s, elapsed: 0.17 s.
VACUUM
 
The verbose output is good.

Since the previous patch conflicts with 285d8e12 I've attached the
latest version patch that incorporated the review comment I got.

Thanks for the latest patch. I have some more minor comments.

+      Execute index vacuum and cleanup index in parallel with

Better to use "vacuum index" and "cleanup index"? That matches
the description of the vacuum phases; it is better to follow the same
notation throughout the patch.


+ dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);

With this change lazy_space_alloc takes care of initializing the
parallel vacuum; can we mention that in the comments?


+ initprog_val[2] = dead_tuples->max_dead_tuples;

The dead_tuples variable may need a rename for better readability?



+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);
+ else
+ copy_result = true;


I don't see a need for the copy_result variable; how about directly using
the updated flag to decide whether to copy or not? Once the result is
copied, update the flag.
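
A rough sketch of what I mean, reusing the names from the snippet above
(this is only an illustration, not the patch's actual code):

/* Rough sketch: drive the copy decision from the shared "updated" flag. */
if (lvshared->indstats[idx].updated)
	result = &(lvshared->indstats[idx].stats);	/* already-shared stats */
else
	result = NULL;								/* first call for this index */

result = index_bulk_delete(&ivinfo, result, lazy_tid_reaped,
						   (void *) dead_tuples);

/* Publish a locally palloc'd result exactly once, then set the flag. */
if (result != NULL && !lvshared->indstats[idx].updated)
{
	memcpy(&(lvshared->indstats[idx].stats), result,
		   sizeof(IndexBulkDeleteResult));
	lvshared->indstats[idx].updated = true;
}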


+use Test::More tests => 34;

I don't find any new tests added in this patch.

I am wondering about the performance penalty if we use the parallel option of
vacuum on a small table.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Tue, Jan 22, 2019 at 9:59 PM Haribabu Kommi <> wrote:
>
>
> Thanks for the latest patch. I have some more minor comments.

Thank you for reviewing the patch.

>
> +      Execute index vacuum and cleanup index in parallel with
>
> Better to use vacuum index and cleanup index? This is in same with
> the description of vacuum phases. It is better to follow same notation
> in the patch.

Agreed. I've changed it to "Vacuum index and cleanup index in parallel
with ...".

>
>
> + dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);
>
> With the change, the lazy_space_alloc takes care of initializing the
> parallel vacuum, can we write something related to that in the comments.
>

Agreed.

>
> + initprog_val[2] = dead_tuples->max_dead_tuples;
>
> dead_tuples variable may need rename for better reading?
>

I might not have understood your comment correctly, but I've tried to fix it.
Please review it.

>
>
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
> + else
> + copy_result = true;
>
>
> I don't see a need for copy_result variable, how about directly using
> the updated flag to decide whether to copy or not? Once the result is
> copied update the flag.
>

You're right. Fixed.

>
> +use Test::More tests => 34;
>
> I don't find any new tetst are added in this patch.

Fixed.

>
> I am thinking of performance penalty if we use the parallel option of
> vacuum on a small sized table?

Hmm, unlike other parallel operations (other than ParallelAppend),
parallel vacuum executes multiple index vacuums simultaneously, so it
can avoid contention. I think there is a performance penalty but it
would not be big.

Attached the latest patches.




Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date

On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <> wrote:

Attached the latest patches.

Thanks for the updated patches.
Some more code review comments.

+         started by a single utility command.  Currently, the parallel
+         utility commands that support the use of parallel workers are
+         <command>CREATE INDEX</command> and <command>VACUUM</command>
+         without <literal>FULL</literal> option, and only when building
+         a B-tree index.  Parallel workers are taken from the pool of


I feel the above sentence may not give the proper picture; how about
the following modification?

<command>CREATE INDEX</command> only when building a B-tree index 
and <command>VACUUM</command> without <literal>FULL</literal> option.



+ * parallel vacuum, we perform both index vacuum and index cleanup in parallel.
+ * Individual indexes is processed by one vacuum process. At beginning of

How about "vacuum index" and "cleanup index", similar to other places?


+ * memory space for dead tuples. When starting either index vacuum or cleanup
+ * vacuum, we launch parallel worker processes. Once all indexes are processed

same here as well?


+ * Before starting parallel index vacuum and parallel cleanup index we launch
+ * parallel workers. All parallel workers will exit after processed all indexes

parallel vacuum index and parallel cleanup index?


+ /*
+ * If there is already-updated result in the shared memory we
+ * use it. Otherwise we pass NULL to index AMs and copy the
+ * result to the shared memory segment.
+ */
+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);

I don't really see a need for the flag to differentiate the stats pointer between
the first run and the second run. I don't see any problem in passing the stats
directly, and the same stats are updated on both the worker side and the leader
side. Anyway, no two processes will vacuum the same index at the same time. Am I
missing something?

Even if this flag is to identify whether the stats have been updated before
writing them, I don't see a need for it compared to normal vacuum.


+ * Enter the parallel mode, allocate and initialize a DSM segment. Return
+ * the memory space for storing dead tuples or NULL if no workers are prepared.
+ */

+ pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
+ request, true);

But we are passing the serializable_okay flag as true, which means it doesn't
return NULL. Is that expected?


+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+   "launched %d parallel vacuum workers %s (planned: %d",
+   lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);

What is the difference between planned workers and requested workers? Aren't
both the same?


- COMPARE_SCALAR_FIELD(options);
- COMPARE_NODE_FIELD(rels);
+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

The options check is changed from COMPARE_SCALAR_FIELD, but why is the rels
check removed? Since options is changed from an int to a structure, using
SCALAR may not work in other functions like _copyVacuumStmt, etc.


+typedef struct VacuumOptions
+{
+ VacuumFlag flags; /* OR of VacuumFlag */
+ int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;


Do we need to add a NodeTag to the above structure, since it is part of
the VacuumStmt structure?


+        <application>vacuumdb</application> will require background workers,
+        so make sure your <xref linkend="guc-max-parallel-workers-maintenance"/>
+        setting is more than one.

How about removing vacuumdb and changing it to "This option will ..."?

I will continue the testing of this patch and share the details. 

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <> wrote:
>
>
> On Thu, Jan 24, 2019 at 1:16 PM Masahiko Sawada <> wrote:
>>
>>
>> Attached the latest patches.
>
>
> Thanks for the updated patches.
> Some more code review comments.
>

Thank you!

> +         started by a single utility command.  Currently, the parallel
> +         utility commands that support the use of parallel workers are
> +         <command>CREATE INDEX</command> and <command>VACUUM</command>
> +         without <literal>FULL</literal> option, and only when building
> +         a B-tree index.  Parallel workers are taken from the pool of
>
>
> I feel the above sentence may not give the proper picture, how about the
> adding following modification?
>
> <command>CREATE INDEX</command> only when building a B-tree index
> and <command>VACUUM</command> without <literal>FULL</literal> option.
>
>

Agreed.

>
> + * parallel vacuum, we perform both index vacuum and index cleanup in parallel.
> + * Individual indexes is processed by one vacuum process. At beginning of
>
> How about vacuum index and cleanup index similar like other places?
>
>
> + * memory space for dead tuples. When starting either index vacuum or cleanup
> + * vacuum, we launch parallel worker processes. Once all indexes are processed
>
> same here as well?
>
>
> + * Before starting parallel index vacuum and parallel cleanup index we launch
> + * parallel workers. All parallel workers will exit after processed all indexes
>
> parallel vacuum index and parallel cleanup index?
>
>

ISTM we're using terms like "index vacuuming", "index cleanup" and "FSM
vacuuming" in vacuumlazy.c, so maybe "parallel index vacuuming" and
"parallel index cleanup" would be better?

> + /*
> + * If there is already-updated result in the shared memory we
> + * use it. Otherwise we pass NULL to index AMs and copy the
> + * result to the shared memory segment.
> + */
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
>
> I didn't really find a need of the flag to differentiate the stats pointer from
> first run to second run? I don't see any problem in passing directing the stats
> and the same stats are updated in the worker side and leader side. Anyway no two
> processes will do the index vacuum at same time. Am I missing something?
>
> Even if this flag is to identify whether the stats are updated or not before
> writing them, I don't see a need of it compared to normal vacuum.
>

Passing stats = NULL to amvacuumcleanup and ambulkdelete indicates the
first-time execution. For example, btvacuumcleanup skips the cleanup if
it's not NULL. In a normal vacuum we pass NULL to ambulkdelete or
amvacuumcleanup on the first call, and they store the result stats in
locally allocated memory. Therefore, in a parallel vacuum I think that
both workers and the leader need to move the result to shared memory
and mark it as updated, since the same index could be vacuumed by a
different worker the next time.
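
To illustrate the existing contract in single-process vacuum, it is
roughly like the following minimal sketch (not the actual code from
vacuumlazy.c; the loop condition is just a placeholder):

IndexBulkDeleteResult *stats = NULL;

/* each round of index vacuuming passes back the previous result */
while (more_dead_tuples)	/* placeholder condition */
	stats = index_bulk_delete(&ivinfo, stats,
							  lazy_tid_reaped, (void *) dead_tuples);

/* amvacuumcleanup receives whatever the last ambulkdelete returned */
stats = index_vacuum_cleanup(&ivinfo, stats);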

>
> + * Enter the parallel mode, allocate and initialize a DSM segment. Return
> + * the memory space for storing dead tuples or NULL if no workers are prepared.
> + */
>
> + pcxt = CreateParallelContext("postgres", "heap_parallel_vacuum_main",
> + request, true);
>
> But we are passing as serializable_okay flag as true, means it doesn't return
> NULL. Is it expected?
>
>

I think you're right. Since the request can never be 0 and
serializable_okay is true, it should not return NULL. Will fix.

> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
>
> what is the difference between planned workers and requested workers, aren't both
> are same?

The requested value is the parallel degree that is specified explicitly by
the user, whereas the planned value is the actual number we planned based on
the number of indexes the table has. For example, if we run 'VACUUM
(PARALLEL 3000) tbl' where tbl has 4 indexes, the requested value is 3000
and the planned value is 4. Also, if max_parallel_maintenance_workers is 2,
the planned value is 2.
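
To illustrate, the computation is roughly like the following (an
illustrative sketch only, not the actual function body in the patch):

static int
compute_parallel_workers(Relation rel, int nrequested, int nindexes)
{
	/* rel is unused in this sketch */
	int		parallel_workers = nindexes;	/* at most one worker per index */

	/* an explicit PARALLEL N request caps the plan */
	if (nrequested > 0)
		parallel_workers = Min(parallel_workers, nrequested);

	/* and the GUC limits what we may actually plan */
	parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);

	return parallel_workers;
}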

>
>
> - COMPARE_SCALAR_FIELD(options);
> - COMPARE_NODE_FIELD(rels);
> + if (a->options.flags != b->options.flags)
> + return false;
> + if (a->options.nworkers != b->options.nworkers)
> + return false;
>
> Options is changed from SCALAR to check, but why the rels check is removed?
> The options is changed from int to a structure so using SCALAR may not work
> in other function like _copyVacuumStmt and etc?

Agreed and will fix.

>
> +typedef struct VacuumOptions
> +{
> + VacuumFlag flags; /* OR of VacuumFlag */
> + int nworkers; /* # of parallel vacuum workers */
> +} VacuumOptions;
>
>
> Do we need to add NodeTag for the above structure? Because this structure is
> part of VacuumStmt structure.

Yes, I will add it.

>
>
> +        <application>vacuumdb</application> will require background workers,
> +        so make sure your <xref linkend="guc-max-parallel-workers-maintenance"/>
> +        setting is more than one.
>
> removing vacuumdb and changing it as "This option will ..."?
>
Agreed.

> I will continue the testing of this patch and share the details.
>

Thank you. I'll submit the updated patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date
On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada <> wrote:
>
> On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <> wrote:
>
> Thank you. I'll submit the updated patch set.
>

I don't see any chance of getting this committed in the next few days,
so I've moved it to the next CF.  Thanks for working on this, and I hope
you will continue to work on this project.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Sat, Feb 2, 2019 at 4:06 AM Amit Kapila <> wrote:
>
> On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada <> wrote:
> >
> > On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <> wrote:
> >
> > Thank you. I'll submit the updated patch set.
> >
>
> I don't see any chance of getting this committed in the next few days,
> so, moved to next CF.   Thanks for working on this and I hope you will
> continue work on this project.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Thu, Jan 31, 2019 at 10:18 PM Masahiko Sawada <> wrote:
>
> Thank you. I'll submit the updated patch set.
>

Attached the latest patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date

On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <> wrote:
On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi <> wrote:
>
>
>
>
> + * Before starting parallel index vacuum and parallel cleanup index we launch
> + * parallel workers. All parallel workers will exit after processed all indexes
>
> parallel vacuum index and parallel cleanup index?
>
>

ISTM we're using like "index vacuuming", "index cleanup" and "FSM
vacuming" in vacuumlazy.c so maybe "parallel index vacuuming" and
"parallel index cleanup" would be better?

OK.
 
> + /*
> + * If there is already-updated result in the shared memory we
> + * use it. Otherwise we pass NULL to index AMs and copy the
> + * result to the shared memory segment.
> + */
> + if (lvshared->indstats[idx].updated)
> + result = &(lvshared->indstats[idx].stats);
>
> I didn't really find a need of the flag to differentiate the stats pointer from
> first run to second run? I don't see any problem in passing directing the stats
> and the same stats are updated in the worker side and leader side. Anyway no two
> processes will do the index vacuum at same time. Am I missing something?
>
> Even if this flag is to identify whether the stats are updated or not before
> writing them, I don't see a need of it compared to normal vacuum.
>

The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
first time execution. For example, btvacuumcleanup skips cleanup if
it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
amvacuumcleanup when the first time calling. And they store the result
stats to the memory allocated int the local memory. Therefore in the
parallel vacuum I think that both worker and leader need to move it to
the shared memory and mark it as updated as different worker could
vacuum different indexes at the next time.

OK, understood the point. But btbulkdelete allocates the memory whenever
the stats are NULL, so I don't see a problem with it.

The only problem is with btvacuumcleanup: when there are no dead tuples
in the table, btbulkdelete is not called and btvacuumcleanup is called
directly at the end of vacuum, and in that scenario there is a code-flow
difference based on the stats. So why can't we use the dead tuples number
to differentiate instead of adding another flag? Also, this scenario is
not very frequent, so avoiding the memcpy for normal operations would be
better. It may be a small gain, just a thought.
 

> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
>
> what is the difference between planned workers and requested workers, aren't both
> are same?

The request is the parallel degree that is specified explicitly by
user whereas the planned is the actual number we planned based on the
number of indexes the table has. For example, if we do like 'VACUUM
(PARALLEL 3000) tbl' where the tbl has 4 indexes, the request is 3000
and the planned is 4. Also if max_parallel_maintenance_workers is 2
the planned is 2.

OK.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <> wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <> wrote:
>>
>>
>> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> first time execution. For example, btvacuumcleanup skips cleanup if
>> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup when the first time calling. And they store the result
>> stats to the memory allocated int the local memory. Therefore in the
>> parallel vacuum I think that both worker and leader need to move it to
>> the shared memory and mark it as updated as different worker could
>> vacuum different indexes at the next time.
>
>
> OK, understood the point. But for btbulkdelete whenever the stats are NULL,
> it allocates the memory. So I don't see a problem with it.
>
> The only problem is with btvacuumcleanup, when there are no dead tuples
> present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
> is called at the end of vacuum, in that scenario, there is code flow difference
> based on the stats. so why can't we use the deadtuples number to differentiate
> instead of adding another flag?

I don't understand your suggestion. What do we compare deadtuples
number to? Could you elaborate on that please?

> And also this scenario is not very often, so avoiding
> memcpy for normal operations would be better. It may be a small gain, just
> thought of it.
>

This scenario could happen periodically on an insert-only table.
The additional memcpy is executed once per index in a vacuum, but I
agree that avoiding the memcpy would be good.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date

On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <> wrote:
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <> wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <> wrote:
>>
>>
>> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> first time execution. For example, btvacuumcleanup skips cleanup if
>> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup when the first time calling. And they store the result
>> stats to the memory allocated int the local memory. Therefore in the
>> parallel vacuum I think that both worker and leader need to move it to
>> the shared memory and mark it as updated as different worker could
>> vacuum different indexes at the next time.
>
>
> OK, understood the point. But for btbulkdelete whenever the stats are NULL,
> it allocates the memory. So I don't see a problem with it.
>
> The only problem is with btvacuumcleanup, when there are no dead tuples
> present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
> is called at the end of vacuum, in that scenario, there is code flow difference
> based on the stats. so why can't we use the deadtuples number to differentiate
> instead of adding another flag?

I don't understand your suggestion. What do we compare deadtuples
number to? Could you elaborate on that please?

The scenario where NULL should be passed as stats to the btvacuumcleanup
function is when there are no dead tuples. I just think that we may be able
to use the dead tuples structure to find out whether NULL should be passed
for stats, while avoiding the extra memcpy.
 
> And also this scenario is not very often, so avoiding
> memcpy for normal operations would be better. It may be a small gain, just
> thought of it.
>

This scenario could happen periodically on an insert-only table.
Additional memcpy is executed once per indexes in a vacuuming but I
agree that the avoiding memcpy would be good.

Yes, understood. If possible, removing the need for the memcpy would be good.
The latest patch doesn't apply anymore and needs a rebase.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <> wrote:
>
>
> On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <> wrote:
>>
>> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <> wrote:
>> >
>> >
>> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <> wrote:
>> >>
>> >>
>> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> amvacuumcleanup when the first time calling. And they store the result
>> >> stats to the memory allocated int the local memory. Therefore in the
>> >> parallel vacuum I think that both worker and leader need to move it to
>> >> the shared memory and mark it as updated as different worker could
>> >> vacuum different indexes at the next time.
>> >
>> >
>> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> > it allocates the memory. So I don't see a problem with it.
>> >
>> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> > is called at the end of vacuum, in that scenario, there is code flow difference
>> > based on the stats. so why can't we use the deadtuples number to differentiate
>> > instead of adding another flag?
>>
>> I don't understand your suggestion. What do we compare deadtuples
>> number to? Could you elaborate on that please?
>
>
> The scenario where the stats should pass NULL to btvacuumcleanup function is
> when there no dead tuples, I just think that we may use that deadtuples structure
> to find out whether stats should pass NULL or not while avoiding the extra
> memcpy.
>

Thank you for your explanation, I understood. Maybe I'm worrying too
much, but I'm concerned about compatibility; currently we handle indexes
individually. So if there is an index access method whose ambulkdelete
returns NULL at the first call but returns a palloc'd struct at the
second or a later call, that doesn't work correctly.

The documentation says that the passed-in 'stats' is NULL on the first
call of ambulkdelete but doesn't say anything about the second or later
calls. Index access methods may expect that the passed-in 'stats' is
the same as what they returned last time. So I think we should add an
extra flag to keep compatibility.

>>
>> > And also this scenario is not very often, so avoiding
>> > memcpy for normal operations would be better. It may be a small gain, just
>> > thought of it.
>> >
>>
>> This scenario could happen periodically on an insert-only table.
>> Additional memcpy is executed once per indexes in a vacuuming but I
>> agree that the avoiding memcpy would be good.
>
>
> Yes, understood. If possible removing the need of memcpy would be good.
> The latest patch doesn't apply anymore. Needs a rebase.
>

Thank you. Attached the rebased patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date

On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <> wrote:
>
>
> On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <> wrote:
>>
>> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <> wrote:
>> >
>> >
>> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <> wrote:
>> >>
>> >>
>> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> amvacuumcleanup when the first time calling. And they store the result
>> >> stats to the memory allocated int the local memory. Therefore in the
>> >> parallel vacuum I think that both worker and leader need to move it to
>> >> the shared memory and mark it as updated as different worker could
>> >> vacuum different indexes at the next time.
>> >
>> >
>> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> > it allocates the memory. So I don't see a problem with it.
>> >
>> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> > is called at the end of vacuum, in that scenario, there is code flow difference
>> > based on the stats. so why can't we use the deadtuples number to differentiate
>> > instead of adding another flag?
>>
>> I don't understand your suggestion. What do we compare deadtuples
>> number to? Could you elaborate on that please?
>
>
> The scenario where the stats should pass NULL to btvacuumcleanup function is
> when there no dead tuples, I just think that we may use that deadtuples structure
> to find out whether stats should pass NULL or not while avoiding the extra
> memcpy.
>

Thank you for your explanation. I understood. Maybe I'm worrying too
much but I'm concernced compatibility; currently we handle indexes
individually. So if there is an index access method whose ambulkdelete
returns NULL at the first call but returns a palloc'd struct at the
second time or other, that doesn't work fine.

The documentation says that passed-in 'stats' is NULL at the first
time call of ambulkdelete but doesn't say about the second time or
more. Index access methods may expect that the passed-in 'stats'  is
the same as what they has returned last time. So I think to add an
extra flag for keeping comptibility.

I checked some of the ambulkdelete functions, and they do not return
NULL whenever those functions are called. But the palloc'd structure
doesn't get filled in with the details.

IMO, there is no need for any extra code in parallel vacuum compared
to normal vacuum.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date
On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
Thank you. Attached the rebased patch.

I ran some performance tests to compare the parallelism benefits,
but I got some strange results showing performance overhead; maybe it is
because I tested it on my laptop.

FYI,

Table schema:

create table tbl(f1 int, f2 char(100), f3 float4, f4 char(100), f5 float8, f6 char(100), f7 bigint);


Tbl with 3 indexes

1000 record deletion
master - 22ms
patch - 25ms with 0 parallel workers
patch - 43ms with 1 parallel worker
patch - 72ms with 2 parallel workers


10000 record deletion
master - 52ms
patch - 56ms with 0 parallel workers
patch - 79ms with 1 parallel worker
patch - 86ms with 2 parallel workers


100000 record deletion
master - 410ms
patch - 379ms with 0 parallel workers
patch - 330ms with 1 parallel worker
patch - 289ms with 2 parallel workers


Tbl with 5 indexes

1000 record deletion
master - 28ms
patch - 34ms with 0 parallel workers
patch - 86ms with 2 parallel workers
patch - 106ms with 4 parallel workers


10000 record deletion
master - 58ms
patch - 63ms with 0 parallel workers
patch - 101ms with 2 parallel workers
patch - 118ms with 4 parallel workers


100000 record deletion
master - 632ms
patch - 490ms with 0 parallel workers
patch - 455ms with 2 parallel workers
patch - 403ms with 4 parallel workers



Tbl with 7 indexes

1000 record deletion
master - 35ms
patch - 44ms with 0 parallel workers
patch - 93ms with 2 parallel workers
patch - 110ms with 4 parallel workers
patch - 123ms with 6 parallel workers

10000 record deletion
master - 76ms
patch - 78ms with 0 parallel workers
patch - 135ms with 2 parallel workers
patch - 143ms with 4 parallel workers
patch - 139ms with 6 parallel workers

100000 record deletion
master - 641ms
patch - 656ms with 0 parallel workers
patch - 613ms with 2 parallel workers
patch - 735ms with 4 parallel workers
patch - 679ms with 6 parallel workers


Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <> wrote:
>
> On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
>>
>> Thank you. Attached the rebased patch.
>
>
> I ran some performance tests to compare the parallelism benefits,

Thank you for testing!

> but I got some strange results of performance overhead, may be it is
> because, I tested it on my laptop.

Hmm, I think parallel vacuum would help for heavy workloads like a
big table with multiple indexes. In your test results, all executions
complete within 1 second, which seems to be a use case that parallel
vacuum wouldn't help with. I suspect that the table is small, right?
Anyway, I'll also do performance tests.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Sat, Feb 23, 2019 at 10:28 PM Haribabu Kommi
<> wrote:
>
>
> On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
>>
>> On Wed, Feb 13, 2019 at 9:32 PM Haribabu Kommi <> wrote:
>> >
>> >
>> > On Sat, Feb 9, 2019 at 11:47 PM Masahiko Sawada <> wrote:
>> >>
>> >> On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi <> wrote:
>> >> >
>> >> >
>> >> > On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada <> wrote:
>> >> >>
>> >> >>
>> >> >> The passing stats = NULL to amvacuumcleanup and ambulkdelete means the
>> >> >> first time execution. For example, btvacuumcleanup skips cleanup if
>> >> >> it's not NULL.In the normal vacuum we pass NULL to ambulkdelete or
>> >> >> amvacuumcleanup when the first time calling. And they store the result
>> >> >> stats to the memory allocated int the local memory. Therefore in the
>> >> >> parallel vacuum I think that both worker and leader need to move it to
>> >> >> the shared memory and mark it as updated as different worker could
>> >> >> vacuum different indexes at the next time.
>> >> >
>> >> >
>> >> > OK, understood the point. But for btbulkdelete whenever the stats are NULL,
>> >> > it allocates the memory. So I don't see a problem with it.
>> >> >
>> >> > The only problem is with btvacuumcleanup, when there are no dead tuples
>> >> > present in the table, the btbulkdelete is not called and directly the btvacuumcleanup
>> >> > is called at the end of vacuum, in that scenario, there is code flow difference
>> >> > based on the stats. so why can't we use the deadtuples number to differentiate
>> >> > instead of adding another flag?
>> >>
>> >> I don't understand your suggestion. What do we compare deadtuples
>> >> number to? Could you elaborate on that please?
>> >
>> >
>> > The scenario where the stats should pass NULL to btvacuumcleanup function is
>> > when there no dead tuples, I just think that we may use that deadtuples structure
>> > to find out whether stats should pass NULL or not while avoiding the extra
>> > memcpy.
>> >
>>
>> Thank you for your explanation. I understood. Maybe I'm worrying too
>> much but I'm concernced compatibility; currently we handle indexes
>> individually. So if there is an index access method whose ambulkdelete
>> returns NULL at the first call but returns a palloc'd struct at the
>> second time or other, that doesn't work fine.
>>
>> The documentation says that passed-in 'stats' is NULL at the first
>> time call of ambulkdelete but doesn't say about the second time or
>> more. Index access methods may expect that the passed-in 'stats'  is
>> the same as what they has returned last time. So I think to add an
>> extra flag for keeping comptibility.
>
>
> I checked some of the ambulkdelete functions, and they are not returning
> a NULL data whenever those functions are called. But the palloc'd structure
> doesn't get filled with the details.
>
> IMO, there is no need of any extra code that is required for parallel vacuum
> compared to normal vacuum.
>

Hmm, I think that this code is necessary to faithfully keep the index
vacuum behavior, especially the communication between lazy vacuum and
IAMs, exactly as it is. The IAMs in postgres don't rely on that, but
third-party AMs might, and such an AM might be developed in the future.
On the other hand, I can understand your concern; if such IAMs are quite
rare, we might not need to complicate the code needlessly. I'd also like
to hear more opinions from other hackers.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date
On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <> wrote:
> Thank you. Attached the rebased patch.

Here are some review comments.

+         started by a single utility command.  Currently, the parallel
+         utility commands that support the use of parallel workers are
+         <command>CREATE INDEX</command> and <command>VACUUM</command>
+         without <literal>FULL</literal> option, and only when building
+         a B-tree index.  Parallel workers are taken from the pool of

That sentence is garbled.  The end part about b-tree indexes applies
only to CREATE INDEX, not to VACUUM, since VACUUM doesn't build indexes.

+      Vacuum index and cleanup index in parallel
+      <replaceable class="parameter">N</replaceable> background
workers (for the detail
+      of each vacuum phases, please refer to <xref
linkend="vacuum-phases"/>. If the

I have two problems with this.  One is that I can't understand the
English very well. I think you mean something like: "Perform the
'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
N background workers," but I'm not entirely sure.  The other is that
if that is what you mean, I don't think it's a sufficient description.
Users need to understand whether, for example, only one worker can be
used per index, or whether the work for a single index can be split
across workers.

+      parallel degree <replaceable class="parameter">N</replaceable>
is omitted,
+      then <command>VACUUM</command> decides the number of workers based on
+      number of indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if
this option

Now this makes it sound like it's one worker per index, but you could
be more explicit about it.

+      is specified multile times, the last parallel degree
+      <replaceable class="parameter">N</replaceable> is considered
into the account.

Typo, but I'd just delete this sentence altogether; the behavior if
the option is multiply specified seems like a triviality that need not
be documented.

+    Setting a value for <literal>parallel_workers</literal> via
+    <xref linkend="sql-altertable"/> also controls how many parallel
+    worker processes will be requested by a <command>VACUUM</command>
+    against the table. This setting is overwritten by setting
+    <replaceable class="parameter">N</replaceable> of
<literal>PARALLEL</literal>
+    option.

I wonder if we really want this behavior.  Should a setting that
controls the degree of parallelism when scanning the table also affect
VACUUM?  I tend to think that we probably don't ever want VACUUM of a
table to be parallel by default, but rather something that the user
must explicitly request.  Happy to hear other opinions.  If we do want
this behavior, I think this should be written differently, something
like this: The PARALLEL N option to VACUUM takes precedence over this
option.

+ * parallel mode nor destories the parallel context. For updating the index

Spelling.

+/* DSM keys for parallel lazy vacuum */
+#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)

Any special reason not to use just 1, 2, 3 here?  The general
infrastructure stuff uses high numbers to avoid conflicting with
plan_node_id values, but end clients of the parallel infrastructure
can generally just use small integers.
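
In other words, something like the following would suffice here (a
sketch of the simplification, keeping the key names from the hunk
above):

/* DSM keys for parallel lazy vacuum; small integers are fine here */
#define PARALLEL_VACUUM_KEY_SHARED			1
#define PARALLEL_VACUUM_KEY_DEAD_TUPLES		2
#define PARALLEL_VACUUM_KEY_QUERY_TEXT		3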

+ bool updated; /* is the stats updated? */

is -> are

+ * LVDeadTuples controls the dead tuple TIDs collected during heap scan.

what do you mean by "controls", exactly? stores?

+ * This is allocated in a dynamic shared memory segment when parallel
+ * lazy vacuum mode, or allocated in a local memory.

If this is in DSM, then max_tuples is a wart, I think.  We can't grow
the segment at that point.  I'm suspicious that we need a better
design here.  It looks like you gather all of the dead tuples in
backend-local memory and then allocate an equal amount of DSM to copy
them.  But that means that we are using twice as much memory, which
seems pretty bad.  You'd have to do that at least momentarily no
matter what, but it's not obvious that the backend-local copy is ever
freed.  There's another patch kicking around to allocate memory for
vacuum in chunks rather than preallocating the whole slab of memory at
once; we might want to think about getting that committed first and
then having this build on top of it.  At least we need something
smarter than this.

-heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,

We generally avoid passing a struct by value; copying the struct can
be expensive and having multiple shallow copies of the same data
sometimes leads to surprising results.  I think it might be a good
idea to propose a preliminary refactoring patch that invents
VacuumOptions and gives it just a single 'int' member and refactors
everything to use it, and then that can be committed first.  It should
pass a pointer, though, not the actual struct.
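
Roughly, I'm imagining something like this for the preliminary patch (a
hypothetical sketch, not a definitive signature):

typedef struct VacuumOptions
{
	int		flags;			/* bitmask of VACOPT_* values */
} VacuumOptions;

/* callers pass a pointer rather than copying the struct */
extern void heap_vacuum_rel(Relation onerel, VacuumOptions *options,
							VacuumParams *params,
							BufferAccessStrategy bstrategy);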

+ LVState    *lvstate;

It's not clear to me why we need this new LVState thing.  What's the
motivation for that?  If it's a good idea, could it be done as a
separate, preparatory patch?  It seems to be responsible for a lot of
code churn in this patch.   It also leads to strange stuff like this:

  ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
+ (errmsg("scanned index \"%s\" to remove %d row versions %s",
  RelationGetRelationName(indrel),
- vacrelstats->num_dead_tuples),
+ dead_tuples->num_tuples,
+ IsParallelWorker() ? "by parallel vacuum worker" : ""),

This doesn't seem to be great grammar, and translation guidelines
generally discourage this sort of incremental message construction
quite strongly.  Since the user can probably infer what happened by a
suitable choice of log_line_prefix, I'm not totally sure this is worth
doing in the first place, but if we're going to do it, it should
probably have two completely separate message strings and pick between
them using IsParallelWorker(), rather than building it up
incrementally like this.
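
For example, something along these lines (a hypothetical sketch of the
suggested fix, reusing the strings from the hunk above; not code from
the patch):

if (IsParallelWorker())
	ereport(elevel,
			(errmsg("scanned index \"%s\" to remove %d row versions by parallel vacuum worker",
					RelationGetRelationName(indrel),
					dead_tuples->num_tuples)));
else
	ereport(elevel,
			(errmsg("scanned index \"%s\" to remove %d row versions",
					RelationGetRelationName(indrel),
					dead_tuples->num_tuples)));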

+compute_parallel_workers(Relation rel, int nrequests, int nindexes)

I think 'nrequets' is meant to be 'nrequested'.  It isn't the number
of requests; it's the number of workers that were requested.

+ /* quick exit if no workers are prepared, e.g. under serializable isolation */

That comment makes very little sense in this context.

+ /* Report parallel vacuum worker information */
+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+   "launched %d parallel vacuum workers %s (planned: %d",
+   lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
+
+ appendStringInfo(&buf, ")");
+ ereport(elevel, (errmsg("%s", buf.data)));

This is another example of incremental message construction, again
violating translation guidelines.

+ WaitForParallelWorkersToAttach(lvstate->pcxt);

Why?

+ /*
+ * If there is already-updated result in the shared memory we use it.
+ * Otherwise we pass NULL to index AMs, meaning it's first time call,
+ * and copy the result to the shared memory segment.
+ */

I'm probably missing something here, but isn't the intention that we
only do each index once?  If so, how would there be anything there
already?  Once from for_cleanup = false and once for for_cleanup =
true?

+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

You could just do COMPARE_SCALAR_FIELD(options.flags);
COMPARE_SCALAR_FIELD(options.nworkers);

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date
On Thu, Feb 28, 2019 at 2:44 AM Robert Haas <> wrote:
>
> On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <> wrote:
> > Thank you. Attached the rebased patch.
>
> Here are some review comments.

Thank you for reviewing the patches!

>
> +         started by a single utility command.  Currently, the parallel
> +         utility commands that support the use of parallel workers are
> +         <command>CREATE INDEX</command> and <command>VACUUM</command>
> +         without <literal>FULL</literal> option, and only when building
> +         a B-tree index.  Parallel workers are taken from the pool of
>
> That sentence is garbled.  The end part about b-tree indexes applies
> only to CREATE INDEX, not to VACUUM, since VACUUM does build indexes.

Fixed.

>
> +      Vacuum index and cleanup index in parallel
> +      <replaceable class="parameter">N</replaceable> background
> workers (for the detail
> +      of each vacuum phases, please refer to <xref
> linkend="vacuum-phases"/>. If the
>
> I have two problems with this.  One is that I can't understand the
> English very well. I think you mean something like: "Perform the
> 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
> N background workers," but I'm not entirely sure.  The other is that
> if that is what you mean, I don't think it's a sufficient description.
> Users need to understand whether, for example, only one worker can be
> used per index, or whether the work for a single index can be split
> across workers.
>
> +      parallel degree <replaceable class="parameter">N</replaceable>
> is omitted,
> +      then <command>VACUUM</command> decides the number of workers based on
> +      number of indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if
> this option
>
> Now this makes it sound like it's one worker per index, but you could
> be more explicit about it.

Fixed.

>
> +      is specified multile times, the last parallel degree
> +      <replaceable class="parameter">N</replaceable> is considered
> into the account.
>
> Typo, but I'd just delete this sentence altogether; the behavior if
> the option is multiply specified seems like a triviality that need not
> be documented.

Understood, removed.

>
> +    Setting a value for <literal>parallel_workers</literal> via
> +    <xref linkend="sql-altertable"/> also controls how many parallel
> +    worker processes will be requested by a <command>VACUUM</command>
> +    against the table. This setting is overwritten by setting
> +    <replaceable class="parameter">N</replaceable> of
> <literal>PARALLEL</literal>
> +    option.
>
> I wonder if we really want this behavior.  Should a setting that
> controls the degree of parallelism when scanning the table also affect
> VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> table to be parallel by default, but rather something that the user
> must explicitly request.  Happy to hear other opinions.  If we do want
> this behavior, I think this should be written differently, something
> like this: The PARALLEL N option to VACUUM takes precedence over this
> option.

For example, I can imagine a use case where a batch job does parallel
vacuum on some tables in a maintenance window. The batch operation
would need to compute and specify the degree of parallelism every time
according to, for instance, the number of indexes, which would be
troublesome. But if we can set the degree of parallelism for each
table, it can just do 'VACUUM (PARALLEL)'.

>
> + * parallel mode nor destories the parallel context. For updating the index
>
> Spelling.

Fixed.

>
> +/* DSM keys for parallel lazy vacuum */
> +#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)
>
> Any special reason not to use just 1, 2, 3 here?  The general
> infrastructure stuff uses high numbers to avoid conflicting with
> plan_node_id values, but end clients of the parallel infrastructure
> can generally just use small integers.

It seems that I was worrying unnecessarily, changed to 1, 2, 3.

>
> + bool updated; /* is the stats updated? */
>
> is -> are
>
> + * LVDeadTuples controls the dead tuple TIDs collected during heap scan.
>
> what do you mean by "controls", exactly? stores?

Fixed.

>
> + * This is allocated in a dynamic shared memory segment when parallel
> + * lazy vacuum mode, or allocated in a local memory.
>
> If this is in DSM, then max_tuples is a wart, I think.  We can't grow
> the segment at that point.  I'm suspicious that we need a better
> design here.  It looks like you gather all of the dead tuples in
> backend-local memory and then allocate an equal amount of DSM to copy
> them.  But that means that we are using twice as much memory, which
> seems pretty bad.  You'd have to do that at least momentarily no
> matter what, but it's not obvious that the backend-local copy is ever
> freed.

Hmm, the current design is simpler; only the leader process scans the
heap and saves dead tuple TIDs to DSM. The DSM is allocated once when
starting lazy vacuum, and we never need to enlarge it. Also, we can use
the same code around heap vacuuming and collecting dead tuples for both
single-process vacuum and parallel vacuum. Once index vacuuming is
completed, the leader process reinitializes the DSM and reuses it the
next time.
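
Roughly, the shared dead-tuple area looks like this (a sketch with
assumed field names, not the exact struct from the patch):

typedef struct LVDeadTuples
{
	int				max_tuples;	/* # of slots, sized once at start */
	int				num_tuples;	/* current # of dead tuple TIDs */
	ItemPointerData	itemptrs[FLEXIBLE_ARRAY_MEMBER];	/* TID array */
} LVDeadTuples;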

> There's another patch kicking around to allocate memory for
> vacuum in chunks rather than preallocating the whole slab of memory at
> once; we might want to think about getting that committed first and
> then having this build on top of it.  At least we need something
> smarter than this.

Since parallel vacuum uses memory in the same manner as single-process
vacuum, it doesn't get worse. I'd agree that that patch is smarter and
this patch can be built on top of it, but I'm concerned that there are
two proposals on that thread and the discussion has not been active for
8 months. I wonder if it would be worth thinking about improving the
memory allocation based on that patch after parallel vacuum gets
committed.

>
> -heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
> +heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
>
> We generally avoid passing a struct by value; copying the struct can
> be expensive and having multiple shallow copies of the same data
> sometimes leads to surprising results.  I think it might be a good
> idea to propose a preliminary refactoring patch that invents
> VacuumOptions and gives it just a single 'int' member and refactors
> everything to use it, and then that can be committed first.  It should
> pass a pointer, though, not the actual struct.

Agreed. I'll separate patches and propose it.

>
> + LVState    *lvstate;
>
> It's not clear to me why we need this new LVState thing.  What's the
> motivation for that?  If it's a good idea, could it be done as a
> separate, preparatory patch?  It seems to be responsible for a lot of
> code churn in this patch.   It also leads to strange stuff like this:

The main motivations are refactoring and improving readability, but it
was mainly for the previous version of the patch, which implemented
parallel heap vacuum. It might no longer be needed here. I'll try to
implement this without LVState. Thank you.

>
>   ereport(elevel,
> - (errmsg("scanned index \"%s\" to remove %d row versions",
> + (errmsg("scanned index \"%s\" to remove %d row versions %s",
>   RelationGetRelationName(indrel),
> - vacrelstats->num_dead_tuples),
> + dead_tuples->num_tuples,
> + IsParallelWorker() ? "by parallel vacuum worker" : ""),
>
> This doesn't seem to be great grammar, and translation guidelines
> generally discourage this sort of incremental message construction
> quite strongly.  Since the user can probably infer what happened by a
> suitable choice of log_line_prefix, I'm not totally sure this is worth
> doing in the first place, but if we're going to do it, it should
> probably have two completely separate message strings and pick between
> them using IsParallelWorker(), rather than building it up
> incrementally like this.

Fixed.

>
> +compute_parallel_workers(Relation rel, int nrequests, int nindexes)
>
> I think 'nrequets' is meant to be 'nrequested'.  It isn't the number
> of requests; it's the number of workers that were requested.

Fixed.

>
> + /* quick exit if no workers are prepared, e.g. under serializable isolation */
>
> That comment makes very little sense in this context.

Fixed.

>
> + /* Report parallel vacuum worker information */
> + initStringInfo(&buf);
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker %s (planned: %d",
> +   "launched %d parallel vacuum workers %s (planned: %d",
> +   lvstate->pcxt->nworkers_launched),
> + lvstate->pcxt->nworkers_launched,
> + for_cleanup ? "for index cleanup" : "for index vacuum",
> + lvstate->pcxt->nworkers);
> + if (lvstate->options.nworkers > 0)
> + appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
> +
> + appendStringInfo(&buf, ")");
> + ereport(elevel, (errmsg("%s", buf.data)));
>
> This is another example of incremental message construction, again
> violating translation guidelines.

Fixed.

>
> + WaitForParallelWorkersToAttach(lvstate->pcxt);
>
> Why?

Oh not necessary, removed.

>
> + /*
> + * If there is already-updated result in the shared memory we use it.
> + * Otherwise we pass NULL to index AMs, meaning it's first time call,
> + * and copy the result to the shared memory segment.
> + */
>
> I'm probably missing something here, but isn't the intention that we
> only do each index once?  If so, how would there be anything there
> already?  Once from for_cleanup = false and once for for_cleanup =
> true?

We call ambulkdelete (for_cleanup = false) 0 or more times for each
index and call amvacuumcleanup (for_cleanup = true) at the end. The
first time we call either ambulkdelete or amvacuumcleanup, lazy vacuum
must pass NULL to them. They return either a palloc'd
IndexBulkDeleteResult or NULL. If they return the former, lazy vacuum
must pass it back to them the next time. In the current design, since
there is no guarantee that an index is always processed by the same
vacuum process, each vacuum process saves the result to DSM in order to
share those results among vacuum processes. The 'updated' flag
indicates that its slot is in use. So we can pass the address in DSM if
'updated' is true, and NULL otherwise.

>
> + if (a->options.flags != b->options.flags)
> + return false;
> + if (a->options.nworkers != b->options.nworkers)
> + return false;
>
> You could just do COMPARE_SCALAR_FIELD(options.flags);
> COMPARE_SCALAR_FIELD(options.nworkers);

Fixed.

Almost all of the comments I got have been incorporated into my local
branch, but a few need discussion. I'll submit the updated version of
the patch once I have addressed all of the comments.





Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date
On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <> wrote:
> > I wonder if we really want this behavior.  Should a setting that
> > controls the degree of parallelism when scanning the table also affect
> > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > table to be parallel by default, but rather something that the user
> > must explicitly request.  Happy to hear other opinions.  If we do want
> > this behavior, I think this should be written differently, something
> > like this: The PARALLEL N option to VACUUM takes precedence over this
> > option.
>
> For example, I can imagine a use case where a batch job does parallel
> vacuum to some tables in a maintenance window. The batch operation
> will need to compute and specify the degree of parallelism every time
> according to for instance the number of indexes, which would be
> troublesome. But if we can set the degree of parallelism for each
> tables it can just to do 'VACUUM (PARALLEL)'.

True, but the setting in question would also affect the behavior of
sequential scans and index scans.  TBH, I'm not sure that the
parallel_workers reloption is really a great design as it is: is
hard-coding the number of workers really what people want?  Do they
really want the same degree of parallelism for sequential scans and
index scans?  Why should they want the same degree of parallelism also
for VACUUM?  Maybe they do, and maybe somebody can explain why they do,
but as of now, it's not obvious to me why that should be true.

> Since the parallel vacuum uses memory in the same manner as the single
> process vacuum it's not deteriorated. I'd agree that that patch is
> more smarter and this patch can be built on top of it but I'm
> concerned that there two proposals on that thread and the discussion
> has not been active for 8 months. I wonder if  it would be worth to
> think of improving the memory allocating based on that patch after the
> parallel vacuum get committed.

Well, I think we can't just say "oh, this patch is going to use twice
as much memory as before," which is what it looks like it's doing
right now. If you think it's not doing that, can you explain further?

> Agreed. I'll separate patches and propose it.

Cool.  Probably best to keep that on this thread.

> The main motivations are refactoring and improving readability but
> it's mainly for the previous version patch which implements parallel
> heap vacuum. It might no longer need here. I'll try to implement
> without LVState. Thank you.

Oh, OK.

> > + /*
> > + * If there is already-updated result in the shared memory we use it.
> > + * Otherwise we pass NULL to index AMs, meaning it's first time call,
> > + * and copy the result to the shared memory segment.
> > + */
> >
> > I'm probably missing something here, but isn't the intention that we
> > only do each index once?  If so, how would there be anything there
> > already?  Once from for_cleanup = false and once for for_cleanup =
> > true?
>
> We call ambulkdelete (for_cleanup = false) 0 or more times for each
> index and call amvacuumcleanup (for_cleanup = true) at the end. In the
> first time calling either ambulkdelete or amvacuumcleanup the lazy
> vacuum must pass NULL to them. They return either palloc'd
> IndexBulkDeleteResult or NULL. If they returns the former the lazy
> vacuum must pass it to them again at the next time. In current design,
> since there is no guarantee that an index is always processed by the
> same vacuum process each vacuum processes save the result to DSM in
> order to share those results among vacuum processes. The 'updated'
> flags indicates that its slot is used. So we can pass the address of
> DSM if 'updated' is true, otherwise pass NULL.

Ah, OK.  Thanks for explaining.
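
(A minimal sketch, from a reader's point of view, of the protocol described
in the quoted paragraph: pass NULL on the first call for an index, pass the
previously returned result afterwards, and keep the copy in DSM so any
process can pick it up. LVSharedIndStats, its fields, and the helper name
are assumptions for illustration, not the patch's actual definitions;
index_bulk_delete and lazy_tid_reaped are the existing indexam entry point
and vacuumlazy.c callback.)

typedef struct LVSharedIndStats
{
    bool                  updated;   /* does this slot hold a result yet? */
    IndexBulkDeleteResult stats;     /* result copied back from the index AM */
} LVSharedIndStats;

static void
parallel_vacuum_one_index(IndexVacuumInfo *ivinfo, LVSharedIndStats *slot,
                          LVDeadTuples *dead_tuples)
{
    /* NULL on the first call for this index, the shared copy later on */
    IndexBulkDeleteResult *istat = slot->updated ? &slot->stats : NULL;

    istat = index_bulk_delete(ivinfo, istat, lazy_tid_reaped,
                              (void *) dead_tuples);

    /* the AM may return a palloc'd result; copy it into DSM for reuse */
    if (istat != NULL)
    {
        slot->stats = *istat;
        slot->updated = true;
    }
}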

> Almost comments I got have been incorporated to the local branch but a
> few comments need discussion. I'll submit the updated version patch
> once I addressed all of comments.

Cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <> wrote:
>
> On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <> wrote:
> > > I wonder if we really want this behavior.  Should a setting that
> > > controls the degree of parallelism when scanning the table also affect
> > > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > > table to be parallel by default, but rather something that the user
> > > must explicitly request.  Happy to hear other opinions.  If we do want
> > > this behavior, I think this should be written differently, something
> > > like this: The PARALLEL N option to VACUUM takes precedence over this
> > > option.
> >
> > For example, I can imagine a use case where a batch job does parallel
> > vacuum to some tables in a maintenance window. The batch operation
> > will need to compute and specify the degree of parallelism every time
> > according to for instance the number of indexes, which would be
> > troublesome. But if we can set the degree of parallelism for each
> > tables it can just to do 'VACUUM (PARALLEL)'.
>
> True, but the setting in question would also affect the behavior of
> sequential scans and index scans.  TBH, I'm not sure that the
> parallel_workers reloption is really a great design as it is: is
> hard-coding the number of workers really what people want?  Do they
> really want the same degree of parallelism for sequential scans and
> index scans?  Why should they want the same degree of parallelism also
> for VACUUM?  Maybe they do, and maybe somebody explain why they do,
> but as of now, it's not obvious to me why that should be true.

I think there are users who want to specify the degree of parallelism, and
hard-coding the number of workers would be a reasonable design for
something like VACUUM, which is a simple operation on a single object;
since there are no joins or aggregations, it'd be relatively easy to
compute. That's why the patch introduces the PARALLEL N option as well. A
reloption for parallel vacuum would be just a way to save the degree of
parallelism. And I agree that users may not want the same degree of
parallelism for VACUUM, so maybe it'd be better to add a new reloption like
parallel_vacuum_workers. On the other hand, that can be a separate patch; I
can remove the reloption part from this patch and propose it later if there
are requests.

>
> > Since the parallel vacuum uses memory in the same manner as the single
> > process vacuum it's not deteriorated. I'd agree that that patch is
> > more smarter and this patch can be built on top of it but I'm
> > concerned that there two proposals on that thread and the discussion
> > has not been active for 8 months. I wonder if  it would be worth to
> > think of improving the memory allocating based on that patch after the
> > parallel vacuum get committed.
>
> Well, I think we can't just say "oh, this patch is going to use twice
> as much memory as before," which is what it looks like it's doing
> right now. If you think it's not doing that, can you explain further?

In the current design, the leader process allocates the whole DSM once at
the start and records dead tuples' TIDs into it. This is the same behaviour
as before, except that the dead tuple TIDs are stored in the shared memory
segment. Once index vacuuming has finished, the leader process
re-initializes the DSM for the next round. So parallel vacuum uses the same
amount of memory as before during execution.
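
(Roughly, in code terms, the pattern being described looks like the sketch
below, using the existing shm_toc/parallel-context APIs. The key name,
compute_max_dead_tuples, and LVDeadTuples come from patch hunks quoted later
in the thread; the num_tuples field and variable names are assumptions for
illustration.)

/* leader, once at startup: size the shared dead-tuple array for the run */
maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
                                   mul_size(sizeof(ItemPointerData),
                                            maxtuples)));
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);
InitializeParallelDSM(pcxt);

dead_tuples = (LVDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
dead_tuples->num_tuples = 0;
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);

/* after each round of index vacuuming, reset the counter and reuse it */
dead_tuples->num_tuples = 0;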

>
> > Agreed. I'll separate patches and propose it.
>
> Cool.  Probably best to keep that on this thread.

Understood.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Mar 4, 2019 at 10:27 AM Masahiko Sawada <> wrote:
>
> On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <> wrote:
> >
> > On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <> wrote:
> > > > I wonder if we really want this behavior.  Should a setting that
> > > > controls the degree of parallelism when scanning the table also affect
> > > > VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> > > > table to be parallel by default, but rather something that the user
> > > > must explicitly request.  Happy to hear other opinions.  If we do want
> > > > this behavior, I think this should be written differently, something
> > > > like this: The PARALLEL N option to VACUUM takes precedence over this
> > > > option.
> > >
> > > For example, I can imagine a use case where a batch job does parallel
> > > vacuum to some tables in a maintenance window. The batch operation
> > > will need to compute and specify the degree of parallelism every time
> > > according to for instance the number of indexes, which would be
> > > troublesome. But if we can set the degree of parallelism for each
> > > tables it can just to do 'VACUUM (PARALLEL)'.
> >
> > True, but the setting in question would also affect the behavior of
> > sequential scans and index scans.  TBH, I'm not sure that the
> > parallel_workers reloption is really a great design as it is: is
> > hard-coding the number of workers really what people want?  Do they
> > really want the same degree of parallelism for sequential scans and
> > index scans?  Why should they want the same degree of parallelism also
> > for VACUUM?  Maybe they do, and maybe somebody explain why they do,
> > but as of now, it's not obvious to me why that should be true.
>
> I think that there are users who want to specify the degree of
> parallelism. I think that hard-coding the number of workers would be
> good design for something like VACUUM which is a simple operation for
> single object; since there are no joins, aggregations it'd be
> relatively easy to compute it. That's why the patch introduces
> PARALLEL N option as well. I think that a reloption for parallel
> vacuum would be just a way to save the degree of parallelism. And I
> agree that users don't want to use same degree of parallelism for
> VACUUM, so maybe it'd better to add new reloption like
> parallel_vacuum_workers. On the other hand, it can be a separate
> patch, I can remove the reloption part from this patch and will
> propose it when there are requests.
>

Okay, attached is the latest version of the patch set. I've incorporated
all the comments I got and separated out the patch that makes the vacuum
options a Node (the 0001 patch). The patch no longer uses parallel_workers;
that part might be proposed again in another form in the future if
requested.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Wed, Mar 6, 2019 at 1:26 AM Masahiko Sawada <> wrote:
> Okay, attached the latest version of patch set. I've incorporated all
> comments I got and separated the patch for making vacuum options a
> Node (0001 patch). And the patch doesn't use parallel_workers. It
> might be proposed in the another form again in the future if
> requested.

Why make it a Node?  I mean I think a struct makes sense, but what's
the point of giving it a NodeTag?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Mar 7, 2019 at 2:54 AM Robert Haas <> wrote:
>
> On Wed, Mar 6, 2019 at 1:26 AM Masahiko Sawada <> wrote:
> > Okay, attached the latest version of patch set. I've incorporated all
> > comments I got and separated the patch for making vacuum options a
> > Node (0001 patch). And the patch doesn't use parallel_workers. It
> > might be proposed in the another form again in the future if
> > requested.
>
> Why make it a Node?  I mean I think a struct makes sense, but what's
> the point of giving it a NodeTag?
>

Well, the main point is consistency with other nodes and keeping the code clean.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Wed, Mar 6, 2019 at 10:58 PM Masahiko Sawada <> wrote:
> > Why make it a Node?  I mean I think a struct makes sense, but what's
> > the point of giving it a NodeTag?
>
> Well, the main point is consistency with other nodes and keep the code clean.

It looks to me like if we made it a plain struct rather than a node,
and embedded that struct (not a pointer) in VacuumStmt, then what
would happen is that _copyVacuumStmt and _equalVacuumStmt would have
clauses for each vacuum option individually, with a dot, like
COPY_SCALAR_FIELD(options.flags).
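
(To illustrate that alternative, a rough sketch; the real VacuumStmt has
more members, and these field names simply follow the discussion.)

typedef struct VacuumOptions
{
    int     flags;       /* OR of VACOPT_* flags; plain struct, no NodeTag */
    int     nworkers;    /* requested parallel degree */
} VacuumOptions;

typedef struct VacuumStmt
{
    NodeTag         type;
    VacuumOptions   options;    /* embedded by value, not a pointer */
    List           *rels;       /* relations (and columns) to process */
} VacuumStmt;

/* and in _copyVacuumStmt()/_equalVacuumStmt() the clauses gain a dot: */
COPY_SCALAR_FIELD(options.flags);
COPY_SCALAR_FIELD(options.nworkers);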

Also, the grammar production for VacuumStmt would need to be jiggered
around a bit; the way that options consolidation is done there would
have to be changed.

Neither of those things sound terribly hard or terribly messy, but on
the other hand I guess there's nothing really wrong with the way you
did it, either ... anybody else have an opinion?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Mar 8, 2019 at 12:22 AM Robert Haas <> wrote:
>
> On Wed, Mar 6, 2019 at 10:58 PM Masahiko Sawada <> wrote:
> > > Why make it a Node?  I mean I think a struct makes sense, but what's
> > > the point of giving it a NodeTag?
> >
> > Well, the main point is consistency with other nodes and keep the code clean.
>
> It looks to me like if we made it a plain struct rather than a node,
> and embedded that struct (not a pointer) in VacuumStmt, then what
> would happen is that _copyVacuumStmt and _equalVacuumStmt would have
> clauses for each vacuum option individually, with a dot, like
> COPY_SCALAR_FIELD(options.flags).
>
> Also, the grammar production for VacuumStmt would need to be jiggered
> around a bit; the way that options consolidation is done there would
> have to be changed.
>
> Neither of those things sound terribly hard or terribly messy, but on
> the other hand I guess there's nothing really wrong with the way you
> did it, either ... anybody else have an opinion?
>

I don't have a strong opinion, but using a Node would be more suitable in
the future when we add more options to vacuum. It also seems to me that we
are unlikely to ever change a Node back into a plain struct, so the idea is
to do it now anyway since we might need it someday.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <> wrote:
> I don't have a strong opinion but the using a Node would be more
> suitable in the future when we add more options to vacuum. And it
> seems to me that it's unlikely to change a Node to a plain struct. So
> there is an idea of doing it now anyway if we might need to do it
> someday.

I just tried to apply 0001 again and noticed a conflict in the
autovac_table structure in postmaster.c.

That conflict got me thinking: aren't parameters and options an awful
lot alike?  Why do we need to pass around a VacuumOptions structure
*and* a VacuumParams structure to all of these functions?  Couldn't we
just have one?  That led to the attached patch, which just gets rid of
the separate options flag and folds it into VacuumParams.  If we took
this approach, the degree of parallelism would just be another thing
that would get added to VacuumParams, and VacuumOptions wouldn't end
up existing at all.

This patch does not address the question of what the *parse tree*
representation of the PARALLEL option should look like; the idea would
be that ExecVacuum() would need to extract the value for that option and
put it into VacuumParams just as it already does for various other
things in VacuumParams.  Maybe the most natural approach would be to
convert the grammar productions for the VACUUM options list so that
they just build a list of DefElems, and then have ExecVacuum() iterate
over that list and make sense of it, as for example ExplainQuery()
already does.
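
(A rough sketch of that shape, not the committed code: ExecVacuum() walks a
List of DefElem, as ExplainQuery() does, and fills in VacuumParams. The
"verbose"/"analyze"/"parallel" option names and params.nworkers are
illustrative here.)

ListCell   *lc;

foreach(lc, vacstmt->options)
{
    DefElem    *opt = lfirst_node(DefElem, lc);

    if (strcmp(opt->defname, "verbose") == 0)
        params.options |= VACOPT_VERBOSE;
    else if (strcmp(opt->defname, "analyze") == 0)
        params.options |= VACOPT_ANALYZE;
    else if (strcmp(opt->defname, "parallel") == 0)
        params.nworkers = defGetInt32(opt);
    else
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("unrecognized VACUUM option \"%s\"", opt->defname)));
}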

I kinda like the idea of doing it that way, but then I came up with
it, so maybe you or others will think it's terrible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Mar 14, 2019 at 6:41 AM Robert Haas <> wrote:
>
> On Wed, Mar 13, 2019 at 1:56 AM Masahiko Sawada <> wrote:
> > I don't have a strong opinion but the using a Node would be more
> > suitable in the future when we add more options to vacuum. And it
> > seems to me that it's unlikely to change a Node to a plain struct. So
> > there is an idea of doing it now anyway if we might need to do it
> > someday.
>
> I just tried to apply 0001 again and noticed a conflict in the
> autovac_table structure in postmaster.c.
>
> That conflict got me thinking: aren't parameters and options an awful
> lot alike?  Why do we need to pass around a VacuumOptions structure
> *and* a VacuumParams structure to all of these functions?  Couldn't we
> just have one?  That led to the attached patch, which just gets rid of
> the separate options flag and folds it into VacuumParams.

Indeed. I like this approach. The comment of vacuum() says,

* options is a bitmask of VacuumOption flags, indicating what to do.
* (snip)
* params contains a set of parameters that can be used to customize the
* behavior.

It seems to me that the purposes of the two variables are different, but
merging them would still be acceptable.

BTW, your patch doesn't seem to apply cleanly to the current HEAD, and the
comment of vacuum() needs to be updated.

> If we took
> this approach, the degree of parallelism would just be another thing
> that would get added to VacuumParams, and VacuumOptions wouldn't end
> up existing at all.
>

Agreed.

> This patch does not address the question of what the *parse tree*
> representation of the PARALLEL option should look like; the idea would
> be that ExecVacuum() would need to extra the value for that option and
> put it into VacuumParams just as it already does for various other
> things in VacuumParams.  Maybe the most natural approach would be to
> convert the grammar productions for the VACUUM options list so that
> they just build a list of DefElems, and then have ExecVacuum() iterate
> over that list and make sense of it, as for example ExplainQuery()
> already does.
>

Agreed. That change would also help the discussion on changing the VACUUM
option syntax to a field-and-value style.

Attached are the updated version of the patch you proposed and, on top of
it, a patch that converts the grammar productions for the VACUUM options.
The latter patch moves VacuumOption to vacuum.h since the parser no longer
needs that information.

If we take this direction, I will change the parallel vacuum patch so that
it adds a new PARALLEL option and an 'nworkers' field to VacuumParams.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <> wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>

Here are the performance test results. I set up a 500MB table with several
indexes and made 10% of the table dirty before each vacuum, then compared
the execution time of the patched postgres with the current HEAD (see the
'speed_up' column). In my environment:

 indexes | parallel_degree |  patched   |    head    | speed_up
---------+-----------------+------------+------------+----------
      0 |               0 |   238.2085 |   244.7625 |   1.0275
      0 |               1 |   237.7050 |   244.7625 |   1.0297
      0 |               2 |   238.0390 |   244.7625 |   1.0282
      0 |               4 |   238.1045 |   244.7625 |   1.0280
      0 |               8 |   237.8995 |   244.7625 |   1.0288
      0 |              16 |   237.7775 |   244.7625 |   1.0294
      1 |               0 |  1328.8590 |  1334.9125 |   1.0046
      1 |               1 |  1325.9140 |  1334.9125 |   1.0068
      1 |               2 |  1333.3665 |  1334.9125 |   1.0012
      1 |               4 |  1329.5205 |  1334.9125 |   1.0041
      1 |               8 |  1334.2255 |  1334.9125 |   1.0005
      1 |              16 |  1335.1510 |  1334.9125 |   0.9998
      2 |               0 |  2426.2905 |  2427.5165 |   1.0005
      2 |               1 |  1416.0595 |  2427.5165 |   1.7143
      2 |               2 |  1411.6270 |  2427.5165 |   1.7197
      2 |               4 |  1411.6490 |  2427.5165 |   1.7196
      2 |               8 |  1410.1750 |  2427.5165 |   1.7214
      2 |              16 |  1413.4985 |  2427.5165 |   1.7174
      4 |               0 |  4622.5060 |  4619.0340 |   0.9992
      4 |               1 |  2536.8435 |  4619.0340 |   1.8208
      4 |               2 |  2548.3615 |  4619.0340 |   1.8126
      4 |               4 |  1467.9655 |  4619.0340 |   3.1466
      4 |               8 |  1486.3155 |  4619.0340 |   3.1077
      4 |              16 |  1481.7150 |  4619.0340 |   3.1174
      8 |               0 |  9039.3810 |  8990.4735 |   0.9946
      8 |               1 |  4807.5880 |  8990.4735 |   1.8701
      8 |               2 |  3786.7620 |  8990.4735 |   2.3742
      8 |               4 |  2924.2205 |  8990.4735 |   3.0745
      8 |               8 |  2684.2545 |  8990.4735 |   3.3493
      8 |              16 |  2672.9800 |  8990.4735 |   3.3635
     16 |               0 | 17821.4715 | 17740.1300 |   0.9954
     16 |               1 |  9318.3810 | 17740.1300 |   1.9038
     16 |               2 |  7260.6315 | 17740.1300 |   2.4433
     16 |               4 |  5538.5225 | 17740.1300 |   3.2030
     16 |               8 |  5368.5255 | 17740.1300 |   3.3045
     16 |              16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)

Attached are the updated version of the patches. They apply cleanly to the
current HEAD, but the 0001 patch still changes the vacuum options to a Node
since that is still under discussion. Once the direction has been decided,
I'll update the patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
Hello.

At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <> wrote in
<>
> Here is the performance test results. I've setup a 500MB table with
> several indexes and made 10% of table dirty before each vacuum.
> Compared execution time of the patched postgrse with the current HEAD
> (at 'speed_up' column). In my environment,
> 
>  indexes | parallel_degree |  patched   |    head    | speed_up
> ---------+-----------------+------------+------------+----------
>       0 |               0 |   238.2085 |   244.7625 |   1.0275
>       0 |               1 |   237.7050 |   244.7625 |   1.0297
>       0 |               2 |   238.0390 |   244.7625 |   1.0282
>       0 |               4 |   238.1045 |   244.7625 |   1.0280
>       0 |               8 |   237.8995 |   244.7625 |   1.0288
>       0 |              16 |   237.7775 |   244.7625 |   1.0294
>       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> (36 rows)

For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave almost
the same. I suspect that the indexes are too small, all the index pages
were in memory, and the CPU was saturated. Maybe you had four cores, and
parallel workers beyond that number had no effect. Other normal backends
would have been able to do almost nothing meanwhile. Usually the number of
parallel workers is chosen so that I/O capacity is filled up, but this
feature intermittently saturates CPU capacity under such a situation.

I'm not sure, but what if we did index vacuum in a one-tuple-by-one
manner? That is, heap vacuum passes dead tuples one by one (or buffers a
few tuples) to workers, and workers process them not with bulkdelete but
with a per-tuple delete (which we don't have). That could avoid the heap
scan sitting idle while the index bulkdelete runs.


> Attached the updated version patches. The patches apply to the current
> HEAD cleanly but the 0001 patch still changes the vacuum option to a
> Node since it's under the discussion. After the direction has been
> decided, I'll update the patches.

As for the to-be-or-not-to-be-a-Node problem, I don't think it is needed,
but from the point of view of consistency it seems reasonable, and it is
seen in other nodes that a *Stmt node holds an options node. But makeVacOpt,
its usage, and the subsequent operations on the node look somewhat strange.
Why don't you just do "makeNode(VacuumOptions)"?


>+    /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);

If I understand this correctly, nindexes is always > 1 there. At least
assert that it is > 0 there.

>+    estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),

I don't think the name is good. (At first glance, 'dt' read as 'detach' to me.)

>+        if (lps->nworkers_requested > 0)
>+            appendStringInfo(&buf,
>+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",

"planned"?


>+        /* Get the next index to vacuum */
>+        if (do_parallel)
>+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
>+        else
>+            idx = nprocessed++;

It seems that both cases could be handled using LVParallelState, and most
of the branches on lps or do_parallel could be removed.
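
(For readers following along, a condensed sketch of the loop shape being
discussed; lps, lvshared, nprocessed, and Irel follow the quoted hunks,
while the helper name is an assumption.)

/* each participant (leader or worker) claims the next unprocessed index */
for (;;)
{
    int     idx;

    if (do_parallel)
        idx = (int) pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
    else
        idx = nprocessed++;

    if (idx >= nindexes)
        break;

    /* vacuum (or clean up) Irel[idx] and record its bulk-delete result */
    lazy_vacuum_one_index(Irel[idx], idx);    /* illustrative helper */
}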

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <> wrote:
> BTW your patch seems to not apply to the current HEAD cleanly and to
> need to update the comment of vacuum().

Yeah, I omitted some hunks by being stupid with 'git'.

Since you seem to like the approach, I put back the hunks I intended
to have there, pulled in one change from your v2 that looked good,
made one other tweak, and committed this.  I think I like what I did
with vacuum_open_relation a bit better than what you did; actually, I
think it cannot be right to just pass 'params' when the current code
is passing params->options & ~(VACOPT_VACUUM).  My approach avoids
that particular pitfall.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Robert Haas
Date:
On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <> wrote:
> Attached the updated patch you proposed and the patch that converts
> the grammer productions for the VACUUM option on top of the former
> patch. The latter patch moves VacuumOption to vacuum.h since the
> parser no longer needs such information.

Committed.

> If we take this direction I will change the parallel vacuum patch so
> that it adds new PARALLEL option and adds 'nworkers' to VacuumParams.

Sounds good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 3:05 AM Robert Haas <> wrote:
>
> On Thu, Mar 14, 2019 at 3:37 AM Masahiko Sawada <> wrote:
> > BTW your patch seems to not apply to the current HEAD cleanly and to
> > need to update the comment of vacuum().
>
> Yeah, I omitted some hunks by being stupid with 'git'.
>
> Since you seem to like the approach, I put back the hunks I intended
> to have there, pulled in one change from your v2 that looked good,
> made one other tweak, and committed this.

Thank you!

>   I think I like what I did
> with vacuum_open_relation a bit better than what you did; actually, I
> think it cannot be right to just pass 'params' when the current code
> is passing params->options & ~(VACOPT_VACUUM).  My approach avoids
> that particular pitfall.

Agreed. Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Haribabu Kommi
Date:

On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <> wrote:
On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <> wrote:
>
> On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <> wrote:
> >
> > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
> >>
> >> Thank you. Attached the rebased patch.
> >
> >
> > I ran some performance tests to compare the parallelism benefits,
>
> Thank you for testing!
>
> > but I got some strange results of performance overhead, may be it is
> > because, I tested it on my laptop.
>
> Hmm, I think the parallel vacuum would help for heavy workloads like a
> big table with multiple indexes. In your test result, all executions
> are completed within 1 sec, which seems to be one use case that the
> parallel vacuum wouldn't help. I suspect that the table is small,
> right? Anyway I'll also do performance tests.
>

Here is the performance test results. I've setup a 500MB table with
several indexes and made 10% of table dirty before each vacuum.
Compared execution time of the patched postgrse with the current HEAD
(at 'speed_up' column). In my environment,

 indexes | parallel_degree |  patched   |    head    | speed_up
---------+-----------------+------------+------------+----------
      0 |               0 |   238.2085 |   244.7625 |   1.0275
      0 |               1 |   237.7050 |   244.7625 |   1.0297
      0 |               2 |   238.0390 |   244.7625 |   1.0282
      0 |               4 |   238.1045 |   244.7625 |   1.0280
      0 |               8 |   237.8995 |   244.7625 |   1.0288
      0 |              16 |   237.7775 |   244.7625 |   1.0294
      1 |               0 |  1328.8590 |  1334.9125 |   1.0046
      1 |               1 |  1325.9140 |  1334.9125 |   1.0068
      1 |               2 |  1333.3665 |  1334.9125 |   1.0012
      1 |               4 |  1329.5205 |  1334.9125 |   1.0041
      1 |               8 |  1334.2255 |  1334.9125 |   1.0005
      1 |              16 |  1335.1510 |  1334.9125 |   0.9998
      2 |               0 |  2426.2905 |  2427.5165 |   1.0005
      2 |               1 |  1416.0595 |  2427.5165 |   1.7143
      2 |               2 |  1411.6270 |  2427.5165 |   1.7197
      2 |               4 |  1411.6490 |  2427.5165 |   1.7196
      2 |               8 |  1410.1750 |  2427.5165 |   1.7214
      2 |              16 |  1413.4985 |  2427.5165 |   1.7174
      4 |               0 |  4622.5060 |  4619.0340 |   0.9992
      4 |               1 |  2536.8435 |  4619.0340 |   1.8208
      4 |               2 |  2548.3615 |  4619.0340 |   1.8126
      4 |               4 |  1467.9655 |  4619.0340 |   3.1466
      4 |               8 |  1486.3155 |  4619.0340 |   3.1077
      4 |              16 |  1481.7150 |  4619.0340 |   3.1174
      8 |               0 |  9039.3810 |  8990.4735 |   0.9946
      8 |               1 |  4807.5880 |  8990.4735 |   1.8701
      8 |               2 |  3786.7620 |  8990.4735 |   2.3742
      8 |               4 |  2924.2205 |  8990.4735 |   3.0745
      8 |               8 |  2684.2545 |  8990.4735 |   3.3493
      8 |              16 |  2672.9800 |  8990.4735 |   3.3635
     16 |               0 | 17821.4715 | 17740.1300 |   0.9954
     16 |               1 |  9318.3810 | 17740.1300 |   1.9038
     16 |               2 |  7260.6315 | 17740.1300 |   2.4433
     16 |               4 |  5538.5225 | 17740.1300 |   3.2030
     16 |               8 |  5368.5255 | 17740.1300 |   3.3045
     16 |              16 |  5291.8510 | 17740.1300 |   3.3523
(36 rows)

The performance results are good. Do we want to add a recommended table
size to the documentation for the parallel option? The parallel option can
lead to performance overhead for smaller tables.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Mar 18, 2019 at 7:06 PM Kyotaro HORIGUCHI
<> wrote:
>
> Hello.
>
> At Mon, 18 Mar 2019 11:54:42 +0900, Masahiko Sawada <> wrote in
<>
> > Here is the performance test results. I've setup a 500MB table with
> > several indexes and made 10% of table dirty before each vacuum.
> > Compared execution time of the patched postgrse with the current HEAD
> > (at 'speed_up' column). In my environment,
> >
> >  indexes | parallel_degree |  patched   |    head    | speed_up
> > ---------+-----------------+------------+------------+----------
> >       0 |               0 |   238.2085 |   244.7625 |   1.0275
> >       0 |               1 |   237.7050 |   244.7625 |   1.0297
> >       0 |               2 |   238.0390 |   244.7625 |   1.0282
> >       0 |               4 |   238.1045 |   244.7625 |   1.0280
> >       0 |               8 |   237.8995 |   244.7625 |   1.0288
> >       0 |              16 |   237.7775 |   244.7625 |   1.0294
> >       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
> >       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
> >       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
> >       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
> >       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
> >       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
> >       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
> >       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
> >       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
> >       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
> >       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
> >       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
> >       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
> >       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
> >       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
> >       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
> >       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
> >       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
> >       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
> >       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
> >       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
> >       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
> >       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
> >       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
> >      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
> >      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
> >      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
> >      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
> >      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
> >      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
> > (36 rows)
>
> For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> almost the same. I suspect that the indexes are too-small and all
> the index pages were on memory and CPU is saturated. Maybe you
> had four cores and parallel workers more than the number had no
> effect.  Other normal backends should have been able do almost
> nothing meanwhile. Usually the number of parallel workers is
> determined so that IO capacity is filled up but this feature
> intermittently saturates CPU capacity very under such a
> situation.
>

I'm sorry I didn't make it clear enough. If the parallel degree is higher
than 'the number of indexes - 1', redundant workers are not launched. So
for indexes=4, 8, 16 the number of actually launched parallel workers is at
most 3, 7, 15 respectively. That's why the results show almost the same
execution time in the cases where nindexes <= parallel_degree.
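
(In code terms, the cap being described amounts to roughly the following;
max_parallel_maintenance_workers is the existing GUC, the other names are
illustrative.)

/* the leader participates too, so more than (nindexes - 1) workers is useless */
nworkers = Min(parallel_degree, nindexes - 1);

/* and never exceed the maintenance-worker limit */
nworkers = Min(nworkers, max_parallel_maintenance_workers);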

I'll share the performance test results for larger tables and indexes.

> I'm not sure, but what if we do index vacuum in one-tuple-by-one
> manner? That is, heap vacuum passes dead tuple one-by-one (or
> buffering few tuples) to workers and workers process it not by
> bulkdelete, but just tuple_delete (we don't have one). That could
> avoid the sleep time of heap-scan while index bulkdelete.
>

Just to be clear, in parallel lazy vacuum all parallel vacuum processes,
including the leader process, do index vacuuming; no one sleeps during
index vacuuming. The leader process does the heap scan and launches the
parallel workers before index vacuuming. Each process exclusively processes
indexes one by one.

Such an index deletion method could be an optimization, but I'm not sure
that calling tuple_delete many times would be faster than one bulkdelete.
If there are many dead tuples, vacuum has to call tuple_delete once per
dead tuple, and in general one sequential scan is faster than tons of index
scans. There is a proposal for such one-by-one index deletions[1], but it's
not a replacement for bulkdelete.

>
> > Attached the updated version patches. The patches apply to the current
> > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > Node since it's under the discussion. After the direction has been
> > decided, I'll update the patches.
>
> As for the to-be-or-not-to-be a node problem, I don't think it is
> needed but from the point of consistency, it seems reasonable and
> it is seen in other nodes that *Stmt Node holds option Node. But
> makeVacOpt and it's usage, and subsequent operations on the node
> look somewhat strange.. Why don't you just do
> "makeNode(VacuumOptions)"?

Thank you for the comment, but this part has gone away since a recent
commit changed the grammar production of the VACUUM command.

>
>
> >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
>
> If I understand this correctly, nindexes is always > 1 there. At
> lesat asserted that > 0 there.
>
> >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
>
> I don't think the name is good. (dt menant detach by the first look for me..)

Fixed.

>
> >+        if (lps->nworkers_requested > 0)
> >+            appendStringInfo(&buf,
> >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
>
> "planned"?

The 'planned' value shows how many parallel workers we planned to launch.
The degree of parallelism is determined by either the user's request or the
number of indexes the table has.

>
>
> >+        /* Get the next index to vacuum */
> >+        if (do_parallel)
> >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> >+        else
> >+            idx = nprocessed++;
>
> It seems that both of the two cases can be handled using
> LVParallelState and most of the branches by lps or do_parallel
> can be removed.
>

Sorry, I couldn't follow your comment. Did you mean to move nprocessed to
LVParallelState?

[1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <> wrote in
<>
> > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > almost the same. I suspect that the indexes are too-small and all
> > the index pages were on memory and CPU is saturated. Maybe you
> > had four cores and parallel workers more than the number had no
> > effect.  Other normal backends should have been able do almost
> > nothing meanwhile. Usually the number of parallel workers is
> > determined so that IO capacity is filled up but this feature
> > intermittently saturates CPU capacity very under such a
> > situation.
> >
> 
> I'm sorry I didn't make it clear enough. If the parallel degree is
> higher than 'the number of indexes - 1' redundant workers are not
> launched. So for indexes=4, 8, 16 the number of actually launched
> parallel workers is up to 3, 7, 15 respectively. That's why the result
> shows almost the same execution time in the cases where nindexes <=
> parallel_degree.

In the 16-index case, the performance saturated at 4 workers, which
contradicts your explanation.

> I'll share the performance test result of more larger tables and indexes.
> 
> > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > buffering few tuples) to workers and workers process it not by
> > bulkdelete, but just tuple_delete (we don't have one). That could
> > avoid the sleep time of heap-scan while index bulkdelete.
> >
> 
> Just to be clear, in parallel lazy vacuum all parallel vacuum
> processes including the leader process do index vacuuming, no one
> doesn't sleep during index vacuuming. The leader process does heap
> scan and launches parallel workers before index vacuuming. Each
> processes exclusively processes indexes one by one.

The leader doesn't continue the heap scan while index vacuuming is
running, and the index page scan seems to eat up CPU easily. If index
vacuuming could run simultaneously with the next heap scan phase, we could
make the index scan finish at almost the same time as the next round of
heap scan. That would reduce the (possible) CPU contention. But this
requires twice as much shared memory as the current implementation.

> Such index deletion method could be an optimization but I'm not sure
> that the calling tuple_delete many times would be faster than one
> bulkdelete. If there are many dead tuples vacuum has to call
> tuple_delete as much as dead tuples. In general one seqscan is faster
> than tons of indexscan. There is the proposal for such one by one
> index deletions[1] but it's not a replacement of bulkdelete.

I'm not sure what you mean by 'replacement', but it depends on how large a
part of the table is removed at once, as mentioned in that thread.
Unfortunately it doesn't seem easy to do.

> > > Attached the updated version patches. The patches apply to the current
> > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > Node since it's under the discussion. After the direction has been
> > > decided, I'll update the patches.
> >
> > As for the to-be-or-not-to-be a node problem, I don't think it is
> > needed but from the point of consistency, it seems reasonable and
> > it is seen in other nodes that *Stmt Node holds option Node. But
> > makeVacOpt and it's usage, and subsequent operations on the node
> > look somewhat strange.. Why don't you just do
> > "makeNode(VacuumOptions)"?
> 
> Thank you for the comment but this part has gone away as the recent
> commit changed the grammar production of vacuum command.

Oops!


> > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> >
> > If I understand this correctly, nindexes is always > 1 there. At
> > lesat asserted that > 0 there.
> >
> > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> >
> > I don't think the name is good. (dt menant detach by the first look for me..)
> 
> Fixed.
> 
> >
> > >+        if (lps->nworkers_requested > 0)
> > >+            appendStringInfo(&buf,
> > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> >
> > "planned"?
> 
> The 'planned' shows how many parallel workers we planned to launch.
> The degree of parallelism is determined based on either user request
> or the number of indexes that the table has.
> 
> >
> >
> > >+        /* Get the next index to vacuum */
> > >+        if (do_parallel)
> > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > >+        else
> > >+            idx = nprocessed++;
> >
> > It seems that both of the two cases can be handled using
> > LVParallelState and most of the branches by lps or do_parallel
> > can be removed.
> >
> 
> Sorry I couldn't get your comment. You meant to move nprocessed to
> LVParallelState?

Exactly. I meant letting lvshared point to private memory, but it might
introduce confusion.


> [1] https://www.postgresql.org/message-id/flat/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
<> wrote:
>
>
> On Mon, Mar 18, 2019 at 1:58 PM Masahiko Sawada <> wrote:
>>
>> On Tue, Feb 26, 2019 at 7:20 PM Masahiko Sawada <> wrote:
>> >
>> > On Tue, Feb 26, 2019 at 1:35 PM Haribabu Kommi <> wrote:
>> > >
>> > > On Thu, Feb 14, 2019 at 9:17 PM Masahiko Sawada <> wrote:
>> > >>
>> > >> Thank you. Attached the rebased patch.
>> > >
>> > >
>> > > I ran some performance tests to compare the parallelism benefits,
>> >
>> > Thank you for testing!
>> >
>> > > but I got some strange results of performance overhead, may be it is
>> > > because, I tested it on my laptop.
>> >
>> > Hmm, I think the parallel vacuum would help for heavy workloads like a
>> > big table with multiple indexes. In your test result, all executions
>> > are completed within 1 sec, which seems to be one use case that the
>> > parallel vacuum wouldn't help. I suspect that the table is small,
>> > right? Anyway I'll also do performance tests.
>> >
>>
>> Here is the performance test results. I've setup a 500MB table with
>> several indexes and made 10% of table dirty before each vacuum.
>> Compared execution time of the patched postgrse with the current HEAD
>> (at 'speed_up' column). In my environment,
>>
>>  indexes | parallel_degree |  patched   |    head    | speed_up
>> ---------+-----------------+------------+------------+----------
>>       0 |               0 |   238.2085 |   244.7625 |   1.0275
>>       0 |               1 |   237.7050 |   244.7625 |   1.0297
>>       0 |               2 |   238.0390 |   244.7625 |   1.0282
>>       0 |               4 |   238.1045 |   244.7625 |   1.0280
>>       0 |               8 |   237.8995 |   244.7625 |   1.0288
>>       0 |              16 |   237.7775 |   244.7625 |   1.0294
>>       1 |               0 |  1328.8590 |  1334.9125 |   1.0046
>>       1 |               1 |  1325.9140 |  1334.9125 |   1.0068
>>       1 |               2 |  1333.3665 |  1334.9125 |   1.0012
>>       1 |               4 |  1329.5205 |  1334.9125 |   1.0041
>>       1 |               8 |  1334.2255 |  1334.9125 |   1.0005
>>       1 |              16 |  1335.1510 |  1334.9125 |   0.9998
>>       2 |               0 |  2426.2905 |  2427.5165 |   1.0005
>>       2 |               1 |  1416.0595 |  2427.5165 |   1.7143
>>       2 |               2 |  1411.6270 |  2427.5165 |   1.7197
>>       2 |               4 |  1411.6490 |  2427.5165 |   1.7196
>>       2 |               8 |  1410.1750 |  2427.5165 |   1.7214
>>       2 |              16 |  1413.4985 |  2427.5165 |   1.7174
>>       4 |               0 |  4622.5060 |  4619.0340 |   0.9992
>>       4 |               1 |  2536.8435 |  4619.0340 |   1.8208
>>       4 |               2 |  2548.3615 |  4619.0340 |   1.8126
>>       4 |               4 |  1467.9655 |  4619.0340 |   3.1466
>>       4 |               8 |  1486.3155 |  4619.0340 |   3.1077
>>       4 |              16 |  1481.7150 |  4619.0340 |   3.1174
>>       8 |               0 |  9039.3810 |  8990.4735 |   0.9946
>>       8 |               1 |  4807.5880 |  8990.4735 |   1.8701
>>       8 |               2 |  3786.7620 |  8990.4735 |   2.3742
>>       8 |               4 |  2924.2205 |  8990.4735 |   3.0745
>>       8 |               8 |  2684.2545 |  8990.4735 |   3.3493
>>       8 |              16 |  2672.9800 |  8990.4735 |   3.3635
>>      16 |               0 | 17821.4715 | 17740.1300 |   0.9954
>>      16 |               1 |  9318.3810 | 17740.1300 |   1.9038
>>      16 |               2 |  7260.6315 | 17740.1300 |   2.4433
>>      16 |               4 |  5538.5225 | 17740.1300 |   3.2030
>>      16 |               8 |  5368.5255 | 17740.1300 |   3.3045
>>      16 |              16 |  5291.8510 | 17740.1300 |   3.3523
>> (36 rows)
>
>
> The performance results are good. Do we want to add the recommended
> size in the document for the parallel option? the parallel option for smaller
> tables can lead to performance overhead.
>

Hmm, I don't think we can add a specific recommended size because the
performance gain from parallel lazy vacuum depends on various things such
as CPU cores, the number of indexes, shared buffer size, index types, and
HDD vs. SSD. I suppose that users who want to use this option have some
sort of performance problem, such as vacuum taking a very long time, and
would use it for relatively large tables.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
<> wrote:
>
> At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <> wrote in
<>
> > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > almost the same. I suspect that the indexes are too-small and all
> > > the index pages were on memory and CPU is saturated. Maybe you
> > > had four cores and parallel workers more than the number had no
> > > effect.  Other normal backends should have been able do almost
> > > nothing meanwhile. Usually the number of parallel workers is
> > > determined so that IO capacity is filled up but this feature
> > > intermittently saturates CPU capacity very under such a
> > > situation.
> > >
> >
> > I'm sorry I didn't make it clear enough. If the parallel degree is
> > higher than 'the number of indexes - 1' redundant workers are not
> > launched. So for indexes=4, 8, 16 the number of actually launched
> > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > shows almost the same execution time in the cases where nindexes <=
> > parallel_degree.
>
> In the 16 indexes case, the performance saturated at 4 workers
> which contradicts to your explanation.

Because the machine I used has 4 cores, the performance doesn't improve
even if more than 4 parallel workers are launched.

>
> > I'll share the performance test result of more larger tables and indexes.
> >
> > > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > > buffering few tuples) to workers and workers process it not by
> > > bulkdelete, but just tuple_delete (we don't have one). That could
> > > avoid the sleep time of heap-scan while index bulkdelete.
> > >
> >
> > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > processes including the leader process do index vacuuming, no one
> > doesn't sleep during index vacuuming. The leader process does heap
> > scan and launches parallel workers before index vacuuming. Each
> > processes exclusively processes indexes one by one.
>
> The leader doesn't continue heap-scan while index vacuuming is
> running. And the index-page-scan seems eat up CPU easily. If
> index vacuum can run simultaneously with the next heap scan
> phase, we can make index scan finishes almost the same time with
> the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires as the twice size of shared
> memoryas the current implement.

Yeah, I've considered something like a pipelining approach, where one
process keeps queuing dead tuples and another process fetches and processes
them during index vacuuming, but the current version of the patch employs
the simplest approach as a first step. Once we have the retail index
deletion approach, we might be able to use it for parallel vacuum.

>
> > Such index deletion method could be an optimization but I'm not sure
> > that the calling tuple_delete many times would be faster than one
> > bulkdelete. If there are many dead tuples vacuum has to call
> > tuple_delete as much as dead tuples. In general one seqscan is faster
> > than tons of indexscan. There is the proposal for such one by one
> > index deletions[1] but it's not a replacement of bulkdelete.
>
> I'm not sure what you mean by 'replacement' but it depends on how
> large part of a table is removed at once. As mentioned in the
> thread. But unfortunately it doesn't seem easy to do..
>
> > > > Attached the updated version patches. The patches apply to the current
> > > > HEAD cleanly but the 0001 patch still changes the vacuum option to a
> > > > Node since it's under the discussion. After the direction has been
> > > > decided, I'll update the patches.
> > >
> > > As for the to-be-or-not-to-be a node problem, I don't think it is
> > > needed but from the point of consistency, it seems reasonable and
> > > it is seen in other nodes that *Stmt Node holds option Node. But
> > > makeVacOpt and it's usage, and subsequent operations on the node
> > > look somewhat strange.. Why don't you just do
> > > "makeNode(VacuumOptions)"?
> >
> > Thank you for the comment but this part has gone away as the recent
> > commit changed the grammar production of vacuum command.
>
> Oops!
>
>
> > > >+      /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> > > >+    maxtuples = compute_max_dead_tuples(nblocks, nindexes > 0);
> > >
> > > If I understand this correctly, nindexes is always > 1 there. At
> > > lesat asserted that > 0 there.
> > >
> > > >+      estdt = MAXALIGN(add_size(sizeof(LVDeadTuples),
> > >
> > > I don't think the name is good. (dt menant detach by the first look for me..)
> >
> > Fixed.
> >
> > >
> > > >+        if (lps->nworkers_requested > 0)
> > > >+            appendStringInfo(&buf,
> > > >+                             ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d, requested %d)",
> > >
> > > "planned"?
> >
> > The 'planned' shows how many parallel workers we planned to launch.
> > The degree of parallelism is determined based on either user request
> > or the number of indexes that the table has.
> >
> > >
> > >
> > > >+        /* Get the next index to vacuum */
> > > >+        if (do_parallel)
> > > >+            idx = pg_atomic_fetch_add_u32(&(lps->lvshared->nprocessed), 1);
> > > >+        else
> > > >+            idx = nprocessed++;
> > >
> > > It seems that both of the two cases can be handled using
> > > LVParallelState and most of the branches by lps or do_parallel
> > > can be removed.
> > >
> >
> > Sorry I couldn't get your comment. You meant to move nprocessed to
> > LVParallelState?
>
> Exactly. I meant letting lvshared points to private memory, but
> it might introduce confusion.

Hmm, I'm not sure that would be a good idea; it would introduce confusion,
as you mentioned. And since 'nprocessed' has to be a pg_atomic_uint32 in
parallel mode, we would end up with yet another branch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

From
Kyotaro HORIGUCHI
Date:
At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <> wrote in
<CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=>
> On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
> <> wrote:
> >
> > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <> wrote in
<>
> > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > > almost the same. I suspect that the indexes are too-small and all
> > > > the index pages were on memory and CPU is saturated. Maybe you
> > > > had four cores and parallel workers more than the number had no
> > > > effect.  Other normal backends should have been able do almost
> > > > nothing meanwhile. Usually the number of parallel workers is
> > > > determined so that IO capacity is filled up but this feature
> > > > intermittently saturates CPU capacity very under such a
> > > > situation.
> > > >
> > >
> > > I'm sorry I didn't make it clear enough. If the parallel degree is
> > > higher than 'the number of indexes - 1' redundant workers are not
> > > launched. So for indexes=4, 8, 16 the number of actually launched
> > > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > > shows almost the same execution time in the cases where nindexes <=
> > > parallel_degree.
> >
> > In the 16 indexes case, the performance saturated at 4 workers
> > which contradicts to your explanation.
> 
> Because the machine I used has 4 cores the performance doesn't get
> improved even if more than 4 parallel workers are launched.

That is what I mentioned in the cited text. Sorry if my phrasing was hard
to read.

> >
> > > I'll share the performance test result of more larger tables and indexes.
> > >
> > > > I'm not sure, but what if we do index vacuum in one-tuple-by-one
> > > > manner? That is, heap vacuum passes dead tuple one-by-one (or
> > > > buffering few tuples) to workers and workers process it not by
> > > > bulkdelete, but just tuple_delete (we don't have one). That could
> > > > avoid the sleep time of heap-scan while index bulkdelete.
> > > >
> > >
> > > Just to be clear, in parallel lazy vacuum all parallel vacuum
> > > processes including the leader process do index vacuuming, no one
> > > doesn't sleep during index vacuuming. The leader process does heap
> > > scan and launches parallel workers before index vacuuming. Each
> > > processes exclusively processes indexes one by one.
> >
> > The leader doesn't continue heap-scan while index vacuuming is
> > running. And the index-page-scan seems eat up CPU easily. If
> > index vacuum can run simultaneously with the next heap scan
> > phase, we can make index scan finishes almost the same time with
> > the next round of heap scan. It would reduce the (possible) CPU
> > contention. But this requires as the twice size of shared
> > memory as the current implement.
> 
> Yeah, I've considered that something like pipe-lining approach that
> one process continue to queue the dead tuples and other process
> fetches and processes them during index vacuuming but the current
> version patch employed the most simple approach as the first step.
> Once we had the retail index deletion approach we might be able to use
> it for parallel vacuum.

Ok, I understood the direction.

...
> > > Sorry I couldn't get your comment. You meant to move nprocessed to
> > > LVParallelState?
> >
> > Exactly. I meant letting lvshared points to private memory, but
> > it might introduce confusion.
> 
> Hmm, I'm not sure it would be a good idea. It would introduce
> confusion as you mentioned. And since 'nprocessed' has to be
> pg_atomic_uint32 in parallel mode, we will end up having
> another branch.

Ok. Agreed. Thank you for the pacience.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Kyotaro HORIGUCHI
Дата:
At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <> wrote in
<>
> On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> <> wrote:
> > The performance results are good. Do we want to add the recommended
> > size in the document for the parallel option? the parallel option for smaller
> > tables can lead to performance overhead.
> >
> 
> Hmm, I don't think we can add the specific recommended size because
> the performance gain by parallel lazy vacuum depends on various things
> such as CPU cores, the number of indexes, shared buffer size, index
> types, HDD or SSD. I suppose that users who want to use this option
> have some sort of performance problem such as that vacuum takes a very
> long time. They would use it for relatively larger tables.

Agree that we have no recommended setting, but I strongly think that documentation on the downside or possible side effect of this feature is required for those who are to use the feature.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, Mar 19, 2019 at 7:15 PM Kyotaro HORIGUCHI
<> wrote:
>
> At Tue, 19 Mar 2019 19:01:06 +0900, Masahiko Sawada <> wrote in
<CAD21AoA3PpkcNNzcQmiNgFL3DudhdLRWoTvQE6=>
> > On Tue, Mar 19, 2019 at 4:59 PM Kyotaro HORIGUCHI
> > <> wrote:
> > >
> > > At Tue, 19 Mar 2019 13:31:04 +0900, Masahiko Sawada <> wrote in
<>
> > > > > For indexes=4,8,16, the cases with parallel_degree=4,8,16 behave
> > > > > almost the same. I suspect that the indexes are too-small and all
> > > > > the index pages were on memory and CPU is saturated. Maybe you
> > > > > had four cores and parallel workers more than the number had no
> > > > > effect.  Other normal backends should have been able do almost
> > > > > nothing meanwhile. Usually the number of parallel workers is
> > > > > determined so that IO capacity is filled up but this feature
> > > > > intermittently saturates CPU capacity very under such a
> > > > > situation.
> > > > >
> > > >
> > > > I'm sorry I didn't make it clear enough. If the parallel degree is
> > > > higher than 'the number of indexes - 1' redundant workers are not
> > > > launched. So for indexes=4, 8, 16 the number of actually launched
> > > > parallel workers is up to 3, 7, 15 respectively. That's why the result
> > > > shows almost the same execution time in the cases where nindexes <=
> > > > parallel_degree.
> > >
> > > In the 16 indexes case, the performance saturated at 4 workers
> > > which contradicts to your explanation.
> >
> > Because the machine I used has 4 cores the performance doesn't get
> > improved even if more than 4 parallel workers are launched.
>
> That is what I mentioned in the cited phrases. Sorry for perhaps
> hard-to-read phrases..

I understood now. Thank you!


Attached the updated version patches, incorporating all review comments.

Commit 6776142 changed the grammar production of the VACUUM command. This
patch adds the PARALLEL option on top of that commit.

I realized that commit 6776142 breaks indentation in ExecVacuum() and
that including nodes/parsenodes.h is no longer needed. Sorry, that's my
fault. The attached patch (vacuum_fix.patch) fixes them, although the
indentation issue will be resolved by pgindent before release.

In parsing vacuum command, since only PARALLEL option can have an
argument I've added the check in ExecVacuum to erroring out when other
options have an argument. But it might be good to make other vacuum
options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an
argument just like EXPLAIN command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, Mar 19, 2019 at 7:29 PM Kyotaro HORIGUCHI
<> wrote:
>
> At Tue, 19 Mar 2019 17:51:32 +0900, Masahiko Sawada <> wrote in
<>
> > On Tue, Mar 19, 2019 at 10:39 AM Haribabu Kommi
> > <> wrote:
> > > The performance results are good. Do we want to add the recommended
> > > size in the document for the parallel option? the parallel option for smaller
> > > tables can lead to performance overhead.
> > >
> >
> > Hmm, I don't think we can add the specific recommended size because
> > the performance gain by parallel lazy vacuum depends on various things
> > such as CPU cores, the number of indexes, shared buffer size, index
> > types, HDD or SSD. I suppose that users who want to use this option
> > have some sort of performance problem such as that vacuum takes a very
> > long time. They would use it for relatively larger tables.
>
> Agree that we have no recommended setting, but I strongly think that documentation on the downside or possible side effect of this feature is required for those who are to use the feature.
>

I think that the side effect of parallel lazy vacuum would be to
consume more CPU and I/O bandwidth, which is also true for other
utility commands (e.g. parallel create index). The description of
max_parallel_maintenance_workers documents such things[1]. Anything
else to document?

[1] https://www.postgresql.org/docs/devel/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Block level parallel vacuum

От
Sergei Kornilov
Дата:
Hello

> * in_parallel is true if we're performing parallel lazy vacuum. Since any
> * updates are not allowed during parallel mode we don't update statistics
> * but set the index bulk-deletion result to *stats. Otherwise we update it
> * and set NULL.

lazy_cleanup_index has the in_parallel argument only for this purpose, but the caller still should check in_parallel after the lazy_cleanup_index call and do something else with stats for parallel execution.
Would it be better to always return stats and update statistics in the caller? It's possible to update all index stats in lazy_vacuum_all_indexes, for example? This routine always runs in the parallel leader and has the comment /* Do post-vacuum cleanup and statistics update for each index */ on the for_cleanup=true call.

I think we need a note in the documentation that the parallel leader is not counted in the PARALLEL N option, so with the PARALLEL 2 option we will use 3 processes. Or even change the behavior? Default with PARALLEL 1 - only the current backend in a single process is running, PARALLEL 2 - leader + one parallel worker, two processes work in parallel.

regards, Sergei


Re: [HACKERS] Block level parallel vacuum

От
Robert Haas
Дата:
On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI
<> wrote:
> The leader doesn't continue heap-scan while index vacuuming is
> running. And the index-page-scan seems eat up CPU easily. If
> index vacuum can run simultaneously with the next heap scan
> phase, we can make index scan finishes almost the same time with
> the next round of heap scan. It would reduce the (possible) CPU
> contention. But this requires as the twice size of shared
> memory as the current implement.

I think you're approaching this from the wrong point of view.  If we
have a certain amount of memory available, is it better to (a) fill
the entire thing with dead tuples once, or (b) better to fill half of
it with dead tuples, start index vacuuming, and then fill the other
half of it with dead tuples for the next index-vacuum cycle while the
current one is running?  I think the answer is that (a) is clearly
better, because it results in half as many index vacuum cycles.

We can't really ask the user how much memory it's OK to use and then
use twice as much.  But if we could, what you're proposing here is
probably still not the right way to use it.
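
To make the arithmetic behind (a) and (b) concrete (the numbers below are assumptions for illustration only, not from this thread): the cost is the number of index-vacuum passes, ceil(dead_tuples / capacity), so halving the capacity used per cycle doubles the passes.

    #include <stdio.h>

    /* Toy illustration: halving the dead-tuple buffer doubles the index-vacuum passes. */
    int
    main(void)
    {
        long    dead_tuples = 20000000;     /* assumed total dead tuples in the table */
        long    cap_full = 10000000;        /* assumed TIDs that fit in maintenance_work_mem */
        long    cap_half = cap_full / 2;    /* strategy (b): only half is filled per cycle */

        long    passes_a = (dead_tuples + cap_full - 1) / cap_full;     /* -> 2 */
        long    passes_b = (dead_tuples + cap_half - 1) / cap_half;     /* -> 4 */

        printf("strategy (a): %ld index-vacuum passes, strategy (b): %ld\n",
               passes_a, passes_b);
        return 0;
    }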

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

От
Robert Haas
Дата:
On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <> wrote:
> In parsing vacuum command, since only PARALLEL option can have an
> argument I've added the check in ExecVacuum to erroring out when other
> options have an argument. But it might be good to make other vacuum
> options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an
> argument just like EXPLAIN command.

I think all of the existing options, including DISABLE_PAGE_SKIPPING,
should permit an argument that is passed to defGetBoolean().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Mar 22, 2019 at 4:53 AM Robert Haas <> wrote:
>
> On Tue, Mar 19, 2019 at 7:26 AM Masahiko Sawada <> wrote:
> > In parsing vacuum command, since only PARALLEL option can have an
> > argument I've added the check in ExecVacuum to erroring out when other
> > options have an argument. But it might be good to make other vacuum
> > options (perhaps except for DISABLE_PAGE_SKIPPING option) accept an
> > argument just like EXPLAIN command.
>
> I think all of the existing options, including DISABLE_PAGE_SKIPPING,
> should permit an argument that is passed to defGetBoolean().
>

Agreed. The attached 0001 patch changes so.

On Thu, Mar 21, 2019 at 8:05 PM Sergei Kornilov <> wrote:
>
> Hello
>

Thank you for reviewing the patch!

> > * in_parallel is true if we're performing parallel lazy vacuum. Since any
> > * updates are not allowed during parallel mode we don't update statistics
> > * but set the index bulk-deletion result to *stats. Otherwise we update it
> > * and set NULL.
>
> lazy_cleanup_index has the in_parallel argument only for this purpose, but the caller still should check in_parallel after the lazy_cleanup_index call and do something else with stats for parallel execution.
> Would it be better to always return stats and update statistics in the caller? It's possible to update all index stats in lazy_vacuum_all_indexes, for example? This routine always runs in the parallel leader and has the comment /* Do post-vacuum cleanup and statistics update for each index */ on the for_cleanup=true call.

Agreed. I've changed the patch so that we update index statistics in
lazy_vacuum_all_indexes().

>
> I think we need a note in the documentation that the parallel leader is not counted in the PARALLEL N option, so with the PARALLEL 2 option we will use 3 processes. Or even change the behavior? Default with PARALLEL 1 - only the current backend in a single process is running, PARALLEL 2 - leader + one parallel worker, two processes work in parallel.
>

Hmm, the documentation says "Perform vacuum index and cleanup index
phases of VACUUM in parallel using N background workers". Doesn't it
already explain that?

Attached the updated version patch. 0001 patch allows all existing
vacuum options a boolean argument. 0002 patch introduces parallel
lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
command.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Haribabu Kommi
Дата:

On Fri, Mar 22, 2019 at 4:06 PM Masahiko Sawada <> wrote:

Attached the updated version patch. 0001 patch allows all existing
vacuum options a boolean argument. 0002 patch introduces parallel
lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
command.

Thanks for sharing the updated patches.

0001 patch:

+    PARALLEL [ <replaceable class="parameter">N</replaceable> ]

But this patch contains the PARALLEL syntax without an explanation; I saw that
it is explained in 0002. It is not a problem, just mentioning it.

+      Specifies parallel degree for <literal>PARALLEL</literal> option. The
+      value must be at least 1. If the parallel degree
+      <replaceable class="parameter">integer</replaceable> is omitted, then
+      <command>VACUUM</command> decides the number of workers based on number of
+      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.

Can we add some more details about backend participation also? Parallel workers
come into the picture only when there are at least 2 indexes on the table.

+ /*
+ * Do post-vacuum cleanup and statistics update for each index if
+ * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
+ * only post-vacum cleanup and then update statistics after exited
+ * from parallel mode.
+ */
+ lazy_vacuum_all_indexes(vacrelstats, Irel, nindexes, indstats,
+ lps, true);

How about renaming the above function, as it does the cleanup also?
lazy_vacuum_or_cleanup_all_indexes?


+ if (!IsInParallelVacuum(lps))
+ {
+ /*
+ * Update index statistics. If in parallel lazy vacuum, we will
+ * update them after exited from parallel mode.
+ */
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }

The above check in lazy_vacuum_all_indexes can be combined with the outer
if check where the memcpy is happening. I still feel that the logic around the stats
makes it a little bit complex.

+ if (IsParallelWorker())
+ msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
+ else
+ msg = "scanned index \"%s\" to remove %d row versions";

I feel this way of building the error message may not be picked up for translation.
Is there any problem if we duplicate the entire ereport message with the changed message?

+ for (i = 0; i < nindexes; i++)
+ {
+ LVIndStats *s = &(copied_indstats[i]);
+
+ if (s->updated)
+ lazy_update_index_statistics(Irel[i], &(s->stats));
+ }
+
+ pfree(copied_indstats);

Why can't we use the shared memory directly to update the stats once all the workers
are finished, instead of copying them to local memory?

+ tab->at_params.nworkers = 0; /* parallel lazy autovacuum is not supported */

The user is not required to provide the number of workers compulsorily; parallel vacuum
can work even then. So just setting the above parameter doesn't stop the parallel workers;
the user must pass the PARALLEL option as well. So mentioning that too will be helpful later
when we start supporting it, or for someone who is reading the code to understand.

Regards,
Haribabu Kommi
Fujitsu Australia

Re: [HACKERS] Block level parallel vacuum

От
Kyotaro HORIGUCHI
Дата:
Hello.

At Thu, 21 Mar 2019 15:51:40 -0400, Robert Haas <> wrote in
<CA+TgmobkRtLb5frmEF5t9U=>
> On Tue, Mar 19, 2019 at 3:59 AM Kyotaro HORIGUCHI
> <> wrote:
> > The leader doesn't continue heap-scan while index vacuuming is
> > running. And the index-page-scan seems eat up CPU easily. If
> > index vacuum can run simultaneously with the next heap scan
> > phase, we can make index scan finishes almost the same time with
> > the next round of heap scan. It would reduce the (possible) CPU
> > contention. But this requires as the twice size of shared
> > memory as the current implement.
> 
> I think you're approaching this from the wrong point of view.  If we
> have a certain amount of memory available, is it better to (a) fill
> the entire thing with dead tuples once, or (b) better to fill half of
> it with dead tuples, start index vacuuming, and then fill the other
> half of it with dead tuples for the next index-vacuum cycle while the
> current one is running?  I think the answer is that (a) is clearly

Sure.

> better, because it results in half as many index vacuum cycles.

The "problem" I see there is it stops heap scanning on the leader
process.  The leader cannot start the heap scan until the index
scan on workers end.

The heap scan is expected not to stop by the half-and-half
stratregy especially when the whole index pages are on
memory. But it is not always the case, of course.

> We can't really ask the user how much memory it's OK to use and then
> use twice as much.  But if we could, what you're proposing here is
> probably still not the right way to use it.

Yes. I thought that I wrote that with such an implication. "requires
as the twice size" has negative implications, as you wrote above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Kyotaro HORIGUCHI
Дата:
Hello. I forgot to mention a point.

At Fri, 22 Mar 2019 14:02:36 +0900, Masahiko Sawada <> wrote in
<>
> Attached the updated version patch. 0001 patch allows all existing
> vacuum options a boolean argument. 0002 patch introduces parallel
> lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
> command.

> +    if (IsParallelWorker())
> +        msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum worker";
> +    else
> +        msg = "scanned index \"%s\" to remove %d row versions";
>      ereport(elevel,
> -            (errmsg("scanned index \"%s\" to remove %d row versions",
> +            (errmsg(msg,
>                      RelationGetRelationName(indrel),
> -                    vacrelstats->num_dead_tuples),
> +                    dead_tuples->num_tuples),

The msg prevents NLS from working. Please enclose the right-hand
literals by gettext_noop().
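
For reference, a minimal sketch of the suggested pattern (illustrative only, reusing the names from the quoted hunk; not the patch's actual code): gettext_noop() marks each literal so the message extraction tooling still sees it, and errmsg() translates whichever variant was picked at run time.

    const char *msg;

    if (IsParallelWorker())
        msg = gettext_noop("scanned index \"%s\" to remove %d row versions by parallel vacuum worker");
    else
        msg = gettext_noop("scanned index \"%s\" to remove %d row versions");

    ereport(elevel,
            (errmsg(msg,
                    RelationGetRelationName(indrel),
                    dead_tuples->num_tuples)));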

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Robert Haas
Дата:
On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <> wrote:
> Thank you for reviewing the patch.

I don't think the approach in v20-0001 is quite right.

         if (strcmp(opt->defname, "verbose") == 0)
-            params.options |= VACOPT_VERBOSE;
+            params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0;

It seems to me that it would be better to do declare a separate
boolean for each flag at the top; e.g. bool verbose.  Then here do
verbose = defGetBoolean(opt).  And then after the loop do
params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for
other options.

The thing I don't like about the way you have it here is that it's not
going to work well for options that are true by default but can
optionally be set to false.  In that case, you would need to start
with the bit set and then clear it, but |= can only set bits, not
clear them.  I went and looked at the VACUUM (INDEX_CLEANUP) patch on
the other thread and it doesn't have any special handling for that
case, which makes me suspect that if you use that patch, the reloption
works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually
succeed in disabling index cleanup.  The structure I suggested above
would fix that.
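
A minimal sketch of the suggested structure (illustrative, using only the existing VACOPT_* flags; not necessarily the code that was eventually committed):

    /* One plain boolean per option, each with its own default. */
    bool        verbose = false;
    bool        analyze = false;
    bool        freeze = false;
    bool        full = false;
    bool        disable_page_skipping = false;
    ListCell   *lc;

    foreach(lc, vacstmt->options)
    {
        DefElem    *opt = (DefElem *) lfirst(lc);

        if (strcmp(opt->defname, "verbose") == 0)
            verbose = defGetBoolean(opt);
        else if (strcmp(opt->defname, "analyze") == 0)
            analyze = defGetBoolean(opt);
        else if (strcmp(opt->defname, "freeze") == 0)
            freeze = defGetBoolean(opt);
        else if (strcmp(opt->defname, "full") == 0)
            full = defGetBoolean(opt);
        else if (strcmp(opt->defname, "disable_page_skipping") == 0)
            disable_page_skipping = defGetBoolean(opt);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_SYNTAX_ERROR),
                     errmsg("unrecognized VACUUM option \"%s\"", opt->defname)));
    }

    /* Assemble the flag word only after every boolean has its final value. */
    params.options = VACOPT_VACUUM |
        (verbose ? VACOPT_VERBOSE : 0) |
        (analyze ? VACOPT_ANALYZE : 0) |
        (freeze ? VACOPT_FREEZE : 0) |
        (full ? VACOPT_FULL : 0) |
        (disable_page_skipping ? VACOPT_DISABLE_PAGE_SKIPPING : 0);

This way an option that defaults to true can still be cleared by an explicit "option false".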

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Mar 29, 2019 at 4:53 AM Robert Haas <> wrote:
>
> On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <> wrote:
> > Thank you for reviewing the patch.
>
> I don't think the approach in v20-0001 is quite right.
>
>          if (strcmp(opt->defname, "verbose") == 0)
> -            params.options |= VACOPT_VERBOSE;
> +            params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0;
>
> It seems to me that it would be better to do declare a separate
> boolean for each flag at the top; e.g. bool verbose.  Then here do
> verbose = defGetBoolean(opt).  And then after the loop do
> params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for
> other options.
>
> The thing I don't like about the way you have it here is that it's not
> going to work well for options that are true by default but can
> optionally be set to false.  In that case, you would need to start
> with the bit set and then clear it, but |= can only set bits, not
> clear them.  I went and looked at the VACUUM (INDEX_CLEANUP) patch on
> the other thread and it doesn't have any special handling for that
> case, which makes me suspect that if you use that patch, the reloption
> works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually
> succeed in disabling index cleanup.  The structure I suggested above
> would fix that.
>

You're right, the previous patches are wrong. Attached the updated
version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Robert Haas
Дата:
On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada <> wrote:
> You're right, the previous patches are wrong. Attached the updated
> version patches.

0001 looks good now.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Mar 29, 2019 at 9:28 PM Robert Haas <> wrote:
>
> On Thu, Mar 28, 2019 at 10:27 PM Masahiko Sawada <> wrote:
> > You're right, the previous patches are wrong. Attached the updated
> > version patches.
>
> 0001 looks good now.  Committed.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Mar 29, 2019 at 11:26 AM Masahiko Sawada <> wrote:
>
> On Fri, Mar 29, 2019 at 4:53 AM Robert Haas <> wrote:
> >
> > On Tue, Mar 26, 2019 at 10:31 AM Masahiko Sawada <> wrote:
> > > Thank you for reviewing the patch.
> >
> > I don't think the approach in v20-0001 is quite right.
> >
> >          if (strcmp(opt->defname, "verbose") == 0)
> > -            params.options |= VACOPT_VERBOSE;
> > +            params.options |= defGetBoolean(opt) ? VACOPT_VERBOSE : 0;
> >
> > It seems to me that it would be better to do declare a separate
> > boolean for each flag at the top; e.g. bool verbose.  Then here do
> > verbose = defGetBoolean(opt).  And then after the loop do
> > params.options = (verbose ? VACOPT_VERBOSE : 0) | ... similarly for
> > other options.
> >
> > The thing I don't like about the way you have it here is that it's not
> > going to work well for options that are true by default but can
> > optionally be set to false.  In that case, you would need to start
> > with the bit set and then clear it, but |= can only set bits, not
> > clear them.  I went and looked at the VACUUM (INDEX_CLEANUP) patch on
> > the other thread and it doesn't have any special handling for that
> > case, which makes me suspect that if you use that patch, the reloption
> > works as expected but VACUUM (INDEX_CLEANUP false) doesn't actually
> > succeed in disabling index cleanup.  The structure I suggested above
> > would fix that.
> >
>
> You're right, the previous patches are wrong. Attached the updated
> version patches.
>

These patches conflict with the current HEAD. Attached the updated patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Robert Haas
Дата:
On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada <> wrote:
> These patches conflict with the current HEAD. Attached the updated patches.

They'll need another rebase.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Apr 5, 2019 at 4:51 AM Robert Haas <> wrote:
>
> On Thu, Apr 4, 2019 at 6:28 AM Masahiko Sawada <> wrote:
> > These patches conflict with the current HEAD. Attached the updated patches.
>
> They'll need another rebase.
>

Thank you for the notice. Rebased.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Kyotaro HORIGUCHI
Дата:
Thank you for the rebased version.

At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <> wrote in
<>
> Thank you for the notice. Rebased.

+    <term><replaceable class="parameter">integer</replaceable></term>
+    <listitem>
+     <para>
+      Specifies parallel degree for <literal>PARALLEL</literal> option. The
+      value must be at least 1. If the parallel degree
+      <replaceable class="parameter">integer</replaceable> is omitted, then
+      <command>VACUUM</command> decides the number of workers based on number of
+      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.
+     </para>
+    </listitem>
+   </varlistentry>

I'm quite confused to see this. I suppose the <para> should be a
description about <integer> parameters. Actually the existing
<boolean> entry is describing the boolean itself.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Apr 5, 2019 at 3:47 PM Kyotaro HORIGUCHI
<> wrote:
>
> Thank you for the rebased version.
>
> At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <> wrote in
<>
> > Thank you for the notice. Rebased.
>
> +    <term><replaceable class="parameter">integer</replaceable></term>
> +    <listitem>
> +     <para>
> +      Specifies parallel degree for <literal>PARALLEL</literal> option. The
> +      value must be at least 1. If the parallel degree
> +      <replaceable class="parameter">integer</replaceable> is omitted, then
> +      <command>VACUUM</command> decides the number of workers based on number of
> +      indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>.
> +     </para>
> +    </listitem>
> +   </varlistentry>
>

Thank you for reviewing the patch.

> I'm quite confused to see this. I suppose the <para> should be a
> description about <integer> parameters. Actually the existing
> <boolean> entry is describing the boolean itself.
>

Indeed. How about the following description?

PARALLEL
Perform the vacuum index and cleanup index phases of VACUUM in parallel
using integer background workers (for the details of each vacuum
phase, please refer to Table 27.25). If the parallel degree integer
is omitted, then VACUUM decides the number of workers based on the
number of indexes on the relation, which is further limited by
max_parallel_maintenance_workers. Only one worker can be used per
index. So parallel workers are launched only when there are at least 2
indexes in the table. Workers for vacuum are launched before starting
each phase and exit at the end of the phase. These behaviors might
change in a future release. This option cannot be used with the FULL option.

integer
Specifies a positive integer value passed to the selected option. The
integer value can also be omitted, in which case the default value of
the selected option is used.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Apr 5, 2019 at 4:10 PM Masahiko Sawada <> wrote:
>
> On Fri, Apr 5, 2019 at 3:47 PM Kyotaro HORIGUCHI
> <> wrote:
> >
> > Thank you for the rebased version.
> >
> > At Fri, 5 Apr 2019 13:59:36 +0900, Masahiko Sawada <> wrote in
<>
> > > Thank you for the notice. Rebased.
> >
> > +    <term><replaceable class="parameter">integer</replaceable></term>
> > +    <listitem>
> > +     <para>
> > +      Specifies parallel degree for <literal>PARALLEL</literal> option. The
> > +      value must be at least 1. If the parallel degree
> > +      <replaceable class="parameter">integer</replaceable> is omitted, then
> > +      <command>VACUUM</command> decides the number of workers based on number of
> > +      indexes on the relation which further limited by
> > +      <xref linkend="guc-max-parallel-workers-maintenance"/>.
> > +     </para>
> > +    </listitem>
> > +   </varlistentry>
> >
>
> Thank you for reviewing the patch.
>
> > I'm quite confused to see this. I suppose the <para> should be a
> > description about <integer> parameters. Actually the existing
> > <boolean> entry is describing the boolean itself.
> >
>
> Indeed. How about the following description?
>

Attached the updated version patches.
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Kyotaro HORIGUCHI
Дата:
Hello.

# Is this still living? I changed the status to "needs review"

At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <> wrote in
<CAD21AoAuD3txrxucnVtM6NGo=>
> > Indeed. How about the following description?
> >
> 
> Attached the updated version patches.

Thanks.

heapam.h is including access/parallel.h but the file doesn't use
parallel.h stuff and storage/shm_toc.h and storage/dsm.h are
enough.

+ * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM
+ * keys conflicting with plan_node_id we can use small integers.

Yeah, this is right, but "plan_node_id" seems abrupt
there. Please prepend "differently from parallel execution code"
or .. I think no excuse is needed to use that numbers. The
executor code is already making an excuse for the large numbers
as unusual instead.

+ * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel
+ * mode and prepared the DSM segments.
+ */
+#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL)

we *are* in?

The name "IsInParallleVacuum()" looks (to me) like suggesting
"this process is a parallel vacuum worker".  How about
ParallelVacuumIsActive?


+typedef struct LVIndStats
+typedef struct LVDeadTuples
+typedef struct LVShared
+typedef struct LVParallelState

The names are confusing, and the name LVShared is too
generic. Shared-only structs are better to be marked in the name.
That is, maybe it would be better that LVIndStats were
LVSharedIndStats and LVShared were LVSharedRelStats.

It might be better that LVIndStats were moved out from LVShared,
but I'm not confident.

+static void
+lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel
...
+    lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup);
...
+    do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats,
+                                  lps->lvshared, vacrelstats->dead_tuples);
...
+    lazy_end_parallel_index_vacuum(lps, !for_cleanup);

The function takes the parameter for_cleanup, but the flag is
used by the three subfunctions in utterly ununified way. It seems
to me useless to store for_cleanup in lvshared and lazy_end is
rather confusing. There's no explanation why "reinitialization"
== "!for_cleanup". In the first place,
lazy_begin_parallel_index_vacuum and
lazy_end_parallel_index_vacuum are called only from the function
and rather short, so it doesn't seem reasonable that they are
independent functions.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Mon, Apr 8, 2019 at 7:25 PM Kyotaro HORIGUCHI
<> wrote:
>
> Hello.
>
> # Is this still living? I changed the status to "needs review"
>
> At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <> wrote in
<CAD21AoAuD3txrxucnVtM6NGo=>
> > > Indeed. How about the following description?
> > >
> >
> > Attached the updated version patches.
>
> Thanks.
>

Thank you for reviewing the patch!

> heapam.h is including access/parallel.h but the file doesn't use
> parallel.h stuff and storage/shm_toc.h and storage/dsm.h are
> enough.

Fixed.

>
> + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM
> + * keys conflicting with plan_node_id we can use small integers.
>
> Yeah, this is right, but "plan_node_id" seems abrupt
> there. Please prepend "differently from parallel execution code"
> or .. I think no excuse is needed to use that numbers. The
> executor code is already making an excuse for the large numbers
> as unusual instead.

Fixed.

>
> + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel
> + * mode and prepared the DSM segments.
> + */
> +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL)
>
> we *are* in?

Fixed.

>
> The name "IsInParallleVacuum()" looks (to me) like suggesting
> "this process is a parallel vacuum worker".  How about
> ParallelVacuumIsActive?

Fixed.

>
>
> +typedef struct LVIndStats
> +typedef struct LVDeadTuples
> +typedef struct LVShared
> +typedef struct LVParallelState
>
> The names are confusing, and the name LVShared is too
> generic. Shared-only structs are better to be marked in the name.
> That is, maybe it would be better that LVIndStats were
> LVSharedIndStats and LVShared were LVSharedRelStats.

Hmm, LVShared actually also stores various things that are not
relevant to the relation. I'm not sure it's a good idea to rename
it to LVSharedRelStats. When we support parallel vacuum for other
vacuum steps, adding a struct for storing only relation statistics
might work well.

>
> It might be better that LVIndStats were moved out from LVShared,
> but I'm not confident.
>
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel
> ...
> +       lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup);
> ...
> +       do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats,
> +                                  lps->lvshared, vacrelstats->dead_tuples);
> ...
> +       lazy_end_parallel_index_vacuum(lps, !for_cleanup);
>
> The function takes the parameter for_cleanup, but the flag is
> used by the three subfunctions in utterly ununified way. It seems
> to me useless to store for_cleanup in lvshared

I think that we need to store for_cleanup or something telling
vacuum workers to do either index vacuuming or index cleanup in
lvshared. Or can we use another thing instead?

>  and lazy_end is
> rather confusing.

Ah, I used "lazy" as prefix of function in vacuumlazy.c. Fixed.

> There's no explanation why "reinitialization"
> == "!for_cleanup".  In the first place,
> lazy_begin_parallel_index_vacuum and
> lazy_end_parallel_index_vacuum are called only from the function
> > and rather short, so it doesn't seem reasonable that they are
> > independent functions.

Okay agreed, fixed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Wed, Apr 10, 2019 at 2:19 PM Masahiko Sawada <> wrote:
>
> On Mon, Apr 8, 2019 at 7:25 PM Kyotaro HORIGUCHI
> <> wrote:
> >
> > Hello.
> >
> > # Is this still living? I changed the status to "needs review"
> >
> > At Sat, 6 Apr 2019 06:47:32 +0900, Masahiko Sawada <> wrote in
<CAD21AoAuD3txrxucnVtM6NGo=>
> > > > Indeed. How about the following description?
> > > >
> > >
> > > Attached the updated version patches.
> >
> > Thanks.
> >
>
> Thank you for reviewing the patch!
>
> > heapam.h is including access/parallel.h but the file doesn't use
> > parallel.h stuff and storage/shm_toc.h and storage/dsm.h are
> > enough.
>
> Fixed.
>
> >
> > + * DSM keys for parallel lazy vacuum. Since we don't need to worry about DSM
> > + * keys conflicting with plan_node_id we can use small integers.
> >
> > Yeah, this is right, but "plan_node_id" seems abrupt
> > there. Please prepend "differently from parallel execution code"
> > or .. I think no excuse is needed to use that numbers. The
> > executor code is already making an excuse for the large numbers
> > as unusual instead.
>
> Fixed.
>
> >
> > + * Macro to check if we in a parallel lazy vacuum. If true, we're in parallel
> > + * mode and prepared the DSM segments.
> > + */
> > +#define IsInParallelVacuum(lps) (((LVParallelState *) (lps)) != NULL)
> >
> > we *are* in?
>
> Fixed.
>
> >
> > The name "IsInParallleVacuum()" looks (to me) like suggesting
> > "this process is a parallel vacuum worker".  How about
> > ParallelVacuumIsActive?
>
> Fixed.
>
> >
> >
> > +typedef struct LVIndStats
> > +typedef struct LVDeadTuples
> > +typedef struct LVShared
> > +typedef struct LVParallelState
> >
> > The names are confusing, and the name LVShared is too
> > generic. Shared-only structs are better to be marked in the name.
> > That is, maybe it would be better that LVIndStats were
> > LVSharedIndStats and LVShared were LVSharedRelStats.
>
> Hmm, LVShared actually also stores various things that are not
> relevant to the relation. I'm not sure it's a good idea to rename
> it to LVSharedRelStats. When we support parallel vacuum for other
> vacuum steps, adding a struct for storing only relation statistics
> might work well.
>
> >
> > It might be better that LVIndStats were moved out from LVShared,
> > but I'm not confident.
> >
> > +static void
> > +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel
> > ...
> > +       lazy_begin_parallel_index_vacuum(lps, vacrelstats, for_cleanup);
> > ...
> > +       do_parallel_vacuum_or_cleanup_indexes(Irel, nindexes, stats,
> > +                                  lps->lvshared, vacrelstats->dead_tuples);
> > ...
> > +       lazy_end_parallel_index_vacuum(lps, !for_cleanup);
> >
> > The function takes the parameter for_cleanup, but the flag is
> > used by the three subfunctions in utterly ununified way. It seems
> > to me useless to store for_cleanup in lvshared
>
> I think that we need to store for_cleanup or something telling
> vacuum workers to do either index vacuuming or index cleanup in
> lvshared. Or can we use another thing instead?
>
> >  and lazy_end is
> > rather confusing.
>
> Ah, I used "lazy" as prefix of function in vacuumlazy.c. Fixed.
>
> > There's no explanation why "reinitialization"
> > == "!for_cleanup".  In the first place,
> > lazy_begin_parallel_index_vacuum and
> > lazy_end_parallel_index_vacuum are called only from the function
> > > and rather short, so it doesn't seem reasonable that they are
> > > independent functions.
>
> Okay agreed, fixed.
>

Since the previous version patch conflicts with current HEAD, I've
attached the updated version patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Sergei Kornilov
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hello

I reviewed v25 patches and have just a few notes.

missed synopsis for "PARALLEL" option (<synopsis> block in doc/src/sgml/ref/vacuum.sgml )
missed prototype for vacuum_log_cleanup_info in "non-export function prototypes"

>    /*
>     * Do post-vacuum cleanup, and statistics update for each index if
>     * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
>     * only post-vacum cleanup and update statistics at the end of parallel
>     * lazy vacuum.
>     */
>    if (vacrelstats->useindex)
>        lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
>                                       indstats, lps, true);
>
>    if (ParallelVacuumIsActive(lps))
>    {
>        /* End parallel mode and update index statistics */
>        end_parallel_vacuum(lps, Irel, nindexes);
>    }

I personally do not like updating statistics in different places.
Can we change lazy_vacuum_or_cleanup_indexes to write stats for both parallel and non-parallel cases? I mean something like this:
 

>    if (ParallelVacuumIsActive(lps))
>    {
>        /* Do parallel index vacuuming or index cleanup */
>        lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel,
>                                                nindexes, stats,
>                                                lps, for_cleanup);
>        if (for_cleanup)
>        {
>            ...
>            for (i = 0; i < nindexes; i++)
>                lazy_update_index_statistics(...);
>        }
>        return;
>    }

So all lazy_update_index_statistics calls would be in one place. lazy_parallel_vacuum_or_cleanup_indexes is called only from the parallel leader and waits for all workers. Possibly we can update stats in lazy_parallel_vacuum_or_cleanup_indexes after the WaitForParallelWorkersToFinish call.
 

Also a discussion question: will the vacuumdb parameters --parallel= and --jobs= confuse users? Do we need more description for these options?
 

regards, Sergei

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <> wrote:
>
> Since the previous version patch conflicts with current HEAD, I've
> attached the updated version patches.
>

Review comments:
------------------------------
*
      indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>.

/which further/which is further

*
+ * index vacuuming or index cleanup, we launch parallel worker processes. Once
+ * all indexes are processed the parallel worker processes exit and the leader
+ * process re-initializes the DSM segment while keeping recorded dead tuples.

It is not clear for this comment why it re-initializes the DSM segment
instead of destroying it once the index work is done by workers.  Can
you elaborate a bit more in the comment?

*
+ * Note that all parallel workers live during one either index vacuuming or

It seems usage of 'one' is not required in the above sentence.

*
+
+/*
+ * Compute the number of parallel worker process to request.

/process/processes

*
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers = 0;
+
+ Assert(nrequested >= 0);
+
+ if (nindexes <= 1)
+ return 0;

I think here, in the beginning, you can also check if
max_parallel_maintenance_workers is 0, then return.
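
A rough sketch of the shape this suggestion points at (hypothetical, not the patch as posted): bail out early when parallel vacuum is disabled or cannot help, then clamp whatever was requested.

    static int
    compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
    {
        int     parallel_workers = 0;

        Assert(nrequested >= 0);

        /* Quick exit if parallelism is disabled or cannot help. */
        if (nindexes <= 1 || max_parallel_maintenance_workers == 0)
            return 0;

        if (nrequested > 0)
        {
            /* At least one index is taken by the leader process. */
            parallel_workers = Min(nrequested, nindexes - 1);
        }
        else
        {
            /* No explicit request: one worker per index, minus the leader's share. */
            parallel_workers = nindexes - 1;
        }

        /* Cap by the GUC in any case. */
        return Min(parallel_workers, max_parallel_maintenance_workers);
    }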

*
In function compute_parallel_workers, don't we want to cap the number
of workers based on maintenance_work_mem as we do in
plan_create_index_workers?

The basic point is how do we want to treat maintenance_work_mem for
this feature.  Do we want all workers to use at max the
maintenance_work_mem or each worker is allowed to use
maintenance_work_mem?  I would prefer earlier unless we have good
reason to follow the later strategy.

Accordingly, we might need to update the below paragraph in docs:
"Note that parallel utility commands should not consume substantially
more memory than equivalent non-parallel operations.  This strategy
differs from that of parallel query, where resource limits generally
apply per worker process.  Parallel utility commands treat the
resource limit <varname>maintenance_work_mem</varname> as a limit to
be applied to the entire utility command, regardless of the number of
parallel worker processes."

*
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers = 0;
+
+ Assert(nrequested >= 0);
+
+ if (nindexes <= 1)
+ return 0;
+
+ if (nrequested > 0)
+ {
+ /* At least one index is taken by the leader process */
+ parallel_workers = Min(nrequested, nindexes - 1);
+ }

I think here we always allow the leader to participate.  It seems to
me we should have some way to disable leader participation.  During the
development of previous parallel operations, we found it quite handy to
catch bugs. We might want to mimic what has been done for index with
DISABLE_LEADER_PARTICIPATION.

*
+/*
+ * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
+ * since we don't need to worry about DSM keys conflicting with plan_node_id
+ * we can use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3

I think it would be better if these keys should be assigned numbers in
a way we do for other similar operation like create index.  See below
defines
in code:
/* Magic numbers for parallel state sharing */
#define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)

This will make the code consistent with other parallel operations.

*
+begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks,
+   int nindexes, int nrequested)
{
..
+ est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
..
}

I think here you should use SizeOfLVDeadTuples as defined by patch.

*
+ keys++;
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ maxtuples = compute_max_dead_tuples(nblocks, true);
+ est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
+    mul_size(sizeof(ItemPointerData), maxtuples)));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ keys++;
+
+ shm_toc_estimate_keys(&pcxt->estimator, keys);
+
+ /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);

The code style looks inconsistent here.  In some cases, you are
calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
and in other cases, you are accumulating keys.  I think it is better
to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
in all cases.
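
In other words, a sketch of the suggested style (the variable and helper names follow the quoted patch excerpt, e.g. est_shared, compute_max_dead_tuples and SizeOfLVDeadTuples come from the patch, not core code), where every chunk estimate is immediately paired with its key estimate:

    /* PARALLEL_VACUUM_KEY_SHARED */
    shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
    shm_toc_estimate_keys(&pcxt->estimator, 1);

    /* PARALLEL_VACUUM_KEY_DEAD_TUPLES */
    maxtuples = compute_max_dead_tuples(nblocks, true);
    est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
                                       mul_size(sizeof(ItemPointerData), maxtuples)));
    shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
    shm_toc_estimate_keys(&pcxt->estimator, 1);

    /* PARALLEL_VACUUM_KEY_QUERY_TEXT */
    querylen = strlen(debug_query_string);
    shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
    shm_toc_estimate_keys(&pcxt->estimator, 1);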

*
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
..
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
..
}

I think the last parameter in shm_toc_lookup should be false.  Is
there a reason for passing it as true?

*
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
..
+ /* Open table */
+ onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
..
}

I don't think it is a good idea to assume the lock mode as
ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
is a change in lock level for the vacuum process, we might forget to
update it here.  I think it is better if we can get this information
from the master backend.
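
One possible shape of that suggestion (a hypothetical sketch, not the patch): the leader records the lock mode it actually holds in the shared state, and the worker opens the relation with that instead of a hardcoded constant. The lmode field is an assumed addition to LVShared.

    /* Leader side, while initializing the DSM (lmode is what vacuum used to open the rel): */
    lvshared->relid = RelationGetRelid(onerel);
    lvshared->lmode = lmode;    /* hypothetical field holding the leader's lock mode */

    /* Worker side, instead of hardcoding ShareUpdateExclusiveLock: */
    onerel = heap_open(lvshared->relid, lvshared->lmode);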

*
+end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
{
..
+ /* Shutdown worker processes and destroy the parallel context */
+ WaitForParallelWorkersToFinish(lps->pcxt);
..
}

Do we really need to call WaitForParallelWorkersToFinish here as it
must have been called in lazy_parallel_vacuum_or_cleanup_indexes
before this time?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila <> wrote:
On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <> wrote:
>
> Since the previous version patch conflicts with current HEAD, I've
> attached the updated version patches.
>

Review comments:
------------------------------

Sawada-San, are you planning to work on the review comments?  I can take care of this and then proceed with further review if you are tied up with something else.
 
*
+/*
+ * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
+ * since we don't need to worry about DSM keys conflicting with plan_node_id
+ * we can use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3

I think it would be better if these keys should be assigned numbers in
a way we do for other similar operation like create index.  See below
defines
in code:
/* Magic numbers for parallel state sharing */
#define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)

This will make the code consistent with other parallel operations.

I think we don't need to handle this comment.  Today, I read the other emails in the thread and noticed that you have done this based on a comment by Robert, and that decision seems wise to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 1, 2019 at 10:31 PM Amit Kapila <> wrote:
>
> On Sat, Sep 21, 2019 at 6:01 PM Amit Kapila <> wrote:
>>
>> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <> wrote:
>> >
>> > Since the previous version patch conflicts with current HEAD, I've
>> > attached the updated version patches.
>> >
>>
>> Review comments:
>> ------------------------------
>
>
> Sawada-San, are you planning to work on the review comments?  I can take care of this and then proceed with further review if you are tied up with something else.
>

Thank you for reviewing this patch.

Yes I'm addressing your comments and will submit the updated patch soon.

> I think we don't need to handle this comment.  Today, I read the other emails in the thread and noticed that you have done this based on a comment by Robert, and that decision seems wise to me.

Understood.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
>
> On Fri, Jun 7, 2019 at 12:03 PM Masahiko Sawada <> wrote:
> >
> > Since the previous version patch conflicts with current HEAD, I've
> > attached the updated version patches.
> >
>

Thank you for reviewing this patch!

> Review comments:
> ------------------------------
> *
>       indexes on the relation which further limited by
> +      <xref linkend="guc-max-parallel-workers-maintenance"/>.
>
> /which further/which is further
>

Fixed.

> *
> + * index vacuuming or index cleanup, we launch parallel worker processes. Once
> + * all indexes are processed the parallel worker processes exit and the leader
> + * process re-initializes the DSM segment while keeping recorded dead tuples.
>
> It is not clear for this comment why it re-initializes the DSM segment
> instead of destroying it once the index work is done by workers.  Can
> you elaborate a bit more in the comment?

Added more explanation.

>
> *
> + * Note that all parallel workers live during one either index vacuuming or
>
> It seems usage of 'one' is not required in the above sentence.

Removed.

>
> *
> +
> +/*
> + * Compute the number of parallel worker process to request.
>
> /process/processes

Fixed.

>
> *
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers = 0;
> +
> + Assert(nrequested >= 0);
> +
> + if (nindexes <= 1)
> + return 0;
>
> I think here, in the beginning, you can also check if
> max_parallel_maintenance_workers is 0, then return.
>

Agreed, fixed.

> *
> In function compute_parallel_workers, don't we want to cap the number
> of workers based on maintenance_work_mem as we do in
> plan_create_index_workers?
>
> The basic point is how do we want to treat maintenance_work_mem for
> this feature.  Do we want all workers to use at max the
> maintenance_work_mem or each worker is allowed to use
> maintenance_work_mem?  I would prefer the former unless we have a good
> reason to follow the latter strategy.
>
> Accordingly, we might need to update the below paragraph in docs:
> "Note that parallel utility commands should not consume substantially
> more memory than equivalent non-parallel operations.  This strategy
> differs from that of parallel query, where resource limits generally
> apply per worker process.  Parallel utility commands treat the
> resource limit <varname>maintenance_work_mem</varname> as a limit to
> be applied to the entire utility command, regardless of the number of
> parallel worker processes."

I'd also prefer to use maintenance_work_mem at max during parallel
vacuum regardless of the number of parallel workers. This is the
current implementation. In lazy vacuum, maintenance_work_mem is used to
record the item pointers of dead tuples. This is done by the leader
process, and worker processes just refer to them for vacuuming dead
index tuples. Even if the user sets a small amount of
maintenance_work_mem, parallel vacuum would still be helpful as index
vacuuming would still take time. So I thought we should cap the number
of parallel workers by the number of indexes rather than by
maintenance_work_mem.

>
> *
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers = 0;
> +
> + Assert(nrequested >= 0);
> +
> + if (nindexes <= 1)
> + return 0;
> +
> + if (nrequested > 0)
> + {
> + /* At least one index is taken by the leader process */
> + parallel_workers = Min(nrequested, nindexes - 1);
> + }
>
> I think here we always allow the leader to participate.  It seems to
> me we should have some way to disable leader participation.  During the
> development of previous parallel operations, we found it quite handy to
> catch bugs. We might want to mimic what has been done for index with
> DISABLE_LEADER_PARTICIPATION.

Added the way to disable leader participation.

>
> *
> +/*
> + * DSM keys for parallel lazy vacuum. Unlike other parallel execution code,
> + * since we don't need to worry about DSM keys conflicting with plan_node_id
> + * we can use small integers.
> + */
> +#define PARALLEL_VACUUM_KEY_SHARED 1
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
>
> I think it would be better if these keys should be assigned numbers in
> a way we do for other similar operation like create index.  See below
> defines
> in code:
> /* Magic numbers for parallel state sharing */
> #define PARALLEL_KEY_BTREE_SHARED UINT64CONST(0xA000000000000001)
>
> This will make the code consistent with other parallel operations.

I skipped this comment according to your previous mail.

>
> *
> +begin_parallel_vacuum(LVRelStats *vacrelstats, Oid relid, BlockNumber nblocks,
> +   int nindexes, int nrequested)
> {
> ..
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> ..
> }
>
> I think here you should use SizeOfLVDeadTuples as defined by patch.

Fixed.

>
> *
> + keys++;
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> +    mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + keys++;
> +
> + shm_toc_estimate_keys(&pcxt->estimator, keys);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
> + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
>
> The code style looks inconsistent here.  In some cases, you are
> calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> and in other cases, you are accumulating keys.  I think it is better
> to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> in all cases.

Fixed. But there is some code that calls shm_toc_estimate_keys for
multiple keys, for example in nbtsort.c and parallel.c. What is the
difference?

>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> {
> ..
> + /* Set debug_query_string for individual workers */
> + sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
> ..
> }
>
> I think the last parameter in shm_toc_lookup should be false.  Is
> there a reason for passing it as true?

My bad, fixed.

>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> +{
> ..
> + /* Open table */
> + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> ..
> }
>
> I don't think it is a good idea to assume the lock mode as
> ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
> is a change in lock level for the vacuum process, we might forget to
> update it here.  I think it is better if we can get this information
> from the master backend.

So did you mean to declare the lock mode for lazy vacuum somewhere as
a global variable and use it in both try_relation_open in the leader
process and relation_open in the worker process? Otherwise we would
end up adding something like shared->lmode = ShareUpdateExclusiveLock
during parallel context initialization, which does not seem to resolve
your concern.

>
> *
> +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> {
> ..
> + /* Shutdown worker processes and destroy the parallel context */
> + WaitForParallelWorkersToFinish(lps->pcxt);
> ..
> }
>
> Do we really need to call WaitForParallelWorkersToFinish here as it
> must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> before this time?

No, removed.

I've attached an updated version of the patch that incorporates your
comments, excluding some that need more discussion. I'll update it
again after the discussion.

Regards,

--
Masahiko Sawada

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>
I have started reviewing this patch and I have some cosmetic comments.
I will continue the review tomorrow.

+This change adds PARALLEL option to VACUUM command that enable us to
+perform index vacuuming and index cleanup with background
+workers. Indivisual

/s/Indivisual/Individual/

+ * parallel worker processes. Individual indexes is processed by one vacuum
+ * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the

/s/Individual indexes is processed/Individual indexes are processed/
/s/At beginning/ At the beginning

+ * parallel workers. In parallel lazy vacuum, we enter parallel mode and
+ * create the parallel context and the DSM segment before starting heap
+ * scan.

Can we extend the comment to explain why we do that before starting
the heap scan?

+ else
+ {
+ if (for_cleanup)
+ {
+ if (lps->nworkers_requested > 0)
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index cleanup
(planned: %d, requested %d)",
+   "launched %d parallel vacuum workers for index cleanup (planned:
%d, requsted %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers,
+ lps->nworkers_requested);
+ else
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers);
+ }
+ else
+ {
+ if (lps->nworkers_requested > 0)
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index vacuuming
(planned: %d, requested %d)",
+   "launched %d parallel vacuum workers for index vacuuming (planned:
%d, requested %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers,
+ lps->nworkers_requested);
+ else
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker for index vacuuming
(planned: %d)",
+   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+   lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched,
+ lps->pcxt->nworkers);
+ }

In multiple places I see a lot of duplicate code depending on whether
for_cleanup is true or false.  The only difference is in the message
text (index cleanup vs. index vacuuming); otherwise the code is exactly
the same for both cases.  Can't we create some string based on the value
of for_cleanup and append it to the message?  That way we can avoid
duplicating this in many places.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
> >
> I have started reviewing this patch and I have some cosmetic comments.
> I will continue the review tomorrow.
>

Thank you for reviewing the patch!

> +This change adds PARALLEL option to VACUUM command that enable us to
> +perform index vacuuming and index cleanup with background
> +workers. Indivisual
>
> /s/Indivisual/Individual/

Fixed.

>
> + * parallel worker processes. Individual indexes is processed by one vacuum
> + * process. At beginning of lazy vacuum (at lazy_scan_heap) we prepare the
>
> /s/Individual indexes is processed/Individual indexes are processed/
> /s/At beginning/ At the beginning

Fixed.

>
> + * parallel workers. In parallel lazy vacuum, we enter parallel mode and
> + * create the parallel context and the DSM segment before starting heap
> + * scan.
>
> Can we extend the comment to explain why we do that before starting
> the heap scan?

Added more comments.

>
> + else
> + {
> + if (for_cleanup)
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned:
> %d, requsted %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
> + else
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned:
> %d, requested %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
>
> Multiple places I see a lot of duplicate code for for_cleanup is true
> or false.  The only difference is in the error message whether we give
> index cleanup or index vacuuming otherwise complete code is the same
> for
> both the cases.  Can't we create some string and based on the value of
> the for_cleanup and append it in the error message that way we can
> avoid duplicating this at many places?

I think it's necessary for translation. IIUC, if we construct the
message at run time it cannot be translated.
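To illustrate the point with a sketch (not code from the patch; buf, nlaunched, nplanned and for_cleanup are placeholders): gettext/ngettext can only look up format strings that appear literally in the message catalog, so a string assembled at run time has nothing to match against.

/* Translatable: both literal format strings end up in the message catalog */
appendStringInfo(&buf,
                 ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
                          "launched %d parallel vacuum workers for index cleanup (planned: %d)",
                          nlaunched),
                 nlaunched, nplanned);

/* Not translatable: the catalog never sees the string assembled at run time */
appendStringInfo(&buf, "launched %d parallel vacuum workers for %s (planned: %d)",
                 nlaunched,
                 for_cleanup ? "index cleanup" : "index vacuuming",
                 nplanned);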

Attached the updated patch.

Regards,

--
Masahiko Sawada

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
> *
> In function compute_parallel_workers, don't we want to cap the number
> of workers based on maintenance_work_mem as we do in
> plan_create_index_workers?
>
> The basic point is how do we want to treat maintenance_work_mem for
> this feature.  Do we want all workers to use at max the
> maintenance_work_mem or each worker is allowed to use
> maintenance_work_mem?  I would prefer earlier unless we have good
> reason to follow the later strategy.
>
> Accordingly, we might need to update the below paragraph in docs:
> "Note that parallel utility commands should not consume substantially
> more memory than equivalent non-parallel operations.  This strategy
> differs from that of parallel query, where resource limits generally
> apply per worker process.  Parallel utility commands treat the
> resource limit <varname>maintenance_work_mem</varname> as a limit to
> be applied to the entire utility command, regardless of the number of
> parallel worker processes."

I'd also prefer to use maintenance_work_mem at max during parallel
vacuum regardless of the number of parallel workers. This is current
implementation. In lazy vacuum the maintenance_work_mem is used to
record itempointer of dead tuples. This is done by leader process and
worker processes just refers them for vacuuming dead index tuples.
Even if user sets a small amount of maintenance_work_mem the parallel
vacuum would be helpful as it still would take a time for index
vacuuming. So I thought we should cap the number of parallel workers
by the number of indexes rather than maintenance_work_mem.


Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using it during index cleanup, see for example ginInsertCleanup.  I think before reaching any conclusion about what to do about this, first we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
 
> *
> + keys++;
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
> +    mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + keys++;
> +
> + shm_toc_estimate_keys(&pcxt->estimator, keys);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
> + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
>
> The code style looks inconsistent here.  In some cases, you are
> calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> and in other cases, you are accumulating keys.  I think it is better
> to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
> in all cases.

Fixed. But there are some code that call shm_toc_estimate_keys for
multiple keys in for example nbtsort.c and parallel.c. What is the
difference?


We can do it either way, depending on the situation.  For example, in nbtsort.c, there is an if check based on which the 'number of keys' can vary.  I think here we should try to write it in a way that does not confuse the reader about why it is done in a particular way.  This is the reason I told you to be consistent.
 
>
> *
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
> +{
> ..
> + /* Open table */
> + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
> ..
> }
>
> I don't think it is a good idea to assume the lock mode as
> ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
> is a change in lock level for the vacuum process, we might forget to
> update it here.  I think it is better if we can get this information
> from the master backend.

So did you mean to declare the lock mode for lazy vacuum somewhere as
a global variable and use it in both try_relation_open in the leader
process and relation_open in the worker process? Otherwise we would
end up with adding something like shared->lmode =
ShareUpdateExclusiveLock during parallel context initialization, which
seems not to resolve your concern.


I was thinking that we could find a way to pass the lockmode used in vacuum_rel, but I guess we would need to pass it through multiple functions, which would be a bit inconvenient.  OTOH, today I checked nbtsort.c (_bt_parallel_build_main) and found that there, too, we use it directly instead of passing it from the master backend.  I think we can leave it as you have it in the patch, but add a comment on why it is okay to use that lock mode?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <> wrote:
On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>
> + else
> + {
> + if (for_cleanup)
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned:
> %d, requsted %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
> +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
> + else
> + {
> + if (lps->nworkers_requested > 0)
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d, requested %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned:
> %d, requested %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers,
> + lps->nworkers_requested);
> + else
> + appendStringInfo(&buf,
> + ngettext("launched %d parallel vacuum worker for index vacuuming
> (planned: %d)",
> +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
> +   lps->pcxt->nworkers_launched),
> + lps->pcxt->nworkers_launched,
> + lps->pcxt->nworkers);
> + }
>
> Multiple places I see a lot of duplicate code for for_cleanup is true
> or false.  The only difference is in the error message whether we give
> index cleanup or index vacuuming otherwise complete code is the same
> for
> both the cases.  Can't we create some string and based on the value of
> the for_cleanup and append it in the error message that way we can
> avoid duplicating this at many places?

I think it's necessary for translation. IIUC if we construct the
message it cannot be translated.


Do we really need to log all those messages?  The other places where we launch parallel workers don't seem to use such messages.  Why do you think it is important to log the messages here when other cases don't?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <> wrote:
>>
Some more comments..
1.
+ for (idx = 0; idx < nindexes; idx++)
+ {
+ if (!for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
+   vacrelstats->old_live_tuples);
+ else
+ {
+ /* Cleanup one index and update index statistics */
+ lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
+    vacrelstats->tupcount_pages < vacrelstats->rel_pages);
+
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }

I think instead of checking the for_cleanup variable for every index in
the loop, we'd better move the loop inside, as shown below?

if (!for_cleanup)
for (idx = 0; idx < nindexes; idx++)
lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
else
for (idx = 0; idx < nindexes; idx++)
{
lazy_cleanup_index
lazy_update_index_statistics
...
}

2.
+static void
+lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
+    int nindexes, IndexBulkDeleteResult **stats,
+    LVParallelState *lps, bool for_cleanup)
+{
+ int idx;
+
+ Assert(!IsParallelWorker());
+
+ /* no job if the table has no index */
+ if (nindexes <= 0)
+ return;

Wouldn't it be a good idea to call this function only if nindexes > 0?

3.
+/*
+ * Vacuum or cleanup indexes with parallel workers. This function must be used
+ * by the parallel vacuum leader process.
+ */
+static void
+lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps, bool for_cleanup)

If you look at this function, there is not much code in common between
the for_cleanup and non-for_cleanup cases except these 3-4 statements.
LaunchParallelWorkers(lps->pcxt);
/* Create the log message to report */
initStringInfo(&buf);
...
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);

Other than that you have got a lot of checks like this
+ if (!for_cleanup)
+ {
+ }
+ else
+ {
}

I think the code would be much more readable if we had 2 functions, one for
vacuum (lazy_parallel_vacuum_indexes) and another for
cleanup (lazy_parallel_cleanup_indexes).

4.
 * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * Lazy vacuum supports parallel execution with parallel worker processes. In
+ * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
+ * parallel worker processes. Individual indexes are processed by one vacuum

Spacing after the "." is not uniform; the previous comment uses 2
spaces and the newly added one uses 1 space.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
>
> *
> +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> {
> ..
> + /* Shutdown worker processes and destroy the parallel context */
> + WaitForParallelWorkersToFinish(lps->pcxt);
> ..
> }
>
> Do we really need to call WaitForParallelWorkersToFinish here as it
> must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> before this time?

No, removed.

+ /* Shutdown worker processes and destroy the parallel context */
+ DestroyParallelContext(lps->pcxt);

But you forgot to update the comment.

Few more comments:
--------------------------------
1.
+/*
+ * Parallel Index vacuuming and index cleanup routine used by both the leader
+ * process and worker processes. Unlike single process vacuum, we don't update
+ * index statistics after cleanup index since it is not allowed during
+ * parallel mode, instead copy index bulk-deletion results from the local
+ * memory to the DSM segment and update them at the end of parallel lazy
+ * vacuum.
+ */
+static void
+do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes,
+  IndexBulkDeleteResult **stats,
+  LVShared *lvshared,
+  LVDeadTuples *dead_tuples)
+{
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ /* Done for all indexes? */
+ if (idx >= nindexes)
+ break;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result
+ * if someone has already updated it.
+ */
+ if (lvshared->indstats[idx].updated &&
+ stats[idx] == NULL)
+ stats[idx] = &(lvshared->indstats[idx].stats);
+
+ /* Do vacuum or cleanup one index */
+ if (!lvshared->for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
+  lvshared->reltuples);
+ else
+ lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
+   lvshared->estimated_count);

It seems we always run index cleanup via a parallel worker, which seems overkill because the cleanup index generally scans the index only when bulkdelete was not performed.  In some cases, like for a hash index, it doesn't do anything even when bulkdelete is not called.  OTOH, for a brin index, it does the main job during cleanup, but we might be able to always allow index cleanup by a parallel worker for brin indexes if we remove the allocation in brinbulkdelete, which I am not sure is of any use.

I think we should call cleanup via a parallel worker only when bulkdelete hasn't been performed on the index.

2.
- for (i = 0; i < nindexes; i++)
- lazy_vacuum_index(Irel[i],
-  &indstats[i],
-  vacrelstats);
+ lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+   indstats, lps, false);

Indentation is not proper.  You might want to run pgindent.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
vignesh C
Дата:
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
>> >
One comment:
We can check that parallel_workers is within range, i.e. not more than
MAX_PARALLEL_WORKER_LIMIT (see the sketch after the quoted code below).
+ int parallel_workers = 0;
+
+ if (optarg != NULL)
+ {
+ parallel_workers = atoi(optarg);
+ if (parallel_workers <= 0)
+ {
+ pg_log_error("number of parallel workers must be at least 1");
+ exit(1);
+ }
+ }

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
>> > *
>> > In function compute_parallel_workers, don't we want to cap the number
>> > of workers based on maintenance_work_mem as we do in
>> > plan_create_index_workers?
>> >
>> > The basic point is how do we want to treat maintenance_work_mem for
>> > this feature.  Do we want all workers to use at max the
>> > maintenance_work_mem or each worker is allowed to use
>> > maintenance_work_mem?  I would prefer earlier unless we have good
>> > reason to follow the later strategy.
>> >
>> > Accordingly, we might need to update the below paragraph in docs:
>> > "Note that parallel utility commands should not consume substantially
>> > more memory than equivalent non-parallel operations.  This strategy
>> > differs from that of parallel query, where resource limits generally
>> > apply per worker process.  Parallel utility commands treat the
>> > resource limit <varname>maintenance_work_mem</varname> as a limit to
>> > be applied to the entire utility command, regardless of the number of
>> > parallel worker processes."
>>
>> I'd also prefer to use maintenance_work_mem at max during parallel
>> vacuum regardless of the number of parallel workers. This is current
>> implementation. In lazy vacuum the maintenance_work_mem is used to
>> record itempointer of dead tuples. This is done by leader process and
>> worker processes just refers them for vacuuming dead index tuples.
>> Even if user sets a small amount of maintenance_work_mem the parallel
>> vacuum would be helpful as it still would take a time for index
>> vacuuming. So I thought we should cap the number of parallel workers
>> by the number of indexes rather than maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using during
index cleanup, see forex. ginInsertCleanup.  I think before reaching any conclusion about what to do about this, first
we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin index)
use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>

I got your point. Currently a single-process lazy vacuum could consume
up to about (maintenance_work_mem * 2) of memory, because we do index
cleanup while holding the dead tuple space, as you mentioned. And
ginInsertCleanup is also called at the beginning of ginbulkdelete. In
the current parallel lazy vacuum, each parallel vacuum worker could
consume additional memory apart from the memory used by the heap scan,
depending on the implementation of the target index AM. Given the
current single and parallel vacuum implementations, it would be better
to control the total amount of memory rather than the number of
parallel workers. So one approach I came up with is to make all vacuum
workers use (maintenance_work_mem / # of participants) as their new
maintenance_work_mem. It might be too small in some cases, but it
doesn't consume more memory than a single lazy vacuum as long as the
index AM doesn't consume more memory regardless of
maintenance_work_mem. I think it really depends on the implementation
of the index AM.
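As a rough illustration of that idea (a sketch only; the function name and the 64kB floor are made up here, not from the patch):

/*
 * Give each participant (the leader plus nworkers workers) an equal share of
 * maintenance_work_mem (which is in kB) so that the whole command stays
 * within the configured limit.
 */
static int
compute_per_participant_work_mem(int nworkers)
{
    int     nparticipants = nworkers + 1;

    return Max(maintenance_work_mem / nparticipants, 64);
}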

>>
>> > *
>> > + keys++;
>> > +
>> > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>> > + maxtuples = compute_max_dead_tuples(nblocks, true);
>> > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
>> > +    mul_size(sizeof(ItemPointerData), maxtuples)));
>> > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
>> > + keys++;
>> > +
>> > + shm_toc_estimate_keys(&pcxt->estimator, keys);
>> > +
>> > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
>> > + querylen = strlen(debug_query_string);
>> > + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
>> > + shm_toc_estimate_keys(&pcxt->estimator, 1);
>> >
>> > The code style looks inconsistent here.  In some cases, you are
>> > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > and in other cases, you are accumulating keys.  I think it is better
>> > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > in all cases.
>>
>> Fixed. But there are some code that call shm_toc_estimate_keys for
>> multiple keys in for example nbtsort.c and parallel.c. What is the
>> difference?
>>
>
> We can do it, either way, depending on the situation.  For example, in nbtsort.c, there is an if check based on which
'number of keys' can vary.  I think here we should try to write in a way that it should not confuse the reader why it is
done in a particular way.  This is the reason I told you to be consistent.

Understood. Thank you for explanation!

>
>>
>> >
>> > *
>> > +void
>> > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>> > +{
>> > ..
>> > + /* Open table */
>> > + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
>> > ..
>> > }
>> >
>> > I don't think it is a good idea to assume the lock mode as
>> > ShareUpdateExclusiveLock here.  Tomorrow, if due to some reason there
>> > is a change in lock level for the vacuum process, we might forget to
>> > update it here.  I think it is better if we can get this information
>> > from the master backend.
>>
>> So did you mean to declare the lock mode for lazy vacuum somewhere as
>> a global variable and use it in both try_relation_open in the leader
>> process and relation_open in the worker process? Otherwise we would
>> end up with adding something like shared->lmode =
>> ShareUpdateExclusiveLock during parallel context initialization, which
>> seems not to resolve your concern.
>>
>
> I was thinking that if we can find a way to pass the lockmode we used in vacuum_rel, but I guess we need to pass it
through multiple functions which will be a bit inconvenient.  OTOH, today, I checked nbtsort.c (_bt_parallel_build_main)
and found that there also we are using it directly instead of passing it from the master backend.  I think we can leave
it as you have in the patch, but add a comment on why it is okay to use that lock mode?

Yeah agreed.
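For instance, the worker-side open could carry a comment along these lines (the wording is only a suggestion):

/*
 * Open the table.  It is safe to hard-code ShareUpdateExclusiveLock here
 * because the leader has already locked the relation with that mode for
 * lazy vacuum, and parallel vacuum is never used for VACUUM FULL, which is
 * the only case that takes a stronger lock.  This mirrors what
 * _bt_parallel_build_main() does for parallel CREATE INDEX.
 */
onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);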

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <> wrote:
>>
>> On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar <> wrote:
>> >
>> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>> >
>> > + else
>> > + {
>> > + if (for_cleanup)
>> > + {
>> > + if (lps->nworkers_requested > 0)
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index cleanup
>> > (planned: %d, requested %d)",
>> > +   "launched %d parallel vacuum workers for index cleanup (planned:
>> > %d, requsted %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers,
>> > + lps->nworkers_requested);
>> > + else
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
>> > +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers);
>> > + }
>> > + else
>> > + {
>> > + if (lps->nworkers_requested > 0)
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index vacuuming
>> > (planned: %d, requested %d)",
>> > +   "launched %d parallel vacuum workers for index vacuuming (planned:
>> > %d, requested %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers,
>> > + lps->nworkers_requested);
>> > + else
>> > + appendStringInfo(&buf,
>> > + ngettext("launched %d parallel vacuum worker for index vacuuming
>> > (planned: %d)",
>> > +   "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers);
>> > + }
>> >
>> > Multiple places I see a lot of duplicate code for for_cleanup is true
>> > or false.  The only difference is in the error message whether we give
>> > index cleanup or index vacuuming otherwise complete code is the same
>> > for
>> > both the cases.  Can't we create some string and based on the value of
>> > the for_cleanup and append it in the error message that way we can
>> > avoid duplicating this at many places?
>>
>> I think it's necessary for translation. IIUC if we construct the
>> message it cannot be translated.
>>
>
> Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be
using such messages.  Why do you think it is important to log the messages here when other cases don't use it?
 

Well, I would rather think that parallel create index doesn't log
enough messages. A parallel maintenance operation is invoked manually
by the user. I can imagine that a DBA wants to cancel and retry the
operation later if enough workers are not launched. But there is no
convenient way to confirm how many parallel workers were planned and
actually launched; we need to look at ps or pg_stat_activity.
That's why I think the log message would be helpful for users.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Fri, Oct 4, 2019 at 3:35 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <> wrote:
> >
> > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <> wrote:
> >>
> Some more comments..
> 1.
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + if (!for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + else
> + {
> + /* Cleanup one index and update index statistics */
> + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
> +    vacrelstats->tupcount_pages < vacrelstats->rel_pages);
> +
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> I think instead of checking for_cleanup variable for every index of
> the loop we better move loop inside, like shown below?
>
> if (!for_cleanup)
> for (idx = 0; idx < nindexes; idx++)
> lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> else
> for (idx = 0; idx < nindexes; idx++)
> {
> lazy_cleanup_index
> lazy_update_index_statistics
> ...
> }
>
> 2.
> +static void
> +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
> +    int nindexes, IndexBulkDeleteResult **stats,
> +    LVParallelState *lps, bool for_cleanup)
> +{
> + int idx;
> +
> + Assert(!IsParallelWorker());
> +
> + /* no job if the table has no index */
> + if (nindexes <= 0)
> + return;
>
> Wouldn't it be good idea to call this function only if nindexes > 0?
>
> 3.
> +/*
> + * Vacuum or cleanup indexes with parallel workers. This function must be used
> + * by the parallel vacuum leader process.
> + */
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
> Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
>
> If you see this function there is no much common code between
> for_cleanup and without for_cleanup except these 3-4 statement.
> LaunchParallelWorkers(lps->pcxt);
> /* Create the log message to report */
> initStringInfo(&buf);
> ...
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> Other than that you have got a lot of checks like this
> + if (!for_cleanup)
> + {
> + }
> + else
> + {
> }
>
> I think code would be much redable if we have 2 functions one for
> vaccum (lazy_parallel_vacuum_indexes) and another for
> cleanup(lazy_parallel_cleanup_indexes).
>
> 4.
>  * of index scans performed.  So we don't use maintenance_work_mem memory for
>   * the TID array, just enough to hold as many heap tuples as fit on one page.
>   *
> + * Lazy vacuum supports parallel execution with parallel worker processes. In
> + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
> + * parallel worker processes. Individual indexes are processed by one vacuum
>
> Spacing after the "." is not uniform, previous comment is using 2
> space and newly
> added is using 1 space.

Few more comments
----------------------------

1.
+static int
+compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
+{
+ int parallel_workers;
+ bool leaderparticipates = true;

Seems like this function is not using onerel parameter so we can remove this.


2.
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ maxtuples = compute_max_dead_tuples(nblocks, true);
+ est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
+    mul_size(sizeof(ItemPointerData), maxtuples)));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
+ querylen = strlen(debug_query_string);

for consistency with other comments change
VACUUM_KEY_QUERY_TEXT  to PARALLEL_VACUUM_KEY_QUERY_TEXT


3.
@@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
  (!wraparound ? VACOPT_SKIP_LOCKED : 0);
  tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
  tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
+ /* parallel lazy vacuum is not supported for autovacuum */
+ tab->at_params.nworkers = -1;

What is the reason for the same?  Can we explain in the comments?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <> wrote:
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <> wrote:
>>
>
> Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be using such messages.  Why do you think it is important to log the messages here when other cases don't use it?

Well I would rather think that parallel create index doesn't log
enough messages. Parallel maintenance operation is invoked manually by
user. I can imagine that DBA wants to cancel and try the operation
again later if enough workers are not launched. But there is not a
convenient way to confirm how many parallel workers planned and
actually launched. We need to see ps command or pg_stat_activity.
That's why I think that log message would be helpful for users.

Hmm, what is the guarantee that at a later time the user will get the required number of workers?  I think if the user decides to vacuum, then she would want it to start sooner.  Also, to cancel the vacuum for this reason, the user needs to monitor logs, which doesn't seem to be an easy thing considering this information will be logged at DEBUG2 level.  I think it is better to add in the docs that we don't guarantee that the number of workers the user has asked for or expected to use for a parallel vacuum will be available during execution.  Even if there is a compelling reason (which I don't see) to log this information, I think we shouldn't use more than one message to log it (there is no need for a separate message for cleanup and vacuuming).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Fri, Oct 4, 2019 at 7:34 PM Masahiko Sawada <> wrote:
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <> wrote:
>>
>> I'd also prefer to use maintenance_work_mem at max during parallel
>> vacuum regardless of the number of parallel workers. This is current
>> implementation. In lazy vacuum the maintenance_work_mem is used to
>> record itempointer of dead tuples. This is done by leader process and
>> worker processes just refers them for vacuuming dead index tuples.
>> Even if user sets a small amount of maintenance_work_mem the parallel
>> vacuum would be helpful as it still would take a time for index
>> vacuuming. So I thought we should cap the number of parallel workers
>> by the number of indexes rather than maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using during index cleanup, see forex. ginInsertCleanup.  I think before reaching any conclusion about what to do about this, first we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>

I got your point. Currently the single process lazy vacuum could
consume the amount of (maintenance_work_mem * 2) memory at max because
we do index cleanup during holding the dead tuple space as you
mentioned. And ginInsertCleanup is also be called at the beginning of
ginbulkdelete. In current parallel lazy vacuum, each parallel vacuum
worker could consume other memory apart from the memory used by heap
scan depending on the implementation of target index AM. Given that
the current single and parallel vacuum implementation it would be
better to control the amount memory in total rather than the number of
parallel workers. So one approach I came up with is that we make all
vacuum workers use the amount of (maintenance_work_mem / # of
participants) as new maintenance_work_mem.

Yeah, we can do something like that, but I am not clear whether the current memory usage for Gin indexes is correct.  I have started a new thread, let's discuss there.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Sun, Oct 6, 2019 at 7:59 PM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 7:34 PM Masahiko Sawada <> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <> wrote:
>> >>
>> >> I'd also prefer to use maintenance_work_mem at max during parallel
>> >> vacuum regardless of the number of parallel workers. This is current
>> >> implementation. In lazy vacuum the maintenance_work_mem is used to
>> >> record itempointer of dead tuples. This is done by leader process and
>> >> worker processes just refers them for vacuuming dead index tuples.
>> >> Even if user sets a small amount of maintenance_work_mem the parallel
>> >> vacuum would be helpful as it still would take a time for index
>> >> vacuuming. So I thought we should cap the number of parallel workers
>> >> by the number of indexes rather than maintenance_work_mem.
>> >>
>> >
>> > Isn't that true only if we never use maintenance_work_mem during index cleanup?  However, I think we are using
during index cleanup, see forex. ginInsertCleanup.  I think before reaching any conclusion about what to do about this,
first we need to establish whether this is a problem.  If I am correct, then only some of the index cleanups (like gin
index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>> >
>>
>> I got your point. Currently the single process lazy vacuum could
>> consume the amount of (maintenance_work_mem * 2) memory at max because
>> we do index cleanup during holding the dead tuple space as you
>> mentioned. And ginInsertCleanup is also be called at the beginning of
>> ginbulkdelete. In current parallel lazy vacuum, each parallel vacuum
>> worker could consume other memory apart from the memory used by heap
>> scan depending on the implementation of target index AM. Given that
>> the current single and parallel vacuum implementation it would be
>> better to control the amount memory in total rather than the number of
>> parallel workers. So one approach I came up with is that we make all
>> vacuum workers use the amount of (maintenance_work_mem / # of
>> participants) as new maintenance_work_mem.
>
>
> Yeah, we can do something like that, but I am not clear whether the current memory usage for Gin indexes is correct.
I have started a new thread, let's discuss there.
>

Thank you for starting that discussion!

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Sat, Oct 5, 2019 at 8:22 PM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <> wrote:
>> >>
>> >
>> > Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be
using such messages.  Why do you think it is important to log the messages here when other cases don't use it?
>>
>> Well I would rather think that parallel create index doesn't log
>> enough messages. Parallel maintenance operation is invoked manually by
>> user. I can imagine that DBA wants to cancel and try the operation
>> again later if enough workers are not launched. But there is not a
>> convenient way to confirm how many parallel workers planned and
>> actually launched. We need to see ps command or pg_stat_activity.
>> That's why I think that log message would be helpful for users.
>
>
> Hmm, what is a guarantee at a later time the user will get the required number of workers?  I think if the user
decides to vacuum, then she would want it to start sooner.  Also, to cancel the vacuum, for this reason, the user needs
to monitor logs which don't seem to be an easy thing considering this information will be logged at DEBUG2 level.  I
think it is better to add in docs that we don't guarantee that the number of workers the user has asked or expected to
use for a parallel vacuum will be available during execution.  Even if there is a compelling reason (which I don't see)
to log this information, I think we shouldn't use more than one message to log (like there is no need for a separate
message for cleanup and vacuuming) this information.
>

I think there is a use case where the user wants to cancel a
long-running analytic query that is using parallel workers, so that
those workers can be used for parallel vacuum instead; that way the
lazy vacuum will complete sooner. Or the user might want to check the
vacuum log to see how many parallel workers the lazy vacuum used, for
diagnostics, when the vacuum took a long time. This log information
appears when the VERBOSE option is specified. When executing the
VACUUM command it's quite common to specify VERBOSE to see the vacuum
execution in more detail, and VACUUM VERBOSE already emits very
detailed information such as how many frozen pages were skipped and
OldestXmin. So I don't think this information would be too odd there.
Are you concerned that this information takes many lines of code, or
that it's not worth logging?

I agree to add to the docs that we don't guarantee that the number of
workers the user requested will be available.

--
Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Mon, Oct 7, 2019 at 10:00 AM Masahiko Sawada <> wrote:
On Sat, Oct 5, 2019 at 8:22 PM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 7:57 PM Masahiko Sawada <> wrote:
>>
>> On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila <> wrote:
>> >>
>> >
>> > Do we really need to log all those messages?  The other places where we launch parallel workers doesn't seem to be using such messages.  Why do you think it is important to log the messages here when other cases don't use it?
>>
>> Well I would rather think that parallel create index doesn't log
>> enough messages. Parallel maintenance operation is invoked manually by
>> user. I can imagine that DBA wants to cancel and try the operation
>> again later if enough workers are not launched. But there is not a
>> convenient way to confirm how many parallel workers planned and
>> actually launched. We need to see ps command or pg_stat_activity.
>> That's why I think that log message would be helpful for users.
>
>
> Hmm, what is a guarantee at a later time the user will get the required number of workers?  I think if the user decides to vacuum, then she would want it to start sooner.  Also, to cancel the vacuum, for this reason, the user needs to monitor logs which don't seem to be an easy thing considering this information will be logged at DEBUG2 level.  I think it is better to add in docs that we don't guarantee that the number of workers the user has asked or expected to use for a parallel vacuum will be available during execution.  Even if there is a compelling reason (which I don't see)  to log this information, I think we shouldn't use more than one message to log (like there is no need for a separate message for cleanup and vacuuming) this information.
>

I think that there is use case where user wants to cancel a
long-running analytic query using parallel workers to use parallel
workers for parallel vacuum instead. That way the lazy vacuum will
eventually complete soon. Or user would want to see the vacuum log to
check if lazy vacuum has been done with how many parallel workers for
diagnostic when the vacuum took a long time. This log information
appears when VERBOSE option is specified. When executing VACUUM
command it's quite common to specify VERBOSE option to see the vacuum
execution more details and VACUUM VERBOSE already emits very detailed
information such as how many frozen pages are skipped and OldestXmin.
So I think this information would not be too odd for that. Are you
concerned that this information takes many lines of code? or it's not
worth to be logged?

To an extent both, but I see the point you are making.  So, we should try to minimize the number of lines used to log this message.  If we can use just one message to log this information, that would be ideal.
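A possible shape for that single message, keeping the strings translatable but dropping the cleanup/vacuuming distinction (a sketch, not the patch's code):

if (lps->nworkers_requested > 0)
    appendStringInfo(&buf,
                     ngettext("launched %d parallel vacuum worker (planned: %d, requested: %d)",
                              "launched %d parallel vacuum workers (planned: %d, requested: %d)",
                              lps->pcxt->nworkers_launched),
                     lps->pcxt->nworkers_launched,
                     lps->pcxt->nworkers,
                     lps->nworkers_requested);
else
    appendStringInfo(&buf,
                     ngettext("launched %d parallel vacuum worker (planned: %d)",
                              "launched %d parallel vacuum workers (planned: %d)",
                              lps->pcxt->nworkers_launched),
                     lps->pcxt->nworkers_launched,
                     lps->pcxt->nworkers);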
 

I agreed to add in docs that we don't guarantee that the number of
workers user requested will be available.

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Oct 4, 2019 at 7:05 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila <> wrote:
> >
> > On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada <> wrote:
> >>
> Some more comments..

Thank you!

> 1.
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + if (!for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + else
> + {
> + /* Cleanup one index and update index statistics */
> + lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
> +    vacrelstats->tupcount_pages < vacrelstats->rel_pages);
> +
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> I think instead of checking for_cleanup variable for every index of
> the loop we better move loop inside, like shown below?

Fixed.
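For reference, the restructured version looks roughly like this (arguments as in the quoted patch):

if (!for_cleanup)
{
    for (idx = 0; idx < nindexes; idx++)
        lazy_vacuum_index(Irel[idx], &stats[idx],
                          vacrelstats->dead_tuples,
                          vacrelstats->old_live_tuples);
}
else
{
    for (idx = 0; idx < nindexes; idx++)
    {
        /* Cleanup one index and update index statistics */
        lazy_cleanup_index(Irel[idx], &stats[idx],
                           vacrelstats->new_rel_tuples,
                           vacrelstats->tupcount_pages < vacrelstats->rel_pages);

        lazy_update_index_statistics(Irel[idx], stats[idx]);

        if (stats[idx])
            pfree(stats[idx]);
    }
}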

>
> if (!for_cleanup)
> for (idx = 0; idx < nindexes; idx++)
> lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> else
> for (idx = 0; idx < nindexes; idx++)
> {
> lazy_cleanup_index
> lazy_update_index_statistics
> ...
> }
>
> 2.
> +static void
> +lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
> +    int nindexes, IndexBulkDeleteResult **stats,
> +    LVParallelState *lps, bool for_cleanup)
> +{
> + int idx;
> +
> + Assert(!IsParallelWorker());
> +
> + /* no job if the table has no index */
> + if (nindexes <= 0)
> + return;
>
> Wouldn't it be good idea to call this function only if nindexes > 0?
>

I realized that the callers of this function should pass nindexes > 0,
because they attempt to do index vacuuming or index cleanup. So it
should be an assertion rather than an early return. Thoughts?
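That is, something like this at the top of the function (sketch):

Assert(!IsParallelWorker());

/* Callers reach here only when the table actually has indexes */
Assert(nindexes > 0);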

> 3.
> +/*
> + * Vacuum or cleanup indexes with parallel workers. This function must be used
> + * by the parallel vacuum leader process.
> + */
> +static void
> +lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
> Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps, bool for_cleanup)
>
> If you see this function there is no much common code between
> for_cleanup and without for_cleanup except these 3-4 statement.
> LaunchParallelWorkers(lps->pcxt);
> /* Create the log message to report */
> initStringInfo(&buf);
> ...
> /* Wait for all vacuum workers to finish */
> WaitForParallelWorkersToFinish(lps->pcxt);
>
> Other than that you have got a lot of checks like this
> + if (!for_cleanup)
> + {
> + }
> + else
> + {
> }
>
> I think code would be much redable if we have 2 functions one for
> vaccum (lazy_parallel_vacuum_indexes) and another for
> cleanup(lazy_parallel_cleanup_indexes).

Seems like a good idea. Fixed.

>
> 4.
>  * of index scans performed.  So we don't use maintenance_work_mem memory for
>   * the TID array, just enough to hold as many heap tuples as fit on one page.
>   *
> + * Lazy vacuum supports parallel execution with parallel worker processes. In
> + * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
> + * parallel worker processes. Individual indexes are processed by one vacuum
>
> Spacing after the "." is not uniform, previous comment is using 2
> space and newly
> added is using 1 space.
>

Fixed.

The code has been fixed in my local repository. After incorporating all
the comments I've got so far, I'll submit an updated version of the patch.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar <> wrote:
>
> Few more comments
> ----------------------------
>
> 1.
> +static int
> +compute_parallel_workers(Relation onerel, int nrequested, int nindexes)
> +{
> + int parallel_workers;
> + bool leaderparticipates = true;
>
> Seems like this function is not using onerel parameter so we can remove this.
>

Fixed.

>
> 2.
> +
> + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
> + maxtuples = compute_max_dead_tuples(nblocks, true);
> + est_deadtuples = MAXALIGN(add_size(SizeOfLVDeadTuples,
> +    mul_size(sizeof(ItemPointerData), maxtuples)));
> + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
> + shm_toc_estimate_keys(&pcxt->estimator, 1);
> +
> + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
> + querylen = strlen(debug_query_string);
>
> for consistency with other comments change
> VACUUM_KEY_QUERY_TEXT  to PARALLEL_VACUUM_KEY_QUERY_TEXT
>

Fixed.

>
> 3.
> @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
>   (!wraparound ? VACOPT_SKIP_LOCKED : 0);
>   tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
>   tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
> + /* parallel lazy vacuum is not supported for autovacuum */
> + tab->at_params.nworkers = -1;
>
> What is the reason for the same?  Can we explain in the comments?

I think it's just that we don't want to support parallel autovacuum
because it could consume more CPU resources despite being a background
job, which might be unexpected behavior for autovacuum. I've changed the
comment.
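Roughly, the comment now says something like this (paraphrasing; see the attached patch for the exact wording):

/*
 * Parallel lazy vacuum is not used by autovacuum for now: letting a
 * background worker launch additional parallel workers could consume
 * extra CPU unexpectedly for what users consider a background job.
 */
tab->at_params.nworkers = -1;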

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Oct 4, 2019 at 8:55 PM vignesh C <> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
> >>
> >> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
> >> >
> One comment:

Thank you for reviewing this patch.

> We can check if parallel_workers is within range something within
> MAX_PARALLEL_WORKER_LIMIT.
> + int parallel_workers = 0;
> +
> + if (optarg != NULL)
> + {
> + parallel_workers = atoi(optarg);
> + if (parallel_workers <= 0)
> + {
> + pg_log_error("number of parallel workers must be at least 1");
> + exit(1);
> + }
> + }
>

Fixed.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Oct 9, 2019 at 6:13 AM Masahiko Sawada <> wrote:
>
> On Sat, Oct 5, 2019 at 4:36 PM Dilip Kumar <> wrote:
> >
> > 3.
> > @@ -2888,6 +2888,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
> >   (!wraparound ? VACOPT_SKIP_LOCKED : 0);
> >   tab->at_params.index_cleanup = VACOPT_TERNARY_DEFAULT;
> >   tab->at_params.truncate = VACOPT_TERNARY_DEFAULT;
> > + /* parallel lazy vacuum is not supported for autovacuum */
> > + tab->at_params.nworkers = -1;
> >
> > What is the reason for the same?  Can we explain in the comments?
>
> I think it's just that we don't want to support parallel auto vacuum
> because it can consume more CPU resources in spite of background job,
> which might be an unexpected behavior of autovacuum.
>

I think the other reason is that it can generate a lot of I/O, which
might choke other operations.  If we want, we can provide GUC(s) to
control such behavior, but initially providing it via the command should
be a good start so that users can knowingly use it in appropriate
cases.  We can later extend it to autovacuum if required.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>>

Few more comments:
---------------------------------
1.  Currently parallel vacuum is allowed for temporary relations,
which is wrong.  It leads to the error below:

postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
CREATE TABLE
postgres=# create index idx_tmp_t1 on tmp_t1(c1);
CREATE INDEX
postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
CREATE INDEX
postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
INSERT 0 10000
postgres=# delete from tmp_t1 where c1 > 5000;
DELETE 5000
postgres=# vacuum (parallel 2) tmp_t1;
ERROR:  cannot access temporary tables during a parallel operation
CONTEXT:  parallel worker

The parallel vacuum shouldn't be allowed for temporary relations.
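One way to disallow it is a guard like the following when deciding whether to go parallel (a sketch; the params->nworkers field and the exact message are assumptions about the patch, not its actual code):

/* Parallel workers cannot access temporary buffers, so fall back */
if (params->nworkers >= 0 && RelationUsesLocalBuffers(onerel))
{
    ereport(WARNING,
            (errmsg("disabling parallel option of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
                    RelationGetRelationName(onerel))));
    params->nworkers = -1;
}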

2.
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
<replaceable class="paramet
     SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
     INDEX_CLEANUP [ <replaceable
class="parameter">boolean</replaceable> ]
     TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
+    PARALLEL [ <replaceable
class="parameter">integer</replaceable> ]

Now, if the user gives a command like Vacuum (analyze, parallel)
<table_name>; it is not very obvious that the parallel option will be
used only for the vacuum part, not for analyze.  I think we can add
a note in the docs to mention this explicitly.  This can avoid any
confusion.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, Oct 4, 2019 at 7:48 PM Amit Kapila <> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <> wrote:
>> >
>> > *
>> > +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
>> > {
>> > ..
>> > + /* Shutdown worker processes and destroy the parallel context */
>> > + WaitForParallelWorkersToFinish(lps->pcxt);
>> > ..
>> > }
>> >
>> > Do we really need to call WaitForParallelWorkersToFinish here as it
>> > must have been called in lazy_parallel_vacuum_or_cleanup_indexes
>> > before this time?
>>
>> No, removed.
>
>
> + /* Shutdown worker processes and destroy the parallel context */
> + DestroyParallelContext(lps->pcxt);
>
> But you forget to update the comment.

Fixed.

>
> Few more comments:
> --------------------------------
> 1.
> +/*
> + * Parallel Index vacuuming and index cleanup routine used by both the leader
> + * process and worker processes. Unlike single process vacuum, we don't update
> + * index statistics after cleanup index since it is not allowed during
> + * parallel mode, instead copy index bulk-deletion results from the local
> + * memory to the DSM segment and update them at the end of parallel lazy
> + * vacuum.
> + */
> +static void
> +do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes,
> +  IndexBulkDeleteResult **stats,
> +  LVShared *lvshared,
> +  LVDeadTuples *dead_tuples)
> +{
> + /* Loop until all indexes are vacuumed */
> + for (;;)
> + {
> + int idx;
> +
> + /* Get an index number to process */
> + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
> +
> + /* Done for all indexes? */
> + if (idx >= nindexes)
> + break;
> +
> + /*
> + * Update the pointer to the corresponding bulk-deletion result
> + * if someone has already updated it.
> + */
> + if (lvshared->indstats[idx].updated &&
> + stats[idx] == NULL)
> + stats[idx] = &(lvshared->indstats[idx].stats);
> +
> + /* Do vacuum or cleanup one index */
> + if (!lvshared->for_cleanup)
> + lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
> +  lvshared->reltuples);
> + else
> + lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
> +   lvshared->estimated_count);
>
> It seems we always run index cleanup via a parallel worker, which seems overkill because the cleanup index generally
> scans the index only when bulkdelete was not performed.  In some cases, like for a hash index, it doesn't do anything
> even if bulkdelete is not called.  OTOH, for a brin index, it does the main job during cleanup, but we might be able
> to always allow index cleanup by a parallel worker for brin indexes if we remove the allocation in brinbulkdelete,
> which I am not sure is of any use.
>
> I think we shouldn't call cleanup via parallel worker unless bulkdelete hasn't been performed on the index.
>

Agreed. Fixed.

> 2.
> - for (i = 0; i < nindexes; i++)
> - lazy_vacuum_index(Irel[i],
> -  &indstats[i],
> -  vacrelstats);
> + lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> +   indstats, lps, false);
>
> Indentation is not proper.  You might want to run pgindent.

Fixed.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Thu, Oct 10, 2019 at 2:19 PM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
> >>
>
> Few more comments:

Thank you for reviewing the patch!

> ---------------------------------
> 1.  Caurrently parallel vacuum is allowed for temporary relations
> which is wrong.  It leads to below error:
>
> postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
> CREATE TABLE
> postgres=# create index idx_tmp_t1 on tmp_t1(c1);
> CREATE INDEX
> postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
> CREATE INDEX
> postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
> INSERT 0 10000
> postgres=# delete from tmp_t1 where c1 > 5000;
> DELETE 5000
> postgres=# vacuum (parallel 2) tmp_t1;
> ERROR:  cannot access temporary tables during a parallel operation
> CONTEXT:  parallel worker
>
> The parallel vacuum shouldn't be allowed for temporary relations.

Fixed.

>
> 2.
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
> <replaceable class="paramet
>      SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
>      INDEX_CLEANUP [ <replaceable
> class="parameter">boolean</replaceable> ]
>      TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
> +    PARALLEL [ <replaceable
> class="parameter">integer</replaceable> ]
>
> Now, if the user gives a command like Vacuum (analyze, parallel)
> <table_name>; it is not very obvious that a parallel option will be
> only used for vacuum purposes but not for analyze.  I think we can add
> a note in the docs to mention this explicitly.  This can avoid any
> confusion.

Agreed.

Attached is the latest version patch, although the memory usage problem is
under discussion. I'll update the patches according to the result of
that discussion.

Regards,

--
Masahiko Sawada

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
Hi

On Thu, 10 Oct 2019 at 13:18, Masahiko Sawada <> wrote:
On Thu, Oct 10, 2019 at 2:19 PM Amit Kapila <> wrote:
>
> On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila <> wrote:
> >
> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <> wrote:
> >>
>
> Few more comments:

Thank you for reviewing the patch!

> ---------------------------------
> 1.  Caurrently parallel vacuum is allowed for temporary relations
> which is wrong.  It leads to below error:
>
> postgres=# create temporary table tmp_t1(c1 int, c2 char(10));
> CREATE TABLE
> postgres=# create index idx_tmp_t1 on tmp_t1(c1);
> CREATE INDEX
> postgres=# create index idx1_tmp_t1 on tmp_t1(c2);
> CREATE INDEX
> postgres=# insert into tmp_t1 values(generate_series(1,10000),'aaaa');
> INSERT 0 10000
> postgres=# delete from tmp_t1 where c1 > 5000;
> DELETE 5000
> postgres=# vacuum (parallel 2) tmp_t1;
> ERROR:  cannot access temporary tables during a parallel operation
> CONTEXT:  parallel worker
>
> The parallel vacuum shouldn't be allowed for temporary relations.

Fixed.

>
> 2.
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [
> <replaceable class="paramet
>      SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
>      INDEX_CLEANUP [ <replaceable
> class="parameter">boolean</replaceable> ]
>      TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
> +    PARALLEL [ <replaceable
> class="parameter">integer</replaceable> ]
>
> Now, if the user gives a command like Vacuum (analyze, parallel)
> <table_name>; it is not very obvious that a parallel option will be
> only used for vacuum purposes but not for analyze.  I think we can add
> a note in the docs to mention this explicitly.  This can avoid any
> confusion.

Agreed.

Attached the latest version patch although the memory usage problem is
under discussion. I'll update the patches according to the result of
that discussion.

 
I applied both patches on HEAD and did some testing. I am getting one crash in freeing memory. (pfree(stats[i]))

Steps to reproduce:
Step 1) Apply both the patches and configure with below command.
./configure --with-zlib  --enable-debug --prefix=$PWD/inst/   --with-openssl CFLAGS="-ggdb3" > war && make -j 8 install > war

Step 2) Now start the server.

Step 3) Fire below commands:
create table tmp_t1(c1 int, c2 char(10));
create index idx_tmp_t1 on tmp_t1(c1);
create index idx1_tmp_t1 on tmp_t1(c2);
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
insert into tmp_t1 values(generate_series(1,10000),'aaaa');
delete from tmp_t1 where c1 > 5000;
vacuum (parallel 2) tmp_t1;

Call stack:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: mahendra postgres [local] VACUUM                        '.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000a4f97a in pfree (pointer=0x10baa68) at mcxt.c:1060
1060 context->methods->free_p(context, pointer);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-12.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x0000000000a4f97a in pfree (pointer=0x10baa68) at mcxt.c:1060
#1  0x00000000004e7d13 in update_index_statistics (Irel=0x10b9808, stats=0x10b9828, nindexes=2) at vacuumlazy.c:2277
#2  0x00000000004e693f in lazy_scan_heap (onerel=0x7f8d99610d08, params=0x7ffeeaddb7f0, vacrelstats=0x10b9728, Irel=0x10b9808, nindexes=2, aggressive=false) at vacuumlazy.c:1659
#3  0x00000000004e4d25 in heap_vacuum_rel (onerel=0x7f8d99610d08, params=0x7ffeeaddb7f0, bstrategy=0x1117528) at vacuumlazy.c:431
#4  0x00000000006a71a7 in table_relation_vacuum (rel=0x7f8d99610d08, params=0x7ffeeaddb7f0, bstrategy=0x1117528) at ../../../src/include/access/tableam.h:1432
#5  0x00000000006a9899 in vacuum_rel (relid=16384, relation=0x103b308, params=0x7ffeeaddb7f0) at vacuum.c:1870
#6  0x00000000006a7c22 in vacuum (relations=0x11176b8, params=0x7ffeeaddb7f0, bstrategy=0x1117528, isTopLevel=true) at vacuum.c:425
#7  0x00000000006a77e6 in ExecVacuum (pstate=0x105f578, vacstmt=0x103b3d8, isTopLevel=true) at vacuum.c:228
#8  0x00000000008af401 in standard_ProcessUtility (pstmt=0x103b6f8, queryString=0x103a808 "vacuum (parallel 2) tmp_t1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at utility.c:670
#9  0x00000000008aec40 in ProcessUtility (pstmt=0x103b6f8, queryString=0x103a808 "vacuum (parallel 2) tmp_t1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at utility.c:360
#10 0x00000000008addbb in PortalRunUtility (portal=0x10a1a28, pstmt=0x103b6f8, isTopLevel=true, setHoldSnapshot=false, dest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at pquery.c:1175
#11 0x00000000008adf9f in PortalRunMulti (portal=0x10a1a28, isTopLevel=true, setHoldSnapshot=false, dest=0x103b7d8, altdest=0x103b7d8, completionTag=0x7ffeeaddbc50 "") at pquery.c:1321
#12 0x00000000008ad55d in PortalRun (portal=0x10a1a28, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x103b7d8, altdest=0x103b7d8, completionTag=0x7ffeeaddbc50 "")
    at pquery.c:796
#13 0x00000000008a7789 in exec_simple_query (query_string=0x103a808 "vacuum (parallel 2) tmp_t1;") at postgres.c:1231
#14 0x00000000008ab8f2 in PostgresMain (argc=1, argv=0x1065b00, dbname=0x1065a28 "postgres", username=0x1065a08 "mahendra") at postgres.c:4256
#15 0x0000000000811a42 in BackendRun (port=0x105d9c0) at postmaster.c:4465
#16 0x0000000000811241 in BackendStartup (port=0x105d9c0) at postmaster.c:4156
#17 0x000000000080d7d6 in ServerLoop () at postmaster.c:1718
#18 0x000000000080d096 in PostmasterMain (argc=3, argv=0x1035270) at postmaster.c:1391
#19 0x000000000072accb in main (argc=3, argv=0x1035270) at main.c:210


I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory in vac_update_relstats.
    for (i = 0; i < nindexes; i++)
    {
        if (stats[i] == NULL || stats[i]->estimated_count)
            continue;

        /* Update index statistics */
        vac_update_relstats(Irel[i],
                            stats[i]->num_pages,
                            stats[i]->num_index_tuples,
                            0,
                            false,
                            InvalidTransactionId,
                            InvalidMultiXactId,
                            false);
        pfree(stats[i]);
    }

As my table has 2 indexes, we have to free both stats. When i = 0, it frees properly, but when i = 1, vac_update_relstats is freeing the memory.
(gdb) p *stats[i]
$1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted = 102, pages_free = 0}
(gdb) p *stats[i]
$2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
(gdb)

From the above data, it looks like somewhere inside vac_update_relstats we are freeing all the palloced memory. I don't know why that is.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <> wrote:
>
>
> I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory
> in vac_update_relstats.
 
>     for (i = 0; i < nindexes; i++)
>     {
>         if (stats[i] == NULL || stats[i]->estimated_count)
>             continue;
>
>         /* Update index statistics */
>         vac_update_relstats(Irel[i],
>                             stats[i]->num_pages,
>                             stats[i]->num_index_tuples,
>                             0,
>                             false,
>                             InvalidTransactionId,
>                             InvalidMultiXactId,
>                             false);
>         pfree(stats[i]);
>     }
>
> As my table have 2 indexes, so we have to free both stats. When i = 0, it is freeing propery but when i = 1, then
vac_update_relstats is freeing memory.
 
>>
>> (gdb) p *stats[i]
>> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000,
pages_deleted = 102, pages_free = 0}
 
>> (gdb) p *stats[i]
>> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0,
pages_deleted = 0, pages_free = 0}
 
>> (gdb)
>
>
> From above data, it looks like, somewhere inside vac_update_relstats, we are freeing all palloced memory. I don't
know, why is it.
 
>

I don't think the problem is in vac_update_relstats as we are not even
passing stats to it, so it won't be able to free it.  I think the real
problem is in the way we copy the stats from shared memory to local
memory in the function end_parallel_vacuum().  Basically, it allocates
the memory for all the index stats together and then, in the function
update_index_statistics, it tries to free the memory of individual
array elements, which won't work.  I have tried to fix the allocation
in end_parallel_vacuum; see if this fixes the problem for you.  You
need to apply the attached patch atop
v28-0001-Add-parallel-option-to-VACUUM-command posted above by
Sawada-San.
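
To make the problem concrete, here is a minimal sketch of the two
allocation patterns (the function names are made up for illustration;
this is not the actual patch code):

#include "postgres.h"
#include "access/genam.h"		/* IndexBulkDeleteResult */

/*
 * Broken pattern: one palloc for the whole array of stats.  Only the
 * base pointer can be pfree'd, so a later pfree(stats[i]) on an
 * interior element corrupts memory, which matches the crash above.
 */
static IndexBulkDeleteResult **
copy_stats_broken(IndexBulkDeleteResult *shared_stats, int nindexes)
{
	IndexBulkDeleteResult *all = palloc(sizeof(IndexBulkDeleteResult) * nindexes);
	IndexBulkDeleteResult **stats = palloc(sizeof(IndexBulkDeleteResult *) * nindexes);

	for (int i = 0; i < nindexes; i++)
	{
		memcpy(&all[i], &shared_stats[i], sizeof(IndexBulkDeleteResult));
		stats[i] = &all[i];		/* pfree(stats[i]) is only valid for i == 0 */
	}
	return stats;
}

/*
 * Fixed pattern: allocate each element separately so that
 * update_index_statistics() can pfree(stats[i]) safely.
 */
static IndexBulkDeleteResult **
copy_stats_fixed(IndexBulkDeleteResult *shared_stats, int nindexes)
{
	IndexBulkDeleteResult **stats = palloc(sizeof(IndexBulkDeleteResult *) * nindexes);

	for (int i = 0; i < nindexes; i++)
	{
		stats[i] = palloc(sizeof(IndexBulkDeleteResult));
		memcpy(stats[i], &shared_stats[i], sizeof(IndexBulkDeleteResult));
	}
	return stats;
}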

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila <> wrote:
>
> On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <> wrote:
> >
> >
> > I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced
memory in vac_update_relstats.
 
> >     for (i = 0; i < nindexes; i++)
> >     {
> >         if (stats[i] == NULL || stats[i]->estimated_count)
> >             continue;
> >
> >         /* Update index statistics */
> >         vac_update_relstats(Irel[i],
> >                             stats[i]->num_pages,
> >                             stats[i]->num_index_tuples,
> >                             0,
> >                             false,
> >                             InvalidTransactionId,
> >                             InvalidMultiXactId,
> >                             false);
> >         pfree(stats[i]);
> >     }
> >
> > As my table have 2 indexes, so we have to free both stats. When i = 0, it is freeing propery but when i = 1, then
vac_update_relstats is freeing memory.
 
> >>
> >> (gdb) p *stats[i]
> >> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed =
30000, pages_deleted = 102, pages_free = 0}
 
> >> (gdb) p *stats[i]
> >> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0,
pages_deleted = 0, pages_free = 0}
 
> >> (gdb)
> >
> >
> > From above data, it looks like, somewhere inside vac_update_relstats, we are freeing all palloced memory. I don't
know, why is it.
 
> >
>
> I don't think the problem is in vac_update_relstats as we are not even
> passing stats to it, so it won't be able to free it.  I think the real
> problem is in the way we copy the stats from shared memory to local
> memory in the function end_parallel_vacuum().  Basically, it allocates
> the memory for all the index stats together and then in function
> update_index_statistics,  it is trying to free memory of individual
> array elements, that won't work.  I have tried to fix the allocation
> in end_parallel_vacuum, see if this fixes the problem for you.   You
> need to apply the attached patch atop
> v28-0001-Add-parallel-option-to-VACUUM-command posted above by
> Sawada-San.

Thank you for reviewing and creating the patch!

I think the patch fixes this issue correctly. Attached is the updated
version patch.

Regards,

--
Masahiko Sawada

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
Thanks, Amit, for the patch.

The crash is fixed by this patch.

Thanks and Regards
Mahendra Thalor


On Sat, Oct 12, 2019, 09:03 Amit Kapila <> wrote:
On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <> wrote:
>
>
> I did some analysis and found that we are trying to free some already freed memory. Or we are freeing palloced memory in vac_update_relstats.
>     for (i = 0; i < nindexes; i++)
>     {
>         if (stats[i] == NULL || stats[i]->estimated_count)
>             continue;
>
>         /* Update index statistics */
>         vac_update_relstats(Irel[i],
>                             stats[i]->num_pages,
>                             stats[i]->num_index_tuples,
>                             0,
>                             false,
>                             InvalidTransactionId,
>                             InvalidMultiXactId,
>                             false);
>         pfree(stats[i]);
>     }
>
> As my table have 2 indexes, so we have to free both stats. When i = 0, it is freeing propery but when i = 1, then vac_update_relstats  is freeing memory.
>>
>> (gdb) p *stats[i]
>> $1 = {num_pages = 218, pages_removed = 0, estimated_count = false, num_index_tuples = 30000, tuples_removed = 30000, pages_deleted = 102, pages_free = 0}
>> (gdb) p *stats[i]
>> $2 = {num_pages = 0, pages_removed = 65536, estimated_count = false, num_index_tuples = 0, tuples_removed = 0, pages_deleted = 0, pages_free = 0}
>> (gdb)
>
>
> From above data, it looks like, somewhere inside vac_update_relstats, we are freeing all palloced memory. I don't know, why is it.
>

I don't think the problem is in vac_update_relstats as we are not even
passing stats to it, so it won't be able to free it.  I think the real
problem is in the way we copy the stats from shared memory to local
memory in the function end_parallel_vacuum().  Basically, it allocates
the memory for all the index stats together and then in function
update_index_statistics,  it is trying to free memory of individual
array elements, that won't work.  I have tried to fix the allocation
in end_parallel_vacuum, see if this fixes the problem for you.   You
need to apply the attached patch atop
v28-0001-Add-parallel-option-to-VACUUM-command posted above by
Sawada-San.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <> wrote:
>
> On Sat, Oct 12, 2019 at 12:33 PM Amit Kapila <> wrote:
> >
> > On Fri, Oct 11, 2019 at 4:47 PM Mahendra Singh <> wrote:
> > >
>
> Thank you for reviewing and creating the patch!
>
> I think the patch fixes this issue correctly. Attached the updated
> version patch.
>

I see a much bigger problem with the way this patch collects the index
stats in shared memory.  IIUC, it allocates the shared memory (DSM)
for all the index stats in the same way, considering the size of each
as IndexBulkDeleteResult.  The first time, it gets the stats from
local memory as returned by the ambulkdelete/amvacuumcleanup call and
then copies it into the shared memory space.  From there onwards, it
always updates the stats in shared memory by pointing each index's
stats to that memory.  In this scheme, you overlooked the point that an
index AM could choose to return a larger structure of which
IndexBulkDeleteResult is just the first field.  This generally
provides a way for ambulkdelete to communicate additional private data
to amvacuumcleanup.  We use this idea in the gist index; see how
gistbulkdelete and gistvacuumcleanup work.  The current design won't
work for such cases.
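
As a rough illustration of that pattern (the struct and field names
below are hypothetical and only mimic what the gist code does; they are
not the real definitions):

#include "postgres.h"
#include "access/genam.h"		/* IndexBulkDeleteResult */
#include "nodes/bitmapset.h"

typedef struct HypotheticalAmBulkDeleteResult
{
	IndexBulkDeleteResult stats;	/* public part; must be the first field */

	/* AM-private data carried from ambulkdelete to amvacuumcleanup */
	MemoryContext page_set_context;	/* as mentioned for gist in this thread */
	Bitmapset  *empty_leaf_pages;	/* hypothetical bookkeeping */
} HypotheticalAmBulkDeleteResult;

Copying only sizeof(IndexBulkDeleteResult) bytes into the DSM segment
silently drops the private part, which is why a flat array of fixed-size
entries cannot work for such AMs.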

One idea is to change the design such that each index method provides
a method to estimate/allocate the shared memory required for the stats
of ambulkdelete/amvacuumcleanup, and then later we also need an index
method-specific function that copies the stats from local memory to
shared memory. I think this needs further investigation.

I have also made a few other changes in the attached delta patch.  The
main point fixed by the attached patch is that even if we don't allow
a parallel vacuum on temporary tables, the analyze should still be able
to work if the user has asked for it.  I have changed an error message
and made a few other cosmetic changes related to comments.  Kindly
include these in the next version if you don't find any problem with
the changes.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <> wrote:
> On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <> wrote:
> >
>
> I see a much bigger problem with the way this patch collects the index
> stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> for all the index stats, in the same way, considering its size as
> IndexBulkDeleteResult.  For the first time, it gets the stats from
> local memory as returned by ambulkdelete/amvacuumcleanup call and then
> copies it in shared memory space.  There onwards, it always updates
> the stats in shared memory by pointing each index stats to that
> memory.  In this scheme, you overlooked the point that an index AM
> could choose to return a larger structure of which
> IndexBulkDeleteResult is just the first field.  This generally
> provides a way for ambulkdelete to communicate additional private data
> to amvacuumcleanup.  We use this idea in the gist index, see how
> gistbulkdelete and gistvacuumcleanup works. The current design won't
> work for such cases.
>

Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
have a few observations about those which might help us to solve this
problem for gist indexes:
1. Are we using memory context GistBulkDeleteResult->page_set_context?
 It seems to me it is not being used.
2. Each time we perform gistbulkdelete, we always seem to reset the
GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
accumulate them for the cleanup phase when the vacuum needs to call
gistbulkdelete multiple times because the available space for
dead tuples is filled?  It seems to me like we only use the stats from
the very last call to gistbulkdelete.
3. Do we really need to give the responsibility of deleting empty
pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup?  Can't we
do it in gistbulkdelete?  I see one advantage of postponing it till the
cleanup phase, which is if somehow we can accumulate stats over
multiple calls of gistbulkdelete, but I am not sure if it is feasible.
At least, the way the current code works, it seems that there is no
advantage to postponing the deletion of empty pages till the cleanup phase.

If we avoid postponing deleting empty pages till the cleanup phase,
then we don't have the problem for gist indexes.

This is not directly related to this patch, so we can discuss these
observations in a separate thread as well, but before that, I wanted
to check your opinion to see if this makes sense to you as this will
help us in moving this patch forward.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Mon, Oct 14, 2019 at 3:10 PM Amit Kapila <> wrote:
>
> On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <> wrote:
> > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <> wrote:
> > >
> >
> > I see a much bigger problem with the way this patch collects the index
> > stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> > for all the index stats, in the same way, considering its size as
> > IndexBulkDeleteResult.  For the first time, it gets the stats from
> > local memory as returned by ambulkdelete/amvacuumcleanup call and then
> > copies it in shared memory space.  There onwards, it always updates
> > the stats in shared memory by pointing each index stats to that
> > memory.  In this scheme, you overlooked the point that an index AM
> > could choose to return a larger structure of which
> > IndexBulkDeleteResult is just the first field.  This generally
> > provides a way for ambulkdelete to communicate additional private data
> > to amvacuumcleanup.  We use this idea in the gist index, see how
> > gistbulkdelete and gistvacuumcleanup works. The current design won't
> > work for such cases.
> >
>
> Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
> have a few observations about those which might help us to solve this
> problem for gist indexes:
> 1. Are we using memory context GistBulkDeleteResult->page_set_context?
>  It seems to me it is not being used.
To me also it appears that it's not being used.

> 2. Each time we perform gistbulkdelete, we always seem to reset the
> GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
> accumulate it for the cleanup phase when the vacuum needs to call
> gistbulkdelete multiple times because the available space for
> dead-tuple is filled.  It seems to me like we only use the stats from
> the very last call to gistbulkdelete.
IIUC, it is fine to use the stats from the latest gistbulkdelete call
because we are trying to collect the information about the empty pages
while scanning the tree.  So I think it would be fine to just use the
information collected from the latest scan, otherwise we will get
duplicate information.

> 3. Do we really need to give the responsibility of deleting empty
> pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> do it in gistbulkdelte?  I see one advantage of postponing it till the
> cleanup phase which is if somehow we can accumulate stats over
> multiple calls of gistbulkdelete, but I am not sure if it is feasible.
It seems that we want to use the latest result. That might be the
reason for postponing it to the cleanup phase.


> At least, the way current code works, it seems that there is no
> advantage to postpone deleting empty pages till the cleanup phase.
>
> If we avoid postponing deleting empty pages till the cleanup phase,
> then we don't have the problem for gist indexes.
>
> This is not directly related to this patch, so we can discuss these
> observations in a separate thread as well, but before that, I wanted
> to check your opinion to see if this makes sense to you as this will
> help us in moving this patch forward.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <> wrote:
>
> On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <> wrote:
> > On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <> wrote:
> > >
> >
> > I see a much bigger problem with the way this patch collects the index
> > stats in shared memory.  IIUC, it allocates the shared memory (DSM)
> > for all the index stats, in the same way, considering its size as
> > IndexBulkDeleteResult.  For the first time, it gets the stats from
> > local memory as returned by ambulkdelete/amvacuumcleanup call and then
> > copies it in shared memory space.  There onwards, it always updates
> > the stats in shared memory by pointing each index stats to that
> > memory.  In this scheme, you overlooked the point that an index AM
> > could choose to return a larger structure of which
> > IndexBulkDeleteResult is just the first field.  This generally
> > provides a way for ambulkdelete to communicate additional private data
> > to amvacuumcleanup.  We use this idea in the gist index, see how
> > gistbulkdelete and gistvacuumcleanup works. The current design won't
> > work for such cases.

Indeed. That's a very good point. Thank you for pointing out.

> >
>
> Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
> have a few observations about those which might help us to solve this
> problem for gist indexes:
> 1. Are we using memory context GistBulkDeleteResult->page_set_context?
>  It seems to me it is not being used.

Yes I also think this memory context is not being used.

> 2. Each time we perform gistbulkdelete, we always seem to reset the
> GistBulkDeleteResult stats, see gistvacuumscan.  So, how will it
> accumulate it for the cleanup phase when the vacuum needs to call
> gistbulkdelete multiple times because the available space for
> dead-tuple is filled.  It seems to me like we only use the stats from
> the very last call to gistbulkdelete.

I think you're right. gistbulkdelete scans all pages and collects all
internal pages and all empty pages. And then in gistvacuumcleanup it
uses them to unlink all empty pages. Currently it accumulates such
information over multiple gistbulkdelete calls because the memory
context switch is missing, but I guess this code intends to use them
only from the very last call to gistbulkdelete.

> 3. Do we really need to give the responsibility of deleting empty
> pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> do it in gistbulkdelte?  I see one advantage of postponing it till the
> cleanup phase which is if somehow we can accumulate stats over
> multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> At least, the way current code works, it seems that there is no
> advantage to postpone deleting empty pages till the cleanup phase.
>

Considering the current page-deletion strategy of the gist index, the
advantage of postponing the page deletion till the cleanup phase is
that we can do the bulk deletion in the cleanup phase, which is called
at most once. But I wonder if we can do the page deletion in a similar
way to the btree index. Or, even with the current strategy, I think we
can do that without passing the page information from bulkdelete to
vacuumcleanup via GistBulkDeleteResult.

> If we avoid postponing deleting empty pages till the cleanup phase,
> then we don't have the problem for gist indexes.

Yes. But considering your point, I guess that there might be
other index AMs that use the stats returned from bulkdelete in a
similar way to the gist index (i.e. using a larger structure of which
IndexBulkDeleteResult is just the first field). If we have the same
concern, the parallel vacuum still needs to deal with that, as you
mentioned.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <> wrote:
>
> On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <> wrote:
> >
>
> > 3. Do we really need to give the responsibility of deleting empty
> > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> > do it in gistbulkdelte?  I see one advantage of postponing it till the
> > cleanup phase which is if somehow we can accumulate stats over
> > multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> > At least, the way current code works, it seems that there is no
> > advantage to postpone deleting empty pages till the cleanup phase.
> >
>
> Considering the current strategy of page deletion of gist index the
> advantage of postponing the page deletion till the cleanup phase is
> that we can do the bulk deletion in cleanup phase which is called at
> most once. But I wonder if we can do the page deletion in the similar
> way to btree index.
>

I think there might be some advantage of the current strategy due to
which it has been chosen.  I was going through the development thread
and noticed some old email which points something related to this.
See [1].

> Or even we use the current strategy I think we can
> do that while not passing the pages information from bulkdelete to
> vacuumcleanup using by GistBulkDeleteResult.
>

Yeah, I also think so.  I have started a new thread [2] to know the
opinion of others on this matter.

> > If we avoid postponing deleting empty pages till the cleanup phase,
> > then we don't have the problem for gist indexes.
>
> Yes. But considering your pointing out I guess that there might be
> other index AMs use the stats returned from bulkdelete in the similar
> way to gist index (i.e. using more larger structure of which
> IndexBulkDeleteResult is just the first field). If we have the same
> concern the parallel vacuum still needs to deal with that as you
> mentioned.
>

Right, apart from some functions for memory allocation/estimation and
stats copy, we might need something like amcanparallelvacuum, so that
index methods can have the option to not participate in parallel
vacuum due to reasons similar to gist or something else.  I think we
can work towards this direction as this anyway seems to be required
and till we reach any conclusion for gist indexes, you can mark
amcanparallelvacuum for gist indexes as false.

[1] - https://www.postgresql.org/message-id/8548498B-6EC6-4C89-8313-107BEC437489%40yandex-team.ru
[2] - https://www.postgresql.org/message-id/CAA4eK1LGr%2BMN0xHZpJ2dfS8QNQ1a_aROKowZB%2BMPNep8FVtwAA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 15, 2019 at 3:55 PM Amit Kapila <> wrote:
>
> On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <> wrote:
> >
> > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <> wrote:
> > >
> >
> > > 3. Do we really need to give the responsibility of deleting empty
> > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> > > do it in gistbulkdelte?  I see one advantage of postponing it till the
> > > cleanup phase which is if somehow we can accumulate stats over
> > > multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> > > At least, the way current code works, it seems that there is no
> > > advantage to postpone deleting empty pages till the cleanup phase.
> > >
> >
> > Considering the current strategy of page deletion of gist index the
> > advantage of postponing the page deletion till the cleanup phase is
> > that we can do the bulk deletion in cleanup phase which is called at
> > most once. But I wonder if we can do the page deletion in the similar
> > way to btree index.
> >
>
> I think there might be some advantage of the current strategy due to
> which it has been chosen.  I was going through the development thread
> and noticed some old email which points something related to this.
> See [1].

Thanks.

>
> > Or even we use the current strategy I think we can
> > do that while not passing the pages information from bulkdelete to
> > vacuumcleanup using by GistBulkDeleteResult.
> >
>
> Yeah, I also think so.  I have started a new thread [2] to know the
> opinion of others on this matter.
>

Thank you.

> > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > then we don't have the problem for gist indexes.
> >
> > Yes. But considering your pointing out I guess that there might be
> > other index AMs use the stats returned from bulkdelete in the similar
> > way to gist index (i.e. using more larger structure of which
> > IndexBulkDeleteResult is just the first field). If we have the same
> > concern the parallel vacuum still needs to deal with that as you
> > mentioned.
> >
>
> Right, apart from some functions for memory allocation/estimation and
> stats copy, we might need something like amcanparallelvacuum, so that
> index methods can have the option to not participate in parallel
> vacuum due to reasons similar to gist or something else.  I think we
> can work towards this direction as this anyway seems to be required
> and till we reach any conclusion for gist indexes, you can mark
> amcanparallelvacuum for gist indexes as false.

Agreed. I'll create a separate patch to add this callback and change the
parallel vacuum patch so that it checks which indexes participate
and then vacuums the non-participating indexes after the parallel vacuum.
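
As a sketch of that plan (the function and type names follow the
snippets quoted upthread, and the amcanparallelvacuum/rd_indam usage
below is an assumption, not existing code):

/* Sketch only; relies on vacuumlazy.c-internal types from the patch. */
static void
vacuum_nonparallel_indexes(Relation *Irel, int nindexes,
						   IndexBulkDeleteResult **stats,
						   LVDeadTuples *dead_tuples, double reltuples)
{
	for (int i = 0; i < nindexes; i++)
	{
		/* proposed flag: AMs that opt out are vacuumed serially by the leader */
		if (!Irel[i]->rd_indam->amcanparallelvacuum)
			lazy_vacuum_index(Irel[i], &stats[i], dead_tuples, reltuples);
	}
}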

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <> wrote:
>
> On Tue, Oct 15, 2019 at 3:55 PM Amit Kapila <> wrote:
> >
> > On Tue, Oct 15, 2019 at 10:34 AM Masahiko Sawada <> wrote:
> > >
> > > On Mon, Oct 14, 2019 at 6:37 PM Amit Kapila <> wrote:
> > > >
> > >
> > > > 3. Do we really need to give the responsibility of deleting empty
> > > > pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup.  Can't we
> > > > do it in gistbulkdelte?  I see one advantage of postponing it till the
> > > > cleanup phase which is if somehow we can accumulate stats over
> > > > multiple calls of gistbulkdelete, but I am not sure if it is feasible.
> > > > At least, the way current code works, it seems that there is no
> > > > advantage to postpone deleting empty pages till the cleanup phase.
> > > >
> > >
> > > Considering the current strategy of page deletion of gist index the
> > > advantage of postponing the page deletion till the cleanup phase is
> > > that we can do the bulk deletion in cleanup phase which is called at
> > > most once. But I wonder if we can do the page deletion in the similar
> > > way to btree index.
> > >
> >
> > I think there might be some advantage of the current strategy due to
> > which it has been chosen.  I was going through the development thread
> > and noticed some old email which points something related to this.
> > See [1].
>
> Thanks.
>
> >
> > > Or even we use the current strategy I think we can
> > > do that while not passing the pages information from bulkdelete to
> > > vacuumcleanup using by GistBulkDeleteResult.
> > >
> >
> > Yeah, I also think so.  I have started a new thread [2] to know the
> > opinion of others on this matter.
> >
>
> Thank you.
>
> > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > then we don't have the problem for gist indexes.
> > >
> > > Yes. But considering your pointing out I guess that there might be
> > > other index AMs use the stats returned from bulkdelete in the similar
> > > way to gist index (i.e. using more larger structure of which
> > > IndexBulkDeleteResult is just the first field). If we have the same
> > > concern the parallel vacuum still needs to deal with that as you
> > > mentioned.
> > >
> >
> > Right, apart from some functions for memory allocation/estimation and
> > stats copy, we might need something like amcanparallelvacuum, so that
> > index methods can have the option to not participate in parallel
> > vacuum due to reasons similar to gist or something else.  I think we
> > can work towards this direction as this anyway seems to be required
> > and till we reach any conclusion for gist indexes, you can mark
> > amcanparallelvacuum for gist indexes as false.
>
> Agreed. I'll create a separate patch to add this callback and change
> parallel vacuum patch so that it checks the participation of indexes
> and then vacuums on un-participated indexes after parallel vacuum.

amcanparallelvacuum doesn't need to be a callback; it can be a
boolean field of IndexAmRoutine.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Tue, Oct 15, 2019 at 12:25 PM Amit Kapila <> wrote:
>

> Right, apart from some functions for memory allocation/estimation and
> stats copy, we might need something like amcanparallelvacuum, so that
> index methods can have the option to not participate in parallel
> vacuum due to reasons similar to gist or something else.  I think we
> can work towards this direction as this anyway seems to be required
> and till we reach any conclusion for gist indexes, you can mark
> amcanparallelvacuum for gist indexes as false.
>
For estimating the size of the stats, I suggest "amestimatestat"
or "amstatsize", and for copying the stats data we can add "amcopystat".
It would be helpful for extending the parallel vacuum to indexes which
have extended stats.
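
A rough sketch of the callbacks being proposed here (the names and
signatures are assumptions for illustration only; no such API exists
yet):

#include "postgres.h"
#include "access/genam.h"		/* IndexBulkDeleteResult */
#include "utils/rel.h"			/* Relation */

/* how many bytes of DSM this index AM needs for its bulk-delete stats */
typedef Size (*amestimatestats_function) (Relation indexRelation);

/* copy the AM's (possibly extended) stats from local memory into DSM */
typedef void (*amcopystats_function) (Relation indexRelation,
									  IndexBulkDeleteResult *local_stats,
									  void *shared_dest);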


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <> wrote:
>
> On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <> wrote:
> >
> > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > then we don't have the problem for gist indexes.
> > > >
> > > > Yes. But considering your pointing out I guess that there might be
> > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > way to gist index (i.e. using more larger structure of which
> > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > concern the parallel vacuum still needs to deal with that as you
> > > > mentioned.
> > > >
> > >
> > > Right, apart from some functions for memory allocation/estimation and
> > > stats copy, we might need something like amcanparallelvacuum, so that
> > > index methods can have the option to not participate in parallel
> > > vacuum due to reasons similar to gist or something else.  I think we
> > > can work towards this direction as this anyway seems to be required
> > > and till we reach any conclusion for gist indexes, you can mark
> > > amcanparallelvacuum for gist indexes as false.
> >
> > Agreed. I'll create a separate patch to add this callback and change
> > parallel vacuum patch so that it checks the participation of indexes
> > and then vacuums on un-participated indexes after parallel vacuum.
>
> amcanparallelvacuum is not necessary to be a callback, it can be a
> boolean field of IndexAmRoutine.
>

Yes, it will be a boolean.  Note that for parallel-index scans, we
already have amcanparallel.
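
For clarity, a sketch of how the proposed flag could sit next to the
existing one (this is an assumption of the discussion, not committed
code; a stand-in struct is used to keep the example self-contained):

#include <stdbool.h>

typedef struct IndexAmRoutineSketch
{
	bool		amcanparallel;			/* existing: parallel index scans */
	bool		amcanparallelvacuum;	/* proposed: parallel index vacuum */
} IndexAmRoutineSketch;

An AM handler such as gist's would then simply leave the new flag false
until the stats-copying problem discussed above is resolved.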

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <> wrote:
>
> On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <> wrote:
> >
> > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <> wrote:
> > >
> > > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > > then we don't have the problem for gist indexes.
> > > > >
> > > > > Yes. But considering your pointing out I guess that there might be
> > > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > > way to gist index (i.e. using more larger structure of which
> > > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > > concern the parallel vacuum still needs to deal with that as you
> > > > > mentioned.
> > > > >
> > > >
> > > > Right, apart from some functions for memory allocation/estimation and
> > > > stats copy, we might need something like amcanparallelvacuum, so that
> > > > index methods can have the option to not participate in parallel
> > > > vacuum due to reasons similar to gist or something else.  I think we
> > > > can work towards this direction as this anyway seems to be required
> > > > and till we reach any conclusion for gist indexes, you can mark
> > > > amcanparallelvacuum for gist indexes as false.
> > >
> > > Agreed. I'll create a separate patch to add this callback and change
> > > parallel vacuum patch so that it checks the participation of indexes
> > > and then vacuums on un-participated indexes after parallel vacuum.
> >
> > amcanparallelvacuum is not necessary to be a callback, it can be a
> > boolean field of IndexAmRoutine.
> >
>
> Yes, it will be a boolean.  Note that for parallel-index scans, we
> already have amcanparallel.
>

Attached is the updated patch set. The 0001 patch introduces the new
index AM field amcanparallelvacuum. All index AMs except gist set it to
true for now. The 0002 patch incorporates all the comments I got so far.

Regards,

--
Masahiko Sawada

Вложения

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <> wrote:
>
> On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <> wrote:
> >
>
> Attached updated patch set. 0001 patch introduces new index AM field
> amcanparallelvacuum. All index AMs except for gist sets true for now.
> 0002 patch incorporated the all comments I got so far.
>

I haven't studied the latest patch in detail, but it seems you are
still assuming that all indexes will have the same amount of shared
memory for index stats and copying it in the same way. I thought we
agreed that each index AM should do this on its own.  The basic
problem is that, as of now, we see this problem only with the gist
index, but some other index AMs could also have a similar problem.

Another major problem with the previous and current patch versions is
that the cost-based vacuum concept seems to be entirely broken.
Basically, each parallel vacuum worker operates independently w.r.t.
vacuum delay and cost.  Assume that the overall I/O allowed for a
vacuum operation is X, after which it will sleep for some time, reset
the balance and continue.  In the patch, each worker will be allowed to
perform X before it sleeps, and there is also no coordination for the
same with the master backend.  This is somewhat similar to the memory
usage problem, but a bit more tricky because here we can't easily split
the I/O for each of the workers.

One idea could be that we somehow map vacuum costing related
parameters to the shared memory (dsm) which the vacuum operation is
using and then allow workers to coordinate.  This way master and
worker processes will have the same view of balance cost and can act
accordingly.
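
A minimal sketch of that first idea (an assumption, not the actual
patch): keep the cost balance in DSM as an atomic counter that the
leader and all workers charge against, so vacuum_cost_limit applies to
the operation as a whole.  The per-page cost distinctions
(hit/miss/dirty) and the race on resetting the balance are glossed over
here.

#include "postgres.h"
#include "miscadmin.h"			/* VacuumCostPageHit, VacuumCostLimit, VacuumCostDelay */
#include "port/atomics.h"

typedef struct SharedVacuumCost
{
	pg_atomic_uint32 cost_balance;	/* shared counterpart of VacuumCostBalance */
} SharedVacuumCost;

static void
shared_vacuum_delay_point(SharedVacuumCost *shared)
{
	uint32		balance;

	/* charge this page against the operation-wide balance */
	balance = pg_atomic_add_fetch_u32(&shared->cost_balance,
									  (uint32) VacuumCostPageHit);

	if (balance >= (uint32) VacuumCostLimit)
	{
		/* sleep, then reset the shared balance for the next round */
		pg_usleep((long) (VacuumCostDelay * 1000));
		pg_atomic_write_u32(&shared->cost_balance, 0);
	}
}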

The other idea could be that we come up with some smart way to split
the I/O among workers.  Initially, I thought we could try something as
we do for autovacuum workers (see autovac_balance_cost), but I think
that will require much more math.  Before launching workers, we need
to compute the remaining I/O (the heap operation would have used
something) after which we need to sleep and continue the operation, and
then somehow split it equally across workers.  Once the workers are
finished, they need to let the master backend know how much I/O they
have consumed, and then the master backend can add it to its current
I/O consumed.

I think this problem matters because the vacuum delay is useful for
large vacuums, and this patch is trying to solve exactly that case,
so we can't ignore this problem.  I am not yet sure what the best
solution to this problem is, but I think we need to do something for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Wed, Oct 16, 2019 at 3:02 PM Amit Kapila <> wrote:
>
> On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <> wrote:
> >
> > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <> wrote:
> > >
> >
> > Attached updated patch set. 0001 patch introduces new index AM field
> > amcanparallelvacuum. All index AMs except for gist sets true for now.
> > 0002 patch incorporated the all comments I got so far.
> >
>
> I haven't studied the latest patch in detail, but it seems you are
> still assuming that all indexes will have the same amount of shared
> memory for index stats and copying it in the same way.

Yeah, I thought we agreed at least to have amcanparallelvacuum, and if
an index AM cannot support parallel index vacuuming, like gist, it
returns false.

> I thought we
> agreed that each index AM should do this on its own.  The basic
> problem is as of now we see this problem only with the Gist index, but
> some other index AM's could also have a similar problem.

Okay. I'm thinking we're going to have a new callback to ask index AMs
for the size of the structure used within both ambulkdelete and
amvacuumcleanup. But copying it to DSM can be done by the core because
it knows how many bytes need to be copied to DSM. Is that okay?

>
> Another major problem with previous and this patch version is that the
> cost-based vacuum concept seems to be entirely broken.  Basically,
> each parallel vacuum worker operates independently w.r.t vacuum delay
> and cost.  Assume that the overall I/O allowed for vacuum operation is
> X after which it will sleep for some time, reset the balance and
> continue.  In the patch, each worker will be allowed to perform X
> before which it can sleep and also there is no coordination for the
> same with master backend.  This is somewhat similar to memory usage
> problem, but a bit more tricky because here we can't easily split the
> I/O for each of the worker.
>
> One idea could be that we somehow map vacuum costing related
> parameters to the shared memory (dsm) which the vacuum operation is
> using and then allow workers to coordinate.  This way master and
> worker processes will have the same view of balance cost and can act
> accordingly.
>
> The other idea could be that we come up with some smart way to split
> the I/O among workers.  Initially, I thought we could try something as
> we do for autovacuum workers (see autovac_balance_cost), but I think
> that will require much more math.  Before launching workers, we need
> to compute the remaining I/O (heap operation would have used
> something) after which we need to sleep and continue the operation and
> then somehow split it equally across workers.  Once the workers are
> finished, then need to let master backend know how much I/O they have
> consumed and then master backend can add it to it's current I/O
> consumed.
>
> I think this problem matters because the vacuum delay is useful for
> large vacuums and this patch is trying to exactly solve that problem,
> so we can't ignore this problem.  I am not yet sure what is the best
> solution to this problem, but I think we need to do something for it.
>

I guess that the concept of vacuum delay contradicts the concept of
parallel vacuum. The concept of parallel vacuum is to use more
resources to make vacuum faster. Vacuum delay balances I/O during
vacuum in order to avoid I/O spikes caused by vacuum, but parallel
vacuum rather concentrates I/O in a shorter duration. Since memory is
shared across the entire system we need to deal with the memory issue,
but disks are different.

If we need to deal with this problem, how about just dividing
vacuum_cost_limit by the parallel degree and setting that as each
worker's vacuum_cost_limit?
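
A sketch of that suggestion (an assumption about how it could look;
whether the leader counts as one participant is also an assumption):

#include "postgres.h"
#include "miscadmin.h"			/* VacuumCostLimit */

/* split vacuum_cost_limit evenly, counting the leader as one participant */
static int
worker_cost_limit(int nworkers)
{
	return Max(1, VacuumCostLimit / (nworkers + 1));
}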

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
Hi
I applied all 3 patches and ran the regression tests. I got one regression failure.

diff -U3 /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out
--- /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out 2019-10-17 10:01:58.138863802 +0530
+++ /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out 2019-10-17 11:41:20.930699926 +0530
@@ -105,7 +105,7 @@
 CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
 CREATE INDEX tmp_idx1 ON tmp (a);
 VACUUM (PARALLEL 1) tmp; -- error, cannot parallel vacuum temporary tables
-WARNING:  skipping "tmp" --- cannot parallel vacuum temporary tables
+WARNING:  skipping vacuum on "tmp" --- cannot vacuum temporary tables in parallel
 -- INDEX_CLEANUP option
 CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
 -- Use uncompressed data stored in toast.

It looks like you changed the warning message for temp tables but haven't updated the expected output file.

Thanks and Regards
Mahendra Thalor

On Wed, 16 Oct 2019 at 06:50, Masahiko Sawada <> wrote:
On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <> wrote:
>
> On Tue, Oct 15, 2019 at 1:26 PM Masahiko Sawada <> wrote:
> >
> > On Tue, Oct 15, 2019 at 4:15 PM Masahiko Sawada <> wrote:
> > >
> > > > > > If we avoid postponing deleting empty pages till the cleanup phase,
> > > > > > then we don't have the problem for gist indexes.
> > > > >
> > > > > Yes. But considering your pointing out I guess that there might be
> > > > > other index AMs use the stats returned from bulkdelete in the similar
> > > > > way to gist index (i.e. using more larger structure of which
> > > > > IndexBulkDeleteResult is just the first field). If we have the same
> > > > > concern the parallel vacuum still needs to deal with that as you
> > > > > mentioned.
> > > > >
> > > >
> > > > Right, apart from some functions for memory allocation/estimation and
> > > > stats copy, we might need something like amcanparallelvacuum, so that
> > > > index methods can have the option to not participate in parallel
> > > > vacuum due to reasons similar to gist or something else.  I think we
> > > > can work towards this direction as this anyway seems to be required
> > > > and till we reach any conclusion for gist indexes, you can mark
> > > > amcanparallelvacuum for gist indexes as false.
> > >
> > > Agreed. I'll create a separate patch to add this callback and change
> > > parallel vacuum patch so that it checks the participation of indexes
> > > and then vacuums on un-participated indexes after parallel vacuum.
> >
> > amcanparallelvacuum is not necessary to be a callback, it can be a
> > boolean field of IndexAmRoutine.
> >
>
> Yes, it will be a boolean.  Note that for parallel-index scans, we
> already have amcanparallel.
>

Attached updated patch set. 0001 patch introduces new index AM field
amcanparallelvacuum. All index AMs except for gist sets true for now.
0002 patch incorporated the all comments I got so far.

Regards,

--
Masahiko Sawada

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Thu, Oct 17, 2019 at 3:18 PM Mahendra Singh <> wrote:
>
> Hi
> I applied all 3 patches and ran regression test. I was getting one regression failure.
>
>> diff -U3 /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out
/home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out
>> --- /home/mahendra/postgres_base_rp/postgres/src/test/regress/expected/vacuum.out 2019-10-17 10:01:58.138863802
+0530
>> +++ /home/mahendra/postgres_base_rp/postgres/src/test/regress/results/vacuum.out 2019-10-17 11:41:20.930699926
+0530
>> @@ -105,7 +105,7 @@
>>  CREATE TEMPORARY TABLE tmp (a int PRIMARY KEY);
>>  CREATE INDEX tmp_idx1 ON tmp (a);
>>  VACUUM (PARALLEL 1) tmp; -- error, cannot parallel vacuum temporary tables
>> -WARNING:  skipping "tmp" --- cannot parallel vacuum temporary tables
>> +WARNING:  skipping vacuum on "tmp" --- cannot vacuum temporary tables in parallel
>>  -- INDEX_CLEANUP option
>>  CREATE TABLE no_index_cleanup (i INT PRIMARY KEY, t TEXT);
>>  -- Use uncompressed data stored in toast.
>
>
> It look likes that you changed warning message for temp table, but haven't updated expected out file.
>

Thank you!
I forgot to change the expected file. I'll fix it in the next version patch.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <> wrote:
>
> On Wed, Oct 16, 2019 at 3:02 PM Amit Kapila <> wrote:
> >
> > On Wed, Oct 16, 2019 at 6:50 AM Masahiko Sawada <> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 6:33 PM Amit Kapila <> wrote:
> > > >
> > >
> > > Attached updated patch set. 0001 patch introduces new index AM field
> > > amcanparallelvacuum. All index AMs except for gist sets true for now.
> > > 0002 patch incorporated the all comments I got so far.
> > >
> >
> > I haven't studied the latest patch in detail, but it seems you are
> > still assuming that all indexes will have the same amount of shared
> > memory for index stats and copying it in the same way.
>
> Yeah I thought we agreed at least to have canparallelvacuum and if an
> index AM cannot support parallel index vacuuming like gist, it returns
> false.
>
> > I thought we
> > agreed that each index AM should do this on its own.  The basic
> > problem is as of now we see this problem only with the Gist index, but
> > some other index AM's could also have a similar problem.
>
> Okay. I'm thinking we're going to have a new callback to ask index AMs
> for the size of the structure used within both ambulkdelete and
> amvacuumcleanup. But copying it to DSM can be done by the core because
> it knows how many bytes need to be copied to DSM. Is that okay?
>

That sounds okay.

> >
> > Another major problem with previous and this patch version is that the
> > cost-based vacuum concept seems to be entirely broken.  Basically,
> > each parallel vacuum worker operates independently w.r.t vacuum delay
> > and cost.  Assume that the overall I/O allowed for vacuum operation is
> > X after which it will sleep for some time, reset the balance and
> > continue.  In the patch, each worker will be allowed to perform X
> > before which it can sleep and also there is no coordination for the
> > same with master backend.  This is somewhat similar to memory usage
> > problem, but a bit more tricky because here we can't easily split the
> > I/O for each of the worker.
> >
> > One idea could be that we somehow map vacuum costing related
> > parameters to the shared memory (dsm) which the vacuum operation is
> > using and then allow workers to coordinate.  This way master and
> > worker processes will have the same view of balance cost and can act
> > accordingly.
> >
> > The other idea could be that we come up with some smart way to split
> > the I/O among workers.  Initially, I thought we could try something as
> > we do for autovacuum workers (see autovac_balance_cost), but I think
> > that will require much more math.  Before launching workers, we need
> > to compute the remaining I/O (heap operation would have used
> > something) after which we need to sleep and continue the operation and
> > then somehow split it equally across workers.  Once the workers are
> > finished, then need to let master backend know how much I/O they have
> > consumed and then master backend can add it to it's current I/O
> > consumed.
> >
> > I think this problem matters because the vacuum delay is useful for
> > large vacuums and this patch is trying to exactly solve that problem,
> > so we can't ignore this problem.  I am not yet sure what is the best
> > solution to this problem, but I think we need to do something for it.
> >
>
> I guess that the concepts of vacuum delay contradicts the concepts of
> parallel vacuum. The concepts of parallel vacuum would be to use more
> resource to make vacuum faster. Vacuum delays balances I/O during
> vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> rather concentrates I/O in shorter duration.
>

You have a point, but the way it is currently working in the patch
doesn't make much sense.  Basically, each of the parallel workers will
be allowed to use the complete I/O limit, which is actually the limit
for the entire vacuum operation.  It doesn't give any consideration to
the work already done for the heap.

> Since we need to share
> the memory in entire system we need to deal with the memory issue but
> disks are different.
>
> If we need to deal with this problem how about just dividing
> vacuum_cost_limit by the parallel degree and setting it to worker's
> vacuum_cost_limit?
>

How will we take the I/O done for the heap into consideration?  The
vacuum_cost_limit is the cost limit for the entire vacuum operation, not
separate limits for the heap and the indexes.  What makes you think that
considering the limit for the heap and the indexes separately is not
problematic?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <> wrote:
>
> On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <> wrote:
> >
> > I guess that the concepts of vacuum delay contradicts the concepts of
> > parallel vacuum. The concepts of parallel vacuum would be to use more
> > resource to make vacuum faster. Vacuum delays balances I/O during
> > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> > rather concentrates I/O in shorter duration.
> >
>
> You have a point, but the way it is currently working in the patch
> doesn't make much sense.
>

Another point in this regard is that the user anyway has an option to
turn off the cost-based vacuum.  By default, it is anyway disabled.
So, if the user enables it we have to provide some sensible behavior.
If we can't come up with anything, then, in the end, we might want to
turn it off for a parallel vacuum and mention the same in docs, but I
think we should try to come up with a solution for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
>
> On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <> wrote:
> >
> > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <> wrote:
> > >
> > > I guess that the concepts of vacuum delay contradicts the concepts of
> > > parallel vacuum. The concepts of parallel vacuum would be to use more
> > > resource to make vacuum faster. Vacuum delays balances I/O during
> > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> > > rather concentrates I/O in shorter duration.
> > >
> >
> > You have a point, but the way it is currently working in the patch
> > doesn't make much sense.
> >
>
> Another point in this regard is that the user anyway has an option to
> turn off the cost-based vacuum.  By default, it is anyway disabled.
> So, if the user enables it we have to provide some sensible behavior.
> If we can't come up with anything, then, in the end, we might want to
> turn it off for a parallel vacuum and mention the same in docs, but I
> think we should try to come up with a solution for it.

I finally got your point and now understand the need. And the idea I
proposed doesn't work well for this.

So you mean that all workers share the cost count, and if a parallel
vacuum worker increases the cost and it reaches the limit, only that
worker sleeps? Is that okay even though the other parallel workers are
still running, so the sleep might not help?

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From:
Dilip Kumar
Date:
On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
>
> On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> >
> > On Thu, Oct 17, 2019 at 12:21 PM Amit Kapila <> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 10:56 AM Masahiko Sawada <> wrote:
> > > >
> > > > I guess that the concepts of vacuum delay contradicts the concepts of
> > > > parallel vacuum. The concepts of parallel vacuum would be to use more
> > > > resource to make vacuum faster. Vacuum delays balances I/O during
> > > > vacuum in order to avoid I/O spikes by vacuum but parallel vacuum
> > > > rather concentrates I/O in shorter duration.
> > > >
> > >
> > > You have a point, but the way it is currently working in the patch
> > > doesn't make much sense.
> > >
> >
> > Another point in this regard is that the user anyway has an option to
> > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > So, if the user enables it we have to provide some sensible behavior.
> > If we can't come up with anything, then, in the end, we might want to
> > turn it off for a parallel vacuum and mention the same in docs, but I
> > think we should try to come up with a solution for it.
>
> I finally got your point and now understood the need. And the idea I
> proposed doesn't work fine.
>
> So you meant that all workers share the cost count and if a parallel
> vacuum worker increase the cost and it reaches the limit, does the
> only one worker sleep? Is that okay even though other parallel workers
> are still running and then the sleep might not help?
>
I agree with this point.  There is a possibility that some of the
workers which are doing heavy I/O continue to work while, OTOH, other
workers which are doing very little I/O become the victims and have
their operations delayed unnecessarily.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> >
> > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > >
> > > Another point in this regard is that the user anyway has an option to
> > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > So, if the user enables it we have to provide some sensible behavior.
> > > If we can't come up with anything, then, in the end, we might want to
> > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > think we should try to come up with a solution for it.
> >
> > I finally got your point and now understood the need. And the idea I
> > proposed doesn't work fine.
> >
> > So you meant that all workers share the cost count and if a parallel
> > vacuum worker increase the cost and it reaches the limit, does the
> > only one worker sleep? Is that okay even though other parallel workers
> > are still running and then the sleep might not help?
> >

Remember that the other running workers will also increase
VacuumCostBalance and whichever worker finds that it becomes greater
than VacuumCostLimit will reset its value and sleep.  So, won't this
make sure that overall throttling works the same?
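
As a made-up illustration (all numbers invented): say vacuum_cost_limit =
200, vacuum_cost_delay = 2ms and 3 workers sharing one balance:

    - every page access adds its cost to the single shared balance;
    - whichever worker pushes the shared balance past 200 sleeps for 2ms
      and resets it to zero;
    - so the combined I/O between sleeps stays around 200, just as in the
      non-parallel case, instead of up to 600 if each worker kept an
      independent balance against the full limit.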

> I agree with this point.  There is a possibility that some of the
> workers who are doing heavy I/O continue to work and OTOH other
> workers who are doing very less I/O might become the victim and
> unnecessarily delay its operation.
>

Sure, but will it impact the overall I/O?  I mean to say that the rate
limit we want to provide for the overall vacuum operation will still be
the same.  Also, doesn't a similar thing happen now as well, where the
heap might have done a major portion of the I/O but, soon after we start
vacuuming the indexes, we hit the limit and sleep?

I think this might not be the perfect solution and we should try to
come up with something else if this doesn't seem to be working.  Have
you guys thought about the second solution I mentioned in email [1]
(Before launching workers, we need to compute the remaining I/O ....)?
 Any other better ideas?

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BySETHCaCnAsEC-dC4GSXaE2sNGMOgD6J%3DX%2BN43bBqJQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Dilip Kumar
Date:
On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
>
> On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > >
> > > > Another point in this regard is that the user anyway has an option to
> > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > So, if the user enables it we have to provide some sensible behavior.
> > > > If we can't come up with anything, then, in the end, we might want to
> > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > think we should try to come up with a solution for it.
> > >
> > > I finally got your point and now understood the need. And the idea I
> > > proposed doesn't work fine.
> > >
> > > So you meant that all workers share the cost count and if a parallel
> > > vacuum worker increase the cost and it reaches the limit, does the
> > > only one worker sleep? Is that okay even though other parallel workers
> > > are still running and then the sleep might not help?
> > >
>
> Remember that the other running workers will also increase
> VacuumCostBalance and whichever worker finds that it becomes greater
> than VacuumCostLimit will reset its value and sleep.  So, won't this
> make sure that overall throttling works the same?
>
> > I agree with this point.  There is a possibility that some of the
> > workers who are doing heavy I/O continue to work and OTOH other
> > workers who are doing very less I/O might become the victim and
> > unnecessarily delay its operation.
> >
>
> Sure, but will it impact the overall I/O?  I mean to say the rate
> limit we want to provide for overall vacuum operation will still be
> the same.  Also, isn't a similar thing happens now also where heap
> might have done a major portion of I/O but soon after we start
> vacuuming the index, we will hit the limit and will sleep.

Actually, what I meant is that the worker performing the actual I/O
might not go for the delay, while another worker which has done only CPU
work might pay the penalty.  So basically the worker doing CPU-intensive
work might go for the delay and pay the penalty, while the worker
performing the actual I/O continues to work and does further I/O.  Do
you think this is not a practical problem?

Stepping back a bit, OTOH, I think that we cannot guarantee that the
worker which has done more I/O will continue to do further I/O, or that
the one which has not done much I/O will not perform more I/O in the
future.  So it might not be too bad if we compute shared costs as you
suggested above.

>
> I think this might not be the perfect solution and we should try to
> come up with something else if this doesn't seem to be working.  Have
> you guys thought about the second solution I mentioned in email [1]
> (Before launching workers, we need to compute the remaining I/O ....)?
>  Any other better ideas?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
>
> On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> >
> > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > >
> > > > > Another point in this regard is that the user anyway has an option to
> > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > think we should try to come up with a solution for it.
> > > >
> > > > I finally got your point and now understood the need. And the idea I
> > > > proposed doesn't work fine.
> > > >
> > > > So you meant that all workers share the cost count and if a parallel
> > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > only one worker sleep? Is that okay even though other parallel workers
> > > > are still running and then the sleep might not help?
> > > >
> >
> > Remember that the other running workers will also increase
> > VacuumCostBalance and whichever worker finds that it becomes greater
> > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > make sure that overall throttling works the same?
> >
> > > I agree with this point.  There is a possibility that some of the
> > > workers who are doing heavy I/O continue to work and OTOH other
> > > workers who are doing very less I/O might become the victim and
> > > unnecessarily delay its operation.
> > >
> >
> > Sure, but will it impact the overall I/O?  I mean to say the rate
> > limit we want to provide for overall vacuum operation will still be
> > the same.  Also, isn't a similar thing happens now also where heap
> > might have done a major portion of I/O but soon after we start
> > vacuuming the index, we will hit the limit and will sleep.
>
> Actually, What I meant is that the worker who performing actual I/O
> might not go for the delay and another worker which has done only CPU
> operation might pay the penalty?  So basically the worker who is doing
> CPU intensive operation might go for the delay and pay the penalty and
> the worker who is performing actual I/O continues to work and do
> further I/O.  Do you think this is not a practical problem?
>

I don't know.  Generally, we try to delay (if required) before
processing (read/write) one page which means it will happen for I/O
intensive operations, so I am not sure if the point you are making is
completely correct.

> Stepping back a bit,  OTOH, I think that we can not guarantee that the
> one worker who has done more I/O will continue to do further I/O and
> the one which has not done much I/O will not perform more I/O in
> future.  So it might not be too bad if we compute shared costs as you
> suggested above.
>

I am thinking that if we can write patches for both approaches (a.
compute shared costs and delay based on that; b. divide the I/O cost
among workers as described in the email above [1]) and do some tests to
see the behavior of the throttling, that might help us in deciding what
is the best strategy to solve this problem, if any.
What do you think?


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BySETHCaCnAsEC-dC4GSXaE2sNGMOgD6J%3DX%2BN43bBqJQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Dilip Kumar
Date:
On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
>
> On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > >
> > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > think we should try to come up with a solution for it.
> > > > >
> > > > > I finally got your point and now understood the need. And the idea I
> > > > > proposed doesn't work fine.
> > > > >
> > > > > So you meant that all workers share the cost count and if a parallel
> > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > are still running and then the sleep might not help?
> > > > >
> > >
> > > Remember that the other running workers will also increase
> > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > make sure that overall throttling works the same?
> > >
> > > > I agree with this point.  There is a possibility that some of the
> > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > workers who are doing very less I/O might become the victim and
> > > > unnecessarily delay its operation.
> > > >
> > >
> > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > limit we want to provide for overall vacuum operation will still be
> > > the same.  Also, isn't a similar thing happens now also where heap
> > > might have done a major portion of I/O but soon after we start
> > > vacuuming the index, we will hit the limit and will sleep.
> >
> > Actually, What I meant is that the worker who performing actual I/O
> > might not go for the delay and another worker which has done only CPU
> > operation might pay the penalty?  So basically the worker who is doing
> > CPU intensive operation might go for the delay and pay the penalty and
> > the worker who is performing actual I/O continues to work and do
> > further I/O.  Do you think this is not a practical problem?
> >
>
> I don't know.  Generally, we try to delay (if required) before
> processing (read/write) one page which means it will happen for I/O
> intensive operations, so I am not sure if the point you are making is
> completely correct.

Ok, I agree with the point that we check it only when we are doing an
I/O operation.  But we also need to consider that each I/O operation has
a different weight.  So even with a delay point at each I/O operation,
there is a possibility that we delay the worker which is just reading
buffers with page hits (VacuumCostPageHit), while the other worker,
which is actually dirtying pages (VacuumCostPageDirty = 20), continues
the work and does more I/O.
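
To put rough numbers on that (using what I believe are the default
weights: VacuumCostPageHit = 1, VacuumCostPageMiss = 10,
VacuumCostPageDirty = 20):

    worker A: 300 pages found in shared buffers   -> balance += 300 * 1        = 300
    worker B: 10 pages read from disk and dirtied -> balance += 10 * (10 + 20) = 300

Both accumulate the same cost, so with a shared balance it could just as
easily be worker A that trips the limit and sleeps, even though worker B
did all the real I/O.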

>
> > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > one worker who has done more I/O will continue to do further I/O and
> > the one which has not done much I/O will not perform more I/O in
> > future.  So it might not be too bad if we compute shared costs as you
> > suggested above.
> >
>
> I am thinking if we can write the patch for both the approaches (a.
> compute shared costs and try to delay based on that, b. try to divide
> the I/O cost among workers as described in the email above[1]) and do
> some tests to see the behavior of throttling, that might help us in
> deciding what is the best strategy to solve this problem, if any.
> What do you think?

I agree with this idea.  I can come up with a POC patch for approach
(b).  Meanwhile, if someone is interested in quickly hacking up
approach (a), we can do some testing and compare.  Sawada-san, by any
chance would you be interested in writing a POC for approach (a)?
Otherwise, I will try to write it after finishing the first one
(approach b).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Fri, Oct 18, 2019 at 3:48 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> >
> > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > > >
> > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > think we should try to come up with a solution for it.
> > > > > >
> > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > proposed doesn't work fine.
> > > > > >
> > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > are still running and then the sleep might not help?
> > > > > >
> > > >
> > > > Remember that the other running workers will also increase
> > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > make sure that overall throttling works the same?
> > > >
> > > > > I agree with this point.  There is a possibility that some of the
> > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > workers who are doing very less I/O might become the victim and
> > > > > unnecessarily delay its operation.
> > > > >
> > > >
> > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > limit we want to provide for overall vacuum operation will still be
> > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > might have done a major portion of I/O but soon after we start
> > > > vacuuming the index, we will hit the limit and will sleep.
> > >
> > > Actually, What I meant is that the worker who performing actual I/O
> > > might not go for the delay and another worker which has done only CPU
> > > operation might pay the penalty?  So basically the worker who is doing
> > > CPU intensive operation might go for the delay and pay the penalty and
> > > the worker who is performing actual I/O continues to work and do
> > > further I/O.  Do you think this is not a practical problem?
> > >
> >
> > I don't know.  Generally, we try to delay (if required) before
> > processing (read/write) one page which means it will happen for I/O
> > intensive operations, so I am not sure if the point you are making is
> > completely correct.
>
> Ok, I agree with the point that we are checking it only when we are
> doing the I/O operation.  But, we also need to consider that each I/O
> operations have a different weightage.  So even if we have a delay
> point at I/O operation there is a possibility that we might delay the
> worker which is just performing read buffer with page
> hit(VacuumCostPageHit).  But, the other worker who is actually
> dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> more I/O.
>
> >
> > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > one worker who has done more I/O will continue to do further I/O and
> > > the one which has not done much I/O will not perform more I/O in
> > > future.  So it might not be too bad if we compute shared costs as you
> > > suggested above.
> > >
> >
> > I am thinking if we can write the patch for both the approaches (a.
> > compute shared costs and try to delay based on that, b. try to divide
> > the I/O cost among workers as described in the email above[1]) and do
> > some tests to see the behavior of throttling, that might help us in
> > deciding what is the best strategy to solve this problem, if any.
> > What do you think?
>
> I agree with this idea.  I can come up with a POC patch for approach
> (b).  Meanwhile, if someone is interested to quickly hack with the
> approach (a) then we can do some testing and compare.  Sawada-san,
> by any chance will you be interested to write POC with approach (a)?

Yes, I will try to write the PoC patch with approach (a).

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From:
Dilip Kumar
Date:
On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> >
> > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> > >
> > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > > >
> > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > think we should try to come up with a solution for it.
> > > > > >
> > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > proposed doesn't work fine.
> > > > > >
> > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > are still running and then the sleep might not help?
> > > > > >
> > > >
> > > > Remember that the other running workers will also increase
> > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > make sure that overall throttling works the same?
> > > >
> > > > > I agree with this point.  There is a possibility that some of the
> > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > workers who are doing very less I/O might become the victim and
> > > > > unnecessarily delay its operation.
> > > > >
> > > >
> > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > limit we want to provide for overall vacuum operation will still be
> > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > might have done a major portion of I/O but soon after we start
> > > > vacuuming the index, we will hit the limit and will sleep.
> > >
> > > Actually, What I meant is that the worker who performing actual I/O
> > > might not go for the delay and another worker which has done only CPU
> > > operation might pay the penalty?  So basically the worker who is doing
> > > CPU intensive operation might go for the delay and pay the penalty and
> > > the worker who is performing actual I/O continues to work and do
> > > further I/O.  Do you think this is not a practical problem?
> > >
> >
> > I don't know.  Generally, we try to delay (if required) before
> > processing (read/write) one page which means it will happen for I/O
> > intensive operations, so I am not sure if the point you are making is
> > completely correct.
>
> Ok, I agree with the point that we are checking it only when we are
> doing the I/O operation.  But, we also need to consider that each I/O
> operations have a different weightage.  So even if we have a delay
> point at I/O operation there is a possibility that we might delay the
> worker which is just performing read buffer with page
> hit(VacuumCostPageHit).  But, the other worker who is actually
> dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> more I/O.
>
> >
> > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > one worker who has done more I/O will continue to do further I/O and
> > > the one which has not done much I/O will not perform more I/O in
> > > future.  So it might not be too bad if we compute shared costs as you
> > > suggested above.
> > >
> >
> > I am thinking if we can write the patch for both the approaches (a.
> > compute shared costs and try to delay based on that, b. try to divide
> > the I/O cost among workers as described in the email above[1]) and do
> > some tests to see the behavior of throttling, that might help us in
> > deciding what is the best strategy to solve this problem, if any.
> > What do you think?
>
> I agree with this idea.  I can come up with a POC patch for approach
> (b).  Meanwhile, if someone is interested to quickly hack with the
> approach (a) then we can do some testing and compare.  Sawada-san,
> by any chance will you be interested to write POC with approach (a)?
> Otherwise, I will try to write it after finishing the first one
> (approach b).
>
I have come up with the POC for approach (a).

The idea is (a made-up example follows below):
1) Before launching the workers, divide the current VacuumCostBalance
among them so that each worker starts accumulating its balance from that
point.
2) Also, divide the VacuumCostLimit among the workers.
3) Once a worker is done with the index vacuum, it sends its remaining
balance back to the leader.
4) The leader sums all the balances, adds that to its current
VacuumCostBalance, and starts accumulating its balance from that
point.
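
Here is how that division could play out (all numbers invented, 3
workers):

    before launch : VacuumCostBalance = 300, VacuumCostLimit = 2000
    each worker   : starts with balance = 100 and limit = ~666
    at finish     : workers report remaining balances of, say, 50, 400 and 200
    leader resumes: balance = 50 + 400 + 200 = 650, presumably with the
                    full limit of 2000 again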

I was trying to test the behaviour of the vacuum I/O limit, but I could
not find an easy way to do that, so I just put tracepoints in the code
and checked at what point we are applying the delay.
I also printed the cost balance at various points to see after how much
I/O accumulation we hit the delay.  Please feel free to suggest a better
way to test this.

I have printed these logs for the parallel vacuum patch (v30) vs. v30 +
the patch for dividing the I/O limit (attached with the mail).

Note: Patch and the test results are attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
>
> On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > >
> > > I am thinking if we can write the patch for both the approaches (a.
> > > compute shared costs and try to delay based on that, b. try to divide
> > > the I/O cost among workers as described in the email above[1]) and do
> > > some tests to see the behavior of throttling, that might help us in
> > > deciding what is the best strategy to solve this problem, if any.
> > > What do you think?
> >
> > I agree with this idea.  I can come up with a POC patch for approach
> > (b).  Meanwhile, if someone is interested to quickly hack with the
> > approach (a) then we can do some testing and compare.  Sawada-san,
> > by any chance will you be interested to write POC with approach (a)?
> > Otherwise, I will try to write it after finishing the first one
> > (approach b).
> >
> I have come up with the POC for approach (a).
>

I think you mean to say approach (b).

> The idea is
> 1) Before launching the worker divide the current VacuumCostBalance
> among workers so that workers start accumulating the balance from that
> point.
> 2) Also, divide the VacuumCostLimit among the workers.
> 3) Once the worker are done with the index vacuum, send back the
> remaining balance with the leader.
> 4) The leader will sum all the balances and add that to its current
> VacuumCostBalance.  And start accumulating its balance from this
> point.
>
> I was trying to test how is the behaviour of the vacuum I/O limit, but
> I could not find an easy way to test that so I just put the tracepoint
> in the code and just checked that at what point we are giving the
> delay.
> I also printed the cost balance at various point to see that after how
> much I/O accumulation we are hitting the delay.  Please feel free to
> suggest a better way to test this.
>

Can we compute the overall throttling (sleep time) of the operation
separately for the heap and the indexes, then divide the indexes' sleep
time by the number of workers and add it to the heap's sleep time?  That
will make it a bit easier to compare the data between the parallel and
non-parallel cases.
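
In other words, something like this (just restating the suggestion; the
names are mine):

    comparable_sleep ~= heap_sleep_time + (total_index_sleep_time / nworkers)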

> I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> patch for dividing i/o limit (attached with the mail)
>
> Note: Patch and the test results are attached.
>

I think it is always a good idea to summarize the results and state
your conclusion about them.  AFAICT, this technique as done in the patch
might not work for cases where the parallel workers do an uneven amount
of work (say the index sizes vary, maybe due to partial indexes, index
column width, or some other reason).  The reason is that when a worker
finishes its work, we don't rebalance the cost among the other workers.
Can we generate such a test and see how it behaves?  I think it might be
possible to address this if it turns out to be a problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Dilip Kumar
Date:
On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
>
> On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > >
> > > > I am thinking if we can write the patch for both the approaches (a.
> > > > compute shared costs and try to delay based on that, b. try to divide
> > > > the I/O cost among workers as described in the email above[1]) and do
> > > > some tests to see the behavior of throttling, that might help us in
> > > > deciding what is the best strategy to solve this problem, if any.
> > > > What do you think?
> > >
> > > I agree with this idea.  I can come up with a POC patch for approach
> > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > by any chance will you be interested to write POC with approach (a)?
> > > Otherwise, I will try to write it after finishing the first one
> > > (approach b).
> > >
> > I have come up with the POC for approach (a).
> >
>
> I think you mean to say approach (b).

Yeah, sorry for the confusion.  It's approach (b).
>
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the worker are done with the index vacuum, send back the
> > remaining balance with the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance.  And start accumulating its balance from this
> > point.
> >
> > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > I could not find an easy way to test that so I just put the tracepoint
> > in the code and just checked that at what point we are giving the
> > delay.
> > I also printed the cost balance at various point to see that after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
>
> Can we compute the overall throttling (sleep time) in the operation
> separately for heap and index, then divide the index's sleep_time with
> a number of workers and add it to heap's sleep time?  Then, it will be
> a bit easier to compare the data between parallel and non-parallel
> case.

Okay, I will try to do that.
>
> > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > patch for dividing i/o limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> I think it is always a good idea to summarize the results and tell
> your conclusion about it.  AFAICT, it seems to me this technique as
> done in patch might not work for the cases when there is an uneven
> amount of work done by parallel workers (say the index sizes vary
> (maybe due partial indexes or index column width or some other
> reasons)).   The reason for it is that when the worker finishes it's
> work we don't rebalance the cost among other workers.
Right, that's one problem I observed.
>  Can we generate
> such a test and see how it behaves?  I think it might be possible to
> address this if it turns out to be a problem.
Yeah, we can address this by rebalancing the cost.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Masahiko Sawada
Date:
On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > > > >
> > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > think we should try to come up with a solution for it.
> > > > > > >
> > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > proposed doesn't work fine.
> > > > > > >
> > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > are still running and then the sleep might not help?
> > > > > > >
> > > > >
> > > > > Remember that the other running workers will also increase
> > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > make sure that overall throttling works the same?
> > > > >
> > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > workers who are doing very less I/O might become the victim and
> > > > > > unnecessarily delay its operation.
> > > > > >
> > > > >
> > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > limit we want to provide for overall vacuum operation will still be
> > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > might have done a major portion of I/O but soon after we start
> > > > > vacuuming the index, we will hit the limit and will sleep.
> > > >
> > > > Actually, What I meant is that the worker who performing actual I/O
> > > > might not go for the delay and another worker which has done only CPU
> > > > operation might pay the penalty?  So basically the worker who is doing
> > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > the worker who is performing actual I/O continues to work and do
> > > > further I/O.  Do you think this is not a practical problem?
> > > >
> > >
> > > I don't know.  Generally, we try to delay (if required) before
> > > processing (read/write) one page which means it will happen for I/O
> > > intensive operations, so I am not sure if the point you are making is
> > > completely correct.
> >
> > Ok, I agree with the point that we are checking it only when we are
> > doing the I/O operation.  But, we also need to consider that each I/O
> > operations have a different weightage.  So even if we have a delay
> > point at I/O operation there is a possibility that we might delay the
> > worker which is just performing read buffer with page
> > hit(VacuumCostPageHit).  But, the other worker who is actually
> > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > more I/O.
> >
> > >
> > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > one worker who has done more I/O will continue to do further I/O and
> > > > the one which has not done much I/O will not perform more I/O in
> > > > future.  So it might not be too bad if we compute shared costs as you
> > > > suggested above.
> > > >
> > >
> > > I am thinking if we can write the patch for both the approaches (a.
> > > compute shared costs and try to delay based on that, b. try to divide
> > > the I/O cost among workers as described in the email above[1]) and do
> > > some tests to see the behavior of throttling, that might help us in
> > > deciding what is the best strategy to solve this problem, if any.
> > > What do you think?
> >
> > I agree with this idea.  I can come up with a POC patch for approach
> > (b).  Meanwhile, if someone is interested to quickly hack with the
> > approach (a) then we can do some testing and compare.  Sawada-san,
> > by any chance will you be interested to write POC with approach (a)?
> > Otherwise, I will try to write it after finishing the first one
> > (approach b).
> >
> I have come up with the POC for approach (a).
>
> The idea is
> 1) Before launching the worker divide the current VacuumCostBalance
> among workers so that workers start accumulating the balance from that
> point.
> 2) Also, divide the VacuumCostLimit among the workers.
> 3) Once the worker are done with the index vacuum, send back the
> remaining balance with the leader.
> 4) The leader will sum all the balances and add that to its current
> VacuumCostBalance.  And start accumulating its balance from this
> point.
>
> I was trying to test how is the behaviour of the vacuum I/O limit, but
> I could not find an easy way to test that so I just put the tracepoint
> in the code and just checked that at what point we are giving the
> delay.
> I also printed the cost balance at various point to see that after how
> much I/O accumulation we are hitting the delay.  Please feel free to
> suggest a better way to test this.
>
> I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> patch for dividing i/o limit (attached with the mail)
>
> Note: Patch and the test results are attached.
>

Thank you!

For approach (a), the basic idea I've come up with is that we have a
shared balance value in DSM, and each worker, including the leader
process, adds its local balance value to it in vacuum_delay_point; the
workers then sleep based on the shared value. I'll submit that patch
with other updates.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <> wrote:
>
> On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <> wrote:
> >
> > I have come up with the POC for approach (a).
> >
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the worker are done with the index vacuum, send back the
> > remaining balance with the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance.  And start accumulating its balance from this
> > point.
> >
> > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > I could not find an easy way to test that so I just put the tracepoint
> > in the code and just checked that at what point we are giving the
> > delay.
> > I also printed the cost balance at various point to see that after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
> > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > patch for dividing i/o limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> Thank you!
>
> For approach (a) the basic idea I've come up with is that we have a
> shared balance value on DSM and each workers including the leader
> process add its local balance value to it in vacuum_delay_point, and
> then based on the shared value workers sleep. I'll submit that patch
> with other updates.
>

I think it would be better if we can prepare the I/O balance patches
on top of the main patch and evaluate both approaches.  We can test both
approaches and integrate the one which turns out to be good.

Note that I will be away next week, so I won't be able to review your
latest patch unless you are planning to post it today or tomorrow.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Amit Kapila
Date:
On Fri, Oct 25, 2019 at 7:37 AM Amit Kapila <> wrote:
>
> On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <> wrote:
> >
> > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <> wrote:
> > >
> > > I have come up with the POC for approach (a).
> > >
> > > The idea is
> > > 1) Before launching the worker divide the current VacuumCostBalance
> > > among workers so that workers start accumulating the balance from that
> > > point.
> > > 2) Also, divide the VacuumCostLimit among the workers.
> > > 3) Once the worker are done with the index vacuum, send back the
> > > remaining balance with the leader.
> > > 4) The leader will sum all the balances and add that to its current
> > > VacuumCostBalance.  And start accumulating its balance from this
> > > point.
> > >
> > > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > > I could not find an easy way to test that so I just put the tracepoint
> > > in the code and just checked that at what point we are giving the
> > > delay.
> > > I also printed the cost balance at various point to see that after how
> > > much I/O accumulation we are hitting the delay.  Please feel free to
> > > suggest a better way to test this.
> > >
> > > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > > patch for dividing i/o limit (attached with the mail)
> > >
> > > Note: Patch and the test results are attached.
> > >
> >
> > Thank you!
> >
> > For approach (a) the basic idea I've come up with is that we have a
> > shared balance value on DSM and each workers including the leader
> > process add its local balance value to it in vacuum_delay_point, and
> > then based on the shared value workers sleep. I'll submit that patch
> > with other updates.
> >
>
> I think it would be better if we can prepare the I/O balance patches
> on top of main patch and evaluate both approaches.  We can test both
> the approaches and integrate the one which turned out to be good.
>

Just to add something about testing both approaches: I think we can
first come up with a way to compute the throttling vacuum does, as
mentioned by me in one of the emails above [1], or in some other way.
I think Dilip is planning to give it a try, and once we have that we
can evaluate both patches.  Some of the tests I have in mind are:
a. all indexes have an equal amount of deleted data;
b. indexes have an uneven amount of deleted data;
c. try with a mix of index types (btree, gin, gist, hash, etc.) on a table.

Feel free to add more tests.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From:
Dilip Kumar
Date:
On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <> wrote:
>
> On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > > > > >
> > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > >
> > > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > > proposed doesn't work fine.
> > > > > > > >
> > > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > > are still running and then the sleep might not help?
> > > > > > > >
> > > > > >
> > > > > > Remember that the other running workers will also increase
> > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > make sure that overall throttling works the same?
> > > > > >
> > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > > workers who are doing very less I/O might become the victim and
> > > > > > > unnecessarily delay its operation.
> > > > > > >
> > > > > >
> > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > limit we want to provide for overall vacuum operation will still be
> > > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > > might have done a major portion of I/O but soon after we start
> > > > > > vacuuming the index, we will hit the limit and will sleep.
> > > > >
> > > > > Actually, What I meant is that the worker who performing actual I/O
> > > > > might not go for the delay and another worker which has done only CPU
> > > > > operation might pay the penalty?  So basically the worker who is doing
> > > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > > the worker who is performing actual I/O continues to work and do
> > > > > further I/O.  Do you think this is not a practical problem?
> > > > >
> > > >
> > > > I don't know.  Generally, we try to delay (if required) before
> > > > processing (read/write) one page which means it will happen for I/O
> > > > intensive operations, so I am not sure if the point you are making is
> > > > completely correct.
> > >
> > > Ok, I agree with the point that we are checking it only when we are
> > > doing the I/O operation.  But, we also need to consider that each I/O
> > > operations have a different weightage.  So even if we have a delay
> > > point at I/O operation there is a possibility that we might delay the
> > > worker which is just performing read buffer with page
> > > hit(VacuumCostPageHit).  But, the other worker who is actually
> > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > > more I/O.
> > >
> > > >
> > > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > > one worker who has done more I/O will continue to do further I/O and
> > > > > the one which has not done much I/O will not perform more I/O in
> > > > > future.  So it might not be too bad if we compute shared costs as you
> > > > > suggested above.
> > > > >
> > > >
> > > > I am thinking if we can write the patch for both the approaches (a.
> > > > compute shared costs and try to delay based on that, b. try to divide
> > > > the I/O cost among workers as described in the email above[1]) and do
> > > > some tests to see the behavior of throttling, that might help us in
> > > > deciding what is the best strategy to solve this problem, if any.
> > > > What do you think?
> > >
> > > I agree with this idea.  I can come up with a POC patch for approach
> > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > by any chance will you be interested to write POC with approach (a)?
> > > Otherwise, I will try to write it after finishing the first one
> > > (approach b).
> > >
> > I have come up with the POC for approach (a).
> >
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the worker are done with the index vacuum, send back the
> > remaining balance with the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance.  And start accumulating its balance from this
> > point.
> >
> > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > I could not find an easy way to test that so I just put the tracepoint
> > in the code and just checked that at what point we are giving the
> > delay.
> > I also printed the cost balance at various point to see that after how
> > much I/O accumulation we are hitting the delay.  Please feel free to
> > suggest a better way to test this.
> >
> > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > patch for dividing i/o limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> Thank you!
>
> For approach (a) the basic idea I've come up with is that we have a
> shared balance value on DSM and each workers including the leader
> process add its local balance value to it in vacuum_delay_point, and
> then based on the shared value workers sleep. I'll submit that patch
> with other updates.
IMHO, if we only add the local balance to the shared balance in
vacuum_delay_point while each worker still works against the full limit,
then there will be a problem, right?  Suppose VacuumCostLimit is 2000:
each worker will first hit the delay in vacuum_delay_point when its own
local balance reaches 2000, so in most cases the first delay will be hit
only when the gross I/O is 6000 (if there are 3 workers).

I think if we want to have shared accounting then we must always
accumulate the balance in a shared variable, so that as soon as the
gross balance hits VacuumCostLimit we can hit the delay point.

Maybe we can do this (see the sketch below):
1. Change VacuumCostBalance from an integer to a pg_atomic_uint32 *.
2. In the heap_parallel_vacuum_main function, make this pointer refer
to a shared memory location.  Basically, for the non-parallel case it
will point to a process-local variable, whereas in the parallel case
it will point to a shared memory variable.
3. Wherever we currently use VacuumCostBalance in the code (I think
5-6 occurrences), change those places to use atomic operations.
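
A minimal sketch of what that could look like; everything here except
the existing PostgreSQL atomics API is illustrative, and 'cost_balance'
is an assumed field of the patch's LVShared struct in the DSM segment:

#include "port/atomics.h"

static pg_atomic_uint32 local_cost_balance;   /* used when not parallel */
static pg_atomic_uint32 *VacuumCostBalancePtr = &local_cost_balance;

/* The parallel worker entry point would repoint the balance into DSM. */
static void
attach_shared_cost_balance(LVShared *lvshared)
{
    VacuumCostBalancePtr = &lvshared->cost_balance;   /* assumed field */
}

/* Replacement for the former "VacuumCostBalance += cost" sites. */
static inline void
vacuum_cost_add(int cost)
{
    /* pg_atomic_init_u32() must have been called on the target first. */
    pg_atomic_add_fetch_u32(VacuumCostBalancePtr, (uint32) cost);
}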

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 25, 2019 at 12:44 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <> wrote:
> >
> > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > > > > > >
> > > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > > >
> > > > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > > > proposed doesn't work fine.
> > > > > > > > >
> > > > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > > > are still running and then the sleep might not help?
> > > > > > > > >
> > > > > > >
> > > > > > > Remember that the other running workers will also increase
> > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > > make sure that overall throttling works the same?
> > > > > > >
> > > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > > > workers who are doing very less I/O might become the victim and
> > > > > > > > unnecessarily delay its operation.
> > > > > > > >
> > > > > > >
> > > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > > limit we want to provide for overall vacuum operation will still be
> > > > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > > > might have done a major portion of I/O but soon after we start
> > > > > > > vacuuming the index, we will hit the limit and will sleep.
> > > > > >
> > > > > > Actually, What I meant is that the worker who performing actual I/O
> > > > > > might not go for the delay and another worker which has done only CPU
> > > > > > operation might pay the penalty?  So basically the worker who is doing
> > > > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > > > the worker who is performing actual I/O continues to work and do
> > > > > > further I/O.  Do you think this is not a practical problem?
> > > > > >
> > > > >
> > > > > I don't know.  Generally, we try to delay (if required) before
> > > > > processing (read/write) one page which means it will happen for I/O
> > > > > intensive operations, so I am not sure if the point you are making is
> > > > > completely correct.
> > > >
> > > > Ok, I agree with the point that we are checking it only when we are
> > > > doing the I/O operation.  But, we also need to consider that each I/O
> > > > operations have a different weightage.  So even if we have a delay
> > > > point at I/O operation there is a possibility that we might delay the
> > > > worker which is just performing read buffer with page
> > > > hit(VacuumCostPageHit).  But, the other worker who is actually
> > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > > > more I/O.
> > > >
> > > > >
> > > > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > > > one worker who has done more I/O will continue to do further I/O and
> > > > > > the one which has not done much I/O will not perform more I/O in
> > > > > > future.  So it might not be too bad if we compute shared costs as you
> > > > > > suggested above.
> > > > > >
> > > > >
> > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > some tests to see the behavior of throttling, that might help us in
> > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > What do you think?
> > > >
> > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > by any chance will you be interested to write POC with approach (a)?
> > > > Otherwise, I will try to write it after finishing the first one
> > > > (approach b).
> > > >
> > > I have come up with the POC for approach (a).
> > >
> > > The idea is
> > > 1) Before launching the worker divide the current VacuumCostBalance
> > > among workers so that workers start accumulating the balance from that
> > > point.
> > > 2) Also, divide the VacuumCostLimit among the workers.
> > > 3) Once the worker are done with the index vacuum, send back the
> > > remaining balance with the leader.
> > > 4) The leader will sum all the balances and add that to its current
> > > VacuumCostBalance.  And start accumulating its balance from this
> > > point.
> > >
> > > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > > I could not find an easy way to test that so I just put the tracepoint
> > > in the code and just checked that at what point we are giving the
> > > delay.
> > > I also printed the cost balance at various point to see that after how
> > > much I/O accumulation we are hitting the delay.  Please feel free to
> > > suggest a better way to test this.
> > >
> > > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > > patch for dividing i/o limit (attached with the mail)
> > >
> > > Note: Patch and the test results are attached.
> > >
> >
> > Thank you!
> >
> > For approach (a) the basic idea I've come up with is that we have a
> > shared balance value on DSM and each workers including the leader
> > process add its local balance value to it in vacuum_delay_point, and
> > then based on the shared value workers sleep. I'll submit that patch
> > with other updates.
> IMHO, if we add the local balance to the shared balance in
> vacuum_delay_point and each worker is working with full limit then
> there will be a problem right? because suppose VacuumCostLimit is 2000
> then the first time each worker hit the vacuum_delay_point when their
> local limit will be 2000 so in most cases, the first delay will be hit
> when there gross I/O is 6000 (if there are 3 workers).

To explain my idea in more detail: a worker entering
vacuum_delay_point adds its local value to the shared value and resets
the local value to 0. The worker then sleeps if the shared value
exceeds VacuumCostLimit, but before sleeping it subtracts
VacuumCostLimit from the shared value. Since vacuum_delay_point is
typically called once per page processed, I expect there will be no
such problem. Thoughts?
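
A minimal sketch of that logic, assuming the shared balance is a
pg_atomic_uint32 in the DSM segment; VacuumSharedCostBalance and
parallel_vacuum_delay_point are assumed names, not taken from the
patch:

#include "miscadmin.h"     /* VacuumCostBalance, VacuumCostLimit, ... */
#include "port/atomics.h"

/* Assumed to point into the DSM segment set up by the leader. */
extern pg_atomic_uint32 *VacuumSharedCostBalance;

static void
parallel_vacuum_delay_point(void)
{
    uint32      shared_balance;

    /* Add the locally accumulated cost to the shared balance ... */
    shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                                             (uint32) VacuumCostBalance);
    /* ... and reset the local balance. */
    VacuumCostBalance = 0;

    if (shared_balance >= (uint32) VacuumCostLimit)
    {
        /*
         * Consume one "limit" worth of budget, then sleep.  A real
         * implementation would also have to guard against concurrent
         * subtraction underflowing the unsigned counter.
         */
        pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
                                (uint32) VacuumCostLimit);
        pg_usleep((long) (VacuumCostDelay * 1000L));
    }
}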

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <> wrote:
>
> On Fri, Oct 25, 2019 at 12:44 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 24, 2019 at 8:12 PM Masahiko Sawada <> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <> wrote:
> > > > > > > >
> > > > > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <> wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > > > > turn off the cost-based vacuum.  By default, it is anyway disabled.
> > > > > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > > > > think we should try to come up with a solution for it.
> > > > > > > > > >
> > > > > > > > > > I finally got your point and now understood the need. And the idea I
> > > > > > > > > > proposed doesn't work fine.
> > > > > > > > > >
> > > > > > > > > > So you meant that all workers share the cost count and if a parallel
> > > > > > > > > > vacuum worker increase the cost and it reaches the limit, does the
> > > > > > > > > > only one worker sleep? Is that okay even though other parallel workers
> > > > > > > > > > are still running and then the sleep might not help?
> > > > > > > > > >
> > > > > > > >
> > > > > > > > Remember that the other running workers will also increase
> > > > > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > > > > than VacuumCostLimit will reset its value and sleep.  So, won't this
> > > > > > > > make sure that overall throttling works the same?
> > > > > > > >
> > > > > > > > > I agree with this point.  There is a possibility that some of the
> > > > > > > > > workers who are doing heavy I/O continue to work and OTOH other
> > > > > > > > > workers who are doing very less I/O might become the victim and
> > > > > > > > > unnecessarily delay its operation.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Sure, but will it impact the overall I/O?  I mean to say the rate
> > > > > > > > limit we want to provide for overall vacuum operation will still be
> > > > > > > > the same.  Also, isn't a similar thing happens now also where heap
> > > > > > > > might have done a major portion of I/O but soon after we start
> > > > > > > > vacuuming the index, we will hit the limit and will sleep.
> > > > > > >
> > > > > > > Actually, What I meant is that the worker who performing actual I/O
> > > > > > > might not go for the delay and another worker which has done only CPU
> > > > > > > operation might pay the penalty?  So basically the worker who is doing
> > > > > > > CPU intensive operation might go for the delay and pay the penalty and
> > > > > > > the worker who is performing actual I/O continues to work and do
> > > > > > > further I/O.  Do you think this is not a practical problem?
> > > > > > >
> > > > > >
> > > > > > I don't know.  Generally, we try to delay (if required) before
> > > > > > processing (read/write) one page which means it will happen for I/O
> > > > > > intensive operations, so I am not sure if the point you are making is
> > > > > > completely correct.
> > > > >
> > > > > Ok, I agree with the point that we are checking it only when we are
> > > > > doing the I/O operation.  But, we also need to consider that each I/O
> > > > > operations have a different weightage.  So even if we have a delay
> > > > > point at I/O operation there is a possibility that we might delay the
> > > > > worker which is just performing read buffer with page
> > > > > hit(VacuumCostPageHit).  But, the other worker who is actually
> > > > > dirtying the page(VacuumCostPageDirty = 20) continue the work and do
> > > > > more I/O.
> > > > >
> > > > > >
> > > > > > > Stepping back a bit,  OTOH, I think that we can not guarantee that the
> > > > > > > one worker who has done more I/O will continue to do further I/O and
> > > > > > > the one which has not done much I/O will not perform more I/O in
> > > > > > > future.  So it might not be too bad if we compute shared costs as you
> > > > > > > suggested above.
> > > > > > >
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
> > > >
> > > > The idea is
> > > > 1) Before launching the worker divide the current VacuumCostBalance
> > > > among workers so that workers start accumulating the balance from that
> > > > point.
> > > > 2) Also, divide the VacuumCostLimit among the workers.
> > > > 3) Once the worker are done with the index vacuum, send back the
> > > > remaining balance with the leader.
> > > > 4) The leader will sum all the balances and add that to its current
> > > > VacuumCostBalance.  And start accumulating its balance from this
> > > > point.
> > > >
> > > > I was trying to test how is the behaviour of the vacuum I/O limit, but
> > > > I could not find an easy way to test that so I just put the tracepoint
> > > > in the code and just checked that at what point we are giving the
> > > > delay.
> > > > I also printed the cost balance at various point to see that after how
> > > > much I/O accumulation we are hitting the delay.  Please feel free to
> > > > suggest a better way to test this.
> > > >
> > > > I have printed these logs for parallel vacuum patch (v30) vs v(30) +
> > > > patch for dividing i/o limit (attached with the mail)
> > > >
> > > > Note: Patch and the test results are attached.
> > > >
> > >
> > > Thank you!
> > >
> > > For approach (a) the basic idea I've come up with is that we have a
> > > shared balance value on DSM and each workers including the leader
> > > process add its local balance value to it in vacuum_delay_point, and
> > > then based on the shared value workers sleep. I'll submit that patch
> > > with other updates.
> > IMHO, if we add the local balance to the shared balance in
> > vacuum_delay_point and each worker is working with full limit then
> > there will be a problem right? because suppose VacuumCostLimit is 2000
> > then the first time each worker hit the vacuum_delay_point when their
> > local limit will be 2000 so in most cases, the first delay will be hit
> > when there gross I/O is 6000 (if there are 3 workers).
>
> For more detail of my idea it is that the first worker who entered to
> vacuum_delay_point adds its local value to shared value and reset the
> local value to 0. And then the worker sleeps if it exceeds
> VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> from the shared value. Since vacuum_delay_point are typically called
> per page processed I expect there will not such problem. Thoughts?

Oh right, I assumed that you were adding the local balance to the
shared value only when it exceeded VacuumCostLimit, but you are adding
it to the shared value every time in vacuum_delay_point. So I think
your idea is correct.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <> wrote:
> >
> > For more detail of my idea it is that the first worker who entered to
> > vacuum_delay_point adds its local value to shared value and reset the
> > local value to 0. And then the worker sleeps if it exceeds
> > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > from the shared value. Since vacuum_delay_point are typically called
> > per page processed I expect there will not such problem. Thoughts?
>
> Oh right, I assumed that when the local balance is exceeding the
> VacuumCostLimit that time you are adding it to the shared value but
> you are adding it to to shared value every time in vacuum_delay_point.
> So I think your idea is correct.

I've attached the updated patch set.

The first three patches add new variables and a callback to the index AM.

The next two patches are the main part that supports parallel vacuum.
I've incorporated all review comments I got so far. The memory layout
of the variable-length index statistics might be a bit complex. It's
similar to the format of a heap tuple header, having a null bitmap,
followed by the size of the index statistics and the actual data for
each index (see the sketch below).
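
For illustration only, such a layout in the DSM segment might look
roughly like this; the struct below is an assumption for explanation,
not the patch's actual definition:

/* Sketch only; not the patch's actual layout. */
typedef struct LVSharedIndStatsArea
{
    int         nindexes;       /* number of indexes of the table */

    /* null bitmap: bit i is set once stats for index i are written */
    bits8       has_stats[FLEXIBLE_ARRAY_MEMBER];

    /*
     * After MAXALIGN'ing past the bitmap, for each index there follows
     * a Size giving the length of its statistics, and then that many
     * bytes holding the IndexBulkDeleteResult data.
     */
} LVSharedIndStatsArea;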

The last patch is a PoC patch that implements the shared vacuum cost
balance. For now it's separate, but after testing both approaches it
will be merged into the 0004 patch. I'll test both next week.

This patch set can be applied on top of the patch[1] that improves
gist index bulk-deletion. So canparallelvacuum of gist index is true.

[1] https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com

Regards,

--
Masahiko Sawada

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
>
> On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <> wrote:
> > >
> > > For more detail of my idea it is that the first worker who entered to
> > > vacuum_delay_point adds its local value to shared value and reset the
> > > local value to 0. And then the worker sleeps if it exceeds
> > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > from the shared value. Since vacuum_delay_point are typically called
> > > per page processed I expect there will not such problem. Thoughts?
> >
> > Oh right, I assumed that when the local balance is exceeding the
> > VacuumCostLimit that time you are adding it to the shared value but
> > you are adding it to to shared value every time in vacuum_delay_point.
> > So I think your idea is correct.
>
> I've attached the updated patch set.
>
> First three patches add new variables and a callback to index AM.
>
> Next two patches are the main part to support parallel vacuum. I've
> incorporated all review comments I got so far. The memory layout of
> variable-length index statistics might be complex a bit. It's similar
> to the format of heap tuple header, having a null bitmap. And both the
> size of index statistics and actual data for each indexes follows.
>
> Last patch is a PoC patch that implements the shared vacuum cost
> balance. For now it's separated but after testing both approaches it
> will be merged to 0004 patch. I'll test both next week.
>
> This patch set can be applied on top of the patch[1] that improves
> gist index bulk-deletion. So canparallelvacuum of gist index is true.
>
> [1] https://www.postgresql.org/message-id/CAFiTN-uQY%2BB%2BCLb8W3YYdb7XmB9hyYFXkAy3C7RY%3D-YSWRV1DA%40mail.gmail.com
>
I haven't yet read the new set of patches, but I have noticed one
thing: we are getting the size of the statistics using an AM routine,
yet we are copying those statistics from local memory to shared memory
directly using memcpy.  Wouldn't it be a good idea to have an
AM-specific routine to copy them from local memory to shared memory?
I am not sure whether it is worth it, but my thought behind this point
is that it would allow an AM to keep its local stats in any form (for
example, it could store a pointer in there) and serialize them while
copying to the shared stats.  Later, when the shared stats are passed
back to the AM, it can deserialize them into its local form and use
them.

+ * Since all vacuum workers write the bulk-deletion result at
+ * different slots we can write them without locking.
+ */
+ if (!shared_indstats->updated && stats[idx] != NULL)
+ {
+ memcpy(bulkdelete_res, stats[idx], shared_indstats->size);
+ shared_indstats->updated = true;
+
+ /*
+ * no longer need the locally allocated result and now
+ * stats[idx] points to the DSM segment.
+ */
+ pfree(stats[idx]);
+ stats[idx] = bulkdelete_res;
+ }
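
A hypothetical shape for such AM callbacks, purely to illustrate the
suggestion; none of these names or signatures exist in the patch set:

/* Sketch only: illustrative additions alongside IndexAmRoutine. */
#include "access/genam.h"       /* IndexBulkDeleteResult */
#include "utils/rel.h"          /* Relation */

typedef Size (*amestimatebulkdeleteresult_function) (Relation indexRelation);

typedef void (*amserializebulkdeleteresult_function) (Relation indexRelation,
                                                      IndexBulkDeleteResult *stats,
                                                      void *shared_dest);

typedef IndexBulkDeleteResult *(*amdeserializebulkdeleteresult_function)
                                                     (Relation indexRelation,
                                                      void *shared_src);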

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> >
> > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > >
> > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > some tests to see the behavior of throttling, that might help us in
> > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > What do you think?
> > > >
> > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > by any chance will you be interested to write POC with approach (a)?
> > > > Otherwise, I will try to write it after finishing the first one
> > > > (approach b).
> > > >
> > > I have come up with the POC for approach (a).

> > Can we compute the overall throttling (sleep time) in the operation
> > separately for heap and index, then divide the index's sleep_time with
> > a number of workers and add it to heap's sleep time?  Then, it will be
> > a bit easier to compare the data between parallel and non-parallel
> > case.
I have come up with a patch to compute the total delay during the
vacuum.  The idea for computing the total cost delay is:

Total cost delay = total delay of heap scan + (total delay of index
vacuuming / number of workers).  A patch is attached for the same.

I have prepared this patch on top of the latest parallel vacuum
patch[1].  I have also rebased the patch for approach (b), dividing
the vacuum cost limit, and done some testing of the I/O throttling.
The attached patches 0001-POC-compute-total-cost-delay and
0002-POC-divide-vacuum-cost-limit can be applied on top of
v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
rebased on top of v31-0006, because v31-0006 implements the I/O
throttling with one approach while 0002-POC-divide-vacuum-cost-limit
does the same with another approach.  But
0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
well (just a 1-2 line conflict).

Testing:  I have performed 2 tests, one with same-size indexes and a
second with different-size indexes, and measured the total I/O delay
with the attached patch.

Setup:
VacuumCostDelay=10ms
VacuumCostLimit=2000

Test1 (Same size index):
create table test(a int, b varchar, c varchar);
create index idx1 on test(a);
create index idx2 on test(b);
create index idx3 on test(c);
insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
generate_series(1,500000) as i;
delete from test where a < 200000;

                  Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
Total Delay       1784 (ms)       1398 (ms)         1938 (ms)


Test2 (Variable size dead tuple in index)
create table test(a int, b varchar, c varchar);
create index idx1 on test(a);
create index idx2 on test(b) where a > 100000;
create index idx3 on test(c) where a > 150000;

insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
generate_series(1,500000) as i;
delete from test where a < 200000;

                  Vacuum (Head)   Parallel Vacuum   Vacuum Cost Divide Patch
Total Delay       1438 (ms)       1029 (ms)         1529 (ms)


Conclusion:
1. The tests show that the total I/O delay is significantly less with
the parallel vacuum.
2. With the vacuum-cost-divide patch the problem is solved, but the
delay is a bit higher compared to the non-parallel version.  The
reason could be the problem discussed at [2], but it needs further
investigation.

Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
will also try to test different types of indexes.

[1] https://www.postgresql.org/message-id/CAD21AoBMo9dr_QmhT%3DdKh7fmiq7tpx%2ByLHR8nw9i5NZ-SgtaVg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
> >
> >
> I haven't yet read the new set of the patch.  But, I have noticed one
> thing.  That we are getting the size of the statistics using the AM
> routine.  But, we are copying those statistics from local memory to
> the shared memory directly using the memcpy.   Wouldn't it be a good
> idea to have an AM specific routine to get it copied from the local
> memory to the shared memory?  I am not sure it is worth it or not but
> my thought behind this point is that it will give AM to have local
> stats in any form ( like they can store a pointer in that ) but they
> can serialize that while copying to shared stats.  And, later when
> shared stats are passed back to the Am then it can deserialize in its
> local form and use it.
>

You have a point, but after changing the gist index, we don't have
any current usage for indexes that need something like that. So, on
one side there is some value in having an API to copy the stats, but
on the other side, without a clear use for such an API it might not be
good to expose a new one.  I think we can expose such an API in the
future if there is a need for it.  Do you or anyone else know of any
external IndexAM that has such a need?

Few minor comments while glancing through the latest patchset.

1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
all three expose new variable/function from IndexAmRoutine.

2.
+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ char *p = (char *) GetSharedIndStats(lvshared);
+ int vac_work_mem = IsAutoVacuumWorkerProcess() &&
+ autovacuum_work_mem != -1 ?
+ autovacuum_work_mem : maintenance_work_mem;

I think this function won't be called from an autovacuum worker
process, at least not as of now, so isn't it a better idea to have an
Assert for it (see the sketch after these comments)?

3.
+void
+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)

This function is for performing a parallel operation on the index, so
why start its name with heap?  It is better to name it
index_parallel_vacuum_main or simply parallel_vacuum_main.

4.
/* useindex = true means two-pass strategy; false means one-pass */
@@ -128,17 +280,12 @@ typedef struct LVRelStats
  BlockNumber pages_removed;
  double tuples_deleted;
  BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- int num_dead_tuples; /* current # of entries */
- int max_dead_tuples; /* # slots allocated in array */
- ItemPointer dead_tuples; /* array of ItemPointerData */
+ LVDeadTuples *dead_tuples;
  int num_index_scans;
  TransactionId latestRemovedXid;
  bool lock_waiter_detected;
 } LVRelStats;

-
 /* A few variables that don't seem worth passing around as parameters */
 static int elevel = -1;

It seems like a spurious line removal.
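
Regarding comment 2 above, one way to read that suggestion is to
replace the autovacuum_work_mem branch with an assertion; the snippet
below is purely illustrative, not the patch's actual code:

static void
prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
{
    char   *p = (char *) GetSharedIndStats(lvshared);
    int     vac_work_mem;

    /* Parallel vacuum is not reachable from an autovacuum worker today. */
    Assert(!IsAutoVacuumWorkerProcess());
    vac_work_mem = maintenance_work_mem;

    /* ... rest unchanged ... */
}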

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila <> wrote:
>
> On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
> > >
> > >
> > I haven't yet read the new set of the patch.  But, I have noticed one
> > thing.  That we are getting the size of the statistics using the AM
> > routine.  But, we are copying those statistics from local memory to
> > the shared memory directly using the memcpy.   Wouldn't it be a good
> > idea to have an AM specific routine to get it copied from the local
> > memory to the shared memory?  I am not sure it is worth it or not but
> > my thought behind this point is that it will give AM to have local
> > stats in any form ( like they can store a pointer in that ) but they
> > can serialize that while copying to shared stats.  And, later when
> > shared stats are passed back to the Am then it can deserialize in its
> > local form and use it.
> >
>
> You have a point, but after changing the gist index, we don't have any
> current usage for indexes that need something like that. So, on one
> side there is some value in having an API to copy the stats, but on
> the other side without having clear usage of an API, it might not be
> good to expose a new API for the same.   I think we can expose such an
> API in the future if there is a need for the same.
I agree with that point.  But the current patch already exposes an
API for estimating the size of the statistics.  So IMHO, either we
expose both APIs (for estimating the size of the stats and for copying
the stats) or neither.  Am I missing something here?

> Do you or anyone
> know of any external IndexAM that has such a need?
>
> Few minor comments while glancing through the latest patchset.
>
> 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
> all three expose new variable/function from IndexAmRoutine.
>
> 2.
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + char *p = (char *) GetSharedIndStats(lvshared);
> + int vac_work_mem = IsAutoVacuumWorkerProcess() &&
> + autovacuum_work_mem != -1 ?
> + autovacuum_work_mem : maintenance_work_mem;
>
> I think this function won't be called from AutoVacuumWorkerProcess at
> least not as of now, so isn't it a better idea to have an Assert for
> it?
>
> 3.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>
> This function is for performing a parallel operation on the index, so
> why to start with heap?  It is better to name it as
> index_parallel_vacuum_main or simply parallel_vacuum_main.
>
> 4.
> /* useindex = true means two-pass strategy; false means one-pass */
> @@ -128,17 +280,12 @@ typedef struct LVRelStats
>   BlockNumber pages_removed;
>   double tuples_deleted;
>   BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
> - /* List of TIDs of tuples we intend to delete */
> - /* NB: this list is ordered by TID address */
> - int num_dead_tuples; /* current # of entries */
> - int max_dead_tuples; /* # slots allocated in array */
> - ItemPointer dead_tuples; /* array of ItemPointerData */
> + LVDeadTuples *dead_tuples;
>   int num_index_scans;
>   TransactionId latestRemovedXid;
>   bool lock_waiter_detected;
>  } LVRelStats;
>
> -
>  /* A few variables that don't seem worth passing around as parameters */
>  static int elevel = -1;
>
> It seems like a spurious line removal.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
>
> On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <> wrote:
> > >
> > > For more detail of my idea it is that the first worker who entered to
> > > vacuum_delay_point adds its local value to shared value and reset the
> > > local value to 0. And then the worker sleeps if it exceeds
> > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > from the shared value. Since vacuum_delay_point are typically called
> > > per page processed I expect there will not such problem. Thoughts?
> >
> > Oh right, I assumed that when the local balance is exceeding the
> > VacuumCostLimit that time you are adding it to the shared value but
> > you are adding it to to shared value every time in vacuum_delay_point.
> > So I think your idea is correct.
>
> I've attached the updated patch set.
>
> First three patches add new variables and a callback to index AM.
>
> Next two patches are the main part to support parallel vacuum. I've
> incorporated all review comments I got so far. The memory layout of
> variable-length index statistics might be complex a bit. It's similar
> to the format of heap tuple header, having a null bitmap. And both the
> size of index statistics and actual data for each indexes follows.
>
> Last patch is a PoC patch that implements the shared vacuum cost
> balance. For now it's separated but after testing both approaches it
> will be merged to 0004 patch. I'll test both next week.
>
> This patch set can be applied on top of the patch[1] that improves
> gist index bulk-deletion. So canparallelvacuum of gist index is true.
>

+ /* Get the space for IndexBulkDeleteResult */
+ bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats);
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result
+ * if someone has already updated it.
+ */
+ if (shared_indstats->updated && stats[idx] == NULL)
+ stats[idx] = bulkdelete_res;
+

I have a doubt about this hunk: I do not understand when this
condition will be hit.  Whenever we set shared_indstats->updated to
true, we also set stats[idx] to the shared stats at the same time.  So
I am not sure in what case shared_indstats->updated will be true while
stats[idx] is still pointing to NULL.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 28, 2019 at 6:08 PM Dilip Kumar <> wrote:
>
> On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
> >
> > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <> wrote:
> > > >
> > > > For more detail of my idea it is that the first worker who entered to
> > > > vacuum_delay_point adds its local value to shared value and reset the
> > > > local value to 0. And then the worker sleeps if it exceeds
> > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > > from the shared value. Since vacuum_delay_point are typically called
> > > > per page processed I expect there will not such problem. Thoughts?
> > >
> > > Oh right, I assumed that when the local balance is exceeding the
> > > VacuumCostLimit that time you are adding it to the shared value but
> > > you are adding it to to shared value every time in vacuum_delay_point.
> > > So I think your idea is correct.
> >
> > I've attached the updated patch set.
> >
> > First three patches add new variables and a callback to index AM.
> >
> > Next two patches are the main part to support parallel vacuum. I've
> > incorporated all review comments I got so far. The memory layout of
> > variable-length index statistics might be complex a bit. It's similar
> > to the format of heap tuple header, having a null bitmap. And both the
> > size of index statistics and actual data for each indexes follows.
> >
> > Last patch is a PoC patch that implements the shared vacuum cost
> > balance. For now it's separated but after testing both approaches it
> > will be merged to 0004 patch. I'll test both next week.
> >
> > This patch set can be applied on top of the patch[1] that improves
> > gist index bulk-deletion. So canparallelvacuum of gist index is true.
> >
>
> + /* Get the space for IndexBulkDeleteResult */
> + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats);
> +
> + /*
> + * Update the pointer to the corresponding bulk-deletion result
> + * if someone has already updated it.
> + */
> + if (shared_indstats->updated && stats[idx] == NULL)
> + stats[idx] = bulkdelete_res;
> +
>
> I have a doubt in this hunk,  I do not understand when this condition
> will be hit?  Because whenever we are setting shared_indstats->updated
> to true at the same time we are setting stats[idx] to shared stat.  So
> I am not sure in what case the shared_indstats->updated will be true
> but stats[idx] is still pointing to NULL?
>

I think it can be true in the case where one parallel vacuum worker
vacuums an index that was vacuumed by another worker in a previous
index vacuum cycle. Suppose that worker-A and worker-B vacuumed
index-A and index-B respectively, and then worker-A vacuums index-B in
the next index vacuum cycle. In this case shared_indstats->updated is
true because worker-B already vacuumed index-B in the previous cycle,
while stats[idx] on worker-A is NULL because it's the first time
worker-A has vacuumed index-B. Therefore worker-A updates its
stats[idx] to point to the bulk-deletion result on DSM in order to
pass it to the index AM.

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 10:01 AM Masahiko Sawada <> wrote:
>
> On Mon, Oct 28, 2019 at 6:08 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 2:06 PM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 25, 2019 at 10:22 AM Masahiko Sawada <> wrote:
> > > > >
> > > > > For more detail of my idea it is that the first worker who entered to
> > > > > vacuum_delay_point adds its local value to shared value and reset the
> > > > > local value to 0. And then the worker sleeps if it exceeds
> > > > > VacuumCostLimit but before sleeping it can subtract VacuumCostLimit
> > > > > from the shared value. Since vacuum_delay_point are typically called
> > > > > per page processed I expect there will not such problem. Thoughts?
> > > >
> > > > Oh right, I assumed that when the local balance is exceeding the
> > > > VacuumCostLimit that time you are adding it to the shared value but
> > > > you are adding it to to shared value every time in vacuum_delay_point.
> > > > So I think your idea is correct.
> > >
> > > I've attached the updated patch set.
> > >
> > > First three patches add new variables and a callback to index AM.
> > >
> > > Next two patches are the main part to support parallel vacuum. I've
> > > incorporated all review comments I got so far. The memory layout of
> > > variable-length index statistics might be complex a bit. It's similar
> > > to the format of heap tuple header, having a null bitmap. And both the
> > > size of index statistics and actual data for each indexes follows.
> > >
> > > Last patch is a PoC patch that implements the shared vacuum cost
> > > balance. For now it's separated but after testing both approaches it
> > > will be merged to 0004 patch. I'll test both next week.
> > >
> > > This patch set can be applied on top of the patch[1] that improves
> > > gist index bulk-deletion. So canparallelvacuum of gist index is true.
> > >
> >
> > + /* Get the space for IndexBulkDeleteResult */
> > + bulkdelete_res = GetIndexBulkDeleteResult(shared_indstats);
> > +
> > + /*
> > + * Update the pointer to the corresponding bulk-deletion result
> > + * if someone has already updated it.
> > + */
> > + if (shared_indstats->updated && stats[idx] == NULL)
> > + stats[idx] = bulkdelete_res;
> > +
> >
> > I have a doubt in this hunk,  I do not understand when this condition
> > will be hit?  Because whenever we are setting shared_indstats->updated
> > to true at the same time we are setting stats[idx] to shared stat.  So
> > I am not sure in what case the shared_indstats->updated will be true
> > but stats[idx] is still pointing to NULL?
> >
>
> I think it can be true in the case where one parallel vacuum worker
> vacuums the index that was vacuumed by other workers in previous index
> vacuum cycle. Suppose that worker-A and worker-B vacuumed index-A and
> index-B respectively. After that worker-A vacuum index-B in the next
> index vacuum cycle. In this case, shared_indstats->updated is true
> because worker-B already vacuumed in the previous vacuum cycle. On the
> other hand stats[idx] on worker-A is NULL because it's first time for
> worker-A to vacuum index-B. Therefore worker-A updates its stats[idx]
> to the bulk-deletion result on DSM in order to pass it to the index
> AM.
Okay, that makes sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total dealy of heap scan + Total dealy of
> index/worker;  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>                       Vacuum (Head)                   Parallel Vacuum
>            Vacuum Cost Divide Patch
> Total Delay        1784 (ms)                           1398(ms)
>                  1938(ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
> Vacuum (Head)                                   Parallel Vacuum
>               Vacuum Cost Divide Patch
> Total Delay 1438 (ms)                               1029(ms)
>                    1529(ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version.  The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that the v31-0006 patch doesn't work correctly, so I've
attached an updated version that also incorporates some comments I got
so far. Sorry for the inconvenience. I'll apply your 0001 patch and
also test the total delay time.

Regards,

--
Masahiko Sawada

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <> wrote:
>
> On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> > > >
> > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > > >
> > > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > > What do you think?
> > > > > >
> > > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > > Otherwise, I will try to write it after finishing the first one
> > > > > > (approach b).
> > > > > >
> > > > > I have come up with the POC for approach (a).
> >
> > > > Can we compute the overall throttling (sleep time) in the operation
> > > > separately for heap and index, then divide the index's sleep_time with
> > > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > > a bit easier to compare the data between parallel and non-parallel
> > > > case.
> > I have come up with a patch to compute the total delay during the
> > vacuum.  So the idea of computing the total cost delay is
> >
> > Total cost delay = Total dealy of heap scan + Total dealy of
> > index/worker;  Patch is attached for the same.
> >
> > I have prepared this patch on the latest patch of the parallel
> > vacuum[1].  I have also rebased the patch for the approach [b] for
> > dividing the vacuum cost limit and done some testing for computing the
> > I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> > and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> > rebased on top of v31-0006, because v31-0006 is implementing the I/O
> > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> > doing the same with another approach.   But,
> > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> > well (just 1-2 lines conflict).
> >
> > Testing:  I have performed 2 tests, one with the same size indexes and
> > second with the different size indexes and measured total I/O delay
> > with the attached patch.
> >
> > Setup:
> > VacuumCostDelay=10ms
> > VacuumCostLimit=2000
> >
> > Test1 (Same size index):
> > create table test(a int, b varchar, c varchar);
> > create index idx1 on test(a);
> > create index idx2 on test(b);
> > create index idx3 on test(c);
> > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > generate_series(1,500000) as i;
> > delete from test where a < 200000;
> >
> >                       Vacuum (Head)                   Parallel Vacuum
> >            Vacuum Cost Divide Patch
> > Total Delay        1784 (ms)                           1398(ms)
> >                  1938(ms)
> >
> >
> > Test2 (Variable size dead tuple in index)
> > create table test(a int, b varchar, c varchar);
> > create index idx1 on test(a);
> > create index idx2 on test(b) where a > 100000;
> > create index idx3 on test(c) where a > 150000;
> >
> > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > generate_series(1,500000) as i;
> > delete from test where a < 200000;
> >
> > Vacuum (Head)                                   Parallel Vacuum
> >               Vacuum Cost Divide Patch
> > Total Delay 1438 (ms)                               1029(ms)
> >                    1529(ms)
> >
> >
> > Conclusion:
> > 1. The tests prove that the total I/O delay is significantly less with
> > the parallel vacuum.
> > 2. With the vacuum cost divide the problem is solved but the delay bit
> > more compared to the non-parallel version.  The reason could be the
> > problem discussed at[2], but it needs further investigation.
> >
> > Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> > will also try to test different types of indexes.
> >
>
> Thank you for testing!
>
> I realized that v31-0006 patch doesn't work fine so I've attached the
> updated version patch that also incorporated some comments I got so
> far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> test the total delay time.
>

FWIW I'd like to share the results of total delay time evaluation of
approach (a) (shared cost balance). I used the same workloads that
Dilip shared and set vacuum_cost_delay to 10. The results of two test
cases are here:

* Test1
normal      : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552)
2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477)
1 worker   : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811)

* Test2
normal      : 1530 ms (hit 30645, miss 2, dirty 3, total 30650)
2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
1 worker   : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)

'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer
misses and flushing dirty buffer, respectively. 'total' is the sum of
these three values.

In this evaluation I expected the parallel vacuum cases to be delayed
as much as a normal vacuum, because the total number of pages to
vacuum is the same and each worker decides to sleep based on the
shared cost balance value. According to the Test1 results above there
is a big difference in the total delay time among these cases (the
normal vacuum case is shortest), but the cause is that the parallel
vacuum had to flush more dirty pages. After increasing shared_buffers
I got the expected results:

* Test1 (after increased shared_buffers)
normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)

I updated the patch that computes the total cost delay shared by
Dilip[1] so that it also collects the number of buffer hits and so on,
and have attached it. It can be applied on top of my latest patch
set[2].

[1] https://www.postgresql.org/message-id/CAFiTN-thU-z8f04jO7xGMu5yUUpTpsBTvBrFW6EhRf-jGvEz%3Dg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

Regards,

--
Masahiko Sawada

Attachments

Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <> wrote:
>
> On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <> wrote:
> >
> > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
> > > >
> > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> > > > >
> > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > > > > >
> > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > > > >
> > > > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > > > What do you think?
> > > > > > >
> > > > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > > > Otherwise, I will try to write it after finishing the first one
> > > > > > > (approach b).
> > > > > > >
> > > > > > I have come up with the POC for approach (a).
> > >
> > > > > Can we compute the overall throttling (sleep time) in the operation
> > > > > separately for heap and index, then divide the index's sleep_time with
> > > > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > > > a bit easier to compare the data between parallel and non-parallel
> > > > > case.
> > > I have come up with a patch to compute the total delay during the
> > > vacuum.  So the idea of computing the total cost delay is
> > >
> > > Total cost delay = Total dealy of heap scan + Total dealy of
> > > index/worker;  Patch is attached for the same.
> > >
> > > I have prepared this patch on the latest patch of the parallel
> > > vacuum[1].  I have also rebased the patch for the approach [b] for
> > > dividing the vacuum cost limit and done some testing for computing the
> > > I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> > > rebased on top of v31-0006, because v31-0006 is implementing the I/O
> > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> > > doing the same with another approach.   But,
> > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> > > well (just 1-2 lines conflict).
> > >
> > > Testing:  I have performed 2 tests, one with the same size indexes and
> > > second with the different size indexes and measured total I/O delay
> > > with the attached patch.
> > >
> > > Setup:
> > > VacuumCostDelay=10ms
> > > VacuumCostLimit=2000
> > >
> > > Test1 (Same size index):
> > > create table test(a int, b varchar, c varchar);
> > > create index idx1 on test(a);
> > > create index idx2 on test(b);
> > > create index idx3 on test(c);
> > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > generate_series(1,500000) as i;
> > > delete from test where a < 200000;
> > >
> > >                       Vacuum (Head)                   Parallel Vacuum
> > >            Vacuum Cost Divide Patch
> > > Total Delay        1784 (ms)                           1398(ms)
> > >                  1938(ms)
> > >
> > >
> > > Test2 (Variable size dead tuple in index)
> > > create table test(a int, b varchar, c varchar);
> > > create index idx1 on test(a);
> > > create index idx2 on test(b) where a > 100000;
> > > create index idx3 on test(c) where a > 150000;
> > >
> > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > generate_series(1,500000) as i;
> > > delete from test where a < 200000;
> > >
> > > Vacuum (Head)                                   Parallel Vacuum
> > >               Vacuum Cost Divide Patch
> > > Total Delay 1438 (ms)                               1029(ms)
> > >                    1529(ms)
> > >
> > >
> > > Conclusion:
> > > 1. The tests prove that the total I/O delay is significantly less with
> > > the parallel vacuum.
> > > 2. With the vacuum cost divide the problem is solved but the delay bit
> > > more compared to the non-parallel version.  The reason could be the
> > > problem discussed at[2], but it needs further investigation.
> > >
> > > Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> > > will also try to test different types of indexes.
> > >
> >
> > Thank you for testing!
> >
> > I realized that v31-0006 patch doesn't work fine so I've attached the
> > updated version patch that also incorporated some comments I got so
> > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > test the total delay time.
> >
>
> FWIW I'd like to share the results of total delay time evaluation of
> approach (a) (shared cost balance). I used the same workloads that
> Dilip shared and set vacuum_cost_delay to 10. The results of two test
> cases are here:
>
> * Test1
> normal      : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552)
> 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477)
> 1 worker   : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811)
>
> * Test2
> normal      : 1530 ms (hit 30645, miss 2, dirty 3, total 30650)
> 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
> 1 worker   : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
>
> 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer
> misses and flushing dirty buffer, respectively. 'total' is the sum of
> these three values.
>
> In this evaluation I expect that parallel vacuum cases delay time as
> much as the time of normal vacuum because the total number of pages to
> vacuum is the same and we have the shared cost balance value and each
> workers decide to sleep based on that value. According to the above
> Test1 results, we can see that there is a big difference in the total
> delay time among  these cases (normal vacuum case is shortest), but
> the cause of this is that parallel vacuum had to to flush more dirty
> pages. Actually after increased shared_buffer I got expected results:
>
> * Test1 (after increased shared_buffers)
> normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
>
> I updated the patch that computes the total cost delay shared by
> Dilip[1] so that it collects the number of buffer hits and so on, and
> have attached it. It can be applied on top of my latest patch set[1].

Thanks, Sawada-san.  In my next test, I will use this updated patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 3:11 PM Dilip Kumar <> wrote:
>
> On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <> wrote:
> >
> > On Tue, Oct 29, 2019 at 4:06 PM Masahiko Sawada <> wrote:
> > >
> > > On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <> wrote:
> > > >
> > > > On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> > > > > >
> > > > > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > > > > >
> > > > > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > > > > >
> > > > > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > > > > Otherwise, I will try to write it after finishing the first one
> > > > > > > > (approach b).
> > > > > > > >
> > > > > > > I have come up with the POC for approach (a).
> > > >
> > > > > > Can we compute the overall throttling (sleep time) in the operation
> > > > > > separately for heap and index, then divide the index's sleep_time with
> > > > > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > > > > a bit easier to compare the data between parallel and non-parallel
> > > > > > case.
> > > > I have come up with a patch to compute the total delay during the
> > > > vacuum.  So the idea of computing the total cost delay is
> > > >
> > > > Total cost delay = Total dealy of heap scan + Total dealy of
> > > > index/worker;  Patch is attached for the same.
> > > >
> > > > I have prepared this patch on the latest patch of the parallel
> > > > vacuum[1].  I have also rebased the patch for the approach [b] for
> > > > dividing the vacuum cost limit and done some testing for computing the
> > > > I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> > > > and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> > > > v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> > > > rebased on top of v31-0006, because v31-0006 is implementing the I/O
> > > > throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> > > > doing the same with another approach.   But,
> > > > 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> > > > well (just 1-2 lines conflict).
> > > >
> > > > Testing:  I have performed 2 tests, one with the same size indexes and
> > > > second with the different size indexes and measured total I/O delay
> > > > with the attached patch.
> > > >
> > > > Setup:
> > > > VacuumCostDelay=10ms
> > > > VacuumCostLimit=2000
> > > >
> > > > Test1 (Same size index):
> > > > create table test(a int, b varchar, c varchar);
> > > > create index idx1 on test(a);
> > > > create index idx2 on test(b);
> > > > create index idx3 on test(c);
> > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > > generate_series(1,500000) as i;
> > > > delete from test where a < 200000;
> > > >
> > > >                       Vacuum (Head)                   Parallel Vacuum
> > > >            Vacuum Cost Divide Patch
> > > > Total Delay        1784 (ms)                           1398(ms)
> > > >                  1938(ms)
> > > >
> > > >
> > > > Test2 (Variable size dead tuple in index)
> > > > create table test(a int, b varchar, c varchar);
> > > > create index idx1 on test(a);
> > > > create index idx2 on test(b) where a > 100000;
> > > > create index idx3 on test(c) where a > 150000;
> > > >
> > > > insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> > > > generate_series(1,500000) as i;
> > > > delete from test where a < 200000;
> > > >
> > > > Vacuum (Head)                                   Parallel Vacuum
> > > >               Vacuum Cost Divide Patch
> > > > Total Delay 1438 (ms)                               1029(ms)
> > > >                    1529(ms)
> > > >
> > > >
> > > > Conclusion:
> > > > 1. The tests prove that the total I/O delay is significantly less with
> > > > the parallel vacuum.
> > > > 2. With the vacuum cost divide the problem is solved but the delay bit
> > > > more compared to the non-parallel version.  The reason could be the
> > > > problem discussed at[2], but it needs further investigation.
> > > >
> > > > Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> > > > will also try to test different types of indexes.
> > > >
> > >
> > > Thank you for testing!
> > >
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> >
> > FWIW I'd like to share the results of total delay time evaluation of
> > approach (a) (shared cost balance). I used the same workloads that
> > Dilip shared and set vacuum_cost_delay to 10. The results of two test
> > cases are here:
> >
> > * Test1
> > normal      : 12656 ms (hit 50594, miss 5700, dirty 7258, total 63552)
> > 2 workers : 17149 ms (hit 47673, miss 8647, dirty 9157, total 65477)
> > 1 worker   : 19498 ms (hit 45954, miss 10340, dirty 10517, total 66811)
> >
> > * Test2
> > normal      : 1530 ms (hit 30645, miss 2, dirty 3, total 30650)
> > 2 workers : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
> > 1 worker   : 1538 ms (hit 30645, miss 2, dirty 3, total 30650)
> >
> > 'hit', 'miss' and 'dirty' are the total numbers of buffer hits, buffer
> > misses and flushing dirty buffer, respectively. 'total' is the sum of
> > these three values.
> >
> > In this evaluation I expect that parallel vacuum cases delay time as
> > much as the time of normal vacuum because the total number of pages to
> > vacuum is the same and we have the shared cost balance value and each
> > workers decide to sleep based on that value. According to the above
> > Test1 results, we can see that there is a big difference in the total
> > delay time among  these cases (normal vacuum case is shortest), but
> > the cause of this is that parallel vacuum had to to flush more dirty
> > pages. Actually after increased shared_buffer I got expected results:
> >
> > * Test1 (after increased shared_buffers)
> > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> >
> > I updated the patch that computes the total cost delay shared by
> > Dilip[1] so that it collects the number of buffer hits and so on, and
> > have attached it. It can be applied on top of my latest patch set[1].
>
> Thanks, Sawada-san.  In my next test, I will use this updated patch.
>
A few comments on the latest patch.

+heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
...
+
+ stats = (IndexBulkDeleteResult **)
+ palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
+
+ if (lvshared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = lvshared->maintenance_work_mem_worker;

So for a worker we set the new value of maintenance_work_mem.  But if
the leader also participates in the index vacuuming, shouldn't we set
the new value of maintenance_work_mem for the leader as well?
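
Something like this minimal sketch, assuming the leader goes through the
same index-vacuuming path (lvshared and maintenance_work_mem_worker are the
names from the hunk above; the exact placement is hypothetical):

/*
 * Hypothetical sketch: before the leader itself vacuums an index, cap its
 * budget the same way heap_parallel_vacuum_main() does for a worker.
 */
if (lvshared->maintenance_work_mem_worker > 0)
	maintenance_work_mem = lvshared->maintenance_work_mem_worker;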


+static void
+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ char *p = (char *) GetSharedIndStats(lvshared);
+ int vac_work_mem = IsAutoVacuumWorkerProcess() &&
+ autovacuum_work_mem != -1 ?
+ autovacuum_work_mem : maintenance_work_mem;
+ int nindexes_mwm = 0;
+ int i;

Can this ever be called from an autovacuum worker?  I think that
instead of adding handling for the autovacuum worker we can have an
assert.
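
For example, a minimal sketch of the suggested change, assuming this path
is never reached from autovacuum:

/* Hypothetical sketch: assert instead of branching on autovacuum_work_mem. */
Assert(!IsAutoVacuumWorkerProcess());
int			vac_work_mem = maintenance_work_mem;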

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, Oct 28, 2019 at 3:50 PM Amit Kapila <> wrote:
>
> On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
> > >
> > >
> > I haven't yet read the new set of the patch.  But, I have noticed one
> > thing.  That we are getting the size of the statistics using the AM
> > routine.  But, we are copying those statistics from local memory to
> > the shared memory directly using the memcpy.   Wouldn't it be a good
> > idea to have an AM specific routine to get it copied from the local
> > memory to the shared memory?  I am not sure it is worth it or not but
> > my thought behind this point is that it will give AM to have local
> > stats in any form ( like they can store a pointer in that ) but they
> > can serialize that while copying to shared stats.  And, later when
> > shared stats are passed back to the Am then it can deserialize in its
> > local form and use it.
> >
>
> You have a point, but after changing the gist index, we don't have any
> current usage for indexes that need something like that. So, on one
> side there is some value in having an API to copy the stats, but on
> the other side without having clear usage of an API, it might not be
> good to expose a new API for the same.   I think we can expose such an
> API in the future if there is a need for the same.  Do you or anyone
> know of any external IndexAM that has such a need?
>
> Few minor comments while glancing through the latest patchset.
>
> 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
> all three expose new variable/function from IndexAmRoutine.

Fixed.

>
> 2.
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + char *p = (char *) GetSharedIndStats(lvshared);
> + int vac_work_mem = IsAutoVacuumWorkerProcess() &&
> + autovacuum_work_mem != -1 ?
> + autovacuum_work_mem : maintenance_work_mem;
>
> I think this function won't be called from AutoVacuumWorkerProcess at
> least not as of now, so isn't it a better idea to have an Assert for
> it?

Fixed.

>
> 3.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>
> This function is for performing a parallel operation on the index, so
> why to start with heap?

Because parallel vacuum supports only indexes that are created on heaps.

>  It is better to name it as
> index_parallel_vacuum_main or simply parallel_vacuum_main.

I'm concerned that both names, index_parallel_vacuum_main and
parallel_vacuum_main, seem too generic given that this code is
heap-specific.

>
> 4.
> /* useindex = true means two-pass strategy; false means one-pass */
> @@ -128,17 +280,12 @@ typedef struct LVRelStats
>   BlockNumber pages_removed;
>   double tuples_deleted;
>   BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
> - /* List of TIDs of tuples we intend to delete */
> - /* NB: this list is ordered by TID address */
> - int num_dead_tuples; /* current # of entries */
> - int max_dead_tuples; /* # slots allocated in array */
> - ItemPointer dead_tuples; /* array of ItemPointerData */
> + LVDeadTuples *dead_tuples;
>   int num_index_scans;
>   TransactionId latestRemovedXid;
>   bool lock_waiter_detected;
>  } LVRelStats;
>
> -
>  /* A few variables that don't seem worth passing around as parameters */
>  static int elevel = -1;
>
> It seems like a spurious line removal.

Fixed.

The above comments are incorporated in the latest patch set (v32)[1].

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <> wrote:
> Actually after increased shared_buffer I got expected results:
>
> * Test1 (after increased shared_buffers)
> normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
>
> I updated the patch that computes the total cost delay shared by
> Dilip[1] so that it collects the number of buffer hits and so on, and
> have attached it. It can be applied on top of my latest patch set[1].

I tried to repeat the test to see the I/O delay with
v32-0004-PoC-shared-vacuum-cost-balance.patch [1], with shared_buffers
set to 4GB.  I recreated the database and restarted the server before
each run, but I could not reproduce the same I/O delay, and the cost is
also not the same.  Can you please tell me what shared_buffers setting
you used?

Test1 (4GB shared buffers)
normal:      stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
1 worker:   stats delay 1821.255000, hit 78184, miss 2, dirty 14095, total 92281
2 workers: stats delay 2224.415000, hit 86482, miss 2, dirty 17665, total 104149

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <> wrote:
>
> On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <> wrote:
> > Actually after increased shared_buffer I got expected results:
> >
> > * Test1 (after increased shared_buffers)
> > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> >
> > I updated the patch that computes the total cost delay shared by
> > Dilip[1] so that it collects the number of buffer hits and so on, and
> > have attached it. It can be applied on top of my latest patch set[1].

While reading your modified patch (PoC-delay-stats.patch), I noticed
that in my patch I used the formula below to compute the total delay:

total delay = delay in heap scan + (total delay of index scan / nworkers)

but in your patch it is just the total sum of all delays.  IMHO, the
total sleep time during the index vacuum phase must be divided by the
number of workers, because even if at some point all the workers sleep
at once (e.g. for 10 msec), the delay in I/O is only 10 msec, not
30 msec.  I think the same is discussed upthread[1]

[1] https://www.postgresql.org/message-id/CAA4eK1%2BPeiFLdTuwrE6CvbNdx80E-O%3DZxCuWB2maREKFD-RaCA%40mail.gmail.com
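
To make that concrete, a toy calculation with made-up numbers (three
workers that all happen to sleep at the same moment):

/* Toy illustration only: three workers sleeping 10 ms concurrently. */
double		per_worker_sleep_ms = 10.0;
int			nworkers = 3;
double		summed = per_worker_sleep_ms * nworkers;	/* 30 ms if the sleeps are simply added up */
double		actual = per_worker_sleep_ms;				/* 10 ms of wall-clock I/O pause */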

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <> wrote:
> >
> > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <> wrote:
> > > Actually after increased shared_buffer I got expected results:
> > >
> > > * Test1 (after increased shared_buffers)
> > > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> > >
> > > I updated the patch that computes the total cost delay shared by
> > > Dilip[1] so that it collects the number of buffer hits and so on, and
> > > have attached it. It can be applied on top of my latest patch set[1].
>
> While reading your modified patch (PoC-delay-stats.patch), I have
> noticed that in my patch I used below formulae to compute the total
> delay
> total delay = delay in heap scan + (total delay of index scan
> /nworkers). But, in your patch, I can see that it is just total sum of
> all delay.  IMHO, the total sleep time during the index vacuum phase
> must be divided by the number of workers, because even if at some
> point, all the workers go for sleep (e.g. 10 msec) then the delay in
> I/O will be only for 10msec not 30 msec.  I think the same is
> discussed upthread[1]
>

I think the two approaches make parallel vacuum workers wait in
different ways: in approach (a) the vacuum delay works as if the vacuum
were performed by a single process, whereas in approach (b) the vacuum
delay works for each worker independently.

Suppose that the total number of blocks to vacuum is 10,000, the cost
per block is 10, the cost limit is 200 and the sleep time is 5 ms. In a
single-process vacuum the total sleep time is 2,500 ms
(= (10,000 * 10 / 200) * 5). Approach (a) is the same, 2,500 ms,
because all parallel vacuum workers use the shared balance value and a
worker sleeps once that balance exceeds the limit. In approach (b),
since the cost limit is divided evenly, each worker's limit is 40
(e.g. with a parallel degree of 5). Assuming each worker also processes
blocks evenly, the total sleep time of all workers is 12,500 ms
(= (2,000 * 10 / 40) * 5 * 5). That's why we can compare approach (b)
with the others by dividing its total sleep time by the number of
parallel workers.

In other words, approach (b) makes parallel vacuum sleep much more than
both normal vacuum and parallel vacuum with approach (a), even with the
same settings. Which behavior do we expect? I thought the vacuum delay
for parallel vacuum should work as if it were a single-process vacuum,
as we did for memory usage. I might be missing something. If we prefer
approach (b) I should change the patch so that the leader process
divides the cost limit evenly.
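
To restate that arithmetic in one place, here is a small standalone
sketch; the numbers are just the example's assumptions above, not any
defaults:

#include <stdio.h>

int
main(void)
{
	const long	total_blocks = 10000;	/* blocks to vacuum */
	const long	cost_per_block = 10;	/* cost charged per block */
	const long	cost_limit = 200;		/* vacuum_cost_limit */
	const long	delay_ms = 5;			/* vacuum_cost_delay */
	const long	nworkers = 5;			/* parallel degree */

	/* Single-process vacuum and approach (a): one shared cost balance. */
	long		nsleeps = total_blocks * cost_per_block / cost_limit;

	printf("shared balance: %ld ms\n", nsleeps * delay_ms);		/* 2500 ms */

	/* Approach (b): cost limit and blocks divided evenly among workers. */
	long		limit_per_worker = cost_limit / nworkers;		/* 40 */
	long		blocks_per_worker = total_blocks / nworkers;	/* 2000 */
	long		sleeps_per_worker = blocks_per_worker * cost_per_block / limit_per_worker;

	printf("divided limit : %ld ms summed over all workers\n",
		   sleeps_per_worker * delay_ms * nworkers);			/* 12500 ms */
	return 0;
}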

Regards,

--
Masahiko Sawada



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
>
> On Thu, Oct 31, 2019 at 3:45 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 31, 2019 at 11:33 AM Dilip Kumar <> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 1:59 PM Masahiko Sawada <> wrote:
> > > > Actually after increased shared_buffer I got expected results:
> > > >
> > > > * Test1 (after increased shared_buffers)
> > > > normal      : 2807 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > > 2 workers : 2840 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > > 1 worker   : 2841 ms (hit 56295, miss 2, dirty 3, total 56300)
> > > >
> > > > I updated the patch that computes the total cost delay shared by
> > > > Dilip[1] so that it collects the number of buffer hits and so on, and
> > > > have attached it. It can be applied on top of my latest patch set[1].
> >
> > While reading your modified patch (PoC-delay-stats.patch), I have
> > noticed that in my patch I used below formulae to compute the total
> > delay
> > total delay = delay in heap scan + (total delay of index scan
> > /nworkers). But, in your patch, I can see that it is just total sum of
> > all delay.  IMHO, the total sleep time during the index vacuum phase
> > must be divided by the number of workers, because even if at some
> > point, all the workers go for sleep (e.g. 10 msec) then the delay in
> > I/O will be only for 10msec not 30 msec.  I think the same is
> > discussed upthread[1]
> >
>
> I think that two approaches make parallel vacuum worker wait in
> different way: in approach(a) the vacuum delay works as if vacuum is
> performed by single process, on the other hand in approach(b) the
> vacuum delay work for each workers independently.
>
> Suppose that the total number of blocks to vacuum is 10,000 blocks,
> the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> ms. In single process vacuum the total sleep time is 2,500ms (=
> (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> Because all parallel vacuum workers use the shared balance value and a
> worker sleeps once the balance value exceeds the limit. In
> approach(b), since the cost limit is divided evenly the value of each
> workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> processes blocks  evenly,  the total sleep time of all workers is
> 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> compute the sleep time of approach(b) by dividing the total value by
> the number of parallel workers.
>
> IOW the approach(b) makes parallel vacuum delay much more than normal
> vacuum and parallel vacuum with approach(a) even with the same
> settings. Which behaviors do we expect? I thought the vacuum delay for
> parallel vacuum should work as if it's a single process vacuum as we
> did for memory usage. I might be missing something. If we prefer
> approach(b) I should change the patch so that the leader process
> divides the cost limit evenly.
>
I have repeated the same tests (test1 and test2)[1] with larger shared
buffers (1GB).  I have used the same formula for computing the total
delay: heap scan delay + (index vacuuming delay / workers).  In my
opinion, multiple workers are doing I/O here, so the total delay is a
multiple of the number of workers; if we want to compare the delay with
the sequential vacuum, we should divide the total delay by the number
of workers.  I am not sure whether computing the total delay is the
right way to measure the I/O throttling, but I support approach (b) of
dividing the I/O limit because autovacuum workers already operate this
way.

test1:
normal:   stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146, total 79102 (cost divide patch)
2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036, total 78994 (cost divide patch)
1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066, total 92252 (share cost patch)
2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806, total 104290 (share cost patch)

test2:
normal:   stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total 40513 (cost divide patch)
2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total 40518 (cost divide patch)
1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total 42589 (share cost patch)
2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total 42871 (share cost patch)

So with larger shared buffers, approach (b) shows the same total delay
as the non-parallel vacuum, while approach (a) shows a slightly smaller
total delay.  Note that I used the same formula for computing the total
delay for both approaches, although Sawada-san explained above that it
may not be the right way to compute the total delay for approach (a).
My take is that whether we work with a shared cost or divide the cost,
the delay must be divided by the number of workers in the parallel
phase.  @Amit Kapila, what is your opinion on this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Mon, Oct 28, 2019 at 1:52 PM Dilip Kumar <> wrote:
>
> On Mon, Oct 28, 2019 at 12:20 PM Amit Kapila <> wrote:
> >
> > On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <> wrote:
> > > >
> > > >
> > > I haven't yet read the new set of the patch.  But, I have noticed one
> > > thing.  That we are getting the size of the statistics using the AM
> > > routine.  But, we are copying those statistics from local memory to
> > > the shared memory directly using the memcpy.   Wouldn't it be a good
> > > idea to have an AM specific routine to get it copied from the local
> > > memory to the shared memory?  I am not sure it is worth it or not but
> > > my thought behind this point is that it will give AM to have local
> > > stats in any form ( like they can store a pointer in that ) but they
> > > can serialize that while copying to shared stats.  And, later when
> > > shared stats are passed back to the Am then it can deserialize in its
> > > local form and use it.
> > >
> >
> > You have a point, but after changing the gist index, we don't have any
> > current usage for indexes that need something like that. So, on one
> > side there is some value in having an API to copy the stats, but on
> > the other side without having clear usage of an API, it might not be
> > good to expose a new API for the same.   I think we can expose such an
> > API in the future if there is a need for the same.
> I agree with the point.  But, the current patch exposes an API for
> estimating the size for the statistics.  So IMHO, either we expose
> both APIs for estimating the size of the stats and copy the stats or
> none.  Am I missing something here?
>

I think the first one is a must as things stand today, because
otherwise we won't be able to copy the stats.  The second one (exposing
an API to copy stats) is good to have, but there is no immediate use
for it.  We could expose the second API with future needs in mind, but
as there is no valid case right now it would be difficult to test, and
we are not sure whether any IndexAM will ever require such an API.
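
If we ever did add it, it could look roughly like the sketch below; all
three callback names are made up for illustration and none of them exists
in the patch or in IndexAmRoutine today:

/*
 * Purely illustrative: serialize/deserialize callbacks an index AM could
 * provide if it wanted to keep its local stats in a non-flat form.  Today
 * the patch only exposes a size-estimation callback and the stats are
 * copied with memcpy().
 */
typedef Size (*amestimatesharedstats_function) (Relation indexRelation);
typedef void (*amserializestats_function) (IndexBulkDeleteResult *localstats,
										   void *shared_dest);
typedef IndexBulkDeleteResult *(*amdeserializestats_function) (void *shared_src);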

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
>
> I think that two approaches make parallel vacuum worker wait in
> different way: in approach(a) the vacuum delay works as if vacuum is
> performed by single process, on the other hand in approach(b) the
> vacuum delay work for each workers independently.
>
> Suppose that the total number of blocks to vacuum is 10,000 blocks,
> the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> ms. In single process vacuum the total sleep time is 2,500ms (=
> (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> Because all parallel vacuum workers use the shared balance value and a
> worker sleeps once the balance value exceeds the limit. In
> approach(b), since the cost limit is divided evenly the value of each
> workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> processes blocks  evenly,  the total sleep time of all workers is
> 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> compute the sleep time of approach(b) by dividing the total value by
> the number of parallel workers.
>
> IOW the approach(b) makes parallel vacuum delay much more than normal
> vacuum and parallel vacuum with approach(a) even with the same
> settings. Which behaviors do we expect?
>

Yeah, this is an important thing to decide.  I don't think the
conclusion you are drawing is correct, because if that were true then
the same would apply to the current autovacuum work division, where we
divide the cost_limit among workers but the cost_delay is the same (see
autovac_balance_cost).  Basically, if we consider the delay time of
each worker independently, it would appear that the parallel vacuum
delay with approach (b) is higher, but that is true only if the workers
run serially, which they do not.
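
For reference, a much-simplified sketch of what that existing balancing
amounts to when every worker runs with identical settings (the real
autovac_balance_cost weights each worker by its own cost limit and delay;
n_active_workers is just an illustrative name):

/*
 * Simplified illustration only: keep cost_delay unchanged and give each of
 * the active autovacuum workers an equal share of the configured limit, so
 * the combined I/O rate stays roughly that of a single worker.
 */
int			cost_limit_per_worker = Max(VacuumCostLimit / n_active_workers, 1);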

> I thought the vacuum delay for
> parallel vacuum should work as if it's a single process vacuum as we
> did for memory usage. I might be missing something. If we prefer
> approach(b) I should change the patch so that the leader process
> divides the cost limit evenly.
>

I am also not completely sure which approach is better, but I slightly
lean towards approach (b).  I think we need input from some other
people as well, so I will start a separate thread to discuss this and
see if that helps to get their input.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar <> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> >
> >
> > I think that two approaches make parallel vacuum worker wait in
> > different way: in approach(a) the vacuum delay works as if vacuum is
> > performed by single process, on the other hand in approach(b) the
> > vacuum delay work for each workers independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > ms. In single process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > Because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach(b), since the cost limit is divided evenly the value of each
> > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > processes blocks  evenly,  the total sleep time of all workers is
> > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach(b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW the approach(b) makes parallel vacuum delay much more than normal
> > vacuum and parallel vacuum with approach(a) even with the same
> > settings. Which behaviors do we expect? I thought the vacuum delay for
> > parallel vacuum should work as if it's a single process vacuum as we
> > did for memory usage. I might be missing something. If we prefer
> > approach(b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
> I have repeated the same test (test1 and test2)[1] with a higher
> shared buffer (1GB).  Currently, I have used the same formula for
> computing the total delay
> heap scan delay + index vacuuming delay / workers.  Because, In my
> opinion, multiple workers are doing I/O here so the total delay should
> also be in multiple
> of the number of workers.  So if we want to compare the delay with the
> sequential vacuum then we should divide total delay by the number of
> workers.  But, I am not
> sure whether computing the total delay is the right way to compute the
> I/O throttling or not.  But, I support the approach (b) for dividing
> the I/O limit because
> auto vacuum workers are already operating with this approach.
>
> test1:
> normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
> 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146,
> total 79102 (cost divide patch)
> 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036,
> total 78994 (cost divide patch)
> 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066,
> total 92252 (share cost patch)
> 2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806,
> total 104290 (share cost patch)
>
> test2:
> normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
> 1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total
> 40513 (cost divide patch)
> 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total
> 40518 (cost divide patch)
> 1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total
> 42589 (share cost patch)
> 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total
> 42871 (share cost patch)
>
> So with higher, shared buffers,  I can see with approach (b) we can
> see the same total delay.  With approach (a) I can see a bit less
> total delay.  But, a point to be noted that I have used the same
> formulae for computing the total delay for both the approaches.  But,
> Sawada-san explained in the above mail that it may not be the right
> way to computing the total delay for the approach (a).  But my take is
> that whether we are working with shared cost or we are dividing the
> cost, the delay must be divided by number of workers in the parallel
> phase.
>

Why do you think so?  I think that with approach (b), if all the
workers are doing an equal amount of I/O, they will probably sleep at
the same time, whereas with approach (a) each of them will sleep at
different times.  So dividing the delay probably makes more sense for
approach (b).


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 10:45 AM Amit Kapila <> wrote:
>
> On Sun, Nov 3, 2019 at 9:49 AM Dilip Kumar <> wrote:
> >
> > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> > >
> > >
> > > I think that two approaches make parallel vacuum worker wait in
> > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > performed by single process, on the other hand in approach(b) the
> > > vacuum delay work for each workers independently.
> > >
> > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > Because all parallel vacuum workers use the shared balance value and a
> > > worker sleeps once the balance value exceeds the limit. In
> > > approach(b), since the cost limit is divided evenly the value of each
> > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > processes blocks  evenly,  the total sleep time of all workers is
> > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > compute the sleep time of approach(b) by dividing the total value by
> > > the number of parallel workers.
> > >
> > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > vacuum and parallel vacuum with approach(a) even with the same
> > > settings. Which behaviors do we expect? I thought the vacuum delay for
> > > parallel vacuum should work as if it's a single process vacuum as we
> > > did for memory usage. I might be missing something. If we prefer
> > > approach(b) I should change the patch so that the leader process
> > > divides the cost limit evenly.
> > >
> > I have repeated the same test (test1 and test2)[1] with a higher
> > shared buffer (1GB).  Currently, I have used the same formula for
> > computing the total delay
> > heap scan delay + index vacuuming delay / workers.  Because, In my
> > opinion, multiple workers are doing I/O here so the total delay should
> > also be in multiple
> > of the number of workers.  So if we want to compare the delay with the
> > sequential vacuum then we should divide total delay by the number of
> > workers.  But, I am not
> > sure whether computing the total delay is the right way to compute the
> > I/O throttling or not.  But, I support the approach (b) for dividing
> > the I/O limit because
> > auto vacuum workers are already operating with this approach.
> >
> > test1:
> > normal: stats delay 1348.160000, hit 68952, miss 2, dirty 10063, total 79017
> > 1 worker: stats delay 1349.585000, hit 68954, miss 2, dirty 10146,
> > total 79102 (cost divide patch)
> > 2 worker: stats delay 1341.416141, hit 68956, miss 2, dirty 10036,
> > total 78994 (cost divide patch)
> > 1 worker: stats delay 1025.495000, hit 78184, miss 2, dirty 14066,
> > total 92252 (share cost patch)
> > 2 worker: stats delay 904.366667, hit 86482, miss 2, dirty 17806,
> > total 104290 (share cost patch)
> >
> > test2:
> > normal: stats delay 530.475000, hit 36982, miss 2, dirty 3488, total 40472
> > 1 worker: stats delay 530.700000, hit 36984, miss 2, dirty 3527, total
> > 40513 (cost divide patch)
> > 2 worker: stats delay 530.675000, hit 36984, miss 2, dirty 3532, total
> > 40518 (cost divide patch)
> > 1 worker: stats delay 490.570000, hit 39090, miss 2, dirty 3497, total
> > 42589 (share cost patch)
> > 2 worker: stats delay 480.571667, hit 39050, miss 2, dirty 3819, total
> > 42871 (share cost patch)
> >
> > So with higher, shared buffers,  I can see with approach (b) we can
> > see the same total delay.  With approach (a) I can see a bit less
> > total delay.  But, a point to be noted that I have used the same
> > formulae for computing the total delay for both the approaches.  But,
> > Sawada-san explained in the above mail that it may not be the right
> > way to computing the total delay for the approach (a).  But my take is
> > that whether we are working with shared cost or we are dividing the
> > cost, the delay must be divided by number of workers in the parallel
> > phase.
> >
>
> Why do you think so?  I think with approach (b) if all the workers are
> doing equal amount of I/O, they will probably sleep at the same time
> whereas with approach (a) each of them will sleep at different times.
> So, probably dividing the delay in approach (b) makes more sense.

Just to be clear, I did not mean that we divide the sleep time for each
worker; I meant how to project the total delay in the test patch.  If
we want to directly compare the sleep time of sequential vs. parallel
vacuum, it's not fair to just compare the total sleep time; when
multiple workers are working in parallel, shouldn't we consider their
average sleep time instead?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 10:32 AM Amit Kapila <> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> >
> > I think that two approaches make parallel vacuum worker wait in
> > different way: in approach(a) the vacuum delay works as if vacuum is
> > performed by single process, on the other hand in approach(b) the
> > vacuum delay work for each workers independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > ms. In single process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > Because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach(b), since the cost limit is divided evenly the value of each
> > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > processes blocks  evenly,  the total sleep time of all workers is
> > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach(b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW the approach(b) makes parallel vacuum delay much more than normal
> > vacuum and parallel vacuum with approach(a) even with the same
> > settings. Which behaviors do we expect?
> >
>
> Yeah, this is an important thing to decide.  I don't think that the
> conclusion you are drawing is correct because it that is true then the
> same applies to the current autovacuum work division where we divide
> the cost_limit among workers but the cost_delay is same (see
> autovac_balance_cost).  Basically, if we consider the delay time of
> each worker independently, then it would appear that a parallel vacuum
> delay with approach (b) is more, but that is true only if the workers
> run serially which is not true.
>
> > I thought the vacuum delay for
> > parallel vacuum should work as if it's a single process vacuum as we
> > did for memory usage. I might be missing something. If we prefer
> > approach(b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
>
> I am also not completely sure which approach is better but I slightly
> lean towards approach (b).  I think we need input from some other
> people as well.  I will start a separate thread to discuss this and
> see if that helps to get the input from others.

+1


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 4 Nov 2019 at 14:02, Amit Kapila <> wrote:
>
> On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> >
> > I think that two approaches make parallel vacuum worker wait in
> > different way: in approach(a) the vacuum delay works as if vacuum is
> > performed by single process, on the other hand in approach(b) the
> > vacuum delay work for each workers independently.
> >
> > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > ms. In single process vacuum the total sleep time is 2,500ms (=
> > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > Because all parallel vacuum workers use the shared balance value and a
> > worker sleeps once the balance value exceeds the limit. In
> > approach(b), since the cost limit is divided evenly the value of each
> > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > processes blocks  evenly,  the total sleep time of all workers is
> > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > compute the sleep time of approach(b) by dividing the total value by
> > the number of parallel workers.
> >
> > IOW the approach(b) makes parallel vacuum delay much more than normal
> > vacuum and parallel vacuum with approach(a) even with the same
> > settings. Which behaviors do we expect?
> >
>
> Yeah, this is an important thing to decide.  I don't think that the
> conclusion you are drawing is correct because it that is true then the
> same applies to the current autovacuum work division where we divide
> the cost_limit among workers but the cost_delay is same (see
> autovac_balance_cost).  Basically, if we consider the delay time of
> each worker independently, then it would appear that a parallel vacuum
> delay with approach (b) is more, but that is true only if the workers
> run serially which is not true.
>
> > I thought the vacuum delay for
> > parallel vacuum should work as if it's a single process vacuum as we
> > did for memory usage. I might be missing something. If we prefer
> > approach(b) I should change the patch so that the leader process
> > divides the cost limit evenly.
> >
>
> I am also not completely sure which approach is better but I slightly
> lean towards approach (b).

Can we get the same sleep time as approach (b) if we divide the cost
limit by the number of workers and also have the shared cost balance
(i.e. approach (a) with a divided cost limit)?  Currently approach (b)
seems better, but I'm concerned that it might unnecessarily delay
vacuum if some indexes are very small or their bulk-deletion does
almost nothing, as with brin.

>
>   I think we need input from some other
> people as well.  I will start a separate thread to discuss this and
> see if that helps to get the input from others.

+1

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada
<> wrote:
>
> On Mon, 4 Nov 2019 at 14:02, Amit Kapila <> wrote:
> >
> > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> > >
> > > I think that two approaches make parallel vacuum worker wait in
> > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > performed by single process, on the other hand in approach(b) the
> > > vacuum delay work for each workers independently.
> > >
> > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > Because all parallel vacuum workers use the shared balance value and a
> > > worker sleeps once the balance value exceeds the limit. In
> > > approach(b), since the cost limit is divided evenly the value of each
> > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > processes blocks  evenly,  the total sleep time of all workers is
> > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > compute the sleep time of approach(b) by dividing the total value by
> > > the number of parallel workers.
> > >
> > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > vacuum and parallel vacuum with approach(a) even with the same
> > > settings. Which behaviors do we expect?
> > >
> >
> > Yeah, this is an important thing to decide.  I don't think that the
> > conclusion you are drawing is correct because it that is true then the
> > same applies to the current autovacuum work division where we divide
> > the cost_limit among workers but the cost_delay is same (see
> > autovac_balance_cost).  Basically, if we consider the delay time of
> > each worker independently, then it would appear that a parallel vacuum
> > delay with approach (b) is more, but that is true only if the workers
> > run serially which is not true.
> >
> > > I thought the vacuum delay for
> > > parallel vacuum should work as if it's a single process vacuum as we
> > > did for memory usage. I might be missing something. If we prefer
> > > approach(b) I should change the patch so that the leader process
> > > divides the cost limit evenly.
> > >
> >
> > I am also not completely sure which approach is better but I slightly
> > lean towards approach (b).
>
> Can we get the same sleep time as approach (b) if we divide the cost
> limit by the number of workers and have the shared cost balance (i.e.
> approach (a) with dividing the cost limit)? Currently the approach (b)
> seems better but I'm concerned that it might unnecessarily delay
> vacuum if some indexes are very small or bulk-deletions of indexes
> does almost nothing such as brin.

Are you worried that some of the workers might not have much I/O to do,
yet we still divide the cost limit equally?  If so, isn't that also the
case with the autovacuum workers?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Mon, 4 Nov 2019 at 17:26, Dilip Kumar <> wrote:
>
> On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada
> <> wrote:
> >
> > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <> wrote:
> > >
> > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> > > >
> > > > I think that two approaches make parallel vacuum worker wait in
> > > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > > performed by single process, on the other hand in approach(b) the
> > > > vacuum delay work for each workers independently.
> > > >
> > > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > > Because all parallel vacuum workers use the shared balance value and a
> > > > worker sleeps once the balance value exceeds the limit. In
> > > > approach(b), since the cost limit is divided evenly the value of each
> > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > > processes blocks  evenly,  the total sleep time of all workers is
> > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > > compute the sleep time of approach(b) by dividing the total value by
> > > > the number of parallel workers.
> > > >
> > > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > > vacuum and parallel vacuum with approach(a) even with the same
> > > > settings. Which behaviors do we expect?
> > > >
> > >
> > > Yeah, this is an important thing to decide.  I don't think that the
> > > conclusion you are drawing is correct because it that is true then the
> > > same applies to the current autovacuum work division where we divide
> > > the cost_limit among workers but the cost_delay is same (see
> > > autovac_balance_cost).  Basically, if we consider the delay time of
> > > each worker independently, then it would appear that a parallel vacuum
> > > delay with approach (b) is more, but that is true only if the workers
> > > run serially which is not true.
> > >
> > > > I thought the vacuum delay for
> > > > parallel vacuum should work as if it's a single process vacuum as we
> > > > did for memory usage. I might be missing something. If we prefer
> > > > approach(b) I should change the patch so that the leader process
> > > > divides the cost limit evenly.
> > > >
> > >
> > > I am also not completely sure which approach is better but I slightly
> > > lean towards approach (b).
> >
> > Can we get the same sleep time as approach (b) if we divide the cost
> > limit by the number of workers and have the shared cost balance (i.e.
> > approach (a) with dividing the cost limit)? Currently the approach (b)
> > seems better but I'm concerned that it might unnecessarily delay
> > vacuum if some indexes are very small or bulk-deletions of indexes
> > does almost nothing such as brin.
>
> Are you worried that some of the workers might not have much I/O to do
> but still we divide the cost limit equally?

Yes.

> If that is the case then
> that is the case with the auto vacuum workers also right?

I think that is not quite the same, because autovacuum rebalances the
cost after a worker finishes.  So, as Amit mentioned on the new thread,
we might need to make parallel vacuum workers notify the leader once
they exit so that it can rebalance the cost.

Regards,

--
Masahiko Sawada      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Mon, Nov 4, 2019 at 2:11 PM Masahiko Sawada
<> wrote:
>
> On Mon, 4 Nov 2019 at 17:26, Dilip Kumar <> wrote:
> >
> > On Mon, Nov 4, 2019 at 1:00 PM Masahiko Sawada
> > <> wrote:
> > >
> > > On Mon, 4 Nov 2019 at 14:02, Amit Kapila <> wrote:
> > > >
> > > > On Fri, Nov 1, 2019 at 2:21 PM Masahiko Sawada <> wrote:
> > > > >
> > > > > I think that two approaches make parallel vacuum worker wait in
> > > > > different way: in approach(a) the vacuum delay works as if vacuum is
> > > > > performed by single process, on the other hand in approach(b) the
> > > > > vacuum delay work for each workers independently.
> > > > >
> > > > > Suppose that the total number of blocks to vacuum is 10,000 blocks,
> > > > > the cost per blocks is 10, the cost limit is 200 and sleep time is 5
> > > > > ms. In single process vacuum the total sleep time is 2,500ms (=
> > > > > (10,000 * 10 / 200) * 5). The approach (a) is the same, 2,500ms.
> > > > > Because all parallel vacuum workers use the shared balance value and a
> > > > > worker sleeps once the balance value exceeds the limit. In
> > > > > approach(b), since the cost limit is divided evenly the value of each
> > > > > workers is 40 (e.g. when 5 parallel degree). And suppose each workers
> > > > > processes blocks  evenly,  the total sleep time of all workers is
> > > > > 12,500ms (=(2,000 * 10 / 40) * 5 * 5). I think that's why we can
> > > > > compute the sleep time of approach(b) by dividing the total value by
> > > > > the number of parallel workers.
> > > > >
> > > > > IOW the approach(b) makes parallel vacuum delay much more than normal
> > > > > vacuum and parallel vacuum with approach(a) even with the same
> > > > > settings. Which behaviors do we expect?
> > > > >
> > > >
> > > > Yeah, this is an important thing to decide.  I don't think that the
> > > > conclusion you are drawing is correct because it that is true then the
> > > > same applies to the current autovacuum work division where we divide
> > > > the cost_limit among workers but the cost_delay is same (see
> > > > autovac_balance_cost).  Basically, if we consider the delay time of
> > > > each worker independently, then it would appear that a parallel vacuum
> > > > delay with approach (b) is more, but that is true only if the workers
> > > > run serially which is not true.
> > > >
> > > > > I thought the vacuum delay for
> > > > > parallel vacuum should work as if it's a single process vacuum as we
> > > > > did for memory usage. I might be missing something. If we prefer
> > > > > approach(b) I should change the patch so that the leader process
> > > > > divides the cost limit evenly.
> > > > >
> > > >
> > > > I am also not completely sure which approach is better but I slightly
> > > > lean towards approach (b).
> > >
> > > Can we get the same sleep time as approach (b) if we divide the cost
> > > limit by the number of workers and have the shared cost balance (i.e.
> > > approach (a) with dividing the cost limit)? Currently the approach (b)
> > > seems better but I'm concerned that it might unnecessarily delay
> > > vacuum if some indexes are very small or bulk-deletions of indexes
> > > does almost nothing such as brin.
> >
> > Are you worried that some of the workers might not have much I/O to do
> > but still we divide the cost limit equally?
>
> Yes.
>
> > If that is the case then
> > that is the case with the auto vacuum workers also right?
>
> I think It is not right because we rebalance the cost after an
> autovacuum worker finished. So as Amit mentioned on the new thread we
> might need to make parallel vacuum workers notice to the leader once
> exited so that it can rebalance the cost.

I agree that if an autovacuum worker finishes then we rebalance the
cost, and we need to do something similar here.  And, that will be a
bit difficult to implement in the parallel vacuum case.

We might need some shared memory array where we can mark a worker's
status as running as soon as it starts.  And, when a worker exits, we
can set it to false and also set a flag saying we need cost
rebalancing.  Then, in vacuum_delay_point, if we see that we need to
rebalance, we can scan the shared memory array, find out how many
workers are running, and rebalance based on that.  Having said that, I
think for rebalancing we really just need a shared memory counter of
how many workers are running.
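
To make that concrete, here is a minimal single-process sketch of the
counter-based scheme. This is not PostgreSQL code: the
ParallelVacuumCostShared struct and the worker_started/worker_exited/
maybe_rebalance_cost_limit functions are invented for illustration, and
real code would keep the struct in DSM and use the pg_atomic_*
primitives instead of C11 atomics.

#include <stdatomic.h>
#include <stdio.h>

/* Pretend this struct lives in DSM, shared by the leader and all workers. */
typedef struct ParallelVacuumCostShared
{
    atomic_int  nworkers_running;   /* workers currently doing vacuum work */
    atomic_bool need_rebalance;     /* set whenever the worker count changes */
} ParallelVacuumCostShared;

static ParallelVacuumCostShared shared;     /* stand-in for the DSM segment */
static int vacuum_cost_limit = 200;         /* configured cost limit */
static int my_cost_limit = 200;             /* this process's share of it */

static void
worker_started(void)
{
    atomic_fetch_add(&shared.nworkers_running, 1);
    atomic_store(&shared.need_rebalance, true);
}

static void
worker_exited(void)
{
    atomic_fetch_sub(&shared.nworkers_running, 1);
    atomic_store(&shared.need_rebalance, true);
}

/* Called from the equivalent of vacuum_delay_point(). */
static void
maybe_rebalance_cost_limit(void)
{
    if (atomic_exchange(&shared.need_rebalance, false))
    {
        int     running = atomic_load(&shared.nworkers_running);

        if (running < 1)
            running = 1;
        my_cost_limit = vacuum_cost_limit / running;
    }
}

int
main(void)
{
    worker_started();
    worker_started();
    maybe_rebalance_cost_limit();
    printf("2 workers running: my_cost_limit = %d\n", my_cost_limit); /* 100 */

    worker_exited();
    maybe_rebalance_cost_limit();
    printf("1 worker running:  my_cost_limit = %d\n", my_cost_limit); /* 200 */
    return 0;
}

The point is just that a running-worker counter plus a "rebalance
needed" flag would let every process recompute its share lazily from
vacuum_delay_point, without the leader having to poke each worker.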

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
Hi
I took all attached patches (v32-01 to v32-4) and one of Dilip's patches from the "Questions/Observations related to Gist vacuum" mail thread. On top of all these patches, I created one more patch to test parallel vacuum functionally with the whole existing test suite.
For reference, I am attaching the patch.

What does this patch do?
As we know, we vacuum using parallel workers only if we give the parallel option with vacuum. So, to test, I used the existing GUC force_parallel_mode and tested parallel vacuuming.

If force_parallel_mode is set to regress and the parallel option is not given with vacuum, I force the use of parallel workers for vacuum. If there is only one index and no parallel degree is given with vacuum (or the parallel option is not given), and force_parallel_mode = regress, then I launch one parallel worker (the leader does no index work in this case); but if there is more than one index, then I use the leader as a worker for one index and launch workers for all the other indexes.

After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make check-world).

I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set to on, or should we use a new GUC to test parallel worker vacuum with the existing test suite?

Please let me know your thoughts on this patch.

Thanks and Regards
Mahendra Thalor

On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total delay of heap scan + Total delay of
> index/worker;  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>               Vacuum (Head)     Parallel Vacuum     Vacuum Cost Divide Patch
> Total Delay   1784 (ms)         1398 (ms)           1938 (ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>               Vacuum (Head)     Parallel Vacuum     Vacuum Cost Divide Patch
> Total Delay   1438 (ms)         1029 (ms)           1529 (ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version.  The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that v31-0006 patch doesn't work fine so I've attached the
updated version patch that also incorporated some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.

Regards,

--
Masahiko Sawada
Вложения

Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <> wrote:
>
> Hi
> I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist
> vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for
> all existence test suite.
> For reference, I am attaching patch.
>
> What does this patch?
> As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to
> test, I used existence guc force_parallel_mode and tested parallel vacuuming.
>
> If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use
> parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option
> is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader
> in this case), but if there is more than one index, then i am using leader as a worker for one index and launching
> workers for all other indexes.
>
> After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
>
> I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode
> is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?

IMHO, with force_parallel_mode=on we don't need to do anything here,
because that setting is useful for normal query parallelism: if the
user thinks that a parallel plan should have been selected by the
planner but the planner did not select it, then the user can force it
and check.  But vacuum parallelism is itself forced by the user, so
there is no point in doing it with force_parallel_mode=on.  However,
force_parallel_mode=regress is useful for testing the vacuum with an
existing test suite.

>
> Please let me know your thoughts for this patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <> wrote:
>
> On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <> wrote:
> >
> > Hi
> > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist
> > vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for
> > all existence test suite.

Thank you for looking at this patch!

> > For reference, I am attaching patch.
> >
> > What does this patch?
> > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to
> > test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> >
> > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use
> > parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option
> > is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader
> > in this case), but if there is more than one index, then i am using leader as a worker for one index and launching
> > workers for all other indexes.
> >
> > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> >
> > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode
> > is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
>
> IMHO, with force_parallel_mode=on we don't need to do anything here
> because that is useful for normal query parallelism where if the user
> thinks that the parallel plan should have been selected by the planer
> but planer did not select the parallel plan then the user can force
> and check.  But, vacuum parallelism is itself forced by the user so
> there is no point in doing it with force_parallel_mode=on.

Yeah I think so too. force_parallel_mode is a planner parameter and
parallel vacuum can be forced by vacuum option.

>  However,
> force_parallel_mode=regress is useful for testing the vacuum with an
> existing test suit.

If we want to control the leader participation by GUC parameter I
think we would need to have another GUC parameter rather than using
force_parallel_mode. And it's useful if we can use the parameter for
parallel CREATE INDEX as well. But it should be a separate patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
<> wrote:
>
> On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <> wrote:
> >
> > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <> wrote:
> > >
> > > Hi
> > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist
> > > vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for
> > > all existence test suite.
>
> Thank you for looking at this patch!
>
> > > For reference, I am attaching patch.
> > >
> > > What does this patch?
> > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to
> > > test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > >
> > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use
> > > parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option
> > > is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader
> > > in this case), but if there is more than one index, then i am using leader as a worker for one index and launching
> > > workers for all other indexes.
> > >
> > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check
> > > world)
> > >
> > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if
> > > force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> >
> > IMHO, with force_parallel_mode=on we don't need to do anything here
> > because that is useful for normal query parallelism where if the user
> > thinks that the parallel plan should have been selected by the planer
> > but planer did not select the parallel plan then the user can force
> > and check.  But, vacuum parallelism is itself forced by the user so
> > there is no point in doing it with force_parallel_mode=on.
>
> Yeah I think so too. force_parallel_mode is a planner parameter and
> parallel vacuum can be forced by vacuum option.
>
> >  However,
> > force_parallel_mode=regress is useful for testing the vacuum with an
> > existing test suit.
>
> If we want to control the leader participation by GUC parameter I
> think we would need to have another GUC parameter rather than using
> force_parallel_mode.
I think the purpose is not to disable the leader participation;
instead, I think the purpose of 'force_parallel_mode=regress' is that,
without changing the existing test suite, we can execute the existing
vacuum commands in the test suite with a worker.  I did not study the
patch, but the idea should be that if "force_parallel_mode=regress"
then a normal vacuum command should be executed in parallel by using 1
worker.

> And it's useful if we can use the parameter for
> parallel CREATE INDEX as well. But it should be a separate patch.
>

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Wed, 6 Nov 2019 at 20:29, Dilip Kumar <> wrote:
>
> On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
> <> wrote:
> >
> > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <> wrote:
> > > >
> > > > Hi
> > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist
> > > > vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for
> > > > all existence test suite.
> >
> > Thank you for looking at this patch!
> >
> > > > For reference, I am attaching patch.
> > > >
> > > > What does this patch?
> > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So
> > > > to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > > >
> > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use
> > > > parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option
> > > > is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader
> > > > in this case), but if there is more than one index, then i am using leader as a worker for one index and launching
> > > > workers for all other indexes.
> > > >
> > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check
> > > > world)
> > > >
> > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if
> > > > force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> > >
> > > IMHO, with force_parallel_mode=on we don't need to do anything here
> > > because that is useful for normal query parallelism where if the user
> > > thinks that the parallel plan should have been selected by the planer
> > > but planer did not select the parallel plan then the user can force
> > > and check.  But, vacuum parallelism is itself forced by the user so
> > > there is no point in doing it with force_parallel_mode=on.
> >
> > Yeah I think so too. force_parallel_mode is a planner parameter and
> > parallel vacuum can be forced by vacuum option.
> >
> > >  However,
> > > force_parallel_mode=regress is useful for testing the vacuum with an
> > > existing test suit.
> >
> > If we want to control the leader participation by GUC parameter I
> > think we would need to have another GUC parameter rather than using
> > force_parallel_mode.
> I think the purpose is not to disable the leader participation,
> instead, I think the purpose of 'force_parallel_mode=regress' is that
> without changing the existing test suit we can execute the existing
> vacuum commands in the test suit with the worker.  I did not study the
> patch but the idea should be that if "force_parallel_mode=regress"
> then normal vacuum command should be executed in parallel by using 1
> worker.

Oh, I got it. Considering the current parallel vacuum design, I'm not
sure that we can cover more test cases by forcing parallel vacuum
during the existing test suite, because most of these would be tables
with several indexes and one index vacuum cycle. It might be better to
add more test cases for parallel vacuum.

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Wed, 6 Nov 2019, 20:07 Masahiko Sawada, <> wrote:
On Wed, 6 Nov 2019 at 20:29, Dilip Kumar <> wrote:
>
> On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
> <> wrote:
> >
> > On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <> wrote:
> > >
> > > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <> wrote:
> > > >
> > > > Hi
> > > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
> >
> > Thank you for looking at this patch!
> >
> > > > For reference, I am attaching patch.
> > > >
> > > > What does this patch?
> > > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > > >
> > > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > > >
> > > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > > >
> > > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> > >
> > > IMHO, with force_parallel_mode=on we don't need to do anything here
> > > because that is useful for normal query parallelism where if the user
> > > thinks that the parallel plan should have been selected by the planer
> > > but planer did not select the parallel plan then the user can force
> > > and check.  But, vacuum parallelism is itself forced by the user so
> > > there is no point in doing it with force_parallel_mode=on.
> >
> > Yeah I think so too. force_parallel_mode is a planner parameter and
> > parallel vacuum can be forced by vacuum option.
> >
> > >  However,
> > > force_parallel_mode=regress is useful for testing the vacuum with an
> > > existing test suit.
> >
> > If we want to control the leader participation by GUC parameter I
> > think we would need to have another GUC parameter rather than using
> > force_parallel_mode.
> I think the purpose is not to disable the leader participation,
> instead, I think the purpose of 'force_parallel_mode=regress' is that
> without changing the existing test suit we can execute the existing
> vacuum commands in the test suit with the worker.  I did not study the
> patch but the idea should be that if "force_parallel_mode=regress"
> then normal vacuum command should be executed in parallel by using 1
> worker.

Oh I got it. Considering the current parallel vacuum design I'm not
sure that we can cover more test cases by forcing parallel vacuum
during existing test suite because most of these would be tables with
several indexes and one index vacuum cycle.
Oh sure, but still it would be good to get them tested with the parallel vacuum.
 
It might be better to add
more test cases for parallel vacuum.

 I agree that it would be good to add additional test cases.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
>
> I realized that v31-0006 patch doesn't work fine so I've attached the
> updated version patch that also incorporated some comments I got so
> far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> test the total delay time.
>

+ /*
+ * Generally index cleanup does not scan the index when index
+ * vacuuming (ambulkdelete) was already performed.  So we perform
+ * index cleanup with parallel workers only if we have not
+ * performed index vacuuming yet.  Otherwise, we do it in the
+ * leader process alone.
+ */
+ if (vacrelstats->num_index_scans == 0)
+ lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+ stats, lps);

Today, I was thinking about this point. This check will work for most
cases, but still there are exceptions: for a brin index, the main work
is done in the amvacuumcleanup function.  Similarly, I think there are
a few more indexes, like gin and bloom, where it seems we take another
pass over the index in the amvacuumcleanup phase.  Don't you think we
should try to allow parallel workers for such cases?  If so, I don't
have any great ideas on how to do that, but what comes to my mind is to
indicate that via stats (IndexBulkDeleteResult) or via an indexam API.
I am not sure if it is acceptable to have an indexam API for this.

Thoughts?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
Thanks, Masahiko-san and Dilip, for looking into this patch.

In the previous patch, when 'force_parallel_mode=regress', I was doing all the vacuuming using multiple workers, but we should do all the vacuuming using only one worker (the leader should not participate in vacuuming). So I am attaching a patch for the same.

What does this patch do?
If 'force_parallel_mode=regress' and the parallel option is not given with vacuum, then all the vacuuming work will be done by one single worker and the leader will not participate.  But if the parallel option is given with vacuum, then preference will be given to the specified degree.

After applying this patch, all the test cases are passing (make check-world) and I can't see any improvement in code coverage with this patch.

Please let me know your thoughts on this patch.

Thanks and Regards
Mahendra Thalor


On Wed, 6 Nov 2019 at 16:59, Dilip Kumar <> wrote:
On Wed, Nov 6, 2019 at 3:50 PM Masahiko Sawada
<> wrote:
>
> On Wed, 6 Nov 2019 at 18:42, Dilip Kumar <> wrote:
> >
> > On Wed, Nov 6, 2019 at 2:01 PM Mahendra Singh <> wrote:
> > >
> > > Hi
> > > I took all attached patches(v32-01 to v32-4) and one Dilip's patch from "Questions/Observations related to Gist vacuum" mail thread. On the top of all these patches, I created one more patch to test parallel vacuum functionally for all existence test suite.
>
> Thank you for looking at this patch!
>
> > > For reference, I am attaching patch.
> > >
> > > What does this patch?
> > > As we know that if we give parallel option with vacuum, then only we are vacuuming using parallel workers. So to test, I used existence guc force_parallel_mode and tested parallel vacuuming.
> > >
> > > If force_parallel_mode is set as regress, then if parallel option is not given with vacuum, I am forcing to use parallel workers for vacuum. If there is only one index and parallel degree is not given with vacuum(or parallel option is not given), and force_parallel_mode = regress, then I am launching one parallel worker(I am not doing work by leader in this case), but if there is more than one index, then i am using leader as a worker for one index and launching workers for all other indexes.
> > >
> > > After applying this patch and setting force_parallel_mode = regress, all test cases are passing (make-check world)
> > >
> > > I have some questions regarding my patch. Should we do vacuuming using parallel workers even if force_parallel_mode is set as on, or we should use new GUC to test parallel worker vacuum for existence test suite?
> >
> > IMHO, with force_parallel_mode=on we don't need to do anything here
> > because that is useful for normal query parallelism where if the user
> > thinks that the parallel plan should have been selected by the planer
> > but planer did not select the parallel plan then the user can force
> > and check.  But, vacuum parallelism is itself forced by the user so
> > there is no point in doing it with force_parallel_mode=on.
>
> Yeah I think so too. force_parallel_mode is a planner parameter and
> parallel vacuum can be forced by vacuum option.
>
> >  However,
> > force_parallel_mode=regress is useful for testing the vacuum with an
> > existing test suit.
>
> If we want to control the leader participation by GUC parameter I
> think we would need to have another GUC parameter rather than using
> force_parallel_mode.
I think the purpose is not to disable the leader participation,
instead, I think the purpose of 'force_parallel_mode=regress' is that
without changing the existing test suit we can execute the existing
vacuum commands in the test suit with the worker.  I did not study the
patch but the idea should be that if "force_parallel_mode=regress"
then normal vacuum command should be executed in parallel by using 1
worker.

> And it's useful if we can use the parameter for
> parallel CREATE INDEX as well. But it should be a separate patch.
>

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Вложения

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Fri, 8 Nov 2019 at 18:48, Amit Kapila <> wrote:
>
> On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> >
> > I realized that v31-0006 patch doesn't work fine so I've attached the
> > updated version patch that also incorporated some comments I got so
> > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > test the total delay time.
> >
>
> + /*
> + * Generally index cleanup does not scan the index when index
> + * vacuuming (ambulkdelete) was already performed.  So we perform
> + * index cleanup with parallel workers only if we have not
> + * performed index vacuuming yet.  Otherwise, we do it in the
> + * leader process alone.
> + */
> + if (vacrelstats->num_index_scans == 0)
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
>
> Today, I was thinking about this point where this check will work for
> most cases, but still, exceptions are there like for brin index, the
> main work is done in amvacuumcleanup function.  Similarly, I think
> there are few more indexes like gin, bloom where it seems we take
> another pass over-index in the amvacuumcleanup phase.  Don't you think
> we should try to allow parallel workers for such cases?  If so, I
> don't have any great ideas on how to do that, but what comes to my
> mind is to indicate that via stats (
> IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> acceptable to have indexam API for this.
>
> Thoughts?

Good point. gin and bloom do some heavy work during cleanup and also
during bulkdelete, as you mentioned. Brin does it during cleanup, and
hash and gist do it during bulkdelete. So there are three types of
index AM just inside the core postgres code. An idea I came up with is
that we can control parallel vacuum and parallel cleanup separately.
That is, adding a variable amcanparallelcleanup, so that we do parallel
cleanup only on indexes for which amcanparallelcleanup is true. The
IndexBulkDeleteResult can be stored locally if both amcanparallelvacuum
and amcanparallelcleanup of an index are false, because only the leader
process deals with such indexes. Otherwise we need to store it in DSM
as usual.
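
To illustrate that split, here is a tiny standalone sketch. This is not
actual PostgreSQL code: the IndexAmParallelCaps struct and the
stats_must_live_in_dsm function are made up, the two flags would really
be new fields of IndexAmRoutine, and the per-AM settings shown are only
guesses following this discussion.

#include <stdbool.h>
#include <stdio.h>

typedef struct IndexAmParallelCaps
{
    const char *amname;
    bool        amcanparallelvacuum;    /* bulkdelete may run in a worker */
    bool        amcanparallelcleanup;   /* vacuumcleanup may run in a worker */
} IndexAmParallelCaps;

/*
 * The bulk-delete result only needs to live in DSM when some worker may
 * touch the index; otherwise the leader can keep it locally.
 */
static bool
stats_must_live_in_dsm(const IndexAmParallelCaps *caps)
{
    return caps->amcanparallelvacuum || caps->amcanparallelcleanup;
}

int
main(void)
{
    IndexAmParallelCaps ams[] = {
        {"btree", true, true},
        {"hash", true, false},
        {"brin", false, true},
    };

    for (int i = 0; i < 3; i++)
        printf("%-5s: stats in DSM = %s\n", ams[i].amname,
               stats_must_live_in_dsm(&ams[i]) ? "yes" : "no");
    return 0;
}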

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
<> wrote:
>
> On Fri, 8 Nov 2019 at 18:48, Amit Kapila <> wrote:
> >
> > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > >
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> >
> > + /*
> > + * Generally index cleanup does not scan the index when index
> > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > + * index cleanup with parallel workers only if we have not
> > + * performed index vacuuming yet.  Otherwise, we do it in the
> > + * leader process alone.
> > + */
> > + if (vacrelstats->num_index_scans == 0)
> > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > + stats, lps);
> >
> > Today, I was thinking about this point where this check will work for
> > most cases, but still, exceptions are there like for brin index, the
> > main work is done in amvacuumcleanup function.  Similarly, I think
> > there are few more indexes like gin, bloom where it seems we take
> > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > we should try to allow parallel workers for such cases?  If so, I
> > don't have any great ideas on how to do that, but what comes to my
> > mind is to indicate that via stats (
> > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > acceptable to have indexam API for this.
> >
> > Thoughts?
>
> Good point. gin and bloom do a certain heavy work during cleanup and
> during bulkdelete as you mentioned. Brin does it during cleanup, and
> hash and gist do it during bulkdelete. There are three types of index
> AM just inside postgres code. An idea I came up with is that we can
> control parallel vacuum and parallel cleanup separately.  That is,
> adding a variable amcanparallelcleanup and we can do parallel cleanup
> on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> can be stored locally if both amcanparallelvacuum and
> amcanparallelcleanup of an index are false because only the leader
> process deals with such indexes. Otherwise we need to store it in DSM
> as always.
>
IIUC, amcanparallelcleanup will be true for those indexes which do
heavy work during cleanup irrespective of whether bulkdelete is called
or not, e.g. gin? If so, along with the amcanparallelcleanup flag, we
need to consider vacrelstats->num_index_scans, right? So if
vacrelstats->num_index_scans == 0 then we need to launch parallel
workers for all the indexes that support amcanparallelvacuum, and if
vacrelstats->num_index_scans > 0 then only for those that have
amcanparallelcleanup set to true.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <> wrote:
>
> On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> <> wrote:
> >
> > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > >
> > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > updated version patch that also incorporated some comments I got so
> > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > test the total delay time.
> > > >
> > >
> > > + /*
> > > + * Generally index cleanup does not scan the index when index
> > > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > > + * index cleanup with parallel workers only if we have not
> > > + * performed index vacuuming yet.  Otherwise, we do it in the
> > > + * leader process alone.
> > > + */
> > > + if (vacrelstats->num_index_scans == 0)
> > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > > + stats, lps);
> > >
> > > Today, I was thinking about this point where this check will work for
> > > most cases, but still, exceptions are there like for brin index, the
> > > main work is done in amvacuumcleanup function.  Similarly, I think
> > > there are few more indexes like gin, bloom where it seems we take
> > > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > > we should try to allow parallel workers for such cases?  If so, I
> > > don't have any great ideas on how to do that, but what comes to my
> > > mind is to indicate that via stats (
> > > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > > acceptable to have indexam API for this.
> > >
> > > Thoughts?
> >
> > Good point. gin and bloom do a certain heavy work during cleanup and
> > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > hash and gist do it during bulkdelete. There are three types of index
> > AM just inside postgres code. An idea I came up with is that we can
> > control parallel vacuum and parallel cleanup separately.  That is,
> > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> > can be stored locally if both amcanparallelvacuum and
> > amcanparallelcleanup of an index are false because only the leader
> > process deals with such indexes. Otherwise we need to store it in DSM
> > as always.
> >
> IIUC,  amcanparallelcleanup will be true for those indexes which does
> heavy work during cleanup irrespective of whether bulkdelete is called
> or not e.g. gin?

Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
might set amcanparallelvacuum to true as well).

>  If so, along with an amcanparallelcleanup flag, we
> need to consider vacrelstats->num_index_scans right? So if
> vacrelstats->num_index_scans == 0 then we need to launch parallel
> worker for all the indexes who support amcanparallelvacuum and if
> vacrelstats->num_index_scans > 0 then only for those who has
> amcanparallelcleanup as true.

Yes, you're right. But this won't work well for brin indexes, which
don't want to participate in parallel vacuum but always want to
participate in parallel cleanup.

After more thought, I think we can have a ternary value: never,
always, once. If it's 'never' the index never participates in parallel
cleanup; I guess hash indexes use 'never'. Next, if it's 'always' the
index always participates regardless of vacrelstats->num_index_scans; I
guess gin, brin and bloom use 'always'. Finally, if it's 'once' the
index participates in parallel cleanup only the first time (that is,
when vacrelstats->num_index_scans == 0); I guess btree, gist and
spgist use 'once'.
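
As a sketch of how such a ternary setting could be consumed on the
leader side (standalone illustration only; the enum and function names
are invented, and the per-AM assignments are just the guesses above):

#include <stdbool.h>
#include <stdio.h>

typedef enum ParallelCleanupSupport
{
    PARALLEL_CLEANUP_NEVER,     /* e.g. hash, per the guess above */
    PARALLEL_CLEANUP_ALWAYS,    /* e.g. gin, brin, bloom */
    PARALLEL_CLEANUP_ONCE       /* e.g. btree, gist, spgist */
} ParallelCleanupSupport;

/* Should this index's cleanup be handed to parallel workers? */
static bool
cleanup_in_parallel(ParallelCleanupSupport support, int num_index_scans)
{
    switch (support)
    {
        case PARALLEL_CLEANUP_ALWAYS:
            return true;
        case PARALLEL_CLEANUP_ONCE:
            /* only when no index vacuuming (bulkdelete) has run yet */
            return num_index_scans == 0;
        case PARALLEL_CLEANUP_NEVER:
        default:
            return false;
    }
}

int
main(void)
{
    printf("once,   no bulkdelete yet: %d\n",
           cleanup_in_parallel(PARALLEL_CLEANUP_ONCE, 0));      /* 1 */
    printf("once,   after bulkdelete:  %d\n",
           cleanup_in_parallel(PARALLEL_CLEANUP_ONCE, 2));      /* 0 */
    printf("always, after bulkdelete:  %d\n",
           cleanup_in_parallel(PARALLEL_CLEANUP_ALWAYS, 2));    /* 1 */
    return 0;
}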

Regards,


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
<> wrote:
>
> On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > <> wrote:
> > >
> > > On Fri, 8 Nov 2019 at 18:48, Amit Kapila <> wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > >
> > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > updated version patch that also incorporated some comments I got so
> > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > test the total delay time.
> > > > >
> > > >
> > > > + /*
> > > > + * Generally index cleanup does not scan the index when index
> > > > + * vacuuming (ambulkdelete) was already performed.  So we perform
> > > > + * index cleanup with parallel workers only if we have not
> > > > + * performed index vacuuming yet.  Otherwise, we do it in the
> > > > + * leader process alone.
> > > > + */
> > > > + if (vacrelstats->num_index_scans == 0)
> > > > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > > > + stats, lps);
> > > >
> > > > Today, I was thinking about this point where this check will work for
> > > > most cases, but still, exceptions are there like for brin index, the
> > > > main work is done in amvacuumcleanup function.  Similarly, I think
> > > > there are few more indexes like gin, bloom where it seems we take
> > > > another pass over-index in the amvacuumcleanup phase.  Don't you think
> > > > we should try to allow parallel workers for such cases?  If so, I
> > > > don't have any great ideas on how to do that, but what comes to my
> > > > mind is to indicate that via stats (
> > > > IndexBulkDeleteResult) or via an indexam API.  I am not sure if it is
> > > > acceptable to have indexam API for this.
> > > >
> > > > Thoughts?
> > >
> > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > hash and gist do it during bulkdelete. There are three types of index
> > > AM just inside postgres code. An idea I came up with is that we can
> > > control parallel vacuum and parallel cleanup separately.  That is,
> > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > on only indexes of which amcanparallelcleanup is true. IndexBulkDelete
> > > can be stored locally if both amcanparallelvacuum and
> > > amcanparallelcleanup of an index are false because only the leader
> > > process deals with such indexes. Otherwise we need to store it in DSM
> > > as always.
> > >
> > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > heavy work during cleanup irrespective of whether bulkdelete is called
> > or not e.g. gin?
>
> Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> might set amcanparallevacuum to true as well).
>
> >  If so, along with an amcanparallelcleanup flag, we
> > need to consider vacrelstats->num_index_scans right? So if
> > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > worker for all the indexes who support amcanparallelvacuum and if
> > vacrelstats->num_index_scans > 0 then only for those who has
> > amcanparallelcleanup as true.
>
> Yes, you're right. But this won't work fine for brin indexes who don't
> want to participate in parallel vacuum but always want to participate
> in parallel cleanup.
Yeah, right.
>
> After more thoughts, I think we can have a ternary value: never,
> always, once. If it's 'never' the index never participates in parallel
> cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> index always participates regardless of vacrelstats->num_index_scan. I
> guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> index participates in parallel cleanup only when it's the first time
> (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> spgist use 'once'.
Yeah, this makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> I realized that v31-0006 patch doesn't work fine so I've attached the
> updated version patch that also incorporated some comments I got so
> far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> test the total delay time.
>
While reviewing the 0002 patch, I got one doubt related to how we are
dividing the maintenance_work_mem:

+prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
+{
+ /* Compute the new maitenance_work_mem value for index vacuuming */
+ lvshared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm : maintenance_work_mem;
+}
Is it fair to just consider the number of indexes which use
maintenance_work_mem?  Or do we need to consider the number of workers
as well?  My point is, suppose there are 10 indexes which will use
maintenance_work_mem but we are launching just 2 workers; then what is
the point in dividing the maintenance_work_mem by 10?

IMHO the calculation should be like this
lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
maintenance_work_mem;

Am I missing something?
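
For illustration, a quick standalone calculation of the difference (the
numbers, the Min macro and the variable values here are only for this
example):

#include <stdio.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    int maintenance_work_mem = 262144;  /* 256MB, expressed in kB */
    int nindexes_mwm = 10;              /* indexes that use m_w_m */
    int nworkers = 2;                   /* workers actually launched */

    /* current proposal: divide by the number of such indexes */
    printf("divide by nindexes_mwm:            %d kB per worker\n",
           maintenance_work_mem / nindexes_mwm);                   /* 26214 kB */

    /* suggested alternative: never divide by more than the worker count */
    printf("divide by Min(nindexes, nworkers): %d kB per worker\n",
           maintenance_work_mem / Min(nindexes_mwm, nworkers));    /* 131072 kB */
    return 0;
}

With only 2 workers, at most 2 of those indexes are vacuumed at the
same time, so each worker could arguably be given half of
maintenance_work_mem rather than a tenth.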

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
Hi All,
I did some performance testing with the help of Dilip to test normal vacuum and parallel vacuum. Below is the test summary-

Configuration settings:
autovacuum = off
shared_buffers = 2GB
max_parallel_maintenance_workers = 6

Test 1: (table has 4 indexes on all tuples, deleting alternate tuples)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a);
create index i2 on test (b);
create index i3 on test (c);
create index i4 on test (d);
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;

case 1: (run normal vacuum)
vacuum test;
1019.453 ms

Case 2: (run vacuum with 1 parallel degree)
vacuum (parallel 1) test;
765.366 ms

Case 3:(run vacuum with 3 parallel degree)
vacuum (parallel 3) test;
555.227 ms

From the above results, we can conclude that with the help of parallel vacuum, performance is improved for large indexes.

Test 2: (table has 16 indexes and the indexes are small, deleting alternate tuples)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i1 on test (a) where a < 100000;
create index i2 on test (a) where a > 100000 and a < 200000;
create index i3 on test (a) where a > 200000 and a < 300000;
create index i4 on test (a) where a > 300000 and a < 400000;
create index i5 on test (a) where a > 400000 and a < 500000;
create index i6 on test (a) where a > 500000 and a < 600000;
create index i7 on test (b) where a < 100000;
create index i8 on test (c) where a < 100000;
create index i9 on test (d) where a < 100000;
create index i10 on test (d) where a < 100000;
create index i11 on test (d) where a < 100000;
create index i12 on test (d) where a < 100000;
create index i13 on test (d) where a < 100000;
create index i14 on test (d) where a < 100000;
create index i15 on test (d) where a < 100000;
create index i16 on test (d) where a < 100000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;

case 1: (run normal vacuum)
vacuum test;
649.187 ms

Case 2: (run vacuum with 1 parallel degree)
vacuum (parallel 1) test;
492.075 ms

Case 3:(run vacuum with 3 parallel degree)
vacuum (parallel 3) test;
435.581 ms

For small indexes also, we gained some performance by parallel vacuum.

I will continue my testing for stats collection.

Please let me know if anybody has any suggestions for other testing (what should be tested).

Thanks and Regards
Mahendra Thalor

On Tue, 29 Oct 2019 at 12:37, Masahiko Sawada <> wrote:
On Mon, Oct 28, 2019 at 2:13 PM Dilip Kumar <> wrote:
>
> On Thu, Oct 24, 2019 at 4:33 PM Dilip Kumar <> wrote:
> >
> > On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <> wrote:
> > >
> > > On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <> wrote:
> > > >
> > > > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <> wrote:
> > > > >
> > > > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <> wrote:
> > > > > >
> > > > > > I am thinking if we can write the patch for both the approaches (a.
> > > > > > compute shared costs and try to delay based on that, b. try to divide
> > > > > > the I/O cost among workers as described in the email above[1]) and do
> > > > > > some tests to see the behavior of throttling, that might help us in
> > > > > > deciding what is the best strategy to solve this problem, if any.
> > > > > > What do you think?
> > > > >
> > > > > I agree with this idea.  I can come up with a POC patch for approach
> > > > > (b).  Meanwhile, if someone is interested to quickly hack with the
> > > > > approach (a) then we can do some testing and compare.  Sawada-san,
> > > > > by any chance will you be interested to write POC with approach (a)?
> > > > > Otherwise, I will try to write it after finishing the first one
> > > > > (approach b).
> > > > >
> > > > I have come up with the POC for approach (a).
>
> > > Can we compute the overall throttling (sleep time) in the operation
> > > separately for heap and index, then divide the index's sleep_time with
> > > a number of workers and add it to heap's sleep time?  Then, it will be
> > > a bit easier to compare the data between parallel and non-parallel
> > > case.
> I have come up with a patch to compute the total delay during the
> vacuum.  So the idea of computing the total cost delay is
>
> Total cost delay = Total delay of heap scan + Total delay of
> index/worker;  Patch is attached for the same.
>
> I have prepared this patch on the latest patch of the parallel
> vacuum[1].  I have also rebased the patch for the approach [b] for
> dividing the vacuum cost limit and done some testing for computing the
> I/O throttling.  Attached patches 0001-POC-compute-total-cost-delay
> and 0002-POC-divide-vacuum-cost-limit can be applied on top of
> v31-0005-Add-paralell-P-option-to-vacuumdb-command.patch.  I haven't
> rebased on top of v31-0006, because v31-0006 is implementing the I/O
> throttling with one approach and 0002-POC-divide-vacuum-cost-limit is
> doing the same with another approach.   But,
> 0001-POC-compute-total-cost-delay can be applied on top of v31-0006 as
> well (just 1-2 lines conflict).
>
> Testing:  I have performed 2 tests, one with the same size indexes and
> second with the different size indexes and measured total I/O delay
> with the attached patch.
>
> Setup:
> VacuumCostDelay=10ms
> VacuumCostLimit=2000
>
> Test1 (Same size index):
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b);
> create index idx3 on test(c);
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>               Vacuum (Head)     Parallel Vacuum     Vacuum Cost Divide Patch
> Total Delay   1784 (ms)         1398 (ms)           1938 (ms)
>
>
> Test2 (Variable size dead tuple in index)
> create table test(a int, b varchar, c varchar);
> create index idx1 on test(a);
> create index idx2 on test(b) where a > 100000;
> create index idx3 on test(c) where a > 150000;
>
> insert into test select i, repeat('a',30)||i, repeat('a',20)||i from
> generate_series(1,500000) as i;
> delete from test where a < 200000;
>
>               Vacuum (Head)     Parallel Vacuum     Vacuum Cost Divide Patch
> Total Delay   1438 (ms)         1029 (ms)           1529 (ms)
>
>
> Conclusion:
> 1. The tests prove that the total I/O delay is significantly less with
> the parallel vacuum.
> 2. With the vacuum cost divide the problem is solved but the delay bit
> more compared to the non-parallel version.  The reason could be the
> problem discussed at[2], but it needs further investigation.
>
> Next, I will test with the v31-0006 (shared vacuum cost) patch.  I
> will also try to test different types of indexes.
>

Thank you for testing!

I realized that v31-0006 patch doesn't work fine so I've attached the
updated version patch that also incorporated some comments I got so
far. Sorry for the inconvenience. I'll apply your 0001 patch and also
test the total delay time.

Regards,

--
Masahiko Sawada

Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
<> wrote:
>
> On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <> wrote:
> >
> > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > <> wrote:
> > >
> > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > hash and gist do it during bulkdelete. There are three types of index
> > > AM just inside postgres code. An idea I came up with is that we can
> > > control parallel vacuum and parallel cleanup separately.  That is,
> > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > on only indexes of which amcanparallelcleanup is true.
> > >

This is what I mentioned in my email as a second option (whether to
expose via IndexAM).  I am not sure if we can have a new variable just
for this.

> > > IndexBulkDelete
> > > can be stored locally if both amcanparallelvacuum and
> > > amcanparallelcleanup of an index are false because only the leader
> > > process deals with such indexes. Otherwise we need to store it in DSM
> > > as always.
> > >
> > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > heavy work during cleanup irrespective of whether bulkdelete is called
> > or not e.g. gin?
>
> Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> might set amcanparallevacuum to true as well).
>
> >  If so, along with an amcanparallelcleanup flag, we
> > need to consider vacrelstats->num_index_scans right? So if
> > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > worker for all the indexes who support amcanparallelvacuum and if
> > vacrelstats->num_index_scans > 0 then only for those who has
> > amcanparallelcleanup as true.
>
> Yes, you're right. But this won't work fine for brin indexes who don't
> want to participate in parallel vacuum but always want to participate
> in parallel cleanup.
>
> After more thoughts, I think we can have a ternary value: never,
> always, once. If it's 'never' the index never participates in parallel
> cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> index always participates regardless of vacrelstats->num_index_scan. I
> guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> index participates in parallel cleanup only when it's the first time
> (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> spgist use 'once'.
>

I think this 'once' option is confusing, especially because it also
depends on 'num_index_scans', which the IndexAM has no control over.
It might be that the option name is not good, but I am not sure.
Another thing is that for brin indexes, we don't want bulkdelete to
participate in parallelism.  Do we want to have separate variables for
ambulkdelete and amvacuumcleanup which decide whether the particular
phase can be done in parallel?  Another possibility could be to just
have one variable (say uint16 amparallelvacuum) which will tell us all
the options, but I don't think that will be a popular approach
considering all the other methods and variables exposed.  What do you
think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <> wrote:
>
>
> For small indexes also, we gained some performance by parallel vacuum.
>

Thanks for doing all these tests.  It is clear from this and the
previous tests that this patch has a benefit in a wide variety of
cases.  However, we should try to see some worst cases as well.  For
example, if there are multiple indexes on a table and only one of them
is large whereas all the others are very small, say having a few
hundred or thousand rows.

Note: Please don't use the top-posting style to reply.  Here, we use
inline reply.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Mon, 11 Nov 2019 at 19:29, Amit Kapila <> wrote:
>
> On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> <> wrote:
> >
> > On Mon, 11 Nov 2019 at 15:06, Dilip Kumar <> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 9:57 AM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > Good point. gin and bloom do a certain heavy work during cleanup and
> > > > during bulkdelete as you mentioned. Brin does it during cleanup, and
> > > > hash and gist do it during bulkdelete. There are three types of index
> > > > AM just inside postgres code. An idea I came up with is that we can
> > > > control parallel vacuum and parallel cleanup separately.  That is,
> > > > adding a variable amcanparallelcleanup and we can do parallel cleanup
> > > > on only indexes of which amcanparallelcleanup is true.
> > > >
>
> This is what I mentioned in my email as a second option (whether to
> expose via IndexAM).  I am not sure if we can have a new variable just
> for this.
>
> > > > IndexBulkDelete
> > > > can be stored locally if both amcanparallelvacuum and
> > > > amcanparallelcleanup of an index are false because only the leader
> > > > process deals with such indexes. Otherwise we need to store it in DSM
> > > > as always.
> > > >
> > > IIUC,  amcanparallelcleanup will be true for those indexes which does
> > > heavy work during cleanup irrespective of whether bulkdelete is called
> > > or not e.g. gin?
> >
> > Yes, I guess that gin and brin set amcanparallelcleanup to true (gin
> > might set amcanparallevacuum to true as well).
> >
> > >  If so, along with an amcanparallelcleanup flag, we
> > > need to consider vacrelstats->num_index_scans right? So if
> > > vacrelstats->num_index_scans == 0 then we need to launch parallel
> > > worker for all the indexes who support amcanparallelvacuum and if
> > > vacrelstats->num_index_scans > 0 then only for those who has
> > > amcanparallelcleanup as true.
> >
> > Yes, you're right. But this won't work fine for brin indexes who don't
> > want to participate in parallel vacuum but always want to participate
> > in parallel cleanup.
> >
> > After more thoughts, I think we can have a ternary value: never,
> > always, once. If it's 'never' the index never participates in parallel
> > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > index always participates regardless of vacrelstats->num_index_scan. I
> > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > index participates in parallel cleanup only when it's the first time
> > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > spgist use 'once'.
> >
>
> I think this 'once' option is confusing especially because it also
> depends on 'num_index_scans' which the IndexAM has no control over.
> It might be that the option name is not good, but I am not sure.
> Another thing is that for brin indexes, we don't want bulkdelete to
> participate in parallelism.

I thought brin should set amcanparallelvacuum to false and
amcanparallelcleanup to 'always'.

> Do we want to have separate variables for
> ambulkdelete and amvacuumcleanup which decides whether the particular
> phase can be done in parallel?

You mean adding variables to ambulkdelete and amvacuumcleanup as
function arguments? If so, isn't it too late to tell the leader whether
the particular phase can be done in parallel?

> Another possibility could be to just
> have one variable (say uint16 amparallelvacuum) which will tell us all
> the options but I don't think that will be a popular approach
> considering all the other methods and variables exposed.  What do you
> think?

Adding only one variable that can hold flags would also be a good
idea, instead of having multiple variables for each option. For
instance, the FDW API uses such an interface (see eflags of BeginForeignScan).

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
<> wrote:
>
> On Mon, 11 Nov 2019 at 19:29, Amit Kapila <> wrote:
> >
> > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > <> wrote:
> > >
> > > After more thoughts, I think we can have a ternary value: never,
> > > always, once. If it's 'never' the index never participates in parallel
> > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > index always participates regardless of vacrelstats->num_index_scan. I
> > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > > index participates in parallel cleanup only when it's the first time
> > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > > spgist use 'once'.
> > >
> >
> > I think this 'once' option is confusing especially because it also
> > depends on 'num_index_scans' which the IndexAM has no control over.
> > It might be that the option name is not good, but I am not sure.
> > Another thing is that for brin indexes, we don't want bulkdelete to
> > participate in parallelism.
>
> I thought brin should set amcanparallelvacuum is false and
> amcanparallelcleanup is 'always'.
>

In that case, it is better to name the variable as amcanparallelbulkdelete.

> > Do we want to have separate variables for
> > ambulkdelete and amvacuumcleanup which decides whether the particular
> > phase can be done in parallel?
>
> You mean adding variables to ambulkdelete and amvacuumcleanup as
> function arguments?
>

No, I mean two separate variables: amcanparallelbulkdelete (bool) and
amcanparallelvacuumcleanup (uint16).

>
> > Another possibility could be to just
> > have one variable (say uint16 amparallelvacuum) which will tell us all
> > the options but I don't think that will be a popular approach
> > considering all the other methods and variables exposed.  What do you
> > think?
>
> Adding only one variable that can have flags would also be a good
> idea, instead of having multiple variables for each option. For
> instance FDW API uses such interface (see eflags of BeginForeignScan).
>

Yeah, maybe something like amparallelvacuumoptions.  The options can be:

VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
vacuumcleanup) can't be performed in parallel
VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
performed in parallel (hash index will set this flag)
VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
flag)
VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
gin, gist, spgist, bloom will set this flag)
VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
parallel even if bulkdelete is already performed (Indexes gin, brin,
and bloom will set this flag)

Does something like this make sense?   If we all agree on this, then I
think we can summarize the part of the discussion related to this API
and get feedback from a broader audience.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
>
> On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
> <> wrote:
> >
> > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > After more thoughts, I think we can have a ternary value: never,
> > > > always, once. If it's 'never' the index never participates in parallel
> > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > > index always participates regardless of vacrelstats->num_index_scan. I
> > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > > > index participates in parallel cleanup only when it's the first time
> > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > > > spgist use 'once'.
> > > >
> > >
> > > I think this 'once' option is confusing especially because it also
> > > depends on 'num_index_scans' which the IndexAM has no control over.
> > > It might be that the option name is not good, but I am not sure.
> > > Another thing is that for brin indexes, we don't want bulkdelete to
> > > participate in parallelism.
> >
> > I thought brin should set amcanparallelvacuum is false and
> > amcanparallelcleanup is 'always'.
> >
>
> In that case, it is better to name the variable as amcanparallelbulkdelete.
>
> > > Do we want to have separate variables for
> > > ambulkdelete and amvacuumcleanup which decides whether the particular
> > > phase can be done in parallel?
> >
> > You mean adding variables to ambulkdelete and amvacuumcleanup as
> > function arguments?
> >
>
> No, I mean separate variables amcanparallelbulkdelete (bool) and
> amcanparallelvacuumcleanup (unit16) variables.
>
> >
> > > Another possibility could be to just
> > > have one variable (say uint16 amparallelvacuum) which will tell us all
> > > the options but I don't think that will be a popular approach
> > > considering all the other methods and variables exposed.  What do you
> > > think?
> >
> > Adding only one variable that can have flags would also be a good
> > idea, instead of having multiple variables for each option. For
> > instance FDW API uses such interface (see eflags of BeginForeignScan).
> >
>
> Yeah, maybe something like amparallelvacuumoptions.  The options can be:
>
> VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> vacuumcleanup) can't be performed in parallel
> VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> performed in parallel (hash index will set this flag)

Maybe we don't want this option, because if 3 or 4 is not set then we
will not do the cleanup in parallel, right?

> VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> flag)
> VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> gin, gist, spgist, bloom will set this flag)
> VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> parallel even if bulkdelete is already performed (Indexes gin, brin,
> and bloom will set this flag)
>
> Does something like this make sense?
Yeah, something like that seems better to me.

> If we all agree on this, then I
> think we can summarize the part of the discussion related to this API
> and get feedback from a broader audience.

Makes sense.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Mahendra Singh
Дата:
On Mon, 11 Nov 2019 at 16:36, Amit Kapila <> wrote:
>
> On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <> wrote:
> >
> >
> > For small indexes also, we gained some performance by parallel vacuum.
> >
>
> Thanks for doing all these tests.  It is clear with this and previous
> tests that this patch has benefit in wide variety of cases.  However,
> we should try to see some worst cases as well.  For example, if there
> are multiple indexes on a table and only one of them is large whereas
> all others are very small say having a few 100 or 1000 rows.
>

Thanks Amit for your comments.

I did some testing along the above-suggested lines. Below is the summary.
Test case: (I created 16 indexes, but only 1 index is large; the others are very small.)
create table test(a int, b int, c int, d int, e int, f int, g int, h int);
create index i3 on test (a) where a > 2000 and a < 3000;
create index i4 on test (a) where a > 3000 and a < 4000;
create index i5 on test (a) where a > 4000 and a < 5000;
create index i6 on test (a) where a > 5000 and a < 6000;
create index i7 on test (b) where a < 1000;
create index i8 on test (c) where a < 1000;
create index i9 on test (d) where a < 1000;
create index i10 on test (d) where a < 1000;
create index i11 on test (d) where a < 1000;
create index i12 on test (d) where a < 1000;
create index i13 on test (d) where a < 1000;
create index i14 on test (d) where a < 1000;
create index i15 on test (d) where a < 1000;
create index i16 on test (d) where a < 1000;
insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
delete from test where a %2=0;

case 1: vacuum without using parallel workers.
vacuum test;
228.259 ms

case 2: vacuum with 1 parallel worker.
vacuum (parallel 1) test;
251.725 ms

case 3: vacuum with 3 parallel workers.
vacuum (parallel 3) test;
259.986 ms

From the above results, it seems that if indexes are small, then parallel vacuum is not beneficial compared to normal vacuum.

> Note: Please don't use the top-posting style to reply.  Here, we use
> inline reply.

Okay. I will reply inline.

Thanks and Regards
Mahendra Thalor

Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <> wrote:
>
> On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
> >
> > On Tue, Nov 12, 2019 at 7:43 AM Masahiko Sawada
> > <> wrote:
> > >
> > > On Mon, 11 Nov 2019 at 19:29, Amit Kapila <> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 12:26 PM Masahiko Sawada
> > > > <> wrote:
> > > > >
> > > > > After more thoughts, I think we can have a ternary value: never,
> > > > > always, once. If it's 'never' the index never participates in parallel
> > > > > cleanup. I guess hash indexes use 'never'. Next, if it's 'always' the
> > > > > index always participates regardless of vacrelstats->num_index_scan. I
> > > > > guess gin, brin and bloom use 'always'. Finally if it's 'once' the
> > > > > index participates in parallel cleanup only when it's the first time
> > > > > (that is, vacrelstats->num_index_scan == 0), I guess btree, gist and
> > > > > spgist use 'once'.
> > > > >
> > > >
> > > > I think this 'once' option is confusing especially because it also
> > > > depends on 'num_index_scans' which the IndexAM has no control over.
> > > > It might be that the option name is not good, but I am not sure.
> > > > Another thing is that for brin indexes, we don't want bulkdelete to
> > > > participate in parallelism.
> > >
> > > I thought brin should set amcanparallelvacuum is false and
> > > amcanparallelcleanup is 'always'.
> > >
> >
> > In that case, it is better to name the variable as amcanparallelbulkdelete.
> >
> > > > Do we want to have separate variables for
> > > > ambulkdelete and amvacuumcleanup which decides whether the particular
> > > > phase can be done in parallel?
> > >
> > > You mean adding variables to ambulkdelete and amvacuumcleanup as
> > > function arguments?
> > >
> >
> > No, I mean separate variables amcanparallelbulkdelete (bool) and
> > amcanparallelvacuumcleanup (unit16) variables.
> >
> > >
> > > > Another possibility could be to just
> > > > have one variable (say uint16 amparallelvacuum) which will tell us all
> > > > the options but I don't think that will be a popular approach
> > > > considering all the other methods and variables exposed.  What do you
> > > > think?
> > >
> > > Adding only one variable that can have flags would also be a good
> > > idea, instead of having multiple variables for each option. For
> > > instance FDW API uses such interface (see eflags of BeginForeignScan).
> > >
> >
> > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> >
> > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > vacuumcleanup) can't be performed in parallel
> > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > performed in parallel (hash index will set this flag)
>
> Maybe we don't want this option?  because if 3 or 4 is not set then we
> will not do the cleanup in parallel right?
>
> > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > flag)
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > gin, gist, spgist, bloom will set this flag)
> > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > and bloom will set this flag)
> >
> > Does something like this make sense?

3 and 4 confused me because 4 also looks conditional. How about having
two flags instead: one for doing parallel cleanup when it has not been
performed yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for
always doing parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)? That way,
we can have flags as follows, and the index AM chooses two flags: one from
the first two flags for bulk deletion and another from the next three
flags for cleanup.

VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
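
For illustration, here is a minimal sketch of that idea in C (the flag
names are the ones listed above; the per-AM choices in the comment follow
the earlier discussion and are only assumptions, not the patch's code):

/* Each index AM would pick one bulkdelete flag and one cleanup flag. */
#define VACUUM_OPTION_PARALLEL_NO_BULKDEL   (1 << 0)
#define VACUUM_OPTION_PARALLEL_BULKDEL      (1 << 1)
#define VACUUM_OPTION_PARALLEL_NO_CLEANUP   (1 << 2)
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP (1 << 3)
#define VACUUM_OPTION_PARALLEL_CLEANUP      (1 << 4)

/*
 * Hypothetical per-AM settings based on the discussion so far:
 *   btree: PARALLEL_BULKDEL | PARALLEL_COND_CLEANUP
 *   hash:  PARALLEL_BULKDEL | PARALLEL_NO_CLEANUP
 *   brin:  PARALLEL_NO_BULKDEL | PARALLEL_CLEANUP
 */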

> Yeah, something like that seems better to me.
>
> > If we all agree on this, then I
> > think we can summarize the part of the discussion related to this API
> > and get feedback from a broader audience.
>
> Make sense.

+1

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
>
> On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > I realized that v31-0006 patch doesn't work fine so I've attached the
> > updated version patch that also incorporated some comments I got so
> > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > test the total delay time.
> >
> While reviewing the 0002, I got one doubt related to how we are
> dividing the maintainance_work_mem
>
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + /* Compute the new maitenance_work_mem value for index vacuuming */
> + lvshared->maintenance_work_mem_worker =
> + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> maintenance_work_mem;
> +}
> Is it fair to just consider the number of indexes which use
> maintenance_work_mem?  Or we need to consider the number of worker as
> well.  My point is suppose there are 10 indexes which will use the
> maintenance_work_mem but we are launching just 2 workers then what is
> the point in dividing the maintenance_work_mem by 10.
>
> IMHO the calculation should be like this
> lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> maintenance_work_mem;
>
> Am I missing something?

No, I think you're right. On the other hand, I think that dividing it
by the number of indexes that will use maintenance_work_mem makes
sense when the parallel degree > the number of such indexes. Suppose the
table has 2 indexes and there are 10 workers; then we should divide
maintenance_work_mem by 2 rather than 10, because it's possible that at
most 2 indexes that use maintenance_work_mem are processed in
parallel at a time.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
<> wrote:
>
> On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <> wrote:
> >
> > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
> > >
> > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > >
> > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > vacuumcleanup) can't be performed in parallel
> > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > performed in parallel (hash index will set this flag)
> >
> > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > will not do the cleanup in parallel right?
> >

Yeah, but it is better to be explicit about this.

> > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > flag)
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > gin, gist, spgist, bloom will set this flag)
> > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > and bloom will set this flag)
> > >
> > > Does something like this make sense?
>
> 3 and 4 confused me because 4 also looks conditional. How about having
> two flags instead: one for doing parallel cleanup when not performed
> yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
>

Hmm, this is exactly what I intended to say with 3 and 4.  I am not sure
what makes you think 4 is conditional.

> That way, we
> can have flags as follows and index AM chooses two flags, one from the
> first two flags for bulk deletion and another from next three flags
> for cleanup.
>
> VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
>

This also looks reasonable, but if there is an index that doesn't want
to support a parallel vacuum, it needs to set multiple flags.

> > Yeah, something like that seems better to me.
> >
> > > If we all agree on this, then I
> > > think we can summarize the part of the discussion related to this API
> > > and get feedback from a broader audience.
> >
> > Make sense.
>
> +1
>

Okay, then I will write a separate email for this topic.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
<> wrote:
>
> On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> >
> > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > updated version patch that also incorporated some comments I got so
> > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > test the total delay time.
> > >
> > While reviewing the 0002, I got one doubt related to how we are
> > dividing the maintainance_work_mem
> >
> > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > +{
> > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > + lvshared->maintenance_work_mem_worker =
> > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > maintenance_work_mem;
> > +}
> > Is it fair to just consider the number of indexes which use
> > maintenance_work_mem?  Or we need to consider the number of worker as
> > well.  My point is suppose there are 10 indexes which will use the
> > maintenance_work_mem but we are launching just 2 workers then what is
> > the point in dividing the maintenance_work_mem by 10.
> >
> > IMHO the calculation should be like this
> > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > maintenance_work_mem;
> >
> > Am I missing something?
>
> No, I think you're right. On the other hand I think that dividing it
> by the number of indexes that will use the mantenance_work_mem makes
> sense when parallel degree > the number of such indexes. Suppose the
> table has 2 indexes and there are 10 workers then we should divide the
> maintenance_work_mem by 2 rather than 10 because it's possible that at
> most 2 indexes that uses the maintenance_work_mem are processed in
> parallel at a time.
>
Right, that's the reason I suggested dividing by Min(nindexes_mwm, nworkers).


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, 12 Nov 2019 at 20:11, Amit Kapila <> wrote:
>
> On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> <> wrote:
> >
> > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
> > > >
> > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > >
> > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > vacuumcleanup) can't be performed in parallel
> > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > performed in parallel (hash index will set this flag)
> > >
> > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > will not do the cleanup in parallel right?
> > >
>
> Yeah, but it is better to be explicit about this.

Isn't VACUUM_OPTION_NO_PARALLEL_BULKDEL missing? I think brin indexes
will use this flag. It will end up with
(VACUUM_OPTION_NO_PARALLEL_CLEANUP |
VACUUM_OPTION_NO_PARALLEL_BULKDEL) being equivalent to
VACUUM_OPTION_NO_PARALLEL, though.

>
> > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > flag)
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > gin, gist, spgist, bloom will set this flag)
> > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > and bloom will set this flag)
> > > >
> > > > Does something like this make sense?
> >
> > 3 and 4 confused me because 4 also looks conditional. How about having
> > two flags instead: one for doing parallel cleanup when not performed
> > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> >
>
> Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> what makes you think 4 is conditional.

Hmm, so why would gin and bloom set both flags 3 and 4? I thought that if
an index sets 4 it doesn't need to set 3, because 4 means always doing
cleanup in parallel.

>
> > That way, we
> > can have flags as follows and index AM chooses two flags, one from the
> > first two flags for bulk deletion and another from next three flags
> > for cleanup.
> >
> > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> >
>
> This also looks reasonable, but if there is an index that doesn't want
> to support a parallel vacuum, it needs to set multiple flags.

Right. It would be better to use a uint16 as two uint8s. I mean that if
the first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL, and if
the next 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_CLEANUP. The
other flags could be the following:

VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
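
To illustrate the encoding (just a sketch; the two mask macros below are
hypothetical helpers, not part of the patch):

/* low byte describes bulkdelete, high byte describes cleanup */
#define VACUUM_OPTION_PARALLEL_BULKDEL       0x0001
#define VACUUM_OPTION_PARALLEL_COND_CLEANUP  0x0100
#define VACUUM_OPTION_PARALLEL_CLEANUP       0x0200

/* a zero byte means "no parallel support" for that phase */
#define VACOPT_BULKDEL_BYTE(options)  ((uint8) ((options) & 0x00FF))
#define VACOPT_CLEANUP_BYTE(options)  ((uint8) (((options) >> 8) & 0x00FF))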

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <> wrote:
>
> On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> <> wrote:
> >
> > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> > >
> > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > updated version patch that also incorporated some comments I got so
> > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > test the total delay time.
> > > >
> > > While reviewing the 0002, I got one doubt related to how we are
> > > dividing the maintainance_work_mem
> > >
> > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > +{
> > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > + lvshared->maintenance_work_mem_worker =
> > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > maintenance_work_mem;
> > > +}
> > > Is it fair to just consider the number of indexes which use
> > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > well.  My point is suppose there are 10 indexes which will use the
> > > maintenance_work_mem but we are launching just 2 workers then what is
> > > the point in dividing the maintenance_work_mem by 10.
> > >
> > > IMHO the calculation should be like this
> > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > maintenance_work_mem;
> > >
> > > Am I missing something?
> >
> > No, I think you're right. On the other hand I think that dividing it
> > by the number of indexes that will use the mantenance_work_mem makes
> > sense when parallel degree > the number of such indexes. Suppose the
> > table has 2 indexes and there are 10 workers then we should divide the
> > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > most 2 indexes that uses the maintenance_work_mem are processed in
> > parallel at a time.
> >
> Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).

Thanks! I'll fix it in the next version patch.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
<> wrote:
>
> On Tue, 12 Nov 2019 at 20:11, Amit Kapila <> wrote:
> >
> > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > <> wrote:
> > >
> > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <> wrote:
> > > >
> > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
> > > > >
> > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > > >
> > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > vacuumcleanup) can't be performed in parallel
> > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > performed in parallel (hash index will set this flag)
> > > >
> > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > will not do the cleanup in parallel right?
> > > >
> >
> > Yeah, but it is better to be explicit about this.
>
> VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
>

I am not sure if that is required.

> I think brin indexes
> will use this flag.
>

Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
it should work.

> It will end up with
> (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> VACUUM_OPTION_NO_PARALLEL, though.
>
> >
> > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > flag)
> > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > gin, gist, spgist, bloom will set this flag)
> > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > and bloom will set this flag)
> > > > >
> > > > > Does something like this make sense?
> > >
> > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > two flags instead: one for doing parallel cleanup when not performed
> > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > >
> >
> > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > what makes you think 4 is conditional.
>
> Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> 4 it doesn't need to set 3 because 4 means always doing cleanup in
> parallel.
>

Yeah, that makes sense.  They can just set 4.

> >
> > > That way, we
> > > can have flags as follows and index AM chooses two flags, one from the
> > > first two flags for bulk deletion and another from next three flags
> > > for cleanup.
> > >
> > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > >
> >
> > This also looks reasonable, but if there is an index that doesn't want
> > to support a parallel vacuum, it needs to set multiple flags.
>
> Right. It would be better to use uint16 as two uint8. I mean that if
> first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> could be followings:
>
> VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
>

Hmm, I think we should define these flags in the simplest way.
Your previous proposal sounds okay to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Tue, 12 Nov 2019 at 22:33, Amit Kapila <> wrote:
>
> On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
> <> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <> wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
> > > > > >
> > > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > > performed in parallel (hash index will set this flag)
> > > > >
> > > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > > will not do the cleanup in parallel right?
> > > > >
> > >
> > > Yeah, but it is better to be explicit about this.
> >
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
> >
>
> I am not sure if that is required.
>
> > I think brin indexes
> > will use this flag.
> >
>
> Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
> it should work.
>
> > It will end up with
> > (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> > VACUUM_OPTION_NO_PARALLEL, though.
> >
> > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > > >
> > > > > > Does something like this make sense?
> > > >
> > > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > > two flags instead: one for doing parallel cleanup when not performed
> > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > > >
> > >
> > > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > > what makes you think 4 is conditional.
> >
> > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> > 4 it doesn't need to set 3 because 4 means always doing cleanup in
> > parallel.
> >
>
> Yeah, that makes sense.  They can just set 4.

Okay,

>
> > >
> > > > That way, we
> > > > can have flags as follows and index AM chooses two flags, one from the
> > > > first two flags for bulk deletion and another from next three flags
> > > > for cleanup.
> > > >
> > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > > >
> > >
> > > This also looks reasonable, but if there is an index that doesn't want
> > > to support a parallel vacuum, it needs to set multiple flags.
> >
> > Right. It would be better to use uint16 as two uint8. I mean that if
> > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> > could be followings:
> >
> > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
> >
>
> Hmm, I think we should define these flags in the most simple way.
> Your previous proposal sounds okay to me.

Okay. As you mentioned before, my previous proposal won't work for
existing index AMs that don't set amparallelvacuumoptions. But since we
have amcanparallelvacuum, which is false by default, I think we don't
need to worry about a backward compatibility problem. Existing index
AMs will use neither parallel bulk-deletion nor parallel cleanup by
default. When an index AM wants to support parallel vacuum, it will set
amparallelvacuumoptions as well as amcanparallelvacuum.
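
Just to illustrate the intended usage, a rough sketch of an index AM
opting in (amcanparallelvacuum and amparallelvacuumoptions are the fields
being discussed here, not existing IndexAmRoutine members, and the flags
chosen for btree are an assumption):

Datum
bthandler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    /* ... other amroutine fields ... */

    /* opt in to parallel vacuum and say which phases can run in workers */
    amroutine->amcanparallelvacuum = true;
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;

    PG_RETURN_POINTER(amroutine);
}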

I'll try to use my previous proposal and check it. If something goes wrong
we can go back to your proposal or another one.


--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada
<> wrote:
>
> On Tue, 12 Nov 2019 at 22:33, Amit Kapila <> wrote:
> >
> >
> > Hmm, I think we should define these flags in the most simple way.
> > Your previous proposal sounds okay to me.
>
> Okay. As you mentioned before, my previous proposal won't work for
> existing index AMs that don't set amparallelvacuumoptions.
>

You mean it won't work because the index AM has to set multiple flags,
which means that if the IndexAM author doesn't set the value of
amparallelvacuumoptions then it won't work?

> But since we
> have amcanparallelvacuum which is false by default I think we don't
> need to worry about backward compatibility problem. The existing index
> AM will use neither parallel bulk-deletion nor parallel cleanup by
> default. When it wants to support parallel vacuum they will set
> amparallelvacuumoptions as well as amcanparallelvacuum.
>

Hmm, I was not thinking of multiple variables, rather only one
variable. The default value should indicate that the IndexAM doesn't
support a parallel vacuum.  It might be that we need to do it the way
I originally proposed, with the different values of
amparallelvacuumoptions, or maybe some variant of it where the default
value can clearly say that the IndexAM doesn't support a parallel vacuum.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Tue, Nov 12, 2019 at 7:03 PM Amit Kapila <> wrote:
>
> On Tue, Nov 12, 2019 at 5:30 PM Masahiko Sawada
> <> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:11, Amit Kapila <> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 3:39 PM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 18:26, Dilip Kumar <> wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 2:25 PM Amit Kapila <> wrote:
> > > > > >
> > > > > > Yeah, maybe something like amparallelvacuumoptions.  The options can be:
> > > > > >
> > > > > > VACUUM_OPTION_NO_PARALLEL   0 # vacuum (neither bulkdelete nor
> > > > > > vacuumcleanup) can't be performed in parallel
> > > > > > VACUUM_OPTION_NO_PARALLEL_CLEANUP  1 # vacuumcleanup cannot be
> > > > > > performed in parallel (hash index will set this flag)
> > > > >
> > > > > Maybe we don't want this option?  because if 3 or 4 is not set then we
> > > > > will not do the cleanup in parallel right?
> > > > >
> > >
> > > Yeah, but it is better to be explicit about this.
> >
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL is missing?
> >
>
> I am not sure if that is required.
>
> > I think brin indexes
> > will use this flag.
> >
>
> Brin index can set VACUUM_OPTION_PARALLEL_CLEANUP in my proposal and
> it should work.

IIUC, VACUUM_OPTION_PARALLEL_CLEANUP means no parallel bulk delete and
always parallel cleanup?  I am not sure whether this is the best way,
because for the cleanup option we are being explicit about each case,
i.e. PARALLEL_CLEANUP, NO_PARALLEL_CLEANUP, etc., so why not do the same
for the bulk delete?  I mean, why don't we keep both PARALLEL_BULKDEL
and NO_PARALLEL_BULKDEL?

>
> > It will end up with
> > (VACUUM_OPTION_NO_PARALLEL_CLEANUP |
> > VACUUM_OPTION_NO_PARALLEL_BULKDEL) is equivalent to
> > VACUUM_OPTION_NO_PARALLEL, though.
> >
> > >
> > > > > > VACUUM_OPTION_PARALLEL_BULKDEL   2 # bulkdelete can be done in
> > > > > > parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> > > > > > flag)
> > > > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP  3 # vacuumcleanup can be done in
> > > > > > parallel if bulkdelete is not performed (Indexes nbtree, brin, hash,
> > > > > > gin, gist, spgist, bloom will set this flag)
> > > > > > VACUUM_OPTION_PARALLEL_CLEANUP  4 # vacuumcleanup can be done in
> > > > > > parallel even if bulkdelete is already performed (Indexes gin, brin,
> > > > > > and bloom will set this flag)
> > > > > >
> > > > > > Does something like this make sense?
> > > >
> > > > 3 and 4 confused me because 4 also looks conditional. How about having
> > > > two flags instead: one for doing parallel cleanup when not performed
> > > > yet (VACUUM_OPTION_PARALLEL_COND_CLEANUP) and another one for doing
> > > > always parallel cleanup (VACUUM_OPTION_PARALLEL_CLEANUP)?
> > > >
> > >
> > > Hmm, this is exactly what I intend to say with 3 and 4.  I am not sure
> > > what makes you think 4 is conditional.
> >
> > Hmm so why gin and bloom will set 3 and 4 flags? I thought if it sets
> > 4 it doesn't need to set 3 because 4 means always doing cleanup in
> > parallel.
> >
>
> Yeah, that makes sense.  They can just set 4.
>
> > >
> > > > That way, we
> > > > can have flags as follows and index AM chooses two flags, one from the
> > > > first two flags for bulk deletion and another from next three flags
> > > > for cleanup.
> > > >
> > > > VACUUM_OPTION_PARALLEL_NO_BULKDEL 1 << 0
> > > > VACUUM_OPTION_PARALLEL_BULKDEL 1 << 1
> > > > VACUUM_OPTION_PARALLEL_NO_CLEANUP 1 << 2
> > > > VACUUM_OPTION_PARALLEL_COND_CLEANUP 1 << 3
> > > > VACUUM_OPTION_PARALLEL_CLEANUP 1 << 4
> > > >
> > >
> > > This also looks reasonable, but if there is an index that doesn't want
> > > to support a parallel vacuum, it needs to set multiple flags.
> >
> > Right. It would be better to use uint16 as two uint8. I mean that if
> > first 8 bits are 0 it means VACUUM_OPTION_PARALLEL_NO_BULKDEL and if
> > next 8 bits are 0 means VACUUM_OPTION_PARALLEL_NO_CLEANUP. Other flags
> > could be followings:
> >
> > VACUUM_OPTION_PARALLEL_BULKDEL 0x0001
> > VACUUM_OPTION_PARALLEL_COND_CLEANUP 0x0100
> > VACUUM_OPTION_PARALLEL_CLEANUP 0x0200
> >
>
> Hmm, I think we should define these flags in the most simple way.
> Your previous proposal sounds okay to me.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Wed, 13 Nov 2019 at 11:38, Amit Kapila <> wrote:
>
> On Wed, Nov 13, 2019 at 6:53 AM Masahiko Sawada
> <> wrote:
> >
> > On Tue, 12 Nov 2019 at 22:33, Amit Kapila <> wrote:
> > >
> > >
> > > Hmm, I think we should define these flags in the most simple way.
> > > Your previous proposal sounds okay to me.
> >
> > Okay. As you mentioned before, my previous proposal won't work for
> > existing index AMs that don't set amparallelvacuumoptions.
> >
>
> You mean to say it won't work because it has to set multiple flags
> which means that if IndexAm user doesn't set the value of
> amparallelvacuumoptions then it won't work?

Yes. In my previous proposal every index AM needs to set two flags.

>
> > But since we
> > have amcanparallelvacuum which is false by default I think we don't
> > need to worry about backward compatibility problem. The existing index
> > AM will use neither parallel bulk-deletion nor parallel cleanup by
> > default. When it wants to support parallel vacuum they will set
> > amparallelvacuumoptions as well as amcanparallelvacuum.
> >
>
> Hmm, I was not thinking of multiple variables rather only one
> variable. The default value should indicate that IndexAm doesn't
> support a parallel vacuum.

Yes.

> It might be that we need to do it the way
> I originally proposed the different values of amparallelvacuumoptions
> or maybe some variant of it where the default value can clearly say
> that IndexAm doesn't support a parallel vacuum.

Okay. After more thought on your original proposal, what confuses me
about it is that there are two types of flags: ones that enable options
and one that disables an option. Looking at 2, 3 and 4, it looks like all
options are disabled by default and setting these flags enables them. On
the other hand, looking at 1, it looks like that option is enabled by
default and setting the flag disables it. 0 makes sense to me. So how
about having 0, 2, 3 and 4?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
<> wrote:
>
> On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <> wrote:
> >
> > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > <> wrote:
> > >
> > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> > > >
> > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > updated version patch that also incorporated some comments I got so
> > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > test the total delay time.
> > > > >
> > > > While reviewing the 0002, I got one doubt related to how we are
> > > > dividing the maintainance_work_mem
> > > >
> > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > +{
> > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > + lvshared->maintenance_work_mem_worker =
> > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > maintenance_work_mem;
> > > > +}
> > > > Is it fair to just consider the number of indexes which use
> > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > well.  My point is suppose there are 10 indexes which will use the
> > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > the point in dividing the maintenance_work_mem by 10.
> > > >
> > > > IMHO the calculation should be like this
> > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > maintenance_work_mem;
> > > >
> > > > Am I missing something?
> > >
> > > No, I think you're right. On the other hand I think that dividing it
> > > by the number of indexes that will use the mantenance_work_mem makes
> > > sense when parallel degree > the number of such indexes. Suppose the
> > > table has 2 indexes and there are 10 workers then we should divide the
> > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > parallel at a time.
> > >
> > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
>
> Thanks! I'll fix it in the next version patch.
>
One more comment.

+lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps)
+{
+ ....

+ if (ParallelVacuumIsActive(lps))
+ {

+
+ lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+ stats, lps);
+
+ }
+
+ for (idx = 0; idx < nindexes; idx++)
+ {
+ /*
+ * Skip indexes that we have already vacuumed during parallel index
+ * vacuuming.
+ */
+ if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
+ continue;
+
+ lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
+   vacrelstats->old_live_tuples);
+ }
+}

In this function, if ParallelVacuumIsActive, we perform the parallel
vacuum for all the indexes for which parallel vacuum is supported, and
once that is over we finish vacuuming the remaining indexes for which
parallel vacuum is not supported.  But my question is: inside
lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
to finish their job and only then start the sequential vacuuming.
Shouldn't we start that immediately, as soon as the leader's
participation in the parallel vacuum is over?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Nov 13, 2019 at 8:34 AM Masahiko Sawada
<> wrote:
>
> On Wed, 13 Nov 2019 at 11:38, Amit Kapila <> wrote:
> >
>
> > It might be that we need to do it the way
> > I originally proposed the different values of amparallelvacuumoptions
> > or maybe some variant of it where the default value can clearly say
> > that IndexAm doesn't support a parallel vacuum.
>
> Okay. After more thoughts on your original proposal, what I get
> confused on your proposal is that there are two types of flags that
> enable and disable options. Looking at 2, 3 and 4, it looks like all
> options are disabled by default and setting these flags means to
> enable them. On the other hand looking at 1, it looks like these
> options are enabled by default and setting the flag means to disable
> it. 0 makes sense to me. So how about having 0, 2, 3 and 4?
>

Yeah, 0, 2, 3 and 4 sound reasonable to me.  Earlier, Dilip also got
confused by option 1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Dilip Kumar
Дата:
On Wed, Nov 13, 2019 at 9:12 AM Dilip Kumar <> wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> <> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > updated version patch that also incorporated some comments I got so
> > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > test the total delay time.
> > > > > >
> > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > dividing the maintainance_work_mem
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > +{
> > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes which use
> > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > the point in dividing the maintenance_work_mem by 10.
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand I think that dividing it
> > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > parallel at a time.
> > > >
> > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps)
> +{
> + ....
>
> + if (ParallelVacuumIsActive(lps))
> + {
>
> +
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
> +
> + }
> +
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + /*
> + * Skip indexes that we have already vacuumed during parallel index
> + * vacuuming.
> + */
> + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> + continue;
> +
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the index for which parallel vacuum is supported and
> once that is over we finish vacuuming remaining indexes for which
> parallel vacuum is not supported.  But, my question is that inside
> lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> to finish their job then only we start with the sequential vacuuming
> shouldn't we start that immediately as soon as the leader
> participation is over in the parallel vacuum?
>

+ /*
+ * Since parallel workers cannot access data in temporary tables, parallel
+ * vacuum is not allowed for temporary relation.
+ */
+ if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
+ {
+ ereport(WARNING,
+ (errmsg("skipping vacuum on \"%s\" --- cannot vacuum temporary
tables in parallel",
+ RelationGetRelationName(onerel))));
+ relation_close(onerel, lmode);
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ /* It's OK to proceed with ANALYZE on this table */
+ return true;
+ }
+

If we cannot support parallel vacuum for a temporary table, shouldn't
we fall back to a normal vacuum instead of skipping the table?  I think
it's not fair that if the user has asked for a system-wide parallel
vacuum then all the temporary tables are skipped and not vacuumed at
all, and the user then needs to perform a normal vacuum on those
tables again.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Amit Kapila
Дата:
On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <> wrote:
>
> Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
> confused with option 1.
>

Let me try to summarize the discussion on this point and see if others
have any opinion on this matter.

We need a way to allow an IndexAM to specify whether it can participate
in a parallel vacuum.  As we know, there are two phases of
index vacuum, bulkdelete and vacuumcleanup, and in many cases
bulkdelete performs the main deletion work and vacuumcleanup then just
returns index statistics. So, for such cases, we don't want the second
phase to be performed by a parallel vacuum worker.  Now, if the
bulkdelete phase is not performed, then vacuumcleanup can process the
entire index, in which case it is better to do that phase via a parallel
worker.

OTOH, in some cases vacuumcleanup takes another pass over the index to
reclaim empty pages and record the same in the FSM even if
bulkdelete has been performed.  This happens in gin and bloom indexes.
Then, we have indexes where we do all the work in the cleanup phase, as
in the case of brin indexes.  Now, for this category of indexes, we
want the vacuumcleanup phase to also be performed by a parallel worker.

In short, different indexes have different requirements as to which phase
of index vacuum can be performed in parallel.  Just to be clear, we
can't perform both phases (bulkdelete and cleanup) in one go, as
bulkdelete can happen multiple times on a large index whereas
vacuumcleanup is done once at the end.

Based on these needs, we came up with a way to allow users to specify
this information for IndexAMs. Basically, an IndexAM will expose a
variable amparallelvacuumoptions which can have the options below:

VACUUM_OPTION_NO_PARALLEL  1 << 0  # vacuum (neither bulkdelete nor
  vacuumcleanup) can't be performed in parallel
VACUUM_OPTION_PARALLEL_BULKDEL  1 << 1  # bulkdelete can be done in
  parallel (indexes nbtree, hash, gin, gist, spgist and bloom will set
  this flag)
VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2  # vacuumcleanup can be done
  in parallel if bulkdelete is not performed (indexes nbtree, brin, gin,
  gist, spgist and bloom will set this flag)
VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3  # vacuumcleanup can be done in
  parallel even if bulkdelete is already performed (indexes gin, brin
  and bloom will set this flag)
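
As a rough illustration of how the vacuum code could consult this
variable (a sketch only; the function below and its arguments are
assumptions made for this example, not the patch's actual code, and the
variable is assumed to be a uint16 as suggested earlier):

/* Should a parallel worker handle this index in the current phase? */
static bool
index_can_participate(uint16 vacoptions, bool bulkdel_phase,
                      bool bulkdel_already_done)
{
    if (bulkdel_phase)
        return (vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0;

    /* cleanup phase */
    if (vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP)
        return true;        /* always parallel (e.g. gin, brin, bloom) */
    if (vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP)
        return !bulkdel_already_done;   /* only if bulkdelete was skipped */

    return false;
}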

We have discussed exposing this information via two variables, but the
above seems like a better idea to all the people involved.

Any suggestions?  Does anyone think this is not the right way to expose
this information, that there is no need to expose it at all, or have a
better idea for this?

Sawada-San, Dilip, feel free to correct me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

От
Masahiko Sawada
Дата:
On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <> wrote:
>
> On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> <> wrote:
> >
> > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> > > > >
> > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > updated version patch that also incorporated some comments I got so
> > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > test the total delay time.
> > > > > >
> > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > dividing the maintainance_work_mem
> > > > >
> > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > +{
> > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > + lvshared->maintenance_work_mem_worker =
> > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > maintenance_work_mem;
> > > > > +}
> > > > > Is it fair to just consider the number of indexes which use
> > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > the point in dividing the maintenance_work_mem by 10.
> > > > >
> > > > > IMHO the calculation should be like this
> > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > maintenance_work_mem;
> > > > >
> > > > > Am I missing something?
> > > >
> > > > No, I think you're right. On the other hand I think that dividing it
> > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > parallel at a time.
> > > >
> > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> >
> > Thanks! I'll fix it in the next version patch.
> >
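
To be concrete, the computation would then look something like this (a
sketch only, reusing the field and variable names from the hunk quoted
above):

    /*
     * Divide maintenance_work_mem only among the workers that can actually
     * consume it at the same time: at most Min(nindexes_mwm, nworkers)
     * such indexes are vacuumed concurrently.
     */
    lvshared->maintenance_work_mem_worker =
        (nindexes_mwm > 0)
        ? maintenance_work_mem / Min(nindexes_mwm, nworkers)
        : maintenance_work_mem;
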
> One more comment.
>
> +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> + int nindexes, IndexBulkDeleteResult **stats,
> + LVParallelState *lps)
> +{
> + ....
>
> + if (ParallelVacuumIsActive(lps))
> + {
>
> +
> + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> + stats, lps);
> +
> + }
> +
> + for (idx = 0; idx < nindexes; idx++)
> + {
> + /*
> + * Skip indexes that we have already vacuumed during parallel index
> + * vacuuming.
> + */
> + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> + continue;
> +
> + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> +   vacrelstats->old_live_tuples);
> + }
> +}
>
> In this function, if ParallelVacuumIsActive, we perform the parallel
> vacuum for all the index for which parallel vacuum is supported and
> once that is over we finish vacuuming remaining indexes for which
> parallel vacuum is not supported.  But, my question is that inside
> lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> to finish their job then only we start with the sequential vacuuming
> shouldn't we start that immediately as soon as the leader
> participation is over in the parallel vacuum?

If we do that, then while the leader process is sequentially vacuuming the
indexes that don't support parallel vacuum, some workers might still be
vacuuming other indexes. Isn't that a problem? If it's not a problem, I
think we can tie the indexes that don't support parallel vacuum to the
leader and do the parallel index vacuum.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Tue, Nov 12, 2019 at 3:14 PM Mahendra Singh <> wrote:
>
> On Mon, 11 Nov 2019 at 16:36, Amit Kapila <> wrote:
> >
> > On Mon, Nov 11, 2019 at 2:53 PM Mahendra Singh <> wrote:
> > >
> > >
> > > For small indexes also, we gained some performance by parallel vacuum.
> > >
> >
> > Thanks for doing all these tests.  It is clear with this and previous
> > tests that this patch has benefit in wide variety of cases.  However,
> > we should try to see some worst cases as well.  For example, if there
> > are multiple indexes on a table and only one of them is large whereas
> > all others are very small say having a few 100 or 1000 rows.
> >
>
> Thanks Amit for your comments.
>
> I did some testing on the above suggested lines. Below is the summary:
> Test case:(I created 16 indexes but only 1 index is large, other are very small)
> create table test(a int, b int, c int, d int, e int, f int, g int, h int);
> create index i3 on test (a) where a > 2000 and a < 3000;
> create index i4 on test (a) where a > 3000 and a < 4000;
> create index i5 on test (a) where a > 4000 and a < 5000;
> create index i6 on test (a) where a > 5000 and a < 6000;
> create index i7 on test (b) where a < 1000;
> create index i8 on test (c) where a < 1000;
> create index i9 on test (d) where a < 1000;
> create index i10 on test (d) where a < 1000;
> create index i11 on test (d) where a < 1000;
> create index i12 on test (d) where a < 1000;
> create index i13 on test (d) where a < 1000;
> create index i14 on test (d) where a < 1000;
> create index i15 on test (d) where a < 1000;
> create index i16 on test (d) where a < 1000;
> insert into test select i,i,i,i,i,i,i,i from generate_series(1,1000000) as i;
> delete from test where a %2=0;
>
> case 1: vacuum without using parallel workers.
> vacuum test;
> 228.259 ms
>
> case 2: vacuum with 1 parallel worker.
> vacuum (parallel 1) test;
> 251.725 ms
>
> case 3: vacuum with 3 parallel workers.
> vacuum (parallel 3) test;
> 259.986
>
> From above results, it seems that if indexes are small, then parallel
> vacuum is not beneficial as compared to normal vacuum.
>

Right, and that is what is expected as well.  However, I think it will be
better if we somehow disallow very small indexes from using a parallel
worker.  Can we use min_parallel_index_scan_size to decide whether a
particular index can participate in a parallel vacuum?
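
Something along these lines might be enough, assuming the check happens
where we decide whether an index is handed to a worker (a rough sketch;
the function name is made up, but min_parallel_index_scan_size and
RelationGetNumberOfBlocks() are the existing GUC and macro):

#include "postgres.h"
#include "optimizer/paths.h"    /* min_parallel_index_scan_size */
#include "storage/bufmgr.h"     /* RelationGetNumberOfBlocks() */
#include "utils/rel.h"

/*
 * Sketch: an index participates in parallel vacuum only if it is at least
 * min_parallel_index_scan_size blocks; smaller indexes would be vacuumed
 * by the leader itself.
 */
static bool
index_can_use_parallel_vacuum(Relation indrel)
{
    return RelationGetNumberOfBlocks(indrel) >=
        (BlockNumber) min_parallel_index_scan_size;
}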


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 13, 2019 at 11:01 AM Amit Kapila <> wrote:
>
> On Wed, Nov 13, 2019 at 9:48 AM Amit Kapila <> wrote:
> >
> > Yeah, 0,2,3 and 4 sounds reasonable to me.  Earlier, Dilip also got
> > confused with option 1.
> >
>
> Let me try to summarize the discussion on this point and see if others
> have any opinion on this matter.
>
> We need a way to allow IndexAm to specify whether it can participate
> in a parallel vacuum.  As we know there are two phases of
> index-vacuum, bulkdelete and vacuumcleanup and in many cases, the
> bulkdelete performs the main deletion work and then vacuumcleanup just
> returns index statistics. So, for such cases, we don't want the second
> phase to be performed by a parallel vacuum worker.  Now, if the
> bulkdelete phase is not performed, then vacuumcleanup can process the
> entire index in which case it is better to do that phase via parallel
> worker.
>
> OTOH, in some cases vacuumcleanup takes another pass over-index to
> reclaim empty pages and update record the same in FSM even if
> bulkdelete is performed.  This happens in gin and bloom indexes.
> Then, we have an index where we do all the work in cleanup phase like
> in the case of brin indexes.  Now, for this category of indexes, we
> want vacuumcleanup phase to be also performed by a parallel worker.
>
> In short different indexes have different requirements for which phase
> of index vacuum can be performed in parallel.  Just to be clear, we
> can't perform both the phases (bulkdelete and cleanup) in one-go as
> bulk-delete can happen multiple times on a large index whereas
> vacuumcleanup is done once at the end.
>
> Based on these needs, we came up with a way to allow users to specify
> this information for IndexAm's. Basically, Indexam will expose a
> variable amparallelvacuumoptions which can have below options
>
> VACUUM_OPTION_NO_PARALLEL   1 << 0 # vacuum (neither bulkdelete nor
> vacuumcleanup) can't be performed in parallel
> VACUUM_OPTION_PARALLEL_BULKDEL   1 << 1 # bulkdelete can be done in
> parallel (Indexes nbtree, hash, gin, gist, spgist, bloom will set this
> flag)
> VACUUM_OPTION_PARALLEL_COND_CLEANUP  1 << 2 # vacuumcleanup can be
> done in parallel if bulkdelete is not performed (Indexes nbtree, brin,
> gin, gist,
> spgist, bloom will set this flag)
> VACUUM_OPTION_PARALLEL_CLEANUP  1 << 3 # vacuumcleanup can be done in
> parallel even if bulkdelete is already performed (Indexes gin, brin,
> and bloom will set this flag)
>
> We have discussed to expose this information via two variables but the
> above seems like a better idea to all the people involved.
>
> Any suggestions?  Anyone thinks this is not the right way to expose
> this information or there is no need to expose this information or
> they have a better idea for this?
>
> Sawada-San, Dilip, feel free to correct me.
Looks fine to me.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
<> wrote:
>
> On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <> wrote:
> >
> >
> > In this function, if ParallelVacuumIsActive, we perform the parallel
> > vacuum for all the index for which parallel vacuum is supported and
> > once that is over we finish vacuuming remaining indexes for which
> > parallel vacuum is not supported.  But, my question is that inside
> > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > to finish their job then only we start with the sequential vacuuming
> > shouldn't we start that immediately as soon as the leader
> > participation is over in the parallel vacuum?
>
> If we do that, while the leader process is vacuuming indexes that
> don't not support parallel vacuum sequentially some workers might be
> vacuuming for other indexes. Isn't it a problem?
>

Can you please explain what problem you see with that?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Dilip Kumar
Date:
On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
<> wrote:
>
> On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <> wrote:
> >
> > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> > <> wrote:
> > >
> > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <> wrote:
> > > >
> > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > > <> wrote:
> > > > >
> > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> > > > > >
> > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > > updated version patch that also incorporated some comments I got so
> > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > > test the total delay time.
> > > > > > >
> > > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > > dividing the maintainance_work_mem
> > > > > >
> > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > > +{
> > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > > + lvshared->maintenance_work_mem_worker =
> > > > > > + (nindexes_mwm > 0) ? maintenance_work_mem / nindexes_mwm :
> > > > > > maintenance_work_mem;
> > > > > > +}
> > > > > > Is it fair to just consider the number of indexes which use
> > > > > > maintenance_work_mem?  Or we need to consider the number of worker as
> > > > > > well.  My point is suppose there are 10 indexes which will use the
> > > > > > maintenance_work_mem but we are launching just 2 workers then what is
> > > > > > the point in dividing the maintenance_work_mem by 10.
> > > > > >
> > > > > > IMHO the calculation should be like this
> > > > > > lvshared->maintenance_work_mem_worker = (nindexes_mwm > 0) ?
> > > > > > maintenance_work_mem / Min(nindexes_mwm, nworkers)  :
> > > > > > maintenance_work_mem;
> > > > > >
> > > > > > Am I missing something?
> > > > >
> > > > > No, I think you're right. On the other hand I think that dividing it
> > > > > by the number of indexes that will use the mantenance_work_mem makes
> > > > > sense when parallel degree > the number of such indexes. Suppose the
> > > > > table has 2 indexes and there are 10 workers then we should divide the
> > > > > maintenance_work_mem by 2 rather than 10 because it's possible that at
> > > > > most 2 indexes that uses the maintenance_work_mem are processed in
> > > > > parallel at a time.
> > > > >
> > > > Right, thats the reason I suggested divide with Min(nindexes_mwm, nworkers).
> > >
> > > Thanks! I'll fix it in the next version patch.
> > >
> > One more comment.
> >
> > +lazy_vacuum_indexes(LVRelStats *vacrelstats, Relation *Irel,
> > + int nindexes, IndexBulkDeleteResult **stats,
> > + LVParallelState *lps)
> > +{
> > + ....
> >
> > + if (ParallelVacuumIsActive(lps))
> > + {
> >
> > +
> > + lazy_parallel_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
> > + stats, lps);
> > +
> > + }
> > +
> > + for (idx = 0; idx < nindexes; idx++)
> > + {
> > + /*
> > + * Skip indexes that we have already vacuumed during parallel index
> > + * vacuuming.
> > + */
> > + if (ParallelVacuumIsActive(lps) && !IndStatsIsNull(lps->lvshared, idx))
> > + continue;
> > +
> > + lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
> > +   vacrelstats->old_live_tuples);
> > + }
> > +}
> >
> > In this function, if ParallelVacuumIsActive, we perform the parallel
> > vacuum for all the index for which parallel vacuum is supported and
> > once that is over we finish vacuuming remaining indexes for which
> > parallel vacuum is not supported.  But, my question is that inside
> > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > to finish their job then only we start with the sequential vacuuming
> > shouldn't we start that immediately as soon as the leader
> > participation is over in the parallel vacuum?
>
> If we do that, while the leader process is vacuuming indexes that
> don't not support parallel vacuum sequentially some workers might be
> vacuuming for other indexes. Isn't it a problem?

I am not sure what could be the problem.

> If it's not problem,
> I think we can tie up indexes that don't support parallel vacuum to
> the leader and do parallel index vacuum.

I am not sure whether we can do that or not, because if we do a
parallel vacuum from the leader for the indexes which don't support the
parallel option, then we will unnecessarily allocate shared memory for
those indexes (index stats).  Moreover, I think it could also cause a
problem in a multi-pass vacuum if we try to copy their stats into the
shared memory.

I think a simple option would be that as soon as the leader's
participation is over, we can have a loop over all the indexes that don't
support parallelism in that phase, and after completing that we wait for
the parallel workers to finish.
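
In other words, the ordering could look roughly like this (a sketch only;
lazy_vacuum_index(), IndStatsIsNull() and ParallelVacuumIsActive() are
from the hunks quoted above, and I'm assuming the parallel context is
reachable as lps->pcxt):

if (ParallelVacuumIsActive(lps))
{
    /* leader first does its share of the parallel index vacuum ... */

    /*
     * ... then, before waiting, it vacuums the indexes whose shared stats
     * slot is unset, i.e. the ones not handled by the parallel workers ...
     */
    for (idx = 0; idx < nindexes; idx++)
    {
        if (IndStatsIsNull(lps->lvshared, idx))
            lazy_vacuum_index(Irel[idx], &stats[idx],
                              vacrelstats->dead_tuples,
                              vacrelstats->old_live_tuples);
    }

    /* ... and only then waits for the remaining parallel workers. */
    WaitForParallelWorkersToFinish(lps->pcxt);
}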

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 17:57, Amit Kapila <> wrote:
>
> On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
> <> wrote:
> >
> > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <> wrote:
> > >
> > >
> > > In this function, if ParallelVacuumIsActive, we perform the parallel
> > > vacuum for all the index for which parallel vacuum is supported and
> > > once that is over we finish vacuuming remaining indexes for which
> > > parallel vacuum is not supported.  But, my question is that inside
> > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > > to finish their job then only we start with the sequential vacuuming
> > > shouldn't we start that immediately as soon as the leader
> > > participation is over in the parallel vacuum?
> >
> > If we do that, while the leader process is vacuuming indexes that
> > don't not support parallel vacuum sequentially some workers might be
> > vacuuming for other indexes. Isn't it a problem?
> >
>
> Can you please explain what problem do you see with that?

I think it depends on the index AM user's expectation. If disabling
parallel vacuum for an index just means that the index AM user doesn't
want the index to be vacuumed by a parallel worker, it's not a problem.
But if it means that the user doesn't want the index to be vacuumed while
other indexes are being processed in parallel, it's unexpected behaviour
for the user. I'm probably worrying too much.

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

From
Amit Kapila
Date:
On Wed, Nov 13, 2019 at 3:55 PM Masahiko Sawada
<> wrote:
>
> On Wed, 13 Nov 2019 at 17:57, Amit Kapila <> wrote:
> >
> > On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
> > <> wrote:
> > >
> > > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <> wrote:
> > > >
> > > >
> > > > In this function, if ParallelVacuumIsActive, we perform the parallel
> > > > vacuum for all the index for which parallel vacuum is supported and
> > > > once that is over we finish vacuuming remaining indexes for which
> > > > parallel vacuum is not supported.  But, my question is that inside
> > > > lazy_parallel_vacuum_or_cleanup_indexes, we wait for all the workers
> > > > to finish their job then only we start with the sequential vacuuming
> > > > shouldn't we start that immediately as soon as the leader
> > > > participation is over in the parallel vacuum?
> > >
> > > If we do that, while the leader process is vacuuming indexes that
> > > don't not support parallel vacuum sequentially some workers might be
> > > vacuuming for other indexes. Isn't it a problem?
> > >
> >
> > Can you please explain what problem do you see with that?
>
> I think it depends on index AM user expectation. If disabling parallel
> vacuum for an index means that index AM user doesn't just want to
> vacuum the index by parallel worker, it's not problem. But if it means
> that the user doesn't want to vacuum the index during other indexes is
>  being processed in parallel it's unexpected behaviour for the user.
>

I would expect the former.

> I'm probably worrying too much.
>

Yeah, we can keep the behavior according to your first expectation (if
disabling parallel vacuum for an index just means that the index AM user
doesn't want the index to be vacuumed by a parallel worker, it's not a
problem).  It might not be difficult to change this later if there is an
example of such a case.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

From
Masahiko Sawada
Date:
On Wed, 13 Nov 2019 at 18:49, Dilip Kumar <> wrote:
>
> On Wed, Nov 13, 2019 at 11:39 AM Masahiko Sawada
> <> wrote:
> >
> > On Wed, 13 Nov 2019 at 12:43, Dilip Kumar <> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 5:31 PM Masahiko Sawada
> > > <> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 at 20:29, Dilip Kumar <> wrote:
> > > > >
> > > > > On Tue, Nov 12, 2019 at 4:04 PM Masahiko Sawada
> > > > > <> wrote:
> > > > > >
> > > > > > On Mon, 11 Nov 2019 at 17:57, Dilip Kumar <> wrote:
> > > > > > >
> > > > > > > On Tue, Oct 29, 2019 at 12:37 PM Masahiko Sawada <> wrote:
> > > > > > > > I realized that v31-0006 patch doesn't work fine so I've attached the
> > > > > > > > updated version patch that also incorporated some comments I got so
> > > > > > > > far. Sorry for the inconvenience. I'll apply your 0001 patch and also
> > > > > > > > test the total delay time.
> > > > > > > >
> > > > > > > While reviewing the 0002, I got one doubt related to how we are
> > > > > > > dividing the maintainance_work_mem
> > > > > > >
> > > > > > > +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> > > > > > > +{
> > > > > > > + /* Compute the new maitenance_work_mem value for index vacuuming */
> > > > > > > + lvshared-&