Discussion: maintenance_work_mem used by Vacuum
As per the docs [1] (see maintenance_work_mem), the maximum amount of memory used by the VACUUM command should be no more than maintenance_work_mem. However, during the review/discussion of the "parallel vacuum" patch [2], we observed that this is not true. Basically, if there is a GIN index defined on a table, then a vacuum of that table can consume up to 2 * maintenance_work_mem of memory: the vacuum can use maintenance_work_mem to keep track of dead tuples and another maintenance_work_mem to move tuples from pending pages into the regular GIN structure (see ginInsertCleanup). This behavior of a GIN index consuming extra maintenance_work_mem was introduced by commit e2c79e14d998cd31f860854bc9210b37b457bb01. It is not clear to me whether this is acceptable behavior, and if so, shouldn't we document it?
--
We also wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the GIN index, or should all workers together use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what the right behavior for parallel vacuum is, I thought it would be better to first discuss whether the current memory usage model is right.
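As a back-of-envelope sketch of the 2 * maintenance_work_mem behavior described above (illustrative Python, not PostgreSQL code; the function name is invented): the dead-tuple storage and ginInsertCleanup() each budget up to maintenance_work_mem independently, so both can be live at once.

```python
# Illustrative sketch of the worst-case peak memory of a VACUUM on a table
# with a GIN index, as described in the mail above.  The dead-tuple array
# and ginInsertCleanup() each budget up to maintenance_work_mem on their
# own, so the peak is roughly twice the GUC value when a GIN index exists.

def vacuum_peak_memory(maintenance_work_mem_mb, has_gin_index):
    dead_tuple_budget = maintenance_work_mem_mb  # lazy vacuum's dead-tuple storage
    # ginInsertCleanup() flushes the pending list using its own full budget
    pending_list_budget = maintenance_work_mem_mb if has_gin_index else 0
    return dead_tuple_budget + pending_list_budget

# With the default maintenance_work_mem = 64MB:
assert vacuum_peak_memory(64, has_gin_index=False) == 64
assert vacuum_peak_memory(64, has_gin_index=True) == 128  # 2 * maintenance_work_mem
```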
On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> As per the docs [1] (see maintenance_work_mem), the maximum amount of memory used by the Vacuum command should be no more than maintenance_work_mem. However, during the review/discussion of the "parallel vacuum" patch [2], we observed that it is not true. Basically, if there is a gin index defined on a table, then the vacuum on that table can consume up to 2 * maintenance_work_mem memory space. The vacuum can use maintenance_work_mem memory space to keep track of dead tuples and another maintenance_work_mem memory space to move tuples from pending pages into the regular GIN structure (see ginInsertCleanup). The behavior related to the Gin index consuming extra maintenance_work_mem memory was introduced by commit e2c79e14d998cd31f860854bc9210b37b457bb01. It is not clear to me if this is acceptable behavior and if so, shouldn't we document it?

I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.

> We wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the gin index, or should all workers use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what is the right behavior for parallel vacuum, I thought it is better to first discuss if the current memory usage model is right.

Well, I had the idea when we were developing parallel query that we should just ignore the problem of work_mem: every node can use X amount of work_mem, and if there are multiple copies of the node in multiple processes, then you probably end up using more memory. I have been informed by Thomas Munro -- in very polite terminology -- that this was a terrible decision which is causing all kinds of problems for users. I haven't actually encountered that situation myself, but I don't doubt that it's an issue.
I think it's a lot easier to do better when we're talking about maintenance commands rather than queries. Maintenance operations typically don't have the problem that queries do with an unknown number of nodes using memory; you typically know all of your memory needs up front. So it's easier to budget that out across workers or whatever.

It's a little harder in this case, because you could have any number of GIN indexes (1 to infinity) and the amount of memory you can use depends not only on how many of them there are but, presumably, also on the number of those that are going to be vacuumed at the same time. So you might have 8 indexes, 3 workers, and 2 of the indexes are GIN. In that case, you know that you can't have more than 2 GIN indexes being processed at the same time, but it's likely to be only one, and maybe with proper scheduling you could make sure it's only one. On the other hand, if you dole out the memory assuming it's only 1, what happens if you start that one, then process all 6 of the non-GIN indexes, and that one isn't done yet? I guess you could wait to start cleanup on the other GIN indexes until the previous index cleanup finishes, but that kinda sucks too. So I'm not really sure how to handle this particular case. I think the principle of dividing up the memory rather than just using more is probably a good one, but figuring out exactly how that should work seems tricky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
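The upper bound Robert describes can be sketched in a line of illustrative Python (not PostgreSQL code; the function name is invented): the number of GIN cleanups that can run concurrently can never exceed either the number of GIN indexes or the number of vacuum workers.

```python
# Illustrative sketch of the bound in the example above: with 8 indexes,
# 3 workers, and 2 GIN indexes, at most min(2, 3) = 2 GIN cleanups can be
# in flight at once, though with scheduling it may often be only 1.

def max_concurrent_gin_cleanups(n_gin_indexes, n_workers):
    return min(n_gin_indexes, n_workers)

assert max_concurrent_gin_cleanups(2, 3) == 2  # worst case from the example
assert max_concurrent_gin_cleanups(5, 3) == 3  # workers become the limit
```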
On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.

ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.

2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():

@@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
      * Is it time to flush memory to disk? Flush if we are at the end of
      * the pending list, or if we have a full row and memory is getting
      * full.
-     *
-     * XXX using up maintenance_work_mem here is probably unreasonably
-     * much, since vacuum might already be using that much.
      */

ISTM that the use of maintenance_work_mem wasn't given that much thought originally.

--
Peter Geoghegan
On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.
>
> ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.
>
> 2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():
>
> @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
>       * Is it time to flush memory to disk? Flush if we are at the end of
>       * the pending list, or if we have a full row and memory is getting
>       * full.
> -     *
> -     * XXX using up maintenance_work_mem here is probably unreasonably
> -     * much, since vacuum might already be using that much.
>       */
>
> ISTM that the use of maintenance_work_mem wasn't given that much thought originally.

One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
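The fixed-split idea above can be sketched as follows (illustrative Python, not anything PostgreSQL implements; the function name and the 25% fraction are just what the mail proposes): reserve a quarter of maintenance_work_mem for GIN pending-list cleanup and leave the rest for dead-tuple collection, so the total stays within the GUC.

```python
# Illustrative sketch of the proposed fixed 25%/75% split of
# maintenance_work_mem between GIN pending-list cleanup and
# dead-tuple collection.  Values are in kB, like the GUC itself.

def split_maintenance_work_mem(maintenance_work_mem_kb, has_gin_index,
                               gin_fraction=0.25):
    if not has_gin_index:
        return maintenance_work_mem_kb, 0   # no GIN index: no split needed
    gin_budget = int(maintenance_work_mem_kb * gin_fraction)
    return maintenance_work_mem_kb - gin_budget, gin_budget

dead_tuples_kb, gin_kb = split_maintenance_work_mem(65536, True)
assert dead_tuples_kb + gin_kb == 65536  # total stays within the GUC
assert gin_kb == 16384                   # 25% reserved for GIN cleanup
```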
On Tue, Oct 8, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > We wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the gin index, or should all workers use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what is the right behavior for parallel vacuum, I thought it is better to first discuss if the current memory usage model is right.
>
> Well, I had the idea when we were developing parallel query that we should just ignore the problem of work_mem: every node can use X amount of work_mem, and if there are multiple copies of the node in multiple processes, then you probably end up using more memory. I have been informed by Thomas Munro -- in very polite terminology -- that this was a terrible decision which is causing all kinds of problems for users. I haven't actually encountered that situation myself, but I don't doubt that it's an issue.
>
> I think it's a lot easier to do better when we're talking about maintenance commands rather than queries. Maintenance operations typically don't have the problem that queries do with an unknown number of nodes using memory; you typically know all of your memory needs up front. So it's easier to budget that out across workers or whatever. It's a little harder in this case, because you could have any number of GIN indexes (1 to infinity) and the amount of memory you can use depends not only on how many of them there are but, presumably, also on the number of those that are going to be vacuumed at the same time. So you might have 8 indexes, 3 workers, and 2 of the indexes are GIN. In that case, you know that you can't have more than 2 GIN indexes being processed at the same time, but it's likely to be only one, and maybe with proper scheduling you could make sure it's only one.
> On the other hand, if you dole out the memory assuming it's only 1, what happens if you start that one, then process all 6 of the non-GIN indexes, and that one isn't done yet? I guess you could wait to start cleanup on the other GIN indexes until the previous index cleanup finishes, but that kinda sucks too. So I'm not really sure how to handle this particular case. I think the principle of dividing up the memory rather than just using more is probably a good one, but figuring out exactly how that should work seems tricky.

Yeah, and what if we have as many workers as indexes? Doing the cleanup of GIN indexes serially (waiting for the prior index to finish before starting the cleanup of the next GIN index) would be bad in that case too. I think we can do something simple like choosing the minimum among the number of GIN indexes, the number of workers requested for parallel vacuum, and max_parallel_maintenance_workers, and then dividing maintenance_work_mem by that to get the memory used by each of the GIN indexes. This has some caveats: for example, we might not be able to launch the number of workers we decided on, in which case we could have computed a bigger work_mem value for the GIN indexes. I think whatever we pick here will be good for some cases and not so good for others, so why not pick something general and simple.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
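The simple division Amit proposes can be sketched like this (illustrative Python, not PostgreSQL code; the function name is invented and the budget is in kB like the GUC):

```python
# Illustrative sketch of the proposal above: give each GIN index's cleanup
# maintenance_work_mem divided by the minimum of (number of GIN indexes,
# number of workers requested for parallel vacuum,
# max_parallel_maintenance_workers).

def gin_cleanup_budget_kb(maintenance_work_mem_kb, n_gin_indexes,
                          workers_requested, max_parallel_maintenance_workers):
    divisor = min(n_gin_indexes, workers_requested,
                  max_parallel_maintenance_workers)
    return maintenance_work_mem_kb // max(divisor, 1)  # guard against 0

# 64MB budget, 2 GIN indexes, 3 workers requested, worker cap of 2:
assert gin_cleanup_budget_kb(65536, 2, 3, 2) == 32768  # each cleanup gets 32MB
# With a single GIN index the whole budget is available to it:
assert gin_cleanup_budget_kb(65536, 1, 3, 2) == 65536
```

One caveat, as the mail notes: if fewer workers actually launch than the divisor assumed, each running cleanup ends up with a smaller budget than it could have had.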
On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.
> >
> > ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.
> >
> > 2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():
> >
> > @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
> >       * Is it time to flush memory to disk? Flush if we are at the end of
> >       * the pending list, or if we have a full row and memory is getting
> >       * full.
> > -     *
> > -     * XXX using up maintenance_work_mem here is probably unreasonably
> > -     * much, since vacuum might already be using that much.
> >       */
> >
> > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
>
> One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.

I feel that it would not be easy for users to tune maintenance_work_mem when it controls more than one thing. If this is an index-AM (GIN) specific issue, we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter along the lines of gin_pending_list_limit, say gin_pending_list_work_mem. And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to each of the parallel workers.

Regards,

--
Masahiko Sawada
On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > > I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.
> > >
> > > ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.
> > >
> > > 2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():
> > >
> > > @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
> > >       * Is it time to flush memory to disk? Flush if we are at the end of
> > >       * the pending list, or if we have a full row and memory is getting
> > >       * full.
> > > -     *
> > > -     * XXX using up maintenance_work_mem here is probably unreasonably
> > > -     * much, since vacuum might already be using that much.
> > >       */
> > >
> > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> >
> > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
>
> I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem. And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> > >
> > > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
> >
> > I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem.

Sure, having another work_mem parameter for the GIN indexes, which controls when we need to flush the pending list, will make life easier for us as programmers. I think if we have a specific parameter for this purpose, then we can even think of using the same one for cleanup during the insert operation as well. However, I am not sure how easy it would be for users: basically, they now need to remember another parameter for which there is no easy way to know what the value should be. I think one would have to check gin_metapage_info->n_pending_pages and, based on that, configure the value of this parameter to get the maximum benefit possible.

Can we think of using work_mem for this? Basically, we use work_mem during the insert operation, so why not use it during the vacuum operation for this purpose?
Another idea could be to try to divide maintenance_work_mem smartly if we knew the value of pending_pages for each GIN index, but I think for that we would need to either read the metapage or maybe use some sort of stats available to vacuum. We would need to somehow divide it based on the amount of memory required for the number of dead tuples in the heap and the memory required by the tuples in the pending list. I am not sure how feasible this approach is.

About the difficulty for users of tuning one or two parameters for vacuum: I think if they can compute what the values for the GUCs would be separately, then why can't they add them up and set that as one value? Having said that, I am not denying that having a separate parameter gives better control, and for this specific case a separate parameter would allow us to use it both during vacuum and insert operations.

> > And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
>
> IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.

I am not sure how different that is from the current situation. Basically, now it can use up to 2 * maintenance_work_mem, and if we do what Sawada-san is proposing, then it will be maintenance_work_mem + gin_*_work_mem. Do you have some other alternative idea in mind, or do you think the current situation is better than anything else we can do in this area?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
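The "smart division" idea above can be sketched as a proportional split (illustrative Python, not PostgreSQL code; the function name, the per-pending-page weight, and the concrete numbers are invented placeholders -- the mail only says the inputs might come from the GIN metapage's pending-pages count or from stats):

```python
# Illustrative sketch: apportion maintenance_work_mem between heap
# dead-tuple storage and each GIN index's pending-list cleanup in
# proportion to their estimated needs, so the total stays within the GUC.

def divide_work_mem(maintenance_work_mem_kb, est_dead_tuple_kb,
                    pending_pages_per_index, kb_per_pending_page=8):
    # Estimate each GIN index's need from its pending-page count.
    gin_need = [p * kb_per_pending_page for p in pending_pages_per_index]
    total_need = est_dead_tuple_kb + sum(gin_need)
    scale = maintenance_work_mem_kb / total_need  # shrink shares if over budget
    heap_share = int(est_dead_tuple_kb * scale)
    gin_shares = [int(n * scale) for n in gin_need]
    return heap_share, gin_shares

heap_kb, gin_kb = divide_work_mem(65536, 49152, [1024, 1024])
assert heap_kb + sum(gin_kb) <= 65536  # total stays within maintenance_work_mem
```

As the mail says, the feasibility is unclear: getting per-index pending-page estimates would require reading each metapage or keeping suitable stats.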
On Wed, Oct 9, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> > > >
> > > > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
> > >
> > > I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem.
>
> Sure, having another work_mem parameter for the GIN indexes, which controls when we need to flush the pending list, will make life easier for us as programmers. I think if we have a specific parameter for this purpose, then we can even think of using the same one for cleanup during the insert operation as well. However, I am not sure how easy it would be for users: basically, they now need to remember another parameter for which there is no easy way to know what the value should be. I think one would have to check gin_metapage_info->n_pending_pages and, based on that, configure the value of this parameter to get the maximum benefit possible.
>
> Can we think of using work_mem for this? Basically, we use work_mem during the insert operation, so why not use it during the vacuum operation for this purpose?
> Another idea could be to try to divide maintenance_work_mem smartly if we knew the value of pending_pages for each GIN index, but I think for that we would need to either read the metapage or maybe use some sort of stats available to vacuum. We would need to somehow divide it based on the amount of memory required for the number of dead tuples in the heap and the memory required by the tuples in the pending list. I am not sure how feasible this approach is.
>
> About the difficulty for users of tuning one or two parameters for vacuum: I think if they can compute what the values for the GUCs would be separately, then why can't they add them up and set that as one value? Having said that, I am not denying that having a separate parameter gives better control, and for this specific case a separate parameter would allow us to use it both during vacuum and insert operations.
>
> > > And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
> >
> > IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.
>
> I am not sure how different that is from the current situation. Basically, now it can use up to 2 * maintenance_work_mem, and if we do what Sawada-san is proposing, then it will be maintenance_work_mem + gin_*_work_mem. Do you have some other alternative idea in mind, or do you think the current situation is better than anything else we can do in this area?

I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better: at least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
[1] https://www.postgresql.org/message-id/CAA4eK1JhY88BXC%3DZK%3D89MALm%2BLyMkMhi6WG6AZfE4%2BKij6mebg%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> > > > >
> > > > > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
> > > >
> > > > I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem.
> >
> > Sure, having another work_mem parameter for the GIN indexes, which controls when we need to flush the pending list, will make life easier for us as programmers. I think if we have a specific parameter for this purpose, then we can even think of using the same one for cleanup during the insert operation as well. However, I am not sure how easy it would be for users: basically, they now need to remember another parameter for which there is no easy way to know what the value should be. I think one would have to check gin_metapage_info->n_pending_pages and, based on that, configure the value of this parameter to get the maximum benefit possible.
> >
> > Can we think of using work_mem for this? Basically, we use work_mem during the insert operation, so why not use it during the vacuum operation for this purpose?
> >
> > Another idea could be to try to divide maintenance_work_mem smartly if we knew the value of pending_pages for each GIN index, but I think for that we would need to either read the metapage or maybe use some sort of stats available to vacuum. We would need to somehow divide it based on the amount of memory required for the number of dead tuples in the heap and the memory required by the tuples in the pending list. I am not sure how feasible this approach is.
> >
> > About the difficulty for users of tuning one or two parameters for vacuum: I think if they can compute what the values for the GUCs would be separately, then why can't they add them up and set that as one value? Having said that, I am not denying that having a separate parameter gives better control, and for this specific case a separate parameter would allow us to use it both during vacuum and insert operations.
> >
> > > > And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
> > >
> > > IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.
> >
> > I am not sure how different that is from the current situation. Basically, now it can use up to 2 * maintenance_work_mem, and if we do what Sawada-san is proposing, then it will be maintenance_work_mem + gin_*_work_mem. Do you have some other alternative idea in mind, or do you think the current situation is better than anything else we can do in this area?
>
> I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better.
> However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.

I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.

Regards,

--
Masahiko Sawada
On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
>
> I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.

I can understand your concern about why dividing maintenance_work_mem between vacuuming the heap and cleaning up the indexes might be tricky, especially because of third-party indexes, but introducing a new GUC isn't free either. I think that should be the last resort, and we need buy-in from more people for that. Did you consider using work_mem for this?

And even if we want to go with a new GUC, maybe it is better to give it some generic name like maintenance_index_work_mem or something along those lines, so that it can be used for other index cleanups as well if required.

Tom, Teodor, do you have any opinion on this matter?
This has been introduced by commit:

commit ff301d6e690bb5581502ea3d8591a1600fd87acc
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Mar 24 20:17:18 2009 +0000

    Implement "fastupdate" support for GIN indexes, in which we try to
    accumulate multiple index entries in a holding area before adding them
    to the main index structure.  This helps because bulk insert is
    (usually) significantly faster than retail insert for GIN.
    ..
    ..
    Teodor Sigaev

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 10, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
> >
> > I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.
>
> I can understand your concern about why dividing maintenance_work_mem between vacuuming the heap and cleaning up the indexes might be tricky, especially because of third-party indexes, but introducing a new GUC isn't free either. I think that should be the last resort, and we need buy-in from more people for that. Did you consider using work_mem for this?

Yeah, that seems workable too. But I wonder if it could be a similar story to gin_pending_list_limit. I mean that previously we used work_mem as the maximum size of the GIN pending list.
But we concluded that it was not appropriate to control both by one GUC, so we introduced gin_pending_list_limit and the storage parameter in commit 263865a4 (it was originally pending_list_cleanup_size, but was renamed to gin_pending_list_limit in commit c291503b1). I feel that that story is quite similar to this issue.

Regards,

--
Masahiko Sawada
On Thu, Oct 10, 2019 at 2:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Oct 10, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
> > >
> > > I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.
> >
> > I can understand your concern about why dividing maintenance_work_mem between vacuuming the heap and cleaning up the indexes might be tricky, especially because of third-party indexes, but introducing a new GUC isn't free either. I think that should be the last resort, and we need buy-in from more people for that. Did you consider using work_mem for this?
>
> Yeah, that seems workable too. But I wonder if it could be a similar story to gin_pending_list_limit.
I mean that previously we used to use > work_mem as the maximum size of GIN pending list. But we concluded > that it was not appropriate to control both by one GUC so we > introduced gin_pending_list_limit and the storage parameter at commit > 263865a4 > It seems you want to say about commit id a1b395b6a26ae80cde17fdfd2def8d351872f399. I wonder why they have not changed it to gin_pending_list_limit (at that time pending_list_cleanup_size) in that commit itself? I think if we want to use gin_pending_list_limit in this context, then we can replace both work_mem and maintenance_work_mem with gin_pending_list_limit. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 10, 2019 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 10, 2019 at 2:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 10, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > I think the current situation is not good but if we try to cap it to > > > > > maintenance_work_mem + gin_*_work_mem then also I don't think it will > > > > > make the situation much better. However, I think the idea you > > > > > proposed up-thread[1] is better. At least the maintenance_work_mem > > > > > will be the top limit what the auto vacuum worker can use. > > > > > > > > > > > > > I'm concerned that there are other index AMs that could consume more > > > > memory like GIN. In principle we can vacuum third party index AMs and > > > > will be able to even parallel vacuum them. I expect that > > > > maintenance_work_mem is the top limit of memory usage of maintenance > > > > command but actually it's hard to set the limit to memory usage of > > > > bulkdelete and cleanup by the core. So I thought that since GIN is the > > > > one of the index AM it can have a new parameter to make its job > > > > faster. If we have that parameter it might not make the current > > > > situation much better but user will be able to set a lower value to > > > > that parameter to not use the memory much while keeping the number of > > > > index vacuums. > > > > > > > > > > I can understand your concern why dividing maintenance_work_mem for > > > vacuuming heap and cleaning up the index might be tricky especially > > > because of third party indexes, but introducing new Guc isn't free > > > either. I think that should be the last resort and we need buy-in > > > from more people for that. Did you consider using work_mem for this? 
> > > > Yeah that seems work too. But I wonder if it could be the similar > > story to gin_pending_list_limit. I mean that previously we used to use > > work_mem as the maximum size of GIN pending list. But we concluded > > that it was not appropriate to control both by one GUC so we > > introduced gin_pending_list_limit and the storage parameter at commit > > 263865a4 > > > > It seems you want to say about commit id > a1b395b6a26ae80cde17fdfd2def8d351872f399. Yeah thanks. > I wonder why they have not > changed it to gin_pending_list_limit (at that time > pending_list_cleanup_size) in that commit itself? I think if we want > to use gin_pending_list_limit in this context then we can replace both > work_mem and maintenance_work_mem with gin_pending_list_limit. Hmm, as far as I can see from the discussion, no one mentioned maintenance_work_mem. Perhaps we just overlooked it? I also didn't know that. I also think we can replace at least the work_mem for cleanup of the pending list with gin_pending_list_limit. In the following comment in ginfast.c,

/*
 * Force pending list cleanup when it becomes too long. And,
 * ginInsertCleanup could take significant amount of time, so we prefer to
 * call it when it can do all the work in a single collection cycle. In
 * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it
 * while pending list is still small enough to fit into
 * gin_pending_list_limit.
 *
 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
 */
cleanupSize = GinGetPendingListCleanupSize(index);
if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L)
    needCleanup = true;

ISTM the gin_pending_list_limit in the above comment corresponds to the following code in ginfast.c,

/*
 * We are called from regular insert and if we see concurrent cleanup
 * just exit in hope that concurrent process will clean up pending
 * list.
 */
if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
    return;
workMemory = work_mem;

If work_mem is smaller than gin_pending_list_limit, the pending list cleanup would behave against the intention of the above comment, which prefers to do all the work in a single collection cycle while the pending list is still small enough to fit into gin_pending_list_limit. Regards, -- Masahiko Sawada
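To make the mismatch concrete, here is a toy sketch, not PostgreSQL code: the value of GIN_PAGE_FREESIZE and both function names below are invented for illustration. It shows the two separate knobs involved — cleanup is *triggered* once the pending list outgrows gin_pending_list_limit, while the regular-insert path *budgets* the cleanup with work_mem — so a work_mem smaller than gin_pending_list_limit cannot cover the whole list in one collection cycle:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed usable bytes per pending-list page (illustrative constant; the
 * real value in PostgreSQL depends on BLCKSZ and page headers). */
#define GIN_PAGE_FREESIZE 8000L

/* Cleanup is forced once the pending list outgrows gin_pending_list_limit
 * (cleanup_size_kb, in KB), mirroring the quoted ginfast.c check. */
static bool
pending_list_needs_cleanup(long n_pending_pages, long cleanup_size_kb)
{
    return n_pending_pages * GIN_PAGE_FREESIZE > cleanup_size_kb * 1024L;
}

/* In the regular-insert path the cleanup's working memory is capped by
 * work_mem, not by gin_pending_list_limit.  If work_mem is the smaller
 * of the two, one "collection cycle" cannot cover the whole list. */
static bool
cleanup_fits_single_cycle(long work_mem_kb, long cleanup_size_kb)
{
    return work_mem_kb >= cleanup_size_kb;
}
```

With, say, work_mem = 1MB and gin_pending_list_limit = 4MB, cleanup only triggers at ~4MB of pending pages, yet each cycle can collect only ~1MB, which is exactly the multi-cycle behavior the ginfast.c comment tries to avoid.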
On Fri, Oct 11, 2019 at 7:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 10, 2019 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > It seems you want to say about commit id > > a1b395b6a26ae80cde17fdfd2def8d351872f399. > > Yeah thanks. > > > I wonder why they have not > > changed it to gin_penidng_list_limit (at that time > > pending_list_cleanup_size) in that commit itself? I think if we want > > to use gin_pending_list_limit in this context then we can replace both > > work_mem and maintenance_work_mem with gin_penidng_list_limit. > > Hmm as far as I can see the discussion, no one mentioned about > maintenance_work_mem. Perhaps we just oversighted? > It is possible, but we can't be certain until those people confirm the same. > I also didn't know > that. > > I also think we can replace at least the work_mem for cleanup of > pending list with gin_pending_list_limit. In the following comment in > ginfast.c, > Agreed, but that won't solve the original purpose for which we started this thread. > /* > * Force pending list cleanup when it becomes too long. And, > * ginInsertCleanup could take significant amount of time, so we prefer to > * call it when it can do all the work in a single collection cycle. In > * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it > * while pending list is still small enough to fit into > * gin_pending_list_limit. > * > * ginInsertCleanup() should not be called inside our CRIT_SECTION. > */ > cleanupSize = GinGetPendingListCleanupSize(index); > if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L) > needCleanup = true; > > ISTM the gin_pending_list_limit in the above comment corresponds to > the following code in ginfast.c, > > /* > * We are called from regular insert and if we see concurrent cleanup > * just exit in hope that concurrent process will clean up pending > * list. 
> */ > if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock)) > return; > workMemory = work_mem; > > If work_mem is smaller than gin_pending_list_limit the pending list > cleanup would behave against the intention of the above comment that > prefers to do all the work in a single collection cycle while pending > list is still small enough to fit into gin_pending_list_limit. > That's right, but OTOH, if the user specifies gin_pending_list_limit as an option during Create Index with a value greater than the GUC gin_pending_list_limit, then we will also face the same problem. It seems to me that we haven't thought enough about memory usage during Gin pending list cleanup and I don't want to independently make a decision to change it. So unless the original author/committer or some other people who have worked in this area share their opinion, we can leave it as it is and make the parallel vacuum patch adapt to it. Suggestions from others are welcome. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 11, 2019 at 7:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 10, 2019 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > It seems you want to say about commit id > > > a1b395b6a26ae80cde17fdfd2def8d351872f399. > > > > Yeah thanks. > > > > > I wonder why they have not > > > changed it to gin_penidng_list_limit (at that time > > > pending_list_cleanup_size) in that commit itself? I think if we want > > > to use gin_pending_list_limit in this context then we can replace both > > > work_mem and maintenance_work_mem with gin_penidng_list_limit. > > > > Hmm as far as I can see the discussion, no one mentioned about > > maintenance_work_mem. Perhaps we just oversighted? > > > > It is possible, but we can't be certain until those people confirm the same. > > > I also didn't know > > that. > > > > I also think we can replace at least the work_mem for cleanup of > > pending list with gin_pending_list_limit. In the following comment in > > ginfast.c, > > > > Agreed, but that won't solve the original purpose for which we started > this thread. > > > /* > > * Force pending list cleanup when it becomes too long. And, > > * ginInsertCleanup could take significant amount of time, so we prefer to > > * call it when it can do all the work in a single collection cycle. In > > * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it > > * while pending list is still small enough to fit into > > * gin_pending_list_limit. > > * > > * ginInsertCleanup() should not be called inside our CRIT_SECTION. 
> > */ > > cleanupSize = GinGetPendingListCleanupSize(index); > > if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L) > > needCleanup = true; > > > > ISTM the gin_pending_list_limit in the above comment corresponds to > > the following code in ginfast.c, > > > > /* > > * We are called from regular insert and if we see concurrent cleanup > > * just exit in hope that concurrent process will clean up pending > > * list. > > */ > > if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock)) > > return; > > workMemory = work_mem; > > > > If work_mem is smaller than gin_pending_list_limit the pending list > > cleanup would behave against the intention of the above comment that > > prefers to do all the work in a single collection cycle while pending > > list is still small enough to fit into gin_pending_list_limit. > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > as an option during Create Index with a value greater than GUC > gin_pending_list_limit, then also we will face the same problem. It > seems to me that we haven't thought enough on memory usage during Gin > pending list cleanup and I don't want to independently make a decision > to change it. So unless the original author/committer or some other > people who have worked in this area share their opinion, we can leave > it as it is and make a parallel vacuum patch adapt to it. Yeah, I totally agree. Apart from the GIN problem, can we discuss whether we need to change the current memory usage policy of parallel utility commands described in the docs? We cannot control the memory usage inside index AMs after all, but we need to generically consider how much memory is used during parallel vacuum. Regards, -- Masahiko Sawada
On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > as an option during Create Index with a value greater than GUC > > gin_pending_list_limit, then also we will face the same problem. It > > seems to me that we haven't thought enough on memory usage during Gin > > pending list cleanup and I don't want to independently make a decision > > to change it. So unless the original author/committer or some other > > people who have worked in this area share their opinion, we can leave > > it as it is and make a parallel vacuum patch adapt to it. > > Yeah I totally agreed. > > Apart from the GIN problem can we discuss whether need to change the > current memory usage policy of parallel utility command described in > the doc? We cannot control the memory usage in index AM after all but > we need to generically consider how much memory is used during > parallel vacuum. > Do you mean to say change the docs for a parallel vacuum patch in this regard? If so, I think we might want to do something for maintenance_work_mem for parallel vacuum as described in one of the emails above [1] and then change the docs accordingly. [1] - https://www.postgresql.org/message-id/CAA4eK1JhpNsTiHj%2BJOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Oct 12, 2019 at 8:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > > as an option during Create Index with a value greater than GUC > > > gin_pending_list_limit, then also we will face the same problem. It > > > seems to me that we haven't thought enough on memory usage during Gin > > > pending list cleanup and I don't want to independently make a decision > > > to change it. So unless the original author/committer or some other > > > people who have worked in this area share their opinion, we can leave > > > it as it is and make a parallel vacuum patch adapt to it. > > > > Yeah I totally agreed. > > > > Apart from the GIN problem can we discuss whether need to change the > > current memory usage policy of parallel utility command described in > > the doc? We cannot control the memory usage in index AM after all but > > we need to generically consider how much memory is used during > > parallel vacuum. > > > > Do you mean to say change the docs for a parallel vacuum patch in this > regard? If so, I think we might want to do something for > maintenance_work_mem for parallel vacuum as described in one of the > emails above [1] and then change the docs accordingly. > Yes, agreed. I thought that we could discuss that while waiting for other opinions on the memory usage of gin index's pending list cleanup. For example, one of your suggestions [1] is simple and maybe acceptable, but I guess it can deal only with gin indexes and not with other index AMs that might consume more memory. [1] https://www.postgresql.org/message-id/CAA4eK1JhpNsTiHj%2BJOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q%40mail.gmail.com Regards, -- Masahiko Sawada
On Wed, Oct 16, 2019 at 7:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 8:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > > > as an option during Create Index with a value greater than GUC > > > > gin_pending_list_limit, then also we will face the same problem. It > > > > seems to me that we haven't thought enough on memory usage during Gin > > > > pending list cleanup and I don't want to independently make a decision > > > > to change it. So unless the original author/committer or some other > > > > people who have worked in this area share their opinion, we can leave > > > > it as it is and make a parallel vacuum patch adapt to it. > > > > > > Yeah I totally agreed. > > > > > > Apart from the GIN problem can we discuss whether need to change the > > > current memory usage policy of parallel utility command described in > > > the doc? We cannot control the memory usage in index AM after all but > > > we need to generically consider how much memory is used during > > > parallel vacuum. > > > > > > > Do you mean to say change the docs for a parallel vacuum patch in this > > regard? If so, I think we might want to do something for > > maintenance_work_mem for parallel vacuum as described in one of the > > emails above [1] and then change the docs accordingly. > > > > Yes agreed. I thought that we can discuss that while waiting for other > opinion on the memory usage of gin index's pending list cleanup. For > example one of your suggestions[1] is simple and maybe acceptable but > I guess that it can deal with only gin indexes but not other index AMs > that might consume more memory. 
> It is not the case that other indexes currently don't use any additional memory (except for maintenance_work_mem). For example, a Gist index can use memory for collecting empty leaf pages and internal pages. I am not sure if we can do anything for such cases. The case for the Gin index seems to be clear, and it seems to carry the risk of using much more memory, so why not try to do something for it? We can probably write the code such that it can be extended for other index methods in the future if required. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 7:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sat, Oct 12, 2019 at 8:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > > > > as an option during Create Index with a value greater than GUC > > > > > gin_pending_list_limit, then also we will face the same problem. It > > > > > seems to me that we haven't thought enough on memory usage during Gin > > > > > pending list cleanup and I don't want to independently make a decision > > > > > to change it. So unless the original author/committer or some other > > > > > people who have worked in this area share their opinion, we can leave > > > > > it as it is and make a parallel vacuum patch adapt to it. > > > > > > > > Yeah I totally agreed. > > > > > > > > Apart from the GIN problem can we discuss whether need to change the > > > > current memory usage policy of parallel utility command described in > > > > the doc? We cannot control the memory usage in index AM after all but > > > > we need to generically consider how much memory is used during > > > > parallel vacuum. > > > > > > > > > > Do you mean to say change the docs for a parallel vacuum patch in this > > > regard? If so, I think we might want to do something for > > > maintenance_work_mem for parallel vacuum as described in one of the > > > emails above [1] and then change the docs accordingly. > > > > > > > Yes agreed. I thought that we can discuss that while waiting for other > > opinion on the memory usage of gin index's pending list cleanup. 
For > > example one of your suggestions[1] is simple and maybe acceptable but > > I guess that it can deal with only gin indexes but not other index AMs > > that might consume more memory. > > > > It is not that currently, other indexes don't use any additional > > memory (except for maintenance_work_mem). For example, Gist index > > can use memory for collecting empty leaf pages and internal pages. I > > am not sure if we can do anything for such cases. The case for Gin > > index seems to be clear and it seems to be having the risk of using > > much more memory, so why not try to do something for it? Yeah, the gin index case is clear now and I agree that we need to do something for it. But I'm also concerned about third-party index AMs. Similar to the problem related to the IndexBulkDeleteResult structure that we're discussing on another thread, I think we have the same problem here. Regards, -- Masahiko Sawada
It's a bit unfortunate that we're doing the pending list flush while the vacuum memory is allocated at all. Is there any reason other than the way the callbacks are defined that gin doesn't do the pending list flush before vacuum does the heap scan and before it allocates any memory using maintenance_work_mem?
(I'm guessing doing it after vacuum is finished would have different problems with tuples in the pending queue not getting vacuumed?)
On Thu, Oct 17, 2019 at 6:07 AM Greg Stark <stark@mit.edu> wrote: > > It's a bit unfortunate that we're doing the pending list flush while the vacuum memory is allocated at all. Is there any reason other than the way the callbacks are defined that gin doesn't do the pending list flush before vacuum does the heap scan and before it allocates any memory using maintenance_work_mem? > I can't think of any other reason. Can we think of doing it as a separate phase for indexes? That can help in containing the memory usage to a maximum of maintenance_work_mem for Gin indexes, but I am not sure how useful it is for other index AM's. Another idea could be that we do something special (cleanup of pending list) just for Gin indexes before heap scan in Vacuum. > (I'm guessing doing it after vacuum is finished would have different problems with tuples in the pending queue not getting vacuumed?) Yeah, I also think so. BTW, good point! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 16, 2019 at 5:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > It is not that currently, other indexes don't use any additional > > memory (except for maintenance_work_mem). For example, Gist index > > can use memory for collecting empty leaf pages and internal pages. I > > am not sure if we can do anything for such cases. The case for Gin > > index seems to be clear and it seems to be having the risk of using > > much more memory, so why not try to do something for it? > > Yeah gin indexes is clear now and I agree that we need to do something > for it. But I'm also concerned third party index AMs. Similar to the > problem related to IndexBulkDeleteResult structure that we're > discussing on another thread I thought that we have the same problem > on this. > I understand your concern, but I am not sure what is a good way to deal with it. I think we can do something generic like divide the maintenance_work_mem equally among workers, but then the indexes that use maintenance_work_mem will suffer if the number of such indexes is much less than the number of indexes that don't use maintenance_work_mem. Another idea could be that each index AM tells whether it uses maintenance_work_mem or not, and based on that we can do the computation (divide the maintenance_work_mem by the number of such indexes during parallel vacuum). Do you have any other ideas for this? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
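As a rough sketch of that second idea — purely illustrative, none of these names exist in PostgreSQL — the budget computation could look like this: divide maintenance_work_mem among just those indexes whose AM reports that it consumes maintenance_work_mem, instead of dividing it blindly among all workers.

```c
#include <assert.h>

/*
 * Illustrative only: compute the per-index share of maintenance_work_mem
 * (in KB), counting only the indexes that actually draw on it.
 */
static long
mwm_share_kb(long maintenance_work_mem_kb, int nindexes_using_mwm)
{
    if (nindexes_using_mwm <= 0)
        return maintenance_work_mem_kb; /* no one to share with */
    return maintenance_work_mem_kb / nindexes_using_mwm;
}
```

For example, with maintenance_work_mem = 64MB and four such indexes, each would get 16MB, while indexes whose AMs don't use maintenance_work_mem would not shrink anyone's share.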
Amit Kapila <amit.kapila16@gmail.com> writes: > Another idea could be each index AM tell whether it uses > maintenance_work_mem or not and based on that we can do the > computation (divide the maintenance_work_mem by the number of such > indexes during parallel vacuum). FWIW, that seems like a perfectly reasonable API addition to me. regards, tom lane
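A minimal sketch of what such an API addition might look like — the struct and field names here are hypothetical stand-ins, though PostgreSQL's IndexAmRoutine would be the natural home for such a flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for an index AM descriptor carrying
 * the proposed flag: does this AM's bulkdelete/cleanup phase draw on
 * maintenance_work_mem? */
typedef struct DemoAmRoutine
{
    const char *amname;
    bool        amusemaintenanceworkmem;
} DemoAmRoutine;

/* Parallel vacuum could count the AMs that report using
 * maintenance_work_mem and divide the budget among only those. */
static int
count_mwm_users(const DemoAmRoutine *ams, int nams)
{
    int n = 0;
    for (int i = 0; i < nams; i++)
        if (ams[i].amusemaintenanceworkmem)
            n++;
    return n;
}
```

A third-party AM would simply set the flag in its descriptor, so the scheme extends beyond the in-core AMs without the core having to know each AM's memory behavior.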
On Thu, Oct 17, 2019 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 5:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > It is not that currently, other indexes don't use any additional > > > memory (except for maintenance_work_mem). For example, Gist index > > > can use memory for collecting empty leaf pages and internal pages. I > > > am not sure if we can do anything for such cases. The case for Gin > > > index seems to be clear and it seems to be having the risk of using > > > much more memory, so why not try to do something for it? > > > > Yeah gin indexes is clear now and I agree that we need to do something > > for it. But I'm also concerned third party index AMs. Similar to the > > problem related to IndexBulkDeleteResult structure that we're > > discussing on another thread I thought that we have the same problem > > on this. > > > > I understand your concern, but I am not sure what is a good way to > deal with it. I think we can do something generic like divide the > maintenance_work_mem equally among workers, but then the indexes that > use maintenance_work_mem will suffer if the number of such indexes is > much less than the indexes that don't use maintenance_work_mem. > Another idea could be each index AM tell whether it uses > maintenance_work_mem or not and based on that we can do the > computation (divide the maintenance_work_mem by the number of such > indexes during parallel vacuum). Do you have any other ideas for > this? > I was thinking along the lines of the latter idea: ask the index AM the amount of memory (that should be part of maintenance_work_mem) it will consume, and then compute the new limit for both the heap scan and index vacuuming based on that. Regards, -- Masahiko Sawada
On Thu, Oct 17, 2019 at 6:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Oct 16, 2019 at 5:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > It is not that currently, other indexes don't use any additional > > > > memory (except for maintainence_work_mem). For example, Gist index > > > > can use memory for collecting empty leaf pages and internal pages. I > > > > am not sure if we can do anything for such cases. The case for Gin > > > > index seems to be clear and it seems to be having the risk of using > > > > much more memory, so why not try to do something for it? > > > > > > Yeah gin indexes is clear now and I agree that we need to do something > > > for it. But I'm also concerned third party index AMs. Similar to the > > > problem related to IndexBulkDeleteResult structure that we're > > > discussing on another thread I thought that we have the same problem > > > on this. > > > > > > > I understand your concern, but I am not sure what is a good way to > > deal with it. I think we can do something generic like divide the > > maintainence_work_mem equally among workers, but then the indexes that > > use maintainence_work_mem will suffer if the number of such indexes is > > much less than the indexes that don't use maintainence_work_mem. > > Another idea could be each index AM tell whether it uses > > maintainence_work_mem or not and based on that we can do the > > computation (divide the maintainence_work_mem by the number of such > > indexes during parallel vacuum). Do you have any other ideas for > > this? 
> > > > I was thinking the similar idea to the latter idea: ask index AM the > amount of memory (that should be part of maintenance_work_mem) it will > consume and then compute the new limit for both heap scan and index > vacuuming based on that. > Oh, that would be trickier than what I am proposing, because it might not be easy for an index AM to tell upfront the amount of memory it needs. I think we can keep it simple for now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 17, 2019 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > Another idea could be each index AM tell whether it uses > > maintainence_work_mem or not and based on that we can do the > > computation (divide the maintainence_work_mem by the number of such > > indexes during parallel vacuum). > > FWIW, that seems like a perfectly reasonable API addition to me. > Thanks, Sawada-san, if you also think this API makes sense, then we can try to write a patch and see how it turns out? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 18, 2019, 11:43 Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 17, 2019 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > Another idea could be each index AM tell whether it uses
> > maintainence_work_mem or not and based on that we can do the
> > computation (divide the maintainence_work_mem by the number of such
> > indexes during parallel vacuum).
>
> FWIW, that seems like a perfectly reasonable API addition to me.
>
Thanks, Sawada-san, if you also think this API makes sense, then we
can try to write a patch and see how it turns out?
Yeah agreed. I can write this patch next week and will
share it.
Regards,