Discussion: maintenance_work_mem used by Vacuum
As per the docs [1] (see maintenance_work_mem), the maximum amount of memory used by the VACUUM command should be no more than maintenance_work_mem. However, during the review/discussion of the "parallel vacuum" patch [2], we observed that this is not true. Basically, if there is a GIN index defined on a table, then a vacuum of that table can consume up to 2 * maintenance_work_mem of memory: the vacuum can use maintenance_work_mem to keep track of dead tuples and another maintenance_work_mem to move tuples from pending pages into the regular GIN structure (see ginInsertCleanup). This behavior of a GIN index consuming extra maintenance_work_mem was introduced by commit e2c79e14d998cd31f860854bc9210b37b457bb01. It is not clear to me whether this is acceptable behavior, and if so, shouldn't we document it?
--
We also wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the GIN index, or should all workers together use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what the right behavior for parallel vacuum is, I thought it would be better to first discuss whether the current memory usage model is right.
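As a back-of-envelope sketch of the 2 * maintenance_work_mem behavior described above (illustrative Python, not PostgreSQL code; the function name is invented): the dead-tuple storage and ginInsertCleanup() each budget up to maintenance_work_mem independently, so both can be live at once.

```python
# Illustrative sketch of the worst-case peak memory of a VACUUM on a table
# with a GIN index, as described in the mail above.  The dead-tuple array
# and ginInsertCleanup() each budget up to maintenance_work_mem on their
# own, so the peak is roughly twice the GUC value when a GIN index exists.

def vacuum_peak_memory(maintenance_work_mem_mb, has_gin_index):
    dead_tuple_budget = maintenance_work_mem_mb  # lazy vacuum's dead-tuple storage
    # ginInsertCleanup() flushes the pending list using its own full budget
    pending_list_budget = maintenance_work_mem_mb if has_gin_index else 0
    return dead_tuple_budget + pending_list_budget

# With the default maintenance_work_mem = 64MB:
assert vacuum_peak_memory(64, has_gin_index=False) == 64
assert vacuum_peak_memory(64, has_gin_index=True) == 128  # 2 * maintenance_work_mem
```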
On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> As per the docs [1] (see maintenance_work_mem), the maximum amount of memory used by the Vacuum command should be no more than maintenance_work_mem. However, during the review/discussion of the "parallel vacuum" patch [2], we observed that it is not true. Basically, if there is a gin index defined on a table, then the vacuum on that table can consume up to 2 * maintenance_work_mem memory space. The vacuum can use maintenance_work_mem memory space to keep track of dead tuples and another maintenance_work_mem memory space to move tuples from pending pages into the regular GIN structure (see ginInsertCleanup). The behavior related to the Gin index consuming extra maintenance_work_mem memory was introduced by commit e2c79e14d998cd31f860854bc9210b37b457bb01. It is not clear to me if this is acceptable behavior and if so, shouldn't we document it?

I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.

> We wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the gin index, or should all workers use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what is the right behavior for parallel vacuum, I thought it is better to first discuss if the current memory usage model is right.

Well, I had the idea when we were developing parallel query that we should just ignore the problem of work_mem: every node can use X amount of work_mem, and if there are multiple copies of the node in multiple processes, then you probably end up using more memory. I have been informed by Thomas Munro -- in very polite terminology -- that this was a terrible decision which is causing all kinds of problems for users. I haven't actually encountered that situation myself, but I don't doubt that it's an issue.
I think it's a lot easier to do better when we're talking about maintenance commands rather than queries. Maintenance operations typically don't have the problem that queries do with an unknown number of nodes using memory; you typically know all of your memory needs up front. So it's easier to budget that out across workers or whatever.

It's a little harder in this case, because you could have any number of GIN indexes (1 to infinity) and the amount of memory you can use depends not only on how many of them there are but, presumably, also on the number of those that are going to be vacuumed at the same time. So you might have 8 indexes, 3 workers, and 2 of the indexes are GIN. In that case, you know that you can't have more than 2 GIN indexes being processed at the same time, but it's likely to be only one, and maybe with proper scheduling you could make sure it's only one. On the other hand, if you dole out the memory assuming it's only 1, what happens if you start that one, then process all 6 of the non-GIN indexes, and that one isn't done yet? I guess you could wait to start cleanup on the other GIN indexes until the previous index cleanup finishes, but that kinda sucks too. So I'm not really sure how to handle this particular case. I think the principle of dividing up the memory rather than just using more is probably a good one, but figuring out exactly how that should work seems tricky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
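The upper bound Robert describes can be sketched in a line of illustrative Python (not PostgreSQL code; the function name is invented): the number of GIN cleanups that can run concurrently can never exceed either the number of GIN indexes or the number of vacuum workers.

```python
# Illustrative sketch of the bound in the example above: with 8 indexes,
# 3 workers, and 2 GIN indexes, at most min(2, 3) = 2 GIN cleanups can be
# in flight at once, though with scheduling it may often be only 1.

def max_concurrent_gin_cleanups(n_gin_indexes, n_workers):
    return min(n_gin_indexes, n_workers)

assert max_concurrent_gin_cleanups(2, 3) == 2  # worst case from the example
assert max_concurrent_gin_cleanups(5, 3) == 3  # workers become the limit
```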
On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.

ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.

2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():

@@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
      * Is it time to flush memory to disk? Flush if we are at the end of
      * the pending list, or if we have a full row and memory is getting
      * full.
-     *
-     * XXX using up maintenance_work_mem here is probably unreasonably
-     * much, since vacuum might already be using that much.
      */

ISTM that the use of maintenance_work_mem wasn't given that much thought originally.

--
Peter Geoghegan
On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.
>
> ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.
>
> 2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():
>
> @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
>       * Is it time to flush memory to disk? Flush if we are at the end of
>       * the pending list, or if we have a full row and memory is getting
>       * full.
> -     *
> -     * XXX using up maintenance_work_mem here is probably unreasonably
> -     * much, since vacuum might already be using that much.
>       */
>
> ISTM that the use of maintenance_work_mem wasn't given that much thought originally.

One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
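The fixed-split idea above can be sketched as follows (illustrative Python, not anything PostgreSQL implements; the function name and the 25% fraction are just what the mail proposes): reserve a quarter of maintenance_work_mem for GIN pending-list cleanup and leave the rest for dead-tuple collection, so the total stays within the GUC.

```python
# Illustrative sketch of the proposed fixed 25%/75% split of
# maintenance_work_mem between GIN pending-list cleanup and
# dead-tuple collection.  Values are in kB, like the GUC itself.

def split_maintenance_work_mem(maintenance_work_mem_kb, has_gin_index,
                               gin_fraction=0.25):
    if not has_gin_index:
        return maintenance_work_mem_kb, 0   # no GIN index: no split needed
    gin_budget = int(maintenance_work_mem_kb * gin_fraction)
    return maintenance_work_mem_kb - gin_budget, gin_budget

dead_tuples_kb, gin_kb = split_maintenance_work_mem(65536, True)
assert dead_tuples_kb + gin_kb == 65536  # total stays within the GUC
assert gin_kb == 16384                   # 25% reserved for GIN cleanup
```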
On Tue, Oct 8, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > We wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the gin index, or should all workers use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what is the right behavior for parallel vacuum, I thought it is better to first discuss if the current memory usage model is right.
>
> Well, I had the idea when we were developing parallel query that we should just ignore the problem of work_mem: every node can use X amount of work_mem, and if there are multiple copies of the node in multiple processes, then you probably end up using more memory. I have been informed by Thomas Munro -- in very polite terminology -- that this was a terrible decision which is causing all kinds of problems for users. I haven't actually encountered that situation myself, but I don't doubt that it's an issue.
>
> I think it's a lot easier to do better when we're talking about maintenance commands rather than queries. Maintenance operations typically don't have the problem that queries do with an unknown number of nodes using memory; you typically know all of your memory needs up front. So it's easier to budget that out across workers or whatever. It's a little harder in this case, because you could have any number of GIN indexes (1 to infinity) and the amount of memory you can use depends not only on how many of them there are but, presumably, also on the number of those that are going to be vacuumed at the same time. So you might have 8 indexes, 3 workers, and 2 of the indexes are GIN. In that case, you know that you can't have more than 2 GIN indexes being processed at the same time, but it's likely to be only one, and maybe with proper scheduling you could make sure it's only one.
> On the other hand, if you dole out the memory assuming it's only 1, what happens if you start that one, then process all 6 of the non-GIN indexes, and that one isn't done yet? I guess you could wait to start cleanup on the other GIN indexes until the previous index cleanup finishes, but that kinda sucks too. So I'm not really sure how to handle this particular case. I think the principle of dividing up the memory rather than just using more is probably a good one, but figuring out exactly how that should work seems tricky.

Yeah, and what if we have as many workers as indexes? Doing the cleanup of GIN indexes serially (waiting for the prior index to finish before starting the cleanup of the next GIN index) would be bad in that case too. I think we can do something simple like choosing the minimum among the number of GIN indexes, the number of workers requested for parallel vacuum, and max_parallel_maintenance_workers, and then dividing maintenance_work_mem by that to get the memory used by each of the GIN indexes. This has some caveats: for example, we might not be able to launch the number of workers we decided on, in which case we could have computed a bigger work_mem value for the GIN indexes. I think whatever we pick here will be good for some cases and not so good for others, so why not pick something general and simple.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
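The simple division Amit proposes can be sketched like this (illustrative Python, not PostgreSQL code; the function name is invented and the budget is in kB like the GUC):

```python
# Illustrative sketch of the proposal above: give each GIN index's cleanup
# maintenance_work_mem divided by the minimum of (number of GIN indexes,
# number of workers requested for parallel vacuum,
# max_parallel_maintenance_workers).

def gin_cleanup_budget_kb(maintenance_work_mem_kb, n_gin_indexes,
                          workers_requested, max_parallel_maintenance_workers):
    divisor = min(n_gin_indexes, workers_requested,
                  max_parallel_maintenance_workers)
    return maintenance_work_mem_kb // max(divisor, 1)  # guard against 0

# 64MB budget, 2 GIN indexes, 3 workers requested, worker cap of 2:
assert gin_cleanup_budget_kb(65536, 2, 3, 2) == 32768  # each cleanup gets 32MB
# With a single GIN index the whole budget is available to it:
assert gin_cleanup_budget_kb(65536, 1, 3, 2) == 65536
```

One caveat, as the mail notes: if fewer workers actually launch than the divisor assumed, each running cleanup ends up with a smaller budget than it could have had.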
On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.
> >
> > ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.
> >
> > 2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():
> >
> > @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
> >       * Is it time to flush memory to disk? Flush if we are at the end of
> >       * the pending list, or if we have a full row and memory is getting
> >       * full.
> > -     *
> > -     * XXX using up maintenance_work_mem here is probably unreasonably
> > -     * much, since vacuum might already be using that much.
> >       */
> >
> > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
>
> One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.

I feel that it would not be easy for users to tune maintenance_work_mem when it controls more than one thing. If this is an index-AM (GIN) specific issue, we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter along the lines of gin_pending_list_limit, say gin_pending_list_work_mem. And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to each of the parallel workers.

Regards,

--
Masahiko Sawada
On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > On Mon, Oct 7, 2019 at 12:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > > I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.
> > >
> > > ginInsertCleanup() may now be the worst piece of code in the entire tree, so no surprise that it gets this wrong too.
> > >
> > > 2016's commit e2c79e14d99 ripped out the following comment about the use of maintenance_work_mem by ginInsertCleanup():
> > >
> > > @@ -821,13 +847,10 @@ ginInsertCleanup(GinState *ginstate,
> > >       * Is it time to flush memory to disk? Flush if we are at the end of
> > >       * the pending list, or if we have a full row and memory is getting
> > >       * full.
> > > -     *
> > > -     * XXX using up maintenance_work_mem here is probably unreasonably
> > > -     * much, since vacuum might already be using that much.
> > >       */
> > >
> > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> >
> > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
>
> I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem. And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> > >
> > > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
> >
> > I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem.

Sure, having another work_mem parameter for the GIN indexes, which controls when we need to flush the pending list, will make life easier for us as programmers. I think if we have a specific parameter for this purpose, then we can even think of using the same one for cleanup during the insert operation as well. However, I am not sure how easy it would be for users: basically, they now need to remember another parameter for which there is no easy way to know what the value should be. I think one would have to check gin_metapage_info->n_pending_pages and, based on that, configure the value of this parameter to get the maximum benefit possible.

Can we think of using work_mem for this? Basically, we use work_mem during the insert operation, so why not use it during the vacuum operation for this purpose?
Another idea could be to try to divide maintenance_work_mem smartly if we knew the value of pending_pages for each GIN index, but I think for that we would need to either read the metapage or maybe use some sort of stats available to vacuum. We would need to somehow divide it based on the amount of memory required for the number of dead tuples in the heap and the memory required by the tuples in the pending list. I am not sure how feasible this approach is.

About the difficulty for users of tuning one or two parameters for vacuum: I think if they can compute what the values for the GUCs would be separately, then why can't they add them up and set that as one value? Having said that, I am not denying that having a separate parameter gives better control, and for this specific case a separate parameter would allow us to use it both during vacuum and insert operations.

> > And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
>
> IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.

I am not sure how different that is from the current situation. Basically, now it can use up to 2 * maintenance_work_mem, and if we do what Sawada-san is proposing, then it will be maintenance_work_mem + gin_*_work_mem. Do you have some other alternative idea in mind, or do you think the current situation is better than anything else we can do in this area?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
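The "smart division" idea above can be sketched as a proportional split (illustrative Python, not PostgreSQL code; the function name, the per-pending-page weight, and the concrete numbers are invented placeholders -- the mail only says the inputs might come from the GIN metapage's pending-pages count or from stats):

```python
# Illustrative sketch: apportion maintenance_work_mem between heap
# dead-tuple storage and each GIN index's pending-list cleanup in
# proportion to their estimated needs, so the total stays within the GUC.

def divide_work_mem(maintenance_work_mem_kb, est_dead_tuple_kb,
                    pending_pages_per_index, kb_per_pending_page=8):
    # Estimate each GIN index's need from its pending-page count.
    gin_need = [p * kb_per_pending_page for p in pending_pages_per_index]
    total_need = est_dead_tuple_kb + sum(gin_need)
    scale = maintenance_work_mem_kb / total_need  # shrink shares if over budget
    heap_share = int(est_dead_tuple_kb * scale)
    gin_shares = [int(n * scale) for n in gin_need]
    return heap_share, gin_shares

heap_kb, gin_kb = divide_work_mem(65536, 49152, [1024, 1024])
assert heap_kb + sum(gin_kb) <= 65536  # total stays within maintenance_work_mem
```

As the mail says, the feasibility is unclear: getting per-index pending-page estimates would require reading each metapage or keeping suitable stats.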
On Wed, Oct 9, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> > > >
> > > > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
> > >
> > > I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem.
>
> Sure, having another work_mem parameter for the GIN indexes, which controls when we need to flush the pending list, will make life easier for us as programmers. I think if we have a specific parameter for this purpose, then we can even think of using the same one for cleanup during the insert operation as well. However, I am not sure how easy it would be for users: basically, they now need to remember another parameter for which there is no easy way to know what the value should be. I think one would have to check gin_metapage_info->n_pending_pages and, based on that, configure the value of this parameter to get the maximum benefit possible.
>
> Can we think of using work_mem for this? Basically, we use work_mem during the insert operation, so why not use it during the vacuum operation for this purpose?
> Another idea could be to try to divide maintenance_work_mem smartly if we knew the value of pending_pages for each GIN index, but I think for that we would need to either read the metapage or maybe use some sort of stats available to vacuum. We would need to somehow divide it based on the amount of memory required for the number of dead tuples in the heap and the memory required by the tuples in the pending list. I am not sure how feasible this approach is.
>
> About the difficulty for users of tuning one or two parameters for vacuum: I think if they can compute what the values for the GUCs would be separately, then why can't they add them up and set that as one value? Having said that, I am not denying that having a separate parameter gives better control, and for this specific case a separate parameter would allow us to use it both during vacuum and insert operations.
>
> > > And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
> >
> > IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.
>
> I am not sure how different that is from the current situation. Basically, now it can use up to 2 * maintenance_work_mem, and if we do what Sawada-san is proposing, then it will be maintenance_work_mem + gin_*_work_mem. Do you have some other alternative idea in mind, or do you think the current situation is better than anything else we can do in this area?

I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better: at least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
[1] https://www.postgresql.org/message-id/CAA4eK1JhY88BXC%3DZK%3D89MALm%2BLyMkMhi6WG6AZfE4%2BKij6mebg%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > > > ISTM that the use of maintenance_work_mem wasn't given that much thought originally.
> > > > >
> > > > > One idea to do something better could be to check whether there is a GIN index on the table, and if so use 1/4 (25%, or whatever) of maintenance_work_mem for GIN indexes and 3/4 (75%) of maintenance_work_mem for collecting dead tuples.
> > > >
> > > > I felt that it would not be easy for users to tune maintenance_work_mem which controls more than one thing. If this is an index-AM (GIN) specific issue we might rather want to control the memory limit of pending-list cleanup by a separate GUC parameter like gin_pending_list_limit, say gin_pending_list_work_mem.
> >
> > Sure, having another work_mem parameter for the GIN indexes, which controls when we need to flush the pending list, will make life easier for us as programmers. I think if we have a specific parameter for this purpose, then we can even think of using the same one for cleanup during the insert operation as well. However, I am not sure how easy it would be for users: basically, they now need to remember another parameter for which there is no easy way to know what the value should be. I think one would have to check gin_metapage_info->n_pending_pages and, based on that, configure the value of this parameter to get the maximum benefit possible.
> >
> > Can we think of using work_mem for this? Basically, we use work_mem during the insert operation, so why not use it during the vacuum operation for this purpose?
> >
> > Another idea could be to try to divide maintenance_work_mem smartly if we knew the value of pending_pages for each GIN index, but I think for that we would need to either read the metapage or maybe use some sort of stats available to vacuum. We would need to somehow divide it based on the amount of memory required for the number of dead tuples in the heap and the memory required by the tuples in the pending list. I am not sure how feasible this approach is.
> >
> > About the difficulty for users of tuning one or two parameters for vacuum: I think if they can compute what the values for the GUCs would be separately, then why can't they add them up and set that as one value? Having said that, I am not denying that having a separate parameter gives better control, and for this specific case a separate parameter would allow us to use it both during vacuum and insert operations.
> >
> > > > And we can then give (the memory for GIN pending-list cleanup / # of GIN indexes) to the parallel workers.
> > >
> > > IMHO if we do that, then we will lose the meaning of having maintenance_work_mem, right? Then the user cannot control how much memory the autovacuum worker will use.
> >
> > I am not sure how different that is from the current situation. Basically, now it can use up to 2 * maintenance_work_mem, and if we do what Sawada-san is proposing, then it will be maintenance_work_mem + gin_*_work_mem. Do you have some other alternative idea in mind, or do you think the current situation is better than anything else we can do in this area?
>
> I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better.
> However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.

I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.

Regards,

--
Masahiko Sawada
On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
>
> I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.

I can understand your concern about why dividing maintenance_work_mem between vacuuming the heap and cleaning up the indexes might be tricky, especially because of third-party indexes, but introducing a new GUC isn't free either. I think that should be the last resort, and we need buy-in from more people for that. Did you consider using work_mem for this?

And even if we want to go with a new GUC, maybe it is better to give it some generic name like maintenance_index_work_mem or something along those lines, so that it can be used for other index cleanups as well if required.

Tom, Teodor, do you have any opinion on this matter?
This has been introduced by commit:

commit ff301d6e690bb5581502ea3d8591a1600fd87acc
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Mar 24 20:17:18 2009 +0000

    Implement "fastupdate" support for GIN indexes, in which we try to
    accumulate multiple index entries in a holding area before adding them
    to the main index structure.  This helps because bulk insert is
    (usually) significantly faster than retail insert for GIN.
    ..
    ..
    Teodor Sigaev

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 10, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
> >
> > I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.
>
> I can understand your concern about why dividing maintenance_work_mem between vacuuming the heap and cleaning up the indexes might be tricky, especially because of third-party indexes, but introducing a new GUC isn't free either. I think that should be the last resort, and we need buy-in from more people for that. Did you consider using work_mem for this?

Yeah, that seems workable too. But I wonder if it could be a similar story to gin_pending_list_limit. I mean that previously we used work_mem as the maximum size of the GIN pending list.
But we concluded that it was not appropriate to control both by one GUC, so we introduced gin_pending_list_limit and the storage parameter in commit 263865a4 (it was originally pending_list_cleanup_size, but was renamed to gin_pending_list_limit in commit c291503b1). I feel that that story is quite similar to this issue.

Regards,

--
Masahiko Sawada
On Thu, Oct 10, 2019 at 2:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Oct 10, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > > I think the current situation is not good, but if we try to cap it at maintenance_work_mem + gin_*_work_mem, then I also don't think it will make the situation much better. However, I think the idea you proposed up-thread [1] is better. At least maintenance_work_mem will be the top limit on what the autovacuum worker can use.
> > >
> > > I'm concerned that there are other index AMs that could consume much memory like GIN does. In principle we can vacuum third-party index AMs, and we will even be able to vacuum them in parallel. I expect maintenance_work_mem to be the top limit on the memory usage of a maintenance command, but it's actually hard for the core to set a limit on the memory usage of bulkdelete and cleanup. So I thought that since GIN is one of the in-core index AMs, it could have a new parameter to make its job faster. If we have that parameter, it might not make the current situation much better, but users will be able to set a lower value for that parameter so as not to use much memory, while keeping the number of index vacuums the same.
> >
> > I can understand your concern about why dividing maintenance_work_mem between vacuuming the heap and cleaning up the indexes might be tricky, especially because of third-party indexes, but introducing a new GUC isn't free either. I think that should be the last resort, and we need buy-in from more people for that. Did you consider using work_mem for this?
>
> Yeah, that seems workable too. But I wonder if it could be a similar story to gin_pending_list_limit.
I mean that previously we used to use > work_mem as the maximum size of GIN pending list. But we concluded > that it was not appropriate to control both by one GUC so we > introduced gin_pending_list_limit and the storage parameter at commit > 263865a4 > It seems you want to say about commit id a1b395b6a26ae80cde17fdfd2def8d351872f399. I wonder why they have not changed it to gin_pending_list_limit (at that time pending_list_cleanup_size) in that commit itself? I think if we want to use gin_pending_list_limit in this context, then we can replace both work_mem and maintenance_work_mem with gin_pending_list_limit. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 10, 2019 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Oct 10, 2019 at 2:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 10, 2019 at 3:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Oct 10, 2019 at 9:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > I think the current situation is not good but if we try to cap it to > > > > > maintenance_work_mem + gin_*_work_mem then also I don't think it will > > > > > make the situation much better. However, I think the idea you > > > > > proposed up-thread[1] is better. At least the maintenance_work_mem > > > > > will be the top limit what the auto vacuum worker can use. > > > > > > > > > > > > > I'm concerned that there are other index AMs that could consume more > > > > memory like GIN. In principle we can vacuum third party index AMs and > > > > will be able to even parallel vacuum them. I expect that > > > > maintenance_work_mem is the top limit of memory usage of maintenance > > > > command but actually it's hard to set the limit to memory usage of > > > > bulkdelete and cleanup by the core. So I thought that since GIN is the > > > > one of the index AM it can have a new parameter to make its job > > > > faster. If we have that parameter it might not make the current > > > > situation much better but user will be able to set a lower value to > > > > that parameter to not use the memory much while keeping the number of > > > > index vacuums. > > > > > > > > > > I can understand your concern why dividing maintenance_work_mem for > > > vacuuming heap and cleaning up the index might be tricky especially > > > because of third party indexes, but introducing new Guc isn't free > > > either. I think that should be the last resort and we need buy-in > > > from more people for that. Did you consider using work_mem for this? 
> > > > Yeah that seems work too. But I wonder if it could be the similar > > story to gin_pending_list_limit. I mean that previously we used to use > > work_mem as the maximum size of GIN pending list. But we concluded > > that it was not appropriate to control both by one GUC so we > > introduced gin_pending_list_limit and the storage parameter at commit > > 263865a4 > > > > It seems you want to say about commit id > a1b395b6a26ae80cde17fdfd2def8d351872f399. Yeah thanks. > I wonder why they have not > changed it to gin_pending_list_limit (at that time > pending_list_cleanup_size) in that commit itself? I think if we want > to use gin_pending_list_limit in this context then we can replace both > work_mem and maintenance_work_mem with gin_pending_list_limit. Hmm, as far as I can see from the discussion, no one mentioned maintenance_work_mem. Perhaps we just overlooked it? I also didn't know that. I also think we can replace at least the work_mem for cleanup of the pending list with gin_pending_list_limit. In the following comment in ginfast.c,

/*
 * Force pending list cleanup when it becomes too long. And,
 * ginInsertCleanup could take significant amount of time, so we prefer to
 * call it when it can do all the work in a single collection cycle. In
 * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it
 * while pending list is still small enough to fit into
 * gin_pending_list_limit.
 *
 * ginInsertCleanup() should not be called inside our CRIT_SECTION.
 */
cleanupSize = GinGetPendingListCleanupSize(index);
if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L)
    needCleanup = true;

ISTM the gin_pending_list_limit in the above comment corresponds to the following code in ginfast.c,

/*
 * We are called from regular insert and if we see concurrent cleanup
 * just exit in hope that concurrent process will clean up pending
 * list.
 */
if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
    return;
workMemory = work_mem;

If work_mem is smaller than gin_pending_list_limit, the pending list cleanup would behave against the intention of the above comment, which prefers to do all the work in a single collection cycle while the pending list is still small enough to fit into gin_pending_list_limit. Regards, -- Masahiko Sawada
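To make the mismatch concrete, here is a toy sketch, not PostgreSQL code: the value of GIN_PAGE_FREESIZE and both function names below are invented for illustration. It shows the two separate knobs involved — cleanup is *triggered* once the pending list outgrows gin_pending_list_limit, while the regular-insert path *budgets* the cleanup with work_mem — so a work_mem smaller than gin_pending_list_limit cannot cover the whole list in one collection cycle:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed usable bytes per pending-list page (illustrative constant; the
 * real value in PostgreSQL depends on BLCKSZ and page headers). */
#define GIN_PAGE_FREESIZE 8000L

/* Cleanup is forced once the pending list outgrows gin_pending_list_limit
 * (cleanup_size_kb, in KB), mirroring the quoted ginfast.c check. */
static bool
pending_list_needs_cleanup(long n_pending_pages, long cleanup_size_kb)
{
    return n_pending_pages * GIN_PAGE_FREESIZE > cleanup_size_kb * 1024L;
}

/* In the regular-insert path the cleanup's working memory is capped by
 * work_mem, not by gin_pending_list_limit.  If work_mem is the smaller
 * of the two, one "collection cycle" cannot cover the whole list. */
static bool
cleanup_fits_single_cycle(long work_mem_kb, long cleanup_size_kb)
{
    return work_mem_kb >= cleanup_size_kb;
}
```

With, say, work_mem = 1MB and gin_pending_list_limit = 4MB, cleanup only triggers at ~4MB of pending pages, yet each cycle can collect only ~1MB, which is exactly the multi-cycle behavior the ginfast.c comment tries to avoid.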
On Fri, Oct 11, 2019 at 7:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 10, 2019 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > It seems you want to say about commit id > > a1b395b6a26ae80cde17fdfd2def8d351872f399. > > Yeah thanks. > > > I wonder why they have not > > changed it to gin_penidng_list_limit (at that time > > pending_list_cleanup_size) in that commit itself? I think if we want > > to use gin_pending_list_limit in this context then we can replace both > > work_mem and maintenance_work_mem with gin_penidng_list_limit. > > Hmm as far as I can see the discussion, no one mentioned about > maintenance_work_mem. Perhaps we just oversighted? > It is possible, but we can't be certain until those people confirm the same. > I also didn't know > that. > > I also think we can replace at least the work_mem for cleanup of > pending list with gin_pending_list_limit. In the following comment in > ginfast.c, > Agreed, but that won't solve the original purpose for which we started this thread. > /* > * Force pending list cleanup when it becomes too long. And, > * ginInsertCleanup could take significant amount of time, so we prefer to > * call it when it can do all the work in a single collection cycle. In > * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it > * while pending list is still small enough to fit into > * gin_pending_list_limit. > * > * ginInsertCleanup() should not be called inside our CRIT_SECTION. > */ > cleanupSize = GinGetPendingListCleanupSize(index); > if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L) > needCleanup = true; > > ISTM the gin_pending_list_limit in the above comment corresponds to > the following code in ginfast.c, > > /* > * We are called from regular insert and if we see concurrent cleanup > * just exit in hope that concurrent process will clean up pending > * list. 
> */ > if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock)) > return; > workMemory = work_mem; > > If work_mem is smaller than gin_pending_list_limit the pending list > cleanup would behave against the intention of the above comment that > prefers to do all the work in a single collection cycle while pending > list is still small enough to fit into gin_pending_list_limit. > That's right, but OTOH, if the user specifies gin_pending_list_limit as an option during Create Index with a value greater than the GUC gin_pending_list_limit, then we will also face the same problem. It seems to me that we haven't thought enough about memory usage during Gin pending list cleanup and I don't want to independently make a decision to change it. So unless the original author/committer or some other people who have worked in this area share their opinion, we can leave it as it is and make the parallel vacuum patch adapt to it. Suggestions from others are welcome. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Oct 11, 2019 at 7:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Thu, Oct 10, 2019 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > It seems you want to say about commit id > > > a1b395b6a26ae80cde17fdfd2def8d351872f399. > > > > Yeah thanks. > > > > > I wonder why they have not > > > changed it to gin_penidng_list_limit (at that time > > > pending_list_cleanup_size) in that commit itself? I think if we want > > > to use gin_pending_list_limit in this context then we can replace both > > > work_mem and maintenance_work_mem with gin_penidng_list_limit. > > > > Hmm as far as I can see the discussion, no one mentioned about > > maintenance_work_mem. Perhaps we just oversighted? > > > > It is possible, but we can't be certain until those people confirm the same. > > > I also didn't know > > that. > > > > I also think we can replace at least the work_mem for cleanup of > > pending list with gin_pending_list_limit. In the following comment in > > ginfast.c, > > > > Agreed, but that won't solve the original purpose for which we started > this thread. > > > /* > > * Force pending list cleanup when it becomes too long. And, > > * ginInsertCleanup could take significant amount of time, so we prefer to > > * call it when it can do all the work in a single collection cycle. In > > * non-vacuum mode, it shouldn't require maintenance_work_mem, so fire it > > * while pending list is still small enough to fit into > > * gin_pending_list_limit. > > * > > * ginInsertCleanup() should not be called inside our CRIT_SECTION. 
> > */ > > cleanupSize = GinGetPendingListCleanupSize(index); > > if (metadata->nPendingPages * GIN_PAGE_FREESIZE > cleanupSize * 1024L) > > needCleanup = true; > > > > ISTM the gin_pending_list_limit in the above comment corresponds to > > the following code in ginfast.c, > > > > /* > > * We are called from regular insert and if we see concurrent cleanup > > * just exit in hope that concurrent process will clean up pending > > * list. > > */ > > if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock)) > > return; > > workMemory = work_mem; > > > > If work_mem is smaller than gin_pending_list_limit the pending list > > cleanup would behave against the intention of the above comment that > > prefers to do all the work in a single collection cycle while pending > > list is still small enough to fit into gin_pending_list_limit. > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > as an option during Create Index with a value greater than GUC > gin_pending_list_limit, then also we will face the same problem. It > seems to me that we haven't thought enough on memory usage during Gin > pending list cleanup and I don't want to independently make a decision > to change it. So unless the original author/committer or some other > people who have worked in this area share their opinion, we can leave > it as it is and make a parallel vacuum patch adapt to it. Yeah, I totally agree. Apart from the GIN problem, can we discuss whether we need to change the current memory usage policy of parallel utility commands described in the docs? We cannot control the memory usage inside index AMs after all, but we need to generically consider how much memory is used during parallel vacuum. Regards, -- Masahiko Sawada
On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > as an option during Create Index with a value greater than GUC > > gin_pending_list_limit, then also we will face the same problem. It > > seems to me that we haven't thought enough on memory usage during Gin > > pending list cleanup and I don't want to independently make a decision > > to change it. So unless the original author/committer or some other > > people who have worked in this area share their opinion, we can leave > > it as it is and make a parallel vacuum patch adapt to it. > > Yeah I totally agreed. > > Apart from the GIN problem can we discuss whether need to change the > current memory usage policy of parallel utility command described in > the doc? We cannot control the memory usage in index AM after all but > we need to generically consider how much memory is used during > parallel vacuum. > Do you mean to say change the docs for a parallel vacuum patch in this regard? If so, I think we might want to do something for maintenance_work_mem for parallel vacuum as described in one of the emails above [1] and then change the docs accordingly. [1] - https://www.postgresql.org/message-id/CAA4eK1JhpNsTiHj%2BJOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Oct 12, 2019 at 8:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > > as an option during Create Index with a value greater than GUC > > > gin_pending_list_limit, then also we will face the same problem. It > > > seems to me that we haven't thought enough on memory usage during Gin > > > pending list cleanup and I don't want to independently make a decision > > > to change it. So unless the original author/committer or some other > > > people who have worked in this area share their opinion, we can leave > > > it as it is and make a parallel vacuum patch adapt to it. > > > > Yeah I totally agreed. > > > > Apart from the GIN problem can we discuss whether need to change the > > current memory usage policy of parallel utility command described in > > the doc? We cannot control the memory usage in index AM after all but > > we need to generically consider how much memory is used during > > parallel vacuum. > > > > Do you mean to say change the docs for a parallel vacuum patch in this > regard? If so, I think we might want to do something for > maintenance_work_mem for parallel vacuum as described in one of the > emails above [1] and then change the docs accordingly. > Yes, agreed. I thought that we could discuss that while waiting for other opinions on the memory usage of gin index's pending list cleanup. For example, one of your suggestions [1] is simple and maybe acceptable, but I guess it can deal only with gin indexes and not with other index AMs that might consume more memory. [1] https://www.postgresql.org/message-id/CAA4eK1JhpNsTiHj%2BJOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q%40mail.gmail.com Regards, -- Masahiko Sawada
On Wed, Oct 16, 2019 at 7:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Sat, Oct 12, 2019 at 8:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > > > as an option during Create Index with a value greater than GUC > > > > gin_pending_list_limit, then also we will face the same problem. It > > > > seems to me that we haven't thought enough on memory usage during Gin > > > > pending list cleanup and I don't want to independently make a decision > > > > to change it. So unless the original author/committer or some other > > > > people who have worked in this area share their opinion, we can leave > > > > it as it is and make a parallel vacuum patch adapt to it. > > > > > > Yeah I totally agreed. > > > > > > Apart from the GIN problem can we discuss whether need to change the > > > current memory usage policy of parallel utility command described in > > > the doc? We cannot control the memory usage in index AM after all but > > > we need to generically consider how much memory is used during > > > parallel vacuum. > > > > > > > Do you mean to say change the docs for a parallel vacuum patch in this > > regard? If so, I think we might want to do something for > > maintenance_work_mem for parallel vacuum as described in one of the > > emails above [1] and then change the docs accordingly. > > > > Yes agreed. I thought that we can discuss that while waiting for other > opinion on the memory usage of gin index's pending list cleanup. For > example one of your suggestions[1] is simple and maybe acceptable but > I guess that it can deal with only gin indexes but not other index AMs > that might consume more memory. 
> It is not the case that other indexes currently don't use any additional memory (except for maintenance_work_mem). For example, a Gist index can use memory for collecting empty leaf pages and internal pages. I am not sure if we can do anything for such cases. The case for the Gin index seems to be clear, and it seems to carry the risk of using much more memory, so why not try to do something for it? We can probably write the code such that it can be extended for other index methods in the future if required. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 7:20 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Sat, Oct 12, 2019 at 8:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Oct 12, 2019 at 10:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > On Fri, Oct 11, 2019 at 5:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > That's right, but OTOH, if the user specifies gin_pending_list_limit > > > > > as an option during Create Index with a value greater than GUC > > > > > gin_pending_list_limit, then also we will face the same problem. It > > > > > seems to me that we haven't thought enough on memory usage during Gin > > > > > pending list cleanup and I don't want to independently make a decision > > > > > to change it. So unless the original author/committer or some other > > > > > people who have worked in this area share their opinion, we can leave > > > > > it as it is and make a parallel vacuum patch adapt to it. > > > > > > > > Yeah I totally agreed. > > > > > > > > Apart from the GIN problem can we discuss whether need to change the > > > > current memory usage policy of parallel utility command described in > > > > the doc? We cannot control the memory usage in index AM after all but > > > > we need to generically consider how much memory is used during > > > > parallel vacuum. > > > > > > > > > > Do you mean to say change the docs for a parallel vacuum patch in this > > > regard? If so, I think we might want to do something for > > > maintenance_work_mem for parallel vacuum as described in one of the > > > emails above [1] and then change the docs accordingly. > > > > > > > Yes agreed. I thought that we can discuss that while waiting for other > > opinion on the memory usage of gin index's pending list cleanup. 
For > > example one of your suggestions[1] is simple and maybe acceptable but > > I guess that it can deal with only gin indexes but not other index AMs > > that might consume more memory. > > > > It is not that currently, other indexes don't use any additional > > memory (except for maintenance_work_mem). For example, Gist index > > can use memory for collecting empty leaf pages and internal pages. I > > am not sure if we can do anything for such cases. The case for Gin > > index seems to be clear and it seems to be having the risk of using > > much more memory, so why not try to do something for it? Yeah, the gin index case is clear now and I agree that we need to do something for it. But I'm also concerned about third-party index AMs. Similar to the problem related to the IndexBulkDeleteResult structure that we're discussing on another thread, I think we have the same problem here. Regards, -- Masahiko Sawada
It's a bit unfortunate that we're doing the pending list flush while the vacuum memory is allocated at all. Is there any reason other than the way the callbacks are defined that gin doesn't do the pending list flush before vacuum does the heap scan and before it allocates any memory using maintenance_work_mem?
(I'm guessing doing it after vacuum is finished would have different problems with tuples in the pending queue not getting vacuumed?)
On Thu, Oct 17, 2019 at 6:07 AM Greg Stark <stark@mit.edu> wrote: > > It's a bit unfortunate that we're doing the pending list flush while the vacuum memory is allocated at all. Is there any reason other than the way the callbacks are defined that gin doesn't do the pending list flush before vacuum does the heap scan and before it allocates any memory using maintenance_work_mem? > I can't think of any other reason. Can we think of doing it as a separate phase for indexes? That can help in containing the memory usage to a maximum of maintenance_work_mem for Gin indexes, but I am not sure how useful it is for other index AM's. Another idea could be that we do something special (cleanup of pending list) just for Gin indexes before heap scan in Vacuum. > (I'm guessing doing it after vacuum is finished would have different problems with tuples in the pending queue not getting vacuumed?) Yeah, I also think so. BTW, good point! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 16, 2019 at 5:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > It is not that currently, other indexes don't use any additional > > memory (except for maintenance_work_mem). For example, Gist index > > can use memory for collecting empty leaf pages and internal pages. I > > am not sure if we can do anything for such cases. The case for Gin > > index seems to be clear and it seems to be having the risk of using > > much more memory, so why not try to do something for it? > > Yeah gin indexes is clear now and I agree that we need to do something > for it. But I'm also concerned third party index AMs. Similar to the > problem related to IndexBulkDeleteResult structure that we're > discussing on another thread I thought that we have the same problem > on this. > I understand your concern, but I am not sure what is a good way to deal with it. I think we can do something generic like divide the maintenance_work_mem equally among workers, but then the indexes that use maintenance_work_mem will suffer if the number of such indexes is much less than the number of indexes that don't use maintenance_work_mem. Another idea could be that each index AM tells whether it uses maintenance_work_mem or not, and based on that we can do the computation (divide the maintenance_work_mem by the number of such indexes during parallel vacuum). Do you have any other ideas for this? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
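As a rough sketch of that second idea — purely illustrative, none of these names exist in PostgreSQL — the budget computation could look like this: divide maintenance_work_mem among just those indexes whose AM reports that it consumes maintenance_work_mem, instead of dividing it blindly among all workers.

```c
#include <assert.h>

/*
 * Illustrative only: compute the per-index share of maintenance_work_mem
 * (in KB), counting only the indexes that actually draw on it.
 */
static long
mwm_share_kb(long maintenance_work_mem_kb, int nindexes_using_mwm)
{
    if (nindexes_using_mwm <= 0)
        return maintenance_work_mem_kb; /* no one to share with */
    return maintenance_work_mem_kb / nindexes_using_mwm;
}
```

For example, with maintenance_work_mem = 64MB and four such indexes, each would get 16MB, while indexes whose AMs don't use maintenance_work_mem would not shrink anyone's share.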
Amit Kapila <amit.kapila16@gmail.com> writes: > Another idea could be each index AM tell whether it uses > maintenance_work_mem or not and based on that we can do the > computation (divide the maintenance_work_mem by the number of such > indexes during parallel vacuum). FWIW, that seems like a perfectly reasonable API addition to me. regards, tom lane
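A minimal sketch of what such an API addition might look like — the struct and field names here are hypothetical stand-ins, though PostgreSQL's IndexAmRoutine would be the natural home for such a flag:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for an index AM descriptor carrying
 * the proposed flag: does this AM's bulkdelete/cleanup phase draw on
 * maintenance_work_mem? */
typedef struct DemoAmRoutine
{
    const char *amname;
    bool        amusemaintenanceworkmem;
} DemoAmRoutine;

/* Parallel vacuum could count the AMs that report using
 * maintenance_work_mem and divide the budget among only those. */
static int
count_mwm_users(const DemoAmRoutine *ams, int nams)
{
    int n = 0;
    for (int i = 0; i < nams; i++)
        if (ams[i].amusemaintenanceworkmem)
            n++;
    return n;
}
```

A third-party AM would simply set the flag in its descriptor, so the scheme extends beyond the in-core AMs without the core having to know each AM's memory behavior.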
On Thu, Oct 17, 2019 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 16, 2019 at 5:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > It is not that currently, other indexes don't use any additional > > > memory (except for maintenance_work_mem). For example, Gist index > > > can use memory for collecting empty leaf pages and internal pages. I > > > am not sure if we can do anything for such cases. The case for Gin > > > index seems to be clear and it seems to be having the risk of using > > > much more memory, so why not try to do something for it? > > > > Yeah gin indexes is clear now and I agree that we need to do something > > for it. But I'm also concerned third party index AMs. Similar to the > > problem related to IndexBulkDeleteResult structure that we're > > discussing on another thread I thought that we have the same problem > > on this. > > > > I understand your concern, but I am not sure what is a good way to > deal with it. I think we can do something generic like divide the > maintenance_work_mem equally among workers, but then the indexes that > use maintenance_work_mem will suffer if the number of such indexes is > much less than the indexes that don't use maintenance_work_mem. > Another idea could be each index AM tell whether it uses > maintenance_work_mem or not and based on that we can do the > computation (divide the maintenance_work_mem by the number of such > indexes during parallel vacuum). Do you have any other ideas for > this? > I was thinking along the lines of the latter idea: ask the index AM the amount of memory (that should be part of maintenance_work_mem) it will consume, and then compute the new limit for both the heap scan and index vacuuming based on that. Regards, -- Masahiko Sawada
On Thu, Oct 17, 2019 at 6:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Oct 17, 2019 at 6:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Oct 16, 2019 at 5:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Oct 16, 2019 at 3:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > It is not that currently, other indexes don't use any additional > > > > memory (except for maintainence_work_mem). For example, Gist index > > > > can use memory for collecting empty leaf pages and internal pages. I > > > > am not sure if we can do anything for such cases. The case for Gin > > > > index seems to be clear and it seems to be having the risk of using > > > > much more memory, so why not try to do something for it? > > > > > > Yeah gin indexes is clear now and I agree that we need to do something > > > for it. But I'm also concerned third party index AMs. Similar to the > > > problem related to IndexBulkDeleteResult structure that we're > > > discussing on another thread I thought that we have the same problem > > > on this. > > > > > > > I understand your concern, but I am not sure what is a good way to > > deal with it. I think we can do something generic like divide the > > maintainence_work_mem equally among workers, but then the indexes that > > use maintainence_work_mem will suffer if the number of such indexes is > > much less than the indexes that don't use maintainence_work_mem. > > Another idea could be each index AM tell whether it uses > > maintainence_work_mem or not and based on that we can do the > > computation (divide the maintainence_work_mem by the number of such > > indexes during parallel vacuum). Do you have any other ideas for > > this? 
> > > > I was thinking the similar idea to the latter idea: ask index AM the > amount of memory (that should be part of maintenance_work_mem) it will > consume and then compute the new limit for both heap scan and index > vacuuming based on that. > Oh, that would be trickier than what I am proposing, because it might not be easy for an index AM to tell upfront the amount of memory it needs. I think we can keep it simple for now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 17, 2019 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > Another idea could be each index AM tell whether it uses > > maintainence_work_mem or not and based on that we can do the > > computation (divide the maintainence_work_mem by the number of such > > indexes during parallel vacuum). > > FWIW, that seems like a perfectly reasonable API addition to me. > Thanks, Sawada-san, if you also think this API makes sense, then we can try to write a patch and see how it turns out? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 18, 2019, 11:43 Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 17, 2019 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > Another idea could be each index AM tell whether it uses
> > maintainence_work_mem or not and based on that we can do the
> > computation (divide the maintainence_work_mem by the number of such
> > indexes during parallel vacuum).
>
> FWIW, that seems like a perfectly reasonable API addition to me.
>
Thanks, Sawada-san, if you also think this API makes sense, then we
can try to write a patch and see how it turns out?
Yeah agreed. I can write this patch next week and will
share it.
Regards,