Thread: VACUUM memory management

VACUUM memory management

From:
Ibrar Ahmed
Date:
Hi,

The memory consumption of VACUUM has some issues and could be improved. Some of its limitations are recorded in the comments of the “vacuumlazy.c” file. The current design of VACUUM memory usage is that it stores the TIDs of dead tuples in a fixed-size array, which is allocated at the start based upon maintenance_work_mem. There are three problems with that design:

 - If the value of maintenance_work_mem is too large, it is a waste of memory for small tables.
 - If the value of maintenance_work_mem is too small, or the TIDs do not fit in the array, multiple index-scan passes happen.
 - If maintenance_work_mem is set too large and many VACUUM workers are running at once, the system can run out of memory.

There are two possible solutions to these problems. The first is to use a list instead of a fixed-size array. The second is to allocate the memory in chunks.
The attached WIP patch creates an array of ItemPointers and allocates memory in chunks by dividing maintenance_work_mem into multiple chunks.
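
To make the idea concrete, here is a rough sketch of the chunked storage (illustrative only; the names here are hypothetical, not necessarily the identifiers used in the attached patch):

/* A sketch only: hypothetical names, not the attached patch itself. */
#define DT_CHUNK_BYTES   (8 * 1024 * 1024)	/* illustrative chunk size */
#define DT_CHUNK_TUPLES  (DT_CHUNK_BYTES / sizeof(ItemPointerData))

typedef struct DeadTuplesChunk
{
	int				num_tuples;		/* TIDs stored in this chunk so far */
	ItemPointerData	itemptrs[FLEXIBLE_ARRAY_MEMBER];	/* the TIDs */
} DeadTuplesChunk;

typedef struct DeadTuples
{
	int				num_chunks;		/* chunks allocated so far */
	int				max_chunks;		/* maintenance_work_mem / DT_CHUNK_BYTES */
	DeadTuplesChunk **chunks;		/* palloc'd array of chunk pointers */
} DeadTuples;

/* Append one TID, allocating a further chunk only when the last one fills up. */
static void
dt_record_dead_tuple(DeadTuples *dt, ItemPointer itemptr)
{
	DeadTuplesChunk *chunk;

	if (dt->num_chunks == 0 ||
		dt->chunks[dt->num_chunks - 1]->num_tuples >= (int) DT_CHUNK_TUPLES)
	{
		Assert(dt->num_chunks < dt->max_chunks);	/* stay under maintenance_work_mem */
		chunk = palloc(offsetof(DeadTuplesChunk, itemptrs) +
					   DT_CHUNK_TUPLES * sizeof(ItemPointerData));
		chunk->num_tuples = 0;
		dt->chunks[dt->num_chunks++] = chunk;
	}
	chunk = dt->chunks[dt->num_chunks - 1];
	chunk->itemptrs[chunk->num_tuples++] = *itemptr;
}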

Comments?
--
Ibrar Ahmed
Attachments

Re: VACUUM memory management

From:
Alvaro Herrera
Date:
On 2019-Dec-09, Ibrar Ahmed wrote:

> Hi,
> 
> The memory consumption of VACUUM has some issues and could be improved.
> Some of its limitations are recorded in the comments of the “vacuumlazy.c”
> file. The current design of VACUUM memory usage is that it stores the TIDs
> of dead tuples in a fixed-size array, which is allocated at the start based
> upon maintenance_work_mem. There are three problems with that design:

Did you see this thread?
https://postgr.es/m/CAGTBQpbDCaR6vv9=scXzuT8fSbckf=a3NgZdWFWZbdVugVht6Q@mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM memory management

From:
Ibrar Ahmed
Date:


On Mon, Dec 9, 2019 at 11:54 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
On 2019-Dec-09, Ibrar Ahmed wrote:

> Hi,
>
> The memory consumption of VACUUM has some issues and could be improved.
> Some of its limitations are recorded in the comments of the “vacuumlazy.c”
> file. The current design of VACUUM memory usage is that it stores the TIDs
> of dead tuples in a fixed-size array, which is allocated at the start based
> upon maintenance_work_mem. There are three problems with that design:

Did you see this thread?
https://postgr.es/m/CAGTBQpbDCaR6vv9=scXzuT8fSbckf=a3NgZdWFWZbdVugVht6Q@mail.gmail.com

Yes, and I have done roughly what is explained there.

Robert: "What I think we need to do is make some provision to initially
allocate only a small amount of memory and then grow the allocation
later if needed. For example, instead of having
vacrelstats->dead_tuples be declared as ItemPointer, declare it as
ItemPointer * and allocate the array progressively in segments. I'd
actually argue that the segment size should be substantially smaller
than 1 GB, like say 64MB; there are still some people running systems
which are small enough that allocating 1 GB when we may need only 6
bytes can drive the system into OOM."

I changed vacrelstats->dead_tuples to ItemPointer * and allocate a small amount of memory initially, adding more when needed. What is new in my patch is that it divides maintenance_work_mem into fixed-size chunks.

--
Ibrar Ahmed

Re: VACUUM memory management

From:
Greg Stark
Date:
On Mon, 9 Dec 2019 at 14:03, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> I'd
> actually argue that the segment size should be substantially smaller
> than 1 GB, like say 64MB; there are still some people running systems
> which are small enough that allocating 1 GB when we may need only 6
> bytes can drive the system into OOM."

I don't even see why you would allocate as much as 64MB. I would
think something around 1MB would be more sensible. So you might need
an array of segment pointers a few thousand entries long; big
deal. We can handle repalloc on 8kB arrays pretty easily.
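
A sketch of that growth step (hypothetical names, following the chunked-storage sketch upthread, and assuming dt->chunks is a separately palloc'd pointer array):

/* Illustrative: grow only the small array of chunk pointers; the chunks
 * themselves never move, so no bulk copying of TIDs is needed. */
if (dt->num_chunks == dt->max_chunks)
{
	dt->max_chunks *= 2;		/* e.g. an 8kB pointer array becomes 16kB */
	dt->chunks = repalloc(dt->chunks,
						  dt->max_chunks * sizeof(DeadTuplesChunk *));
}
dt->chunks[dt->num_chunks++] = palloc(DT_CHUNK_BYTES);	/* one modest chunk (~1MB here) */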

-- 
greg



Re: VACUUM memory management

From:
Robert Haas
Date:
On Mon, Dec 9, 2019 at 4:37 PM Greg Stark <stark@mit.edu> wrote:
> On Mon, 9 Dec 2019 at 14:03, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> > I'd
> > actually argue that the segment size should be substantially smaller
> > than 1 GB, like say 64MB; there are still some people running systems
> > which are small enough that allocating 1 GB when we may need only 6
> > bytes can drive the system into OOM."
>
> I don't even see why you would allocate as much as 64MB. I would
> think something around 1MB would be more sensible. So you might need
> an array of segment pointers a few thousand entries long; big
> deal. We can handle repalloc on 8kB arrays pretty easily.

See https://www.postgresql.org/message-id/9bf3fe70-7aac-cbf7-62f7-acdaa4306ccb%40iki.fi

Another consideration is that, if we have parallel VACUUM, this all
needs to be done using DSM or DSA, neither of which is going to do a
fantastic job with lots of 1MB allocations. If you allocate 1MB DSMs,
you'll run out of DSM slots. If you allocate 1MB chunks from DSA,
it'll allocate progressively larger DSMs and give you 1MB chunks from
them. That's probably OK, but you're just wasting whatever memory from
the chunk you don't end up allocating.
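
For reference, carving chunks out of a DSA area would look roughly like this (a sketch only, assuming an already-created dsa_area like the one parallel vacuum sets up, and the hypothetical chunk types from upthread):

/* Sketch: allocate one dead-tuple chunk from a DSA area.  Shared state must
 * store dsa_pointer values rather than raw pointers, because each worker
 * maps the area at a different address. */
dsa_pointer chunk_dp = dsa_allocate(dead_tuples_area, DT_CHUNK_BYTES);
DeadTuplesChunk *chunk = (DeadTuplesChunk *)
	dsa_get_address(dead_tuples_area, chunk_dp);
chunk->num_tuples = 0;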

I suggested 64MB because I don't think many people these days run out
of memory because VACUUM overshoots its required memory budget by a
few tens of megabytes.  The problem is when it overruns by hundreds of
megabytes, and people would like large maintenance_work_mem settings
where the overrun might be gigabytes.

Perhaps there are contrary arguments, but I don't think the cost of
repalloc() is really the issue here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM memory management

From:
Robert Haas
Date:
On Mon, Dec 9, 2019 at 2:02 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>> Did you see this thread?
>> https://postgr.es/m/CAGTBQpbDCaR6vv9=scXzuT8fSbckf=a3NgZdWFWZbdVugVht6Q@mail.gmail.com
>>
> Yes, and I have done roughly what is explained there.

Did you modify Claudio's patch or write a totally new one? In either
case, why did you choose that approach? If you wrote a totally new
one, have you compared your work with Claudio's, to see if he covered
anything you might need to cover? Please explain why your patch is
better/different than his.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM memory management

From:
Ibrar Ahmed
Date:


On Wed, Dec 11, 2019 at 7:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 9, 2019 at 2:02 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> >> Did you see this thread?
> >> https://postgr.es/m/CAGTBQpbDCaR6vv9=scXzuT8fSbckf=a3NgZdWFWZbdVugVht6Q@mail.gmail.com
> >>
> > Yes, and I have done roughly what is explained there.
>
> Did you modify Claudio's patch or write a totally new one?
 
I wrote a completely new patch. I tried multiple techniques, such as using a list instead of a fixed-size array, which I had thought most suitable here, but abandoned that because it conflicts with parallel vacuum.
 
> In either case, why did you choose that approach?
 
This is the simplest technique: I just divided maintenance_work_mem into chunks and allocate chunks as needed. This approach changes a minimal amount of code and does what we want to achieve.
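
For example (illustrative arithmetic, not the patch's exact constants; the maintenance_work_mem GUC is measured in kilobytes and each TID is a 6-byte ItemPointerData):

/* Sketch of the chunk arithmetic, not the patch's exact code. */
Size	chunk_bytes = 64 * 1024 * 1024;		/* one chunk */
int		max_chunks = (maintenance_work_mem * 1024L) / chunk_bytes;	/* 16 chunks for 1GB */
long	tuples_per_chunk = chunk_bytes / sizeof(ItemPointerData);	/* ~11.1 million TIDs */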
 
> If you wrote a totally new one, have you compared your work with Claudio's, to see if he covered
> anything you might need to cover?
 
No, I missed that part; I will do that and share my thoughts.

 
> Please explain why your patch is
> better/different than his.

 


--
Ibrar Ahmed

Re: VACUUM memory management

From:
Ibrar Ahmed
Date:


On Wed, Dec 11, 2019 at 9:29 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

> > If you wrote a totally new one, have you compared your work with Claudio's,
> > to see if he covered anything you might need to cover? Please explain why
> > your patch is better/different than his.
>
> No, I missed that part; I will do that and share my thoughts.

I checked the patch, and it does not do anything special that my patch does not do, except one thing: it also raises the 1GB limit, which I have not touched. In my case, we stay under the maintenance_work_mem limit but allocate memory in chunks. That gives you the leverage to set a large maintenance_work_mem (even if you don't need it), because the memory is not all allocated at the start.

Secondly, his patch refactors the whole area of code, which makes it larger than expected; its code changes are almost double those of my patch. By the way, I have now taken the test cases from his patch and included them in mine (credit: Claudio).



--
Ibrar Ahmed
Attachments

RE: VACUUM memory management

From:
"k.jamison@fujitsu.com"
Date:

Hi Ibrar,

Are you still working on this patch?
Currently the patch does not apply, mainly because recent commits for parallel
vacuum have updated the files touched by this patch.
Kindly rebase it and change the status to "Needs Review" afterwards.

Upon a quick scan of the other thread [1] mentioned above, I believe the people
involved had consensus on the same direction of allocating memory in chunks,
dynamically allocating when needed. A point for discussion was the size of the
chunk allocation.

After a brief look at your patch, there's a typo between the declaration and
definition of lazy_vacuum_page(): arryindex --> arrindex

static int	lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-							 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+							 int arryindex, int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);

static int
lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int arrindex, int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)

Unnecessary whitespace-only change:

-           long                  maxtuples;
-           int                                vac_work_mem = IsAutoVacuumWorkerProcess() &&
+          long        maxtuples;
+          int         vac_work_mem = IsAutoVacuumWorkerProcess() &&

Other typos:

+ * pg_bsearch() -- bsearch algorithem for two dimention array.

algorithem --> algorithm
dimention --> dimension

I might have missed something more, but I'll continue reviewing after the
rebased patch.

Regards,
Kirk Jamison

[1] https://www.postgresql.org/message-id/flat/CAGTBQpbDCaR6vv9%3DscXzuT8fSbckf%3Da3NgZdWFWZbdVugVht6Q%40mail.gmail.com

Re: VACUUM memory management

From:
Ibrar Ahmed
Date:


On Wed, Jan 22, 2020 at 11:17 AM k.jamison@fujitsu.com <k.jamison@fujitsu.com> wrote:

> Are you still working on this patch?
> Currently the patch does not apply, mainly because recent commits for parallel
> vacuum have updated the files touched by this patch.
> Kindly rebase it and change the status to "Needs Review" afterwards.

Hi,
Yes, I am working on that. I will send the rebased and updated patch.


--
Ibrar Ahmed

Re: VACUUM memory management

From:
David Steele
Date:
On 1/28/20 1:36 PM, Ibrar Ahmed wrote:
> On Wed, Jan 22, 2020 at 11:17 AM k.jamison@fujitsu.com wrote:
> >     I might have missed something more,
> >     but I'll continue reviewing after the rebased patch.
>
> Yes, I am working on that. I will send the rebased and updated patch.

This patch has not had any updates in months and now we are halfway
through the CF, so I have marked it Returned with Feedback.

If a patch arrives soon I'll be happy to revive the entry, otherwise 
please submit to a future CF when a new patch is available.

Regards,
-- 
-David
david@pgmasters.net



Re: VACUUM memory management

From:
Ibrar Ahmed
Date:


On Mon, Mar 16, 2020 at 6:35 PM David Steele <david@pgmasters.net> wrote:

> This patch has not had any updates in months and now we are halfway
> through the CF, so I have marked it Returned with Feedback.
>
> If a patch arrives soon I'll be happy to revive the entry, otherwise
> please submit to a future CF when a new patch is available.

Here is the latest patch, rebased on master (19db23bcbda99e93321cb0636677ec9c6e121a2a, Fri Apr 3 12:20:42 2020). The patch fixes all the issues that arose after the parallel vacuum patch. It works in the non-parallel case and allocates memory in chunks.

--
Ibrar Ahmed
Attachments

Re: VACUUM memory management

From:
Justin Pryzby
Date:
On Fri, Apr 03, 2020 at 09:04:34PM +0500, Ibrar Ahmed wrote:
> Here is the latest patch, rebased on master
> (19db23bcbda99e93321cb0636677ec9c6e121a2a, Fri Apr 3 12:20:42 2020). The patch
> fixes all the issues that arose after the parallel vacuum patch. It works in
> the non-parallel case and allocates memory in chunks.

This patch seems to break vacuuming.  On unpatched master, it scans the index:

|postgres=# DROP TABLE t; CREATE UNLOGGED TABLE t AS SELECT generate_series(1,499999) a; CREATE INDEX ON t(a);
|postgres=# SET maintenance_work_mem='1024kB';
|postgres=# UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t
 
|...
|INFO:  vacuuming "public.t"
|INFO:  scanned index "t_a_idx" to remove 174472 row versions
|...
|INFO:  index "t_a_idx" now contains 499999 row versions in 4119 pages
|DETAIL:  499999 index row versions were removed.
|...
|INFO:  "t": found 499999 removable, 499999 nonremovable row versions in 4425 out of 4425 pages

With this patch, if chunks are in use, it doesn't scan the indexes.  Also, the
table is continuously growing, which means the heap vacuum is broken, too:
 public | t    | table | pryzbyj | unlogged    | 35 MB |
 public | t    | table | pryzbyj | unlogged    | 47 MB |
 public | t    | table | pryzbyj | unlogged    | 59 MB |
 public | t    | table | pryzbyj | unlogged    | 73 MB |

If chunks *aren't* in use (note smaller table), it looks like at least the
displayed output is wrong for "row versions":
|template1=# DROP TABLE t; CREATE UNLOGGED TABLE t AS SELECT generate_series(1,199999) a; CREATE INDEX ON t(a);
|template1=# SET maintenance_work_mem='1024kB';
|template1=# UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t \\ UPDATE t SET a=1+a; VACUUM VERBOSE t; \dt+ t
 
|...
|UPDATE 199999
|INFO:  vacuuming "public.t"
|INFO:  index "t_a_idx" now contains 0 row versions in 1099 pages
|DETAIL:  0 index row versions were removed.

There's some warnings:
vacuumlazy.c:2882:1: warning: ‘lazy_space_dealloc’ defined but not used [-Wunused-function]
vacuumlazy.c:1883:5: warning: ‘tupindex’ may be used uninitialized in this function [-Wmaybe-uninitialized]

When you call lazy_vacuum_all_indexes() and then lazy_vacuum_heap(), you reset
the per-chunk tuple counts, but not vacrelstats->num_chunks = 0.  It seems to
me, that means that only the *first* iteration of lazy_vacuum_* benefits from
the chunks, and the 2nd and later iterations not only don't benefit, but impose the cost
of an index scan for each chunk, all but the last of which have nothing to do.

The patch is rebased on top of the parallel vacuum implementation, but doesn't
allow parallel vacuum to use "chunks", right?  I think it's important to
handle parallel vacuum somehow, since it's the default for manual (but not auto)
vacuum.

I think max_tuples should be a property of LVRelStats rather than LVDeadTuples,
since it doesn't vary by chunk.  Also, there's a weird thing which seems to be
for initializing new chunks, but happens even if you didn't add a chunk, and
then ends up setting the variable to itself:
|int maxtuples = dead_tuples[num_chunks]->max_tuples;
|...
|dead_tuples[num_chunks]->max_tuples = maxtuples;

Maybe num_tuples should be in LVRelStats, too, indicating the number of tuples
for dead_tuples[num_chunks]->itemptrs.  Looks like that's how it was before
parallel vacuum.  I think now those would need to be in LVShared, too.
Then, LVDeadTuples is nothing but a pointer.  Maybe that would simplify
supporting this for parallel vacuum.

Also, num_tuples is a bad name, since it's also a local variable in
lazy_scan_heap():
|        double          num_tuples,             /* total number of nonremovable tuples */

The patch changes lazy_vacuum_index to iterate N times over the indexes.
That can be pretty expensive.  If there are many dead tuples, this patch starts
with a small memory allocation and grows it dynamically, but at the cost of
doing a multiple of the I/O.  That seems to be a bad tradeoff: if there are 10
chunks, the first index scan will only handle 10% of what's needed.

I wonder if it's possible to make bsearch() handle the list of lists, to allow
doing a single index scan per iteration, rather than num_batches.  I don't
think it accesses the pointers themselves, but rather just calls the callback.
You could make the callback find offset1 and offset2, and compute which chunk
each is in, and the offset within the chunk, and then do the comparison.  Maybe
that's too clever and we should just include our own bsearch().
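
A sketch of what "our own bsearch" could look like (hypothetical except for vac_cmp_itemptr and ItemPointerCompare, which exist today; TIDs are collected in heap order, so the chunk sequence and each individual chunk are both sorted):

/* Sketch: two-level search.  Outer binary search over chunk boundaries,
 * then a plain bsearch() within the single candidate chunk. */
static bool
dt_tid_is_dead(DeadTuples *dt, ItemPointer itemptr)
{
	int		lo = 0;
	int		hi = dt->num_chunks - 1;

	if (dt->num_chunks == 0)
		return false;

	/* Find the first chunk whose last (largest) TID is >= the probe. */
	while (lo < hi)
	{
		int			mid = (lo + hi) / 2;
		DeadTuplesChunk *c = dt->chunks[mid];

		if (ItemPointerCompare(itemptr, &c->itemptrs[c->num_tuples - 1]) > 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	return bsearch(itemptr,
				   dt->chunks[lo]->itemptrs,
				   dt->chunks[lo]->num_tuples,
				   sizeof(ItemPointerData),
				   vac_cmp_itemptr) != NULL;
}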

vacuum only runs in parallel for index vacuum and cleanup (but not heap scan or
heap vacuum).  Right now, dead_tuples[0] is allocated in
begin_parallel_vacuum(), which is called at the beginning of lazy_scan_heap().
I guess it's not possible to dynamically resize that, but is there any reason
you can't destroy it and recreate it as needed during the heap scan?  I guess
one reason is that we want to avoid allocating a new DSM segment of size 2*N in
addition to the existing one of size N, then copying the original allocation to
the new allocation, then destroying the original.  That means a peak memory use
of 3*N, not just 2*N :(  Maybe overcommit/lazy allocation by the OS means
that's not always true...  One way to do it would be: if you gather N dead
tuples, then trigger an index/heap vacuum, then destroy the dead_tuples and
allocate a new one of twice the size (with no need to copy the old one).  That
still incurs the cost of multiple (additional) index scans during the early
iterations, when you have a small allocation, which isn't great.

Marking this patch as RWF.

-- 
Justin



Re: VACUUM memory management

From:
Justin Pryzby
Date:
On Wed, Dec 11, 2019 at 09:29:17PM +0500, Ibrar Ahmed wrote:
> > Did you modify Claudio's patch or write a totally new one?
> 
> I wrote a completely new patch. I tried multiple techniques, such as using a
> list instead of a fixed-size array, which I had thought most suitable here,
> but abandoned that because it conflicts with parallel vacuum.

Using a list will hardly work, or certainly not well, since it needs to be
searched by the ambulkdelete callback.

> >> If you wrote a totally new one, have you compared your work with
> >> Claudio's, to see if he covered
> >> anything you might need to cover?
> >
> I checked the patch, and it does not do anything special that my patch does
> not do, except one thing: it also raises the 1GB limit, which I have not
> touched. In my case, we stay under the maintenance_work_mem limit but allocate
> memory in chunks. That gives you the leverage to set a large
> maintenance_work_mem (even if you don't need it), because the memory is not
> all allocated at the start.

After spending a bunch of time comparing them, I disagree.  Claudio's patch
does these things:

 - avoids using multiple chunks if there are no indexes, since then there is
   no high cost of index scans to avoid;
 - rather than doing an index scan for each chunk (bad), the callback function
   lazy_tid_reaped() does a custom binary search *over* chunks of different
   sizes and then *within* each chunk.  That's maybe slightly over-engineered,
   and I'm not convinced it's needed (but I thought it was pretty clever);
   someone thought that was important.
 - properly keeps track of the *total* number of dead tuples, e.g. for progress
   reporting, and for prev_dead_count for pages with no dead tuples;
 - lazy_record_dead_tuple() doubles the allocation when running out of space
   for dead tuples; some people disagree with that (myself included) but I'm
   including it here since that's what it does.  This still seems nontrivial
   (to me) to adapt to work with parallel query.

-- 
Justin