Re: Vacuum: allow usage of more than 1GB of work mem

From: Pavan Deolasee
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date:
Msg-id: CABOikdPvC=F_ui8TAwQRrDm=B7Ono+qOTN+o3Snen6JeG_DNhA@mail.gmail.com
In reply to: Re: Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
           Re: Vacuum: allow usage of more than 1GB of work mem  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers


On Wed, Sep 7, 2016 at 10:18 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark <stark@mit.edu> wrote:
> On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 6 September 2016 at 19:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> The idea of looking to the stats to *guess* about how many tuples are
>>> removable doesn't seem bad at all.  But imagining that that's going to be
>>> exact is folly of the first magnitude.
>>
>> Yes.  Bear in mind I had already referred to allowing +10% to be safe,
>> so I think we agree that a reasonably accurate, yet imprecise
>> calculation is possible in most cases.
>
> That would all be well and good if it weren't trivial to do what
> Robert suggested. This is just a large unsorted list that we need to
> iterate through. Just allocate chunks of a few megabytes and when
> it's full allocate a new chunk and keep going. There's no need to get
> tricky with estimates and resizing and whatever.

I agree. While the idea of estimating the right size sounds promising
a priori, considering the estimate can go wrong and over or
underallocate quite severely, the risks outweigh the benefits when you
consider the alternative of a dynamic allocation strategy.

Unless the dynamic strategy has a bigger CPU impact than expected, I
believe it's a superior approach.


How about a completely different representation for the TID array? This is probably not a new idea, but I couldn't find whether exactly the same thing has been discussed before. I also think it's somewhat orthogonal to what we are trying to do here, and it would probably be a bigger change. But I thought I'd mention it since we are on the topic.

What I propose is to use a simple bitmap to represent the dead tuples. If the tuple at <block, offset> is dead, the corresponding bit in the bitmap is set. Searches through the dead tuples then become O(1) operations, which matters for very large tables and large arrays.
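
To make it concrete, here is a rough sketch of what the per-page bitmap could look like. This is illustration only: the names are made up and none of it is existing PostgreSQL code.

/*
 * Sketch only -- not PostgreSQL code.  One bit per offset number for
 * each heap page; a set bit means the tuple at that offset is dead.
 * Offset numbers are 1-based, hence the "offnum - 1" when indexing.
 */
#include <stdint.h>
#include <stdbool.h>

typedef struct PageDeadBitmap
{
    uint8_t *bits;           /* (max_offsets + 7) / 8 bytes */
    int      max_offsets;    /* how many offsets this bitmap covers */
} PageDeadBitmap;

static inline void
deadbitmap_set(PageDeadBitmap *bm, int offnum)
{
    bm->bits[(offnum - 1) / 8] |= (uint8_t) (1 << ((offnum - 1) % 8));
}

static inline bool
deadbitmap_test(const PageDeadBitmap *bm, int offnum)
{
    return (bm->bits[(offnum - 1) / 8] & (1 << ((offnum - 1) % 8))) != 0;
}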

The challenge is that a heap page can theoretically have MaxOffsetNumber line pointers (or, to be precise, the maximum possible offset number). For an 8K block, that comes to about 2048. Having that many bits per page is neither practical nor optimal. In practice, though, the largest offset on a heap page should not be significantly greater than MaxHeapTuplesPerPage, which is a more reasonable 291 on my machine. And that's with zero-sized tuples; for real-life large tables with much wider tuples, the number goes down even further.

So we cap the offsets represented in the bitmap at some realistic value, computed by looking at the page density and then multiplying it by a small factor (not more than two) to account for LP_DEAD and LP_REDIRECT line pointers. That should cover the vast majority of the dead tuples in the table, and we keep an overflow area to record tuples beyond the per-page limit. The search routine does a direct bitmap lookup for offsets below the limit and a search in the sorted overflow area for offsets beyond it.
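
Again just as an illustrative sketch (made-up names, assuming the bitmap covers offsets 1..cap for each block and the overflow area is kept sorted by block and offset), the lookup could look like this:

/*
 * Sketch of the search routine described above.  Not an existing
 * PostgreSQL API; the names and layout are hypothetical.
 */
#include <stdint.h>
#include <stdbool.h>

typedef struct OverflowEntry
{
    uint32_t blkno;
    uint16_t offnum;
} OverflowEntry;

static bool
is_dead_tuple(const uint8_t *page_bits,       /* bitmap slice for blkno */
              int cap,                        /* offsets covered by the bitmap */
              const OverflowEntry *overflow,  /* sorted by (blkno, offnum) */
              int noverflow,
              uint32_t blkno, uint16_t offnum)
{
    int lo = 0;
    int hi = noverflow - 1;

    if (offnum <= cap)
        return (page_bits[(offnum - 1) / 8] & (1 << ((offnum - 1) % 8))) != 0;

    /* Binary search, much like the bsearch over the current TID array. */
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (overflow[mid].blkno == blkno && overflow[mid].offnum == offnum)
            return true;
        if (overflow[mid].blkno < blkno ||
            (overflow[mid].blkno == blkno && overflow[mid].offnum < offnum))
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}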

For example, for a table with a 60-byte-wide tuple (including the 24-byte header), each page can hold approximately 8192/60 = 136 tuples. Say we provision for 136 * 2 = 272 bits per page, i.e. 34 bytes per page for the bitmap. The first 272 offsets in every page are represented in the bitmap and anything greater goes to the overflow region. By comparison, the current representation needs about 16 bytes per page assuming 2% dead tuples, 40 bytes per page assuming 5% dead tuples and 80 bytes per page assuming 10% dead tuples. So the bitmap takes more space only for small tuples or when vacuum is run very aggressively, both of which seem unlikely for very large tables. Of course, the calculation does not take into account the space needed by the overflow area, but I expect that to be small too.
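
Spelling out the same arithmetic in a tiny standalone program (sketch only; the 6 bytes per dead tuple corresponds to sizeof(ItemPointerData) in the current representation):

#include <stdio.h>

int
main(void)
{
    int tuples_per_page = 8192 / 60;                     /* ~136 tuples */
    int bitmap_bytes = (tuples_per_page * 2 + 7) / 8;    /* 34 bytes, fixed */

    printf("bitmap: %d bytes/page\n", bitmap_bytes);
    printf("tid array at 2%%:  %.1f bytes/page\n", tuples_per_page * 0.02 * 6);
    printf("tid array at 5%%:  %.1f bytes/page\n", tuples_per_page * 0.05 * 6);
    printf("tid array at 10%%: %.1f bytes/page\n", tuples_per_page * 0.10 * 6);
    return 0;
}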

I guess we can choose between the two representations at the start by looking at various table stats. We could also be smart and switch from the bitmap to the traditional representation if, while scanning the table, we see many more tuples landing in the overflow region than we provisioned for. There will be some challenges in converting the representation mid-way, especially in terms of memory allocation, but I think those can be sorted out if we think the idea has merit.
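
As a sketch of the kind of heuristic I have in mind (thresholds entirely made up, nothing here is an existing API):

#include <stdbool.h>

typedef enum DeadTupleRepr
{
    DT_REPR_TID_ARRAY,   /* current sorted array of item pointers */
    DT_REPR_BITMAP       /* capped per-page bitmap plus overflow area */
} DeadTupleRepr;

static DeadTupleRepr
choose_initial_repr(double avg_tuple_width, double est_dead_fraction)
{
    /* Wide tuples and a modest dead fraction favour the bitmap. */
    if (avg_tuple_width >= 48.0 && est_dead_fraction <= 0.10)
        return DT_REPR_BITMAP;
    return DT_REPR_TID_ARRAY;
}

static bool
should_switch_to_tid_array(long overflow_entries, long bitmap_entries)
{
    /* If the overflow area starts to dominate, the cap was wrong; fall back. */
    return overflow_entries > bitmap_entries / 4;
}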

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
