On Mon, Nov 21, 2016 at 2:15 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Nov 18, 2016 at 6:54 AM, Claudio Freire <klaussfreire@gmail.com> wrote: >> On Thu, Nov 17, 2016 at 6:34 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Thu, Nov 17, 2016 at 1:42 PM, Claudio Freire <klaussfreire@gmail.com> wrote: >>>> Attached is patch 0002 with pgindent applied over it >>>> >>>> I don't think there's any other formatting issue, but feel free to >>>> point a finger to it if I missed any >>> >>> Hmm, I had imagined making all of the segments the same size rather >>> than having the size grow exponentially. The whole point of this is >>> to save memory, and even in the worst case you don't end up with that >>> many segments as long as you pick a reasonable base size (e.g. 64MB). >> >> Wastage is bound by a fraction of the total required RAM, that is, >> it's proportional to the amount of required RAM, not the amount >> allocated. So it should still be fine, and the exponential strategy >> should improve lookup performance considerably. > > I'm concerned that it could use repalloc for large memory area when > vacrelstats->dead_tuples.dead_tuple is bloated. It would be overhead > and slow.
How large?
That array cannot be very large. It contains pointers to exponentially-growing arrays, but the repalloc'd array itself is small: one struct per segment, each segment starts at 128MB and grows exponentially.
In fact, IIRC, it can be proven that such a repalloc strategy has an amortized cost of O(log log n) per tuple. If it repallocd the whole tid array it would be O(log n), but since it handles only pointers to segments of exponentially growing tuples it becomes O(log log n).
Furthermore, n there is still limited to MAX_INT, which means the cost per tuple is bound by O(log log 2^32) = 5. And that's an absolute worst case that's ignoring the 128MB constant factor which is indeed relevant. > What about using semi-fixed memroy space without repalloc; > Allocate the array of ItemPointerData array, and each ItemPointerData > array stores the dead tuple locations. The size of ItemPointerData > array starts with small size (e.g. 16MB or 32MB). After we used an > array up, we then allocate next segment with twice size as previous > segment.
That's what the patch does. > But to prevent over allocating memory, it would be better to > set a limit of allocating size of ItemPointerData array to 512MB or > 1GB.
There already is a limit to 1GB (the maximum amount palloc can allocate)