Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Heikki Linnakangas
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date:
Msg-id: e8f44b63-4745-b855-7772-e8201906a4a1@iki.fi
In response to: Re: Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On 09/07/2016 12:46 AM, Peter Geoghegan wrote:
> On Tue, Sep 6, 2016 at 12:34 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Why do we reserve the buffer space for all the tapes right at the beginning?
>> Instead of the single USEMEM(maxTapes * TAPE_BUFFER_OVERHEAD) call in
>> inittapes(), couldn't we call USEMEM(TAPE_BUFFER_OVERHEAD) every time we
>> start a new run, until we reach maxTapes?
>
> No, because then you have no way to clamp back memory, which is now
> almost all used (we hold off from making LACKMEM() continually true,
> if at all possible, which is almost always the case). You can't really
> continually shrink memtuples to make space for new tapes, which is
> what it would take.

I still don't get it. When building the initial runs, we don't need 
buffer space for maxTapes yet, because we're only writing to a single 
tape at a time. An unused tape shouldn't take much memory. In 
inittapes(), when we have built all the runs, we know how many tapes we 
actually needed, and we can allocate the buffer memory accordingly.
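
To make the two accounting schemes under discussion concrete, here is a minimal sketch in C. It is not tuplesort.c code: the struct, the function names, and the SKETCH_ constants are hypothetical stand-ins, and the constant values are illustrative only. It contrasts reserving buffer space for all maxTapes up front with reserving one tape's worth each time a new run is started.

```c
#include <assert.h>

/* Illustrative values only; the real constants are defined in PostgreSQL. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_TAPE_BUFFER_OVERHEAD (3 * SKETCH_BLCKSZ)

/* Hypothetical, heavily reduced stand-in for the sort state. */
typedef struct SketchSortState
{
    long availMem;       /* remaining memory budget, in bytes */
    int  maxTapes;       /* upper bound on tapes the sort may use */
    int  tapesReserved;  /* tapes whose buffer space is already charged */
} SketchSortState;

/* Scheme 1: charge the budget for every possible tape right away. */
void
reserve_all_upfront(SketchSortState *st)
{
    assert(st->tapesReserved == 0);
    st->availMem -= (long) st->maxTapes * SKETCH_TAPE_BUFFER_OVERHEAD;
    st->tapesReserved = st->maxTapes;
}

/*
 * Scheme 2: charge the budget for one more tape when a new run starts.
 * Returns 1 if a tape was charged, 0 once maxTapes has been reached.
 * Note that availMem may go negative here if the budget is already
 * nearly exhausted, which is the objection raised in the quoted reply.
 */
int
reserve_for_new_run(SketchSortState *st)
{
    if (st->tapesReserved >= st->maxTapes)
        return 0;
    st->availMem -= SKETCH_TAPE_BUFFER_OVERHEAD;
    st->tapesReserved++;
    return 1;
}
```

Both schemes end up charging the same total once maxTapes runs have been started; they differ only in when the charge happens, and therefore in how much memory is left for tuples during the early runs.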

[thinks a bit, looks at logtape.c]. Hmm, I guess that's wrong, because 
of the way this all is implemented. When we're building the initial 
runs, we're only writing to one tape at a time, but logtape.c 
nevertheless holds onto a BLCKSZ'd currentBuffer, plus one buffer for 
each indirect level, for every tape that has been used so far. What if 
we changed LogicalTapeRewind to free those buffers? Flush out the 
indirect buffers to disk, remembering just the physical block number of 
the topmost indirect block in memory, and free currentBuffer. That way, 
a tape that has been used, but isn't being read or written to at the 
moment, would take very little memory, and we wouldn't need to reserve 
space for them in the build-runs phase.
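
A rough sketch of the idea in C follows. This is not logtape.c code: SketchTape, sketch_tape_park() and the other names are hypothetical, the "disk write" is a stub that only hands out block numbers, and the step of recording each flushed block's number in its parent indirect block is left out. It only illustrates that a parked tape can shrink to a single remembered block number.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_BLCKSZ 8192       /* illustrative block size */
#define SKETCH_MAX_LEVELS 4      /* hypothetical cap on indirect levels */

/* Hypothetical, reduced model of one logical tape. */
typedef struct SketchTape
{
    char *currentBuffer;                /* data buffer, NULL when parked */
    char *indirect[SKETCH_MAX_LEVELS];  /* one buffer per indirect level */
    int   nlevels;                      /* indirect levels in use */
    long  topIndirectBlock;             /* block number kept while parked */
} SketchTape;

/* Allocate one zeroed block, aborting on out-of-memory. */
static char *
sketch_alloc_block(void)
{
    char *p = calloc(1, SKETCH_BLCKSZ);

    if (p == NULL)
        abort();
    return p;
}

/* Stub for a block write: hands out the next physical block number. */
static long
sketch_write_block(const char *buf, long *next_block)
{
    (void) buf;     /* a real version would write the block to the temp file */
    return (*next_block)++;
}

/* Set up a tape in the "has been written to" state. */
void
sketch_tape_init(SketchTape *t, int nlevels)
{
    int i;

    assert(nlevels >= 0 && nlevels <= SKETCH_MAX_LEVELS);
    t->currentBuffer = sketch_alloc_block();
    for (i = 0; i < SKETCH_MAX_LEVELS; i++)
        t->indirect[i] = (i < nlevels) ? sketch_alloc_block() : NULL;
    t->nlevels = nlevels;
    t->topIndirectBlock = -1;
}

/* Bytes of buffer memory the tape currently holds. */
long
sketch_tape_resident_bytes(const SketchTape *t)
{
    long n = (t->currentBuffer != NULL) ? SKETCH_BLCKSZ : 0;
    int  i;

    for (i = 0; i < SKETCH_MAX_LEVELS; i++)
        if (t->indirect[i] != NULL)
            n += SKETCH_BLCKSZ;
    return n;
}

/*
 * Park the tape: flush the data buffer and every indirect buffer from the
 * lowest level up, remember only the topmost indirect block's number, and
 * free all buffers.
 */
void
sketch_tape_park(SketchTape *t, long *next_block)
{
    int i;

    if (t->currentBuffer != NULL)
    {
        sketch_write_block(t->currentBuffer, next_block);
        free(t->currentBuffer);
        t->currentBuffer = NULL;
    }
    for (i = 0; i < t->nlevels; i++)
    {
        t->topIndirectBlock = sketch_write_block(t->indirect[i], next_block);
        free(t->indirect[i]);
        t->indirect[i] = NULL;
    }
}
```

With two indirect levels, a tape in this model holds three blocks of buffer memory before parking and none afterwards. Reading the tape again would start by reloading the block at topIndirectBlock.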

- Heikki