Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Heikki Linnakangas
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Msg-id: 0939f43f-226b-ee57-4a27-2d1b8e18185c@iki.fi
In reply to: Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
I'm reviewing patches 1-3 in this series, i.e. those patches that are 
not directly related to parallelism, but are independent improvements to 
merging.

Let's begin with patch 1:

On 08/02/2016 01:18 AM, Peter Geoghegan wrote:
> Cap the number of tapes used by external sorts
>
> Commit df700e6b set merge order based on available buffer space (the
> number of tapes was as high as possible while still allowing at least 32
> * BLCKSZ buffer space per tape), rejecting Knuth's theoretically
> justified "sweet spot" of 7 tapes (a merge order of 6 -- Knuth's P),
> improving performance when the sort thereby completed in one pass.
> However, it's still true that there are unlikely to be benefits from
> increasing the number of tapes past 7 once the amount of data to be
> sorted significantly exceeds available memory; that commit probably
> mostly just improved matters where it enabled all merging to be done in
> a final on-the-fly merge.
>
> One problem with the merge order logic established by that commit is
> that with large work_mem settings and data volumes, the tapes previously
> wasted as much as 8% of the available memory budget; tens of thousands
> of tapes could be logically allocated for a sort that will only benefit
> from a few dozen.

Yeah, wasting 8% of the memory budget on this seems like a bad idea. If 
I understand correctly, that makes the runs shorter than necessary, 
leading to more runs.
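
To make the arithmetic concrete, here's a back-of-the-envelope sketch, 
assuming I have the tuplesort.c constants right (MERGE_BUFFER_SIZE = 
32 * BLCKSZ, TAPE_BUFFER_OVERHEAD = 3 * BLCKSZ; the BLCKSZ below is just 
the default 8 kB):

/*
 * Back-of-the-envelope: how much of the memory budget the up-front tape
 * buffers eat, using the constants as I read them from tuplesort.c.
 */
#define BLCKSZ					8192
#define MERGE_BUFFER_SIZE		(BLCKSZ * 32)
#define TAPE_BUFFER_OVERHEAD	(BLCKSZ * 3)

static double
tape_overhead_fraction(long allowedMem)
{
	/* one tape per MERGE_BUFFER_SIZE + TAPE_BUFFER_OVERHEAD of budget */
	long		maxTapes = allowedMem / (MERGE_BUFFER_SIZE + TAPE_BUFFER_OVERHEAD);

	/* fraction of the budget reserved up front for tape buffers */
	return (double) (maxTapes * TAPE_BUFFER_OVERHEAD) / allowedMem;
}

/*
 * For any large work_mem this comes out at about 3/35, i.e. the ~8%
 * mentioned above, reserved whether or not the tapes are ever used.
 */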

> A new quasi-arbitrary cap of 501 is applied on the number of tapes that
> tuplesort will ever use (i.e.  merge order is capped at 500 inclusive).
> This is a conservative estimate of the number of runs at which doing all
> merging on-the-fly no longer allows greater overlapping of I/O and
> computation.

Hmm. Surely there are cases where, with > 501 tapes, the whole merge 
could be done in a single pass, but with the cap it now takes two? And 
that would hurt performance, no?
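
To put numbers on it (hypothetical figures, and ignoring the finer 
points of polyphase merge): with, say, 600 initial runs and enough 
memory for 600 tapes, everything merges in one pass today, but with the 
merge order capped at 500 it takes two:

/*
 * Rough estimate of merge passes needed: each pass reduces the number
 * of runs by roughly a factor of the merge order.  (This ignores the
 * details of polyphase merging; it's only meant to show the cliff.)
 */
static int
merge_passes(int nruns, int merge_order)
{
	int			passes = 0;

	while (nruns > 1)
	{
		nruns = (nruns + merge_order - 1) / merge_order;
		passes++;
	}
	return passes;
}

/* merge_passes(600, 600) == 1, but merge_passes(600, 500) == 2 */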

Why do we reserve the buffer space for all the tapes right at the 
beginning? Instead of the single USEMEM(maxTapes * TAPE_BUFFER_OVERHEAD) 
call in inittapes(), couldn't we call USEMEM(TAPE_BUFFER_OVERHEAD) every 
time we start a new run, until we reach maxTapes?
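
Roughly what I have in mind (untested sketch; I'm assuming 
selectnewtape() is the right place and that currentRun/maxTapes are the 
right counters to test, so the exact condition may need tweaking):

/* inittapes(): only account for the tape buffer we know we'll need */
USEMEM(state, TAPE_BUFFER_OVERHEAD);	/* buffer for the first tape */

/*
 * selectnewtape(): when a new run forces us to start using a tape that
 * hasn't been touched before, charge its buffer against the budget
 * then, instead of reserving all maxTapes buffers up front.
 */
if (state->currentRun < state->maxTapes)
	USEMEM(state, TAPE_BUFFER_OVERHEAD);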

- Heikki



