Re: Tuplesort merge pre-reading

From	Heikki Linnakangas
Subject	Re: Tuplesort merge pre-reading
Date	28 September 2016, 16:04:56
Message-ID	0c0b80fc-9dea-c031-ce51-2781edefad4d@iki.fi
In reply to	Re: Tuplesort merge pre-reading (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Tuplesort merge pre-reading
List	pgsql-hackers

On 09/28/2016 06:05 PM, Peter Geoghegan wrote:
> On Thu, Sep 15, 2016 at 9:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I don't think it makes much difference in practice, because most merge
>> passes use all, or almost all, of the available tapes. BTW, I think the
>> polyphase algorithm prefers to do all the merges that don't use all tapes
>> upfront, so that the last final merge always uses all the tapes. I'm not
>> 100% sure about that, but that's my understanding of the algorithm, and
>> that's what I've seen in my testing.
>
> Not sure that I understand. I agree that each merge pass tends to use
> roughly the same number of tapes, but the distribution of real runs on
> tapes is quite unbalanced in earlier merge passes (due to dummy runs).
> It looks like you're always using batch memory, even for non-final
> merges. Won't that fail to be in balance much of the time because of
> the lopsided distribution of runs? Tapes have an uneven amount of real
> data in earlier merge passes.

How does the distribution of the runs on the tapes matter?

>> +   usedBlocks = 0;
>> +   for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
>> +   {
>> +       int64       numBlocks = blocksPerTape + (tapenum < remainder ? 1 : 0);
>> +
>> +       if (numBlocks > MaxAllocSize / BLCKSZ)
>> +           numBlocks = MaxAllocSize / BLCKSZ;
>> +       LogicalTapeAssignReadBufferSize(state->tapeset, tapenum,
>> +                                       numBlocks * BLCKSZ);
>> +       usedBlocks += numBlocks;
>> +   }
>> +   USEMEM(state, usedBlocks * BLCKSZ);
>
> I'm basically repeating myself here, but: I think it's incorrect that
> LogicalTapeAssignReadBufferSize() is called so indiscriminately (more
> generally, it is questionable that it is called in such a high level
> routine, rather than the start of a specific merge pass -- I said so a
> couple of times already).

You can't release the tape buffer at the end of a pass, because the 
buffer of a tape will already be filled with data from the next run on 
the same tape.

- Heikki
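
For readers following the quoted hunk, here is a minimal standalone sketch of the per-tape split it performs. It is not the patch itself: the function name split_read_buffers, the blocksOut array, and the way blocksPerTape and remainder are derived (integer division and modulo of the available blocks by the tape count) are assumptions made for illustration, as are the constants, which mirror PostgreSQL's default BLCKSZ and the value of MaxAllocSize. The real code hands each size to LogicalTapeAssignReadBufferSize() and charges the total with USEMEM().

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ     8192                    /* assumed: default PostgreSQL block size */
#define MAX_ALLOC  ((int64_t) 0x3fffffff)  /* assumed: value of MaxAllocSize */

/*
 * Hypothetical helper: divide availBlocks read-buffer blocks among ntapes
 * tapes. The first (availBlocks % ntapes) tapes get one extra block, and no
 * tape gets more blocks than fit in a single MAX_ALLOC-sized allocation.
 * Per-tape counts are stored in blocksOut; the total handed out is returned.
 */
static int64_t
split_read_buffers(int64_t availBlocks, int ntapes, int64_t *blocksOut)
{
    int64_t blocksPerTape;
    int64_t remainder;
    int64_t usedBlocks = 0;
    int     tapenum;

    assert(ntapes > 0 && availBlocks >= 0);

    /* Assumption: the patch computes these two values along these lines. */
    blocksPerTape = availBlocks / ntapes;
    remainder = availBlocks % ntapes;

    for (tapenum = 0; tapenum < ntapes; tapenum++)
    {
        int64_t numBlocks = blocksPerTape + (tapenum < remainder ? 1 : 0);

        /* Cap each tape's buffer at the largest single allocation. */
        if (numBlocks > MAX_ALLOC / BLCKSZ)
            numBlocks = MAX_ALLOC / BLCKSZ;

        blocksOut[tapenum] = numBlocks;
        usedBlocks += numBlocks;
    }

    return usedBlocks;
}
```

With 10 blocks and 3 tapes, the tapes receive 4, 3 and 3 blocks. Note that the split depends only on the tape count and the memory budget, not on how many runs each tape currently holds.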
