Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Дата
Msg-id 6f918bd0-2a8c-c8c0-664f-78b25135c033@iki.fi
обсуждение исходный текст
Ответ на Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 12/21/2016 12:53 AM, Robert Haas wrote:
>> That leaves one problem, though: reusing space in the final merge phase. If
>> the tapes being merged belong to different LogicalTapeSets, and create one
>> new tape to hold the result, the new tape cannot easily reuse the space of
>> the input tapes because they are on different tape sets.
>
> If the worker is always completely finished with the tape before the
> leader touches it, couldn't the leader's LogicalTapeSet just "adopt"
> the tape and overwrite it like any other?

Currently, the logical tape code assumes that all tapes in a single 
LogicalTapeSet are allocated from the same BufFile. The logical tape's 
on-disk format contains block numbers, to point to the next/prev block 
of the tape [1], and they're assumed to refer to the same file. That 
allows reusing space efficiently during the merge. After you have read 
the first block from tapes A, B and C, you can immediately reuse those 
three blocks for output tape D.

Now, if you read multiple tapes, from different LogicalTapeSet, hence 
backed by different BufFiles, you cannot reuse the space from those 
different tapes for a single output tape, because the on-disk format 
doesn't allow referring to blocks in other files. You could reuse the 
space of *one* of the input tapes, by placing the output tape in the 
same LogicalTapeSet, but not all of them.

We could enhance that, by using "filename + block number" instead of 
just block number, in the pointers in the logical tapes. Then you could 
spread one logical tape across multiple files. Probably not worth it in 
practice, though.


[1] As the code stands, there are no next/prev pointers, but a tree of 
"indirect" blocks. But I'm planning to change that to simpler next/prev 
pointers, in 
https://www.postgresql.org/message-id/flat/55b3b7ae-8dec-b188-b8eb-e07604052351%40iki.fi

- Heikki




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joel Jacobson
Дата:
Сообщение: Re: [HACKERS] SET NOT NULL [NOT VALID / CONCURRENTLY]?
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: [HACKERS] Replication slot xmin is not reset if HS feedback isturned off while standby is shut down