Re: Proposal: speeding up GIN build with parallel workers

From	Constantin S. Pan
Subject	Re: Proposal: speeding up GIN build with parallel workers
Date	17 January 2016, 19:17:04
Msg-id	20160117221635.1e7905d6@thought
In reply to	Re: Proposal: speeding up GIN build with parallel workers (Peter Geoghegan <pg@heroku.com>)
List	pgsql-hackers

On Fri, 15 Jan 2016 15:29:51 -0800
Peter Geoghegan <pg@heroku.com> wrote:

> On Fri, Jan 15, 2016 at 2:38 PM, Constantin S. Pan <kvapen@gmail.com>
> wrote:
> Even without parallelism, wouldn't it be better if GIN indexes were
> built using tuplesort? I know way way less about the gin am than the
> nbtree am, but I imagine that a prominent cost for GIN index builds is
> constructing the main B-Tree (the one that's constructed over key
> values) itself. Couldn't tuplesort.c be adapted to cover this case?
> That would be much faster in general, particularly with the recent
> addition of abbreviated keys, while also leaving a clear path forward
> to performing the build in parallel.

While building a GIN index we need a quick way to update the posting list
for a key we have already seen; this is where rbtree comes to the rescue.
Using tuplesort would require a completely different approach to building
the index: dump (key, itempointer) pairs into a tuplesort heap, then sort
the heap and merge the itempointers that share the same key value.
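
To make the difference concrete, here is a rough Python sketch of the two
build strategies. It is only an illustration, not the PostgreSQL code: a
plain dict stands in for the rbtree, sorted() stands in for tuplesort, and
the function names and the (block, offset) tuples used as item pointers
are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def build_by_accumulation(pairs):
    """Accumulate posting lists in a keyed in-memory structure.

    Assumption: a dict plays the role of the rbtree; what matters here is
    that the posting list of an already-seen key can be updated in place.
    """
    postings = {}
    for key, item_pointer in pairs:
        postings.setdefault(key, []).append(item_pointer)
    # Emit keys in order, with each posting list sorted by item pointer.
    return [(key, sorted(postings[key])) for key in sorted(postings)]


def build_by_sorting(pairs):
    """Dump all (key, itempointer) pairs, sort them, then merge equal keys.

    Assumption: sorted() stands in for tuplesort; a real build could spill
    to disk when the pairs do not fit into memory.
    """
    ordered = sorted(pairs)  # orders by key first, then by item pointer
    return [
        (key, [item_pointer for _, item_pointer in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Hypothetical input: two keys, each occurring in two heap tuples.
    sample = [("b", (0, 2)), ("a", (0, 1)), ("b", (0, 1)), ("a", (1, 1))]
    print(build_by_accumulation(sample))
    print(build_by_sorting(sample))
```

Both functions produce the same list of (key, posting list) entries. The
difference is where the work happens: the first updates per-key state as
pairs arrive, the second postpones all merging until after the sort.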

Both rbtree and sorting require O(N log N) operations, and abbreviated keys
will not help here, because GIN is used in cases where there are lots of
repeated keys. The benefit of tuplesort is that it would work better for
huge data sets that do not fit into memory, but on the other hand it would
need twice as much free disk space for sorting as the data itself takes.
Are we ready for such a cost?

I think we have to experiment with both approaches and see how they compare.

What are your thoughts?

Regards,

Constantin S. Pan
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
