Re: Yet another fast GiST build

Поиск
Список
Период
Сортировка
От Darafei "Komяpa" Praliaskouski
Тема Re: Yet another fast GiST build
Дата
Msg-id CAC8Q8t+WqGkgKh3ptarkiaE7unxJR-RFoTNsy=0=4u5F9wo8Bw@mail.gmail.com
обсуждение исходный текст
Ответ на Yet another fast GiST build  (Andrey Borodin <x4mmm@yandex-team.ru>)
Список pgsql-hackers
Hello,

This is very interesting. In my pipeline currently GiST index rebuild is the biggest time consuming step.

I believe introducing optional concept of order in the GiST opclass will be beneficial not only for fast build, but for other tasks later:
 - CLUSTER can order the table using that notion, in parallel way.
 - btree_gist can be even closer to btree by getting the tuples sorted inside page.
 - tree descend on insertion in future can traverse the list in more opportunistic way, calculating penalty for siblings-by-order first.

I believe everywhere the idea of ordering is needed it's provided by giving a btree opclass.

How about giving a link to btree opclass inside a gist opclass?


On Mon, Aug 26, 2019 at 10:59 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

In many cases GiST index can be build fast using z-order sorting.

I've looked into proof of concept by Nikita Glukhov [0] and it looks very interesting.
So, I've implemented yet another version of B-tree-like GiST build.
It's main use case and benefits can be summarized with small example:

postgres=# create table x as select point (random(),random()) from generate_series(1,3000000,1);
SELECT 3000000
Time: 5061,967 ms (00:05,062)
postgres=# create index ON x using gist (point ) with (fast_build_sort_function=gist_point_sortsupport);
CREATE INDEX
Time: 6140,227 ms (00:06,140)
postgres=# create index ON x using gist (point );
CREATE INDEX
Time: 32061,200 ms (00:32,061)

As you can see, Z-order build is on order of magnitude faster. Select performance is roughly the same. Also, index is significantly smaller.

Nikita's PoC is faster because it uses parallel build, but it intervenes into B-tree code a lot (for reuse). This patchset is GiST-isolated.
My biggest concern is that passing function to relation option seems a bit hacky. You can pass there any function matching sort support signature.
Embedding this function into opclass makes no sense: it does not affect scan anyhow.

In current version, docs and tests are not implemented. I want to discuss overall design. Do we really want yet another GiST build, if it is 3-10 times faster?

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/postgres/postgres/compare/master...glukhovn:gist_btree_build



--
Darafei Praliaskouski

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: A problem about partitionwise join
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Yet another fast GiST build