Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

Поиск
Список
Период
Сортировка
От Aliaksandr Kalenik
Тема Re: [PATCH] reduce page overlap of GiST indexes built using sorted method
Дата
Msg-id CAHqSB9hq=r0tmivQGkTE0JRgp1qJSWV+7qGqLXPu5XLfEWXUOg@mail.gmail.com
обсуждение исходный текст
Ответ на [PATCH] reduce page overlap of GiST indexes built using sorted method  (Aliaksandr Kalenik <akalenik@kontur.io>)
Ответы Re: [PATCH] reduce page overlap of GiST indexes built using sorted method  ("sergei sh." <sshoulbakov@kontur.io>)
Re: [PATCH] reduce page overlap of GiST indexes built using sorted method  ("sergei sh." <sshoulbakov@kontur.io>)
Список pgsql-hackers
After further testing, here is v2 where the issue that rightlink can be set when an index page is already flushed is fixed.

On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi Aliaksandr!

Thanks for working on this!

> Benchmark summary:
>
> create index roads_rdr_idx on roads_rdr using gist (geom);
>
> with sort support before patch / CREATE INDEX 76.709 ms
>
> with sort support after patch / CREATE INDEX 225.238 ms
>
> without sort support / CREATE INDEX 446.071 ms
>
> select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
>
> with sort support before patch / SELECT 5766.526 ms
>
> with sort support after patch / SELECT 2646.554 ms
>
> without sort support / SELECT 2721.718 ms
>
> index size
>
> with sort support before patch / IDXSIZE 2940928 bytes
>
> with sort support after patch / IDXSIZE 4956160 bytes
>
> without sort support / IDXSIZE 5447680 bytes

The numbers are impressive, newly build index is actually performing better!
I've conducted some tests over points, not PostGIS geometry. For points build is 2x slower now, but this is the cost of faster scans.

Some strong points of this index building technology.
The proposed algorithm is not randomly chosen as anything that performs better than the original sorting build. We actually researched every idea we knew from literature and intuition. Although we consciously limited the search area by existing GiST API.
Stuff like penalty-based choose-subtree and split in equal halves seriously limit possible solutions. If anyone knows an any better way to build GiST faster or with better scan performance - please let us know.
The proposed algorithm contains the current algorithm as a special case: there is a parameter - the number of buffers accumulated before calling Split. If this parameter is 1 proposed algorithm will produce exactly the same output.

At this stage, we would like to hear some feedback from Postgres and PostGIS community. What other performance aspects should we test?

Current patch implementation has some known deficiencies:
1. We need a GUC instead of the hard-coded buffer of 8 pages.
2. Is GiST sorting build still deterministic? If not - we should add a fixed random seed into pageinspect tests.
3. It would make sense to check the resulting indexes with amcheck [0], although it's not committed.
4. We cannot make an exact fillfactor due to the behavior of picksplit. But can we improve anything here? I think if not - it's still OK.
5. GistSortedBuildPageState is no more page state. It's Level state or something like that.
6. The patch desperately needs comments.


Thanks!

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jake North
Дата:
Сообщение: [feature request] ts_headline should have an option to highlight only full matches of <-> expressions
Следующее
От: Jake North
Дата:
Сообщение: Re: [feature request] ts_headline should have an option to highlight only full matches of <-> expressions