Re: [PATCH] reduce page overlap of GiST indexes built using sorted method

Поиск
Список
Период
Сортировка
От sergei sh.
Тема Re: [PATCH] reduce page overlap of GiST indexes built using sorted method
Дата
Msg-id 3aa6ba30-e9d8-10ef-753f-8deea5f196d0@kontur.io
обсуждение исходный текст
Ответ на Re: [PATCH] reduce page overlap of GiST indexes built using sorted method  (Aliaksandr Kalenik <akalenik@kontur.io>)
Список pgsql-hackers
Hi,

here are some benchmark results for GiST patch: 
https://gist.github.com/mngr777/88ae200c9c30ba5656583d92e8d2cf9e

Code used for benchmarking: https://github.com/mngr777/pg_index_bm2,
see README for the list of test queries.

Results show query performance close to indexes built with no 
sortsupport function, with index creation time reduced approx. by half.

On 1/8/22 10:20 PM, Aliaksandr Kalenik wrote:
> After further testing, here is v2 where the issue that rightlink can be 
> set when an index page is already flushed is fixed.
> 
> On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4mmm@yandex-team.ru 
> <mailto:x4mmm@yandex-team.ru>> wrote:
> 
> 
>     Hi Aliaksandr!
> 
>     Thanks for working on this!
> 
>      > Benchmark summary:
>      >
>      > create index roads_rdr_idx on roads_rdr using gist (geom);
>      >
>      > with sort support before patch / CREATE INDEX 76.709 ms
>      >
>      > with sort support after patch / CREATE INDEX 225.238 ms
>      >
>      > without sort support / CREATE INDEX 446.071 ms
>      >
>      > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom;
>      >
>      > with sort support before patch / SELECT 5766.526 ms
>      >
>      > with sort support after patch / SELECT 2646.554 ms
>      >
>      > without sort support / SELECT 2721.718 ms
>      >
>      > index size
>      >
>      > with sort support before patch / IDXSIZE 2940928 bytes
>      >
>      > with sort support after patch / IDXSIZE 4956160 bytes
>      >
>      > without sort support / IDXSIZE 5447680 bytes
> 
>     The numbers are impressive, newly build index is actually performing
>     better!
>     I've conducted some tests over points, not PostGIS geometry. For
>     points build is 2x slower now, but this is the cost of faster scans.
> 
>     Some strong points of this index building technology.
>     The proposed algorithm is not randomly chosen as anything that
>     performs better than the original sorting build. We actually
>     researched every idea we knew from literature and intuition.
>     Although we consciously limited the search area by existing GiST API.
>     Stuff like penalty-based choose-subtree and split in equal halves
>     seriously limit possible solutions. If anyone knows an any better
>     way to build GiST faster or with better scan performance - please
>     let us know.
>     The proposed algorithm contains the current algorithm as a special
>     case: there is a parameter - the number of buffers accumulated
>     before calling Split. If this parameter is 1 proposed algorithm will
>     produce exactly the same output.
> 
>     At this stage, we would like to hear some feedback from Postgres and
>     PostGIS community. What other performance aspects should we test?
> 
>     Current patch implementation has some known deficiencies:
>     1. We need a GUC instead of the hard-coded buffer of 8 pages.
>     2. Is GiST sorting build still deterministic? If not - we should add
>     a fixed random seed into pageinspect tests.
>     3. It would make sense to check the resulting indexes with amcheck
>     [0], although it's not committed.
>     4. We cannot make an exact fillfactor due to the behavior of
>     picksplit. But can we improve anything here? I think if not - it's
>     still OK.
>     5. GistSortedBuildPageState is no more page state. It's Level state
>     or something like that.
>     6. The patch desperately needs comments.
> 
> 
>     Thanks!
> 
>     Best regards, Andrey Borodin.
> 
>     [0]
>     https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru
>     <https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru>
> 



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Lakhin
Дата:
Сообщение: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Следующее
От: Dmitry Dolgov
Дата:
Сообщение: Re: Multiple Query IDs for a rewritten parse tree