Re: [PATCH] reduce page overlap of GiST indexes built using sorted method
От | sergei sh. |
---|---|
Тема | Re: [PATCH] reduce page overlap of GiST indexes built using sorted method |
Дата | |
Msg-id | 3aa6ba30-e9d8-10ef-753f-8deea5f196d0@kontur.io обсуждение исходный текст |
Ответ на | Re: [PATCH] reduce page overlap of GiST indexes built using sorted method (Aliaksandr Kalenik <akalenik@kontur.io>) |
Список | pgsql-hackers |
Hi, here are some benchmark results for GiST patch: https://gist.github.com/mngr777/88ae200c9c30ba5656583d92e8d2cf9e Code used for benchmarking: https://github.com/mngr777/pg_index_bm2, see README for the list of test queries. Results show query performance close to indexes built with no sortsupport function, with index creation time reduced approx. by half. On 1/8/22 10:20 PM, Aliaksandr Kalenik wrote: > After further testing, here is v2 where the issue that rightlink can be > set when an index page is already flushed is fixed. > > On Sat, Dec 25, 2021 at 4:35 PM Andrey Borodin <x4mmm@yandex-team.ru > <mailto:x4mmm@yandex-team.ru>> wrote: > > > Hi Aliaksandr! > > Thanks for working on this! > > > Benchmark summary: > > > > create index roads_rdr_idx on roads_rdr using gist (geom); > > > > with sort support before patch / CREATE INDEX 76.709 ms > > > > with sort support after patch / CREATE INDEX 225.238 ms > > > > without sort support / CREATE INDEX 446.071 ms > > > > select count(*) from roads_rdr a, roads_rdr b where a.geom && b.geom; > > > > with sort support before patch / SELECT 5766.526 ms > > > > with sort support after patch / SELECT 2646.554 ms > > > > without sort support / SELECT 2721.718 ms > > > > index size > > > > with sort support before patch / IDXSIZE 2940928 bytes > > > > with sort support after patch / IDXSIZE 4956160 bytes > > > > without sort support / IDXSIZE 5447680 bytes > > The numbers are impressive, newly build index is actually performing > better! > I've conducted some tests over points, not PostGIS geometry. For > points build is 2x slower now, but this is the cost of faster scans. > > Some strong points of this index building technology. > The proposed algorithm is not randomly chosen as anything that > performs better than the original sorting build. We actually > researched every idea we knew from literature and intuition. > Although we consciously limited the search area by existing GiST API. > Stuff like penalty-based choose-subtree and split in equal halves > seriously limit possible solutions. If anyone knows an any better > way to build GiST faster or with better scan performance - please > let us know. > The proposed algorithm contains the current algorithm as a special > case: there is a parameter - the number of buffers accumulated > before calling Split. If this parameter is 1 proposed algorithm will > produce exactly the same output. > > At this stage, we would like to hear some feedback from Postgres and > PostGIS community. What other performance aspects should we test? > > Current patch implementation has some known deficiencies: > 1. We need a GUC instead of the hard-coded buffer of 8 pages. > 2. Is GiST sorting build still deterministic? If not - we should add > a fixed random seed into pageinspect tests. > 3. It would make sense to check the resulting indexes with amcheck > [0], although it's not committed. > 4. We cannot make an exact fillfactor due to the behavior of > picksplit. But can we improve anything here? I think if not - it's > still OK. > 5. GistSortedBuildPageState is no more page state. It's Level state > or something like that. > 6. The patch desperately needs comments. > > > Thanks! > > Best regards, Andrey Borodin. > > [0] > https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru > <https://www.postgresql.org/message-id/flat/59D0DA6B-1652-4D44-B0EF-A582D5824F83%40yandex-team.ru> >
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Alexander LakhinДата:
Сообщение: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?