Re: GiST index performance
От | Yeb Havinga |
---|---|
Тема | Re: GiST index performance |
Дата | |
Msg-id | 4BA3DCA1.2010808@gmail.com обсуждение исходный текст |
Ответ на | Re: GiST index performance (Yeb Havinga <yebhavinga@gmail.com>) |
Ответы |
Re: GiST index performance
(Yeb Havinga <yebhavinga@gmail.com>)
|
Список | pgsql-performance |
Yeb Havinga wrote: >>>>> Matthew Wakeling wrote: >>>>>> A second quite distinct issue is the general performance of GiST >>>>>> indexes >>>>>> which is also mentioned in the old thread linked from Open Items. >>>>>> For >>>>>> that, we have a test case at >>>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php >>>>>> for >>>>>> btree_gist indexes. I have a similar example with the bioseg GiST >>>>>> index. I >>>>>> have completely reimplemented the same algorithms in Java for >>>>>> algorithm >>>>>> investigation and instrumentation purposes, and it runs about a >>>>>> hundred >>>>>> times faster than in Postgres. I think this is a problem, and I'm >>>>>> willing >>>>>> to do some investigation to try and solve it. More gist performance.. Some background: I've been involved in developing several datatypes that make use of gist indexes (we just published a paper on it, see http://arxiv.org/abs/1003.3370), that's the main reason I'm very much interested in possible improvements in gist index speed. One of the datatypes was 'very badly' indexable: non leaf pages were getting very general keys, so in the end we could see from the scanning times compared to sequential scans that the whole index was being scanned. One thing I remember was the idea that somehow it would be nice if the dept of the gist tree could be fiddled with: in that case keys of non leaf nodes would be more specific again. In the original Guttman R-tree paper there was mention of a parameter that determined the size of entries in nodes and thereby indirectly the depth of the tree. I missed that in the PostgreSQL gist api. One thing Gist scanning does very often is key comparisons. So another approach is to try to limit those and again this might be possible by increasing the depth / decrease number of entries per page. I just did a test where in gistfitpage the gistpagesize was divided by 5 and something similar in gistnospace. Scantime before adjustment: about 70 seconds. After adjustment: 45 seconds. With gist_stat from the gevel package confirmed that the depth was now 4 (original 3). Drawback: bigger index size because pages are not filled completely anymore. The explain shows (actual time=0.030..0.032) for the inner loop times, which is less then half of the original scan time. Since the gistpagesize is derived from the database blocksize, it might be wise to set the blocksize low for this case, I'm going to play with this a bit more. regards, Yeb Havinga
В списке pgsql-performance по дате отправления: