Re: WIP: SP-GiST, Space-Partitioned GiST

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: WIP: SP-GiST, Space-Partitioned GiST
Дата
Msg-id CAPpHfduCoiSWL7=EQ9XgaQifWVubz8xDiA6feQvix+6PqH+eRw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: SP-GiST, Space-Partitioned GiST  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Список pgsql-hackers
On Thu, Sep 22, 2011 at 2:05 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
Regarding the quadtree, have you compared the performance of that with Alexander's improved split algorithm? I ran some tests using the test harness I still had lying around from the fast GiST index build tests:

       testname         |      time       | accesses | indexsize
-------------------------+-----------------+----------+-----------
 points unordered auto   | 00:03:58.188866 |   378779 | 522 MB
 points ordered auto     | 00:07:14.362355 |   177534 | 670 MB
 points unordered auto   | 00:02:59.130176 |    46561 | 532 MB
 points ordered auto     | 00:04:00.50756  |    45066 | 662 MB
 points unordered spgist | 00:03:05.569259 |    78871 | 394 MB
 points ordered spgist   | 00:01:46.06855  |   422104 | 417 MB
(8 rows)
I assume first two rows to be produced by new linear split algorithm(current) and secound two rows by double sorting split algorithm(my patch).
 
These tests were with a table with 7500000 random points. In the ordered-tests, the table is sorted by x,y coordinates. 'time' is the time used to build the index on it, and 'accesses' is the total number of index blocks hit by a series of 10000 bounding box queries, measured from pg_statio_user_indexes.idx_blks_hit + idx_blks_read.

The first two tests in the list are with a GiST index on unpatched PostgreSQL. The next six tests are with Alexander's double-sorting split patch. The last two tests are with an SP-GiST index.

It looks like the query performance with GiST using the double-sorting split is better than SP-GiST, although the SP-GiST index is somewhat smaller. The ordered case seems pathologically bad, is that some sort of a worst-case scenario for quadtrees?
Comparison of search speed using number of page accesses is quite comprehensive for various GiST indexes. But when we're comparing  SP-GiST vs GiST we should take into accoung that they have different CPU/IO ratio. GiST scans whole page which it accesses. SP-GiST can scan only fraction of page because several nodes can be packed into single page. Thereby it would be interesting to compare also CPU load GiST vs. SP-GiST. Also, there is some hope to reduce number of page accesses in SP-GiST by improving clustering algorithm.

------
With best regards,
Alexander Korotkov. 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: MUHAMMAD ASIF
Дата:
Сообщение: Re: PostgreSQL X/Open Socket / BSD Socket Issue on HP-UX
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Double sorting split patch