Re: GIN improvements part 3: ordering in index

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GIN improvements part 3: ordering in index
Дата
Msg-id CAPpHfdtthB9Pooks7m4XTi=mBmTrfhBBYBOBfe4F8Uv1MiHgnA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GIN improvements part 3: ordering in index  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: GIN improvements part 3: ordering in index  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: GIN improvements part 3: ordering in index  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-hackers

On Wed, Jun 19, 2013 at 1:21 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Mon, Jun 17, 2013 at 10:27 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 17.06.2013 15:56, Alexander Korotkov wrote:
On Sat, Jun 15, 2013 at 3:02 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:

This patch introduces new interface method of GIN which takes same
arguments as consistent but returns float8.
float8 gin_ordering(bool check[], StrategyNumber n, Datum query, int32
nkeys, Pointer extra_data[], bool *recheck, Datum queryKeys[], bool
nullFlags[], Datum addInfo[], bool addInfoIsNull[])
This patch implements gingettuple method which can return ordering data
using KNN infrastructure. Also it introduces><  operator for fts which
support ordering in GIN index. Some example:

postgres=# explain analyze select * from dblp_titles2 where tsvector @@
to_tsquery('english', 'statistics') order by tsvector><
to_tsquery('english', 'statistics') limit 10;
                                                                    QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=12.00..48.22 rows=10 width=136) (actual time=6.999..7.120
rows=10 loops=1)
    ->   Index Scan using dblp_titles2_idx on dblp_titles2
  (cost=12.00..43003.03 rows=11868 width=136) (actual time=6.996..7.115
rows=10 loops=1)
          Index Cond: (tsvector @@ '''statist'''::tsquery)
          Order By: (tsvector><  '''statist'''::tsquery)
  Total runtime: 7.556 ms
(5 rows)


Attached version of patch has some refactoring and bug fixes.

Thanks. There are no docs changes and not many comments, that needs to be fixed, but I think I understand how it works:

On the first call to gingettuple, the index is first scanned for all the matches, which are collected in an array in memory. Then, the array is sorted with qsort(), and the matches are returned one by one from the sorted array.
 
Right.

That has some obvious limitations. First of all, you can run out of memory.
 
Yes, it is so. qsort should be replaced with tuplesort.

In attached patch qsort is replaced with tuplesort. As expected, it leads to some performance drawback, but it's not dramatic.
Also, some doc is added for new distance method of GIN.
 
------
With best regards,
Alexander Korotkov.  
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mark Kirkwood
Дата:
Сообщение: Re: [9.4 CF 1] The Commitfest Slacker List
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Patch for removng unused targets