Re: PoC: Partial sort

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: PoC: Partial sort
Дата
Msg-id CAPpHfdvih8Q04Kd_ezcGoFOFa4mYEgw8Ci5eT3gS41+BVhHa-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PoC: Partial sort  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,

> > >  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
> > > time=0.097..0.099 rows=10 loops=1)
> > >    ->  Sort  (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> > > time=0.096..0.097 rows=10 loops=1)
> > >          Sort Key: v1, v2
> > >          Sort Method: top-N heapsort  Memory: 25kB
> > >          ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
> > > rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> > >  Total runtime: 0.125 ms
> > > (6 rows)
> >
> > Is that actually all that beneficial when sorting with a bog standard
> > qsort() since that doesn't generally benefit from data being pre-sorted?
> > I think we might need to switch to a different algorithm to really
> > benefit from mostly pre-sorted input.
> >
>
> In this patch I don't do full sort of dataset. For instance, index returns
> data ordered by first column and we need to order them also by second
> column.

Ah, that makes sense.

> But, I don't think we should expect pre-sorted values of second column
> inside a group.

Yes, if you do it that way, there doesn't seem to any need to assume
that any more than we usually do.

I think you should make the explain output reflect the fact that we're
assuming v1 is presorted and just sorting v2. I'd be happy enough with:
Sort Key: v1, v2
Partial Sort: v2
or even just
"Partial Sort Key: [v1,] v2"
but I am sure others disagree.

Sure, I just didn't change explain output yet. It should look like what you propose. 

> > > *partial-knn-1.patch*

> > Rechecking from the heap means adding a sort node though, which I don't
> > see here? Or am I misunderstanding something?

> KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
> inside GiST and reinserted into same RB-tree. It appears to be much easier
> implementation for PoC and also looks very efficient. I'm not sure what is
> actually right design for it. This is what I like to discuss.

I don't have enough clue about gist to say wether it's the right design,
but it doesn't look wrong to my eyes. It'd probably be useful to export
the knowledge that we are rechecking and how often that happens to the
outside.
While I didn't really look into the patch, I noticed in passing that you
pass a all_dead variable to heap_hot_search_buffer without using the
result - just pass NULL instead, that performs a bit less work.

Useful notice, thanks.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: PoC: Partial sort
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [PATCH] SQL assertions prototype