Re: GIN improvements part2: fast scan

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GIN improvements part2: fast scan
Дата
Msg-id CAPpHfdvs-EHiwmTgfK=uJ4W3OjskVHyfkuehtyrKfKY_KmMpVw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GIN improvements part2: fast scan  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Thu, Nov 21, 2013 at 12:14 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Nov 20, 2013 at 3:06 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Fri, Nov 15, 2013 at 11:19 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 14.11.2013 19:26, Alexander Korotkov wrote:
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:

On 28.06.2013 22:31, Alexander Korotkov wrote:

Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.


I'm going to mark this as "returned with feedback". For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.


Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists, not depends on additional
information now.
2) New API with tri-state logic is introduced.

Thanks! A couple of thoughts after a 5-minute glance:

* documentation

Will provide documented version this week.
 
* How about defining the tri-state consistent function to also return a tri-state? True would mean that the tuple definitely matches, false means the tuple definitely does not match, and Unknown means it might match. Or does return value true with recheck==true have the same effect? If I understood the patch, right, returning Unknown or True wouldn't actually make any difference, but it's conceivable that we might come up with more optimizations in the future that could take advantage of that. For example, for a query like "foo OR (bar AND baz)", you could immediately return any tuples that match foo, and not bother scanning for bar and baz at all.

The meaning of recheck flag when input contains unknown is undefined now. :)
For instance, we could define it in following ways:
1) Like returning Unknown meaning that consistent with true of false instead of input Unknown could return either true or false.
2) Consistent with true of false instead of input Unknown could return recheck. This meaning is probably logical, but I don't see any usage of it.

I'm not against idea of tri-state returning value for consisted, because it's logical continuation of its tri-state input. However, I don't see usage of distinguish True and Unknown in returning value for now :)

In example you give we can return foo immediately, but we have to create full bitmap. So we anyway will have to scan (bar AND baz). We could skip part of trees for bar and baz. But it's possible only when foo contains large amount of sequential TIDS so we can be sure that we didn't miss any TIDs. This seems to be very narrow use-case for me.

Another point is that one day we probably could immediately return tuples in gingettuple. And with LIMIT clause and no sorting we can don't search for other tuples. However, gingettuple was removed because of reasons of concurrency. And my patches for index-based ordering didn't return it in previous manner: it collects all the results and then returns them one-by-one.

I'm trying to make fastscan work with GinFuzzySearchLimit. Then I figure out that I don't understand how GinFuzzySearchLimit works. Why with GinFuzzySearchLimit startScan can return without doing startScanKey? Is it a bug?

Revised version of patch is attached. Changes are so:
1) Support for GinFuzzySearchLimit.
2) Some documentation.
Question about GinFuzzySearchLimit is still relevant.

Attached version is rebased against last version of packed posting lists.

------
With best regards,
Alexander Korotkov.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: extension_control_path
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: plpgsql.consistent_into