Re: GIN improvements part2: fast scan

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GIN improvements part2: fast scan
Дата
Msg-id CAPpHfds0sY4qK61jtWZwRe5DryvQJNNHN0seAHVT-ZiOqQoPHw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: GIN improvements part2: fast scan  (Robert Haas <robertmhaas@gmail.com>)
Re: GIN improvements part2: fast scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Wed, Mar 12, 2014 at 8:02 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 02/26/2014 11:25 PM, Alexander Korotkov wrote:
On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov <aekorotkov@gmail.com>wrote:

On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 02/09/2014 12:11 PM, Alexander Korotkov wrote:

I've rebased catalog changes with last master. Patch is attached. I've
rerun my test suite with both last master ('committed') and attached
patch ('ternary-consistent').


Thanks!


            method         |       sum
------------------------+------------------
   committed              | 143491.715000001
   fast-scan-11           | 126916.111999999
   fast-scan-light        |       137321.211
   fast-scan-light-heikki | 138168.028000001
   master                 |       446976.288
   ternary-consistent     |       125923.514

I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8
to 4.


Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make
sure we get similar behavior in Tomas' tests that used 6 search terms. But
I always felt that it was too large for real queries, once we have the
catalog changes, that's why I lowered to 4 when committing. If an opclass
benefits greatly from fast scan, it should provide the ternary consistent
function, and not rely on the shim implementation.


  I'm not sure about decision to reserve separate procedure number for
ternary consistent. Probably, it would be better to add ginConfig method.
It would be useful for post 9.4 improvements.


Hmm, it might be useful for an opclass to provide both, a boolean and
ternary consistent function, if the boolean version is significantly more
efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a
quick check through the array to see if there are any MAYBE arguments,
within the consistent function. But I'm inclined to keep the possibility to
provide both versions. As long as we support the boolean version at all,
there's not much difference in terms of the amount of code to support
having them both for the same opclass.

A ginConfig could be useful for many other things, but I don't think it's
worth adding it now.


What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck?
We discussed that earlier, but didn't reach any conclusion. That needs to
be clarified in the docs. One possibility is to document that they're
equivalent. Another is to forbid one of them. Yet another is to assign a
different meaning to each.

I've been thinking that it might be useful to define them so that a MAYBE
result from the tri-consistent function means that it cannot decide if you
have a match or not, because some of the inputs were MAYBE. And
TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or
FALSE, the result would be the same, TRUE+recheck. The practical difference
would be that if the tri-consistent function returns TRUE+recheck, ginget.c
wouldn't need to bother fetching the other entries, it could just return
the entry with recheck=true immediately. While with MAYBE result, it would
fetch the other entries and call tri-consistent again. ginget.c doesn't
currently use the tri-consistent function that way - it always fetches all
the entries for a potential match before calling tri-consistent, but it
could. I had it do that in some of the patch versions, but Tomas' testing
showed that it was a big loss on some queries, because the consistent
function was called much more often. Still, something like that might be
sensible in the future, so it might be good to distinguish those cases in
the API now. Note that ginarrayproc is already using the return values like
that: in GinContainedStrategy, it always returns TRUE+recheck regardless of
the inputs, but in other cases it uses GIN_MAYBE.


Next revision of patch is attached.

In this version opclass should provide at least one consistent function:
binary or ternary. It's expected to achieve best performance when opclass
provide both of them. However, tests shows opposite :( I've to recheck it
carefully.


However, it's not!
This revision of patch completes my test case in 123330 ms. This is
slightly faster than previous revision.

Great. Committed, I finally got around to it.

Some minor changes: I reworded the docs and also updated the table of support functions in xindex.sgml. I refactored the query in opr_sanity.sql, to make it easier to read, and to check more carefully that the required support functions are present. I also added a runtime check to avoid crashing if neither is present.

A few things we ought to still discuss:

* I just noticed that the dummy trueTriConsistentFn returns GIN_MAYBE, rather than GIN_TRUE. The equivalent boolean version returns 'true' without recheck. Is that a typo, or was there some reason for the discrepancy?

Actually, there is not difference in current implementation, But I implemented it so that trueTriConsistentFn can correctly work with shimBoolConsistentFn. In this case it should return GIN_MAYBE in case when it have no GIN_MAYBE in the input (as analogue of setting recheck flag). So, it could return GIN_TRUE only if it checked that input has GIN_MAYBE. However, checking would be just wasting of cycles. So I end up with just GIN_MAYBE :-)
 
* Come to think of it, I'm not too happy with the name GinLogicValue. It's too vague. It ought to include "ternary" or "tri-value" or something like that. How about renaming it to "GinTernaryValue" or just "GinTernary"? Any better suggestion for the name?
 
Probably we could call this just "TernaryValue" hoping that one day it would be useful outside of gin?

* This patch added a triConsistent function for array and tsvector opclasses. Were you planning to submit a patch to do that for the rest of the opclasses, like pg_trgm? (it's getting awfully late for that...)

Yes. I can try to get it into 9.4 if it's possible. 

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: COPY table FROM STDIN doesn't show count tag
Следующее
От: Robert Haas
Дата:
Сообщение: Re: GIN improvements part2: fast scan