Re: Avoid full GIN index scan when possible

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Avoid full GIN index scan when possible
Дата
Msg-id CAPpHfdvfK+Zk=MjjC1M_66+ALD+vdR6BG3quoFAZzjL6hNv-NQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Avoid full GIN index scan when possible  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Avoid full GIN index scan when possible  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Aug 1, 2019 at 9:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Thu, Aug 1, 2019 at 4:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Eyeing this a bit further ... doesn't scanPendingInsert also need
> >> to honor so->forcedRecheck?  Something along the lines of
>
> > I think so.
>
> Yeah, it does --- the updated pg_trgm test attached fails if it doesn't.
>
> Also, I found that Alexander's concern upthread:
>
> >> What would happen when two-columns index have GIN_SEARCH_MODE_DEFAULT
> >> scan on first column and GIN_SEARCH_MODE_ALL on second?  I think even
> >> if triconsistent() for second column returns GIN_TRUE, we still need
> >> to recheck to verify second columns is not NULL.
>
> is entirely on-point.  This patch generates the wrong answer in the
> case I added to gin.sql below.  (The expected output was generated
> with HEAD and seems correct, but with these code changes, we incorrectly
> report the row with NULL as matching.  So I expect the cfbot is going
> to complain about the patch in this state.)
>
> While I've not attempted to fix that here, I wonder whether we shouldn't
> fix it by just forcing forcedRecheck to true in any case where we discard
> an ALL qualifier.  That would get rid of all the ugliness around
> ginFillScanKey, which I'd otherwise really want to refactor to avoid
> this business of adding and then removing a scan key.  It would also
> get rid of the bit about "XXX Need to use ALL mode instead of EVERYTHING
> to skip NULLs if ALL mode has been seen", which aside from being ugly
> seems to be dead wrong for multi-column-index cases.

+1 for setting forcedRecheck in any case we discard ALL qualifier.
ISTM, real life number of cases we can skip recheck here is
negligible.  And it doesn't justify complexity.

> BTW, it's not particularly the fault of this patch, but: what does it
> even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?

It might mean we would like to see all the results, which don't
contain given key.

> Should we decide to treat that as an error?  It doesn't look to me like
> any of the in-tree opclasses will return such a case, and I'm not at
> all convinced what the GIN scan code would actually do with it, except
> that I doubt it matches the documentation.

I think tsvector_ops behaves so.  See gin_extract_tsquery().

        /*
         * If the query doesn't have any required positive matches (for
         * instance, it's something like '! foo'), we have to do a full index
         * scan.
         */
        if (tsquery_requires_match(item))
            *searchMode = GIN_SEARCH_MODE_DEFAULT;
        else
            *searchMode = GIN_SEARCH_MODE_ALL;

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Avoid full GIN index scan when possible