Re: NOT LIKE much faster than LIKE?

От: Tom Lane
Тема: Re: NOT LIKE much faster than LIKE?
Дата: ,
Msg-id: 1625.1136915395@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: NOT LIKE much faster than LIKE?  (Matteo Beccati)
Ответы: Re: NOT LIKE much faster than LIKE?  (Simon Riggs)
Список: pgsql-performance

Скрыть дерево обсуждения

NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Christopher Kings-Lynne, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
     Re: NOT LIKE much faster than LIKE?  (Greg Stark, )
    Re: NOT LIKE much faster than LIKE?  (Matteo Beccati, )
     Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
      Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
       Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
         Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
          Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
           Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
   Re: NOT LIKE much faster than LIKE?  (Stephan Szabo, )
 Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
  Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
    Re: NOT LIKE much faster than LIKE?  (Tom Lane, )

Matteo Beccati <> writes:
>> I did just think of something we could improve though.  The pattern
>> selectivity code doesn't make any use of the statistics about "most
>> common values".  For a constant pattern, we could actually apply the
>> pattern test with each common value and derive answers that are exact
>> for the portion of the population represented by the most-common-values
>> list.

> This reminds me what I did in a patch which is currently on hold for the
> next release:

I've applied a patch to make patternsel() compute the exact result for
the MCV list, and use its former heuristics only for the portion of the
column population not included in the MCV list.

After finishing that work it occurred to me that we could go a step
further: if the MCV list accounts for a substantial fraction of the
population, we could assume that the MCV list is representative of the
whole population, and extrapolate the pattern's selectivity over the MCV
list to the whole population instead of using the existing heuristics at
all.  In a situation like Andreas' example this would win big, although
you can certainly imagine cases where it would lose too.

Any thoughts about this?  What would be a reasonable threshold for
"substantial fraction"?  It would probably make sense to have different
thresholds depending on whether the pattern is left-anchored or not,
since the range heuristic only works for left-anchored patterns.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: Simon Riggs
Дата:
Сообщение: Re: NOT LIKE much faster than LIKE?
От: Mark Lewis
Дата:
Сообщение: Re: help tuning queries on large database