Re: NOT LIKE much faster than LIKE?

От: Tom Lane
Тема: Re: NOT LIKE much faster than LIKE?
Дата: ,
Msg-id: 6398.1136931685@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: NOT LIKE much faster than LIKE?  (Simon Riggs)
Ответы: Re: NOT LIKE much faster than LIKE?  (Simon Riggs)
Re: NOT LIKE much faster than LIKE?  (Simon Riggs)
Список: pgsql-performance

Скрыть дерево обсуждения

NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Christopher Kings-Lynne, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
     Re: NOT LIKE much faster than LIKE?  (Greg Stark, )
    Re: NOT LIKE much faster than LIKE?  (Matteo Beccati, )
     Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
      Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
       Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
         Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
          Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
           Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
   Re: NOT LIKE much faster than LIKE?  (Stephan Szabo, )
 Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
  Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
    Re: NOT LIKE much faster than LIKE?  (Tom Lane, )

Simon Riggs <> writes:
> I think its OK to use the MCV, but I have a problem with the current
> heuristics: they only work for randomly generated strings, since the
> selectivity goes down geometrically with length.

We could certainly use a less aggressive curve for that.  You got a
specific proposal?

>> After finishing that work it occurred to me that we could go a step
>> further: if the MCV list accounts for a substantial fraction of the
>> population, we could assume that the MCV list is representative of the
>> whole population, and extrapolate the pattern's selectivity over the MCV
>> list to the whole population instead of using the existing heuristics at
>> all.  In a situation like Andreas' example this would win big, although
>> you can certainly imagine cases where it would lose too.

> I don't think that can be inferred with any confidence, unless a large
> proportion of the MCV list were itself selected. Otherwise it might
> match only a single MCV that just happens to have a high proportion,
> then we assume all others have the same proportion.

Well, of course it can't be inferred "with confidence".  Sometimes
you'll win and sometimes you'll lose.  The question is, is this a
better heuristic than what we use otherwise?  The current estimate
for non-anchored patterns is really pretty crummy, and even with a
less aggressive length-vs-selectivity curve it's not going to be great.

Another possibility is to merge the two estimates somehow.

> I would favour the idea of dynamic sampling using a block sampling
> approach; that was a natural extension of improving ANALYZE also.

One thing at a time please.  Obtaining better statistics is one issue,
but the one at hand here is what to do given particular statistics.

            regards, tom lane


В списке pgsql-performance по дате сообщения:

От: David Lang
Дата:
Сообщение: Re: How to handle a large DB and simultaneous accesses?
От: Tom Lane
Дата:
Сообщение: Re: Left Join Performance vs Inner Join Performance