Re: NOT LIKE much faster than LIKE?

От: Andrea Arcangeli
Тема: Re: NOT LIKE much faster than LIKE?
Дата: ,
Msg-id: 20060110022303.GA20168@opteron.random
(см: обсуждение, исходный текст)
Ответ на: Re: NOT LIKE much faster than LIKE?  (Tom Lane)
Ответы: Re: NOT LIKE much faster than LIKE?  (Christopher Kings-Lynne)
Re: NOT LIKE much faster than LIKE?  (Tom Lane)
Re: NOT LIKE much faster than LIKE?  (Stephan Szabo)
Список: pgsql-performance

Скрыть дерево обсуждения

NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Christopher Kings-Lynne, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
    Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
     Re: NOT LIKE much faster than LIKE?  (Greg Stark, )
    Re: NOT LIKE much faster than LIKE?  (Matteo Beccati, )
     Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
      Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
       Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
         Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
          Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
           Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
        Re: NOT LIKE much faster than LIKE?  (Simon Riggs, )
   Re: NOT LIKE much faster than LIKE?  (Stephan Szabo, )
 Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
  Re: NOT LIKE much faster than LIKE?  (Tom Lane, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
 Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
  Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  (Andrea Arcangeli, )
   Re: NOT LIKE much faster than LIKE?  ("Jim C. Nasby", )
    Re: NOT LIKE much faster than LIKE?  (Tom Lane, )

On Mon, Jan 09, 2006 at 09:04:48PM -0500, Tom Lane wrote:
> Andrea Arcangeli <> writes:
> > It just makes no sense to me that the planner takes a difference
> > decision based on a "not".
>
> Why in the world would you think that?  In general a NOT will change the
> selectivity of the WHERE condition tremendously.  If the planner weren't
> sensitive to that, *that* would be a bug.  The only case where it's
> irrelevant is if the selectivity of the base condition is exactly 50%,
> which is not a very reasonable default guess for LIKE.

How do you know that "LIKE" will have a selectivity above 50% in the
first place? I think 50% should be the default unless the selectively is
measured at runtime against the data being queried.

If you don't know the data, I think it's a bug that LIKE is assumed to
have a selectivity above 50%. You can't know that, only the author of
the code can know that and that's why I talked about hints. It'd be fine
to give hints like:

    UNLIKELY string LIKE '%% PREEMPT %%'

or:

    LIKELY string NOT LIKE '%% PREEMPT %%'

Then you could assume that very little data will be returned or a lot of
data will be returned.

If you don't get hints NOT LIKE or LIKE should be assumed to have the
same selectivity.

> It sounds to me that the problem is misestimation of the selectivity
> of the LIKE pattern --- the planner is going to think that
> LIKE '%% PREEMPT %%' is fairly selective because of the rather long
> match text, when in reality it's probably not so selective on your
> data.  But we don't keep any statistics that would allow the actual

True, there's a lot of data that matches %% PREEMPT %% (even if less
than the NOT case).

> number of matching rows to be estimated well.  You might want to think
> about changing your data representation so that the pattern-match can be
> replaced by a boolean column, or some such, so that the existing
> statistics code can make a more reasonable estimate how many rows are
> selected.

I see. I can certainly fix it by stopping using LIKE. But IMHO this
remains a bug, since until the statistics about the numberof matching
rows isn't estimated well, you should not make assumptions on LIKE/NOT
LIKE. I think you can change the code in a way that I won't have to
touch anything, and this will lead to fewer surprises in the future IMHO.

Thanks!


В списке pgsql-performance по дате сообщения:

От: Alessandro Baretta
Дата:
Сообщение: Re: 500x speed-down: Wrong statistics!
От: Matteo Beccati
Дата:
Сообщение: Re: NOT LIKE much faster than LIKE?