Re: [HACKERS] Interesting index/LIKE/join slowness problems

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Interesting index/LIKE/join slowness problems
Дата
Msg-id 9517.932134448@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Interesting index/LIKE/join slowness problems  (Ole Gjerde <gjerde@icebox.org>)
Ответы Re: [HACKERS] Interesting index/LIKE/join slowness problems  (Ole Gjerde <gjerde@icebox.org>)
Список pgsql-hackers
Ole Gjerde <gjerde@icebox.org> writes:
> Ok.. I get that..  But why does LIKE 'AN914' have the same problem?  The %
> doesn't have to be there as long as it's either LIKE or ~*(or ~ etc)
> query.

A pure "where field LIKE constant" doesn't have the problem; it's the
OR that does it.  More specifically it's an OR of ANDs that doesn't work
very well.

By the time the parser gets done with it, your query looks like

select * from mcrl1 wherereference = 'AN914' OR(reference LIKE 'AN914-%' AND reference >= 'AN914-' AND reference <=
'AN914-\377');

(ugly, ain't it?)  Those comparison clauses are what need to be pulled
out and fed to the indexscan mechanism, so that only part of the table
gets scanned, not the whole table.  Indexscan doesn't know anything
about LIKE, but it does grok >= and <=.

Unfortunately the current optimizer doesn't do it right.  I looked into
a very similar bug report from Hiroshi Inoue (see his message of 3/19/99
and my response of 4/3 in the hackers archives), and what I found was
that the cause is a fairly fundamental optimizer design choice.  The
ANDed conditions get split into separate top-level clauses and there's
no easy way to put them back together.  The optimizer ends up passing
only one of them to the indexscan executor.  That's better than nothing,
but on average you still end up scanning half the table rather than
just a small range of it.

> I haven't been able to find a discussion on this topic last few months, I
> found discussion about something similar in March, but that didn't explain
> it very well..  I'll just have to look some more :)

I was referring to the discussion around 4/15/99 about why LIKE needs a
smarter way to generate the upper comparison clause.  That's not
directly your problem, but it is causing the same kind of slowdown for
everyone who does use LOCALE...

>> When you throw in the OR, the indexqual logic basically breaks down
>> completely; I think you end up scanning the entire table.  (This could
>> be made smarter, perhaps, but right now I don't believe the system is
>> able to figure out the union of indexqual conditions.)

I was wrong about that --- the executor *does* handle OR'd indexqual
conditions, basically by performing a new indexscan for each OR'd
condition.  (That's why EXPLAIN is listing the index multiple times.)
The trouble with OR-of-ANDs is entirely the optimizer's fault; the
executor would do them fine if the optimizer would only hand them over
in that form.

> What would be the outlook on fixing the problem and not the symptom? :)

I plan to look into fixing this for 6.6, but don't hold your breath
waiting...
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] final #include cleanup
Следующее
От: Vince Vielhaber
Дата:
Сообщение: RE: Security WAS RE: [HACKERS] Updated TODO list