Re: Planner should use index on a LIKE 'foo%' query

От: Matthew Wakeling
Тема: Re: Planner should use index on a LIKE 'foo%' query
Дата: ,
Msg-id: Pine.LNX.4.64.0806301713590.4085@aragorn.flymine.org
(см: обсуждение, исходный текст)
Ответ на: Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken)
Ответы: Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken)
Список: pgsql-performance

Скрыть дерево обсуждения

Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )
 Re: Planner should use index on a LIKE 'foo%' query  ("Steinar H. Gunderson", )
  Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )
   Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )
    Re: Planner should use index on a LIKE 'foo%' query  (Matthew Wakeling, )
     Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )
      Re: Planner should use index on a LIKE 'foo%' query  (Matthew Wakeling, )
       Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )
 Re: Planner should use index on a LIKE 'foo%' query  (Dimitri Fontaine, )
 Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )
  Re: Planner should use index on a LIKE 'foo%' query  (Matthew Wakeling, )
   Re: Planner should use index on a LIKE 'foo%' query  (Moritz Onken, )

On Mon, 30 Jun 2008, Moritz Onken wrote:
>>> select count(1) from result where url in (select shorturl from item
>>> where shorturl = result.url);
>>
>> I really don't see what your query tries to accomplish. Why would you want
>> "url IN (... where .. = url)"? Wouldn't you want a different qualifier
>> somehow?
>
> well, it counts the number of rows with urls which already exist in another
> table.
> How would you describe the query?
> If the "(select shorturl from item where shorturl = result.url)"
> clause is empty the row is not counted, that's what I want...

The thing here is that you are effectively causing Postgres to run a
sub-select for each row of the "result" table, each time generating either
an empty list or a list with one or more identical URLs. This is
effectively forcing a nested loop. In a way, you have two constraints
where you only need one.

You can safely take out the constraint in the subquery, so it is like
this:

SELECT COUNT(*) FROM result WHERE url IN (SELECT shorturl FROM item);

This will generate equivalent results, because those rows that didn't
match the constraint wouldn't have affected the IN anyway. However, it
will alter the performance, because the subquery will contain more
results, but it will only be run once, rather than multiple times. This is
effectively forcing a hash join (kind of).

Whereas if you rewrite the query as I demonstrated earlier, then you allow
Postgres to make its own choice about which join algorithm will work best.

Matthew

--
Anyone who goes to a psychiatrist ought to have his head examined.


В списке pgsql-performance по дате сообщения:

От: Mark Roberts
Дата:
Сообщение: Re: Does max size of varchar influence index size
От: John Beaver
Дата:
Сообщение: Re: sequence scan problem