Re: Performance and IN clauses

От: Tomas Vondra
Тема: Re: Performance and IN clauses
Дата: ,
Msg-id: 49260A50.9040904@fuzzy.cz
(см: обсуждение, исходный текст)
Ответ на: Re: Performance and IN clauses  (Mark Roberts)
Список: pgsql-performance

Скрыть дерево обсуждения

Performance and IN clauses  ("Kynn Jones", )
 Re: Performance and IN clauses  (Matthew Wakeling, )
  Re: Performance and IN clauses  (, )
   Re: Performance and IN clauses  (Mark Roberts, )
    Re: Performance and IN clauses  (Tomas Vondra, )
    Re: Performance and IN clauses  (Tomas Vondra, )

Mark Roberts napsal(a):
> On Tue, 2008-11-18 at 17:38 +0100,  wrote:
>> I bet there is no 'critical' length - this is just another case of
>> index
>> scan vs. seqscan. The efficiency depends on the size of the table /
>> row,
>> amount of data in the table, variability of the column used in the IN
>> clause, etc.
>>
>> Splitting the query with 1000 items into 10 separate queries, the
>> smaller
>> queries may be faster but the total time consumed may be actually
>> higher.
>> Something like
>>
>> 10 * (time of small query) + (time to combine them) > (time of large
>> query)
>>
>> If the performance of the 'split' solution is actually better than the
>> original query, it just means that the planner does not use index scan
>> when it actually should. That means that either
>>
>> (a) the planner is not smart enough
>> (b) it has not current statistics of the table (run ANALYZE on the
>> table)
>> (c) the statistics are not detailed enough (ALTER TABLE ... SET
>> STATICTICS)
>> (d) the cost variables are not set properly (do not match the hardware
>> -
>> decreate index scan cost / increase seq scan cost)
>>
>> regards
>> Tomas
>
> I know that it's much faster (for us) to run many smaller queries than
> one large query, and I think that it's primarily because of your reason
> a.  Most of our problems come from Pg misunderstanding the results of a
> join and making a bad plan decision.  Batching dramatically reduces the
> liklihood of this.

As I already said - even the smartest planner won't work without correct
input data. Have you tried fixing the points (b), (c) and (d)?

Fixing them might improve the planner performance so that you don't need
the batchning at all.

regards
Tomas


В списке pgsql-performance по дате сообщения:

От: Tomas Vondra
Дата:
Сообщение: Re: Performance and IN clauses
От: "Andrus"
Дата:
Сообщение: Re: Hash join on int takes 8..114 seconds