Обсуждение: performance problem with LIMIT (order BY in DESC order). Wrong index used?

Поиск
Список
Период
Сортировка

performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
Dieter Rehbein
Дата:
Hi everybody,

I have a performance-problem with a query using a LIMIT. There are other threads rergading performance issues with
LIMIT,but I didn't find useful hints for our problem and it might 
be interesting for other postgres-users.


There are only 2 simple tables:

CREATE TABLE newsfeed
(
id             varchar(32) PRIMARY KEY,
version        int4        NOT NULL,
newsfeed_type  varchar(20) NOT NULL,
new_item_count int4        NOT NULL
);
CREATE  INDEX IDX_NEWSFEED_TYPE ON newsfeed (newsfeed_type);


CREATE TABLE newsfeed_item
(
id            varchar(32)   PRIMARY NOT NULL,
item_type     varchar(35)   NOT NULL,
version       int4          NOT NULL,
category      varchar(25)   NULL,
data1         bytea         NULL,
data2         bytea         NULL,
date_time     timestamp     NOT NULL,
guid1         varchar(32)   NULL,
guid2         varchar(32)   NULL,
guid3         varchar(32)   NULL,
id1           int8          NULL,
id2           int8          NULL,
long_value1   int8          NULL,
long_value2   int8          NULL,
long_value3   int8          NULL,
string_value1 varchar(4000) NULL,
string_value2 varchar(500)  NULL,
string_value3 varchar(500)  NULL,
string_value4 varchar(500)  NULL,
string_value5 varchar(500)  NULL,
string_value6 varchar(500)  NULL,
newsfeed      varchar(32)   NOT NULL
);
CREATE UNIQUE INDEX newsfeed_item_pkey ON newsfeed_item (id);
CREATE  INDEX idx_nfi_guid1 ON newsfeed_item (guid1);
CREATE  INDEX idx_nfi_guid2 ON newsfeed_item (guid2);
CREATE  INDEX idx_nfi_guid3 ON newsfeed_item (guid3);
CREATE  INDEX idx_nfi_id1 ON newsfeed_item (id1);
CREATE  INDEX idx_nfi_id2 ON newsfeed_item (id2);
CREATE  INDEX idx_nfi_newsfeed ON newsfeed_item (newsfeed);
CREATE  INDEX idx_nfi_type ON newsfeed_item (item_type);
CREATE  INDEX idx_nfi_datetime ON newsfeed_item (date_time);

newsfeed contains 457036 rows
newsweed_item contains 5169727 rows

postgres version: 9.0.2
OS: CentOS release 5.5 (Final)


The following query took 4.2 seconds:

-------------------------
select *
from newsfeed_item
where newsfeed in
          (
          '173ee4dcec0d11de9f4f12313c0018c1','10dabde0f70211df816612313b02054e',

'17841c9af70211df874b12313b02054e','1783fce2f70211df814412313b02054e','1783fdd2f70211df8c1d12313b02054e','178405a2f70211df829212313b02054e',

'178440c6f70211df97c812313b02054e','178416e6f70211dfac3412313b02054e','1783e4aaf70211df9acd12313b02054e','178437e8f70211df8b8512313b02054e',
          '1783f54ef70211df81e012313b02054e','178415c4f70211df8f8112313b02054e'
          )
order by date_time desc

limit 25
-------------------------

If the LIMIT was removed, the query took 60 milliseconds!  If the sorting order was changed to ASC, the query took
44ms,even with the LIMIT. 

Then I tried to create the index on date_time in DESC order (because the result is sorted in descending order), but
thatdid not change anything.   


Then I removed the index on date_time with the following results:

query with the limit:     40 ms
query without the limit:  60 ms

=> the optimizer seems to use a wrong index (I did perform an ANALYZE on newsfeed_item and a REINDEX before I did the
test).Since I currently don't need  
the index on date_time (but will need it in the near future), I removed the index on date_time, which is ok for now.

------------------------

here are the explain analyze results:

1) the query in descending order with the limit and index on date_time (the slow one):

Limit  (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
  ->  Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item  (cost=0.00..409365.16 rows=10442 width=963)
(actualtime=48.581..4060.542 rows=25 loops=1) 
        Filter: ((newsfeed)::text = ANY
('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 4060.959 ms


2) the query in descending order without the limit (which is much faster):

Sort  (cost=39575.23..39601.33 rows=10442 width=963) (actual time=15.014..17.038 rows=477 loops=1)
  Sort Key: date_time
  Sort Method:  quicksort  Memory: 287kB
  ->  Bitmap Heap Scan on newsfeed_item  (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601
rows=477loops=1) 
        Recheck Cond: ((newsfeed)::text = ANY
('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
        ->  Bitmap Index Scan on idx_nfi_newsfeed  (cost=0.00..418.80 rows=10442 width=0) (actual time=0.555..0.555
rows=477loops=1) 
              Index Cond: ((newsfeed)::text = ANY
('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 19.065 ms

3) the query in ascending order with the limit (which is fast):

Limit  (cost=0.00..980.09 rows=25 width=963) (actual time=0.261..3.704 rows=25 loops=1)
  ->  Index Scan using "IDX_NFI_DATETIME" on newsfeed_item  (cost=0.00..409365.16 rows=10442 width=963) (actual
time=0.250..3.495rows=25 loops=1) 
        Filter: ((newsfeed)::text = ANY
('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
Total runtime: 3.854 ms


4) The query after removing the index on date_time, in descending order with the LIMIT (which is fast as well).

Limit  (cost=34745.39..34745.45 rows=25 width=963) (actual time=12.855..13.143 rows=25 loops=1)
  ->  Sort  (cost=34745.39..34771.49 rows=10442 width=963) (actual time=12.846..12.946 rows=25 loops=1)
        Sort Key: date_time
        Sort Method:  top-N heapsort  Memory: 40kB
        ->  Bitmap Heap Scan on newsfeed_item  (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.622..9.936
rows=477loops=1) 
              Recheck Cond: ((newsfeed)::text = ANY
('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))
              ->  Bitmap Index Scan on idx_nfi_newsfeed  (cost=0.00..418.80 rows=10442 width=0) (actual
time=0.543..0.543rows=477 loops=1) 
                    Index Cond: ((newsfeed)::text = ANY
('{173ee4dcec0d11de9f4f12313c0018c1,10dabde0f70211df816612313b02054e,17841c9af70211df874b12313b02054e,1783fce2f70211df814412313b02054e,1783fdd2f70211df8c1d12313b02054e,178405a2f70211df829212313b02054e,178440c6f70211df97c812313b02054e,178416e6f70211dfac3412313b02054e,1783e4aaf70211df9acd12313b02054e,178437e8f70211df8b8512313b02054e,1783f54ef70211df81e012313b02054e,178415c4f70211df8f8112313b02054e}'::text[]))

Total runtime: 13.318 ms

Is there anything I can do to add the index on date_time without the performance problem?

regards
Dieter


Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
Claudio Freire
Дата:
On Tue, Apr 12, 2011 at 7:20 AM, Dieter Rehbein
<dieter.rehbein@skiline.cc> wrote:
> Hi everybody,
>
> I have a performance-problem with a query using a LIMIT. There are other threads rergading performance issues with
LIMIT,but I didn't find useful hints for our problem and it might 
> be interesting for other postgres-users.

Did you perform an ANALYZE or VACUUM ANALYZE?
Did you try increasing the statistic targets?

AFAIK, it looks a lot like the planner is missing stats, since it
estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
instead of 25.

Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
Dieter Rehbein
Дата:
what I did, was an ANALYZE, which did not change anything.

I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

thanks
Dieter



Am 12.04.2011 um 09:42 schrieb Claudio Freire:

On Tue, Apr 12, 2011 at 7:20 AM, Dieter Rehbein
<dieter.rehbein@skiline.cc> wrote:
> Hi everybody,
>
> I have a performance-problem with a query using a LIMIT. There are other threads rergading performance issues with
LIMIT,but I didn't find useful hints for our problem and it might 
> be interesting for other postgres-users.

Did you perform an ANALYZE or VACUUM ANALYZE?
Did you try increasing the statistic targets?

AFAIK, it looks a lot like the planner is missing stats, since it
estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
instead of 25.


Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
Claudio Freire
Дата:
On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
<dieter.rehbein@skiline.cc> wrote:
> I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

That probably means you need more statistics - try increasing the
newsfeed's statistics target count.

ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

Try different <n> numbers, you can crank it up to 4000 or perhaps more
in 9.0, but you should start lower I guess.

Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
tv@fuzzy.cz
Дата:
> On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
> <dieter.rehbein@skiline.cc> wrote:
>> I just executed a VACUUM ANALYZE and now everything performs well. hm,
>> strange.
>
> That probably means you need more statistics - try increasing the
> newsfeed's statistics target count.
>
> ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;
>
> Try different <n> numbers, you can crank it up to 4000 or perhaps more
> in 9.0, but you should start lower I guess.

AFAIK the max value is 10000 and the default is 100. Higher numbers mean
higher overhead, so do not jump to 10000 directly. Set it to 1000 and see
if that helps, etc.

regards
Tomas


Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
Dieter Rehbein
Дата:
thank's a lot guys,  I will try that out.

regards
Dieter



Am 12.04.2011 um 11:07 schrieb Claudio Freire:

On Tue, Apr 12, 2011 at 10:59 AM, Dieter Rehbein
<dieter.rehbein@skiline.cc> wrote:
> I just executed a VACUUM ANALYZE and now everything performs well. hm, strange.

That probably means you need more statistics - try increasing the
newsfeed's statistics target count.

ALTER TABLE newsfeed_item ALTER COLUMN newsfeed SET STATISTICS <n>;

Try different <n> numbers, you can crank it up to 4000 or perhaps more
in 9.0, but you should start lower I guess.


Re: performance problem with LIMIT (order BY in DESC order). Wrong index used?

От
Tom Lane
Дата:
Claudio Freire <klaussfreire@gmail.com> writes:
> Did you try increasing the statistic targets?

> AFAIK, it looks a lot like the planner is missing stats, since it
> estimates the index query on idx_nfi_newsfeed will fetch 10k rows -
> instead of 25.

BTW, this is the right suggestion, but for the wrong reason.  You seem
to be looking at

Limit  (cost=0.00..980.09 rows=25 width=963) (actual time=48.592..4060.779 rows=25 loops=1)
  ->  Index Scan Backward using "IDX_NFI_DATETIME" on newsfeed_item  (cost=0.00..409365.16 rows=10442 width=963)
(actualtime=48.581..4060.542 rows=25 loops=1) 

Here, the actual row count is constrained to 25 because the LIMIT node
stops calling the indexscan node once it's got 25.  So this case proves
little about whether the planner's estimates are any good.  You need to
check the estimates in the unconstrained plan:

  ->  Bitmap Heap Scan on newsfeed_item  (cost=421.41..34450.72 rows=10442 width=963) (actual time=0.644..12.601
rows=477loops=1) 

Here we can see that there really are only 477 rows in the table that
satisfy the WHERE clause, versus an estimate of 10K.  So sure enough,
the statistics are bad, and an increase in stats target might help.
But you can't conclude that from an explain that involves LIMIT.

            regards, tom lane