Re: Selecting max(pk) is slow on empty set

Поиск

Список

Период

Сортировка

От	Richard Huxton
Тема	Re: Selecting max(pk) is slow on empty set
Дата	22 января 2008 г. 12:04:06
Msg-id	4795E908.2040502@archonet.com обсуждение исходный текст
Ответ на	Re: Selecting max(pk) is slow on empty set ("Alexander Staubo" <alex@purefiction.net>)
Ответы	Re: Selecting max(pk) is slow on empty set ("Alexander Staubo" <alex@purefiction.net>)
Список	pgsql-general

Дерево обсуждения

Alexander Staubo wrote:
> On 1/22/08, Richard Huxton <dev@archonet.com> wrote:
>> Alexander Staubo wrote:
>>> # explain analyze select max(id) from user_messages where user_id = 13604;
>>>
>>>          QUERY PLAN
>>>
------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Result  (cost=633.19..633.20 rows=1 width=0) (actual
>>> time=339160.704..339160.704 rows=1 loops=1)

>> Do you have an index on user_id? Presumably that's what's being used in
>> the case of SELECT * or count(*).
>
> Yes, I do. However, for some reason it's not being used here. The
> index is clustered -- but I haven't run "cluster" on it recently. Does
> that matter?

The index is still an index...

>> What cost does the count(*) come up with?
>
> # explain analyze select count(*) from user_messages where user_id = 13604;
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=3646.04..3646.05 rows=1 width=0) (actual
> time=39.448..39.448 rows=1 loops=1)
>    ->  Index Scan using user_messages_user on user_messages
> (cost=0.00..3643.53 rows=1000 width=0) (actual time=39.410..39.410
> rows=0 loops=1)
>          Index Cond: (user_id = 13604)
>  Total runtime: 39.648 ms
> (4 rows)
>
> So here it's using the right index.

Hmm, but with an estimated cost of 3646 (vs.633 for the max(*) which
uses the wrong index). That explains why it's walking backwards through
the pkey index, it thinks that it's 8 times cheaper.

It looks like it thinks that because the estimated cost scanning the
whole index backwards is 633188 for 1000 rows and you only want one row
so that's 1/1000 of that cost.

But why 1000 rows? Actually, it thinks 1000 rows above too. Could it be
inadequate stats on the users column? If the users it gathered stats on
all have > 1000 rows then it might use the default.

Have a look at most_common_vals,most_common_freqs in pg_stats for
tbl=user_messages, att=user perhaps. Then see if an ALTER TABLE SET
STATISTICS 100 makes a difference.

>> Can you trick it with a sub-query (to see the explain)?
>> SELECT max(id) FROM (SELECT id FROM user_messages WHERE user_id = 13604)
>> AS foo;
>
> No, I tried that as well; PostgreSQL is clever enough to optimize it
> into exactly the same query as the original.

Damn :-)

--
   Richard Huxton
   Archonet Ltd

В списке pgsql-general по дате отправления:

Предыдущее

От: "Alexander Staubo"
Дата: 22 января 2008 г., 11:40:19
Сообщение: Re: Selecting max(pk) is slow on empty set

Следующее

От: "Josh Harrison"
Дата: 22 января 2008 г., 12:15:09
Сообщение: Re: Online Oracle to Postgresql data migration

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Selecting max(pk) is slow on empty set

Предыдущее

Следующее