Re: Help with performance on current status column

Поиск
Список
Период
Сортировка
От Richard Huxton
Тема Re: Help with performance on current status column
Дата
Msg-id 4327E9D6.5020203@archonet.com
обсуждение исходный текст
Ответ на Help with performance on current status column  (Chris Kratz <chris.kratz@vistashare.com>)
Ответы Re: Help with performance on current status column  (Chris Kratz <chris.kratz@vistashare.com>)
Список pgsql-performance
Chris Kratz wrote:
> Hello All,
>
> We are struggling with a specific query that is killing us.  When doing
> explain analyze on the entire query, we *seem* to be getting killed by the
> estimated number of rows on a case statement calculation.
>
> I've included a snippet from the explain analyze of the much larger query. The
> line in question, (cost=0.00..106.52 rows=1 width=16) (actual
> time=0.048..67.728 rows=4725 loops=1) shows that it returned 4700 rows
> instead of 1 which when coupled with a later join causes the statement to run
> over 3 minutes.[1]
>
> It seems that it thinks that the scan on role_id is going to return 1 row, but
> in reality returns 4725 rows.  The case statement causing the problem uses
> todays date to see if a particular row is still active.  Here is a test case
> showing how far off the estimate is from the reality. [2]

> [2] A much simpler statement triggers the incorrect row counts here.
>
> explain analyze
> select *
> from roles rol
> where
>
>             CASE
>            WHEN rol.role_id IS NULL
>            THEN NULL
>            WHEN rol."begin" IS NOT NULL and rol."end" IS NOT NULL
>            THEN
>              CASE WHEN TIMESTAMP 'now'>=rol."begin" and TIMESTAMP
> 'now'<=rol."end"
>              THEN 'Active'
>              ELSE 'Inactive' END
>            WHEN rol."begin" IS NOT NULL
>            THEN
>              CASE WHEN TIMESTAMP 'now'>=rol."begin"
>              THEN 'Active'
>              ELSE 'Inactive' END
>            WHEN rol."end" IS NOT NULL
>            THEN
>              CASE WHEN TIMESTAMP 'now'<=rol."end"
>              THEN 'Active'
>              ELSE 'Inactive' END
>            ELSE 'Active'
>            END = 'Active'

Aside #1 - I'm not entirely clear how role_id can be null since you
seemed to be joining against it in the real query.

Aside #2 - You're probably better off with CURRENT_DATE since begin/end
seem to be dates, rather than TIMESTAMP 'now' - and in any case you
wanted "timestamp with time zone"

OK, I think the root of your problem is your use of null to mean "not
ended" or "not started" (whatever 'not started' means). PostgreSQL has
the handy timestamptz value "infinity", but only for timestamps and not
for dates. I'd probably cheat a little and use an end date of
'9999-12-31' or similar to simulate "infinity". Then your test is simply:

WHERE
   ...
   AND (rol.begin <= CURRENT_DATE AND rol.end >= CURRENT_DATE)

That should estimate simply enough.

--
   Richard Huxton
   Archonet Ltd

В списке pgsql-performance по дате отправления:

Предыдущее
От: Pryscila B Guttoski
Дата:
Сообщение: Re: About method of PostgreSQL's Optimizer
Следующее
От: Dalibor Sramek
Дата:
Сообщение: Re: Low performance on Windows problem