Discussion: Why is the comparison between timestamp and date so much slower than between two dates


Why is the comparison between timestamp and date so much slower than between two dates

From:
Thomas Kellerer
Date:
I came across something strange today.

Consider the following table:

    CREATE TABLE price_history
    (
        product_id integer,
        valid_from date,
        valid_to date,
        price integer
    );

    CREATE INDEX i1 ON price_history (product_id, valid_from, valid_to);

The table contains 5 million rows and 5,000 distinct product_ids.
To get the current price for each product I used the following query:

   select *
   from price_history
   where current_date between valid_from and valid_to;

The execution plan is not really surprising:

    Index Scan using i1 on public.price_history  (cost=0.44..61980.61 rows=5133 width=16) (actual time=0.177..527.693 rows=5000 loops=1)
      Output: product_id, valid_from, price, valid_to
      Index Cond: ((('now'::cstring)::date >= ph.valid_from) AND (('now'::cstring)::date <= ph.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.395 ms
    Execution time: 528.193 ms

Now I tried the same query using now() instead of current_date, which shows exactly the same execution plan but is 3
times slower:

    Index Scan using i1 on public.price_history  (cost=0.44..61980.60 rows=5133 width=16) (actual time=0.406..1902.241 rows=5000 loops=1)
      Output: product_id, valid_from, price, valid_to
      Index Cond: ((now() >= ph.valid_from) AND (now() <= ph.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.615 ms
    Execution time: 1902.777 ms

The above plans were taken on my Windows laptop with Postgres 9.5.1

On a CentOS server with 9.5.0 I can see the same difference:

Plan using now()

    Index Scan using i1 on public.price_history  (cost=0.44..110570.50 rows=44944 width=16) (actual time=0.182..837.903 rows=5000 loops=1)
      Output: product_id, valid_from, valid_to, price
      Index Cond: ((now() >= price_history.valid_from) AND (now() <= price_history.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.106 ms
    Execution time: 838.529 ms

Plan using current_date:

    Index Scan using i1 on public.price_history  (cost=0.44..110570.51 rows=44944 width=16) (actual time=0.052..180.856 rows=5000 loops=1)
      Output: product_id, valid_from, valid_to, price
      Index Cond: ((('now'::cstring)::date >= price_history.valid_from) AND (('now'::cstring)::date <= price_history.valid_to))
      Buffers: shared hit=24160
    Planning time: 0.115 ms
    Execution time: 181.226 ms

(I don't know why the estimates on the CentOS installation are so different from the ones on my laptop, given that both
tables contain exactly the same data and were analyzed properly before running EXPLAIN - but that is a different
question.)

So my question is: why is comparing a timestamp to a date so much slower?

Thomas



Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> So my question is: why is comparing a timestamp to a date so much slower?

The date has to be up-converted to a timestamptz (not timestamp).
I think the expensive part of that is determining what timezone
applies, in particular whether DST is active.  You could try it
with "localtimestamp" (no parens) instead of "now()" to see how
it performs with a non-tz timestamp.

            regards, tom lane


Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Alban Hertroys
Date:
On 13 April 2016 at 15:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>> So my question is: why is comparing a timestamp to a date so much slower?
>
> The date has to be up-converted to a timestamptz (not timestamp).
> I think the expensive part of that is determining what timezone
> applies, in particular whether DST is active.  You could try it
> with "localtimestamp" (no parens) instead of "now()" to see how
> it performs with a non-tz timestamp.
>
>                         regards, tom lane

Or... you can manually down-convert the timestamptz now() to a date ;)
That's basically what you're doing when you use current_date instead of now().

The reason that the other way around is so much more expensive is that
the database needs to do that conversion twice for every row in the
table. When down-converting now(), the DB only needs to do that once
for all rows.
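
A minimal sketch of that down-conversion, using the price_history table from the original post:

```sql
-- Cast now() (timestamptz) down to a date once, so the per-row
-- comparisons are date-vs-date, just as with current_date:
SELECT *
FROM price_history
WHERE now()::date BETWEEN valid_from AND valid_to;
```

This is effectively what current_date does for you already; the cast happens once, not per row.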

Regards,
Alban Hertroys.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Thomas Kellerer
Date:
Tom Lane schrieb am 13.04.2016 um 15:45:
>> So my question is: why is comparing a timestamp to a date so much slower?
>
> The date has to be up-converted to a timestamptz (not timestamp).
> I think the expensive part of that is determining what timezone
> applies, in particular whether DST is active.  You could try it
> with "localtimestamp" (no parens) instead of "now()" to see how
> it performs with a non-tz timestamp.

localtimestamp is indeed faster than now(), but still a bit slower than
current_date (700ms vs 500ms after 5 runs of each).
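
For reference, the three variants compared were along these lines (timings are the rough numbers from my laptop runs):

```sql
SELECT * FROM price_history WHERE current_date   BETWEEN valid_from AND valid_to;  -- ~500 ms
SELECT * FROM price_history WHERE localtimestamp BETWEEN valid_from AND valid_to;  -- ~700 ms
SELECT * FROM price_history WHERE now()          BETWEEN valid_from AND valid_to;  -- ~1900 ms
```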

But as the value of now() won't change throughout the runtime of the statement
(actually the transaction), I wonder why it is being converted for every row.

Thomas

Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Thomas Kellerer
Date:
Alban Hertroys schrieb am 13.04.2016 um 16:39:
>>> So my question is: why is comparing a timestamp to a date so much slower?
>
> The reason that the other way around is so much more expensive is that
> the database needs to do that conversion twice for every row in the
> table. When down-converting now(), the DB only needs to do that once
> for all rows.

Why does it do that for each row? The value of now() won't
change while the statement is running, so this conversion could
be done once at the start of the statement.


Re: Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
"Mike Sofen"
Date:
|-----Original Message-----
|From: Thomas Kellerer   Sent: Wednesday, April 13, 2016 11:37 PM
|
|Alban Hertroys schrieb am 13.04.2016 um 16:39:
|>>> So my question is: why is comparing a timestamp to a date so much slower?
|>
|> The reason that the other way around is so much more expensive is that
|> the database needs to do that conversion twice for every row in the
|> table. When down-converting now(), the DB only needs to do that once
|> for all rows.
|
|Why does it do that for each row? The value of now() won't change while the
|statement is running, so this conversion could be done once at the start of the
|statement.

The general rule in the SQL Server world is that using a function in a Where clause or join will eliminate usage of an
index that would have been leveraged if the function didn't exist.  The reason is that functions are non-deterministic,
so the optimizer can't possibly tell in advance what the outcome will be and thus takes the safest route to completion.
I'm betting that the same logic holds in PG (I just haven't tested it enough to be absolutely sure).

In the case of now() in the Where clause, to avoid the conversion/loss of index usage, I always place (what should be a
static value anyway) the output of now() into a local variable and then use that in the Where clause...and get my index
back.
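
Plain SQL in PostgreSQL has no local variables, so the closest translation of that pattern is a plpgsql function (or DO block); a sketch, with a made-up function name:

```sql
CREATE FUNCTION current_prices() RETURNS SETOF price_history AS $$
DECLARE
    today date := current_date;   -- captured once, before the scan starts
BEGIN
    RETURN QUERY
        SELECT *
        FROM price_history
        WHERE today BETWEEN valid_from AND valid_to;
END;
$$ LANGUAGE plpgsql STABLE;
```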

This is just a style of coding (no functions in where clauses/joins), but one that doesn't seem prevalent in
PG... instead I see people using functions within functions within functions, the cascading impact of which becomes
very hard to unravel.

Mike Sofen



Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Thomas Kellerer
Date:
Mike Sofen schrieb am 14.04.2016 um 14:29:
> The general rule in the SQL Server world is that using a function in
> a Where clause or join will eliminate usage of an index that would
> have been leveraged if the function didn't exist. The reason is that
> functions are non-deterministic, so the optimizer can't possibly tell
> in advance what the outcome will be and thus takes the safest route
> to completion.
> I'm betting that the same logic holds in PG (I just
> haven't tested it enough to be absolutely sure).

Well, this is only true if the function "hides" the value of a column, or
if the function is not marked stable. A condition like:

  where x = some_function(42)

can absolutely use an index on the column x (and I'm pretty sure this
is true for SQL Server as well).

You can even create an index on a function expression, so that something
like

  where some_function(x) = 42

can make use of an index if it is defined as: on table_name((some_function(x)))
(something SQL Server can't do).

You can only create such an index if the function is marked as "immutable",
which basically says that calling the function twice with the same
argument will always return the exact same value:

   http://www.postgresql.org/docs/current/static/xfunc-volatility.html
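
For illustration, such an expression index might be created like this (table, column, and function names are placeholders):

```sql
-- IMMUTABLE promises the planner that equal inputs give equal outputs,
-- which is what makes the function safe to index:
CREATE FUNCTION some_function(x integer) RETURNS integer AS
    'SELECT x * 2'
LANGUAGE sql IMMUTABLE;

CREATE INDEX ON table_name ((some_function(x)));

-- The planner can now use the index for a condition like:
--   WHERE some_function(x) = 42
```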

But in general I do agree that one should be very careful with
conditions where the types don't match or where expressions are
used that can't make use of an index.

> In the case of now() in the Where clause, to avoid the
> conversion/loss of index usage, I always place (what should be a
> static value anyway) the output of now() into a local variable and
> then use that in the Where clause...and get my index back.

now() (and current_timestamp as well) are defined to return the
same value throughout the entire transaction.

So the optimizer _should_ be smart enough to do the conversion only
once at the beginning of the statement and then use that converted
value during the execution of the statement without the need
to re-evaluate it for each row.
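
That transaction-wide stability is easy to verify in psql (a sketch):

```sql
BEGIN;
SELECT now();        -- some timestamp
SELECT pg_sleep(2);
SELECT now();        -- the same timestamp: now() is fixed at transaction start
COMMIT;
```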

But my question wasn't about whether it's a good idea to
use a function in the where clause, but why there is such a huge(!)
difference in performance between now() and current_date especially
given the fact that both are only evaluated once.

Thomas



Re: Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> So the optimizer _should_ be smart enough to do the conversion only
> once at the beginning of the statement and then use that converted
> value during the execution of the statement without the need
> to re-evaluate it for each row.

It's not; especially not in your originally posted case where the
up-conversion happens on the variable not the pseudo-constant.

            regards, tom lane


Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Thomas Kellerer
Date:
Tom Lane schrieb am 14.04.2016 um 15:57:
>> So the optimizer _should_ be smart enough to do the conversion only
>> once at the beginning of the statement and then use that converted
>> value during the execution of the statement without the need
>> to re-evaluate it for each row.
>
> It's not; especially not in your originally posted case where the
> up-conversion happens on the variable not the pseudo-constant.

Which variable are you referring to?

The original query only uses a simple between condition where the
comparison value is a "constant" (the value of "now()")

Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Alban Hertroys
Date:
> On 14 Apr 2016, at 15:12, Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> now() (and current_timestamp as well) are defined to return the
> same value throughout the entire transaction.
>
> So the optimizer _should_ be smart enough to do the conversion only
> once at the beginning of the statement and then use that converted
> value during the execution of the statement without the need
> to re-evaluate it for each row.

As I understand it, that's not how it works.

If the optimizer were to down-convert the value of now() from a timestamp to a date, it would lose precision, possibly
resulting in wrong results for corner cases.
For that reason, it chooses to do the opposite and up-converts the dates. But, because the dates are fields and not
constants, it has to do so for every row.

If that's indeed what happens, then indeed, now() gets evaluated only once, but the slow-down is caused by having to do
conversions (for two field values) for every row.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: Why is the comparison between timestamp and date so much slower than between two dates

From:
Thomas Kellerer
Date:
Alban Hertroys schrieb am 14.04.2016 um 21:22:
>> now() (and current_timestamp as well) are defined to return the
>> same value throughout the entire transaction.
>>
>> So the optimizer _should_ be smart enough to do the conversion
>> only once at the beginning of the statement and then use that
>> converted value during the execution of the statement without the
>> need to re-evaluate it for each row.
>
> As I understand it, that's not how it works.
>
> If the optimizer would down-convert the value of now() from a
> timestamp to a date, it would lose precision, possibly resulting in
> wrong results for corner cases in general. For that reason, it
> chooses to do the opposite and up-converts the dates. But, because
> the dates are fields and not constants, it has to do so for every
> row.
>
> If that's indeed what happens, then indeed, now() gets evaluated only
> once, but the slow-down is caused by having to do conversions (for
> two field values) for every row.

Ah, that makes sense.