Обсуждение: where clauses including timstamptz and intervals

Поиск
Список
Период
Сортировка

where clauses including timstamptz and intervals

От
Niels Jespersen
Дата:
Hello all

Are these two queries exactly eqivalent? The table is partitioned on r_time, which is a timestamptz. The explain plans
arenot exactly the same. The first wants to scan a partition more than the latter.  

select f.xx from f
where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
  and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');

select f.xx from f
where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
  and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);

Regards Niels Jespersen



Re: where clauses including timstamptz and intervals

От
hubert depesz lubaczewski
Дата:
On Fri, Apr 09, 2021 at 07:24:54AM +0000, Niels Jespersen wrote:
> Hello all
> 
> Are these two queries exactly eqivalent? The table is partitioned on
> r_time, which is a timestamptz. The explain plans are not exactly the
> same. The first wants to scan a partition more than the latter. 
> 
> select f.xx from f 
> where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
>   and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');
> 
> select f.xx from f 
> where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
>   and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);

It depends on timezone.

For example, in my timezone:

$ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month'), '2020-11-01 00:00:00+00'::timestamptz;
        ?column?        │      timestamptz       
────────────────────────┼────────────────────────
 2020-11-01 02:00:00+01 │ 2020-11-01 01:00:00+01
(1 row)

Please note that there is 1 hour difference.

The reason is that somewhere in there we change time due to daylight
savings.

If I'd set timezone to UTC, suddently it's the same:

$ set timezone=UTC;
SET

$ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month') = '2020-11-01 00:00:00+00'::timestamptz;
 ?column? 
──────────
 t
(1 row)

As usual - time is more complicated than one could expect.

Best regards,

depesz



Re: where clauses including timstamptz and intervals

От
Ron
Дата:
On 4/9/21 5:24 AM, hubert depesz lubaczewski wrote:
> On Fri, Apr 09, 2021 at 07:24:54AM +0000, Niels Jespersen wrote:
>> Hello all
>>
>> Are these two queries exactly eqivalent? The table is partitioned on
>> r_time, which is a timestamptz. The explain plans are not exactly the
>> same. The first wants to scan a partition more than the latter.
>>
>> select f.xx from f
>> where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
>>    and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');
>>
>> select f.xx from f
>> where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
>>    and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);
> It depends on timezone.
>
> For example, in my timezone:
>
> $ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month'), '2020-11-01 00:00:00+00'::timestamptz;
>          ?column?        │      timestamptz
> ────────────────────────┼────────────────────────
>   2020-11-01 02:00:00+01 │ 2020-11-01 01:00:00+01
> (1 row)
>
> Please note that there is 1 hour difference.
>
> The reason is that somewhere in there we change time due to daylight
> savings.

That looks like a flaw in how "month" is calculated.  Whether or not October 
is 744 hours (31 days x 24 hours/day) or 743 hours (subtracting the "fall 
back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

-- 
Angular momentum makes the world go 'round.



Re: where clauses including timstamptz and intervals

От
Tom Lane
Дата:
Ron <ronljohnsonjr@gmail.com> writes:
> On 4/9/21 5:24 AM, hubert depesz lubaczewski wrote:
>> For example, in my timezone:
>>
>> $ select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month'), '2020-11-01 00:00:00+00'::timestamptz;
>>         ?column?        │      timestamptz
>> ────────────────────────┼────────────────────────
>>  2020-11-01 02:00:00+01 │ 2020-11-01 01:00:00+01
>> (1 row)
>>
>> Please note that there is 1 hour difference.
>> The reason is that somewhere in there we change time due to daylight
>> savings.

> That looks like a flaw in how "month" is calculated.  Whether or not October
> is 744 hours (31 days x 24 hours/day) or 743 hours (subtracting the "fall
> back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

No, it's more likely because the calculation was done in some time
zone other than UTC, which (somewhat confusingly) is what the starting
value was specified in.  Assuming Hubert meant EU rules, we have

regression=# set timezone = 'Europe/Paris';
SET
regression=# select '2020-10-01 00:00:00+00'::timestamptz;
      timestamptz
------------------------
 2020-10-01 02:00:00+02
(1 row)

regression=# select '2020-10-01 00:00:00+00'::timestamptz + interval '1 month';
        ?column?
------------------------
 2020-11-01 02:00:00+01
(1 row)

which looks to me like the addition did exactly what it's supposed to,
ie same local time 1 month later.

            regards, tom lane



Re: where clauses including timstamptz and intervals

От
Francisco Olarte
Дата:
Ron:

On Fri, Apr 9, 2021 at 5:53 PM Ron <ronljohnsonjr@gmail.com> wrote:
> >> Are these two queries exactly eqivalent? The table is partitioned on
> >> r_time, which is a timestamptz. The explain plans are not exactly the
> >> same. The first wants to scan a partition more than the latter.
> >>
> >> select f.xx from f
> >> where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
> >>    and f.r_time < ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');
> >>
> >> select f.xx from f
> >> where f.r_time  >= '2020-10-01 00:00:00+00'::timestamptz
> >>    and f.r_time < ('2020-11-01 00:00:00+00'::timestamptz);
> > It depends on timezone.
....
> That looks like a flaw in how "month" is calculated.  Whether or not October
> is 744 hours (31 days x 24 hours/day) or 743 hours (subtracting the "fall
> back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.

I've encountered these thing before. What is probably happening is
that your CLIENT TIMEZONE is DST aware.

When you add the interval 1st PG takes the tstz, and converts it to
internal. No problem here, it is a gmt one. The it parses the
interval, again no problem, intervals do not depend on DST.

But then you ask to add them, and timezone + interval is time zone
sensitive. And, counter intuitively to many, tstz does not store
timezone, it marks an instant in time, it just signals that
manipulation to the timezone must be TZ aware, so it does it in your
current timezone.

You probably have ( and if not, you should consider having ) your
partitions defined on GMT, so the time jump at the starty of the month
make it take one more ( are you in Britain/Portugal/Canary islands,
which are +0 in winter +1 in summer, IIRC ? )

If you want to scan exact partitions by specifying arithmetic limits,
set timezone to whichever tz your partitions are partitioned with, or
just name the partitions. If you just want to scan a date range (
thats's what I do a lot ), just trust pg to do the right thing.

Remember, specifying +00 in the input does not make the arithmetic
work in that tz, it works in your tz. I think you could achieve the
desired result by temporarily switching to timestamp without,
something like
('2020-10-01 00:00:00+00'::timestamptz at timezone 'GMT' + interval '1
month') at timezone 'GMT'
IIRC this converts to tstz to ts, then adds there, then converts back
to tstz, my results are:

cdrs=> show timezone;
 TimeZone
----------
 UTC
(1 row)

cdrs=> select ('2020-10-01 00:00:00+00'::timestamptz + interval '1 month');
        ?column?
------------------------
 2020-11-01 00:00:00+00
(1 row)

cdrs=> set timezone to 'Europe/Madrid';
SET

cdrs=> select ('2020-10-01 00:00:00+00'::timestamptz), ('2020-10-01
00:00:00+00'::timestamptz + interval '1 month');
      timestamptz       |        ?column?
------------------------+------------------------
 2020-10-01 02:00:00+02 | 2020-11-01 02:00:00+01
(1 row)

********* diferent UTC hours, same local. (02 but +02 and +01)

cdrs=> select ('2020-10-01 00:00:00+00'::timestamptz), ('2020-10-01
00:00:00+00'::timestamptz at time zone 'utc'+ interval '1 month') at
time zone 'utc';
      timestamptz       |        timezone
------------------------+------------------------
 2020-10-01 02:00:00+02 | 2020-11-01 01:00:00+01
(1 row)

********* Now same UTC hours, different local. ( 02 + 02, 01 + 01 )

If I had monthly partitions rolling at 00 utc, 1st one will nibble the
first hour of an extra one.

TSTZ is hard.

Francisco Olarte.



SV: where clauses including timstamptz and intervals

От
Niels Jespersen
Дата:
>-----Oprindelig meddelelse-----
>Fra: Tom Lane <tgl@sss.pgh.pa.us> 
>Emne: Re: where clauses including timstamptz and intervals
>
>Ron <ronljohnsonjr@gmail.com> writes:
>> On 4/9/21 5:24 AM, hubert depesz lubaczewski wrote:
>>> For example, in my timezone:
>>> 
>>> (1 row)
>>> 
>>> Please note that there is 1 hour difference.
>>> The reason is that somewhere in there we change time due to daylight 
>>> savings.
>
>> That looks like a flaw in how "month" is calculated.  Whether or not 
>> October is 744 hours (31 days x 24 hours/day) or 743 hours 
>> (subtracting the "fall back" hour), one month from 01-Oct-2020 is still 01-Nov-2020.
>
>No, it's more likely because the calculation was done in some time zone other than UTC, which (somewhat confusingly)
iswhat the starting value was specified in.  Assuming Hubert meant EU rules, we have
 
>
>regression=# set timezone = 'Europe/Paris'; SET regression=# select '2020-10-01 00:00:00+00'::timestamptz;
>      timestamptz       
>------------------------
> 2020-10-01 02:00:00+02
>(1 row)
>
>regression=# select '2020-10-01 00:00:00+00'::timestamptz + interval '1 month';
>        ?column?        
>------------------------
> 2020-11-01 02:00:00+01
>(1 row)
>
>which looks to me like the addition did exactly what it's supposed to, ie same local time 1 month later.
>
>            regards, tom lane
>

I hope this sums it up. 

select ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month')             as c1,
       '2020-11-01 00:00:00+00'::timestamptz at time zone 'utc'                                    as c2,
       ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month') =
       ('2020-11-01 00:00:00+00'::timestamptz at time zone 'utc')                                  as c3,
       ('2020-10-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels' + interval '1 month') as c4,
       '2020-11-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels'                        as c5,
       ('2020-10-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels' + interval '1 month') =
       ('2020-11-01 00:00:00+00'::timestamptz at time zone 'Europe/Brussels')                      as c6;

returns 

         c1          |         c2          | c3 |         c4          |         c5          | c6
---------------------+---------------------+----+---------------------+---------------------+----
 2020-11-01 00:00:00 | 2020-11-01 00:00:00 | t  | 2020-11-01 02:00:00 | 2020-11-01 01:00:00 | f
(1 row)

The answer to my original problem behind my original question which is how to query using utc timing in order to hit
exactlythe (utc created) partition, without changing the session timezone, would be like this:
 

select f.* from f
where f.r_time at time zone 'utc' >= '2020-10-01 00:00:00+00'::timestamptz at time zone 'utc'
  and f.r_time at time zone 'utc' < ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month');

I think maybe it's better to just change the session timezone when this need arises. 

Thank you all for feedback. 

Regards Niels


Re: where clauses including timstamptz and intervals

От
Francisco Olarte
Дата:
Niels:

On Fri, Apr 9, 2021 at 7:39 PM Niels Jespersen <NJN@dst.dk> wrote:

> The answer to my original problem behind my original question which is how to query using utc timing in order to hit
exactlythe (utc created) partition, without changing the session timezone, would be like this:
 

You would have had much success had you stated your problem instead of
the seemingly academical question of queery equivalence, anyway:

> select f.* from f
> where f.r_time at time zone 'utc' >= '2020-10-01 00:00:00+00'::timestamptz at time zone 'utc'
>   and f.r_time at time zone 'utc' < ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month');

Although it is equivalent I would suggest to use:

f.r_time< ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' +
interval '1 month') at time zone 'utc' ;

optimizer should treat both the same, but I'm no sure it can recognize
it, and it may not see your condition is of the type "field <
constant" to select index scans and similar things. Also, you can
refactor your calculations into an stable function for greater
clarity.

> I think maybe it's better to just change the session timezone when this need arises.

Maybe. Also, I do not see the need to query exact partitions by tstz
range, but assuming you do it may be much easier to query the catalog
for the partition and then query the partition. ( In many years of
working with time-partitioned tables I have never got the need to
build a query to hit exactly X partitions from times ).

Francisco Olarte.



SV: where clauses including timstamptz and intervals

От
Niels Jespersen
Дата:
>-----Oprindelig meddelelse-----
>Fra: Francisco Olarte <folarte@peoplecall.com> 
>Sendt: 10. april 2021 09:31
>Til: Niels Jespersen <NJN@dst.dk>
>
>Although it is equivalent I would suggest to use:
>
>f.r_time< ('2020-10-01 00:00:00+00'::timestamptz at time zone 'utc' + interval '1 month') at time zone 'utc' ;
>
>optimizer should treat both the same, but I'm no sure it can recognize it, and it may not see your condition is of the
type"field < constant" to select index scans and similar things. Also, you can refactor your calculations into an
stablefunction for greater clarity.
 

The plannes seems to be able to figure this out nicely. 
>
>> I think maybe it's better to just change the session timezone when this need arises.
>
>Maybe. Also, I do not see the need to query exact partitions by tstz range, but assuming you do it may be much easier
toquery the catalog for the partition and then query the partition. ( In many years of working with time-partitioned
tablesI have never got the need to build a query to hit exactly X partitions from times ).
 
>
I need to hit an exact partition, because the query makes some QA on data coming from a file that fits that exact
timeslot.File and partition must match exactly because the file may be raplaced later and the partition truncated. 
 

Thank you for your insights.

Niels

>Francisco Olarte.