Thread: Simplifying timezone support

Simplifying timezone support

From
Tom Lane
Date:
While looking at this recent bug report (which still fails in CVS tip)
http://archives.postgresql.org/pgsql-bugs/2003-02/msg00094.php
I realized that the code paths that putatively exist for machines
with neither HAVE_TM_ZONE nor HAVE_INT_TIMEZONE have gone unused
since at least 6.5.  Proof is that abstime2tm() doesn't even compile
in that code path (it has a reference to an undefined variable "now").

Since we evidently have no supported platforms that have neither method
of learning the timezone, I propose that we stop contorting the code
with the illusion that we can handle this case reasonably.  We can
easily set it up so that we just default to GMT when neither config
symbol is defined.

The reason I want to do this is to remove the dependency on
system-supplied values of CTimeZone and CTZName.  CTZName can go
away altogether (ditto CDayLight), and CTimeZone will only be used
when the user explicitly sets a timezone as a numeric offset from
GMT (ie, the HasCTZSet paths).  This will save cycles during every
transaction start, where we currently expend time setting these values
(see GetCurrentAbsoluteTimeUsec).  And it will fix the above-mentioned
bug, which exists because CTimeZone is set at transaction start and not
updated by a later SET TIMEZONE command.

The bug could be fixed, sort of, by calling GetCurrentAbsoluteTimeUsec
again after executing SET TIMEZONE.  But there is a variant scenario
where the same bug exists: if there has been a daylight-savings
transition since the current transaction started, we'll still get the
wrong answers.  So trying to make CTimeZone track the current timezone
correctly seems doomed to failure anyhow.
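
A toy model of the staleness problem described above, not PostgreSQL's actual code. The `Session` class, its method names, and the fixed offsets are hypothetical; the only point is that an offset captured once at transaction start does not follow a later SET TIMEZONE.

```python
# Toy model only: zone names and offsets are illustrative assumptions.
FIXED_OFFSETS = {"GMT": 0, "EST5": -5 * 3600, "CST6": -6 * 3600}  # seconds east of GMT


class Session:
    def __init__(self, zone):
        self.zone = zone
        self.cached_offset = None  # plays the role of CTimeZone

    def begin(self):
        # The offset is captured once, at transaction start.
        self.cached_offset = FIXED_OFFSETS[self.zone]

    def set_timezone(self, zone):
        # Mimics the bug: the zone changes, the cached offset does not.
        self.zone = zone

    def stale_offset(self):
        return self.cached_offset

    def fresh_offset(self):
        # Looking the offset up at the point of use avoids the staleness.
        return FIXED_OFFSETS[self.zone]


s = Session("CST6")
s.begin()
s.set_timezone("EST5")
print(s.stale_offset(), s.fresh_offset())
```

The same reasoning applies to a daylight-savings transition mid-transaction: any value cached at start can disagree with the value computed at the time of use.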

Any objections?
        regards, tom lane


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
On Wed, Feb 19, 2003 at 10:35:58PM -0500, Tom Lane wrote:
<snip Tom's discussion of backend internal tracking of timezone>
> Any objections?

Not to your suggestion per se, but looking at the bug report raises a
question about pgsql's time zone parsers. It appears there's at least
two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
timestamp parsing doesn't:

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 CST';
         timestamptz
------------------------------
 2003-02-18 09:36:06.00933-06
(1 row)

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
         timestamptz
------------------------------
 2003-02-18 08:36:06.00933-06
(1 row)

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 US/Eastern';
ERROR:  Bad timestamp external representation '2003/02/18 09:36:06.00933 US/Eastern'

Further testing says it's even worse than that:

SET TIME ZONE will silently accept any string at all, and fall back to
providing GMT when a timestamptz is requested. This includes the TLA
TZ abbreviations that the constant parsing code understands, like CST
and EST.

test=# set TIME ZONE 'CST';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
         timestamptz
------------------------------
 2003-02-18 14:36:06.00933+00
(1 row)

test=# set TIME ZONE 'FOOBAR';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
         timestamptz
------------------------------
 2003-02-18 14:36:06.00933+00
(1 row)

Here's an especially fun one: with DATESTYLE set to 'Postgresql,US', whatever
string is handed to SET TIME ZONE comes out the other end, if it can't
be parsed:

test=# set TIME ZONE 'FOOBAR';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
              timestamptz
---------------------------------------
 Tue Feb 18 14:36:06.00933 2003 FOOBAR
(1 row)


Leading to this erroneous pair:

test=# set TIME ZONE 'US/Central';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
            timestamptz
------------------------------------
 Tue Feb 18 08:36:06.00933 2003 CST
(1 row)

test=# set TIME ZONE 'CST';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
            timestamptz
------------------------------------
 Tue Feb 18 14:36:06.00933 2003 CST
(1 row)

test=# 

Tom, since you're in (or near) that code right now, how painful would
it be to unify the time zone parsing? What's the correct behavior?
Certainly SET TIME ZONE should at least NOTICE about invalid time zone
names?

Ross


Re: Simplifying timezone support

From
Tom Lane
Date:
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> question about pgsql's time zone parsers. It appears there's at least
> two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
> timestamp parsing doesn't:

The TIME ZONE string is fed to libc (via TZ environment variable); the
other cases are not.

> SET TIME ZONE will silently accept any string at all, and fall back to
> providing GMT when a timestamptz is requested.

Provide a portable way of getting libc to tell us whether it likes TZ,
and I'll be glad to fix this.

Ultimately we should probably get rid of our dependence on the libc
time routines altogether ... but I have no intention of opening that
can of worms right now.  See past discussions in the archives.
        regards, tom lane


Re: Simplifying timezone support

From
Tom Lane
Date:
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
>> Provide a portable way of getting libc to tell us whether it likes TZ,
>> and I'll be glad to fix this.

> Dang that lovely word 'portable'. However, given your proposed change,
> perhaps the hurdle for portable time handling is now lower: it seems we've
> not been exposed to as broad a range of broken systems as in the past.

On this particular point my threshold of 'portable' is actually pretty
low, as long as it's fail-soft.  Failure to detect bad TZ on some
systems would leave them no worse off than before, right?

But I haven't seen *any* published API that directly tells you whether
tzset liked TZ or not --- AFAICT it's supposed to just silently
substitute GMT.  Which would be okay if "GMT" were the only allowed
spelling of GMT, but it ain't ...
        regards, tom lane


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > question about pgsql's time zone parsers. It appears there's at least
> > two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
> > timestamp parsing doesn't:
> 
> The TIME ZONE string is fed to libc (via TZ environment variable); the
> other cases are not.
> 
> > SET TIME ZONE will silently accept any string at all, and fall back to
> > providing GMT when a timestamptz is requested.
> 
> Provide a portable way of getting libc to tell us whether it likes TZ,
> and I'll be glad to fix this.

Dang that lovely word 'portable'. However, given your proposed change,
perhaps the hurdle for portable time handling is now lower: it seems we've
not been exposed to as broad a range of broken systems as in the past.
I'll look at it, but no promises.

> Ultimately we should probably get rid of our dependence on the libc
> time routines altogether ... but I have no intention of opening that
> can of worms right now.  See past discussions in the archives.

Agreed. I see we're inheriting the actually misleading case from the
OS/libc, as well:

wallace$ unset TZ
wallace$ date
Thu Feb 20 15:00:04 CST 2003
wallace$ export TZ=US/Central
wallace$ date
Thu Feb 20 15:00:16 CST 2003
wallace$ export TZ=US/Zanzibar
wallace$ date
Thu Feb 20 21:00:33 US/Zanzibar 2003
wallace$ export TZ=CST
wallace$ date
Thu Feb 20 21:00:42 CST 2003
wallace$ 

Ross


Re: Simplifying timezone support

From
Tom Lane
Date:
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> If the time zone came back UNKNOWN, we go ahead and see if tzset() can
> interpret it. Criteria for failure: if the timezone offset came back 0,
> and the reported tzname[0] is the same as the string that we passed in. If
> it does, we fire a NOTICE about an unknown spelling of GMT. Note that we
> would have already caught all _known_ spellings of GMT in the first step,
> so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc.

I'm worried about cases like "Africa/Benin" for places that just happen
to be on the prime meridian, but don't call their time GMT or UTC.
Looking at a globe, it also seems possible that there are places an hour
west of Greenwich, for which this could fail during daylight-savings
season.

> An extension to this would be to use the tzset() trick above directly
> in the datetime constant parser, as a fallback after not matching the
> table. In that case, we'd probably want to treat the unknown spelling
> of GMT as an error, though (as it currently does).

I think tzset() is probably much too slow to consider calling on every
pass through timestamptz_in ...
        regards, tom lane


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
On Thu, Feb 20, 2003 at 04:19:21PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
> >> Provide a portable way of getting libc to tell us whether it likes TZ,
> >> and I'll be glad to fix this.
> 
> > Dang that lovely word 'portable'. However, given your proposed change,
> > perhaps the hurdle for portable time handling is now lower: it seems we've
> > not been exposed to as broad a range of broken systems as in the past.
> 
> On this particular point my threshold of 'portable' is actually pretty
> low, as long as it's fail-soft.  Failure to detect bad TZ on some
> systems would leave them no worse off than before, right?
> 
> But I haven't seen *any* published API that directly tells you whether
> tzset liked TZ or not --- AFAICT it's supposed to just silently
> substitute GMT.  Which would be okay if "GMT" were the only allowed
> spelling of GMT, but it ain't ...

I've been digging in the date and time code a bit, and now have a proposal
for dealing with SET TIME ZONE 'someunknownstring'.  First, we use the
time token table from the time constant parser in utils/adt/datetime.c
to see if we've got a recognized time zone abbreviation. If it is,
we generate a canonical POSIX timezone name for use in setting TZ,
call tzset(), and we're done.

If the time zone came back UNKNOWN, we go ahead and see if tzset() can
interpret it. Criteria for failure: if the timezone offset came back 0,
and the reported tzname[0] is the same as the string that we passed in. If
it does, we fire a NOTICE about an unknown spelling of GMT. Note that we
would have already caught all _known_ spellings of GMT in the first step,
so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc.
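
A minimal sketch of the failure test just described, written as a pure function so it does not depend on any particular libc. The function name and parameter names are hypothetical; `timezone_seconds_west` stands for the value libc exposes as `timezone` after tzset().

```python
def looks_like_unknown_gmt(requested, tzname0, timezone_seconds_west):
    """Return True when the zone appears to have been silently treated as GMT.

    requested: the string passed to SET TIME ZONE (and hence to TZ).
    tzname0: the standard-time abbreviation reported after tzset().
    timezone_seconds_west: the reported offset; 0 means GMT.
    """
    return timezone_seconds_west == 0 and tzname0 == requested


# A bogus name that is echoed back unchanged with a zero offset.
print(looks_like_unknown_gmt("US/Zanzibar", "US/Zanzibar", 0))
# A real zone: the abbreviation differs from the name and the offset is nonzero.
print(looks_like_unknown_gmt("US/Central", "CST", 6 * 3600))
```

Because known spellings of GMT are filtered out by the table lookup first, this test would only ever run on names the table did not recognize.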

An extension to this would be to use the tzset() trick above directly
in the datetime constant parser, as a fallback after not matching the
table. In that case, we'd probably want to treat the unknown spelling
of GMT as an error, though (as it currently does).

Thoughts? If this seems acceptable, I can implement it this weekend.

Ross


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
<snip>
> 
> I'm worried about cases like "Africa/Benin" for places that just happen
> to be on the prime meridian, but don't call their time GMT or UTC.
> Looking at a globe, it also seems possible that there are places an hour
> west of Greenwich, for which this could fail during daylight-savings
> season.

Well, that'll either get caught by the existing table (we've got six
different spellings of GMT, currently) or by the 'string in != string out'
case - the zoneinfo format requires a 3 or more character abbreviation
for the time zone. For every case I've looked at in my zoneinfo directory,
it's either 3 or 4 uppercase characters, and _never_ matches the filename
path string used to set it. I'll do an exhaustive test after dinner.

> 
> > An extension to this would be to use the tzset() trick above directly
> > in the datetime constant parser, as a fallback after not matching the
> > table. In that case, we'd probably want to treat the unknown spelling
> > of GMT as an error, though (as it currently does).
> 
> I think tzset() is probably much too slow to consider calling on every
> pass through timestamptz_in ...

It wouldn't happen on every call - only with funky timezone
representations.  We could NOTICE use of tzset(), as well, to alert the
DBA about something fishy, if you'd like.

Ross


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
On Fri, Feb 21, 2003 at 05:45:53PM -0600, Ross J. Reedstrom wrote:
> On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote:
> > "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> <snip>
> > 
> > I'm worried about cases like "Africa/Benin" for places that just happen
> > to be on the prime meridian, but don't call their time GMT or UTC.
> > Looking at a globe, it also seems possible that there are places an hour
> > west of Greenwich, for which this could fail during daylight-savings
> > season.
> 
> Well, that'll either get caught by the existing table (we've got six
> different spellings of GMT, currently) or by the 'string in != string out'
> case - the zoneinfo format requires a 3 or more character abbreviation
> for the time zone. For every case I've looked at in my zoneinfo directory,
> it's either 3 or 4 uppercase characters, and _never_ matches the filename
> path string used to set it. I'll do an exhaustive test after dinner.

O.K., I've run the test: of the 1108 files in my zoneinfo database,
only 11 have matching filenames to the canonical name returned after
setting TZ.  Of those 11, 4 are some version of GMT (GMT, UCT, UTC,
WET), of which, one is in fact missing from our table - UCT. At minimum,
I'll add that.
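
A rough sketch of how such an exhaustive check could be scripted; this is not the script that was actually run. The zoneinfo root path, the helper names, and the injectable `probe` argument are all assumptions, and `time.tzset()` is only available on Unix.

```python
import os
import time


def probe_tzname(zone):
    """Set TZ, call tzset(), and return the standard-time name libc reports."""
    os.environ["TZ"] = zone
    time.tzset()  # Unix only
    return time.tzname[0]


def zones_echoing_name(zones, probe=probe_tzname):
    """Return the zone names that come back unchanged as tzname[0]."""
    return [z for z in zones if probe(z) == z]


def list_zoneinfo(root="/usr/share/zoneinfo"):
    """Yield zone names relative to root; the default path is an assumption."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

Calling `zones_echoing_name(list_zoneinfo())` would list the files whose name matches the reported abbreviation. The results depend on the installed zoneinfo database and libc, so the counts quoted above should not be expected to reproduce exactly.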

Every other validly formatted TZ variable that returns GMT should be
caught by the datetktbl check.

I'll play with it this weekend, see how hard it is to make it work.

Ross


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote:
>
> Every other validly formatted TZ variable that returns GMT should be
> caught by the datetktbl check.
>
> I'll play with it this weekend, see how hard it is to make it work.

O.K., the weekend's over, and I've created two different versions
of this.  Both work, pass all the regression tests, and solve the
'CST is just a funny way to say GMT' problem.  I was able to make use
of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in
assign_timezone. However, the order of application of this vis-a-vis
tzset is unclear.

I had proposed doing the DPT first, then tzset, then a NOTICE if it
looked like tzset didn't take. Got that working, but discovered a change of
behavior: for some of those who have a timezone in the zoneinfo database
that is a three letter abbreviation, the current code (tzset only) will
provide daylight savings time transitions, so that a timestamp in July
returns a different timezone than one in February.  This is not true for
our internal values of set time zone: there, we convert to a numerical
offset, which is constant no matter when the timestamp occurs.

This is still a win for those whose timezone abbreviation is _not_ in the
zoneinfo DB, (such as CST), which currently is silently interpreted as
an odd spelling of GMT.

Second solution - try tzset() first, and apply the following heuristic
to see if it took:

tzname[0]==$TZ and tzname[1]=="" and timezone=0 and daylight=0

In other words, _all_ the timezone related information remains the
default.  I tested this against the 1607 zoneinfo files on my system:
every one was filtered out, even things that _are_ GMT with no DST (they
all had a non-null tzname[1] == tzname[0])

If this succeeds (i.e. tzset didn't recognize the TZ), go ahead and look
it up in our big table'o date/time strings. This also works, fixing the
bogus GMT spellings, without changing current behavior for any string
that is not bogus.
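
The second approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: `ABBREV_TABLE` is a hypothetical stand-in for the backend's date/time token table, the function names are invented, and raising an error for a name that neither step recognizes is an assumption about the final fallback.

```python
# Hypothetical stand-in for the abbreviation table: hours east of GMT.
ABBREV_TABLE = {"CST": -6, "EST": -5, "GMT": 0, "UTC": 0}


def tzset_left_defaults(requested, tzname, timezone, daylight):
    """True when every value still looks like the 'unrecognized TZ' default.

    tzname is the (standard, daylight) pair, timezone is seconds west of GMT,
    and daylight is nonzero when the zone has DST rules.
    """
    return (
        tzname[0] == requested
        and tzname[1] == ""
        and timezone == 0
        and daylight == 0
    )


def resolve_zone(requested, tzname, timezone, daylight):
    """Return ('tzset', name) or ('table', hours); raise ValueError otherwise."""
    if not tzset_left_defaults(requested, tzname, timezone, daylight):
        # tzset() recognized the name, so keep libc's interpretation.
        return ("tzset", requested)
    key = requested.upper()
    if key in ABBREV_TABLE:
        # Fall back to the table, which yields a fixed numeric offset.
        return ("table", ABBREV_TABLE[key])
    raise ValueError("unrecognized time zone: %r" % requested)


print(resolve_zone("US/Central", ("CST", "CDT"), 21600, 1))
print(resolve_zone("CST", ("CST", ""), 0, 0))
```

Trying tzset() first keeps the daylight-savings behavior of any name the zoneinfo database knows; only names it rejects are reduced to a constant offset.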

Note that the sysadmin can always tell if tzset or the table was used, by
looking at the format of the 'show time zone' result. If tzset was called,
this is the string that was passed to 'set time zone'. If the table was
used, it will be an hours west of GMT offset.

The problem with this approach is that it does nothing to reduce our
dependency on the OS timezone functionality.

Comments? I've attached the second patch for discussion.

Ross


Re: Simplifying timezone support

From
"Ross J. Reedstrom"
Date:
According to my sent folder, this went out Monday afternoon, but I
haven't seen it yet, so I'm resending to the list only, without the
attached patch.  I'll send the patch over to patches.

Any comment on the behavior, specifically, the heuristic for deciding
tzset() failed, and the proposed order of application of tzset()
vs. table lookup?

Ross

On Mon, Feb 24, 2003 at 03:34:56PM -0600, Ross J. Reedstrom wrote:
> On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote:
> > 
> > Every other validly formatted TZ variable that returns GMT should be
> > caught by the datetktbl check.
> > 
> > I'll play with it this weekend, see how hard it is to make it work.
> 
> O.K., the weekend's over, and I've created two different versions
> of this.  Both work, pass all the regression tests, and solve the
> 'CST is just a funny way to say GMT' problem.  I was able to make use
> of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in
> assign_timezone. However, the order of application of this vis-a-vis
> tzset is unclear.
> 
> I had proposed doing the DPT first, then tzset, then a NOTICE if it
> looked like tzset didn't take. Got that working, but discovered a change of
> behavior: for some of those who have a timezone in the zoneinfo database
> that is a three letter abbreviation, the current code (tzset only) will
> provide daylight savings time transitions, so that a timestamp in July
> returns a different timezone than one in February.  This is not true for
> our internal values of set time zone: there, we convert to a numerical
> offset, which is constant no matter when the timestamp occurs.
> 
> This is still a win for those whose timezone abbreviation is _not_ in the
> zoneinfo DB, (such as CST), which currently is silently interpreted as
> an odd spelling of GMT.
> 
> Second solution - try tzset() first, and apply the following heuristic
> to see if it took:
> 
> tzname[0]==$TZ and tzname[1]=="" and timezone=0 and daylight=0
> 
> In other words, _all_ the timezone related information remains the
> default.  I tested this against the 1607 zoneinfo files on my system:
> every one was filtered out, even things that _are_ GMT with no DST (they
> all had a non-null tzname[1] == tzname[0])
> 
> If this succeeds (i.e. tzset didn't recognize the TZ), go ahead and look
> it up in our big table'o date/time strings. This also works, fixing the
> bogus GMT spellings, without changing current behavior for any string
> that is not bogus.
> 
> Note that the sysadmin can always tell if tzset or the table was used, by
> looking at the format of the 'show time zone' result. If tzset was called,
> this is the string that was passed to 'set time zone'. If the table was
> used, it will be an hours west of GMT offset.
> 
> The problem with this approach is that it does nothing to reduce our 
> dependency on the OS timezone functionality.
> 
> Comments? I've attached the second patch for discussion.
> 
> Ross
> 


Re: Simplifying timezone support

From
Tom Lane
Date:
Awhile back, "Ross J. Reedstrom" <reedstrm@rice.edu> wrote:
> Second solution - try tzset() first, and apply the following heuristic
> to see if it took:
> tzname[0]==$TZ and tzname[1]=="" and timezone=0 and daylight=0

I finally went back to look at this issue, and soon realized that the
above test is in fact glibc-specific.  Other implementations of tzset()
may act differently.  On HPUX, tzset() appears to adopt the system-wide
setting (from /etc/zoneinfo/localtime) if TZ contains a garbage string.
And yet, that's a better fallback behavior than adopting GMT.

I have applied a patch that just checks for tzname[1] not empty or
timezone != 0; either one means that tzset was able to interpret TZ as a
non-GMT zone (or that it gave up and reverted to a non-GMT default zone,
which we can't distinguish anyway).  Failing that, it looks to see if
the string is known as a GMT equivalent to DecodeTimezone.  If not, it
complains.
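
A minimal sketch of the check as described in the paragraph above, not the committed code. The function name is hypothetical, and `GMT_EQUIVALENTS` is an illustrative set standing in for whatever names DecodeTimezone treats as GMT.

```python
# Illustrative set only; the real list lives in the backend's lookup table.
GMT_EQUIVALENTS = {"GMT", "UTC", "UCT", "Z", "ZULU"}


def zone_accepted(requested, tzname1, timezone):
    """Approximate the acceptance rule for a SET TIME ZONE string.

    tzname1: the daylight-time name reported after tzset().
    timezone: seconds west of GMT reported after tzset().
    """
    if tzname1 != "" or timezone != 0:
        # tzset() produced a non-GMT zone, or fell back to a non-GMT system
        # default; the two cases cannot be told apart.
        return True
    # Everything looks like plain GMT: accept only known GMT spellings.
    return requested.upper() in GMT_EQUIVALENTS


print(zone_accepted("US/Central", "CDT", 21600))
print(zone_accepted("FOOBAR", "", 0))
```

Note the fail-soft property: a garbage string on a system that falls back to a non-GMT default is still accepted, which leaves that system no worse off than before.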

I did not include Ross' idea of accepting other zone names known to
DecodeTimezone as if they were a numeric-offset value.  We could still
do that if people like it, but it seemed a bigger change in behavior
than just trying to check that tzset() worked.

I also added code that detects and rejects a leap-second-aware timezone
definition, since we've now seen a couple different reports of systems
where such zones seem to be installed by default, leading to breakage
of our date/time arithmetic.
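
The message does not say how the leap-second check works. One plausible way, sketched here as an assumption, is to convert an instant that falls exactly on a minute boundary in UTC and see whether the local seconds field is still zero. The function name and the injectable `localtime` argument are hypothetical, and the test assumes the zone's offset is a whole number of minutes.

```python
import calendar
import time

# 2000-01-01 00:00:00 UTC as a POSIX timestamp (leap seconds not counted).
Y2K = calendar.timegm((2000, 1, 1, 0, 0, 0))


def is_leap_second_aware(localtime=time.localtime):
    """True if the active zone shifts a whole-minute instant off :00 seconds."""
    return localtime(Y2K).tm_sec != 0


# time.gmtime never applies leap seconds, so this prints False.
print(is_leap_second_aware(time.gmtime))
```

A zone definition that counts leap seconds would report a nonzero seconds field for this instant, which is what breaks arithmetic that assumes 86400-second days.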
        regards, tom lane