Discussion: Simplifying timezone support
While looking at this recent bug report (which still fails in CVS tip)
http://archives.postgresql.org/pgsql-bugs/2003-02/msg00094.php
I realized that the code paths that putatively exist for machines with neither HAVE_TM_ZONE nor HAVE_INT_TIMEZONE have gone unused since at least 6.5. Proof is that abstime2tm() doesn't even compile in that code path (it has a reference to an undefined variable "now").

Since we evidently have no supported platforms that have neither method of learning the timezone, I propose that we stop contorting the code with the illusion that we can handle this case reasonably. We can easily set it up so that we just default to GMT when neither config symbol is defined.

The reason I want to do this is to remove the dependency on system-supplied values of CTimeZone and CTZName. CTZName can go away altogether (ditto CDayLight), and CTimeZone will only be used when the user explicitly sets a timezone as a numeric offset from GMT (i.e., the HasCTZSet paths). This will save cycles during every transaction start, where we currently expend time setting these values (see GetCurrentAbsoluteTimeUsec). And it will fix the above-mentioned bug, which exists because CTimeZone is set at transaction start and not updated by a later SET TIMEZONE command.

The bug could be fixed, sort of, by calling GetCurrentAbsoluteTimeUsec again after executing SET TIMEZONE. But there is a variant scenario where the same bug exists: if there has been a daylight-savings transition since the current transaction started, we'll still get the wrong answers. So trying to make CTimeZone track the current timezone correctly seems doomed to failure anyhow.

Any objections?

regards, tom lane
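[Editor's sketch] The compile-time fallback proposed above might look roughly like this. It is only an illustration, not the PostgreSQL source: the function name seconds_west_of_gmt is hypothetical, the one-hour DST shift is an assumption, and only the two configure symbols come from the message. When neither symbol is defined the function simply reports GMT.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes tm_gmtoff and the timezone global on glibc */
#endif
#include <time.h>

/*
 * Hypothetical helper: offset of local time from GMT, in seconds west,
 * for the instant "when".  Returns 0 (that is, GMT) when the platform
 * offers neither way of learning the zone.
 */
long
seconds_west_of_gmt(time_t when)
{
#if defined(HAVE_TM_ZONE)
    struct tm *tm = localtime(&when);

    /* tm_gmtoff counts seconds east of GMT, so flip the sign */
    return tm ? -(long) tm->tm_gmtoff : 0;
#elif defined(HAVE_INT_TIMEZONE)
    struct tm *tm = localtime(&when);

    if (tm == NULL)
        return 0;
    /* "timezone" is the standard-time offset; assume DST moves it one hour */
    return (tm->tm_isdst > 0) ? timezone - 3600 : timezone;
#else
    (void) when;                /* no zone information available */
    return 0;                   /* default to GMT */
#endif
}
```

Because the offset is derived from the timestamp on each call instead of being cached at transaction start, a later SET TIMEZONE or a daylight-savings transition cannot leave a stale value behind.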
On Wed, Feb 19, 2003 at 10:35:58PM -0500, Tom Lane wrote:
<snip Tom discussion backend internal tracking of timezone>
> Any objections?

Not to your suggestion per se, but looking at the bug report raises a question about pgsql's time zone parsers. It appears there's at least two, since SET TIME ZONE accepts strings like 'US/Eastern', while general timestamp parsing doesn't:

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 CST';
         timestamptz
------------------------------
 2003-02-18 09:36:06.00933-06
(1 row)

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
         timestamptz
------------------------------
 2003-02-18 08:36:06.00933-06
(1 row)

test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 US/Eastern';
ERROR: Bad timestamp external representation '2003/02/18 09:36:06.00933 US/Eastern'

Further testing says it's even worse than that: SET TIME ZONE will silently accept any string at all, and fall back to providing GMT when a timestamptz is requested. This includes the TLA TZ abbreviations that the constant parsing code understands, like CST and EST.

test=# set TIME ZONE 'CST';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
         timestamptz
------------------------------
 2003-02-18 14:36:06.00933+00
(1 row)

test=# set TIME ZONE 'FOOBAR';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
         timestamptz
------------------------------
 2003-02-18 14:36:06.00933+00
(1 row)

Here's an especially fun one: with DATESTYLE set to 'Postgresql,US', whatever string is handed to SET TIME ZONE comes out the other end, if it can't be parsed:

test=# set TIME ZONE 'FOOBAR';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
              timestamptz
---------------------------------------
 Tue Feb 18 14:36:06.00933 2003 FOOBAR
(1 row)

Leading to this erroneous pair:

test=# set TIME ZONE 'US/Central';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
            timestamptz
------------------------------------
 Tue Feb 18 08:36:06.00933 2003 CST
(1 row)

test=# set TIME ZONE 'CST';
SET
test=# select TIMESTAMP WITH TIME ZONE '2003/02/18 09:36:06.00933 EST';
            timestamptz
------------------------------------
 Tue Feb 18 14:36:06.00933 2003 CST
(1 row)

test=#

Tom, since you're in (or near) that code right now, how painful would it be to unify the time zone parsing? What's the correct behavior? Certainly SET TIME ZONE should at least raise a NOTICE about invalid time zone names?

Ross
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> question about pgsql's time zone parsers. It appears there's at least
> two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
> timestamp parsing doesn't:

The TIME ZONE string is fed to libc (via TZ environment variable); the other cases are not.

> SET TIME ZONE will silently accept any string at all, and fall back to
> providing GMT when a timestamptz is requested.

Provide a portable way of getting libc to tell us whether it likes TZ, and I'll be glad to fix this.

Ultimately we should probably get rid of our dependence on the libc time routines altogether ... but I have no intention of opening that can of worms right now. See past discussions in the archives.

regards, tom lane
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
>> Provide a portable way of getting libc to tell us whether it likes TZ,
>> and I'll be glad to fix this.

> Dang that lovely word 'portable'. However, given your proposed change,
> perhaps the hurdle for portable time handling is now lower: it seems we've
> not been exposed to as broad a range of broken systems as in the past.

On this particular point my threshold of 'portable' is actually pretty low, as long as it's fail-soft. Failure to detect bad TZ on some systems would leave them no worse off than before, right?

But I haven't seen *any* published API that directly tells you whether tzset liked TZ or not --- AFAICT it's supposed to just silently substitute GMT. Which would be okay if "GMT" were the only allowed spelling of GMT, but it ain't ...

regards, tom lane
On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > question about pgsql's time zone parsers. It appears there's at least
> > two, since SET TIME ZONE accepts strings like 'US/Eastern', while general
> > timestamp parsing doesn't:
>
> The TIME ZONE string is fed to libc (via TZ environment variable); the
> other cases are not.
>
> > SET TIME ZONE will silently accept any string at all, and fall back to
> > providing GMT when a timestamptz is requested.
>
> Provide a portable way of getting libc to tell us whether it likes TZ,
> and I'll be glad to fix this.

Dang that lovely word 'portable'. However, given your proposed change, perhaps the hurdle for portable time handling is now lower: it seems we've not been exposed to as broad a range of broken systems as in the past. I'll look at it, but no promises.

> Ultimately we should probably get rid of our dependence on the libc
> time routines altogether ... but I have no intention of opening that
> can of worms right now. See past discussions in the archives.

Agreed. I see we're inheriting the actually misleading case from the OS/libc, as well:

wallace$ unset TZ
wallace$ date
Thu Feb 20 15:00:04 CST 2003
wallace$ export TZ=US/Central
wallace$ date
Thu Feb 20 15:00:16 CST 2003
wallace$ export TZ=US/Zanzibar
wallace$ date
Thu Feb 20 21:00:33 US/Zanzibar 2003
wallace$ export TZ=CST
wallace$ date
Thu Feb 20 21:00:42 CST 2003
wallace$

Ross
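[Editor's sketch] The shell session above has a direct C equivalent, which may make it clearer why the backend cannot tell a good zone name from a bad one. The helper name apply_tz is hypothetical. The code assumes a platform where timezone is the XSI integer global (the HAVE_INT_TIMEZONE case); some systems declare timezone differently.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for setenv(), tzset() and the timezone global on glibc */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: hand "zone" to libc through the TZ environment
 * variable and report what libc made of it.  Returns -1 only if setenv()
 * itself fails.  tzset() returns void, so a zone name that libc did not
 * understand cannot be detected from its result.
 */
int
apply_tz(const char *zone, long *west_seconds, char *std_name, size_t len)
{
    if (setenv("TZ", zone, 1) != 0)
        return -1;
    tzset();                                  /* re-read TZ; reports no errors */
    *west_seconds = timezone;                 /* seconds west of GMT, standard time */
    snprintf(std_name, len, "%s", tzname[0]); /* standard-time abbreviation */
    return 0;
}
```

A call such as apply_tz("FOOBAR", &west, name, sizeof name) returns 0 just like a call with a valid zone would; whatever ends up in west and name afterwards is implementation-specific, which is the portability problem discussed in this thread.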
"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> If the time zone came back UNKNOWN, we go ahead and see if tzset() can
> interpret it. Criteria for failure: if the timezone offset came back 0,
> and the reported tzname[0] is the same as the string that we passed in. If
> it does, we fire a NOTICE about an unknown spelling of GMT. Note that we
> would have already caught all _known_ spellings of GMT in the first step,
> so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc.

I'm worried about cases like "Africa/Benin" for places that just happen to be on the prime meridian, but don't call their time GMT or UTC. Looking at a globe, it also seems possible that there are places an hour west of Greenwich, for which this could fail during daylight-savings season.

> An extension to this would be to use the tzset() trick above directly
> in the datetime constant parser, as a fallback after not matching the
> table. In that case, we'd probably want to treat the unknown spelling
> of GMT as an error, though (as it currently does).

I think tzset() is probably much too slow to consider calling on every pass through timestamptz_in ...

regards, tom lane
On Thu, Feb 20, 2003 at 04:19:21PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > On Thu, Feb 20, 2003 at 03:21:09PM -0500, Tom Lane wrote:
> >> Provide a portable way of getting libc to tell us whether it likes TZ,
> >> and I'll be glad to fix this.
>
> > Dang that lovely word 'portable'. However, given your proposed change,
> > perhaps the hurdle for portable time handling is now lower: it seems we've
> > not been exposed to as broad a range of broken systems as in the past.
>
> On this particular point my threshold of 'portable' is actually pretty
> low, as long as it's fail-soft. Failure to detect bad TZ on some
> systems would leave them no worse off than before, right?
>
> But I haven't seen *any* published API that directly tells you whether
> tzset liked TZ or not --- AFAICT it's supposed to just silently
> substitute GMT. Which would be okay if "GMT" were the only allowed
> spelling of GMT, but it ain't ...

I've been digging in the date and time code a bit, and now have a proposal for dealing with SET TIME ZONE 'someunknownstring'.

First, we use the time token table from the time constant parser in utils/adt/datetime.c to see if we've got a recognized time zone abbreviation. If so, we generate a canonical POSIX timezone name for use in setting TZ, call tzset(), and we're done.

If the time zone came back UNKNOWN, we go ahead and see if tzset() can interpret it. Criteria for failure: the timezone offset came back 0, and the reported tzname[0] is the same as the string that we passed in. If both hold, we fire a NOTICE about an unknown spelling of GMT. Note that we would have already caught all _known_ spellings of GMT in the first step, so we won't be spamming the DBA with warnings about 'GMT' and 'UTC', etc.

An extension to this would be to use the tzset() trick above directly in the datetime constant parser, as a fallback after not matching the table. In that case, we'd probably want to treat the unknown spelling of GMT as an error, though (as it currently does).

Thoughts? If this seems acceptable, I can implement it this weekend.

Ross
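[Editor's sketch] The decision order proposed in this message can be written down as a small pure function. All names here (tz_verdict, classify_zone) are hypothetical, and the table lookup and the tzset() call are assumed to have happened already, with their results passed in as arguments.

```c
#include <stdbool.h>
#include <string.h>

/* Possible outcomes of the proposed check (names are illustrative only) */
typedef enum
{
    TZ_FROM_TABLE,      /* recognized abbreviation; a canonical TZ gets built */
    TZ_FROM_LIBC,       /* tzset() appears to have understood the string */
    TZ_SUSPECT_GMT      /* looks like an unknown spelling of GMT: NOTICE */
} tz_verdict;

/*
 * requested        string given to SET TIME ZONE
 * in_abbrev_table  result of the abbreviation table lookup (step one)
 * west_seconds     value of "timezone" after tzset()
 * std_name         value of tzname[0] after tzset()
 */
tz_verdict
classify_zone(const char *requested, bool in_abbrev_table,
              long west_seconds, const char *std_name)
{
    if (in_abbrev_table)
        return TZ_FROM_TABLE;

    /* Failure criteria: zero offset and the name echoed back unchanged */
    if (west_seconds == 0 && strcmp(std_name, requested) == 0)
        return TZ_SUSPECT_GMT;

    return TZ_FROM_LIBC;
}
```

With the values from the earlier examples, classify_zone("FOOBAR", false, 0, "FOOBAR") yields TZ_SUSPECT_GMT, while a zone that libc resolves to a different abbreviation or a nonzero offset yields TZ_FROM_LIBC.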
On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
<snip>
>
> I'm worried about cases like "Africa/Benin" for places that just happen
> to be on the prime meridian, but don't call their time GMT or UTC.
> Looking at a globe, it also seems possible that there are places an hour
> west of Greenwich, for which this could fail during daylight-savings
> season.

Well, that'll either get caught by the existing table (we've got six different spellings of GMT, currently) or by the 'string in != string out' case - the zoneinfo format requires a 3 or more character abbreviation for the time zone. For every case I've looked at in my zoneinfo directory, it's either 3 or 4 uppercase characters, and _never_ matches the filename path string used to set it. I'll do an exhaustive test after dinner.

> > An extension to this would be to use the tzset() trick above directly
> > in the datetime constant parser, as a fallback after not matching the
> > table. In that case, we'd probably want to treat the unknown spelling
> > of GMT as an error, though (as it currently does).
>
> I think tzset() is probably much too slow to consider calling on every
> pass through timestamptz_in ...

It wouldn't happen on every call - only with funky timezone representations. We could NOTICE use of tzset(), as well, to alert the DBA about something fishy, if you'd like.

Ross
On Fri, Feb 21, 2003 at 05:45:53PM -0600, Ross J. Reedstrom wrote:
> On Fri, Feb 21, 2003 at 06:15:31PM -0500, Tom Lane wrote:
> > "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> <snip>
> >
> > I'm worried about cases like "Africa/Benin" for places that just happen
> > to be on the prime meridian, but don't call their time GMT or UTC.
> > Looking at a globe, it also seems possible that there are places an hour
> > west of Greenwich, for which this could fail during daylight-savings
> > season.
>
> Well, that'll either get caught by the existing table (we've got six
> different spellings of GMT, currently) or by the 'string in != string out'
> case - the zoneinfo format requires a 3 or more character abbreviation
> for the time zone. For every case I've looked at in my zoneinfo directory,
> it's either 3 or 4 uppercase characters, and _never_ matches the filename
> path string used to set it. I'll do an exhaustive test after dinner.

O.K., I've run the test: of the 1108 files in my zoneinfo database, only 11 have filenames that match the canonical name returned after setting TZ. Of those 11, 4 are some version of GMT (GMT, UCT, UTC, WET), of which one is in fact missing from our table: UCT. At minimum, I'll add that.

Every other validly formatted TZ variable that returns GMT should be caught by the datetktbl check.

I'll play with it this weekend, see how hard it is to make it work.

Ross
On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote:
>
> Every other validly formatted TZ variable that returns GMT should be
> caught by the datetktbl check.
>
> I'll play with it this weekend, see how hard it is to make it work.

O.K., the weekend's over, and I've created two different versions of this. Both work, pass all the regression tests, and solve the 'CST is just a funny way to say GMT' problem. I was able to make use of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in assign_timezone. However, the order of application of this relative to tzset is unclear.

I had proposed doing the DPT first, then tzset, then a NOTICE if it looked like tzset didn't take. Got that working, but discovered a change of behavior: for some of those who have a timezone in the zoneinfo database that is a three-letter abbreviation, the current code (tzset only) will provide daylight savings time transitions, so that a timestamp in July returns a different timezone than one in February. This is not true for our internal values of set time zone: there, we convert to a numerical offset, which is constant no matter when the timestamp occurs.

This is still a win for those whose timezone abbreviation is _not_ in the zoneinfo DB (such as CST), which currently is silently interpreted as an odd spelling of GMT.

Second solution - try tzset() first, and apply the following heuristic to see if it took:

tzname[0]==$TZ and tzname[1]=="" and timezone=0 and daylight=0

In other words, _all_ the timezone related information remains the default. I tested this against the 1607 zoneinfo files on my system: every one was filtered out, even things that _are_ GMT with no DST (they all had a non-null tzname[1] == tzname[0]).

If this test matches (i.e. tzset didn't recognize the TZ), go ahead and look it up in our big table o' date/time strings. This also works, fixing the bogus GMT spellings, without changing current behavior for any string that is not bogus.

Note that the sysadmin can always tell if tzset or the table was used, by looking at the format of the 'show time zone' result. If tzset was called, this is the string that was passed to 'set time zone'. If the table was used, it will be an hours-west-of-GMT offset.

The problem with this approach is that it does nothing to reduce our dependency on the OS timezone functionality.

Comments? I've attached the second patch for discussion.

Ross
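[Editor's sketch] The second heuristic is a four-part test on the globals that tzset() fills in. The following is a minimal illustration, not the attached patch: the function name is hypothetical, and the libc values are passed in as arguments so that the predicate itself has no platform dependence.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true when tzset() appears to have left every zone-related
 * value at its default, i.e. it did not recognize the requested string.
 *
 * requested     string placed in TZ
 * std_name      tzname[0] after tzset()
 * dst_name      tzname[1] after tzset()
 * west_seconds  "timezone" after tzset()
 * has_dst       "daylight" after tzset()
 */
bool
tzset_left_defaults(const char *requested, const char *std_name,
                    const char *dst_name, long west_seconds, int has_dst)
{
    return strcmp(std_name, requested) == 0   /* name echoed back */
        && dst_name[0] == '\0'                /* no second zone name */
        && west_seconds == 0                  /* zero offset */
        && has_dst == 0;                      /* no DST rules */
}
```

Only when this returns true would the code go on to consult the date/time token table; any string tzset() understood keeps its current behavior, including daylight-savings transitions.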
According to my sent folder, this went out Monday afternoon, but I haven't seen it yet, so I'm resending to the list only, without the attached patch. I'll send the patch over to patches.

Any comment on the behavior, specifically, the heuristic for deciding tzset() failed, and the proposed order of application of tzset() vs. table lookup?

Ross

On Mon, Feb 24, 2003 at 03:34:56PM -0600, Ross J. Reedstrom wrote:
> On Fri, Feb 21, 2003 at 08:39:12PM -0600, Ross J. Reedstrom wrote:
> >
> > Every other validly formatted TZ variable that returns GMT should be
> > caught by the datetktbl check.
> >
> > I'll play with it this weekend, see how hard it is to make it work.
>
> O.K., the weekend's over, and I've created two different versions
> of this. Both work, pass all the regression tests, and solve the
> 'CST is just a funny way to say GMT' problem. I was able to make use
> of DecodePosixTimezone (DPT) from Thomas's datetime parsing code in
> assign_timezone. However, the order of application of this relative to
> tzset is unclear.
>
> I had proposed doing the DPT first, then tzset, then a NOTICE if it
> looked like tzset didn't take. Got that working, but discovered a change of
> behavior: for some of those who have a timezone in the zoneinfo database
> that is a three-letter abbreviation, the current code (tzset only) will
> provide daylight savings time transitions, so that a timestamp in July
> returns a different timezone than one in February. This is not true for
> our internal values of set time zone: there, we convert to a numerical
> offset, which is constant no matter when the timestamp occurs.
>
> This is still a win for those whose timezone abbreviation is _not_ in the
> zoneinfo DB (such as CST), which currently is silently interpreted as
> an odd spelling of GMT.
>
> Second solution - try tzset() first, and apply the following heuristic
> to see if it took:
>
> tzname[0]==$TZ and tzname[1]=="" and timezone=0 and daylight=0
>
> In other words, _all_ the timezone related information remains the
> default. I tested this against the 1607 zoneinfo files on my system:
> every one was filtered out, even things that _are_ GMT with no DST (they
> all had a non-null tzname[1] == tzname[0]).
>
> If this test matches (i.e. tzset didn't recognize the TZ), go ahead and look
> it up in our big table o' date/time strings. This also works, fixing the
> bogus GMT spellings, without changing current behavior for any string
> that is not bogus.
>
> Note that the sysadmin can always tell if tzset or the table was used, by
> looking at the format of the 'show time zone' result. If tzset was called,
> this is the string that was passed to 'set time zone'. If the table was
> used, it will be an hours-west-of-GMT offset.
>
> The problem with this approach is that it does nothing to reduce our
> dependency on the OS timezone functionality.
>
> Comments? I've attached the second patch for discussion.
>
> Ross
Awhile back, "Ross J. Reedstrom" <reedstrm@rice.edu> wrote:
> Second solution - try tzset() first, and apply the following heuristic
> to see if it took:
> tzname[0]==$TZ and tzname[1]=="" and timezone=0 and daylight=0

I finally went back to look at this issue, and soon realized that the above test is in fact glibc-specific. Other implementations of tzset() may act differently. On HPUX, tzset() appears to adopt the system-wide setting (from /etc/zoneinfo/localtime) if TZ contains a garbage string. And yet, that's a better fallback behavior than adopting GMT.

I have applied a patch that just checks for tzname[1] not empty or timezone != 0; either one means that tzset was able to interpret TZ as a non-GMT zone (or that it gave up and reverted to a non-GMT default zone, which we can't distinguish anyway). Failing that, it looks to see if the string is known as a GMT equivalent to DecodeTimezone. If not, it complains.

I did not include Ross' idea of accepting other zone names known to DecodeTimezone as if they were a numeric-offset value. We could still do that if people like it, but it seemed a bigger change in behavior than just trying to check that tzset() worked.

I also added code that detects and rejects a leap-second-aware timezone definition, since we've now seen a couple different reports of systems where such zones seem to be installed by default, leading to breakage of our date/time arithmetic.

regards, tom lane
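[Editor's sketch] The acceptance rule described in this message, plus one plausible way to spot a leap-second-aware zone, could be sketched as follows. Both function names are hypothetical. The message does not say how the applied patch detects leap seconds, so the mktime() test below is an assumption: under a zone that counts leap seconds, the time_t for local midnight on 2000-01-01 is not a multiple of 60.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

/*
 * dst_name            tzname[1] after tzset()
 * west_seconds        "timezone" after tzset()
 * known_gmt_spelling  whether the requested string is a recognized GMT
 *                     equivalent (lookup not shown here)
 */
bool
tzset_succeeded(const char *dst_name, long west_seconds,
                bool known_gmt_spelling)
{
    if (dst_name[0] != '\0')
        return true;            /* libc found a zone with a second name */
    if (west_seconds != 0)
        return true;            /* libc found a non-GMT offset */
    return known_gmt_spelling;  /* GMT is acceptable only if asked for */
}

/* Assumed detection method: see the caveat in the lead-in above */
bool
tz_counts_leap_seconds(void)
{
    struct tm tt;
    time_t    t;

    memset(&tt, 0, sizeof tt);
    tt.tm_year = 100;           /* years since 1900, so 2000 */
    tt.tm_mon = 0;              /* January */
    tt.tm_mday = 1;
    tt.tm_isdst = -1;           /* let libc decide about DST */
    t = mktime(&tt);

    /* A partial-minute remainder can only come from counted leap seconds */
    return t != (time_t) -1 && (t % 60) != 0;
}
```

As the message notes, this rule is fail-soft: on a libc that silently falls back to a non-GMT system default, tzset_succeeded() returns true for a garbage string, which leaves such platforms no worse off than before.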