Обсуждение: Not quite there on timezone names in timestamp input

Поиск
Список
Период
Сортировка

Not quite there on timezone names in timestamp input

От
Tom Lane
Дата:
This doesn't work:

regression=# select '2006-07-13 09:20:00 EST5EDT'::timestamptz;
ERROR:  invalid input syntax for type timestamp with time zone: "2006-07-13 09:20:00 EST5EDT"

It never has worked in the past, of course, but I expected it to work in
HEAD seeing that the zic database includes EST5EDT as one of the defined
zone names (and pg_timezone_names shows it as available).

On investigation, the problem seems to lie with ParseDateTime(), which
is being quite finicky about what it will believe is a timezone name.
In particular it won't accept digits as part of the name.  That can
probably be fixed but I'm wondering about the other more or less special
cases in the timezone name list, such as 
America/Port-au-PrinceGB-EireGMT-0Etc/GMT+8W-SU

In particular it seems tough to accept GMT-0 as a timezone name without
breaking recognition of month-name dates such as 'Nov-09-2006'.  Any
thoughts about appropriate heuristics?  Should we just accept a few
cases where we don't recognize zic timezone names?  (If so, we'd better
hack pg_timezone_names to not show those cases...)

Another problem here is case folding.  The existing code smashes the
string to initcap format up to the '/' if any, but doesn't fold the part
after the '/', leaving us with a weird combination of case sensitivity
and insensitivity, not to mention another reason why EST5EDT won't work.
I'd like it to be entirely case insensitive, but am not sure how to make
the file lookup work that way.  Thoughts?
        regards, tom lane


Re: Not quite there on timezone names in timestamp input

От
Martijn van Oosterhout
Дата:
On Sat, Oct 14, 2006 at 07:42:18PM -0400, Tom Lane wrote:
> On investigation, the problem seems to lie with ParseDateTime(), which
> is being quite finicky about what it will believe is a timezone name.
> In particular it won't accept digits as part of the name.  That can
> probably be fixed but I'm wondering about the other more or less special
> cases in the timezone name list, such as

Last time I wrote some code with the zic database, I let it accept
anything that started with a capital letter, which dropped the posix
and leaps directories, the zone.tab file, etc.

> In particular it seems tough to accept GMT-0 as a timezone name without
> breaking recognition of month-name dates such as 'Nov-09-2006'.  Any
> thoughts about appropriate heuristics?  Should we just accept a few
> cases where we don't recognize zic timezone names?  (If so, we'd better
> hack pg_timezone_names to not show those cases...)

The only useful thing I can think of is the the timezones has to be the
last part of the string. If it contains a slash, it's a timezone. Other
than that...

> Another problem here is case folding.  The existing code smashes the
> string to initcap format up to the '/' if any, but doesn't fold the part
> after the '/', leaving us with a weird combination of case sensitivity
> and insensitivity, not to mention another reason why EST5EDT won't work.
> I'd like it to be entirely case insensitive, but am not sure how to make
> the file lookup work that way.  Thoughts?

You can't make a file lookup that way. If you want that to work, you
should preload a list of timezones and store the list internally.

Another way to deal with start timezone names: we have a table for
looking up "EST" and such, short names like GMT-0 could be added to
that table...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Not quite there on timezone names in timestamp input

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sat, Oct 14, 2006 at 07:42:18PM -0400, Tom Lane wrote:
>> On investigation, the problem seems to lie with ParseDateTime(), which
>> is being quite finicky about what it will believe is a timezone name.

> Last time I wrote some code with the zic database, I let it accept
> anything that started with a capital letter, which dropped the posix
> and leaps directories, the zone.tab file, etc.

The trick is not so much where does the TZ name start, as where does it
end.  Also, we have to distinguish dates in the style Oct-15-2006 from
TZ names ... where I believe the code also allows Oct/15/2006 and
Oct.15.2006.

We could probably distinguish the date case by the rules* exactly two separator characters, both alike;* only letters
beforethe first separator, only digits after
 

>> I'd like it to be entirely case insensitive, but am not sure how to make
>> the file lookup work that way.  Thoughts?

> You can't make a file lookup that way. If you want that to work, you
> should preload a list of timezones and store the list internally.

That seems pretty expensive --- not too bad for Unix, in which we could
let backends inherit the list from the postmaster, but in Windows each
backend would have to compute its own list :-(

We could implement our own case-insensitive file search atop a readdir()
scan, but that might just be transferring the expense from one place to
another.  However, we do cache the results of TZ data loads, and most of
the time a backend won't be referencing very many different TZ
definitions over its lifespan, so it might work OK.
        regards, tom lane