Обсуждение: Not quite there on timezone names in timestamp input
This doesn't work: regression=# select '2006-07-13 09:20:00 EST5EDT'::timestamptz; ERROR: invalid input syntax for type timestamp with time zone: "2006-07-13 09:20:00 EST5EDT" It never has worked in the past, of course, but I expected it to work in HEAD seeing that the zic database includes EST5EDT as one of the defined zone names (and pg_timezone_names shows it as available). On investigation, the problem seems to lie with ParseDateTime(), which is being quite finicky about what it will believe is a timezone name. In particular it won't accept digits as part of the name. That can probably be fixed but I'm wondering about the other more or less special cases in the timezone name list, such as America/Port-au-PrinceGB-EireGMT-0Etc/GMT+8W-SU In particular it seems tough to accept GMT-0 as a timezone name without breaking recognition of month-name dates such as 'Nov-09-2006'. Any thoughts about appropriate heuristics? Should we just accept a few cases where we don't recognize zic timezone names? (If so, we'd better hack pg_timezone_names to not show those cases...) Another problem here is case folding. The existing code smashes the string to initcap format up to the '/' if any, but doesn't fold the part after the '/', leaving us with a weird combination of case sensitivity and insensitivity, not to mention another reason why EST5EDT won't work. I'd like it to be entirely case insensitive, but am not sure how to make the file lookup work that way. Thoughts? regards, tom lane
On Sat, Oct 14, 2006 at 07:42:18PM -0400, Tom Lane wrote: > On investigation, the problem seems to lie with ParseDateTime(), which > is being quite finicky about what it will believe is a timezone name. > In particular it won't accept digits as part of the name. That can > probably be fixed but I'm wondering about the other more or less special > cases in the timezone name list, such as Last time I wrote some code with the zic database, I let it accept anything that started with a capital letter, which dropped the posix and leaps directories, the zone.tab file, etc. > In particular it seems tough to accept GMT-0 as a timezone name without > breaking recognition of month-name dates such as 'Nov-09-2006'. Any > thoughts about appropriate heuristics? Should we just accept a few > cases where we don't recognize zic timezone names? (If so, we'd better > hack pg_timezone_names to not show those cases...) The only useful thing I can think of is the the timezones has to be the last part of the string. If it contains a slash, it's a timezone. Other than that... > Another problem here is case folding. The existing code smashes the > string to initcap format up to the '/' if any, but doesn't fold the part > after the '/', leaving us with a weird combination of case sensitivity > and insensitivity, not to mention another reason why EST5EDT won't work. > I'd like it to be entirely case insensitive, but am not sure how to make > the file lookup work that way. Thoughts? You can't make a file lookup that way. If you want that to work, you should preload a list of timezones and store the list internally. Another way to deal with start timezone names: we have a table for looking up "EST" and such, short names like GMT-0 could be added to that table... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sat, Oct 14, 2006 at 07:42:18PM -0400, Tom Lane wrote: >> On investigation, the problem seems to lie with ParseDateTime(), which >> is being quite finicky about what it will believe is a timezone name. > Last time I wrote some code with the zic database, I let it accept > anything that started with a capital letter, which dropped the posix > and leaps directories, the zone.tab file, etc. The trick is not so much where does the TZ name start, as where does it end. Also, we have to distinguish dates in the style Oct-15-2006 from TZ names ... where I believe the code also allows Oct/15/2006 and Oct.15.2006. We could probably distinguish the date case by the rules* exactly two separator characters, both alike;* only letters beforethe first separator, only digits after >> I'd like it to be entirely case insensitive, but am not sure how to make >> the file lookup work that way. Thoughts? > You can't make a file lookup that way. If you want that to work, you > should preload a list of timezones and store the list internally. That seems pretty expensive --- not too bad for Unix, in which we could let backends inherit the list from the postmaster, but in Windows each backend would have to compute its own list :-( We could implement our own case-insensitive file search atop a readdir() scan, but that might just be transferring the expense from one place to another. However, we do cache the results of TZ data loads, and most of the time a backend won't be referencing very many different TZ definitions over its lifespan, so it might work OK. regards, tom lane