Обсуждение: timezones to own config file
I looked into the timezone specifications and basically extracted a list of existing offsets from the zic database. My proposed format for the timezone files is something like this: HADT -32400 D # Hawaii-Aleutain Daylight Time # (America/Adak) HAST -36000 # Hawaii-Aleutain Standard Time # (America/Adak) That is, the abbreviation, the offset in seconds, optionally a D to mark daylight saving times (goes into tm->is_dst), the name of the timezone and the full zic names that use this timezone. I also made the extracting script find all conflicts and commented them manually as shown here. Most of the conflicts are between America and Asia. # CONFLICT! ADT is not unique # Other timezones: # - ADT: Arabic Daylight Time (Asia) ADT -10800 D # Atlantic Daylight Time # (America/Glace_Bay) # (America/Goose_Bay) # (America/Halifax) # (America/Thule) # (Atlantic/Bermuda) However, even within all "America/..." names, there are conflicts. For example CST is used as US Central Time and as Cuba Central Standard Time. While US Central time is UTC-6h, Cuba Central Standard Time is UTC-5h. Another problem is that lots of the timezone names that are hardcoded into the backend seem to be way outdated or just doubtable, many of them do not show up in the zic database. For example NT (Nome Time) seemed to have existed until 1967, America/Nome is listed in the zic database at AKDT/AKST which is Alaska Daylight/Standard Time. Other examples: JAYT, Jayapura Time: Asia/Jayapura is listed as EIT (East Indonesia Time) in the zic database. JAVT, Java Time (07:00? see JT): zic database says that it is outdated and was used until 1932. JT, Java Time (07:30? see JAVT): I did not find a proof that this is really +7.5 hours, some sources say it's just 7 hours. HMT is the strangest of the bunch, I have found the name "Heard and Mc.Donald Time" but with a different offset. I could not find a reference to some "Hellas"-Time as indicated in the comment. So could we remove some of those on the grounds that they do not seem to be used any more (please correct me here if someone knows more) and that you can easily add offsets for those if you need them? With the same argument we could even remove timezones like BDST (British Double Summer Time), DNT (Dansk Normal Tid), FST (French Summer Time), NOR (Norway Standard Time), SWT (Swedish Winter Time). Could anybody from those countries comment on whether or not those are still used or just outdated? I figure that most of those countries have moved since long to the more common timezone names... Ok, after all this has been sorted out I propose to make different files for the different continents and let the user specify with a guc which ones he wants to use. I could think of three possible ways: 1) (See Toms idea in http://archives.postgresql.org/pgsql-hackers/2006-05/msg01048.php ) Conflicts within one set can just be commented - we would try to include whatever will probably be used by the majority of users and comment the other one(s). Conflicts between two sets would show up when postmaster gets started, it would complain about different definitions for the same timezone. An American who wants to use some Asian timezones would have to work through both files and comment conflicting timezones on one side or the other to make postmaster start up without errors. 2) Find out which timezones do not conflict, put them in a set and load this by default. Create other sets that are conflicting but that have some "override" capability with regard to previous timezone definitions. Decide on the default value for the guc (could point to American timezones for example). An Australian could either select only the Australian file or could specify "America, Australia" and the Australian set overrides the American timezones in case of conflicts. This way, most people do not have to make changes and those who have to can specify their "override"-file and keep all the rest, including non-conflicting timezones from a conflicting timezone set. 3) Combine both, let the user specify the guc variable as "A, B, C" and look into C first, then in B and then in A.... *thinking* Right now I actually think that the "overriding" idea is not that intuitive, most people would probably expect that this is a list of priorities, so A overrides B which overrides C. What do you think? Having a larger token table in datetime.c does not seem to affect performance all that much. I did parsing tests with 2 million timestamps equally distributed over all timezone abbreviations that I had loaded previously and the difference of 154 timezones in comparsion to other runs with just 35 was at about ~120ms (on my quite slow laptop computer). The timezone definition files should be read at server start but should they also be read at SIGHUP? If so, should they be read only by the postmaster or by all backends? Joachim
On Tue, Jun 13, 2006 at 02:20:09PM +0200, Joachim Wieland wrote: > I looked into the timezone specifications and basically extracted a list of > existing offsets from the zic database. > > My proposed format for the timezone files is something like this: <sip> Any particular reason this can't be a normal table in pg_catalog which you can select/update. > Another problem is that lots of the timezone names that are hardcoded into > the backend seem to be way outdated or just doubtable, many of them do not > show up in the zic database. <snip lots of dodgy timezones> I've been trying to convince people for a while now that the appropriate tz string for australia is AEST/ACST/AWST but no-one seems convinced yet. Hence, I never actually specify timezones and all my timestamps are inserted as GMT. IMHO, you should simply setup the table so that it is backward compatable and let people edit it themselves. You're never going to be able to convince anyone that people arn't relying on it exactly the way it is now. The most important thing is to get rid of the "australian_timezones" hack, everything else is bonus. > The timezone definition files should be read at server start but should they > also be read at SIGHUP? If so, should they be read only by the postmaster or > by all backends? Good question... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Joachim Wieland <joe@mcknight.de> writes: > The timezone definition files should be read at server start but should they > also be read at SIGHUP? If so, should they be read only by the postmaster or > by all backends? Presumably the name of the definition file to use will be a GUC variable. I would expect the code to re-read the file any time the variable's value is changed. In the case of a change via postgresql.conf this would automatically happen in all backends as well as the postmaster. You'll need to make it follow the semantics already in use for errors in postgresql.conf, viz: * error detected during postmaster startup -> report error and quit * error detectedduring postmaster reload -> log message, ignore new setting * error detected during backend reload -> debug message,ignore new setting As far as the appropriate contents of the files go, I'd suggest *not* trying to account for every abbreviation mentioned in the zic database; lots of them are surely uninteresting, and anyone who does want Nome Time will now be able to add it for himself. The more abbreviations you try to understand, the less chance you have of detecting plain old errors. IIRC, the conflicts we've actually heard about in practice are IST (Israel vs India) and Aussie vs. USA zone names. So it might work to have two base definition files, one for Europe/Americas (with USA and Israel names) and one for Far East (with Aussie and Indian names). I am not sure where Tom Lockhart got the list of timezone names that's currently hardwired in datetime.c, but for sure you needn't treat it as being graven on stone tablets. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > Any particular reason this can't be a normal table in pg_catalog which > you can select/update. That doesn't do anything to help with one of the main problems: that we have at least two (maybe more) alternative sets of names that people might want as default. Getting rid of "australian_timezones" is fine, but we can't do it by saying "all you aussies have to hack the standard list according to your own ideas". I don't expect that very many people will actually need to make custom timezone name lists --- if we find they do, we'll need to work harder on the default lists. So the design center should be "select one of a few predefined lists", not "hack away on system catalog until you like it". Especially not if they have to do it in template0, template1, postgres, etc. Basically, a GUC variable is just about the right paradigm for this, a system catalog isn't. I'd also be a bit worried about performance issues, eg, whether VACUUM FULL on such a table would bring datetime operations to a halt. regards, tom lane
On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Any particular reason this can't be a normal table in pg_catalog which > > you can select/update. > > That doesn't do anything to help with one of the main problems: that > we have at least two (maybe more) alternative sets of names that people > might want as default. <snip> I think my actual point was something else. We currently get calls from people trying to administer machines that it's annoying that various configuration information is stored in files, beyond the easy reach of SQL. What I was thinking is why we couldn't just store the information in a global shared system table that is only read on config reload. You could have a few columns, maybe the first being a list name, which is referenced from a GUC. If you issue a config reload during a VACUUM FULL, I guess that might be an issue, yes. I was just thinking people might appreciate being able to configure the timezones without opening a shell... Similarly, it would also give a way for user-interfaces to get a list of available valid timezones and their actual meanings, which is currently impossible. Just a thought really... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Tue, Jun 13, 2006 at 11:11:26PM +0200, Martijn van Oosterhout wrote: > On Tue, Jun 13, 2006 at 11:51:25AM -0400, Tom Lane wrote: > > Martijn van Oosterhout <kleptog@svana.org> writes: > > > Any particular reason this can't be a normal table in pg_catalog which > > > you can select/update. > > > > That doesn't do anything to help with one of the main problems: that > > we have at least two (maybe more) alternative sets of names that people > > might want as default. > > <snip> > > I think my actual point was something else. We currently get calls from > people trying to administer machines that it's annoying that various > configuration information is stored in files, beyond the easy reach of > SQL. > > What I was thinking is why we couldn't just store the information in a > global shared system table that is only read on config reload. You > could have a few columns, maybe the first being a list name, which is > referenced from a GUC. > > If you issue a config reload during a VACUUM FULL, I guess that might > be an issue, yes. I was just thinking people might appreciate being > able to configure the timezones without opening a shell... Similarly, > it would also give a way for user-interfaces to get a list of available > valid timezones and their actual meanings, which is currently > impossible. ISTM that's an issue that affects all configuration stuff, not just the timezones; if we're going to come up with a way to manage settings without touching a file, it should work for everything. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Martijn van Oosterhout <kleptog@svana.org> writes: > What I was thinking is why we couldn't just store the information in a > global shared system table that is only read on config reload. You > could have a few columns, maybe the first being a list name, which is > referenced from a GUC. Hmmm ... if we keep the notion of a GUC that identifies a set of compatible timezone names, then a table with a primary key of (tz_set_name, tz_name) doesn't seem quite so awful. The main remaining objection I can see is that the postmaster couldn't use it, only backends. Now this doesn't matter much as far as timestamp operations go because I don't think the postmaster does any operations that need TZ data --- but what of verifying that the GUC variable has a valid value in postgresql.conf at startup? If you're willing to abandon sanity checking on that string, it might work. One interesting thought about a system table is that it could be referenced through a syscache, which'd have the nice property that only the (probably few) values actually referenced in a given session need to get loaded. regards, tom lane
On Tue, Jun 13, 2006 at 11:11:26PM +0200, Martijn van Oosterhout wrote: > I think my actual point was something else. We currently get calls from > people trying to administer machines that it's annoying that various > configuration information is stored in files, beyond the easy reach of > SQL. While that would be a nice feature I don't see yet how this should work in detail. > What I was thinking is why we couldn't just store the information in a > global shared system table that is only read on config reload. You > could have a few columns, maybe the first being a list name, which is > referenced from a GUC. > [...] Similarly, it would also give a way for user-interfaces to get a > list of available valid timezones and their actual meanings, which is > currently impossible. I think you'll get lots of problems when you only read the table on config reload. First you get inconsistencies. I update the table with new timezone data and have to SIGHUP postmaster to read in the new table. In the meantime users get a wrong list of available timezones. Next, how does all that work with transactions and visibility? What if I update and send SIGHUP, what gets applied while I have not yet committed? And when the transaction rolls back, which version is active? Third, what about dumping and restoring? Dumping would have to compare the built-in set with the active settings and generate SQL commands from it, right? Restoring would execute them and - to make that active - reload the configuration which could have the side effects of activating other settings. > Just a thought really... Here as well :-) The config-by-sql approach seems to need some more specifications. If nobody objects, I'll submit the patch in the form that it reads from a file, as Tom pointed out this is a setting that gets only changed by few people and moreover it's not part of regular administrator's work (like changes to pg_hba.conf for example). Right now you have to get pgsql's source and recompile, so having to edit a file and sending a SIGHUP seems to be quite an improvement already. Getting a list of active timezones could be easily done with a system view but you might not want to promise its existence when you think about moving the whole thing to a system catalog later... Joachim
Joachim Wieland <joe@mcknight.de> writes: > Getting a list of active timezones could be easily done > with a system view but you might not want to promise its existence when you > think about moving the whole thing to a system catalog later... A read-only view wouldn't be a bad idea, actually, for both the long-form TZ names and the abbreviations. It'd be easy to cons one up the same way as our other views based on functions. Doesn't even need to be in core, could be contrib, if there's doubts about its usefulness. regards, tom lane