Обсуждение: Proposal for updating src/timezone

Поиск
Список
Период
Сортировка

Proposal for updating src/timezone

От
John Cochran
Дата:
Given my recent examination of the src/timezone subtree and the current code at IANA for timezone information and tools, I proposing the following changes and would like to first get group consensus on the change prior to investing any major effort. 

My proposal is the have the following directory structure

.../src/timezone - Would have only PostgreSQL specific code
.../src/timezone/tznames - would be unaltered from current (optionally this could be removed)
.../src/timezone/iana - would contain unaltered code and data from IANA

The .../src/timezone/data directory would be removed.

In doing this, any future updates to the IANA code should be trivial to incorporate into the postgres tree.

Reasons for change.
1. The current timezone code is based upon, according to the comments, version 2010c from IANA. IANA's latest code is version 2014e and there are substantial changes between those versions (as well as 35 intermediate versions ... Yes, IANA changes and updates the code regularly and frequently)

2. The current timezone data has a few entries that are not properly handled by any pre-2013 code so an update of some sort is needed anyway.

3. The latest code from IANA compiles with GCC using a kitchen sink array of warning options without any generated warnings. This should alleviate any  concerns about not accepting any code unless it's warning free. In case anyone is interested, the GCC options are: -Dlint -g3 -O3 -fno-common -fstrict-aliasing  -Wall -Wextra -Wbad-function-cast -Wcast-align -Wcast-qual -Wformat=2 -Winit-self -Wmissing-declarations -Wmissing-noreturn -Wmissing-prototypes -Wnested-externs -Wno-address -Wno-cast-qual -Wno-format-nonliteral -Wno-sign-compare -Wno-sign-conversion -Wno-type-limits -Wno-unused-parameter -Woverlength-strings -Wpointer-arith -Wshadow -Wstrict-prototypes -Wsuggest-attribute=const -Wsuggest-attribute=noreturn -Wsuggest-attribute=pure -Wtrampolines -Wwrite-strings

Aspects of change that I'm not happy with.
1. I would have liked to recommend 2 sub-directories underneath .../src/timezone/iana named "code" and "data", but unfortunately have to suggest untarring both the code and data files directly into the iana directory. The reason for this is that the IANA code does not compile using the IANA Makefile unless the script yearistype.sh is present and that script is currently present in the data tar file, not the code tar file. And that unfortunately means that splitting the IANA code and data into different directories would violate the objective of being able to deposit the files unaltered into postgres. 

2. Depositing the IANA code unaltered would also deposit some "junk" files into the directory. The files would mainly consist of man pages and html files containing documentation for the timezone code. The extra files would consume approximately 500 kilobytes above what's actually needed, but otherwise wouldn't have any adverse effects.

Thank you for reading this proposal,
John Cochran
-- 

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare

Re: Proposal for updating src/timezone

От
Tom Lane
Дата:
John Cochran <j69cochran@gmail.com> writes:
> My proposal is the have the following directory structure

> .../src/timezone - Would have only PostgreSQL specific code
> .../src/timezone/tznames - would be unaltered from current (optionally this
> could be removed)
> .../src/timezone/iana - would contain unaltered code and data from IANA

> The .../src/timezone/data directory would be removed.

I am not in favor of doing anything to the data/ directory.  If we can
have unaltered tzcode source files in their own subdirectory, that'd be
great, but I see no advantage to smashing the IANA code and data files
together.  Furthermore, doing so would break the existing ability to just
cherry-pick tzdata update commits into back branches.

Part of my thought here is that we do generally just dump tzdata updates
into our tree without much thought, but I don't think that will ever be
the case for tzcode updates; we'll want to look at those with more care,
and probably we'll not back-patch them.

> 1. I would have liked to recommend 2 sub-directories underneath
> .../src/timezone/iana named "code" and "data", but unfortunately have to
> suggest untarring both the code and data files directly into the iana
> directory. The reason for this is that the IANA code does not compile using
> the IANA Makefile unless the script yearistype.sh is present and that
> script is currently present in the data tar file, not the code tar file.

I have exactly zero expectation of using their Makefile, so this is not a
concern.

> 2. Depositing the IANA code unaltered would also deposit some "junk" files
> into the directory. The files would mainly consist of man pages and html
> files containing documentation for the timezone code. The extra files would
> consume approximately 500 kilobytes above what's actually needed, but
> otherwise wouldn't have any adverse effects.

We're not doing that either, IMO.  Why should we invest 500K of tarball
space on that?
        regards, tom lane



Re: Proposal for updating src/timezone

От
John Cochran
Дата:
On Fri, Jul 18, 2014 at 1:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
John Cochran <j69cochran@gmail.com> writes:
> My proposal is the have the following directory structure
 
...
> 1. I would have liked to recommend 2 sub-directories underneath
... 
 
I have exactly zero expectation of using their Makefile, so this is not a concern.

> 2. Depositing the IANA code unaltered would also deposit some "junk" files
... 
We're not doing that either, IMO.  Why should we invest 500K of tarball
space on that?
 

Understandable, and both issues are in the category of "Things I didn't like"

Did a diff between the 2010c version of localtime.c and the postgres version and saw a lot more differences than what could have been expected from simple reformatting and adaptation. So I installed gitk and took a look at the change history of localtime.c....

Well, looking at the results, it pretty much put paid to the concept of ever importing the IANA code unaltered into the postgres tree. In fact, it looks like the original version of localtime.c was based upon one of the 2004a thru 2004c versions of the IANA code and since then has been independently maintained. 

In any case, I think my best path forward is to look at the change history of the source files under src/timezone and determine the actual origin of each file based upon the git history and the IANA archive. Then see what's needed to bring the code up to date based upon the IANA code. Due to the configure option "--with-system-tzdata", I assume that postgres needs to remain compatible with the output from the IANA zic program. That combined with the warning messages about some of the entries in the current timezone data not being compatible with pre-2013 code worries me a bit since the 2010c code from IANA does not support the output from the 2014e zic. 
--

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare

Re: Proposal for updating src/timezone

От
Tom Lane
Дата:
John Cochran <j69cochran@gmail.com> writes:
> Did a diff between the 2010c version of localtime.c and the postgres
> version and saw a lot more differences than what could have been expected
> from simple reformatting and adaptation. So I installed gitk and took a
> look at the change history of localtime.c....

> Well, looking at the results, it pretty much put paid to the concept of
> ever importing the IANA code unaltered into the postgres tree. In fact, it
> looks like the original version of localtime.c was based upon one of the
> 2004a thru 2004c versions of the IANA code and since then has been
> independently maintained.

That's certainly true, but I'm not sure that we should give up all that
easily on getting closer to the IANA code again.  For one thing, some of
the thrashing had to do with the fact that we went over to 64-bit
timestamps sooner than the IANA crew did.  I'm assuming that they insist
on 64-bit arithmetic now; we do too, so for instance some of the typedef
hacking might no longer be necessary.

AFAICT from a quick scan of the git logs, the main thing we've done to the
API that's not in IANA is invent pg_next_dst_boundary().  Perhaps that
could be pushed into a separate source file.

Even if we only got to a point where the sources were the same except for
a few narrow patches, it would make tracking their updates much easier.
        regards, tom lane



Re: Proposal for updating src/timezone

От
John Cochran
Дата:
Agreed. Right now, I'm seeing about updating zic.c to match the IANA code combined with the modifications that postgres did to it. So far, it doesn't look like many functional changes have been done, but due to the use of pgindent, there's a LOT of cosmetic changes that add one heck of a lot of noise in determining the actual differences. After I get the code closely matching the IANA code, I'll submit a patch (unfortunately it will pretty much be an entire replacement of zic.c due to the massive changes IANA did between 2010c and 2014e). I will request very strongly that pgindent not be used again on the IANA code so future maintainers can easily perform a diff between the IANA code and the postgres code to determine the actual differences. I'll then see about doing the same with the other source files in timezone.




On Fri, Jul 18, 2014 at 4:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
John Cochran <j69cochran@gmail.com> writes:
> Did a diff between the 2010c version of localtime.c and the postgres
> version and saw a lot more differences than what could have been expected
> from simple reformatting and adaptation. So I installed gitk and took a
> look at the change history of localtime.c....

> Well, looking at the results, it pretty much put paid to the concept of
> ever importing the IANA code unaltered into the postgres tree. In fact, it
> looks like the original version of localtime.c was based upon one of the
> 2004a thru 2004c versions of the IANA code and since then has been
> independently maintained.

That's certainly true, but I'm not sure that we should give up all that
easily on getting closer to the IANA code again.  For one thing, some of
the thrashing had to do with the fact that we went over to 64-bit
timestamps sooner than the IANA crew did.  I'm assuming that they insist
on 64-bit arithmetic now; we do too, so for instance some of the typedef
hacking might no longer be necessary.

AFAICT from a quick scan of the git logs, the main thing we've done to the
API that's not in IANA is invent pg_next_dst_boundary().  Perhaps that
could be pushed into a separate source file.

Even if we only got to a point where the sources were the same except for
a few narrow patches, it would make tracking their updates much easier.

                        regards, tom lane



--

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare

Re: Proposal for updating src/timezone

От
Michael Banck
Дата:
Hi,

On Sat, Jul 19, 2014 at 09:28:25AM -0400, John Cochran wrote:
> Agreed. Right now, I'm seeing about updating zic.c to match the IANA code
> combined with the modifications that postgres did to it. So far, it doesn't
> look like many functional changes have been done, but due to the use of
> pgindent, there's a LOT of cosmetic changes that add one heck of a lot of
> noise in determining the actual differences. 

Maybe if you pgindent the IANA code as well, you can more easily diff
the actual changes between the two, did you try that?


Michael



Re: Proposal for updating src/timezone

От
John Cochran
Дата:



On Sat, Jul 19, 2014 at 11:58 AM, Michael Banck <mbanck@gmx.net> wrote:
SNIP
Maybe if you pgindent the IANA code as well, you can more easily diff
the actual changes between the two, did you try that?


Michael

Unfortunately, pgindent doesn't work well with the IANA code as evident by some previous checkins.

To quote a checkin done 2004-05-21 16:59:10 by Tom Lane
pgindent did a pretty awful job on the timezone code, particularly with respect to doubly-starred comment blocks.  Do some manual cleanup.

As it is, I've finished checking the differences between the postgres and IANA code for zic.c after editing both to eliminate non-functional style differences such as indentation, function prototypes, comparing strchr results against NULL or 0, etc. It looks like the only differences from the stock code is support for the -P option implemented by Tom Lane and the substitution of the postgres code for getopt instead of the unix default as well as a few minor changes in some defaults and minor modifications to deal with Windows. Overall, rather small and trivial.

Additionally, I discovered a little piece of low hanging fruit. The Makefile for the timezone code links together zic.o ialloc.o scheck.o and localtime.o in order to make zic.  Additionally, zic.c has a stub implementation of pg_open_tzfile() in order to resolve a linkage issue with the localtime.o module for zic.
Well, as it turns out, localtime.o doesn't supply ANY symbols that zic needs and therefore can be omitted entirely from the list of object files comprising zic. Which in turn means that the stub implementation of pg_open_tzfile can be removed from the postgres version of zic.c.  I'm not bothering to submit a patch involving this since that patch will be quite short lived given my objective to bring the code up to date with the 2014e version of the IANA code. But I am submitting a bug report to IANA on the Makefile since the unneeded linkage with localtime.o is still in the 2014e code on their site.

--

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare

Re: Proposal for updating src/timezone

От
Gavin Flower
Дата:
<div class="moz-cite-prefix">On 20/07/14 06:30, John Cochran wrote:<br /></div><blockquote
cite="mid:CAGQU3n49T4W2hKXGnzS-vJNb1xuHx3xNzBcjk=kpmwQdyMDTzA@mail.gmail.com"type="cite"><div dir="ltr"><br /><div
class="gmail_extra"><br/><br /><div class="gmail_quote">On Sat, Jul 19, 2014 at 11:58 AM, Michael Banck <span
dir="ltr"><<ahref="mailto:mbanck@gmx.net" moz-do-not-send="true" target="_blank">mbanck@gmx.net</a>></span>
wrote:</div><divclass="gmail_quote">SNIP <blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"> Maybe if you pgindent the IANA code as well, you can more easily
diff<br/> the actual changes between the two, did you try that?<br /><span class="HOEnZb"><font color="#888888"><br
/><br/> Michael<br /></font></span></blockquote></div><br /> Unfortunately, pgindent doesn't work well with the IANA
codeas evident by some previous checkins.</div><div class="gmail_extra"><br /></div><div class="gmail_extra">To quote a
checkindone 2004-05-21 16:59:10 by Tom Lane</div><div class="gmail_extra">pgindent did a pretty awful job on the
timezonecode, particularly with respect to doubly-starred comment blocks.  Do some manual cleanup.<br /></div><br
/></div></blockquote>[...]<br /><br /> For simply noting functional differences: could not make copies of the sources
filesfor both the pg & IANA code, and remove ALL indentation prior to to a a diff?<br /><br /><br /> Cheers,<br />
Gavin<br/> 

Re: Proposal for updating src/timezone

От
Tom Lane
Дата:
John Cochran <j69cochran@gmail.com> writes:
> As it is, I've finished checking the differences between the postgres and
> IANA code for zic.c after editing both to eliminate non-functional style
> differences such as indentation, function prototypes, comparing strchr
> results against NULL or 0, etc. It looks like the only differences from the
> stock code is support for the -P option implemented by Tom Lane and the
> substitution of the postgres code for getopt instead of the unix default as
> well as a few minor changes in some defaults and minor modifications to
> deal with Windows. Overall, rather small and trivial.

> Additionally, I discovered a little piece of low hanging fruit. The
> Makefile for the timezone code links together zic.o ialloc.o scheck.o and
> localtime.o in order to make zic.  Additionally, zic.c has a stub
> implementation of pg_open_tzfile() in order to resolve a linkage issue with
> the localtime.o module for zic.
> Well, as it turns out, localtime.o doesn't supply ANY symbols that zic
> needs and therefore can be omitted entirely from the list of object files
> comprising zic. Which in turn means that the stub implementation of
> pg_open_tzfile can be removed from the postgres version of zic.c.  I'm not
> bothering to submit a patch involving this since that patch will be quite
> short lived given my objective to bring the code up to date with the 2014e
> version of the IANA code. But I am submitting a bug report to IANA on the
> Makefile since the unneeded linkage with localtime.o is still in the 2014e
> code on their site.

John, have you made any further progress on this since July?

The urgency of updating our timezone code has risen quite a bit for me,
because while testing an update of the data files to tzdata2014h I became
aware that the "-P" option is failing to print a noticeable number of
zone abbreviations that clearly exist in the data files.  Since the -P
hack itself is so simple, it's hard to come to any other conclusion than
that the rest of the timezone code is buggy --- presumably because it's
four years behind what IANA is targeting with the data files.

I spent a little bit of time looking at just applying the diffs between
tzcode2010c and tzcode2014h to our code, but the diffs are large enough
that that seems both tedious and quite error-prone.  I think we'd be
better advised to push forward with the idea of trying to absorb
more-or-less-unmodified versions of the timezone code files.  (I'm
particularly disillusioned with the idea that we'll keep on pgindent'ing
that code --- the effort-to-reward ratio even to keep the comments
readable just ain't very good.)
        regards, tom lane



Re: Proposal for updating src/timezone

От
Tom Lane
Дата:
I wrote:
> The urgency of updating our timezone code has risen quite a bit for me,
> because while testing an update of the data files to tzdata2014h I became
> aware that the "-P" option is failing to print a noticeable number of
> zone abbreviations that clearly exist in the data files.  Since the -P
> hack itself is so simple, it's hard to come to any other conclusion than
> that the rest of the timezone code is buggy --- presumably because it's
> four years behind what IANA is targeting with the data files.

Ah, scratch that.  On further investigation, it turns out that my code for
"-P" is in fact wrong; it misses out printing zone abbreviations in cases
where an area reverted to an older meaning of a zone abbrev after having
used another meaning for awhile.  Which, in fact, is what the Russians
have just done, and it seems it's happened rather often elsewhere as well.

We do still need to work on syncing our code with IANA's updates one way
or the other, because eventually they are going to start shipping zone
definitions that break older copies of the library; this has been
mentioned more than once on the IANA announcement list.  But it doesn't
seem like that has happened quite yet.

Meanwhile, it looks like there's a fair amount of missed updates in our
tznames/ files as a result of this error.  Off I go to look at that.

(The underlying motivation here is that I'd like to get the tzdata2014h
updates installed before we wrap 9.4beta3.  Since it looks like those
changes are going to be rather invasive, it seems like getting some beta
testing on them would be a good thing before we unleash them on
back-branch releases.)
        regards, tom lane