Обсуждение: [HACKERS] Cutting initdb's runtime (Perl question embedded)

Поиск
Список
Период
Сортировка

[HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres mentioned, and I've confirmed locally, that a large chunk of
initdb's runtime goes into regprocin's brute-force lookups of function
OIDs from function names.  The recent discussion about cutting TAP test
time prompted me to look into that question again.  We had had some
grand plans for getting genbki.pl to perform the name-to-OID conversion
as part of a big rewrite, but since that project is showing few signs
of life, I'm thinking that a more localized performance fix would be
a good thing to look into.  There seem to be a couple of plausible
routes to a fix:

1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki.  The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields.  The places that do need to do that approximate it like this:
# To construct fmgroids.h and fmgrtab.c, we need to inspect some# of the individual data fields.  Just splitting on
whitespace#won't work, because some quoted fields might contain internal# whitespace.  We handle this by folding them
allto a simple# "xxx". Fortunately, this script doesn't need to look at any# fields that might need quoting, so this
simplehack is# sufficient.$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;@{$row}{@attnames} = split /\s+/,
$row->{bki_values};

We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly.  I'm one of the world's worst
Perl programmers, but surely there's a way?

2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file.  That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.

Thoughts?
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andreas Karlsson
Дата:
On 04/12/2017 04:12 PM, Tom Lane wrote:
> 1. The best thing would still be to make genbki.pl do the conversion,
> and write numeric OIDs into postgres.bki.  The core stumbling block
> here seems to be that for most catalogs, Catalog.pm and genbki.pl
> never really break down a DATA line into fields --- and we certainly
> have got to do that, if we're going to replace the values of regproc
> fields.  The places that do need to do that approximate it like this:
>
>     # To construct fmgroids.h and fmgrtab.c, we need to inspect some
>     # of the individual data fields.  Just splitting on whitespace
>     # won't work, because some quoted fields might contain internal
>     # whitespace.  We handle this by folding them all to a simple
>     # "xxx". Fortunately, this script doesn't need to look at any
>     # fields that might need quoting, so this simple hack is
>     # sufficient.
>     $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
>     @{$row}{@attnames} = split /\s+/, $row->{bki_values};
>
> We would need a bullet-proof, non-hack, preferably not too slow way to
> split DATA lines into fields properly.  I'm one of the world's worst
> Perl programmers, but surely there's a way?
>
> 2. Alternatively, we could teach bootstrap mode to build a hashtable
> mapping proname to OID while it reads pg_proc data from postgres.bki,
> and then make regprocin's bootstrap path consult the hashtable instead
> of looking directly at the pg_proc file.  That I'm quite sure is do-able,
> but it seems like it's leaving money on the table compared to doing
> the transformation earlier.
>
> Thoughts?

Looked at this an option 1 seems simple enough if I am not missing 
something. I might hack something up later tonight. Either way I think 
this improvement can be done separately from the proposed replacement of 
the catalog header files. Trying to fix everything at once often leads 
to nothing being fixed at all.

Andreas



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
On 2017-04-12 10:12:47 -0400, Tom Lane wrote:
> Andres mentioned, and I've confirmed locally, that a large chunk of
> initdb's runtime goes into regprocin's brute-force lookups of function
> OIDs from function names.  The recent discussion about cutting TAP test
> time prompted me to look into that question again.  We had had some
> grand plans for getting genbki.pl to perform the name-to-OID conversion
> as part of a big rewrite, but since that project is showing few signs
> of life, I'm thinking that a more localized performance fix would be
> a good thing to look into.  There seem to be a couple of plausible
> routes to a fix:
> 
> 1. The best thing would still be to make genbki.pl do the conversion,
> and write numeric OIDs into postgres.bki.  The core stumbling block
> here seems to be that for most catalogs, Catalog.pm and genbki.pl
> never really break down a DATA line into fields --- and we certainly
> have got to do that, if we're going to replace the values of regproc
> fields.  The places that do need to do that approximate it like this:
> 
>     # To construct fmgroids.h and fmgrtab.c, we need to inspect some
>     # of the individual data fields.  Just splitting on whitespace
>     # won't work, because some quoted fields might contain internal
>     # whitespace.  We handle this by folding them all to a simple
>     # "xxx". Fortunately, this script doesn't need to look at any
>     # fields that might need quoting, so this simple hack is
>     # sufficient.
>     $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
>     @{$row}{@attnames} = split /\s+/, $row->{bki_values};
> 
> We would need a bullet-proof, non-hack, preferably not too slow way to
> split DATA lines into fields properly.  I'm one of the world's worst
> Perl programmers, but surely there's a way?

I've done something like 1) before:
http://archives.postgresql.org/message-id/20150221230839.GE2037%40awork2.anarazel.de

I don't think the speeds matters all that much, because we'll only do it
when generating the .bki file - a couple ms more or less won't matter
much.

I IIRC spent some more time to also load the data files from a different
format:
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/sane-catalog
although that's presumably heavily outdated now.

- Andres



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andreas Karlsson
Дата:
On 04/12/2017 05:00 PM, Andreas Karlsson wrote:
> Looked at this an option 1 seems simple enough if I am not missing
> something. I might hack something up later tonight. Either way I think
> this improvement can be done separately from the proposed replacement of
> the catalog header files. Trying to fix everything at once often leads
> to nothing being fixed at all.

Here is my proof of concept patch. It does basically the same thing as 
Andres's patch except that it handles quoted values a bit better and 
does not try to support anything other than the regproc type.

The patch speeds up initdb without fsync from 0.80 seconds to 0.55 
seconds, which is a nice speedup, while adding a negligible amount of 
extra work on compilation.

Two things which could be improved in this patch if people deem it 
important:

- Refactor the code to be more generic, like Andres patch, so it is 
easier to add lookups for other data types.

- Write something closer to a real .bki parser to actually understand 
quoted values and _null_ when parsing the data. Right now this patch 
only splits each row into its fields (while being aware of quotes) but 
does not attempt to remove quotes. The PoC patch treats "foo" and foo as 
different.

Andreas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andreas Karlsson <andreas@proxel.se> writes:
> Here is my proof of concept patch. It does basically the same thing as 
> Andres's patch except that it handles quoted values a bit better and 
> does not try to support anything other than the regproc type.

> The patch speeds up initdb without fsync from 0.80 seconds to 0.55 
> seconds, which is a nice speedup, while adding a negligible amount of 
> extra work on compilation.

I've pushed this with some mostly-cosmetic adjustments:

* created a single subroutine that understands how to split DATA lines,
rather than having several copies of the regex

* rearranged the code so that the data structure returned by
Catalog::Catalogs() isn't scribbled on (which was already
happening before your patch, but it seemed pretty ugly to me)

* stripped out the bootstrap-time name lookup code from all of reg*
not just regproc.

There's certainly lots more that could be done in the genbki code,
but I think all we can justify at this stage of the development
cycle is to get the low-hanging fruit for testing speedups.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
On 2017-04-13 12:13:30 -0400, Tom Lane wrote:
> Andreas Karlsson <andreas@proxel.se> writes:
> > Here is my proof of concept patch. It does basically the same thing as 
> > Andres's patch except that it handles quoted values a bit better and 
> > does not try to support anything other than the regproc type.
> 
> > The patch speeds up initdb without fsync from 0.80 seconds to 0.55 
> > seconds, which is a nice speedup, while adding a negligible amount of 
> > extra work on compilation.
> 
> I've pushed this with some mostly-cosmetic adjustments:

Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough.  They're now the biggest user of time.

- Andres



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
> cross-checks - the resowner replacement has been in place for a while,
> and seems robust enough.  They're now the biggest user of time.

Hm, biggest user of time in what workload?  I've not noticed that
function particularly.

I agree that it doesn't seem like we need to spend a lot of time
cross-checking there, though.  Maybe keep the code but #ifdef it
under some nondefault debugging symbol.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
> > cross-checks - the resowner replacement has been in place for a while,
> > and seems robust enough.  They're now the biggest user of time.
> 
> Hm, biggest user of time in what workload?  I've not noticed that
> function particularly.

Just initdb.  I presume it's because the catcaches will frequently be
relatively big there.


> I agree that it doesn't seem like we need to spend a lot of time
> cross-checking there, though.  Maybe keep the code but #ifdef it
> under some nondefault debugging symbol.

Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
so it gets compiled at least sometimes? Not a great fit, but ...

Greetings,

Andres Freund



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
>>> cross-checks - the resowner replacement has been in place for a while,
>>> and seems robust enough.  They're now the biggest user of time.

>> Hm, biggest user of time in what workload?  I've not noticed that
>> function particularly.

> Just initdb.  I presume it's because the catcaches will frequently be
> relatively big there.

Hm.  That ties into something I was looking at yesterday.  The only
reason that function is called so much is that bootstrap mode runs a
separate transaction for *each line of the bki file* (cf do_start,
do_end in bootparse.y).  Which seems pretty silly.  I experimented
with collapsing all the transactions for consecutive DATA lines into
one transaction, but couldn't immediately make it work due to memory
management issues.  I didn't try collapsing the entire run into a
single transaction, but maybe that would actually be easier, though
no doubt more wasteful of memory.

>> I agree that it doesn't seem like we need to spend a lot of time
>> cross-checking there, though.  Maybe keep the code but #ifdef it
>> under some nondefault debugging symbol.

> Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
> so it gets compiled at least sometimes? Not a great fit, but ...

Don't like that, because CCA is by definition not the normal cache
behavior.  It would make a bit of sense to tie it to CACHEDEBUG,
but as you say, it'd never get tested normally if we do that.

On the whole, though, we may be looking at diminishing returns here.
I just did some "perf" measurement of the overall "initdb" cycle,
and what I'm seeing suggests that bootstrap mode as such is now a
pretty small fraction of the overall cycle:

+   51.07%     0.01%            28  postgres         postgres                                      [.] PostgresMain
                # 
...
+   13.52%     0.00%             0  postgres         postgres                                      [.]
AuxiliaryProcessMain             # 

That says that the post-bootstrap steps are now the bulk of the time,
which agrees with naked-eye observation.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >>> Cool.  I wonder if we also should remove AtEOXact_CatCache()'s
> >>> cross-checks - the resowner replacement has been in place for a while,
> >>> and seems robust enough.  They're now the biggest user of time.
> 
> >> Hm, biggest user of time in what workload?  I've not noticed that
> >> function particularly.
> 
> > Just initdb.  I presume it's because the catcaches will frequently be
> > relatively big there.
> 
> Hm.  That ties into something I was looking at yesterday.  The only
> reason that function is called so much is that bootstrap mode runs a
> separate transaction for *each line of the bki file* (cf do_start,
> do_end in bootparse.y).  Which seems pretty silly.

Indeed.


> On the whole, though, we may be looking at diminishing returns here.
> I just did some "perf" measurement of the overall "initdb" cycle,
> and what I'm seeing suggests that bootstrap mode as such is now a
> pretty small fraction of the overall cycle:
> 
> +   51.07%     0.01%            28  postgres         postgres                                      [.] PostgresMain
                  #
 
> ...
> +   13.52%     0.00%             0  postgres         postgres                                      [.]
AuxiliaryProcessMain             #
 
> 
> That says that the post-bootstrap steps are now the bulk of the time,
> which agrees with naked-eye observation.

The AtEOXact_CatCache weren't only in bootstrap mode, but yea, it's by
far smaller wins in comparison to the regprocin thing.

Greetings,

Andres Freund



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andreas Karlsson
Дата:
On 04/13/2017 06:13 PM, Tom Lane wrote:
> I've pushed this with some mostly-cosmetic adjustments:

Thanks! I like your adjustments.

> There's certainly lots more that could be done in the genbki code,
> but I think all we can justify at this stage of the development
> cycle is to get the low-hanging fruit for testing speedups.

Yeah, I also noticed that the genbki code seems to have gotten little 
love and that much more can be done here.

Andreas



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
>> Hm.  That ties into something I was looking at yesterday.  The only
>> reason that function is called so much is that bootstrap mode runs a
>> separate transaction for *each line of the bki file* (cf do_start,
>> do_end in bootparse.y).  Which seems pretty silly.

> Indeed.

I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki.  So I've pushed a patch that does
that.  We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andreas Karlsson
Дата:
On 04/14/2017 11:54 PM, Tom Lane wrote:
> I failed to resist the temptation to poke at this, and found that
> indeed nothing seems to break if we just use one transaction for the
> whole processing of postgres.bki.  So I've pushed a patch that does
> that.  We're definitely down to the point where worrying about the
> speed of bootstrap mode, per se, is useless; the other steps in
> initdb visibly take a lot more time.

Looked some at this and what take time now for me seems to mainly be 
these four things (out of a total runtime of 560 ms).

1. setup_conversion:        140 ms
2. select_default_timezone:  90 ms
3. bootstrap_template1:      80 ms
4. setup_schema:             65 ms

These four take up about two thirds of the total runtime, so it seems 
likely that we may still have relatively low hanging fruit (but not 
worth committing for PostgreSQL 10).

I have not done profiling of these functions yet, so am not sure how 
they best would be fixed but maybe setup_conversion could be converted 
into bki entries to speed it up.

Andreas



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Gavin Flower
Дата:
On 15/04/17 13:44, Andreas Karlsson wrote:
> On 04/14/2017 11:54 PM, Tom Lane wrote:
>> I failed to resist the temptation to poke at this, and found that
>> indeed nothing seems to break if we just use one transaction for the
>> whole processing of postgres.bki.  So I've pushed a patch that does
>> that.  We're definitely down to the point where worrying about the
>> speed of bootstrap mode, per se, is useless; the other steps in
>> initdb visibly take a lot more time.
>
> Looked some at this and what take time now for me seems to mainly be 
> these four things (out of a total runtime of 560 ms).
>
> 1. setup_conversion:        140 ms
> 2. select_default_timezone:  90 ms
> 3. bootstrap_template1:      80 ms
> 4. setup_schema:             65 ms
>
> These four take up about two thirds of the total runtime, so it seems 
> likely that we may still have relatively low hanging fruit (but not 
> worth committing for PostgreSQL 10).
>
> I have not done profiling of these functions yet, so am not sure how 
> they best would be fixed but maybe setup_conversion could be converted 
> into bki entries to speed it up.
>
> Andreas
>
>
How much could be done concurrently?


Cheers.
Gavin




Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andreas Karlsson <andreas@proxel.se> writes:
> Looked some at this and what take time now for me seems to mainly be 
> these four things (out of a total runtime of 560 ms).

> 1. setup_conversion:        140 ms
> 2. select_default_timezone:  90 ms
> 3. bootstrap_template1:      80 ms
> 4. setup_schema:             65 ms

FWIW, you can bypass select_default_timezone by setting environment
variable TZ to a valid timezone name before calling initdb.  On
my workstation, that currently cuts the time for "initdb --no-sync"
from about 1.25 sec to just about exactly 1.0 sec.

I doubt that we want all the buildfarm members to switch over to
doing that, since then we'd lose coverage of select_default_timezone.
But it's interesting to speculate about having the buildfarm script
extract the selected timezone name out of postgresql.conf after the
first initdb it does, and then set TZ for remaining tests.  We surely
don't need to test select_default_timezone more than once per
buildfarm run.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andrew Dunstan
Дата:

On 04/15/2017 12:07 AM, Tom Lane wrote:
> Andreas Karlsson <andreas@proxel.se> writes:
>> Looked some at this and what take time now for me seems to mainly be 
>> these four things (out of a total runtime of 560 ms).
>> 1. setup_conversion:        140 ms
>> 2. select_default_timezone:  90 ms
>> 3. bootstrap_template1:      80 ms
>> 4. setup_schema:             65 ms
> FWIW, you can bypass select_default_timezone by setting environment
> variable TZ to a valid timezone name before calling initdb.  On
> my workstation, that currently cuts the time for "initdb --no-sync"
> from about 1.25 sec to just about exactly 1.0 sec.
>
> I doubt that we want all the buildfarm members to switch over to
> doing that, since then we'd lose coverage of select_default_timezone.
> But it's interesting to speculate about having the buildfarm script
> extract the selected timezone name out of postgresql.conf after the
> first initdb it does, and then set TZ for remaining tests.  We surely
> don't need to test select_default_timezone more than once per
> buildfarm run.
>
>             



Yeah. The obvious place to do this would be the "make check" stage,
which runs very early. Unfortunately, pg_regress carefully removes its
data directory on success - kind of a pity that's not switchable.

Probably for now the simplest thing for buildfarm animals whose owners
are trying to squeeze every last second would be to put something like
   TZ => 'Us/Eastern',

in the build_env section of the config file. But as you say we don't
want everyone doing that.

Alternatively, we could have an initdb TAP test that explicitly removed
the environment setting so we'd get coverage of select_default_timezone,
and have the buildfarm set TZ to something if it's not already set.

cheers

andrew

-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Alvaro Herrera
Дата:
Andrew Dunstan wrote:

> Alternatively, we could have an initdb TAP test that explicitly removed
> the environment setting so we'd get coverage of select_default_timezone,
> and have the buildfarm set TZ to something if it's not already set.

What about having an initdb option that runs select_default_timezone
only and reports the result, so that it can be used in the buildfarm
script to set TZ in all the regular initdb calls?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andrew Dunstan
Дата:

On 04/15/2017 11:44 AM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Alternatively, we could have an initdb TAP test that explicitly removed
>> the environment setting so we'd get coverage of select_default_timezone,
>> and have the buildfarm set TZ to something if it's not already set.
> What about having an initdb option that runs select_default_timezone
> only and reports the result, so that it can be used in the buildfarm
> script to set TZ in all the regular initdb calls?
>


Seems like more work.

What I had in mind was the attached plus roughly this in the buildfarm
client:

    $ENV{TZ} ||= 'US/Eastern';

or whatever zone we choose to use.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> What I had in mind was the attached plus roughly this in the buildfarm
> client:
>     $ENV{TZ} ||= 'US/Eastern';
> or whatever zone we choose to use.

How about letting the first "make check" run with whatever is in the
environment, and then forcing TZ to become set (much as above) for
all the remaining tests?  I'm afraid what you've got here might
encourage a certain sameness of the test environments.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andrew Dunstan
Дата:

On 04/15/2017 12:13 PM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> What I had in mind was the attached plus roughly this in the buildfarm
>> client:
>>     $ENV{TZ} ||= 'US/Eastern';
>> or whatever zone we choose to use.
> How about letting the first "make check" run with whatever is in the
> environment, and then forcing TZ to become set (much as above) for
> all the remaining tests?  I'm afraid what you've got here might
> encourage a certain sameness of the test environments.
>
>             


Sure. Just means putting this code a bit later in the file. "make check"
is only one initdb, so it won't cost much. I'm still inclined to force a
TAP test for initdb with no TZ set, though.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Sure. Just means putting this code a bit later in the file. "make check"
> is only one initdb, so it won't cost much. I'm still inclined to force a
> TAP test for initdb with no TZ set, though.

I'd like that to be an additional tweak for some existing test case
rather than expending a whole initdb run just for that --- but otherwise,
no objection.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andrew Dunstan
Дата:

On 04/15/2017 02:13 PM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> Sure. Just means putting this code a bit later in the file. "make check"
>> is only one initdb, so it won't cost much. I'm still inclined to force a
>> TAP test for initdb with no TZ set, though.
> I'd like that to be an additional tweak for some existing test case
> rather than expending a whole initdb run just for that --- but otherwise,
> no objection.
>
>             


That's what my patch did.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Peter Eisentraut
Дата:
On 4/15/17 12:33, Andrew Dunstan wrote:
> Sure. Just means putting this code a bit later in the file. "make check"
> is only one initdb, so it won't cost much. I'm still inclined to force a
> TAP test for initdb with no TZ set, though.

How much is this going to buy overall?  Is it worth the complications?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andrew Dunstan
Дата:

On 04/16/2017 07:43 PM, Peter Eisentraut wrote:
> On 4/15/17 12:33, Andrew Dunstan wrote:
>> Sure. Just means putting this code a bit later in the file. "make check"
>> is only one initdb, so it won't cost much. I'm still inclined to force a
>> TAP test for initdb with no TZ set, though.
> How much is this going to buy overall?  Is it worth the complications?
>


I don't know, but it's a very small change.

I am going to spend some time instrumenting where the time goes in
various tests.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> There's certainly lots more that could be done in the genbki code,
> but I think all we can justify at this stage of the development
> cycle is to get the low-hanging fruit for testing speedups.

I threw Devel::NYTProf at it and picked some more low-hanging fruit.
Attached are separate patches for each change, and here are the runtimes
of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
(averages of 5 runs, in millseconds):

master (b6dd1271): 355, 182

1: Avoid unnecessary regex captures: 349, 183
2: Avoid repeated calls to SplitDataLine: 316, 158
3: Inline SplitDataLine: 291, 141
4: Inline check_natts: 287, 141

Together they shave 68ms or 19.2% off the runtime of genbki.pl and 41ms
or 22.5% off the runtime of Gen_fmgrtab.pl

Finally, one non-performance patch, which just removes the use of
Exporter in Catalog.pm, since none of the users actually import anything
from it.

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen

From 5d7f4b1e78243daa6939501b88b2644bfcf9bd77 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 13:26:35 +0100
Subject: [PATCH 1/8] Avoid unnecessary regex captures in Catalog.pm

---
 src/backend/catalog/Catalog.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index fa8de04..e7b647a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -54,7 +54,7 @@ sub Catalogs
         {
 
             # Strip C-style comments.
-            s;/\*(.|\n)*\*/;;g;
+            s;/\*(?:.|\n)*\*/;;g;
             if (m;/\*;)
             {
 
@@ -80,12 +80,12 @@ sub Catalogs
             {
                 $catalog{natts} = $1;
             }
-            elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+            elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                check_natts($filename, $catalog{natts}, $3,
+                check_natts($filename, $catalog{natts}, $2,
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
+                push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
-- 
2.7.4

From 92402e0306eda209b1a2811acc41325d75add0cb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 14:04:20 +0100
Subject: [PATCH 2/8] Avoid repeated calls to Catalog::SplitDataLine

Both check_natts and the callers of Catalog::Catalogs were calling
Catalog::SplitDataLines, the former discarding the result.

Instead, have Catalog::Catalogs store the split fields directly and pass
the count to check_natts
---
 src/backend/catalog/Catalog.pm   | 14 +++++++-------
 src/backend/catalog/genbki.pl    |  3 +--
 src/backend/utils/Gen_fmgrtab.pl |  3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e7b647a..0c508b0 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,10 +82,12 @@ sub Catalogs
             }
             elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                check_natts($filename, $catalog{natts}, $2,
+                my @bki_values = SplitDataLine($2);
+
+                check_natts($filename, $catalog{natts}, scalar(@bki_values),
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
+                push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
@@ -254,17 +256,15 @@ sub RenameTempFile
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
-    my ($catname, $natts, $bki_val, $file, $line) = @_;
+    my ($catname, $natts, $bki_vals, $file, $line) = @_;
 
     die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
         unless defined $natts;
 
-    my $nfields = scalar(SplitDataLine($bki_val));
-
     die sprintf
         "Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-        $file, $line, $natts, $nfields
-      unless $natts == $nfields;
+        $file, $line, $natts, $bki_vals
+    unless $natts == $bki_vals;
 }
 
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 198e072..8875f6c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,9 +161,8 @@ foreach my $catname (@{ $catalogs->{names} })
         foreach my $row (@{ $catalog->{data} })
         {
 
-            # Split line into tokens without interpreting their meaning.
             my %bki_values;
-            @bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+            @bki_values{@attnames} = @{$row->{bki_values}};
 
             # Perform required substitutions on fields
             foreach my $att (keys %bki_values)
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index 76bdf5c..d2c4617 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -58,9 +58,8 @@ foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
 my $data = $catalogs->{pg_proc}->{data};
 foreach my $row (@$data)
 {
-    # Split line into tokens without interpreting their meaning.
     my %bki_values;
-    @bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+    @bki_values{@attnames} = @{$row->{bki_values}};
 
     # Select out just the rows for internal-language procedures.
     # Note assumption here that INTERNALlanguageId is 12.
-- 
2.7.4

From c8e55bd9ff36029c3f4fe7053f54f9f862c79d7e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 14:47:06 +0100
Subject: [PATCH 3/8] Inline SplitDataLines into its only caller

---
 src/backend/catalog/Catalog.pm | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 0c508b0..9bd6263 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,12 +82,24 @@ sub Catalogs
             }
             elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                my @bki_values = SplitDataLine($2);
+                # The field-splitting regex will clobber $1, save it
+                my $oid = $1;
+
+                # This handling of quoted strings might look too
+                # simplistic, but it matches what bootscanner.l does:
+                # that has no provision for quote marks inside quoted
+                # strings, either.  If we don't have a quoted string,
+                # just snarf everything till next whitespace.  That will
+                # accept some things that bootscanner.l will see as
+                # erroneous tokens; but it seems wiser to do that and
+                # let bootscanner.l complain than to silently drop
+                # non-whitespace characters.
+                my @bki_values = $2 =~ /"[^"]*"|\S+/g;
 
                 check_natts($filename, $catalog{natts}, scalar(@bki_values),
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
+                push @{ $catalog{data} }, { oid => $oid, bki_values => \@bki_values };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
@@ -218,28 +230,6 @@ sub Catalogs
     return \%catalogs;
 }
 
-# Split a DATA line into fields.
-# Call this on the bki_values element of a DATA item returned by Catalogs();
-# it returns a list of field values.  We don't strip quoting from the fields.
-# Note: it should be safe to assign the result to a list of length equal to
-# the nominal number of catalog fields, because check_natts already checked
-# the number of fields.
-sub SplitDataLine
-{
-    my $bki_values = shift;
-
-    # This handling of quoted strings might look too simplistic, but it
-    # matches what bootscanner.l does: that has no provision for quote marks
-    # inside quoted strings, either.  If we don't have a quoted string, just
-    # snarf everything till next whitespace.  That will accept some things
-    # that bootscanner.l will see as erroneous tokens; but it seems wiser
-    # to do that and let bootscanner.l complain than to silently drop
-    # non-whitespace characters.
-    my @result = $bki_values =~ /"[^"]*"|\S+/g;
-
-    return @result;
-}
-
 # Rename temporary files to final names.
 # Call this function with the final file name and the .tmp extension
 # Note: recommended extension is ".tmp$$", so that parallel make steps
-- 
2.7.4

From fc6c69a692002277378308c2f6ca019185ef9fbb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 14:38:22 +0100
Subject: [PATCH 4/8] Inline check_nattrs into its only caller

---
 src/backend/catalog/Catalog.pm | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 9bd6263..0aa3e0c 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -96,8 +96,14 @@ sub Catalogs
                 # non-whitespace characters.
                 my @bki_values = $2 =~ /"[^"]*"|\S+/g;
 
-                check_natts($filename, $catalog{natts}, scalar(@bki_values),
-                            $input_file, $input_line_number);
+                # Check that the DATA line matches the declared number of attributes
+                die "Could not find definition for Natts_${catname} before start of DATA() in $filename\n"
+                    unless defined $catalog{natts};
+
+                die sprintf
+                    "Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
+                    $filename, $input_line_number, $catalog{natts}, scalar(@bki_values)
+                unless $catalog{natts} == @bki_values;
 
                 push @{ $catalog{data} }, { oid => $oid, bki_values => \@bki_values };
             }
@@ -243,18 +249,4 @@ sub RenameTempFile
     rename($temp_name, $final_name) || die "rename: $temp_name: $!";
 }
 
-# verify the number of fields in the passed-in DATA line
-sub check_natts
-{
-    my ($catname, $natts, $bki_vals, $file, $line) = @_;
-
-    die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
-        unless defined $natts;
-
-    die sprintf
-        "Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-        $file, $line, $natts, $bki_vals
-    unless $natts == $bki_vals;
-}
-
 1;
-- 
2.7.4

From 592dbc64d21066f5da43d535af2cb3780b95d006 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 15:57:18 +0100
Subject: [PATCH 6/8] Remove pointless Exporter usage in Catalog.pm

None of the users actually import any of the subroutines, they just call
them by their fully-qualified names.
---
 src/backend/catalog/Catalog.pm | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index b326427..d4b4170 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -16,11 +16,6 @@ package Catalog;
 use strict;
 use warnings;
 
-require Exporter;
-our @ISA       = qw(Exporter);
-our @EXPORT    = ();
-our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile);
-
 # Call this function with an array of names of header files to parse.
 # Returns a nested data structure describing the data in the headers.
 sub Catalogs
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > There's certainly lots more that could be done in the genbki code,
> > but I think all we can justify at this stage of the development
> > cycle is to get the low-hanging fruit for testing speedups.
> 
> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
> Attached are separate patches for each change, and here are the runtimes
> of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
> (averages of 5 runs, in millseconds):
> 
> master (b6dd1271): 355, 182
> 
> 1: Avoid unnecessary regex captures: 349, 183
> 2: Avoid repeated calls to SplitDataLine: 316, 158
> 3: Inline SplitDataLine: 291, 141
> 4: Inline check_natts: 287, 141
> 
> Together they shave 68ms or 19.2% off the runtime of genbki.pl and 41ms
> or 22.5% off the runtime of Gen_fmgrtab.pl

I'm a bit doubtful about improving the performance of genbki at the cost
of any sort of complication - it's only executed during the actual
build, not during initdb...  I don't see much point in doing things like
3) and 4), it's just not worth it imo.

- Andres



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.

> I'm a bit doubtful about improving the performance of genbki at the cost
> of any sort of complication - it's only executed during the actual
> build, not during initdb...  I don't see much point in doing things like
> 3) and 4), it's just not worth it imo.

Yeah, it's only run once per build, or probably even less often than that
considering we only re-run it when the input files change.  I could get
interested in another 20% reduction in initdb's time, but not genbki's.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > There's certainly lots more that could be done in the genbki code,
> > but I think all we can justify at this stage of the development
> > cycle is to get the low-hanging fruit for testing speedups.
> 
> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
> Attached are separate patches for each change, and here are the runtimes
> of genbki.pl and Gen_fmgrtab.pl, respectively, after each patch
> (averages of 5 runs, in millseconds):

Btw, I think Tom's "more that could be done" was referring more to doing
more upfront work, checking, easier input format, whatnot in the genbki,
not so much performance work...  Tom, correct me if I'm wrong.

- Andres



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> Btw, I think Tom's "more that could be done" was referring more to doing
> more upfront work, checking, easier input format, whatnot in the genbki,
> not so much performance work...  Tom, correct me if I'm wrong.

Yeah, as per what we were just saying, performance of that code isn't
really an issue right now.  Supporting a nicer input format is probably
the biggest step forward that could be taken, but that will require some
major work :-(.  We've also talked about things like supporting
regprocedure rather than only regproc (ie disambiguating overloaded
functions), likewise regoperator, and so on, with an eye to reducing
the number of places where people have to write numeric OIDs in the
input.  There's some threads on this in the archives, but I'm too
lazy to look.
        regards, tom lane



Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Andres Freund <andres@anarazel.de> writes:
>> On 2017-04-17 17:49:54 +0100, Dagfinn Ilmari Mannsåker wrote:
>>> I threw Devel::NYTProf at it and picked some more low-hanging fruit.
>
>> I'm a bit doubtful about improving the performance of genbki at the cost
>> of any sort of complication - it's only executed during the actual
>> build, not during initdb...  I don't see much point in doing things like
>> 3) and 4), it's just not worth it imo.
>
> Yeah, it's only run once per build, or probably even less often than that
> considering we only re-run it when the input files change.  I could get
> interested in another 20% reduction in initdb's time, but not genbki's.

Fair enough.  I still think 1), 2) and 5) are worthwile code cleanups
regardless of the performance impact.  In fact, if we don't care that
much about the performance, we can move the duplicated code in
Gen_fmgrmtab.pl and genbki.pl that turns bki_values into a hash into
Catalogs.pm.  That regresses genbki.pl time by ~30ms on my machine.

Revised patch series attached.

-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.               - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.                      - Calle Dybedahl

From f7b75bcbe993d4ddbc92da85c7148bf7fee143ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 13:26:35 +0100
Subject: [PATCH 1/4] Avoid unnecessary regex captures in Catalog.pm

---
 src/backend/catalog/Catalog.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index fa8de04..e7b647a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -54,7 +54,7 @@ sub Catalogs
         {
 
             # Strip C-style comments.
-            s;/\*(.|\n)*\*/;;g;
+            s;/\*(?:.|\n)*\*/;;g;
             if (m;/\*;)
             {
 
@@ -80,12 +80,12 @@ sub Catalogs
             {
                 $catalog{natts} = $1;
             }
-            elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+            elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                check_natts($filename, $catalog{natts}, $3,
+                check_natts($filename, $catalog{natts}, $2,
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
+                push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
-- 
2.7.4

From decd1b8c171cc508da5f79f01d2f0779a569a963 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 14:04:20 +0100
Subject: [PATCH 2/4] Avoid repeated calls to Catalog::SplitDataLine

Both check_natts and the callers of Catalog::Catalogs were calling
Catalog::SplitDataLines, the former discarding the result.

Instead, have Catalog::Catalogs store the split fields directly and pass
the count to check_natts
---
 src/backend/catalog/Catalog.pm   | 14 +++++++-------
 src/backend/catalog/genbki.pl    |  3 +--
 src/backend/utils/Gen_fmgrtab.pl |  3 +--
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e7b647a..81513c7 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -82,10 +82,12 @@ sub Catalogs
             }
             elsif (/^DATA\(insert(?:\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
             {
-                check_natts($filename, $catalog{natts}, $2,
+                my @bki_values = SplitDataLine($2);
+
+                check_natts($filename, $catalog{natts}, scalar(@bki_values),
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $1, bki_values => $2 };
+                push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
@@ -254,17 +256,15 @@ sub RenameTempFile
 # verify the number of fields in the passed-in DATA line
 sub check_natts
 {
-    my ($catname, $natts, $bki_val, $file, $line) = @_;
+    my ($catname, $natts, $bki_vals, $file, $line) = @_;
 
     die "Could not find definition for Natts_${catname} before start of DATA() in $file\n"
         unless defined $natts;
 
-    my $nfields = scalar(SplitDataLine($bki_val));
-
     die sprintf
         "Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-        $file, $line, $natts, $nfields
-      unless $natts == $nfields;
+        $file, $line, $natts, $bki_vals
+      unless $natts == $bki_vals;
 }
 
 1;
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 198e072..8875f6c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,9 +161,8 @@ foreach my $catname (@{ $catalogs->{names} })
         foreach my $row (@{ $catalog->{data} })
         {
 
-            # Split line into tokens without interpreting their meaning.
             my %bki_values;
-            @bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+            @bki_values{@attnames} = @{$row->{bki_values}};
 
             # Perform required substitutions on fields
             foreach my $att (keys %bki_values)
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index 76bdf5c..d2c4617 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -58,9 +58,8 @@ foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
 my $data = $catalogs->{pg_proc}->{data};
 foreach my $row (@$data)
 {
-    # Split line into tokens without interpreting their meaning.
     my %bki_values;
-    @bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+    @bki_values{@attnames} = @{$row->{bki_values}};
 
     # Select out just the rows for internal-language procedures.
     # Note assumption here that INTERNALlanguageId is 12.
-- 
2.7.4

From a5b8e34bfab6a687d6b13293c27f1f1246bffacb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 15:19:33 +0100
Subject: [PATCH 3/4] Hashify bki values in Catlog::Catalogs

Both callers werere doing the same conversion of the bki_values array
into a hash, so do it just once in Catalog.pm
---
 src/backend/catalog/Catalog.pm   |  5 ++++-
 src/backend/catalog/genbki.pl    | 23 +++++++++++------------
 src/backend/utils/Gen_fmgrtab.pl | 18 ++++++------------
 3 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 81513c7..9c06ec1 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -87,7 +87,10 @@ sub Catalogs
                 check_natts($filename, $catalog{natts}, scalar(@bki_values),
                             $input_file, $input_line_number);
 
-                push @{ $catalog{data} }, { oid => $1, bki_values => \@bki_values };
+                my %bki_values;
+                @bki_values{map $_->{name}, @{$catalog{columns}}} = @bki_values;
+
+                push @{ $catalog{data} }, { oid => $oid, bki_values => \%bki_values };
             }
             elsif (/^DESCR\(\"(.*)\"\)$/)
             {
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 8875f6c..fba157e 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -161,24 +161,23 @@ foreach my $catname (@{ $catalogs->{names} })
         foreach my $row (@{ $catalog->{data} })
         {
 
-            my %bki_values;
-            @bki_values{@attnames} = @{$row->{bki_values}};
+            my $bki_values = $row->{bki_values};
 
             # Perform required substitutions on fields
-            foreach my $att (keys %bki_values)
+            foreach my $att (keys %$bki_values)
             {
                 # Substitute constant values we acquired above.
                 # (It's intentional that this can apply to parts of a field).
-                $bki_values{$att} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
-                $bki_values{$att} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
+                $bki_values->{$att} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
+                $bki_values->{$att} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
 
                 # Replace regproc columns' values with OIDs.
                 # If we don't have a unique value to substitute,
                 # just do nothing (regprocin will complain).
                 if ($bki_attr{$att}->{type} eq 'regproc')
                 {
-                    my $procoid = $regprocoids{$bki_values{$att}};
-                    $bki_values{$att} = $procoid
+                    my $procoid = $regprocoids{$bki_values->{$att}};
+                    $bki_values->{$att} = $procoid
                         if defined($procoid) && $procoid ne 'MULTIPLE';
                 }
             }
@@ -187,20 +186,20 @@ foreach my $catname (@{ $catalogs->{names} })
             # This relies on the order we process the files in!
             if ($catname eq 'pg_proc')
             {
-                if (defined($regprocoids{$bki_values{proname}}))
+                if (defined($regprocoids{$bki_values->{proname}}))
                 {
-                    $regprocoids{$bki_values{proname}} = 'MULTIPLE';
+                    $regprocoids{$bki_values->{proname}} = 'MULTIPLE';
                 }
                 else
                 {
-                    $regprocoids{$bki_values{proname}} = $row->{oid};
+                    $regprocoids{$bki_values->{proname}} = $row->{oid};
                 }
             }
 
             # Save pg_type info for pg_attribute processing below
             if ($catname eq 'pg_type')
             {
-                my %type = %bki_values;
+                my %type = %$bki_values;
                 $type{oid} = $row->{oid};
                 push @types, \%type;
             }
@@ -208,7 +207,7 @@ foreach my $catname (@{ $catalogs->{names} })
             # Write to postgres.bki
             my $oid = $row->{oid} ? "OID = $row->{oid} " : '';
             printf $bki "insert %s( %s )\n", $oid,
-              join(' ', @bki_values{@attnames});
+              join(' ', @{$bki_values}{@attnames});
 
             # Write comments to postgres.description and postgres.shdescription
             if (defined $row->{descr})
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index d2c4617..03a2ed5 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -49,28 +49,22 @@ my $catalogs = Catalog::Catalogs($infile);
 
 # Collect the raw data from pg_proc.h.
 my @fmgr = ();
-my @attnames;
-foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
-{
-    push @attnames, $column->{name};
-}
 
 my $data = $catalogs->{pg_proc}->{data};
 foreach my $row (@$data)
 {
-    my %bki_values;
-    @bki_values{@attnames} = @{$row->{bki_values}};
+    my $bki_values = $row->{bki_values};
 
     # Select out just the rows for internal-language procedures.
     # Note assumption here that INTERNALlanguageId is 12.
-    next if $bki_values{prolang} ne '12';
+    next if $bki_values->{prolang} ne '12';
 
     push @fmgr,
       { oid    => $row->{oid},
-        strict => $bki_values{proisstrict},
-        retset => $bki_values{proretset},
-        nargs  => $bki_values{pronargs},
-        prosrc => $bki_values{prosrc}, };
+        strict => $bki_values->{proisstrict},
+        retset => $bki_values->{proretset},
+        nargs  => $bki_values->{pronargs},
+        prosrc => $bki_values->{prosrc}, };
 }
 
 # Emit headers for both files
-- 
2.7.4

From 036b1f6a01ac518501056b10b6dd011fda6815ed Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Mon, 17 Apr 2017 15:57:18 +0100
Subject: [PATCH 4/4] Remove pointless Exporter usage in Catalog.pm

None of the users actually import any of the subroutines, they just call
them by their fully-qualified names.
---
 src/backend/catalog/Catalog.pm | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 9c06ec1..662df3b 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -16,11 +16,6 @@ package Catalog;
 use strict;
 use warnings;
 
-require Exporter;
-our @ISA       = qw(Exporter);
-our @EXPORT    = ();
-our @EXPORT_OK = qw(Catalogs SplitDataLine RenameTempFile);
-
 # Call this function with an array of names of header files to parse.
 # Returns a nested data structure describing the data in the headers.
 sub Catalogs
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Resurrecting this old thread ...

I decided it'd be interesting to re-examine where initdb's runtime
is going, seeing that we just got done with a lot of bootstrap data
restructuring.  I stuck some timing code into initdb, and got
results like this:

creating directory /home/postgres/testversion/data ... ok
elapsed = 0.256 msec
creating subdirectories ... ok
elapsed = 2.385 msec
selecting default max_connections ... 100
elapsed = 13.528 msec
selecting default shared_buffers ... 128MB
elapsed = 13.699 msec
selecting dynamic shared memory implementation ... posix
elapsed = 0.129 msec
elapsed = 281.335 msec in select_default_timezone
creating configuration files ... ok
elapsed = 1.319 msec
running bootstrap script ... ok
elapsed = 162.143 msec
performing post-bootstrap initialization ... ok
elapsed = 832.569 msec

Sync to disk skipped.

real    0m1.316s
user    0m0.941s
sys     0m0.395s

(I'm using "initdb -N" because the cost of the sync step is so
platform-dependent, and it's not interesting anyway for buildfarm
or make check-world testing.  Also, I rearranged the code slightly
so that select_default_timezone could be timed separately from the
rest of the "creating configuration files" step.)

In trying to break down the "post-bootstrap initialization" step
a bit further, I soon realized that trying to time the sub-steps from
initdb is useless.  initdb is just shoving bytes down the pipe as
fast as the kernel will let it; it has no idea how long it's taking
the backend to do any one query or queries.  So I ended up adding
"-c log_statement=all -c log_min_duration_statement=0" to the
backend_options, and digging query durations out of the log output.
I got these totals for the major steps in the post-boot run:

pg_authid setup: 0.909 ms
pg_depend setup: 64.980 ms
system views: 106.221 ms
pg_description: 39.665 ms
pg_collation: 65.162 ms
conversions: 72.024 ms
text search: 29.454 ms
init-acl hacking: 14.339 ms
information schema: 188.497 ms
plpgsql: 2.531 ms
analyze/vacuum/additional db creation: 171.762 ms

So the conversions don't look nearly as interesting as Andreas
suggested upthread.  Pushing them into .bki format would at best
save ~ 70 ms out of 1300.  Which is not nothing, but it's not
going to change the world either.

Really the only thing here that jumps out as being unduly expensive for
what it's doing is select_default_timezone.  That is, and always has been,
a brute-force algorithm; I wonder if there's a way to do better?  We can
probably guess that every non-Windows platform is using the IANA timezone
data these days.  If there were some way to extract the name of the active
timezone setting directly, we wouldn't have to try to reverse-engineer it.
But I don't know of any portable way :-(

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Robert Haas
Дата:
On Tue, May 8, 2018 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Really the only thing here that jumps out as being unduly expensive for
> what it's doing is select_default_timezone.  That is, and always has been,
> a brute-force algorithm; I wonder if there's a way to do better?  We can
> probably guess that every non-Windows platform is using the IANA timezone
> data these days.  If there were some way to extract the name of the active
> timezone setting directly, we wouldn't have to try to reverse-engineer it.
> But I don't know of any portable way :-(

Who says we need a portable way?  If we had something that worked on
Linux and macOS, it would cover most developer environments.  I wonder
if readlink("/etc/localtime", buf, sz) might be a viable approach.
You could, at least, try any time zone you can derive that way first,
before you try any others.

Also, how about having a --timezone option for initdb?  Then you could
determine the time zone just once and pass it down to subsequent
initdb invocations.  Or just hardcode the regression tests to use some
fixed time zone, like Antarctica/Troll.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 8, 2018 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Really the only thing here that jumps out as being unduly expensive for
>> what it's doing is select_default_timezone.  That is, and always has been,
>> a brute-force algorithm; I wonder if there's a way to do better?

> Who says we need a portable way?  If we had something that worked on
> Linux and macOS, it would cover most developer environments.  I wonder
> if readlink("/etc/localtime", buf, sz) might be a viable approach.

I wondered about that, but I'm afraid it's often a hardlink not a
symlink.  Still, we could try it.

> Also, how about having a --timezone option for initdb?

We already have that, it's called the TZ environment variable.

There was an effort awhile back to try to speed up the buildfarm
by having that get set automatically, but it failed for reasons
I don't recall ATM.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, May 8, 2018 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Really the only thing here that jumps out as being unduly expensive for
> >> what it's doing is select_default_timezone.  That is, and always has been,
> >> a brute-force algorithm; I wonder if there's a way to do better?
> 
> > Who says we need a portable way?  If we had something that worked on
> > Linux and macOS, it would cover most developer environments.  I wonder
> > if readlink("/etc/localtime", buf, sz) might be a viable approach.
> 
> I wondered about that, but I'm afraid it's often a hardlink not a
> symlink.  Still, we could try it.

In Debian systems, it's a symlink.  Apparently in RHEL6 and older it's a
copy or hardlink, and the file /etc/sysconfig/clock contains a ZONE
variable that points to the right zone.  Maybe if we add enough
platform-dependent hacks, we would use the slow fallback only for rare
cases.  (Maybe have initdb emit a warning when the fallback is used, so
that we know what else to look for.)

This comment is insightful:
https://github.com/HowardHinnant/date/issues/269#issuecomment-353792132

It's talking about this code:
https://github.com/HowardHinnant/date/blob/master/src/tz.cpp#L3652

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Robert Haas
Дата:
On Wed, May 9, 2018 at 11:39 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> In Debian systems, it's a symlink.  Apparently in RHEL6 and older it's a
> copy or hardlink, and the file /etc/sysconfig/clock contains a ZONE
> variable that points to the right zone.  Maybe if we add enough
> platform-dependent hacks, we would use the slow fallback only for rare
> cases.  (Maybe have initdb emit a warning when the fallback is used, so
> that we know what else to look for.)

I just checked a couple of RHEL7 systems and it seems to be a symlink
there.  It's also a symlink on my laptop (macOS 10.13.3).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Who says we need a portable way?  If we had something that worked on
>>> Linux and macOS, it would cover most developer environments.  I wonder
>>> if readlink("/etc/localtime", buf, sz) might be a viable approach.

>> I wondered about that, but I'm afraid it's often a hardlink not a
>> symlink.  Still, we could try it.

> In Debian systems, it's a symlink.  Apparently in RHEL6 and older it's a
> copy or hardlink, and the file /etc/sysconfig/clock contains a ZONE
> variable that points to the right zone.

Yeah, on my RHEL6 box,

$ ls -l /etc/localtime 
-rw-r--r--. 1 root root 3519 May  4  2010 /etc/localtime

The mod date's a little astonishing, considering this system was built
in 2013.  It is *not* a hardlink, or even an exact match, to anything
under /usr/share/zoneinfo, though perhaps it was originally.  Also:

$ cat /etc/sysconfig/clock
# The time zone of the system is defined by the contents of /etc/localtime.
# This file is only for evaluation by system-config-date, do not rely on its
# contents elsewhere.
ZONE="America/New York"

I'm inclined to respect the comment, especially since I see they are not
spelling the zone name canonically anyway (note space not underscore);
so looking at this wouldn't work without some ill-defined heuristics.

However, more modern Red Hat systems seem to have /etc/localtime as
a symlink, so checking it would work there, and also macOS seems to
do it that way for as far back as I can check.

> This comment is insightful:
> https://github.com/HowardHinnant/date/issues/269#issuecomment-353792132
> It's talking about this code:
> https://github.com/HowardHinnant/date/blob/master/src/tz.cpp#L3652

Interesting.  Not sure if we want to try all those files.  But I'll
take a look at whipping up something that checks /etc/localtime.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
I wrote:
> ... I'll
> take a look at whipping up something that checks /etc/localtime.

Here's a draft patch.  It seems to do what I expect on a couple of
different macOS releases as well as recent Fedora.  Googling finds
some suggestions that there might be other locations for the symlink,
for instance it's alleged that glibc can be configured to look at
/usr/local/etc/localtime instead of /etc/localtime.  But AFAICS none
of the alternative locations are used on enough systems that it'd be
worth the cycles to check them.  Likewise, there seem to be some
systems with conventions about storing the zone name in a text file
someplace, but not really enough of them to make it worth our time
to support that.

The initdb speedup on the other boxes I checked seems to be in the
vicinity of 50 ms out of circa-1-sec total, which is less than I'd have
expected from my measurements on my RHEL6 box :-(.  Still, I think it's
worth doing, not least because (if it works) this fixes the problem that
you may get an unexpected alias for your system timezone, as in this
complaint awhile back:

https://www.postgresql.org/message-id/flat/4D7016AA.2090609%40agliodbs.com

I also changed initdb so that it reports what zone name it picked.
I have a vague recollection that it might've done that long ago, and
we suppressed it in a burst of enthusiasm about making initdb less
chatty.  That seems like something to undo, since we're tweaking what
are still basically heuristic rules for choosing the zone.

Next question is what to do with this.  Do we want to sit on it till
v12, or sneak it in now?

            regards, tom lane

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 4c3a91a..6901188 100644
--- a/src/bin/initdb/findtimezone.c
+++ b/src/bin/initdb/findtimezone.c
@@ -15,6 +15,7 @@
 #include <fcntl.h>
 #include <sys/stat.h>
 #include <time.h>
+#include <unistd.h>
 
 #include "pgtz.h"
 
@@ -126,12 +127,19 @@ pg_load_tz(const char *name)
  * On most systems, we rely on trying to match the observable behavior of
  * the C library's localtime() function.  The database zone that matches
  * furthest into the past is the one to use.  Often there will be several
- * zones with identical rankings (since the Olson database assigns multiple
+ * zones with identical rankings (since the IANA database assigns multiple
  * names to many zones).  We break ties arbitrarily by preferring shorter,
  * then alphabetically earlier zone names.
  *
+ * Many modern systems use the IANA database, so if we can determine the
+ * system's idea of which zone it is using and its behavior matches our zone
+ * of the same name, we can skip the rather-expensive search through all the
+ * zones in our database.  This short-circuit path also ensures that we spell
+ * the zone name the same way the system setting does, even in the presence
+ * of multiple aliases for the same zone.
+ *
  * Win32's native knowledge about timezones appears to be too incomplete
- * and too different from the Olson database for the above matching strategy
+ * and too different from the IANA database for the above matching strategy
  * to be of any use. But there is just a limited number of timezones
  * available, so we can rely on a handmade mapping table instead.
  */
@@ -150,6 +158,8 @@ struct tztry
     time_t        test_times[MAX_TEST_TIMES];
 };
 
+static bool check_system_link_file(const char *linkname, struct tztry *tt,
+                       char *bestzonename);
 static void scan_available_timezones(char *tzdir, char *tzdirsub,
                          struct tztry *tt,
                          int *bestscore, char *bestzonename);
@@ -299,12 +309,19 @@ score_timezone(const char *tzname, struct tztry *tt)
     return i;
 }
 
+/*
+ * Test whether given zone name is a perfect match to localtime() behavior
+ */
+static bool
+perfect_timezone_match(const char *tzname, struct tztry *tt)
+{
+    return (score_timezone(tzname, tt) == tt->n_test_times);
+}
+
 
 /*
  * Try to identify a timezone name (in our terminology) that best matches the
- * observed behavior of the system timezone library.  We cannot assume that
- * the system TZ environment setting (if indeed there is one) matches our
- * terminology, so we ignore it and just look at what localtime() returns.
+ * observed behavior of the system localtime() function.
  */
 static const char *
 identify_system_timezone(void)
@@ -339,7 +356,7 @@ identify_system_timezone(void)
      * way of doing things, but experience has shown that system-supplied
      * timezone definitions are likely to have DST behavior that is right for
      * the recent past and not so accurate further back. Scoring in this way
-     * allows us to recognize zones that have some commonality with the Olson
+     * allows us to recognize zones that have some commonality with the IANA
      * database, without insisting on exact match. (Note: we probe Thursdays,
      * not Sundays, to avoid triggering DST-transition bugs in localtime
      * itself.)
@@ -374,7 +391,18 @@ identify_system_timezone(void)
         tt.test_times[tt.n_test_times++] = t;
     }
 
-    /* Search for the best-matching timezone file */
+    /*
+     * Try to avoid the brute-force search by seeing if we can recognize the
+     * system's timezone setting directly.
+     *
+     * Currently we just check /etc/localtime; there are other conventions for
+     * this, but that seems to be the only one used on enough platforms to be
+     * worth troubling over.
+     */
+    if (check_system_link_file("/etc/localtime", &tt, resultbuf))
+        return resultbuf;
+
+    /* No luck, so search for the best-matching timezone file */
     strlcpy(tmptzdir, pg_TZDIR(), sizeof(tmptzdir));
     bestscore = -1;
     resultbuf[0] = '\0';
@@ -383,7 +411,7 @@ identify_system_timezone(void)
                              &bestscore, resultbuf);
     if (bestscore > 0)
     {
-        /* Ignore Olson's rather silly "Factory" zone; use GMT instead */
+        /* Ignore IANA's rather silly "Factory" zone; use GMT instead */
         if (strcmp(resultbuf, "Factory") == 0)
             return NULL;
         return resultbuf;
@@ -472,7 +500,7 @@ identify_system_timezone(void)
 
     /*
      * Did not find the timezone.  Fallback to use a GMT zone.  Note that the
-     * Olson timezone database names the GMT-offset zones in POSIX style: plus
+     * IANA timezone database names the GMT-offset zones in POSIX style: plus
      * is west of Greenwich.  It's unfortunate that this is opposite of SQL
      * conventions.  Should we therefore change the names? Probably not...
      */
@@ -487,6 +515,94 @@ identify_system_timezone(void)
 }
 
 /*
+ * Examine a system-provided symlink file to see if it tells us the timezone.
+ *
+ * Unfortunately, there is little standardization of how the system default
+ * timezone is determined in the absence of a TZ environment setting.
+ * But a common strategy is to create a symlink at a well-known place.
+ * If "linkname" identifies a readable symlink, and the tail of its contents
+ * matches a zone name we know, and the actual behavior of localtime() agrees
+ * with what we think that zone means, then we may use that zone name.
+ *
+ * We insist on a perfect behavioral match, which might not happen if the
+ * system has a different IANA database version than we do; but in that case
+ * it seems best to fall back to the brute-force search.
+ *
+ * linkname is the symlink file location to probe.
+ *
+ * tt tells about the system timezone behavior we need to match.
+ *
+ * If we successfully identify a zone name, store it in *bestzonename and
+ * return true; else return false.  bestzonename must be a buffer of length
+ * TZ_STRLEN_MAX + 1.
+ */
+static bool
+check_system_link_file(const char *linkname, struct tztry *tt,
+                       char *bestzonename)
+{
+#ifdef HAVE_READLINK
+    char        link_target[MAXPGPATH];
+    int            len;
+    const char *cur_name;
+
+    /*
+     * Try to read the symlink.  If not there, not a symlink, etc etc, just
+     * quietly fail; the precise reason needn't concern us.
+     */
+    len = readlink(linkname, link_target, sizeof(link_target));
+    if (len < 0 || len >= sizeof(link_target))
+        return false;
+    link_target[len] = '\0';
+
+#ifdef DEBUG_IDENTIFY_TIMEZONE
+    fprintf(stderr, "symbolic link \"%s\" contains \"%s\"\n",
+            linkname, link_target);
+#endif
+
+    /*
+     * The symlink is probably of the form "/path/to/zones/zone/name", or
+     * possibly it is a relative path.  Nobody puts their zone DB directly in
+     * the root directory, so we can definitely skip the first component; but
+     * after that it's trial-and-error to identify which path component begins
+     * the zone name.
+     */
+    cur_name = link_target;
+    while (*cur_name)
+    {
+        /* Advance to next segment of path */
+        cur_name = strchr(cur_name + 1, '/');
+        if (cur_name == NULL)
+            break;
+        /* If there are consecutive slashes, skip all, as the kernel would */
+        do
+        {
+            cur_name++;
+        } while (*cur_name == '/');
+
+        /*
+         * Test remainder of path to see if it is a matching zone name.
+         * Relative paths might contain ".."; we needn't bother testing if the
+         * first component is that.  Also defend against overlength names.
+         */
+        if (*cur_name && *cur_name != '.' &&
+            strlen(cur_name) <= TZ_STRLEN_MAX &&
+            perfect_timezone_match(cur_name, tt))
+        {
+            /* Success! */
+            strcpy(bestzonename, cur_name);
+            return true;
+        }
+    }
+
+    /* Couldn't extract a matching zone name */
+    return false;
+#else
+    /* No symlinks?  Forget it */
+    return false;
+#endif
+}
+
+/*
  * Recursively scan the timezone database looking for the best match to
  * the system timezone behavior.
  *
@@ -586,7 +702,7 @@ static const struct
      * HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time
      * Zones on Windows 10 and Windows 7.
      *
-     * The zones have been matched to Olson timezones by looking at the cities
+     * The zones have been matched to IANA timezones by looking at the cities
      * listed in the win32 display name (in the comment here) in most cases.
      */
     {
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ae22e7d..c8ba4b8 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -173,6 +173,7 @@ static char *pgdata_native;
 /* defaults */
 static int    n_connections = 10;
 static int    n_buffers = 50;
+static const char *default_timezone = NULL;
 static char *dynamic_shared_memory_type = NULL;
 
 /*
@@ -1047,6 +1048,11 @@ test_config_settings(void)
     else
         printf("%dkB\n", n_buffers * (BLCKSZ / 1024));
 
+    printf(_("selecting default timezone ... "));
+    fflush(stdout);
+    default_timezone = select_default_timezone(share_path);
+    printf("%s\n", default_timezone ? default_timezone : "GMT");
+
     printf(_("selecting dynamic shared memory implementation ... "));
     fflush(stdout);
     dynamic_shared_memory_type = choose_dsm_implementation();
@@ -1079,7 +1085,6 @@ setup_config(void)
     char      **conflines;
     char        repltok[MAXPGPATH];
     char        path[MAXPGPATH];
-    const char *default_timezone;
     char       *autoconflines[3];
 
     fputs(_("creating configuration files ... "), stdout);
@@ -1161,7 +1166,6 @@ setup_config(void)
                               "#default_text_search_config = 'pg_catalog.simple'",
                               repltok);
 
-    default_timezone = select_default_timezone(share_path);
     if (default_timezone)
     {
         snprintf(repltok, sizeof(repltok), "timezone = '%s'",

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Andres Freund
Дата:
Hi,

On 2018-05-10 12:18:19 -0400, Tom Lane wrote:
> Next question is what to do with this.  Do we want to sit on it till
> v12, or sneak it in now?

Is there a decent argument for sneaking it in? I don't really have an
opinion. I don't think it'd really be arguable that this'll make testing
meaningfully faster. OTOH, it's fresh in your mind (which can be said
about a lot of patches obviously).

Greetings,

Andres Freund


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2018-05-10 12:18:19 -0400, Tom Lane wrote:
>> Next question is what to do with this.  Do we want to sit on it till
>> v12, or sneak it in now?

> Is there a decent argument for sneaking it in? I don't really have an
> opinion. I don't think it'd really be arguable that this'll make testing
> meaningfully faster. OTOH, it's fresh in your mind (which can be said
> about a lot of patches obviously).

Yeah, I had hoped that this might make a noticeable difference on slower
buildfarm animals, but some testing shows that it's more likely to be
barely above the noise floor.

OTOH, in view of Josh's old gripe, maybe it could be argued to be a bug
fix, at least on platforms where it does anything.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Robert Haas
Дата:
On Thu, May 10, 2018 at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I had hoped that this might make a noticeable difference on slower
> buildfarm animals, but some testing shows that it's more likely to be
> barely above the noise floor.

Any idea why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 10, 2018 at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I had hoped that this might make a noticeable difference on slower
>> buildfarm animals, but some testing shows that it's more likely to be
>> barely above the noise floor.

> Any idea why?

Well, maybe the noise floor is too high.  I experimented with "make check"
in src/bin/scripts (TAP tests enabled), which does 13 initdb's as of HEAD.
On dromedary's host (an older macOS release), 3 runs each:

HEAD:   1m47.017s 1m49.548s 1m46.805s
+patch: 1m47.188s 1m45.611s 1m51.520s

If you blame that last run on some background task chewing cycles, which
is certainly possible considering I'd neglected to turn off dromedary's
cron job, then there's some gain but not much.  There *is* a clearly
visible win if you measure "initdb -N" in isolation,

HEAD:   0m5.244s 0m5.025s 0m4.930s
+patch: 0m4.785s 0m4.706s 0m4.824s

but it's just not much compared to the other overhead of a TAP test.
These numbers would support the idea that we're saving about 250 ms
per initdb, which times 13 is circa 3 sec, which is comparable to the
inter-run variation I'm seeing in the scripts test.

I have no doubt that it'd add up to something over a lot of buildfarm
runs, but it's certainly no sort of game-changer.

Running on my RHEL6 dev machine, where I customarily use PROVE_FLAGS='-j4'
for check-world runs, src/bin/scripts takes maybe 14.9s on average.  The
patch itself does nothing for that machine of course, but if I simulate
its effect by setting the TZ environment variable, the runtime drops to
14.2s or so.  So again, there'd be some win for developer testing, but
not enough to get anybody excited.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
"Jonathan S. Katz"
Дата:
> On May 10, 2018, at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-05-10 12:18:19 -0400, Tom Lane wrote:
>>> Next question is what to do with this.  Do we want to sit on it till
>>> v12, or sneak it in now?
>
>> Is there a decent argument for sneaking it in? I don't really have an
>> opinion. I don't think it'd really be arguable that this'll make testing
>> meaningfully faster. OTOH, it's fresh in your mind (which can be said
>> about a lot of patches obviously).
>
> Yeah, I had hoped that this might make a noticeable difference on slower
> buildfarm animals, but some testing shows that it's more likely to be
> barely above the noise floor.
>
> OTOH, in view of Josh's old gripe, maybe it could be argued to be a bug
> fix, at least on platforms where it does anything.

Read back to get some history/context on this, and from my vantage
point it sounds like this is fixing a bug (i.e. incorrect behavior). It also sounds
like based on the changes the earliest we’d be able to commit is is 11 and
not any further because people could be expecting the incorrect behavior to
happen, and thus we may break existing systems.

Jonathan

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
"Jonathan S. Katz" <jonathan.katz@excoventures.com> writes:
>> On May 10, 2018, at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> OTOH, in view of Josh's old gripe, maybe it could be argued to be a bug
>> fix, at least on platforms where it does anything.

> Read back to get some history/context on this, and from my vantage
> point it sounds like this is fixing a bug (i.e. incorrect behavior). It also sounds
> like based on the changes the earliest we’d be able to commit is is 11 and
> not any further because people could be expecting the incorrect behavior to
> happen, and thus we may break existing systems.

Yeah, given the small number of complaints, I doubt back-patching would
be a good idea.

Seems like we should just leave this for v12.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
I wrote:
>> ... I'll
>> take a look at whipping up something that checks /etc/localtime.

> Here's a draft patch.  It seems to do what I expect on a couple of
> different macOS releases as well as recent Fedora.

The cfbot points out that this has suffered bit-rot, so here's a rebased
version --- no substantive changes.

            regards, tom lane

diff --git a/src/bin/initdb/findtimezone.c b/src/bin/initdb/findtimezone.c
index 4c3a91a..6901188 100644
*** a/src/bin/initdb/findtimezone.c
--- b/src/bin/initdb/findtimezone.c
***************
*** 15,20 ****
--- 15,21 ----
  #include <fcntl.h>
  #include <sys/stat.h>
  #include <time.h>
+ #include <unistd.h>

  #include "pgtz.h"

*************** pg_load_tz(const char *name)
*** 126,137 ****
   * On most systems, we rely on trying to match the observable behavior of
   * the C library's localtime() function.  The database zone that matches
   * furthest into the past is the one to use.  Often there will be several
!  * zones with identical rankings (since the Olson database assigns multiple
   * names to many zones).  We break ties arbitrarily by preferring shorter,
   * then alphabetically earlier zone names.
   *
   * Win32's native knowledge about timezones appears to be too incomplete
!  * and too different from the Olson database for the above matching strategy
   * to be of any use. But there is just a limited number of timezones
   * available, so we can rely on a handmade mapping table instead.
   */
--- 127,145 ----
   * On most systems, we rely on trying to match the observable behavior of
   * the C library's localtime() function.  The database zone that matches
   * furthest into the past is the one to use.  Often there will be several
!  * zones with identical rankings (since the IANA database assigns multiple
   * names to many zones).  We break ties arbitrarily by preferring shorter,
   * then alphabetically earlier zone names.
   *
+  * Many modern systems use the IANA database, so if we can determine the
+  * system's idea of which zone it is using and its behavior matches our zone
+  * of the same name, we can skip the rather-expensive search through all the
+  * zones in our database.  This short-circuit path also ensures that we spell
+  * the zone name the same way the system setting does, even in the presence
+  * of multiple aliases for the same zone.
+  *
   * Win32's native knowledge about timezones appears to be too incomplete
!  * and too different from the IANA database for the above matching strategy
   * to be of any use. But there is just a limited number of timezones
   * available, so we can rely on a handmade mapping table instead.
   */
*************** struct tztry
*** 150,155 ****
--- 158,165 ----
      time_t        test_times[MAX_TEST_TIMES];
  };

+ static bool check_system_link_file(const char *linkname, struct tztry *tt,
+                        char *bestzonename);
  static void scan_available_timezones(char *tzdir, char *tzdirsub,
                           struct tztry *tt,
                           int *bestscore, char *bestzonename);
*************** score_timezone(const char *tzname, struc
*** 299,310 ****
      return i;
  }


  /*
   * Try to identify a timezone name (in our terminology) that best matches the
!  * observed behavior of the system timezone library.  We cannot assume that
!  * the system TZ environment setting (if indeed there is one) matches our
!  * terminology, so we ignore it and just look at what localtime() returns.
   */
  static const char *
  identify_system_timezone(void)
--- 309,327 ----
      return i;
  }

+ /*
+  * Test whether given zone name is a perfect match to localtime() behavior
+  */
+ static bool
+ perfect_timezone_match(const char *tzname, struct tztry *tt)
+ {
+     return (score_timezone(tzname, tt) == tt->n_test_times);
+ }
+

  /*
   * Try to identify a timezone name (in our terminology) that best matches the
!  * observed behavior of the system localtime() function.
   */
  static const char *
  identify_system_timezone(void)
*************** identify_system_timezone(void)
*** 339,345 ****
       * way of doing things, but experience has shown that system-supplied
       * timezone definitions are likely to have DST behavior that is right for
       * the recent past and not so accurate further back. Scoring in this way
!      * allows us to recognize zones that have some commonality with the Olson
       * database, without insisting on exact match. (Note: we probe Thursdays,
       * not Sundays, to avoid triggering DST-transition bugs in localtime
       * itself.)
--- 356,362 ----
       * way of doing things, but experience has shown that system-supplied
       * timezone definitions are likely to have DST behavior that is right for
       * the recent past and not so accurate further back. Scoring in this way
!      * allows us to recognize zones that have some commonality with the IANA
       * database, without insisting on exact match. (Note: we probe Thursdays,
       * not Sundays, to avoid triggering DST-transition bugs in localtime
       * itself.)
*************** identify_system_timezone(void)
*** 374,380 ****
          tt.test_times[tt.n_test_times++] = t;
      }

!     /* Search for the best-matching timezone file */
      strlcpy(tmptzdir, pg_TZDIR(), sizeof(tmptzdir));
      bestscore = -1;
      resultbuf[0] = '\0';
--- 391,408 ----
          tt.test_times[tt.n_test_times++] = t;
      }

!     /*
!      * Try to avoid the brute-force search by seeing if we can recognize the
!      * system's timezone setting directly.
!      *
!      * Currently we just check /etc/localtime; there are other conventions for
!      * this, but that seems to be the only one used on enough platforms to be
!      * worth troubling over.
!      */
!     if (check_system_link_file("/etc/localtime", &tt, resultbuf))
!         return resultbuf;
!
!     /* No luck, so search for the best-matching timezone file */
      strlcpy(tmptzdir, pg_TZDIR(), sizeof(tmptzdir));
      bestscore = -1;
      resultbuf[0] = '\0';
*************** identify_system_timezone(void)
*** 383,389 ****
                               &bestscore, resultbuf);
      if (bestscore > 0)
      {
!         /* Ignore Olson's rather silly "Factory" zone; use GMT instead */
          if (strcmp(resultbuf, "Factory") == 0)
              return NULL;
          return resultbuf;
--- 411,417 ----
                               &bestscore, resultbuf);
      if (bestscore > 0)
      {
!         /* Ignore IANA's rather silly "Factory" zone; use GMT instead */
          if (strcmp(resultbuf, "Factory") == 0)
              return NULL;
          return resultbuf;
*************** identify_system_timezone(void)
*** 472,478 ****

      /*
       * Did not find the timezone.  Fallback to use a GMT zone.  Note that the
!      * Olson timezone database names the GMT-offset zones in POSIX style: plus
       * is west of Greenwich.  It's unfortunate that this is opposite of SQL
       * conventions.  Should we therefore change the names? Probably not...
       */
--- 500,506 ----

      /*
       * Did not find the timezone.  Fallback to use a GMT zone.  Note that the
!      * IANA timezone database names the GMT-offset zones in POSIX style: plus
       * is west of Greenwich.  It's unfortunate that this is opposite of SQL
       * conventions.  Should we therefore change the names? Probably not...
       */
*************** identify_system_timezone(void)
*** 487,492 ****
--- 515,608 ----
  }

  /*
+  * Examine a system-provided symlink file to see if it tells us the timezone.
+  *
+  * Unfortunately, there is little standardization of how the system default
+  * timezone is determined in the absence of a TZ environment setting.
+  * But a common strategy is to create a symlink at a well-known place.
+  * If "linkname" identifies a readable symlink, and the tail of its contents
+  * matches a zone name we know, and the actual behavior of localtime() agrees
+  * with what we think that zone means, then we may use that zone name.
+  *
+  * We insist on a perfect behavioral match, which might not happen if the
+  * system has a different IANA database version than we do; but in that case
+  * it seems best to fall back to the brute-force search.
+  *
+  * linkname is the symlink file location to probe.
+  *
+  * tt tells about the system timezone behavior we need to match.
+  *
+  * If we successfully identify a zone name, store it in *bestzonename and
+  * return true; else return false.  bestzonename must be a buffer of length
+  * TZ_STRLEN_MAX + 1.
+  */
+ static bool
+ check_system_link_file(const char *linkname, struct tztry *tt,
+                        char *bestzonename)
+ {
+ #ifdef HAVE_READLINK
+     char        link_target[MAXPGPATH];
+     int            len;
+     const char *cur_name;
+
+     /*
+      * Try to read the symlink.  If not there, not a symlink, etc etc, just
+      * quietly fail; the precise reason needn't concern us.
+      */
+     len = readlink(linkname, link_target, sizeof(link_target));
+     if (len < 0 || len >= sizeof(link_target))
+         return false;
+     link_target[len] = '\0';
+
+ #ifdef DEBUG_IDENTIFY_TIMEZONE
+     fprintf(stderr, "symbolic link \"%s\" contains \"%s\"\n",
+             linkname, link_target);
+ #endif
+
+     /*
+      * The symlink is probably of the form "/path/to/zones/zone/name", or
+      * possibly it is a relative path.  Nobody puts their zone DB directly in
+      * the root directory, so we can definitely skip the first component; but
+      * after that it's trial-and-error to identify which path component begins
+      * the zone name.
+      */
+     cur_name = link_target;
+     while (*cur_name)
+     {
+         /* Advance to next segment of path */
+         cur_name = strchr(cur_name + 1, '/');
+         if (cur_name == NULL)
+             break;
+         /* If there are consecutive slashes, skip all, as the kernel would */
+         do
+         {
+             cur_name++;
+         } while (*cur_name == '/');
+
+         /*
+          * Test remainder of path to see if it is a matching zone name.
+          * Relative paths might contain ".."; we needn't bother testing if the
+          * first component is that.  Also defend against overlength names.
+          */
+         if (*cur_name && *cur_name != '.' &&
+             strlen(cur_name) <= TZ_STRLEN_MAX &&
+             perfect_timezone_match(cur_name, tt))
+         {
+             /* Success! */
+             strcpy(bestzonename, cur_name);
+             return true;
+         }
+     }
+
+     /* Couldn't extract a matching zone name */
+     return false;
+ #else
+     /* No symlinks?  Forget it */
+     return false;
+ #endif
+ }
+
+ /*
   * Recursively scan the timezone database looking for the best match to
   * the system timezone behavior.
   *
*************** static const struct
*** 586,592 ****
       * HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time
       * Zones on Windows 10 and Windows 7.
       *
!      * The zones have been matched to Olson timezones by looking at the cities
       * listed in the win32 display name (in the comment here) in most cases.
       */
      {
--- 702,708 ----
       * HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time
       * Zones on Windows 10 and Windows 7.
       *
!      * The zones have been matched to IANA timezones by looking at the cities
       * listed in the win32 display name (in the comment here) in most cases.
       */
      {
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 32746c7..cb8c745 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*************** static char *pgdata_native;
*** 174,179 ****
--- 174,180 ----
  static int    n_connections = 10;
  static int    n_buffers = 50;
  static const char *dynamic_shared_memory_type = NULL;
+ static const char *default_timezone = NULL;

  /*
   * Warning messages for authentication methods
*************** test_config_settings(void)
*** 1058,1063 ****
--- 1059,1069 ----
          printf("%dMB\n", (n_buffers * (BLCKSZ / 1024)) / 1024);
      else
          printf("%dkB\n", n_buffers * (BLCKSZ / 1024));
+
+     printf(_("selecting default timezone ... "));
+     fflush(stdout);
+     default_timezone = select_default_timezone(share_path);
+     printf("%s\n", default_timezone ? default_timezone : "GMT");
  }

  /*
*************** setup_config(void)
*** 1086,1092 ****
      char      **conflines;
      char        repltok[MAXPGPATH];
      char        path[MAXPGPATH];
-     const char *default_timezone;
      char       *autoconflines[3];

      fputs(_("creating configuration files ... "), stdout);
--- 1092,1097 ----
*************** setup_config(void)
*** 1168,1174 ****
                                "#default_text_search_config = 'pg_catalog.simple'",
                                repltok);

-     default_timezone = select_default_timezone(share_path);
      if (default_timezone)
      {
          snprintf(repltok, sizeof(repltok), "timezone = '%s'",
--- 1173,1178 ----

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Michael Paquier
Дата:
On Fri, Aug 10, 2018 at 02:31:25PM -0400, Tom Lane wrote:
> The cfbot points out that this has suffered bit-rot, so here's a rebased
> version --- no substantive changes.

+   /*
+    * Try to read the symlink.  If not there, not a symlink, etc etc, just
+    * quietly fail; the precise reason needn't concern us.
+    */
+   len = readlink(linkname, link_target, sizeof(link_target));

One thing that I can see changing with this patch is how timezone is set
in postgresql.conf.  For example, on HEAD I get 'Japan' while this patch
gives back 'Asia/Tokyo'.  Could it be an issue for countries with
multiple timezones?  I am not sure how Russian systems would react on
that for example.

The time cut for initdb is interesting for buildfarm members anyway, and
the patch looks in nice shape to me.
--
Michael

Вложения

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> One thing that I can see changing with this patch is how timezone is set
> in postgresql.conf.  For example, on HEAD I get 'Japan' while this patch
> gives back 'Asia/Tokyo'.  Could it be an issue for countries with
> multiple timezones?  I am not sure how Russian systems would react on
> that for example.

Interesting --- what platform were you testing on?

I believe that this patch will never make for any functional change,
it will only give you some other alias for the zone it would have
selected anyway.  This could only fail to be true if there are
distinct timezones that score_timezone() is failing to tell apart,
which would be a bug in score_timezone, not this patch.  (Presumably,
we could fix any such bug by increasing the number of dates that
score_timezone tests.)

Since the tzdb database is rather full of aliases, for instance the
one you mentioned

$ grep ^Li src/timezone/data/tzdata.zi
...
Li Asia/Tokyo Japan
...

there is certainly plenty of opportunity for this to change the
apparent value of TimeZone.  But I think it's for the better:
instead of choosing an alias that happens to be first according
to some unspecified search order, it will choose the alias that
somebody actually configured the operating system with.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Michael Paquier
Дата:
On Wed, Sep 12, 2018 at 10:00:21AM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> One thing that I can see changing with this patch is how timezone is set
>> in postgresql.conf.  For example, on HEAD I get 'Japan' while this patch
>> gives back 'Asia/Tokyo'.  Could it be an issue for countries with
>> multiple timezones?  I am not sure how Russian systems would react on
>> that for example.
>
> Interesting --- what platform were you testing on?

Debian SID, with Asia/Tokyo used for /etc/localtime.

> I believe that this patch will never make for any functional change,
> it will only give you some other alias for the zone it would have
> selected anyway.  This could only fail to be true if there are
> distinct timezones that score_timezone() is failing to tell apart,
> which would be a bug in score_timezone, not this patch.  (Presumably,
> we could fix any such bug by increasing the number of dates that
> score_timezone tests.)

Looking at the list of aliases, I am not seeing listed countries running
across multiple timezones, so that may be fine..  Anyway, your change,
and particularly the cut of time in running initdb, which matters for
TAP tests, are definitely good things in my opinion.  So I am switching
that as ready for committer.
--
Michael

Вложения

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Alvaro Herrera
Дата:
On 2018-Sep-13, Michael Paquier wrote:

> Looking at the list of aliases, I am not seeing listed countries running
> across multiple timezones, so that may be fine..

Well, mine is there, and it's correct:

Li America/Santiago Chile/Continental
Li Pacific/Easter Chile/EasterIsland

Laugh all you want about Chile of all countries having multiple
timezones ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Sep 12, 2018 at 10:00:21AM -0400, Tom Lane wrote:
>> I believe that this patch will never make for any functional change,
>> it will only give you some other alias for the zone it would have
>> selected anyway.

> Looking at the list of aliases, I am not seeing listed countries running
> across multiple timezones, so that may be fine.

Not sure what you're worried about.  "Linked" time zones are the *same
data*.  In an installed tzdb tree, the Japan file is either a hardlink or
symlink to the Asia/Tokyo one, so they can't differ.  What you seem to be
speculating about is actual errors in the tzdb data, ie not describing
the facts on the ground in particular places.  That's possible I suppose
but it's hardly our problem if it happens; it'd be theirs to fix.

            regards, tom lane


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Thomas Munro
Дата:
On Thu, Sep 13, 2018 at 11:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Wed, Sep 12, 2018 at 10:00:21AM -0400, Tom Lane wrote:
> >> I believe that this patch will never make for any functional change,
> >> it will only give you some other alias for the zone it would have
> >> selected anyway.
>
> > Looking at the list of aliases, I am not seeing listed countries running
> > across multiple timezones, so that may be fine.
>
> Not sure what you're worried about.  "Linked" time zones are the *same
> data*.  In an installed tzdb tree, the Japan file is either a hardlink or
> symlink to the Asia/Tokyo one, so they can't differ.  What you seem to be
> speculating about is actual errors in the tzdb data, ie not describing
> the facts on the ground in particular places.  That's possible I suppose
> but it's hardly our problem if it happens; it'd be theirs to fix.

I tested this on a system where /etc/localtime is not a symlink
(FreeBSD) and it worked fine, falling back to the old behaviour and
finding my timezone.  (Apparently the argument against a symlink
/etc/localtime -> /usr/share/tzinfo/...  is that new processes would
effectively switch timezone after /usr is mounted so your boot logs
would be mixed up.  Or something.  I bet that actually happens on
Linux too, but maybe no one does /usr as a mount point anymore...?)

I noticed that the patch does a bunch of s/Olson/IANA/.  That leaves
only one place in the tree that still refers to the "Olson" database:
dt_common.c.  Might want to change that too?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Michael Paquier
Дата:
On Wed, Sep 12, 2018 at 08:04:53PM -0300, Alvaro Herrera wrote:
> Well, mine is there, and it's correct:
>
> Li America/Santiago Chile/Continental
> Li Pacific/Easter Chile/EasterIsland
>
> Laugh all you want about Chile of all countries having multiple
> timezones ...

Impressed is a better word.  Really impressed seeing the shape of the
country ;)
--
Michael

Вложения

Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> I noticed that the patch does a bunch of s/Olson/IANA/.  That leaves
> only one place in the tree that still refers to the "Olson" database:
> dt_common.c.  Might want to change that too?

Ah, I didn't realize we were that close to wiping out the old
terminology.  Yeah, I'll get that one too.  Thanks for noticing.

            regards, tom lane