Обсуждение: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

Поиск
Список
Период
Сортировка

[COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
Tom Lane
Дата:
Improve performance of timezone loading, especially pg_timezone_names view.

tzparse() would attempt to load the "posixrules" timezone database file on
each call.  That might seem like it would only be an issue when selecting a
POSIX-style zone name rather than a zone defined in the timezone database,
but it turns out that each zone definition file contains a POSIX-style zone
string and tzload() will call tzparse() to parse that.  Thus, when scanning
the whole timezone file tree as we do in the pg_timezone_names view,
"posixrules" was read repetitively for each zone definition file.  Fix
that by caching the file on first use within any given process.  (We cache
other zone definitions for the life of the process, so there seems little
reason not to cache this one as well.)  This probably won't help much in
processes that never run pg_timezone_names, but even one additional SET
of the timezone GUC would come out ahead.

An even worse problem for pg_timezone_names is that pg_open_tzfile()
has an inefficient way of identifying the canonical case of a zone name:
it basically re-descends the directory tree to the zone file.  That's not
awful for an individual "SET timezone" operation, but it's pretty horrid
when we're inspecting every zone in the database.  And it's pointless too
because we already know the canonical spelling, having just read it from
the filesystem.  Fix by teaching pg_open_tzfile() to avoid the directory
search if it's not asked for the canonical name, and backfilling the
proper result in pg_tzenumerate_next().

In combination these changes seem to make the pg_timezone_names view
about 3x faster to read, for me.  Since a scan of pg_timezone_names
has up to now been one of the slowest queries in the regression tests,
this should help some little bit for buildfarm cycle times.

Back-patch to all supported branches, not so much because it's likely
that users will care much about the view's performance as because
tracking changes in the upstream IANA timezone code is really painful
if we don't keep all the branches in sync.

Discussion: https://postgr.es/m/27962.1493671706@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/af2c5aa88d38573724e40fa029499b4db20b0eb2

Modified Files
--------------
src/timezone/README      |  3 +++
src/timezone/localtime.c | 23 ++++++++++++++++++++++-
src/timezone/pgtz.c      | 42 +++++++++++++++++++++++++++++++++++-------
3 files changed, 60 insertions(+), 8 deletions(-)


Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
Amit Kapila
Дата:
On Wed, May 3, 2017 at 7:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Improve performance of timezone loading, especially pg_timezone_names view.
>

I am consistently getting below error after this commit in my Win7 machine:
running bootstrap script ... 2017-05-07 05:09:13.383 GMT [7908] LOG:
could not open directory "/installation/share/timezone/posixrules": No
such file or directory

This occurs both during initdb and server start.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Amit Kapila <amit.kapila16@gmail.com> writes:
> On Wed, May 3, 2017 at 7:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Improve performance of timezone loading, especially pg_timezone_names view.

> I am consistently getting below error after this commit in my Win7 machine:
> running bootstrap script ... 2017-05-07 05:09:13.383 GMT [7908] LOG:
> could not open directory "/installation/share/timezone/posixrules": No
> such file or directory

> This occurs both during initdb and server start.

You sure it wasn't there before, too?  That commit did not add any
file reads that didn't happen before.

            regards, tom lane


Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
David Rowley
Дата:
, On 7 May 2017 at 17:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> On Wed, May 3, 2017 at 7:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Improve performance of timezone loading, especially pg_timezone_names view.
>
>> I am consistently getting below error after this commit in my Win7 machine:
>> running bootstrap script ... 2017-05-07 05:09:13.383 GMT [7908] LOG:
>> could not open directory "/installation/share/timezone/posixrules": No
>> such file or directory
>
>> This occurs both during initdb and server start.
>
> You sure it wasn't there before, too?  That commit did not add any
> file reads that didn't happen before.

It didn't exist before either, but I don't think the error is what you
think. Please note the error message indicated it's looking for a
"directory" named posixrules, not the file.

The error seems to be due to this code:

if (canonname == NULL)
{
int result;

fullname[fullnamelen] = '/';
/* test above ensured this will fit: */
strcpy(fullname + fullnamelen + 1, name);
result = open(fullname, O_RDONLY | PG_BINARY, 0);
if (result >= 0)
return result;
/* If that didn't work, fall through to do it the hard way */
}

where we append the name, then try to open the file, but since the
file does not exist result is < 0 and we fall through to the for(;;)
loop and try to do scan_directory_ci() on the filename rather than the
directory.

On windows, where I also get the error, the stacktrace, as of
b58c433ef90b is as follows:

  postgres.exe!scan_directory_ci(const char * dirname, const char *
fname, int fnamelen, char * canonname, int canonnamelen) Line 162 C
> postgres.exe!pg_open_tzfile(const char * name, char * canonname) Line 127 C
  postgres.exe!tzloadbody(const char * name, char * canonname, state *
sp, char doextend, local_storage * lsp) Line 240 C
  postgres.exe!tzload(const char * name, char * canonname, state * sp,
char doextend) Line 564 C
  postgres.exe!tzparse(const char * name, state * sp, char lastditch) Line 959 C
  postgres.exe!tzloadbody(const char * name, char * canonname, state *
sp, char doextend, local_storage * lsp) Line 421 C
  postgres.exe!tzload(const char * name, char * canonname, state * sp,
char doextend) Line 564 C
  postgres.exe!pg_tzset(const char * name) Line 291 C
  postgres.exe!check_log_timezone(char * * newval, void * * extra,
GucSource source) Line 415 C
  postgres.exe!call_string_check_hook(config_string * conf, char * *
newval, void * * extra, GucSource source, int elevel) Line 9905 C
  postgres.exe!parse_and_validate_value(config_generic * record, const
char * name, const char * value, GucSource source, int elevel,
config_var_val * newval, void * * newextra) Line 5834 C
  postgres.exe!set_config_option(const char * name, const char *
value, GucContext context, GucSource source, GucAction action, char
changeVal, int elevel, char is_reload) Line 6437 C
  postgres.exe!ProcessConfigFileInternal(GucContext context, char
applySettings, int elevel) Line 441 C
  postgres.exe!ProcessConfigFile(GucContext context) Line 158 C
  postgres.exe!SelectConfigFiles(const char * userDoption, const char
* progname) Line 4810 C
  postgres.exe!PostmasterMain(int argc, char * * argv) Line 854 C
  postgres.exe!main(int argc, char * * argv) Line 229 C
  postgres.exe!__tmainCRTStartup() Line 536 C
  postgres.exe!mainCRTStartup() Line 377 C
  kernel32.dll!00007ffdab5e13d2() Unknown
  ntdll.dll!00007ffdadf754e4() Unknown

Perhaps we just need to put the NUL char back, to trim off the filename again:

/* If that didn't work, fall through to do it the hard way */
fullname[fullnamelen] = '\0';

but I've not yet looked into why the file is missing in the first place.


Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
David Rowley
Дата:
On 7 May 2017 at 21:03, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Perhaps we just need to put the NUL char back, to trim off the filename again:
>
> /* If that didn't work, fall through to do it the hard way */
> fullname[fullnamelen] = '\0';
>
> but I've not yet looked into why the file is missing in the first place.

OK, so it looks like GenerateTimezoneFiles in Install.pm for the MSVC
build does not quite do what make install does for src/timezone.
Nothing seems to pass the -p parameter as the following is doing:

install: all installdirs
ifeq (,$(with_system_tzdata))
$(ZIC) -d '$(DESTDIR)$(datadir)/timezone' -p '$(POSIXRULES)' $(TZDATAFILES)

I've attached a patch for review. My perl skills are at "trial and
error" level, so please review carefully.

The attached also adds the NUL char back to fullname in pg_open_tzfile().

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
David Rowley
Дата:
On 7 May 2017 at 21:33, David Rowley <david.rowley@2ndquadrant.com> wrote:
> I've attached a patch for review. My perl skills are at "trial and
> error" level, so please review carefully.

Actually, my Perl code contained a superfluous line to trim the
whitespace from $posixrules.

The attached is a version without it, which still appears to work.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения
David Rowley <david.rowley@2ndquadrant.com> writes:
> OK, so it looks like GenerateTimezoneFiles in Install.pm for the MSVC
> build does not quite do what make install does for src/timezone.
> Nothing seems to pass the -p parameter as the following is doing:

Ah-hah.  Still, the log message should have been there before.
Maybe just nobody noticed?

> The attached also adds the NUL char back to fullname in pg_open_tzfile().

Oooh ... I did not think that was necessary, but on closer inspection
you are right.

            regards, tom lane


Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
Amit Kapila
Дата:
On Sun, May 7, 2017 at 2:33 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> , On 7 May 2017 at 17:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Amit Kapila <amit.kapila16@gmail.com> writes:
>>> On Wed, May 3, 2017 at 7:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Improve performance of timezone loading, especially pg_timezone_names view.
>>
>>> I am consistently getting below error after this commit in my Win7 machine:
>>> running bootstrap script ... 2017-05-07 05:09:13.383 GMT [7908] LOG:
>>> could not open directory "/installation/share/timezone/posixrules": No
>>> such file or directory
>>
>>> This occurs both during initdb and server start.
>>
>> You sure it wasn't there before, too?

Yes and that is confirmed by David as well.

>>  That commit did not add any
>> file reads that didn't happen before.
>
> It didn't exist before either, but I don't think the error is what you
> think. Please note the error message indicated it's looking for a
> "directory" named posixrules, not the file.
>
> The error seems to be due to this code:
>
> if (canonname == NULL)
> {
> int result;
>
> fullname[fullnamelen] = '/';
> /* test above ensured this will fit: */
> strcpy(fullname + fullnamelen + 1, name);
> result = open(fullname, O_RDONLY | PG_BINARY, 0);
> if (result >= 0)
> return result;
> /* If that didn't work, fall through to do it the hard way */
> }
>
> where we append the name, then try to open the file, but since the
> file does not exist result is < 0 and we fall through to the for(;;)
> loop and try to do scan_directory_ci() on the filename rather than the
> directory.
>
>
> /* If that didn't work, fall through to do it the hard way */
> fullname[fullnamelen] = '\0';
>

Yes, that is one way of doing it, otherwise, we could save the
original fullname (before appending 'name' to it) and then use the
same in for loop.

> but I've not yet looked into why the file is missing in the first place.

I think it is better to consider that as a separate patch as that is a
different problem that exists before this commit as well.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
Amit Kapila
Дата:
On Sun, May 7, 2017 at 8:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> OK, so it looks like GenerateTimezoneFiles in Install.pm for the MSVC
>> build does not quite do what make install does for src/timezone.
>> Nothing seems to pass the -p parameter as the following is doing:
>
> Ah-hah.  Still, the log message should have been there before.
>

No, it didn't exist before because dirname passed to scan_directory_ci
didn't contain the filename appended to it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Amit Kapila <amit.kapila16@gmail.com> writes:
> On Sun, May 7, 2017 at 8:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ah-hah.  Still, the log message should have been there before.

> No, it didn't exist before because dirname passed to scan_directory_ci
> didn't contain the filename appended to it.

Gotcha.  On closer look, the "fall through" path added by this commit
wasn't really exercised at all on Unix machines, because calls with
no canonname only occur when we expect to find the zone file with name
as stated.  It's accidental (and fortunate) that the bug in the MSVC
build support allowed us to find this bug promptly ;-)

I concur with your thought that these are separate bugs and deserve
separate commits.  Will do that in a minute.

            regards, tom lane


I wrote:
> I concur with your thought that these are separate bugs and deserve
> separate commits.  Will do that in a minute.

BTW, I wrote in the commit log message that the posixrules omission
had only minimal consequences, but on closer look that's wrong:
the hardwired default in the timezone code is

/*
 * The DST rules to use if TZ has no rules and we can't load TZDEFRULES.
 * We default to US rules as of 1999-08-17.
 * POSIX 1003.1 section 8.1.1 says that the default DST rules are
 * implementation dependent; for historical reasons, US rules are a
 * common default.
 */
#define TZDEFRULESTRING ",M4.1.0,M10.5.0"

that is, first Sunday of April and last Sunday of October.  The US
hasn't gone by that since 2006.  So actually, this change will have
a very visible impact on the behavior of POSIX-style zone names.

(I wonder why IANA hasn't updated this default?  Still, that's their
expertise not ours, so I'm disinclined to touch it.)

            regards, tom lane


Re: [COMMITTERS] pgsql: Improve performance of timezone loading,especially pg_timezone_

От
David Rowley
Дата:
On 8 May 2017 at 02:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> OK, so it looks like GenerateTimezoneFiles in Install.pm for the MSVC
>> build does not quite do what make install does for src/timezone.
>> Nothing seems to pass the -p parameter as the following is doing:
>
> Ah-hah.  Still, the log message should have been there before.
> Maybe just nobody noticed?

Thanks for pushing.

The error messages were not there before. I think just load_ok would
have ended up false in tzparse(). There seems to be code to handle
that without producing any error.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services