Обсуждение: Performance patch for Win32

Поиск
Список
Период
Сортировка

Performance patch for Win32

От
Mark Dilger
Дата:
This is a patch against src/backend/storage/file/fd.c
taken from 9.2beta1.

This patch is submitted for review and comments, not
for application to the code base.  *WIP*

This patch addresses a performance problem stemming
from the use of FindFirstFile() and FindNextFile() to
iterate over a directory in Windows.  These two functions
are used in the port of readdir for Windows.  Unfortunately,
unlike Linux, these Windows directory iteration functions
return the equivalent of a stat() call for each file iterated.
Hence, if a directory contains many tens of thousands of
files, the iteration can take several minutes to complete.

In RemovePgTempFile(), multiple directories are iterated
and all files found which match the pattern for a temporary
file are unlinked.  The pattern matching is performed
*outside* the directory iteration.  This patch uses a file
pattern like "t*" to match all temporary files, rather than
iterating over all files in the directory, thus pushing the
pattern match *inside* the directory iteration and gaining
significant startup time performance.

This is not theoretical.  The real-world database where I
found this problem is on a Windows 2003 server running
PostgreSQL 9.1.3 and having 56,000 tables.  I was able
to duplicate the problem on a Windows 2008 server.

To reproduce, you will need a database on Windows with
tens of thousands of tables and a recent version of
PostgreSQL.  Reboot the Windows server so that the
filesystem is guaranteed not to be in the filesystem cache.
Start postgres using pg_ctl, and note that it takes several
minutes to start.  After applying the patch and re-running
these steps, the server should not take so long to start.

I have the following reservations about my design, and
solicit comments and suggestions for improvement:

1)  The changes I made in fd.c pass a pattern rather than
a name into ReadDir *knowing the details* of how ReadDir
on Windows will use the port of readdir in src/port/dirent.c
and that in that code FindFirstFile() and FindNextFile() will
be called.  This knowledge about the inner workings of
the port of readdir() is not appropriate inside fd.c, IMHO.

2) I used a fair amount of #ifdef WIN32 to avoid adding
unnecessary variables or branches to the non-windows
code.  Since this code is probably not on the critical path
performancewise, this may be overkill.

3) The pattern passed to ReadDir of the form "t*" should
probably be something closer to (in pcre form):
m/^t\d+_\d+/, rather than m/^t.*/.  I am not sufficiently
familiar with how Windows interprets file patterns, and
whether it interprets them differently from one version of
Windows to another, to be comfortable making a more
precise pattern.

4) Other places in the PostgreSQL sources where directory
iteration is needed should probably use a pattern if possible
when running on Windows.  Thus, it might make more
sense to have a version of ReadDir that explicitly takes a
pattern, and use that version of ReadDir elsewhere in the
codebase.





Вложения

Re: Performance patch for Win32

От
Tom Lane
Дата:
Mark Dilger <markdilger@yahoo.com> writes:
> 4) Other places in the PostgreSQL sources where directory
> iteration is needed should probably use a pattern if possible
> when running on Windows.� Thus, it might make more
> sense to have a version of ReadDir that explicitly takes a
> pattern, and use that version of ReadDir elsewhere in the
> codebase.

Yeah, I think a separate argument passed to an AllocateDir variant
would be a less ugly way to deal with this.  For example, in place
of your first #ifdef block just write
       temp_dir = AllocateDirWithFilePattern(tmpdirname,
PG_TEMP_FILE_PREFIX"*");
 


What is not immediately obvious to me is whether we should try to make
the pattern argument do something useful on non-Windows platforms
(and thus allow removal of the ad-hoc pattern match code in the loops
where this is used); versus just treating it as a no-op on non-Windows.
If we did that, we'd have to consider that Windows gets to define what
the pattern language is and try to emulate that; which is likely to be
expensive enough to not be a win, not to mention that non-Windows
developers might not be terribly familiar with the finer points of the
pattern language.

I'm kind of inclined to consider that we should just treat the pattern
option as a Windows-specific wart and keep the ad-hoc matching code as
it is.
        regards, tom lane


Re: Performance patch for Win32

От
Mark Dilger
Дата:
I am hesitant to write a function like AllocateDirWithFilePattern
if the pattern is simply ignored on non-Windows.  In my patch,
the pattern underspecified the files, and the ad-hoc matching code
applied to all the returned files tightened that up.  But a person
could just as well overspecify the pattern and then they would get
different behavior on Windows vs. non-Windows, with fewer
files returned by FindNextFile() than would have matched the
ad-hoc pattern.

In this particular instance, AllocateDirWithFilePrefix would
work, and could be applied on all platforms, using something
like:

    temp_dir = AllocateDirWithFilePrefix(tmpdirname,
                                        PG_TEMP_FILE_PREFIX);

and that could be converted to PG_TEMP_FILE_PREFIX "*"
on Windows, and on non-Windows the function itself could
apply the prefix check easily enough.

Is AllocateDirWithFilePrefix overly specific?  Clearly we
could also take a Suffix argument, but if we go too far down
this road we just reinvent regular expressions....


mark


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Tuesday, May 29, 2012 2:30 PM
Subject: Re: [HACKERS] Performance patch for Win32

Mark Dilger <markdilger@yahoo.com> writes:
> 4) Other places in the PostgreSQL sources where directory
> iteration is needed should probably use a pattern if possible
> when running on Windows.  Thus, it might make more
> sense to have a version of ReadDir that explicitly takes a
> pattern, and use that version of ReadDir elsewhere in the
> codebase.

Yeah, I think a separate argument passed to an AllocateDir variant
would be a less ugly way to deal with this.  For example, in place
of your first #ifdef block just write

        temp_dir = AllocateDirWithFilePattern(tmpdirname,
                                              PG_TEMP_FILE_PREFIX "*");


What is not immediately obvious to me is whether we should try to make
the pattern argument do something useful on non-Windows platforms
(and thus allow removal of the ad-hoc pattern match code in the loops
where this is used); versus just treating it as a no-op on non-Windows.
If we did that, we'd have to consider that Windows gets to define what
the pattern language is and try to emulate that; which is likely to be
expensive enough to not be a win, not to mention that non-Windows
developers might not be terribly familiar with the finer points of the
pattern language.

I'm kind of inclined to consider that we should just treat the pattern
option as a Windows-specific wart and keep the ad-hoc matching code as
it is.

            regards, tom lane


Re: Performance patch for Win32

От
Tom Lane
Дата:
Mark Dilger <markdilger@yahoo.com> writes:
> I am hesitant to write a function like AllocateDirWithFilePattern
> if the pattern is simply ignored on non-Windows.� In my patch,
> the pattern underspecified the files, and the ad-hoc matching code
> applied to all the returned files tightened that up.� But a person
> could just as well overspecify the pattern and then they would get
> different behavior on Windows vs. non-Windows, with fewer
> files returned by FindNextFile() than would have matched the
> ad-hoc pattern.

Well, if you're imagining that we wouldn't need to test carefully on
both Windows and non-Windows, I think that's a pipe dream.  As an
example, your proposal of AllocateDirWithFilePrefix would only work
consistently across platforms if the prefix didn't contain anything
that Windows thought was a pattern metacharacter.  (This might never
come up, but I'm not too sure what the metacharacters are on Windows.)

Having said that, I have nothing particularly against the idea of
specifying a prefix rather than an arbitrary pattern.  I'm just
saying it'll still need testing.  Also, I wonder how many of the
potential stat-equivalent operations we'll be unable to optimize
away with the more restricted definition.  Using a tighter pattern
on Windows seems basically free (modulo testing) if we accept that
it's Windows-only.
        regards, tom lane


Re: Performance patch for Win32

От
Mark Dilger
Дата:
I was imagining that this would be a trap for linux developers
who saw nothing wrong with their code until it made it to the
build/test farm.  That's pretty far down the development
process.  Of course, it is also a trap in the other direction, for
Windows developers who use the pattern but do not include
anything equivalent for the non-Windows execution path.

On the whole, however, your argument in favor of tighter
patterns might be more convincing than my argument in favor
of catching bugs sooner.

I will start implementing your suggestion for patch v2.


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Tuesday, May 29, 2012 3:42 PM
Subject: Re: [HACKERS] Performance patch for Win32

Mark Dilger <markdilger@yahoo.com> writes:
> I am hesitant to write a function like AllocateDirWithFilePattern
> if the pattern is simply ignored on non-Windows.  In my patch,
> the pattern underspecified the files, and the ad-hoc matching code
> applied to all the returned files tightened that up.  But a person
> could just as well overspecify the pattern and then they would get
> different behavior on Windows vs. non-Windows, with fewer
> files returned by FindNextFile() than would have matched the
> ad-hoc pattern.

Well, if you're imagining that we wouldn't need to test carefully on
both Windows and non-Windows, I think that's a pipe dream.  As an
example, your proposal of AllocateDirWithFilePrefix would only work
consistently across platforms if the prefix didn't contain anything
that Windows thought was a pattern metacharacter.  (This might never
come up, but I'm not too sure what the metacharacters are on Windows.)

Having said that, I have nothing particularly against the idea of
specifying a prefix rather than an arbitrary pattern.  I'm just
saying it'll still need testing.  Also, I wonder how many of the
potential stat-equivalent operations we'll be unable to optimize
away with the more restricted definition.  Using a tighter pattern
on Windows seems basically free (modulo testing) if we accept that
it's Windows-only.

            regards, tom lane


Re: Performance patch for Win32

От
Bruce Momjian
Дата:
On Tue, May 29, 2012 at 03:54:59PM -0700, Mark Dilger wrote:
> I was imagining that this would be a trap for linux developers
> who saw nothing wrong with their code until it made it to the
> build/test farm.  That's pretty far down the development
> process.  Of course, it is also a trap in the other direction, for
> Windows developers who use the pattern but do not include
> anything equivalent for the non-Windows execution path.
> 
> On the whole, however, your argument in favor of tighter
> patterns might be more convincing than my argument in favor
> of catching bugs sooner.
> 
> I will start implementing your suggestion for patch v2.

Any progress on this?

---------------------------------------------------------------------------


> 
> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Mark Dilger <markdilger@yahoo.com>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Sent: Tuesday, May 29, 2012 3:42 PM
> Subject: Re: [HACKERS] Performance patch for Win32
> 
> Mark Dilger <markdilger@yahoo.com> writes:
> > I am hesitant to write a function like AllocateDirWithFilePattern
> > if the pattern is simply ignored on non-Windows.  In my patch,
> > the pattern underspecified the files, and the ad-hoc matching code
> > applied to all the returned files tightened that up.  But a person
> > could just as well overspecify the pattern and then they would get
> > different behavior on Windows vs. non-Windows, with fewer
> > files returned by FindNextFile() than would have matched the
> > ad-hoc pattern.
> 
> Well, if you're imagining that we wouldn't need to test carefully on
> both Windows and non-Windows, I think that's a pipe dream.  As an
> example, your proposal of AllocateDirWithFilePrefix would only work
> consistently across platforms if the prefix didn't contain anything
> that Windows thought was a pattern metacharacter.  (This might never
> come up, but I'm not too sure what the metacharacters are on Windows.)
> 
> Having said that, I have nothing particularly against the idea of
> specifying a prefix rather than an arbitrary pattern.  I'm just
> saying it'll still need testing.  Also, I wonder how many of the
> potential stat-equivalent operations we'll be unable to optimize
> away with the more restricted definition.  Using a tighter pattern
> on Windows seems basically free (modulo testing) if we accept that
> it's Windows-only.
> 
>             regards, tom lane
> 
> 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: Performance patch for Win32

От
Bruce Momjian
Дата:
On Thu, Aug 30, 2012 at 04:37:37PM -0400, Bruce Momjian wrote:
> On Tue, May 29, 2012 at 03:54:59PM -0700, Mark Dilger wrote:
> > I was imagining that this would be a trap for linux developers
> > who saw nothing wrong with their code until it made it to the
> > build/test farm.  That's pretty far down the development
> > process.  Of course, it is also a trap in the other direction, for
> > Windows developers who use the pattern but do not include
> > anything equivalent for the non-Windows execution path.
> > 
> > On the whole, however, your argument in favor of tighter
> > patterns might be more convincing than my argument in favor
> > of catching bugs sooner.
> > 
> > I will start implementing your suggestion for patch v2.
> 
> Any progress on this?

I have added this to the TODO list:
 Reduce file statistics overhead on directory reads
   http://www.postgresql.org/message-id/1338325561.82125.YahooMailNeo@web39304.mail.mud.yahoo.com

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +