Discussion: FATAL: lock file "postmaster.pid" already exists


FATAL: lock file "postmaster.pid" already exists

From
deepak
Date:
Hi,

On Windows 2008, sometimes the server fails to start due to an existing "postmaster.pid" file.

I tried rebooting a few times and even force shutting down the server, and it started up fine.
It seems to be a race-condition of sorts in the code that detects whether the process with PID
in the file is running or not.

Does anyone have this same problem?  Any way to fix it besides removing the PID file
manually each time the server complains about this?



Thanks,
Deepak

Re: FATAL: lock file "postmaster.pid" already exists

From
Alban Hertroys
Date:
On 8 May 2012, at 24:34, deepak wrote:

> Hi,
>
> On Windows 2008, sometimes the server fails to start due to an existing "postmaster.pid" file.
>
> I tried rebooting a few times and even force shutting down the server, and it started up fine.
> It seems to be a race-condition of sorts in the code that detects whether the process with PID
> in the file is running or not.

No, it means that postgres wasn't shut down properly when Windows shut down. Removing the pid-file is one of the last
things the shut-down procedure does. The file is used to prevent 2 instances of the same server running on the same
data-directory.

If it's a race-condition, it's probably one in Microsoft's shutdown code. I've seen similar problems with Outlook
mailboxes on a network directory; Windows unmounts the remote file-systems before Outlook finished updating its files
under that mount point, so Outlook throws an error message and Windows doesn't shut down because of that.

I don't suppose that pid-file is on a remote file-system?

> Does anyone have this same problem?  Any way to fix it besides removing the PID file
> manually each time the server complains about this?


You could probably script removal of the pid file if its creation date is before the time the system started booting
up.
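
A minimal sketch of that check in Python. It assumes the caller supplies the boot time as a Unix timestamp (obtaining it is platform-specific and not shown), and it uses the file's modification time as a stand-in for the creation date. The names pid_file_is_stale and remove_stale_pid_file are hypothetical. Deleting the lock file is only safe when no postmaster is actually running on that data directory.

```python
import os


def pid_file_is_stale(pid_file_mtime: float, boot_time: float) -> bool:
    """Return True if the lock file was last written before the current boot."""
    return pid_file_mtime < boot_time


def remove_stale_pid_file(data_dir: str, boot_time: float) -> bool:
    """Delete postmaster.pid only if it predates boot_time.

    Returns True if the file was removed, False if it was missing or recent.
    """
    path = os.path.join(data_dir, "postmaster.pid")
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False
    if pid_file_is_stale(mtime, boot_time):
        os.remove(path)
        return True
    return False
```

Run before the service starts, this would leave a lock file written during the current boot untouched.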

Alban Hertroys

--
The scale of a problem often equals the size of an ego.



Re: FATAL: lock file "postmaster.pid" already exists

From
deepak
Date:


On Tue, May 8, 2012 at 3:09 AM, Alban Hertroys <haramrae@gmail.com> wrote:
> On 8 May 2012, at 24:34, deepak wrote:
>
>> Hi,
>>
>> On Windows 2008, sometimes the server fails to start due to an existing "postmaster.pid" file.
>>
>> I tried rebooting a few times and even force shutting down the server, and it started up fine.
>> It seems to be a race-condition of sorts in the code that detects whether the process with PID
>> in the file is running or not.
>
> No, it means that postgres wasn't shut down properly when Windows shut down. Removing the pid-file is one of the last things the shut-down procedure does. The file is used to prevent 2 instances of the same server running on the same data-directory.
>
> If it's a race-condition, it's probably one in Microsoft's shutdown code. I've seen similar problems with Outlook mailboxes on a network directory; Windows unmounts the remote file-systems before Outlook finished updating its files under that mount point, so Outlook throws an error message and Windows doesn't shut down because of that.
>
> I don't suppose that pid-file is on a remote file-system?

No, it's local.

>> Does anyone have this same problem?  Any way to fix it besides removing the PID file
>> manually each time the server complains about this?
>
> You could probably script removal of the pid file if its creation date is before the time the system started booting up.

Thanks, it looks like the code already overwrites an old pid file if no other process is using it (if I understand the code correctly, it just echoes a byte onto a pipe to detect this).

Still, I can't see under what conditions this occurs. I have seen it happen a couple of times, but I don't know how to reproduce the problem predictably.


--
Deepak

Re: FATAL: lock file "postmaster.pid" already exists

From
deepak
Date:
Hi!

We could reproduce the start-up problem on Windows 2003. After a reboot, postmaster, in its start-up sequence, cleans up old temporary files, and this step used to take several minutes (a little over 4 minutes), delaying the writing of line 6 onwards into the PID file. This delay caused pg_ctl to time out, leaving behind an orphaned postgres.exe process (which eventually forks off many other postgres.exe processes). But since pg_ctl itself isn't running after the timeout, Windows thinks the service isn't running. A subsequent attempt to start the service using pg_ctl now complains that the existing lock file is still in use by one of the postgres.exe processes that was spawned before.
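
The sequence above comes down to a wait loop with a deadline. Here is a rough Python model of that behaviour. It is not pg_ctl's actual code: the 6-line threshold is taken from the description above, the 60-second default and 1-second poll interval are assumptions, and wait_for_pid_file and read_lines are hypothetical names.

```python
import time


def wait_for_pid_file(read_lines, required_lines=6, timeout=60.0,
                      poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until the lock file has enough lines.

    read_lines is a callable returning the current lines of the PID file.
    Returns True once at least required_lines are present, or False if the
    timeout expires first. In the False case the server being waited on may
    still be running and still hold the lock file.
    """
    deadline = clock() + timeout
    while True:
        if len(read_lines()) >= required_lines:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

A False return here corresponds to the case described above: the waiter gives up while the slow postmaster keeps running and keeps its lock file.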

We have observed conclusively that the file system cache comes into play. We tested a scenario in which a reboot was followed by walking the file system under the data directory with the Cygwin "find" command. After that, pg_ctl did not time out and the server started up fine, suggesting that the cleanup is far faster when the file system is cached.

Any ideas on fixing this start-up delay in postmaster? 

Could the cleanup task move elsewhere, specifically to somewhere after the writing of the PID file is complete, so that pg_ctl doesn't time out?

Any other suggestions for working around this problem?


Thanks,

Deepak



On Tue, May 8, 2012 at 12:13 PM, deepak <deepak.pn@gmail.com> wrote:


> [...]


Re: FATAL: lock file "postmaster.pid" already exists

From
Tom Lane
Date:
deepak <deepak.pn@gmail.com> writes:
> We could reproduce the start-up problem on Windows 2003. After a reboot,
> postmaster, in its start-up sequence, cleans up old temporary files, and
> this step used to take several minutes (a little over 4 minutes), delaying
> the writing of line 6 onwards into the PID file. This delay caused pg_ctl
> to time out, leaving behind an orphaned postgres.exe process (which
> eventually forks off many other postgres.exe processes).

Hmm.  It's easy enough to postpone temp file cleanup till after the
postmaster's PID file is completely written, so I've committed a patch
for that.  However, I find it mildly astonishing that such cleanup could
take multiple minutes.  What are you using for storage, a man with an
abacus?

            regards, tom lane

Re: FATAL: lock file "postmaster.pid" already exists

From
deepak
Date:
Thanks, I have asked one of the other developers working on this issue to comment.

--
Deepak

On Mon, May 21, 2012 at 10:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [...]

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
I tried moving the call to RemovePgTempFiles until
after the PID file is fully written, but it did not help.
pg_ctl attempts to connect to the database, and does
not report the database as running until that connection
succeeds.  I am not comfortable moving the call to
RemovePgTempFiles after the point in the postmaster
where child processes are spawned and connections
made available to clients because by that point the
temporary files encountered may be valid ones from
the current incarnation of Postgres and not from the
incarnation before the reboot.

I do not know precisely why the filesystem is so slow,
except to say that we have many relations:

xyzzy=# select count(*) from pg_catalog.pg_class;
 count
-------
 27340
(1 row)

xyzzy=# select count(*) from pg_catalog.pg_attribute;
 count 
--------
 236252
(1 row)

Running `find . | wc -l` on the data directory gives
55219


From: deepak <deepak.pn@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Alban Hertroys <haramrae@gmail.com>; pgsql-general@postgresql.org; markdilger@yahoo.com
Sent: Wednesday, May 23, 2012 9:03 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]



Re: FATAL: lock file "postmaster.pid" already exists

From
Tom Lane
Date:
Mark Dilger <markdilger@yahoo.com> writes:
> I tried moving the call to RemovePgTempFiles until
> after the PID file is fully written, but it did not help.

I wonder whether you correctly identified the source of the slowness.
The thing I would have suspected is identify_system_timezone(), which
will attempt to read every file in the timezone-database directory tree,
of which there are about 600.  It's not unusual for that to take several
seconds on a cold-started machine that doesn't have any of that tree in
filesystem cache.  It's still a stretch to believe that it'd take
several minutes on any storage system more advanced than a floppy disk;
but at least we'd only be trying to pin about one order of magnitude
slowdown on the filesystem, rather than several orders.

If that is what is causing it, there is a very simple workaround, which
is to set the timezone setting explicitly in postgresql.conf instead of
leaving the postmaster to try to figure it out from the environment.

(9.2 will use a better answer, which is for initdb to do this once and
store the result in postgresql.conf.)

            regards, tom lane

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
Prior to posting to the mailing list, we made some
changes in postmaster.c to identify where time was
being spent.  Based on the elog(NOTICE,...) lines
we put in the file, we determined the time was spent
inside RemovePgTempFiles.

I then altered RemovePgTempFiles to take a starttime
parameter and, while recursing, to check if more than
5 seconds has passed since it started.  I did not want
to add the complexity of setting an alarm and catching
the signal, so I just made the code check the wallclock
time at each step of the recursion.  When more than
5 seconds has passed, it does not recurse further.
After making this change, we have not been able to
reproduce the slowness.

We do not consider this a fix to the problem.  It is just
a tool for verifying where the slowness comes from.
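
The time-boxed recursion described above can be sketched in Python as follows. This is only an illustration of the technique, not the modified RemovePgTempFiles: the function name scan_with_deadline and its parameters are hypothetical, and it collects file paths instead of deleting anything.

```python
import os
import time


def scan_with_deadline(root, start_time, limit_seconds=5.0, clock=time.monotonic):
    """Recursively list files under root, checking the clock at each step.

    Once more than limit_seconds have passed since start_time, the function
    stops descending into subdirectories but still lists the files in the
    directory it is currently reading.
    """
    found = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if clock() - start_time <= limit_seconds:
                    found.extend(
                        scan_with_deadline(entry.path, start_time, limit_seconds, clock)
                    )
            else:
                found.append(entry.path)
    return found
```

A caller would pass start_time = time.monotonic() at the top level. As with the change described above, the result is incomplete once the limit is hit, so this is a diagnostic aid and not a fix.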



From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: deepak <deepak.pn@gmail.com>; Alban Hertroys <haramrae@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, May 23, 2012 9:50 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]


Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
We tried setting the timezone, as:

     timezone = 'US/Eastern'

in postgresql.conf, but it did not help.




Re: FATAL: lock file "postmaster.pid" already exists

From
Tom Lane
Date:
Mark Dilger <markdilger@yahoo.com> writes:
> Prior to posting to the mailing list, we made some
> changes in postmaster.c to identify where time was
> being spent.  Based on the elog(NOTICE,...) lines
> we put in the file, we determined the time was spent
> inside RemovePgTempFiles.

> I then altered RemovePgTempFiles to take a starttime
> parameter and, while recursing, to check if more than
> 5 seconds has passed since it started.  I did not want
> to add the complexity of setting an alarm and catching
> the signal, so I just made the code check the wallclock
> time at each step of the recursion.  When more than
> 5 seconds has passed, it does not recurse further.
> After making this change, we have not been able to
> reproduce the slowness.

OK, so we're back to the original question: how could this possibly be
taking that long?  Have you got thousands of tablespaces (and if so why)?
Does your system have a habit of crashing at times when there are
thousands of temp files?  Maybe you're using IP over avian carriers to
access your SAN?  It just doesn't make any sense given the information
you've provided.

            regards, tom lane

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
We do not use tablespaces at all.  We do use table
partitioning very heavily, with many check
constraints.  That is the only thing unusual about
the schema.

To my eyes, the birds appear to be flying pretty
darned fast, though we have not figured out how
to remove the message bands quickly without
cutting off their feet.

The server is a virtual machine, and at this point
I will ask the sys admins to get a non-virtual
server running to reconfirm the problem.

Thanks


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: deepak <deepak.pn@gmail.com>; Alban Hertroys <haramrae@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, May 23, 2012 11:17 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]


Re: FATAL: lock file "postmaster.pid" already exists

From
Tom Lane
Date:
Mark Dilger <markdilger@yahoo.com> writes:
> We do not use tablespaces at all.

[ scratches head... ]  If you aren't using any tablespaces, there should
be only *one* pgsql_tmp directory, which makes this even more confusing.

(Unless you're using a pre-8.3 release, in which case there would be one
per database, so maybe if you've got hundreds/thousands of databases in
the cluster that would explain it.  But I sure hope you're not still
using pre-8.3, especially not on Windows.)

            regards, tom lane

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
We only use one database, not counting the
built-in template databases.  The server is
running 9.1.3.  We were running 9.1.1 until
fairly recently.

We are still getting set up to test this on
non-virtual hardware, but hope to have results
from that in a few hours or less.



From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: deepak <deepak.pn@gmail.com>; Alban Hertroys <haramrae@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, May 23, 2012 12:23 PM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]


Re: FATAL: lock file "postmaster.pid" already exists

From
Tom Lane
Date:
Mark Dilger <markdilger@yahoo.com> writes:
> We only use one database, not counting the
> built-in template databases.  The server is
> running 9.1.3.  We were running 9.1.1 until
> fairly recently.

OK.  I had forgotten that in recent versions, RemovePgTempFiles doesn't
only iterate through the pgsql_tmp directories; it scans the regular
database directories too, looking for possibly orphaned temp relations.
So if you had lots and lots of files in your regular database
directories, possibly scanning those could be slow.  Still, it's only
looking at the file names, not attempting to stat() them or anything,
so it would be a pretty shoddy filesystem that would take a really long
time for that.

            regards, tom lane

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
I am running this code on Windows 2003.  It
appears that postgres has in src/port/dirent.c
a port of readdir() that internally uses the
WIN32_FIND_DATA structure, and the function
FindNextFile() to iterate through the directory.
Looking at the documentation, it seems that
this function does collect file creation time,
last access time, last write time, file size, etc.,
much like performing a stat.

In my case, the code is iterating through roughly
56,000 files.  Apparently, this is doing the
equivalent of a stat on each of them.

See http://msdn.microsoft.com/en-us/library/windows/desktop/aa365740%28v=vs.85%29.aspx




From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: deepak <deepak.pn@gmail.com>; Alban Hertroys <haramrae@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, May 23, 2012 1:54 PM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]


Re: FATAL: lock file "postmaster.pid" already exists

From
Tom Lane
Date:
Mark Dilger <markdilger@yahoo.com> writes:
> I am running this code on Windows 2003.  It
> appears that postgres has in src/port/dirent.c
> a port of readdir() that internally uses the
> WIN32_FIND_DATA structure, and the function
> FindNextFile() to iterate through the directory.
> Looking at the documentation, it seems that
> this function does collect file creation time,
> last access time, last write time, file size, etc.,
> much like performing a stat.

> In my case, the code is iterating through roughly
> 56,000 files.  Apparently, this is doing the
> equivalent of a stat on each of them.

That would explain it all right.  I think you're basically screwed here,
because so far as I can see Windows doesn't provide any means to
enumerate a directory's contents without fetching that info; at least
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364232(v=vs.85).aspx
doesn't seem to offer any substitutes for FindFirstFile/FindNextFile.

It's barely possible that using FindFirstFileEx with fInfoLevelId =
FindExInfoBasic would save enough to be useful, except that that option
doesn't exist on Windows 2003 anyway.

Consider using another operating system ...

            regards, tom lane

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
FindFirstFile can take a wildcard filename
pattern.  It appears that we are effectively
calling FindFirstFile without a pattern, getting
all 56000 file names with complete stat
information, doing a poor-man's regex on
those names, and matching just the temporary
files.

If RemovePgTempFiles were modified to
pass a filter, this code might perform better
on Windows.  I'll look into this.
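
For reference, the enumerate-then-match approach described above looks roughly like this in Python. The naming scheme for temporary relation files (t<backend>_<relfilenode>, with an optional fork suffix and segment number) is an assumption about the 9.1 on-disk layout, and looks_like_temp_relation and find_temp_relations are hypothetical names. The sketch shows only the portable name test; handing a pattern to FindFirstFile so that Windows does the filtering is not shown.

```python
import re

# Assumed naming scheme for temp relation files: t<backend>_<relfilenode>,
# optionally followed by a fork suffix (_fsm, _vm) and a segment number (.1, .2, ...).
TEMP_REL_NAME = re.compile(r"t\d+_\d+(_[a-z]+)?(\.\d+)?")


def looks_like_temp_relation(name: str) -> bool:
    """Return True if the whole file name matches the assumed temp-relation scheme."""
    return TEMP_REL_NAME.fullmatch(name) is not None


def find_temp_relations(names):
    """Filter an already-enumerated list of file names down to temp relations."""
    return [n for n in names if looks_like_temp_relation(n)]
```

The cost in this approach lies in producing the full list of names, not in the match, which is why pushing the filter down to the directory enumeration call looks attractive.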




From: Tom Lane <tgl@sss.pgh.pa.us>
To: Mark Dilger <markdilger@yahoo.com>
Cc: deepak <deepak.pn@gmail.com>; Alban Hertroys <haramrae@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Wednesday, May 23, 2012 4:25 PM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]


Re: FATAL: lock file "postmaster.pid" already exists

From
Magnus Hagander
Date:
On Thu, May 24, 2012 at 12:47 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> I am running this code on Windows 2003.  It
> appears that postgres has in src/port/dirent.c
> a port of readdir() that internally uses the
> WIN32_FIND_DATA structure, and the function
> FindNextFile() to iterate through the directory.
> Looking at the documentation, it seems that
> this function does collect file creation time,
> last access time, last write time, file size, etc.,
> much like performing a stat.
>
> In my case, the code is iterating through roughly
> 56,000 files.  Apparently, this is doing the
> equivalent of a stat on each of them.

How did you end up with 56,000 files? Lots and lots and lots of tables?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: FATAL: lock file "postmaster.pid" already exists

From
Magnus Hagander
Date:
On Thu, May 24, 2012 at 2:42 AM, Mark Dilger <markdilger@yahoo.com> wrote:
> FindFirstFile can take a wildcard filename
> pattern.  It appears that we are effectively
> calling FindFirstFile without a pattern, getting
> all 56000 file names with complete stat
> information, doing a poor-man's regex on
> those names, and matching just the temporary
> files.
>
> If RemovePgTempFiles were modified to
> pass a filter, this code might perform better
> on Windows.  I'll look into this.

It might in that case be worthwhile looking at using scandir() on
platforms that support that as well, so that other platforms can
benefit from an optimization as well. Though I'm not sure how much
that would actually help - ISTM that one actually scans the whole
directory anyway, just you don't have to do it yourself...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: FATAL: lock file "postmaster.pid" already exists

From
Mark Dilger
Date:
We have lots of partition tables that inherit
from a smaller number of parents.  Some,
but not all of these tables also have indexes.
The number actually varies depending on
the data loaded.  For some other database
instances, fortunately on Linux, the number
is in the millions.

I have been testing with passing FindFirstFile
a pattern to match the temporary file names,
rather than letting FindFirstFile/FindNextFile
return all names and then having postgres
do the pattern match itself.  So far, this looks
very promising, with a stand-alone program
that uses this technique cutting the runtime
from 4 minutes down to less than a second.

I have a fairly clean patch in the works that
I will submit after I have verified it on
Windows 2003, Windows 2008 and Linux.





From: Magnus Hagander <magnus@hagander.net>
To: Mark Dilger <markdilger@yahoo.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; deepak <deepak.pn@gmail.com>; Alban Hertroys <haramrae@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Thursday, May 24, 2012 3:58 AM
Subject: Re: [GENERAL] FATAL: lock file "postmaster.pid" already exists

[...]